W. Fusco¹, H. Barbosa², D.C. Gomes¹
This study examines the relationship between the availability of school infrastructure—particularly digital and physical resources—and educational outcomes in Brazilian upper secondary schools (Ensino Médio), using microdata from the 2021 and 2022 School Census (Censo Escolar) produced by the National Institute for Educational Studies and Research (INEP). In a country marked by deep regional disparities, we investigate how access to adequate school infrastructure varies across federative units, rural and urban areas, and administrative dependency (public and private), and how these inequalities relate to indicators of school flow, such as dropout and failure rates.
Although the School Census does not include direct measures of student academic achievement (e.g., standardized test scores), it offers large-scale administrative data that enables the construction of indirect indicators of performance and educational trajectory. The study explores three central hypotheses:
(1) schools with greater physical and digital infrastructure tend to present better student flow outcomes;
(2) public schools—particularly in rural or underdeveloped regions—have significantly lower access to ICT and essential infrastructure; and
(3) indicators such as the dropout rate and the grade repetition rate are statistically associated with patterns of infrastructural deficiency.
To test these hypotheses, we build synthetic indices of infrastructure using binary variables (presence/absence) related to libraries, science laboratories, computer labs, internet access, type of internet connection (e.g., broadband vs. mobile network), and accessibility resources (e.g., ramps, adapted toilets, assistive technologies). These indices are aggregated at the school level and used as predictors in a set of regression models that estimate their association with outcome indicators such as the proportion of dropouts and repeaters among enrolled students in each school.
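As a minimal sketch of the index-and-regression step described above, the following Python example builds an additive index (the share of infrastructure items present) and fits a one-predictor least-squares line against the dropout rate. The indicator names, the equal weighting, and the three toy records are illustrative assumptions, not School Census variables or results; the study's actual indices and regression specifications may differ.

```python
# Hypothetical indicator names; the real School Census column names differ.
INDICATORS = ["library", "science_lab", "computer_lab",
              "internet", "broadband", "accessibility"]


def infrastructure_index(school):
    """Equal-weight index: share of infrastructure items present (0.0 to 1.0)."""
    return sum(school[name] for name in INDICATORS) / len(INDICATORS)


def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up toy records for illustration only (not Census data).
schools = [
    {"library": 1, "science_lab": 1, "computer_lab": 1,
     "internet": 1, "broadband": 1, "accessibility": 1, "dropout_rate": 0.02},
    {"library": 1, "science_lab": 0, "computer_lab": 1,
     "internet": 1, "broadband": 0, "accessibility": 0, "dropout_rate": 0.06},
    {"library": 0, "science_lab": 0, "computer_lab": 0,
     "internet": 0, "broadband": 0, "accessibility": 0, "dropout_rate": 0.10},
]

index_values = [infrastructure_index(s) for s in schools]
dropout_rates = [s["dropout_rate"] for s in schools]
slope, intercept = ols_fit(index_values, dropout_rates)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A negative slope in such a fit would be consistent with hypothesis (1), although a real analysis would need controls (for example region, location, and administrative dependency) before any interpretation.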
We incorporate artificial intelligence (AI) techniques, namely machine learning (ML) and optimization, in two complementary dimensions of the study. First, unsupervised learning techniques (such as clustering algorithms) are applied to group schools into typologies based on patterns of infrastructure availability, allowing the identification of regional or structural profiles. Second, explainable machine learning models (e.g., decision trees or gradient boosting with SHAP values) are explored to estimate the predictive power of infrastructure variables for school-level outcomes and to understand which features contribute most to undesirable educational trajectories. These tools enable richer interpretation than traditional models alone, especially in high-dimensional settings such as the School Census.
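The clustering step can be illustrated with a minimal sketch. The abstract does not name a specific algorithm, so the example below assumes plain k-means (Lloyd's algorithm) written in pure Python, with the first k schools used as deterministic seeds. The four binary indicators and the six toy schools are hypothetical, and the explainable-model step (SHAP) is not covered here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical binary indicators per school, in the order:
# (library, computer_lab, internet, accessibility). Toy data only.
schools = [
    (1, 1, 1, 1),
    (0, 0, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 0),
    (1, 1, 0, 1),
    (0, 1, 0, 0),
]

labels, centroids = kmeans(schools, k=2)
for school, label in zip(schools, labels):
    print(school, "-> typology", label)
```

Each centroid can be read as the share of schools in that typology with a given item, which is one simple way to describe an infrastructure profile. For binary data, alternatives such as k-modes or latent class analysis may be better suited; k-means is used here only for brevity.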
The study is currently under development, and its preliminary stages demonstrate the analytical potential of combining large-scale administrative data with ML methods. By identifying and modeling infrastructural disparities, the research aims to provide evidence-informed insights to support policies for more equitable resource allocation and school improvement strategies. As Brazil expands its commitment to digital inclusion in schools, especially in the post-pandemic context, the integration of census microdata and ML-driven analysis represents a promising path to monitor equity and enhance conditions for student retention and success in upper secondary education.
Keywords: School infrastructure, educational inequalities, student dropout, grade repetition, Brazilian School Census, artificial intelligence in education.