Abstract NUM 1918

UNCOVERING CONTEXTUAL CLUSTERS IN BRAZILIAN PUBLIC EDUCATION
D. Vieira Lourenço Correia1, L. Chaves e Silva2, G. Ferreira de Miranda Oliveira1, R. Santos da Silva3, L. Leiverton Pereira dos Santos4, T. Tenório Martins de Oliveira1, E. Tatiane Caetano Chagas1, R. de Amorim Silva1, B. Jacinto Duarte da Costa5, D. Miguilino Pinho Júnior1, N. Cruz1, B. Almeida Pimentel1
1 Federal University of Alagoas (BRAZIL)
2 Federal Rural University of the Semi-Arid (BRAZIL)
3 Federal University of Minas Gerais (BRAZIL)
4 Federal University of the San Francisco Valley (BRAZIL)
5 Alagoas Federal Institute of Education, Science, and Technology (BRAZIL)
This research explores clustering patterns among educational stages in Brazilian public schools, aiming to support more informed decision-making in public education management. Motivated by real operational challenges, such as predicting future student enrollments and improving the distribution of textbooks and teaching resources, we investigate how contextual similarities between school-stage pairs can be systematically identified and analyzed.

The study focuses on categorical and structural variables derived from national educational databases, representing diverse school characteristics. We first select predictive attributes using Spearman correlation with historical enrollment data, then apply graph theory to deepen our understanding of feature relevance. By modeling the data as a bipartite directed graph, we compare in-degree and out-degree centralities to identify which features are globally influential and which school-stage pairs require more features to be accurately modeled. We then compute Hamming distances on binary-encoded data to quantify similarity between school-stage pairs and apply hierarchical agglomerative clustering to reveal groups of educational stages that share contextual patterns. The resulting dendrogram enables visual inspection and supports the hypothesis that three to five major groups account for most of the contextual variability across schools.
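
The distance and clustering steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a normalized Hamming distance and average linkage (the abstract names no linkage criterion), and the binary profiles, function names, and variable names are all hypothetical. It uses only the standard library for self-containment; a real analysis would more likely rely on an established scientific library.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(rows):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming(rows[i], rows[j])
    return dist


def agglomerative(dist, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per item and repeatedly merges the two
    clusters with the smallest mean pairwise distance until only
    n_clusters remain. Returns a sorted list of sorted index lists.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical binary-encoded school-stage profiles (one row per pair,
# one column per selected feature); not real data.
profiles = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]

dist = distance_matrix(profiles)
groups = agglomerative(dist, n_clusters=2)
print(groups)
```

Stopping the merge loop at a chosen number of clusters corresponds to cutting a dendrogram at a fixed level; recording the merge order and merge distances instead would give the data needed to draw the dendrogram itself.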

Additionally, we apply the K-Medoids algorithm using the same Hamming-based distance matrix to complement the hierarchical clustering approach. This alternative unsupervised method provides representative medoids for each group, offering interpretable exemplars of contextual profiles. By comparing both clustering outcomes, we aim to enhance the robustness of the segmentation and validate the internal coherence of groups identified purely from predictive features.
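
To make the medoid idea concrete, the following is a minimal Python sketch of K-Medoids on a precomputed Hamming distance matrix. It is an illustration under stated assumptions rather than the authors' method: it uses a simple alternating assign-and-update scheme instead of full PAM swaps, a deterministic initialization (the first k items), and made-up binary profiles; all names are hypothetical.

```python
def hamming(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign(dist, medoids):
    """Label every item with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-Medoids on a precomputed distance matrix.

    Assumption: medoids are initialized to the first k items, which
    keeps the sketch deterministic; real analyses usually try several
    initializations.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for m in medoids:
            # Fall back to the medoid itself if its cluster is empty.
            members = [i for i, lab in enumerate(labels) if lab == m] or [m]
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            updated.append(
                min(members, key=lambda c: sum(dist[c][j] for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Hypothetical binary-encoded school-stage profiles; not real data.
profiles = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]
n = len(profiles)
dist = [[hamming(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

medoids, labels = k_medoids(dist, k=2)
print("medoids:", medoids)
print("labels:", labels)
```

Because each medoid is an actual row of the data, it can be read directly as a representative profile of its group, which is what makes this method convenient for interpretation alongside the hierarchical result.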

Preliminary findings indicate that schools cluster not only by geographic region or size, as might be expected, but also by subtle patterns in infrastructure, governance type, and grade configurations. These insights suggest the feasibility of scalable, data-driven strategies for educational policy-making, enabling more efficient planning and customization of federal programs.

By integrating domain knowledge, unsupervised learning techniques, and graph-theoretical analysis, this research contributes to the advancement of educational data science in the Global South and proposes a replicable analytical framework that respects local heterogeneity while promoting national-scale optimization.

Keywords: Clustering, machine learning, education.

Event: ICERI2025
Track: Digital Transformation of Education
Session: Data Science & AI in Education
Session type: VIRTUAL