Abstract View

Abstract NUM 1918

UNCOVERING CONTEXTUAL CLUSTERS IN BRAZILIAN PUBLIC EDUCATION

D. Vieira Lourenço Correia¹, L. Chaves e Silva², G.H. Ferreira de Miranda Oliveira¹, R. Santos da Silva³, L. Leiverton Pereira dos Santos⁴, T. Tenório Martins de Oliveira¹, E. Tatiane Caetano Chagas¹, R. de Amorim Silva¹, B.J. Duarte da Costa⁵, D. Miguilino Pinho Júnior¹, N. Cruz¹, B. Almeida Pimentel¹

¹ Federal University of Alagoas (BRAZIL)
² Federal Rural University of the Semi-Arid (BRAZIL)
³ Federal University of Minas Gerais (BRAZIL)
⁴ Federal University of the San Francisco Valley (BRAZIL)
⁵ Alagoas Federal Institute of Education, Science, and Technology (BRAZIL)

This research explores clustering patterns among educational stages in Brazilian public schools, with the aim of supporting better-informed decision-making in public education management. Motivated by real operational challenges, such as predicting future student enrollments and improving the distribution of textbooks and teaching resources, we investigate how contextual similarities between school-stage pairs can be systematically identified and analyzed.

The study focuses on categorical and structural variables, derived from national educational databases, that describe diverse school characteristics. After selecting predictive attributes by their Spearman correlation with historical enrollment data, we apply graph theory to better understand feature relevance. By modeling school-stage pairs and features as the two node sets of a directed bipartite graph, we compare in-degree and out-degree centralities to identify which features are globally influential and which school-stage pairs require more features to be modeled accurately. We then compute Hamming distances on the binary-encoded data to quantify the similarity between pairs, and apply hierarchical agglomerative clustering to reveal groups of educational stages that share contextual patterns. The resulting dendrogram enables visual inspection and supports the hypothesis that three to five major groups account for most of the contextual variability across schools.
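The selection, graph and clustering steps described above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the abstract does not state: each school-stage pair has a yearly series for every candidate feature, aligned with its enrollment series; a feature is kept when |rho| >= 0.7 (a hypothetical cut-off); each edge of the bipartite graph runs from a pair to a feature selected for it; Hamming distance is normalized by vector length; and the agglomerative step uses average linkage. The function names (select_features, bipartite_degrees, average_linkage) and the toy data are illustrative only.

```python
from itertools import combinations


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(history, enrollment, features, threshold=0.7):
    """Binary row for one school-stage pair: 1 if |rho| >= threshold.

    Hypothetical layout: history maps feature name -> yearly series aligned
    with `enrollment`. The 0.7 cut-off is an illustrative assumption.
    """
    return [int(abs(spearman(history[f], enrollment)) >= threshold) for f in features]


def bipartite_degrees(matrix):
    """Degrees of the pair -> feature graph implied by a binary matrix.

    Assumed orientation: an edge runs from a pair to each feature selected for it.
    """
    out_degree = [sum(row) for row in matrix]       # features needed per pair
    in_degree = [sum(col) for col in zip(*matrix)]  # pairs relying on each feature
    return in_degree, out_degree


def hamming(u, v):
    """Normalized Hamming distance: fraction of positions that differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def average_linkage(matrix, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(matrix))]

    def linkage(a, b):
        # Mean pairwise Hamming distance between members of clusters a and b.
        return sum(hamming(matrix[i], matrix[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]  # i < j, so deleting j leaves index i valid
        del clusters[j]
    return clusters


# Hypothetical toy data: 4 school-stage pairs x 6 features (feature 5 selected for all).
matrix = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
]
in_deg, out_deg = bipartite_degrees(matrix)
print(in_deg, out_deg)             # [2, 2, 2, 2, 2, 4] [3, 4, 3, 4]
print(average_linkage(matrix, 2))  # [[0, 1], [2, 3]]
```

Here feature 5 has the highest in-degree (globally influential), and pairs 1 and 3 have the highest out-degree (they need more features). The sketch returns a flat cut rather than a full dendrogram; with SciPy, `scipy.spatial.distance.pdist(..., metric="hamming")`, `scipy.cluster.hierarchy.linkage(..., method="average")` and `scipy.cluster.hierarchy.dendrogram` would produce the complete tree for visual inspection.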

Additionally, we apply the K-Medoids algorithm using the same Hamming-based distance matrix to complement the hierarchical clustering approach. This alternative unsupervised method provides representative medoids for each group, offering interpretable exemplars of contextual profiles. By comparing both clustering outcomes, we aim to enhance the robustness of the segmentation and validate the internal coherence of groups identified purely from predictive features.
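The K-Medoids step can be sketched on a precomputed Hamming distance matrix. The abstract does not say which K-Medoids variant is used. This sketch assumes a PAM-style swap search with a naive deterministic start (the first k pairs as initial medoids) and runs on a hypothetical toy matrix of four school-stage pairs; `total_cost` and `k_medoids` are illustrative names, not an established API.

```python
def hamming(u, v):
    """Normalized Hamming distance: fraction of positions that differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def total_cost(D, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def k_medoids(D, k):
    """PAM-style swap search on a precomputed distance matrix D."""
    medoids = list(range(k))  # naive deterministic start (assumption)
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(D)):
                if cand in medoids:
                    continue
                # Try replacing the medoid at `pos` with a non-medoid point.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < best - 1e-12:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(len(D))]
    return medoids, labels


# Hypothetical binary-encoded school-stage pairs (rows) over 6 features.
matrix = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
]
D = [[hamming(u, v) for v in matrix] for u in matrix]
medoids, labels = k_medoids(D, 2)
print(medoids, labels)  # [2, 1] [1, 1, 0, 0]
```

Because every medoid is an actual school-stage pair, its binary row can be read directly as an exemplar contextual profile. On this toy data the medoids (pairs 2 and 1) split the pairs into {0, 1} and {2, 3}; agreement of this kind with the hierarchical partition is what the comparison of both clustering outcomes looks for.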

Preliminary findings indicate that schools cluster not only by geographic region or size, as might be expected, but also by subtle patterns in infrastructure, governance type, and grade configuration. These insights suggest that scalable, data-driven strategies for educational policy-making are feasible, enabling more efficient planning and better tailoring of federal programs.

By integrating domain knowledge, unsupervised learning techniques, and graph-theoretical analysis, this research contributes to the advancement of educational data science in the Global South and proposes a replicable analytical framework that respects local heterogeneity while promoting national-scale optimization.

Keywords: Clustering, machine learning, education.

Event: ICERI2025
Track: Digital Transformation of Education
Session: Data Science & AI in Education
Session type: VIRTUAL