Abstract NUM 1889

AN EXPERIMENTAL ANALYSIS OF PREDICTIVE MODELS TO FORECAST SCHOOL ENROLLMENT DEMAND IN PUBLIC EDUCATION
L. Praxedes1, L. Chaves e Silva2, B. Duarte3, B. Almeida Pimentel3, N. Cruz3, R. de Amorim Silva3
1 Universidade do Estado do Rio Grande do Norte (BRAZIL)
2 Universidade Federal Rural do Semiárido (BRAZIL)
3 Universidade Federal de Alagoas (BRAZIL)
This study evaluates Machine Learning models for forecasting school enrollment demand in Basic Education. Such forecasting is a real challenge for public policy management in Brazil, where school dropout causes significant social and economic problems. To this end, we used a sample of 10% of the schools in the INEP Basic Education Census data, covering the period from 2011 to 2021.
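The abstract does not state how the 10% sample was drawn. One reasonable reading, sketched here purely as an assumption, is to sample at the school level so that each selected school keeps its full 2011 to 2021 series. The record layout, the `school_id` key, and the fixed seed are hypothetical.

```python
import random

def sample_schools(records, fraction=0.10, seed=42):
    """Keep every yearly record of a random `fraction` of schools.

    `records` is a list of dicts with a hypothetical "school_id" key.
    A fixed seed makes the sample reproducible across runs.
    """
    school_ids = sorted({r["school_id"] for r in records})
    k = max(1, round(fraction * len(school_ids)))
    chosen = set(random.Random(seed).sample(school_ids, k))
    # Sampling whole schools keeps each school's time series intact.
    return [r for r in records if r["school_id"] in chosen]
```

With 50 schools observed over 11 years, this keeps 5 schools and all 55 of their yearly records.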

The method followed a transparent and reproducible computational pipeline, starting with thorough data preparation to ensure data quality. This step included feature cleaning, logarithmic transformation of selected variables, treatment of extreme values (outliers) with the Winsorization technique, and imputation of missing data with the median, which is robust across different data distributions.
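The preprocessing steps can be illustrated on a single numeric feature. This minimal sketch uses only the Python standard library. The 5th/95th percentile cut-offs, the order of the steps (median imputation, then Winsorization, then `log1p`), and the function names are assumptions, not the authors' reported settings.

```python
import math
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def preprocess(values, lower_q=0.05, upper_q=0.95):
    """Median imputation, then Winsorization, then log1p for one feature.

    `values` may contain None for missing entries; the cut-offs are
    illustrative assumptions.
    """
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)  # robust to skew and outliers
    imputed = [med if v is None else v for v in values]
    ordered = sorted(imputed)
    low, high = quantile(ordered, lower_q), quantile(ordered, upper_q)
    # Winsorization: clip extreme values to the percentile bounds.
    clipped = [min(max(v, low), high) for v in imputed]
    # log1p compresses the right tail and is defined at zero enrollment.
    return [math.log1p(v) for v in clipped]
```

For `[10, None, 20, 30, 1000]`, the missing value becomes the median 25, and the outlier 1000 is pulled down to about 806 before the log transform.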

In the modeling step, we compared three algorithms representing different prediction approaches: Linear Regression as a baseline, and two ensemble models known for high performance, Random Forest and LightGBM. The most relevant features were selected with the Mutual Information metric, which, unlike linear correlation, can also capture non-linear relationships.
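To show how Mutual Information ranks features, the following sketch uses a simple equal-width binned estimator. This estimator is a swapped-in illustration: scikit-learn's `mutual_info_regression`, for example, uses a k-nearest-neighbour estimator instead. The bin count and function names are hypothetical.

```python
import math
from collections import Counter

def equal_width_bins(values, n_bins):
    """Map each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=5):
    """Binned estimate of I(X; Y) in nats."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def select_top_k(features, target, k, n_bins=5):
    """Return the names of the k features with the highest MI with the target."""
    scores = {name: mutual_information(col, target, n_bins)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, `select_top_k({"quad": [(i - 10) ** 2 for i in range(20)], "const": [1] * 20}, list(range(20)), k=1)` returns `["quad"]`. The quadratic feature depends non-linearly on the target, while the constant feature carries no information (its MI is exactly 0).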

The performance evaluation went beyond the coefficient of determination (R²) and focused on the magnitude and nature of prediction errors, using a set of metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).
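The four metrics follow their standard definitions and can be computed directly. In this sketch, skipping zero-valued targets in MAPE is an assumption, since the abstract does not say how schools with zero enrollment were handled.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in %) and R² for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    # RMSE squares errors first, so large misses weigh more than in MAE.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined for a zero target; such rows are skipped (assumption).
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

For `regression_metrics([100, 200, 300, 400], [110, 190, 330, 400])`, MAE is 12.5, RMSE is about 16.58, MAPE is about 6.25%, and R² is about 0.978. The gap between RMSE and MAE reflects the single larger error of 30.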

The results showed that LightGBM clearly outperformed the other models. It achieved an average R² above 0.90 and combined the best precision with consistently low errors across all 10 school levels analyzed, which indicates that it is highly reliable for practical use.

We conclude that this method, which combines thorough data preparation with a comprehensive error analysis, is essential for validating predictive models and turning them into practical, reliable tools for decision-making in educational management. For future work, we suggest applying eXplainable AI (XAI) techniques such as SHAP to understand what drives demand, and conducting a pilot project to test the model in a real-world scenario.

Keywords: Educational Data Mining, Demand Forecasting, Machine Learning, Error Metrics, LightGBM.

Event: ICERI2025
Track: Digital & Distance Learning
Session: Learning Analytics & Educational Data Mining
Session type: VIRTUAL