EARLY DETECTION OF AT-RISK UNIVERSITY STUDENTS USING MACHINE LEARNING: A STUDY OF MODEL PERFORMANCE IN EVOLVING ACADEMIC ENVIRONMENTS
D.M. Rodrigues¹, A. Lopes², R. Mauritti¹, M. Roque Ferreira¹, S. Pintassilgo¹
This study explores the early detection of at-risk students using machine learning (ML) models trained on historical academic data. We constructed two datasets from university enrollment records spanning the 2016/2017 to 2022/2023 academic years, restricted to first-time enrollees in their first curricular year. The first dataset included only pre-enrollment information available at the start of the academic year, while the second incorporated first-semester performance data. Each dataset was used to predict two target outcomes: academic success and dropout. We evaluated several ML models: Random Forest, Support Vector Machine (SVM), Naïve Bayes, Decision Tree, AdaBoost, K-Nearest Neighbors (KNN), and Logistic Regression. Hyperparameters were tuned with Optuna, using hundreds of trials per algorithm. The best-performing models were then evaluated on two test sets: one drawn from the historical student data (2016/2017–2022/2023) and one from the 2023/2024 academic year. The 2023/2024 data reflect a real-world shift: that year, the university implemented a new intervention strategy providing personalized support to at-risk students. Although the intervention was not included as a model variable, it may have influenced success and dropout rates in 2023/2024, potentially affecting model performance. By comparing performance on pre- and post-intervention data, this study assesses the robustness and generalizability of ML-based early warning systems in dynamically evolving academic environments.
Keywords: Machine Learning, Early Warning Systems, Student Dropout Prediction, Academic Performance Analysis, Intervention Impact.
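The abstract does not state which evaluation metrics were used. The following is a minimal sketch, assuming binary labels (1 = at-risk, e.g. dropout) and precision, recall, and F1 for the at-risk class, of how a model's performance on the historical test set could be compared with its performance on the 2023/2024 test set. The function names `binary_scores` and `shift_report` are hypothetical, and the toy labels are illustrative only, not data from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def binary_scores(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Scores:
    """Precision, recall and F1 for the positive (assumed 'at-risk') class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Undefined ratios (zero denominators) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def shift_report(
    historical: Tuple[Sequence[int], Sequence[int]],
    post_intervention: Tuple[Sequence[int], Sequence[int]],
    positive: int = 1,
) -> Dict[str, float]:
    """Change in each metric from the historical to the post-intervention test set.

    Each argument is a (y_true, y_pred) pair for the same trained model;
    a negative value means the metric dropped after the shift.
    """
    before = binary_scores(*historical, positive=positive)
    after = binary_scores(*post_intervention, positive=positive)
    return {
        name: getattr(after, name) - getattr(before, name)
        for name in ("precision", "recall", "f1")
    }


if __name__ == "__main__":
    # Toy labels only: (true outcomes, model predictions) for each test set.
    historical = ([1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0])
    post_2023_24 = ([1, 0, 0, 1, 0, 0], [1, 1, 0, 0, 0, 0])
    deltas = shift_report(historical, post_2023_24)
    print({k: round(v, 3) for k, v in deltas.items()})
```

With these toy labels the script prints `{'precision': -0.5, 'recall': -0.167, 'f1': -0.3}`, i.e. every metric is lower on the shifted set. Reporting per-metric deltas rather than a single score makes it visible whether a shift mainly causes missed at-risk students (lower recall) or more false alarms (lower precision), which matter differently for an early warning system.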