J. Crespo Álvarez1, K. Tutusaus Pifarré1, C.L. Rodríguez Velasco2, A. Ortega Mancilla2
The WITH_YOU project tackles a persistent and multifaceted challenge in online education: student dropout in virtual learning environments (VLEs). Based on more than three years of applied research, the project developed an integrated platform that improves student retention and academic performance by combining dynamic modeling techniques with intelligent support mechanisms.
In most institutional contexts, dropout prediction relies on static data points and black-box models that offer limited transparency and interpretability. As a result, educators and administrators often struggle to make timely, data-informed decisions that lead to meaningful interventions. WITH_YOU responds to this challenge by adopting adaptive modeling and prioritizing interpretable machine learning, offering insights that can directly inform educational strategies.
The project is structured around three core objectives:
(1) to design clustering technologies that dynamically group students based on performance and engagement traits;
(2) to develop predictive models capable of identifying at-risk students early, using behavioral and academic indicators; and
(3) to integrate these models into operational tools that provide real-time, personalized support.
The methodology is organized into three operational blocks:
- Data Acquisition and Preprocessing: The project integrates data from Moodle logs, administrative records, and student demographics. A preprocessing pipeline built in Python (using pandas, MiniSom, scikit-learn, and imbalanced-learn) ensures data quality, normalization, and privacy. From an initial dataset of 280,000+ records, 44 core features were selected and normalized for analysis.
- Student Modeling (Clustering): Self-Organizing Maps (SOMs) are used to group students by performance and behavioral traits into interpretable profiles. These groups are later labeled using supervised methods and form the basis for predictive modeling.
- Student Modeling (Classification): Machine learning models (e.g., Gradient Boosting, SVM) are trained to identify students at risk of dropout based on the clustered profiles and historical data.
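The normalization and clustering steps above can be illustrated with a minimal sketch. The project reports using MiniSom for its SOMs; the code below is instead a dependency-free stand-in written only to show the SOM update rule, and the two features, the 1x2 grid, the decay schedule, and all function names are assumptions made for illustration, not the project's actual configuration.

```python
import math
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2
                                    for w, v in zip(weights[node], x)))


def train_som(data, grid=(1, 2), epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(grid[0]) for j in range(grid[1])}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            decay = 1.0 - step / total_steps
            lr = lr0 * decay
            sigma = sigma0 * decay + 1e-3  # keep the radius strictly positive
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2 * sigma ** 2))
                # Pull the node toward the sample, scaled by grid proximity.
                for k in range(dim):
                    w[k] += lr * influence * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical features per student: (mean grade, weekly logins).
students = [(8.5, 12), (9.0, 15), (7.8, 11), (3.2, 2), (4.0, 1), (2.5, 3)]
normalized = min_max_normalize(students)
som = train_som(normalized)
profiles = [best_matching_unit(som, x) for x in normalized]
print(profiles)  # one grid node (profile) per student
```

Each student is assigned to the node of its best matching unit, and that node index can then serve as a profile label or as an input feature for the classification stage. A real deployment would use a larger grid and the full set of selected features.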
These insights are operationalized through a suite of digital tools designed to support different actors within the learning ecosystem:
- CoachBot, a virtual assistant that provides students with personalized messages, reminders, and motivational feedback based on their profile and academic trajectory;
- AlarmBot, a monitoring tool that alerts instructors and administrators when a student exhibits early signs of disengagement or risk;
- and interactive dashboards for students, educators, and institutional leaders, which synthesize key indicators and offer actionable views of academic progress.
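As an illustration of how a monitoring tool such as AlarmBot might turn model output into alerts, the following sketch flags students whose predicted risk or inactivity exceeds a threshold. The `StudentRisk` record, the `select_alerts` function, and both threshold values are hypothetical; the abstract does not describe AlarmBot's actual rules.

```python
from dataclasses import dataclass


@dataclass
class StudentRisk:
    student_id: str
    risk_score: float   # hypothetical classifier output in [0, 1]
    days_inactive: int  # hypothetical engagement indicator


def select_alerts(records, risk_threshold=0.7, inactivity_days=14):
    """Return students to flag, ordered from highest to lowest risk score."""
    flagged = [r for r in records
               if r.risk_score >= risk_threshold
               or r.days_inactive >= inactivity_days]
    return sorted(flagged, key=lambda r: r.risk_score, reverse=True)


records = [
    StudentRisk("s001", 0.82, 3),
    StudentRisk("s002", 0.15, 20),
    StudentRisk("s003", 0.40, 5),
]
for alert in select_alerts(records):
    print(f"ALERT {alert.student_id}: risk={alert.risk_score:.2f}, "
          f"inactive={alert.days_inactive}d")
```

Combining a model-based score with a simple activity rule keeps each alert explainable to the instructor who receives it, which fits the project's emphasis on interpretability.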
Preliminary results show that the combination of SOM-based clustering and interpretable classifiers enables accurate identification of at-risk students while preserving transparency, an essential requirement for responsible educational decision-making. The architecture is scalable, modular, and designed for deployment across various types of VLEs.
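The abstract does not state which metrics underlie "accurate identification". Because dropout is typically a minority class, one reasonable way to check such a claim is to report precision and recall for the at-risk class alongside overall accuracy. The sketch below assumes binary labels (1 = at risk) and uses made-up labels purely for illustration; the function name is hypothetical.

```python
def at_risk_metrics(y_true, y_pred, positive=1):
    """Compute accuracy plus precision, recall, and F1 for the at-risk class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 of 10 students are at risk.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
print(at_risk_metrics(y_true, y_pred))

# A model that never predicts dropout still reaches 70% accuracy here,
# but its recall for at-risk students is zero.
print(at_risk_metrics(y_true, [0] * len(y_true)))
```

The second call shows why accuracy alone can be misleading on imbalanced dropout data.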
WITH_YOU offers a practical and interdisciplinary framework for addressing dropout in online education. By emphasizing interpretability, personalization, and integration into everyday teaching workflows, the project contributes to a more human-centered and effective approach to online education, aligned with the growing need for adaptive learning ecosystems.
Keywords: Student Dropout Prediction, Educational Data Mining, Self-Organizing Maps, Learning Analytics, Digital Learning Assistants.