A HIGHLY CONFIGURABLE DATA ANALYTICS PIPELINE FOR LEARNING
J.A. Santana, P. Thajchayapong, S. Rugaber, A. Goel
Georgia Institute of Technology (UNITED STATES)
Learning analytics has become an increasingly important element of modern educational systems, forming the foundation for personalized learning. By transforming raw data into actionable insights, learning analytics can give both students and educators an informed perspective on student progress, effective learning strategies, and optimal allocation of time and resources.

Educational data comes from a multitude of disparate sources, including administrative databases and learning management systems. This data is often fragmented across silos, subject to regulatory constraints, and inconsistent in quality. These challenges emphasize the need for a robust data architecture that can effectively extract, standardize, store, analyze, and visualize educational data. To meet the scalability and resiliency needs of modern educational systems, such an architecture must integrate analytics as a configurable, structured pipeline that ingests data, processes it, and produces analysis results. This paper focuses on the design and implementation of the analytics component of this data architecture.

Despite the widespread use of configurable data pipelines in enterprise settings, research on the architectural considerations of analytics for data-driven learning systems remains limited. We propose that an event-driven data analytics pipeline constitutes a Highly Configurable System (HCS) that provides an effective and adaptable framework for educational research. By leveraging a modular architecture with a front-loaded declarative configuration layer, our approach is designed to enhance extensibility, usability, and efficiency, allowing domain researchers to conduct and replicate analyses on structured and semi-structured data without extensive code modifications.
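
As a rough illustration of what a front-loaded declarative configuration layer can look like, the following minimal Python sketch runs an analysis described entirely by a configuration dictionary. Everything in it is an assumption made for illustration: the step names (`filter`, `group_mean`), the `CONFIG` schema, and the sample events are hypothetical and are not taken from the authors' pipeline. The sketch covers only the configuration-driven aspect and does not model event-driven dispatch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical registry mapping step names used in a config to functions.
STEPS = {}


def step(name):
    """Register a function under the name a configuration refers to."""
    def register(fn):
        STEPS[name] = fn
        return fn
    return register


@step("filter")
def filter_rows(rows, field, equals):
    # Keep only the records whose `field` has the requested value.
    return [r for r in rows if r.get(field) == equals]


@step("group_mean")
def group_mean(rows, by, value):
    # Average `value` within each group defined by `by`.
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r[value])
    return {key: mean(vals) for key, vals in groups.items()}


def run_pipeline(config, rows):
    """Apply the configured steps in order; the analysis lives in `config`."""
    data = rows
    for spec in config["steps"]:
        params = {k: v for k, v in spec.items() if k != "op"}
        data = STEPS[spec["op"]](data, **params)
    return data


# Hypothetical declarative configuration: changing the analysis means
# editing this structure, not the functions above.
CONFIG = {
    "steps": [
        {"op": "filter", "field": "event", "equals": "quiz_submitted"},
        {"op": "group_mean", "by": "student", "value": "score"},
    ]
}

# Made-up sample records, for demonstration only.
EVENTS = [
    {"student": "s1", "event": "quiz_submitted", "score": 80},
    {"student": "s1", "event": "quiz_submitted", "score": 90},
    {"student": "s2", "event": "quiz_submitted", "score": 70},
    {"student": "s2", "event": "page_viewed", "score": 0},
]

result = run_pipeline(CONFIG, EVENTS)
print(result)
```

In a design of this kind, replicating an analysis on a comparable data set amounts to reusing the same configuration with different input records, and a new analysis is a new configuration rather than a new script.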

We demonstrate our architecture by running the pipeline on data from various learning sessions conducted through the Georgia Tech Online Master of Science in Computer Science (OMSCS) program. To validate the merits of parameterization, we compare the results of the configuration-driven analyses with those of the original static, script-based analyses. Through this effort, we derive templates with varying degrees of specificity, allowing analysis replication for comparable use cases. We also employ a set of evaluation metrics, including variability modeling to measure the configuration schema’s structural complexity and illustrate the likelihood of specific feature arrangements, and function point analysis to measure the effort required to match previously validated analyses. Our work aims to establish a reusable learning analytics pipeline that can be reconfigured to meet the evolving demands of learning environments with minimal programming effort.
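
One simple way to picture variability modeling is to enumerate the valid configurations of a small feature model and see how often each feature appears among them. The sketch below does this by brute force in Python. The feature names and the two constraints are hypothetical, chosen only to make the idea concrete, and the abstract does not state that the authors compute their metrics this way.

```python
from itertools import product

# Hypothetical optional features of a pipeline configuration.
FEATURES = ["filter", "aggregate", "visualize", "export"]


def is_valid(cfg):
    """Hypothetical constraints of the illustrative feature model."""
    # Visualization requires an aggregation step to draw from.
    if cfg["visualize"] and not cfg["aggregate"]:
        return False
    # Every configuration must produce some output.
    if not (cfg["visualize"] or cfg["export"]):
        return False
    return True


def valid_configurations():
    """Enumerate all on/off feature assignments and keep the valid ones."""
    configs = []
    for bits in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, bits))
        if is_valid(cfg):
            configs.append(cfg)
    return configs


configs = valid_configurations()
total = len(configs)

# Share of valid configurations in which each feature is enabled.
share = {f: sum(c[f] for c in configs) / total for f in FEATURES}

print(total)
print(share)
```

The number of valid configurations gives a crude sense of the schema's structural complexity, and the per-feature shares indicate how likely a given feature is to appear if every valid configuration is treated as equally likely. Brute-force enumeration is only practical for small models like this one.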

Keywords: Learning, analytics, personalization, education, technology, cloud, meso-learning, configuration, configurable, HCS, variability, extensibility, usability, efficiency, declarative, structured, semi-structured, architecture, parameterization, configuration-driven, CDD, modeling, pipeline.

Event: EDULEARN25
Track: Digital & Distance Learning
Session: Learning Analytics & Educational Data Mining
Session type: VIRTUAL