THE EVOLUTION OF COMPUTER SCIENCE EDUCATION RESEARCH: TRENDS, TOPICS, AND GAPS
S. Parsons, N. Khuri
Wake Forest University (UNITED STATES)
The overarching aim of meta-research is to increase reproducibility, replicability, rigor and transparency of the research enterprise. In Computer Science Education, meta-research can illuminate emerging trends, identify gaps, and inform evidence-based research and teaching practices. To date, meta-research studies in Computer Science Education have been limited in scope and depth. Prior studies have been fragmented, conducted manually, or based on small corpora. This study addresses this limitation by conducting a large-scale text analysis of over 80 million academic works from the Semantic Scholar Open Research Corpus and Scopus. The main objectives were to discern major research topics and publication trends in Computer Science education and uncover factors influencing citation rates of these publications. First, state-of-the-art methods, such as dynamic topic modeling and similarity network modeling of context-aware word embeddings, were used to mine publication abstracts. As a result, two major areas of research in Computer Science education were discovered. These areas were further stratified into 55 sub-areas, which were analyzed to identify distinct research methods and teaching pedagogies that were commonly reported in published works. Results show that most publications focused on constructivist approaches and on introductory courses, and that published reports about diversity and inclusion efforts and pedagogies focused mostly on female students. Over time, there was an increased emphasis on teaching techniques and inclusive learning, growing efforts in educational research on instructional interventions and their replication, and decreased interest in professional development and certification.

Next, bibliometric, lexical and semantic factors were extracted from published abstracts and used to train machine learning predictors of citation rates. Results show that the number of authors and conference names were the most influential factors of citation rates. Gender under-representation and regional disparities were revealed by the machine learning analysis of authors' names and affiliations. However, lexical and semantic features extracted from publication abstracts were better at predicting the citation rates than these bibliometric factors. Machine learning predictors had small mean absolute errors of 5 to 7 citations.
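
For readers unfamiliar with the error measure reported above, the following minimal sketch shows how a mean absolute error over citation counts can be computed. The function name and the citation counts are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted citation counts."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one publication is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical citation counts for four publications (not data from the study).
actual_citations = [12, 3, 40, 7]
predicted_citations = [9, 8, 33, 10]

mae = mean_absolute_error(actual_citations, predicted_citations)
print(mae)  # 4.5
```

In this toy case the predictions miss by 3, 5, 7 and 3 citations, so the predictor is off by 4.5 citations per publication on average.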

In conclusion, the methodology, tools and visualizations presented in this research enable future fine-grained analyses of the growing body of Computer Science education research, identifying trends, topics and gaps in publications. This work can contribute to the advancement of the field and ultimately improve the quality of computer science education worldwide.

Keywords: Citation rates, computer science education research, natural language processing, text analysis, visualization.

Event: INTED2025
Track: STEM Education
Session: Computer Science Education
Session type: VIRTUAL