Ø. Marøy
Advances in large language models (LLMs) are reshaping possibilities for applying artificial intelligence (AI) in physics education. A significant potential application lies in the generation of physics problems and assessment tasks, which are essential for evaluating students’ conceptual understanding, guiding instruction, and accommodating diverse learning needs. Despite the recognized importance of task creation in effective teaching practice, the process remains time-intensive, and producing problems that are both novel and pedagogically sound is challenging. While concerns persist regarding ethical implications and the limitations of AI-generated content, emerging research indicates that physics problems produced by LLMs can be comparable in quality to those found in traditional textbooks.
This study investigates the use of an LLM, specifically Copilot, to support the generation of physics tasks in basic mechanics. Our analysis evaluates the quality, relevance, and pedagogical value of AI-generated problems in comparison to conventional problem sets. In particular, we examine the model’s capacity to generate problems across a spectrum of difficulty levels and compare its own difficulty ratings with actual student performance. By quantifying this correspondence, we assess the potential of such models to contribute meaningfully to differentiated instruction and adaptive learning environments. Through this investigation, we aim to provide insights into the practical integration of LLMs such as Copilot in physics education and to identify both the affordances and limitations of leveraging AI for instructional design and assessment development.
Keywords: Education, Physics, LLM, AI.