Abstract No. 28

DESIGN, PRE-TEST, REFLECT: AI-ASSISTED ASSESSMENT OF GRADUATE RESEARCH ASSIGNMENTS WITH LARGE LANGUAGE MODELS
R. Beinhauer
FH JOANNEUM (AUSTRIA)
This paper explores the use of large language models (LLMs) for grading several assignments in a master's-level research methods course. The course was designed to develop students' competencies in academic reasoning, research design, methodological application, and reflective writing. Across multiple written tasks, students were required to engage with theoretical literature, apply frameworks to real-world problems, and articulate their decisions in structured academic formats. To evaluate the potential of AI-assisted grading in this context, all assignments were reviewed with an LLM, with the goal of determining whether such models can provide meaningful, fair, and pedagogically valuable feedback on complex student work.

The primary focus of this study is one central assignment in which students were asked to design a quantitative survey, conduct a pre-test with real participants, and reflect on their design process and findings. This task required students to select a topic of personal or professional relevance, identify an appropriate theoretical model, and operationalize key constructs through measurable survey items. Using a digital tool such as QuestionPro, students built and tested their survey instruments, gathered pre-test feedback from a small pilot group, and revised their items accordingly. They then submitted their final survey along with a comprehensive written report documenting their methodological choices, pre-test outcomes, and revisions made.

The LLM assessed each submission against a detailed evaluation rubric, which included criteria such as topic relevance, theoretical foundation, quality of item design, integration of pre-test feedback, and formal academic writing. The model generated thorough formative feedback and was particularly effective at identifying inconsistencies between theoretical constructs and survey items, flagging vague or biased question wording, and commenting on the logic of the survey structure. In many cases, the AI's comments mirrored those of an experienced human grader, showing a strong capacity to interpret academic intent and assess quality. A minimal sketch of how such rubric-based grading might be scripted is given below.
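The sketch assumes an OpenAI-style chat-completions API; only the rubric criteria are taken from the abstract, while the model name, prompt wording, and the 0-10 scoring scale are illustrative assumptions rather than details of the authors' actual setup.

```python
# Illustrative sketch of rubric-based LLM grading (not the authors' actual pipeline).
# Assumes the OpenAI Python client; model name, prompt wording, and 0-10 scale are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = [
    "Topic relevance",
    "Theoretical foundation",
    "Quality of item design",
    "Integration of pre-test feedback",
    "Formal academic writing",
]

def grade_submission(report_text: str) -> str:
    """Request criterion-by-criterion formative feedback and provisional scores."""
    criteria = "\n".join(f"- {c}" for c in RUBRIC)
    prompt = (
        "You are grading a master's-level research methods assignment in which the student "
        "designed a quantitative survey, ran a pre-test with a small pilot group, and wrote "
        "a reflective report on the design process and revisions.\n"
        "Evaluate the report against each criterion below, give concrete formative feedback, "
        f"and suggest a provisional score from 0 to 10 per criterion:\n{criteria}\n\n"
        f"Student report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage: feedback = grade_submission(open("survey_report.txt").read())
```

In practice, the generated feedback and scores would still be reviewed by the instructor, in line with the human-oversight findings discussed below.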

However, some differences in grading outcomes were observed. In certain cases, the AI assigned scores that were either slightly higher or lower than those later confirmed by human review. These discrepancies often occurred in submissions that included creative approaches or interdisciplinary elements, which the AI sometimes struggled to evaluate fairly. Such findings reinforce the importance of combining AI-generated assessment with human oversight to ensure that grading remains sensitive to academic nuance and context.

Overall, this case demonstrates the potential of LLMs to support the assessment process in graduate education. For assignments that involve multiple phases of design, testing, and reflection, AI can significantly enhance feedback quality and reduce instructor workload. While not a full substitute for expert judgment, LLMs offer a promising complement to traditional grading, particularly in helping students receive faster and more detailed responses that can guide their academic development.

Keywords: Assessment, LLM, AI-supported evaluation.

Event: ICERI2025
Session: AI for Assessment and Feedback
Session time: Tuesday, 11th of November from 15:00 to 16:45
Session type: ORAL