J. Soler-Rovira, J.M. Arroyo-Sanz, L. Galvez-Paton, D. Palmero-Llamas, R. Linares-Torres, C. Gonzalez-Garcia, J. Novillo-Carmona, A.F. Obrador-Perez
Exams must evaluate learning in a valid, reliable and meaningful way. To do this, evaluation tests must meet a series of requirements such as variety, appropriate difficulty level, discrimination capacity, and absence of bias. Artificial Intelligence (AI) tools can be useful in the task of designing these tests. Thus, our group of teachers set the objective of evaluating the use of AI to improve the quality of the tests carried out in their subjects. In the second semester of the 2023/24 academic year, two teacher-designed tests were administered in two subjects. Once completed, they were evaluated in terms of difficulty level and discrimination capacity. Items that were very easy or very difficult, or that discriminated students' learning poorly, were reformulated with the help of the AI assistant Copilot. According to its own description: “Microsoft Copilot is an artificial intelligence assistant designed to help users by providing information, answering questions and participating in conversations. I use technology like GPT-4 and Bing Search to provide relevant and useful answers.” A prompt was designed to ask the AI to write multiple-choice questions. These new items reformulated by the AI were used in the next exam, and their quality was again evaluated in terms of difficulty level and discrimination capacity. The difficulty level improved in 50% of the reformulated items and worsened in 15%; the discrimination capacity improved in 57% of them and worsened in 21%. Using AI to correct and improve the quality of evaluation tests is a good practice, but the prompt used should itself be improved by asking the AI to review and optimize it.
Keywords: Specific competencies, Learning evaluation, Artificial intelligence, Progressive evaluation.
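As an illustration of the item-analysis metrics the abstract relies on, the following Python sketch computes the classical difficulty index (proportion of correct answers) and an upper-lower discrimination index for each item. The 27% group split, the flagging thresholds implied in the comments, and the example response matrix are common textbook conventions and illustrative assumptions, not values or methods taken from the study itself.

    # Minimal sketch of classical item analysis, assuming a 0/1 response
    # matrix (rows = students, columns = items). Assumed conventions:
    # 27% upper/lower groups; example data is invented for illustration.
    import numpy as np

    def item_analysis(responses: np.ndarray, group_frac: float = 0.27):
        """Return (difficulty, discrimination) arrays, one value per item.

        difficulty: proportion of students answering the item correctly.
        discrimination: difference in that proportion between the top and
        bottom `group_frac` of students ranked by total test score.
        """
        n_students, _ = responses.shape
        difficulty = responses.mean(axis=0)

        totals = responses.sum(axis=1)          # each student's total score
        order = np.argsort(totals)              # ascending by total score
        k = max(1, int(round(group_frac * n_students)))
        lower = responses[order[:k]]            # weakest students
        upper = responses[order[-k:]]           # strongest students
        discrimination = upper.mean(axis=0) - lower.mean(axis=0)
        return difficulty, discrimination

    # Example: 6 students, 3 items (1 = correct, 0 = incorrect)
    resp = np.array([
        [1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 0],
        [1, 1, 1],
    ])
    p, d = item_analysis(resp)
    print("difficulty:", p)        # values near 0 or 1 flag too-hard/too-easy items
    print("discrimination:", d)    # low or negative values flag weak items

Items flagged by such an analysis (too easy, too hard, or poorly discriminating) are the kind the study selected for reformulation by the AI.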