Abstract View

Abstract NUM 723

CAN WE TRUST OPEN-SOURCE LARGE LANGUAGE MODELS? AN EVALUATION ON FAITHFULNESS TO SUPPORT LEARNING

E. Carbajal-Degante, J.G. Moreno-Salinas

National Autonomous University of Mexico (MEXICO)

Privacy and transparency rank among the most pressing concerns in AI adoption, particularly when selecting platforms that adhere to ethical frameworks and ensure safe data handling. These challenges become especially critical in educational settings, where teachers and researchers must balance rigorous information management with the protection of sensitive content, such as personal student information or proprietary academic data. In recent years, open-source large language models (LLMs) have emerged as a promising solution, offering a wide range of capabilities, from simple interaction to more sophisticated integration with other applications, within the reach of users and developers. This study examines the implementation of a Retrieval-Augmented Generation (RAG) system, a widely adopted tool in the LLM era. We offer insights into parameter tuning and architecture details within a teacher-centered methodology. We conduct a comparative analysis of open-source models (Qwen, Gemma-2, Mistral) and commercial models (two up-to-date GPT versions) to evaluate their respective performance. In our experiment, we adapted the QASPER benchmark, which comprises a set of questions and answers derived from scientific research papers on natural language processing topics. We restricted the benchmark to material that includes text data, which yielded 171 papers serving as context. The generated answers were then scored with Relevancy and Faithfulness metrics. Our question-answering evaluation revealed strong performance in answer relevance across all RAG-generated responses, with an 80% match to ground-truth answers. This demonstrates high relevance with minimal redundancy. On faithfulness, however, open-source LLMs scored higher than their commercial counterparts, making them well suited for accuracy-critical educational applications. Our goal within this project is to develop tailored interactive learning tools by integrating top-performing models into a unified workflow. This approach prioritizes privacy and open design first, allowing educators to take an active part in parameter settings for effective classroom integration, in a teacher-controlled environment that can be hosted on institutional infrastructure.
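
The abstract does not say how the Faithfulness metric was computed. In RAG evaluation the term usually means the share of an answer's statements that the retrieved context supports. The sketch below is only a rough illustration of that idea and is not the authors' method: it replaces the LLM-based judgment typically used in practice with a crude word-overlap check. The function names and the `threshold` parameter are hypothetical, chosen for this example.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(answer):
    """Split an answer into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def faithfulness_proxy(answer, context, threshold=0.6):
    """Fraction of answer sentences that look supported by the context.

    Assumption for illustration only: a sentence counts as supported when at
    least `threshold` of its tokens also occur in the retrieved context.
    Real faithfulness metrics usually rely on an LLM or an entailment model.
    """
    context_tokens = _tokens(context)
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        toks = _tokens(sentence)
        if toks and len(toks & context_tokens) / len(toks) >= threshold:
            supported += 1
    return supported / len(sentences)


if __name__ == "__main__":
    context = "QASPER contains questions and answers about NLP papers."
    answer = "QASPER contains questions about NLP papers. It was released by aliens in 1850."
    print(faithfulness_proxy(answer, context))
```

In this toy case the first sentence is fully covered by the context and the second is not, so half of the answer counts as supported. A lexical proxy like this cannot detect paraphrase or contradiction, which is why it should be read as a teaching aid and not as a substitute for the metric used in the study.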

Keywords: Artificial Intelligence, Trust, Open-Source, Retrieval Augmented Generation, Large Language Models, Learning.

Event: ICERI2025
Session: Emerging Technologies in Education
Session time: Monday, 10th of November from 11:00 to 13:45
Session type: POSTER