PHOTO-BASED LEARNING REFLECTION SUPPORT SYSTEM USING LARGE VISION-LANGUAGE MODELS
K. Maruyama1, Y. Morimoto2
1 Tokyo Gakugei University, The United Graduate School of Education (JAPAN)
2 Tokyo Gakugei University, ICT/Information Infrastructure Center (JAPAN)
It is necessary for students to reflect on their progress in order to engage in learning proactively. In recent years, photos have been used as a way of reflecting on one's learning: students choose a photo that represents something they think they did well or something new they have acquired, and circle the part of the photo that represents those aspects. They then typically write a reflection on their learning based on the photo, along with the reason they chose it, explaining what they felt they did well or what new things they became able to do.

However, it is not easy for teachers and/or facilitators to identify the types of photos each student chooses. Even if the type of photo could be identified, it is not easy to facilitate each student's reflection on their learning based on the content of the photo they select.

The purpose of this study is to support reflection on learning using photos. Specifically, we propose a method that facilitates reflection on learning by utilizing a large vision-language model (LVLM) to classify images and generate reflection prompts. We then develop a system based on the proposed method, aiming to support students in reflecting on their learning using photos.

In our proposed method, which uses an LVLM to facilitate photo-based reflection on learning, students reflect on their learning in the following steps:
1) Taking photos: Students take photos of learning activities or records (learning materials, handouts, etc.) while engaged in learning.
2) Selecting/uploading a photo: Students select a photo that depicts something they think they did well or something new they have acquired as evidence of their learning. They then circle the part of the photo that depicts these aspects and upload it to an LVLM.
3) Classifying the photo: The LVLM classifies the uploaded photo according to whether it depicts something the student thinks they did well, something new they have acquired, or both.
4) Generating/providing reflection prompts: The LVLM generates reflection prompts based on the classification of the photo and the learning situation or content that it depicts, and provides these prompts to the student.
5) Reflecting on learning: Students reflect on why they chose the photo, respond to the reflection prompts provided, and consider their outlook for future learning based on this reflection.
6) Writing the reflections: Students write their reflections and record them with the corresponding photos.

To develop a system based on our proposed method, we designed a photo classification function that loads the photos selected by students and classifies their types (Function 1), and a function that generates reflection prompts based on the classification results and the learning situation or content depicted in the photos (Function 2).
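To illustrate how such functions can be organized, the following is a minimal sketch in Python. The function names, category labels, and prompt wording are our own illustrative assumptions, not the actual implementation; in a real system, the constructed instructions would be sent to an LVLM API together with the uploaded photo.

```python
# Sketch of Function 1 (classification instruction) and Function 2
# (reflection-prompt generation). Category labels and wording are
# hypothetical assumptions for illustration only.

CATEGORIES = ("did_well", "newly_acquired", "both")

def build_classification_prompt() -> str:
    """Function 1: instruction asking the LVLM to classify the uploaded photo."""
    return (
        "The student has circled part of this photo of their learning. "
        "Classify the photo as exactly one of: "
        "'did_well' (something the student did well), "
        "'newly_acquired' (something new the student acquired), or 'both'."
    )

def build_reflection_prompt(category: str, description: str) -> str:
    """Function 2: build reflection prompts from the classification result
    and the learning situation/content described in the photo."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    focus = {
        "did_well": "what you did well",
        "newly_acquired": "what you newly learned",
        "both": "what you did well and what you newly learned",
    }[category]
    return (
        f"The photo shows: {description}. "
        f"Write two or three questions that help the student reflect on {focus} "
        "and on their outlook for future learning."
    )
```

Keeping prompt construction separate from the LVLM call makes each function testable on its own and lets the classification result of Function 1 feed directly into Function 2.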

We then evaluated each function. For Function 1, the recall for the photo classification task was high, with few errors. In addition, by combining this with the results of a classification task that determines whether a red or blue circle is included, it is possible to determine which photo the student is reflecting on. For Function 2, we determined that the generated reflection prompts were, overall, appropriate for the types of photos and for the content and situations depicted in them.
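The recall metric used to evaluate Function 1 can be sketched as follows; the labels below are hypothetical examples, not the study's actual data.

```python
# Per-class recall for the photo classification task:
# (correctly predicted photos of a class) / (actual photos of that class).
from collections import Counter

def per_class_recall(true_labels, predicted_labels):
    """Compute recall separately for each photo category."""
    hits = Counter()
    totals = Counter(true_labels)
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical labels for four photos (not the study's data):
true_y = ["did_well", "did_well", "newly_acquired", "both"]
pred_y = ["did_well", "both", "newly_acquired", "both"]
recall = per_class_recall(true_y, pred_y)
# {'did_well': 0.5, 'newly_acquired': 1.0, 'both': 1.0}
```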

In the future, we plan to evaluate the effectiveness of the developed system.

Keywords: Reflection on learning, photos, reflection prompts, large vision-language models, image classification, text generation.