Abstract No. 2355

INTEGRATING WIDE-FIELD AND NARROW-FIELD IMAGING FOR ENHANCED BEHAVIORAL ANALYSIS IN CLASSROOMS
H. Mizoguchi1, K. Iso1, R. Sakurai1, H. Takemura2
1 Tokyo Information Design Professional University (JAPAN)
2 Tokyo University of Science (JAPAN)
In classrooms and training environments, it is essential to observe both the overall space and individual details. Narrow-field, high-resolution imaging enables facial recognition, eye-movement detection, and estimation of attention or fatigue; detailed views of the mouth support emotion analysis, lip reading, and language learning. Wide-field, low-resolution imaging is vital for tracking movement, identifying body positions, and detecting body parts such as the hands and head. However, conventional camera systems force educators to choose between observing the entire class at once and zooming in to observe a single student in detail. This technical limitation makes it difficult for teachers to grasp the subtle needs of individual learners while simultaneously instructing a large group. Conventional zoom-capable cameras also suffer from high cost, large size, slow zooming, and difficulty tracking fast or multiple subjects. Systems that combine a fixed wide-view camera with a pan-tilt narrow-view camera are likewise bulky, expensive, and limited to tracking a single person.

We propose using multiple high-resolution cameras, arranged ad hoc, to generate a stitched panoramic image. The images captured by the cameras are distortion-corrected and integrated into a single panoramic image in real time, using computationally efficient image-processing algorithms. This composite image provides a wide field of view at high resolution; a low-resolution wide view is obtained by downscaling it. Cropping specific regions from the high-resolution panorama simulates pan-tilt functionality, enabling flexible tracking. Users, such as teachers or researchers, can freely specify areas of interest within the panorama, and the system tracks those regions at high resolution in real time. This integrates the benefits of both fixed and zoom cameras in software. The stitching technique is akin to panoramic photography, allowing the reuse of existing image-synthesis libraries. A preliminary experiment with two cameras will test the system's feasibility, with later stages expanding to cover the entire room and track multiple moving individuals, including their facial expressions and hand movements.
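The pipeline above (stitch, downscale for a wide view, crop for a virtual pan-tilt) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes grayscale NumPy arrays and a simple horizontal overlap with linear blending, whereas a real system would use feature-based alignment and distortion correction from an existing stitching library. Function names (`stitch_horizontal`, `virtual_pan_tilt`, `wide_view`) are illustrative.

```python
import numpy as np

def stitch_horizontal(left, right, overlap):
    """Merge two images sharing `overlap` columns, blending the seam
    with a linear ramp (toy stand-in for feature-based stitching)."""
    h, wl = left.shape[:2]
    wr = right.shape[1]
    pano = np.zeros((h, wl + wr - overlap), dtype=np.float32)
    pano[:, :wl - overlap] = left[:, :wl - overlap]
    pano[:, wl:] = right[:, overlap:]
    alpha = np.linspace(0.0, 1.0, overlap)          # 0 at left edge of seam
    pano[:, wl - overlap:wl] = ((1 - alpha) * left[:, wl - overlap:]
                                + alpha * right[:, :overlap])
    return pano

def virtual_pan_tilt(pano, top, left, h, w):
    """Software pan-tilt: crop a high-resolution window from the panorama."""
    return pano[top:top + h, left:left + w]

def wide_view(pano, factor):
    """Low-resolution wide view by simple decimation (downscaling)."""
    return pano[::factor, ::factor]
```

Moving the cropped window across the panorama over successive frames reproduces pan-tilt tracking without any mechanical motion, which is what allows multiple regions to be followed simultaneously.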

This system will enable precise and flexible observation of multiple individuals in educational settings. A primary application is the visualization of attention and concentration. The system can track learners' gaze in real time to identify moments of disengagement: specific patterns, such as a prolonged downward stare, can be interpreted as a lapse in focus and trigger an alert. The same data is valuable for post-hoc analysis to refine teaching methods. Applications extend to remote learning, behavior analysis, human-computer interaction, and surveillance.
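The "prolonged downward stare" rule could be sketched as a simple duration filter over per-frame gaze estimates. The function name, the pitch-angle representation, and the threshold values here are all hypothetical assumptions for illustration; the abstract does not specify how gaze is encoded.

```python
def disengagement_alerts(pitch_deg, fps, threshold=-20.0, min_sec=3.0):
    """Return (start_s, end_s) intervals where gaze pitch stays below
    `threshold` degrees (looking down) for at least `min_sec` seconds.

    pitch_deg: per-frame gaze pitch estimates, one value per frame.
    """
    min_frames = int(min_sec * fps)
    alerts, start = [], None
    for i, p in enumerate(pitch_deg):
        if p < threshold:
            if start is None:
                start = i                      # downward run begins
        else:
            if start is not None and i - start >= min_frames:
                alerts.append((start / fps, i / fps))
            start = None
    if start is not None and len(pitch_deg) - start >= min_frames:
        alerts.append((start / fps, len(pitch_deg) / fps))
    return alerts
```

Running this over the gaze track of each cropped region would yield both live alerts and time-stamped intervals for post-hoc review of the lesson.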

Keywords: Observation, learning support, teaching support, tracking, multiple targets, multiple students, lip reading, gesture recognition.

Event: ICERI2025
Session: Emerging Technologies in Education
Session time: Monday, 10th of November from 11:00 to 13:45
Session type: POSTER