Measuring Student Cognitive Engagement with GenAI-based Tutor Conversations-Poster
Measuring cognitive engagement in AI tutor conversations requires moving beyond traditional behavioral metrics such as conversation length. Using the ICAP framework [1], we developed a scalable, reliable labeling method to classify engagement as Passive, Active, or Constructive. Two human raters independently coded 200 STEM-focused conversations and achieved high inter-rater reliability (Krippendorff’s alpha = 0.82). We then trained an LLM-as-a-judge whose labels closely matched the human labels (reliability of 0.77), enabling automated labeling at scale. This approach offers a robust way to analyze cognitive engagement in GenAI tutor interactions at scale, which can inform AI tutor design and provide insight into student learning.
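As an illustration of the reliability statistic reported above, the following is a minimal sketch of Krippendorff's alpha, assuming the engagement labels are treated as unordered (nominal) categories; the abstract does not state which distance metric was used. The function name, the data layout, and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a sequence with one entry per conversation; each entry holds
    the labels assigned by the raters (None marks a missing rating).
    Assumption: labels are nominal, so any disagreement counts equally.
    """
    # Build the coincidence matrix: every ordered pair of labels given to the
    # same unit by different raters adds 1 / (m - 1), m = ratings for the unit.
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings cannot be compared
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit with two ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Hypothetical example: two raters, six conversations, one disagreement.
example = [("P", "P"), ("P", "P"), ("A", "A"), ("A", "C"), ("C", "C"), ("C", "C")]
print(round(krippendorff_alpha_nominal(example), 3))  # prints 0.766
```

The same function could in principle compare LLM-assigned labels with human labels by passing one (human, LLM) pair per conversation, although the abstract does not specify how the 0.77 figure was computed.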