Measuring Student Cognitive Engagement with GenAI-based Tutor Conversations-Paper

Measuring cognitive engagement in AI tutor conversations requires moving beyond traditional behavioral metrics such as conversation length. Using the ICAP framework [1], we developed a scalable, reliable labeling method to classify engagement as Passive, Active, or Constructive. Two human raters independently coded 200 STEM-focused conversations and achieved high inter-rater reliability (Krippendorff’s alpha = 0.82). We then trained an LLM-as-a-judge whose labels closely matched the human labels (reliability of 0.77), enabling large-scale automation. This approach offers a robust, scalable way to analyze cognitive engagement in GenAI tutor interactions, paving the way for improved AI tutor design and student learning insights.
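
For readers unfamiliar with the reliability statistic cited above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. The function name `krippendorff_alpha_nominal` and the toy labels are illustrative assumptions, not the study's code or data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list of labels per coded item (e.g. per conversation),
    with one label per rater who coded that item. Hypothetical helper.
    """
    # Coincidence matrix: every ordered pair of labels within a unit,
    # weighted by 1 / (number of labels in that unit - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit coded by a single rater cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")

    # alpha = 1 - D_o / D_e, written with the common factors cancelled.
    return 1.0 - (n - 1) * observed / expected


# Toy data (invented): two raters, four conversations, one disagreement.
toy_units = [
    ["Passive", "Passive"],
    ["Active", "Active"],
    ["Constructive", "Constructive"],
    ["Passive", "Active"],
]
print(round(krippendorff_alpha_nominal(toy_units), 3))
```

On this toy data the two raters disagree on one of four conversations and the sketch gives an alpha of about 0.667. Perfect agreement gives 1.0, and values near 0 indicate agreement no better than chance.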

