Objective. Collaboration is critical to student success, yet students often lack the skills to engage productively in small group work. Automated systems that assess student collaborative discourse can support teachers in guiding more effective collaboration. Our goal was to develop generalizable (across domains) AI models that can automatically identify aspects of collaboration during small group learning in authentic classroom environments.
Framework. We draw on the Collaborative Problem Solving (CPS) framework (Fiore et al., 2018), which conceptualizes collaboration as a set of observable social and cognitive processes (e.g., constructing shared knowledge; negotiation & coordination), and on OpenSciEd's (https://openscied.org/) set of community agreements (CAs) describing idealized classroom collaboration: Being Respectful, Moving Thinking Forward, and Committed to our Community. The CPS framework comprises behavioral indicators, which we mapped onto the three community agreements we aim to identify automatically from student discourse (see Table 1).
Data and Methods. We trained and evaluated Natural Language Processing (NLP) AI models on five datasets representing diverse collaboration contexts, including sensor programming, an educational physics game, block programming (Minecraft), gaming system moderation, and model car assembly and programming (Table 2). Our primary training dataset was drawn from middle school classrooms, and test datasets spanned both K-12 and university settings. The following modeling strategies were evaluated: a baseline fine-tuned RoBERTa model, an augmented RoBERTa model trained on synthetically varied utterances, a Mistral large language model (LLM) prompting approach incorporating few-shot examples, and a support vector machine (SVM) classifier using Mistral embeddings as input. Models were trained and evaluated with both human and automated speech recognition (Whisper ASR) transcripts and assessed based on their ability to generalize across datasets.
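As a rough illustration of the embeddings-plus-SVM strategy described above, the sketch below trains an SVM classifier on utterance-level embedding vectors. The abstract specifies Mistral embeddings as input; here randomly generated vectors stand in for them so the pipeline is self-contained, and the three-way label scheme (one label per community agreement) is an assumption about how the classification task was framed.

```python
# Hypothetical sketch of the Mistral-embeddings + SVM strategy.
# Random vectors stand in for real LLM embeddings of student utterances;
# labels 0/1/2 stand in for the three community agreements
# (Respect, Thinking, Community) -- an assumed task framing.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# One embedding vector per labeled training utterance.
X_train = rng.normal(size=(60, 16))   # placeholder for LLM embeddings
y_train = rng.integers(0, 3, size=60)  # placeholder agreement labels

# Standardize features, then fit an RBF-kernel SVM classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)

# Predict an agreement label for each new (test) utterance embedding.
X_test = rng.normal(size=(5, 16))
preds = clf.predict(X_test)
print(len(preds))  # one predicted label per test utterance
```

Because the SVM sees only fixed-length embedding vectors rather than raw wording, this design plausibly helps generalization across datasets: curriculum-specific vocabulary is abstracted away by the embedding model before classification.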
Results. Results are presented in Table 3. The baseline RoBERTa model performed well within the training context but failed to generalize to unseen datasets, highlighting issues of overfitting to curriculum-specific language. The augmented RoBERTa model and the Mistral + SVM model both demonstrated improved generalization across all test datasets. These models consistently outperformed the baseline, particularly in detecting the Community and Thinking agreements, which were more sensitive to context-specific language than the Respect agreement. Finally, despite being an LLM, the few-shot Mistral approach struggled to learn and apply the qualitative patterns required by the CA framework. Compared to human transcripts, all models showed some decline when tested on automated transcripts, suggesting that robustness to transcript quality remains a challenge.
Significance. This work demonstrates the feasibility of generalizable models of student collaboration from noisy speech in real-world classrooms. It has direct application in AI-enhanced tools that support collaborative learning, such as the Community Builder (CoBi), which provides real-time formative feedback on the CAs to facilitate reflection on classroom collaboration (Briedeband et al., 2023). Our findings support broader efforts to balance model accuracy with generalizability (Pugh et al., 2022; Ganesh et al., 2024). Overall, this work represents a step forward in lightweight, scalable support for student collaboration and aligns with the broader goal of integrating machine learning into educational tools for multiple learning contexts.