Objectives
The potential of artificial intelligence (AI) technologies, especially large language models (LLMs), to improve the quality of online professional development (PD) for educators is immense. Nevertheless, challenges to online PD platforms persist, including limited diagnostic efficiency and insufficient personalized feedback, which can detract from the learning experience for teachers (e.g., Powell & Bodur, 2019). To address these issues, we explored how AI could be used to enhance teachers' content-specific expertise in an inquiry-based learning environment.
Perspectives
Our work was guided by multi-agent frameworks built on LLMs. In these frameworks, agents are entities that exchange messages and information with one another, each assigned a specific role (Wu et al., 2023). Multiple AI-based agents can leverage LLMs (Shinn et al., 2024; White et al., 2023) to simulate human behaviors such as discussing, cooperating, and debating, allowing them to collectively diagnose a teacher's learning.
Methods
This framework is designed to identify teachers' mastery of the content through their interactions with the system. Given data on users' interactions, the expert-developed activities, and the learning objectives for each activity, the agents synthesize this information to determine which objectives each user has mastered. The agents then converse as a group, refining their judgments through discussion and debate.
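The abstract does not specify the implementation, so the following is only a minimal sketch of one agent's diagnosis step, assuming a generic chat-completion client (call_llm) and an illustrative data schema, neither of which comes from the study itself:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative schema; the study's actual data format is not specified.
@dataclass
class Activity:
    name: str
    objectives: list[str]   # expert-defined learning objectives
    responses: list[str]    # one teacher's responses to the activity

def diagnose(activity: Activity, call_llm: Callable[[str], str]) -> list[str]:
    """Ask a single LLM-based agent which objectives the responses demonstrate.

    `call_llm` is a stand-in for any chat-completion client: it takes a
    prompt string and returns the model's text reply.
    """
    prompt = (
        "You are a rater judging a teacher's mastery of learning objectives "
        "from their responses to a professional-development activity.\n"
        f"Activity: {activity.name}\n"
        "Objectives:\n" + "\n".join(f"- {o}" for o in activity.objectives) +
        "\nResponses:\n" + "\n".join(f"- {r}" for r in activity.responses) +
        "\nList only the objectives the responses demonstrate, one per line."
    )
    reply = call_llm(prompt)
    # Keep only reply lines that exactly match a known objective.
    lines = [line.lstrip("- ").strip() for line in reply.splitlines()]
    return [line for line in lines if line in activity.objectives]
```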
Our system builds its agents on GPT-4 (Achiam et al., 2023), a state-of-the-art LLM at the time of the study. In this system, one to five LLM-based agents discuss and debate by exchanging messages about their individual assessments and then converge on a consensus, which serves as the final result.
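The message-exchange and consensus mechanics are likewise not detailed in the abstract. One plausible protocol, shown purely as a sketch, is a fixed number of revision rounds followed by a majority vote; the agent interface here is hypothetical:

```python
from collections import Counter

def debate(agents, activity, rounds: int = 3) -> set[str]:
    """Run a simple debate loop and return the consensus objectives.

    `agents` is a list of callables, each wrapping one LLM-based rater
    (e.g., `diagnose` above plus a revision prompt); each takes the
    activity and its peers' current judgments and returns its own list.
    """
    judgments = [agent(activity, peers=[]) for agent in agents]
    for _ in range(rounds):
        # Each agent sees the others' judgments and may revise its own.
        judgments = [
            agent(activity, peers=judgments[:i] + judgments[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Consensus: objectives endorsed by a strict majority in the final round.
    votes = Counter(obj for j in judgments for obj in set(j))
    return {obj for obj, n in votes.items() if n > len(agents) / 2}
```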
Data
To test the performance of our multi-agent system, we conducted a secondary analysis of data collected from teachers who had completed an AI-based PD program (Author, 2023). Specifically, we used the teachers' responses to nine different activities. As a baseline, we also applied BERT (Bidirectional Encoder Representations from Transformers), a widely used pre-LLM method (Devlin et al., 2019). Human raters independently coded the same data, providing a reference standard against which we calculated the accuracy of each AI method in interpreting the users' responses and informing feedback.
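Agreement between each AI method and the human codes can be quantified with Cohen's kappa, the statistic reported in the Results; a brief sketch using scikit-learn, where the code vectors are illustrative rather than the study's data:

```python
from sklearn.metrics import cohen_kappa_score

# Binary codes per (teacher, objective) pair for one activity:
# 1 = objective mastered, 0 = not mastered.
human_codes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # human raters (reference)
agent_codes = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # multi-agent consensus

# Chance-corrected agreement in [-1, 1]; computed per activity, then averaged.
print(f"kappa = {cohen_kappa_score(human_codes, agent_codes):.2f}")
```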
Results
We compared the knowledge expectations identified in user interactions by the agent-based method against those coded by human raters. Our findings indicated that the agent-based method identified users' knowledge expectations more accurately than the baseline. On average, the kappa statistic between human raters and agent-based raters was 0.65 (ranging from 0.50 to 0.83 across the nine activities), compared with 0.58 for BERT (range: 0.41 to 0.77). This result suggests that the LLM-based multi-agent framework outperformed the earlier AI model and that its consensus judgments aligned closely with those of human raters.
Significance of the Work
Our findings provide evidence that the multi-agent LLM framework can accurately interpret complex, open-ended responses. These findings have important implications for using multi-agent frameworks to build PD systems that interact with teachers and help them develop more complex skills.