Individual Submission Summary

Evaluating and Improving LLMs on Pedagogy

Sun, March 29, 2:45 to 4:00pm, Hilton, Floor: Fourth Floor - Tower 3, Union Square 18

Proposal

Prof. Mrinmaya Sachan leads the Language, Reasoning and Education (LRE) Lab in the Department of Computer Science at ETH Zurich. The LRE Lab focuses on natural language processing and on the interface of machine learning and education. One core focus of our work is reimagining Intelligent Tutoring Systems with Large Language Models.

Intelligent Tutoring Systems (ITSs), i.e., computer-based systems designed to assist student learning with intelligent algorithms that instantiate complex learning principles and continuously adapt to students, have been proposed as a technical solution to this global problem. However, despite their conceptual elegance and decades of R&D, only a handful of ITSs are in use today. ITSs have been expensive to build, requiring the designer to painstakingly capture intelligent behavior in various pedagogical settings. Their rigid design confines them to domains with pre-defined problem-solving procedures, limiting their applicability to open-ended problems.

Large language models (LLMs) have the potential to reduce the engineering demands of building ITSs. However, LLMs' opaque nature, their tendency to invent facts, and their confident inaccuracies make it hard for students and educators to trust them and use them safely. These models are also expensive to develop and deploy, and hard to use without exposing private student data to large industry players. LLMs are not trained to be pedagogical [Kasneci et al., 2023]: they struggle to understand students' knowledge gaps and to provide tailored assistance that enables long-term learning rather than short-term success in problem solving. Unlike teachers, LLMs cannot track a student's evolving knowledge and thus cannot cater to the student's precise needs, e.g., by recommending what to learn next. Finally, LLMs find it hard to adapt the style and complexity of their responses to different age groups the way human teachers do.

To address the above limitations, the LRE Lab explores new ways of developing ITSs. While existing ITSs require intelligent behavior to be programmed by hand, we turn to LLMs, which have shown an impressive ability to reason about educational material. Our research is grounded in established approaches from the learning sciences, such as the Socratic method [Copeland, 2005], that aim to develop deep understanding among students rather than just short-term problem-solving success. It uses cutting-edge techniques from Natural Language Processing and Educational Data Mining to automatically generate personalized and adaptive conversational instructional support for students. With a focus on middle-school mathematics, we rigorously benchmark our work through multiple forms of evaluation, including automatic [Macina 2025 (see MathTutorBench)] and human evaluations, surveys, and interactive simulations with learner models [Macina 2023 (MathDial)]. We also implement and pilot our tutors in the university context [Li 2025 (see the Ethel project at ETH Zurich)] as well as in middle-school classrooms [Vanzo 2025 (see our Pilot study in a school in Verona)], which serves as the final test of the efficacy of our methods. These pilots help us measure the benefits of our tutors for student learning, in terms of both short-term problem-solving success and long-term learning gains.

Author