Introduction
Mounting evidence that large language models (LLMs) can solve diagnostic reasoning problems has fueled aspirations of using AI to teach clinical reasoning to learners (Gin et al. 2025). However, evidence that LLMs can function effectively in the role of a “virtual preceptor” (VP) is scant. We propose to evaluate and refine AI-generated feedback provided by openVP – an open-source, open-access virtual case simulator we developed – to address the question: Can an AI-based “virtual preceptor” provide learners with effective and actionable clinical feedback that promotes critical thinking?
Entrusting AI as a “virtual preceptor” to guide learners’ development requires critical assessment of AI’s pedagogical performance. Çiçek et al. (2025) examined whether an LLM could provide medical students with feedback on virtual patient encounters but did not measure how effectively the AI assessed clinical reasoning. AI’s ability to assess learners and provide quality feedback remains largely unproven. Our project focuses on assessing and refining the quality of feedback provided by openVP, which delivers personalized, real-time feedback to learners in simulated patient encounters as it assesses and teaches clinical reasoning skills.
This project addresses four intertwined research questions (RQs):
RQ1: Examine AI feedback quality compared to human standards;
RQ2: Evaluate how different preceptor models affect AI feedback quality;
RQ3: Improve AI feedback quality via preceptor model optimization and reinforcement;
RQ4: Explore alignment between learners’ experience with openVP and their clinical reasoning skill development.
Methods
In openVP, learners engage in case scenarios facilitated by an LLM-based “virtual preceptor” which, via an oral-boards-style question-and-answer format, prompts the learner to gather data (e.g., history/exam), answers these queries, and asks the learner to formulate assessments and plans. openVP provides two types of feedback: on-the-fly feedback during the encounter and summative end-of-case feedback.
Our consortium consists of nine academic healthcare institutions representing a range of undergraduate and graduate learners. openVP will adapt and implement simulated case scenarios already in use at each site. These scenarios, encoded as prompts, provide openVP’s multiagent LLM engine with the clinical background needed to facilitate a text (or voice) dialog with the learner. De-identified transcripts capturing all learner and AI-generated dialog will be collected from learner interactions with openVP cases. A post-scenario survey will assess learners’ perceptions of the case, the AI-generated feedback, and their development of clinical reasoning skills.
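For concreteness, the sketch below illustrates one way a site’s case scenario could be packaged as a system prompt for an LLM-based virtual preceptor and the resulting dialog captured as a de-identified transcript. This is an assumption-laden illustration rather than openVP’s actual engine: the call_llm stub, prompt wording, and transcript fields are hypothetical placeholders.

```python
# Minimal sketch (not the actual openVP implementation) of turning a site's
# case scenario into a system prompt for an LLM-based virtual preceptor and
# capturing the dialog as a de-identified transcript.
# call_llm is a hypothetical placeholder for the multiagent LLM engine.

import json
from datetime import datetime, timezone

def call_llm(messages: list[dict]) -> str:
    """Placeholder for the LLM backend; swap in a real chat-completion call."""
    return "[virtual preceptor reply would appear here]"

def build_system_prompt(case: dict) -> str:
    return (
        "You are a virtual preceptor running an oral-boards-style encounter.\n"
        f"Case background: {case['background']}\n"
        "Prompt the learner to gather history/exam data, answer their queries "
        "from the case background, ask for an assessment and plan, and give "
        "brief on-the-fly feedback after each learner turn."
    )

def run_encounter(case: dict, learner_turns: list[str]) -> dict:
    messages = [{"role": "system", "content": build_system_prompt(case)}]
    for turn in learner_turns:
        messages.append({"role": "user", "content": turn})
        messages.append({"role": "assistant", "content": call_llm(messages)})
    # Summative end-of-case feedback as a final assistant turn.
    messages.append({"role": "user", "content": "Please give end-of-case feedback."})
    messages.append({"role": "assistant", "content": call_llm(messages)})
    return {
        "case_id": case["id"],          # no learner identifiers are stored
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transcript": messages[1:],     # all learner and AI-generated dialog
    }

if __name__ == "__main__":
    case = {"id": "chest-pain-01", "background": "58-year-old with acute chest pain."}
    record = run_encounter(case, ["I would ask about onset and radiation of the pain."])
    print(json.dumps(record, indent=2))
```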
Expected Outcome
To compare the quality of AI-generated and human feedback (RQ1), we will redact the AI-generated feedback from learner transcripts and task human preceptors with creating feedback in its place. We will also investigate how different preceptor models (e.g., the One Minute Preceptor) affect AI-generated feedback quality (RQ2). By comparing AI and human feedback on the same case scenarios, we will apply a reinforcement learning from human feedback (RLHF) strategy to fine-tune LLM feedback generation (RQ3). To relate learner outcomes to their performance in openVP (RQ4), we will identify potential correlations and thematic overlap between openVP feedback and local assessment data.
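The abstract does not specify the fine-tuning pipeline; one common way to operationalize the AI-versus-human comparison for RLHF-style training is to assemble preference pairs in which the human preceptor’s feedback on a redacted transcript is marked as preferred over the original AI-generated feedback. The sketch below shows only that data-assembly step; the field names (prompt, chosen, rejected) and the preference rule are assumptions, not openVP specifics.

```python
# Sketch of one possible way to assemble preference pairs for RLHF-style
# fine-tuning (RQ3): for each redacted transcript, pair the human preceptor's
# feedback (treated as preferred) with the original AI-generated feedback.
# Field names and the preference rule are assumptions, not openVP specifics.

import json

def to_preference_pair(transcript_text: str,
                       human_feedback: str,
                       ai_feedback: str) -> dict:
    """One training example in the common prompt/chosen/rejected layout."""
    return {
        "prompt": (
            "Case transcript (AI feedback redacted):\n"
            f"{transcript_text}\n\n"
            "Provide feedback on the learner's clinical reasoning."
        ),
        "chosen": human_feedback,   # preceptor-authored feedback
        "rejected": ai_feedback,    # original openVP-generated feedback
    }

def write_jsonl(pairs: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    example = to_preference_pair(
        transcript_text="Learner asked about onset, radiation, and risk factors...",
        human_feedback="Good focused history; next, commit to a differential early.",
        ai_feedback="Nice job asking questions.",
    )
    write_jsonl([example], "feedback_preferences.jsonl")
```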
Kuan Xing, University of Iowa
Flávia Oliveira, Centro Universitário de Mineiros (UNIFIMES)
Lukas Shum-Tim, McMaster University
Pooja Varman, Case Western Reserve University
Matthew Kaminsky, University of Chicago
Xiaomei Song, Case Western Reserve University
Brian Christopher Gin, University of Illinois at Chicago