Introduction: Knowledge syntheses, or literature reviews, are foundational to educational scholarship in the professions, consolidating findings to develop and refine theories and practices. Data extraction, integral to knowledge syntheses, is labor-intensive, requiring human researchers to systematically gather detailed information across multiple manuscripts. Recent advances in artificial intelligence (AI), particularly large language models (LLMs), offer potential efficiency improvements but raise significant concerns about accuracy. Specifically, distinguishing AI-generated "hallucinations" – fabricated or incorrect content – from legitimate interpretive variability driven by subjective judgments is crucial to assessing AI’s suitability for data extraction.
Methods: We developed an extraction platform, MAKMAO (Machine-Assisted Knowledge extraction, Multiple-Agent Oversight), utilizing LLMs for automated data extraction. We evaluated extraction accuracy by comparing AI-generated responses with human responses across 187 manuscripts from a published scoping review in medical education. We measured consistency using interrater reliability for categorical responses and thematic similarity ratings for open-ended responses. Human-human consistency was assessed through manual re-extraction of a targeted subset of data, providing a comparative benchmark. Additionally, AI-AI consistency was evaluated through repeated extractions of identical question/publication pairs to explore variability and interpretability.
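The abstract does not report MAKMAO's implementation details, so the following Python sketch is only an illustration of the kind of consistency checks described above: a placeholder extraction call, repeated runs for AI-AI consistency, and Cohen's kappa (via scikit-learn) for AI-human agreement on categorical responses. The function names, run count, and example labels are assumptions for illustration, not details from the study.

```python
from collections import Counter
from sklearn.metrics import cohen_kappa_score  # inter-rater agreement for categorical answers


def extract_answer(manuscript_text: str, question: str) -> str:
    """Placeholder for an LLM extraction call; MAKMAO's prompting and
    multi-agent oversight are not described in the abstract."""
    raise NotImplementedError("replace with your LLM client of choice")


def repeated_extractions(manuscript_text: str, question: str, n_runs: int = 5) -> list[str]:
    """Re-run the same question/manuscript pair to probe AI-AI consistency."""
    return [extract_answer(manuscript_text, question) for _ in range(n_runs)]


def modal_share(answers: list[str]) -> float:
    """Fraction of repeated runs agreeing with the most common answer (a rough stability signal)."""
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def ai_human_agreement(ai_answers: list[str], human_answers: list[str]) -> float:
    """Cohen's kappa between AI and human answers across manuscripts."""
    return cohen_kappa_score(ai_answers, human_answers)


if __name__ == "__main__":
    # Illustrative labels only, not data from the study.
    ai = ["level 2", "level 2", "level 1", "level 3"]
    human = ["level 2", "level 1", "level 1", "level 3"]
    print(f"AI-human kappa: {ai_human_agreement(ai, human):.2f}")
```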
Results: MAKMAO demonstrated high consistency with human responses for straightforward extraction questions explicitly addressed within manuscripts (e.g., title, aims). However, consistency decreased for questions requiring subjective interpretation or lacking explicit manuscript descriptions (e.g., Kirkpatrick’s outcomes, methodological rationale). Notably, human-human comparisons revealed similar patterns of variability, suggesting that interpretive differences among human researchers significantly contributed to observed discrepancies. AI-AI consistency further reinforced the conclusion that interpretability, rather than hallucination, was the predominant source of variability, with repeated AI extractions effectively flagging interpretive complexity without necessitating extensive human input.
Discussion: Our findings indicate that variability in AI-assisted data extraction predominantly stems from interpretive complexity rather than hallucination. This interpretive variability mirrors human extraction practices, highlighting intrinsic subjectivity in knowledge synthesis tasks. Consequently, while AI holds promise as a transparent and reliable partner in knowledge synthesis, caution is warranted: over-reliance on AI-generated interpretations might inadvertently neglect critical human insights, contextual knowledge, and expertise essential to nuanced understanding.
Leveraging repeated AI extractions allows researchers to identify and refine questions prone to interpretive ambiguity. By distinguishing legitimate interpretive variability from undesirable ambiguity or hallucination, researchers can better integrate AI assistance strategically, preserving methodological rigor while enhancing efficiency.
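As one hypothetical way of operationalizing this iterative use, the sketch below flags extraction questions whose repeated AI runs disagree across manuscripts. The data structure, the modal-share metric, and the 0.8 stability threshold are illustrative assumptions, not parameters from the study or from MAKMAO.

```python
from collections import Counter


def modal_share(answers: list[str]) -> float:
    """Fraction of repeated extractions agreeing with the most common answer."""
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def flag_ambiguous_questions(runs_by_question: dict[str, list[list[str]]],
                             stability_threshold: float = 0.8) -> list[str]:
    """Flag questions whose repeated AI extractions are unstable.

    runs_by_question maps each extraction question to per-manuscript lists of
    repeated answers; questions with low average agreement are candidates for
    rewording or targeted human review. The threshold is illustrative only.
    """
    flagged = []
    for question, per_manuscript_runs in runs_by_question.items():
        shares = [modal_share(runs) for runs in per_manuscript_runs]
        if sum(shares) / len(shares) < stability_threshold:
            flagged.append(question)
    return flagged


if __name__ == "__main__":
    # Made-up answers for two manuscripts per question, for illustration only.
    runs = {
        "Study aim stated?": [["yes", "yes", "yes"], ["yes", "yes", "yes"]],
        "Kirkpatrick level": [["level 2", "level 3", "level 2"], ["level 1", "level 2", "level 3"]],
    }
    print(flag_ambiguous_questions(runs))  # -> ['Kirkpatrick level']
```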
Conclusion: This study underscores the necessity of critically evaluating AI-generated data extraction, highlighting interpretability as a pivotal factor influencing consistency across both human and AI extractors. AI platforms like MAKMAO, particularly when used iteratively, can assist researchers in identifying and clarifying interpretive complexities inherent in knowledge synthesis tasks. Thoughtful integration of AI thus offers opportunities to systematize extraction processes, standardize responses, and significantly improve the efficiency and depth of knowledge syntheses in educational research.
Xi Long, University of Illinois at Chicago
Christy K. Boscardin, University of California - San Francisco
Lauren Maggio, University of Illinois at Chicago
Joseph Costello, University of Illinois at Chicago
Yoon Soo Park, University of Illinois at Chicago
Ralph Gonzales, University of California - San Francisco
Rasmyah Hammoudeh, University of California - San Francisco
Ki Lai, University of California - San Francisco
Brian Christopher Gin, University of Illinois at Chicago