Large Language Models as Educational Research Critics: A Human-AI Comparison

Fri, April 25, 1:30 to 3:00pm MDT, The Colorado Convention Center, Floor: Meeting Room Level, Room 704

Abstract

In meta-analyses and literature reviews, assessing the quality of the reviewed studies is crucial but effortful. Automating this step with Large Language Models (LLMs) could make the procedure more efficient. This study compared how a human and two LLMs rated the quality of educational intervention studies using a standardized tool. Agreement among the three raters ranged from none to moderate across the quality criteria (-.06 ≤ κ ≤ .59). Disagreements between the human and LLM raters stemmed from missing information in the papers, legitimate diverging considerations, misidentifications by the LLMs, and the LLMs’ preference for middle quality categories. This study demonstrates both the potential and the limitations of using LLMs to evaluate the quality of educational research.
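The abstract reports inter-rater agreement as Cohen's kappa for each quality criterion. The sketch below illustrates how such pairwise agreement between a human rater and two LLM raters might be computed; it is not the authors' implementation, and the rater labels, rating categories, and data are hypothetical.

```python
# Minimal sketch of pairwise inter-rater agreement on one quality criterion.
# Rater names, categories, and ratings are hypothetical, for illustration only.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

ratings = {
    "human": ["high", "medium", "low", "high", "medium"],
    "llm_a": ["high", "medium", "medium", "high", "low"],
    "llm_b": ["medium", "medium", "low", "high", "medium"],
}

# Cohen's kappa for each pair of raters (human vs. each LLM, LLM vs. LLM)
for r1, r2 in combinations(ratings, 2):
    kappa = cohen_kappa_score(ratings[r1], ratings[r2])
    print(f"{r1} vs {r2}: kappa = {kappa:.2f}")
```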

Authors