In meta-analyses and literature reviews, it is crucial to assess the quality of the reviewed studies, but doing so is an effortful procedure. Automating it with Large Language Models (LLMs) might make this procedure more efficient. This study compared how a human and two LLMs rated the quality of educational intervention studies using a standardized tool. Agreement among the three raters ranged from none to moderate across the quality criteria (-.06 ≤ κ ≤ .59). Disagreements between the human and LLM raters stemmed from missing information in the papers, legitimate diverging considerations, misidentifications by the LLMs, and the LLMs’ preference for middle-quality categories. This study demonstrates both the potential and the limitations of using LLMs to evaluate the quality of educational research.
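
The κ values above are Cohen's kappa, a chance-corrected inter-rater agreement statistic. As a rough illustration of how such human–LLM agreement could be computed, the sketch below uses scikit-learn's cohen_kappa_score on made-up ratings; the rating data, ordinal coding, and library choice are assumptions for illustration, not the authors' actual setup.

```python
# Illustrative sketch (not the study's code): chance-corrected agreement
# between a human rater and an LLM rater using Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

# Hypothetical quality ratings for ten studies on one criterion,
# coded ordinally: 0 = low, 1 = medium, 2 = high quality.
human_ratings = [2, 1, 0, 2, 1, 0, 1, 2, 1, 0]
llm_ratings   = [2, 1, 1, 2, 1, 1, 1, 2, 0, 0]

# Unweighted kappa treats every disagreement equally; a weighted variant
# (weights="linear" or "quadratic") would penalize larger ordinal gaps more.
kappa = cohen_kappa_score(human_ratings, llm_ratings)
print(f"Cohen's kappa: {kappa:.2f}")
```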