Large Language Models as Educational Research Critics: A Human-AI Comparison

Fri, April 25, 1:30 to 3:00pm MDT, The Colorado Convention Center, Floor: Meeting Room Level, Room 704

Abstract

In meta-analyses and literature reviews, assessing the quality of the reviewed studies is crucial but effortful. Automating this step with Large Language Models (LLMs) could make the procedure more efficient. This study compared how a human and two LLMs rated the quality of educational intervention studies using a standardized tool. Agreement among the three raters ranged from none to moderate across the quality criteria (-.06 ≤ κ ≤ .59). Disagreements between the human and LLM raters stemmed from missing information in the papers, legitimate diverging considerations, misidentifications by the LLMs, and the LLMs’ preference for middle quality categories. This study demonstrates both the potential and the limitations of using LLMs to evaluate the quality of educational research.
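The abstract reports inter-rater agreement as Cohen's kappa for each quality criterion. The sketch below illustrates how such pairwise agreement between a human rater and two LLM raters might be computed; it is not the authors' implementation, and the rater labels, rating categories, and data are hypothetical.

```python
# Minimal sketch of pairwise inter-rater agreement on one quality criterion.
# Rater names, categories, and ratings are hypothetical, for illustration only.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

ratings = {
    "human": ["high", "medium", "low", "high", "medium"],
    "llm_a": ["high", "medium", "medium", "high", "low"],
    "llm_b": ["medium", "medium", "low", "high", "medium"],
}

# Cohen's kappa for each pair of raters (human vs. each LLM, LLM vs. LLM)
for r1, r2 in combinations(ratings, 2):
    kappa = cohen_kappa_score(ratings[r1], ratings[r2])
    print(f"{r1} vs {r2}: kappa = {kappa:.2f}")
```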

Authors