Paper Summary
Fast But Not Ready: Issues and Results from Using ChatGPT to Grade Essays from Chinese ESL Undergraduates (Stage 2, 12:26 PM)

Sun, April 12, 11:45am to 1:15pm PDT, Los Angeles Convention Center, Floor: Level One, Exhibit Hall A - Stage 2

Abstract

To determine whether ChatGPT can be used to grade ESL essays, we trained ChatGPT to apply the grading rubric of China's Test for English Majors-Band 4 (TEM-4) to score 667 essays from 88 Chinese ESL students. Comparisons between ChatGPT and human raters, comparisons among ChatGPT's scores across repeated grading of the same essays, and an analysis of ChatGPT's scoring patterns over time showed that ChatGPT's scores failed to reach acceptable agreement with human raters. ChatGPT also failed to apply the same rubric criteria consistently and exhibited significant problems in generating reliable and valid scores. Overall, despite its strength in providing rapid qualitative and quantitative evaluation, ChatGPT in its current form does not meet the criteria required of a reliable and valid grading tool.
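
As a hedged sketch of the reliability checks described above (not the authors' code, data, or necessarily their metric choice), the example below computes quadratic-weighted Cohen's kappa, a standard agreement statistic for ordinal essay scores, both between human and ChatGPT scores (inter-rater agreement) and between two ChatGPT re-gradings of the same essays (intra-rater consistency). The 0-15 scale and the synthetic scores are illustrative assumptions only.

```python
# Hypothetical illustration of human-AI and test-retest agreement checks;
# the scores here are synthetic, not the study's data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n_essays = 667

# Placeholder scores on an assumed 0-15 ordinal scale.
human = rng.integers(5, 15, size=n_essays)
gpt_run1 = np.clip(human + rng.integers(-4, 5, size=n_essays), 0, 15)
gpt_run2 = np.clip(human + rng.integers(-4, 5, size=n_essays), 0, 15)

# Inter-rater agreement: human vs. ChatGPT. Quadratic weights penalize
# large disagreements more heavily; ~0.6+ is often treated as acceptable.
print(f"Human vs. ChatGPT QWK: "
      f"{cohen_kappa_score(human, gpt_run1, weights='quadratic'):.2f}")

# Intra-rater consistency: ChatGPT vs. itself on the same essays.
print(f"ChatGPT run 1 vs. run 2 QWK: "
      f"{cohen_kappa_score(gpt_run1, gpt_run2, weights='quadratic'):.2f}")
```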

Authors