Paper Summary

Harnessing GenAI for Automated Essay Scoring: A Multi-Agent Architecture for Large-Scale Writing Assessment

Sun, April 12, 11:45am to 1:15pm PDT, InterContinental Los Angeles Downtown, Floor: 5th Floor, Hancock Park West

Abstract

Large-scale writing assessment remains labor-intensive and susceptible to rater drift, while existing automated scoring tools often lack reliability. This study evaluates whether generative artificial intelligence can approximate expert human holistic scoring and proposes a Multi-Agent Scoring Architecture (MASA) to enhance the consistency and reliability of automated Chinese essay scoring. MASA integrates rubric-aligned agents, few-shot self-revision, and a holistic reviewer to improve scoring accuracy. On a dataset of 1,038 essays, MASA significantly increases alignment with human ratings (QWK = 0.53) compared to baseline methods (QWK = 0.32). The study highlights the potential of combining AI efficiency with human pedagogical insight to create more effective and fair writing assessments.
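
Quadratic weighted kappa (QWK), the agreement metric reported above, penalizes disagreements between two raters by the squared distance between their score levels. A minimal sketch of how such an evaluation might be computed, using scikit-learn's cohen_kappa_score; the scores shown are illustrative placeholders, not data from the study:

```python
# Minimal sketch: quadratic weighted kappa (QWK) between human and
# automated holistic essay scores. Scores are illustrative only.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 4, 2, 5, 3, 4, 1, 4]  # expert holistic ratings (placeholder)
model_scores = [3, 3, 2, 5, 4, 4, 2, 4]  # automated ratings (placeholder)

# weights="quadratic" penalizes disagreements by the squared distance
# between score levels, which is the standard QWK formulation.
qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.2f}")
```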

Authors