Large-scale writing assessment remains labor-intensive and susceptible to rater bias and drift, while existing automated scoring tools often lack reliability. This study evaluates whether generative artificial intelligence can approximate expert human holistic scoring, proposing a Multi-Agent Scoring Architecture (MASA) to enhance the consistency and reliability of automated Chinese essay scoring. MASA integrates rubric-aligned agents, few-shot self-revision, and a holistic reviewer to improve scoring accuracy. On a dataset of 1,038 essays, MASA substantially increases alignment with human ratings (QWK = 0.53) over baseline methods (QWK = 0.32). These findings highlight the potential of combining AI efficiency with human pedagogical insight to create more effective and fair writing assessments.
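The agreement metric reported above, quadratic weighted kappa (QWK), measures rater agreement while penalizing larger score discrepancies more heavily than near-misses. A minimal sketch of its computation is shown below; the function name, score range, and inputs are illustrative and not taken from the study itself.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """QWK between two integer score vectors on [min_rating, max_rating]."""
    rater_a = np.asarray(rater_a)
    rater_b = np.asarray(rater_b)
    n = max_rating - min_rating + 1

    # Observed co-occurrence matrix of score pairs
    observed = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating, b - min_rating] += 1

    # Expected matrix under independence: outer product of marginals
    hist_a = observed.sum(axis=1)
    hist_b = observed.sum(axis=0)
    expected = np.outer(hist_a, hist_b) / observed.sum()

    # Quadratic disagreement weights: penalty grows with squared distance
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Perfect agreement yields QWK = 1.0, chance-level agreement yields roughly 0, so the jump from 0.32 (baseline) to 0.53 (MASA) represents a move from weak to moderate alignment with human raters.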