Paper Summary

A Large Language Model-Based Framework for Automated and Accurate Chinese Essay Evaluation

Wed, April 8, 7:45am to Sun, April 12, 3:00pm PDT, Virtual Posters Exhibit Hall, Virtual Poster Hall

Abstract

This study introduces a three-stage automated essay scoring framework leveraging large language models (LLMs) to enhance the accuracy, consistency, and scalability of Chinese essay evaluation. Applied to 57 annotated junior secondary school essays, the framework integrates DeepSeek-R1 for grade classification and Qwen3-14B for pairwise ranking and scoring with benchmark adjustments. Results indicate strong alignment with human raters (r = 0.827; QWK = 0.806) and better performance than direct LLM scoring. This work contributes methodologically to Automated Essay Scoring (AES) by offering a structured, interpretable approach and highlights practical applications in supporting teachers and large-scale assessments.
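As a point of reference for the agreement metrics reported above, the following is a minimal sketch (not the authors' code) of how alignment between automated and human scores is typically computed, using Pearson correlation from scipy and Quadratic Weighted Kappa via sklearn's cohen_kappa_score with quadratic weights; the score arrays are hypothetical placeholders.

```python
# Illustrative only: compute Pearson r and Quadratic Weighted Kappa (QWK)
# between human rater scores and model-assigned scores on a shared rubric scale.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical integer scores for a small sample of essays (not study data).
human_scores = [3, 4, 2, 5, 3, 4, 1, 4, 3, 5]
model_scores = [3, 4, 3, 5, 2, 4, 2, 4, 3, 4]

r, _ = pearsonr(human_scores, model_scores)          # linear agreement
qwk = cohen_kappa_score(human_scores, model_scores,  # ordinal agreement,
                        weights="quadratic")          # penalizing larger gaps more

print(f"Pearson r = {r:.3f}")
print(f"QWK       = {qwk:.3f}")
```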
