Paper Summary

Assessing Reliability and Divergence of Human vs. AI Writing Evaluation in Chinese EFL Expository Writing

Fri, April 10, 3:45 to 5:15pm PDT, InterContinental Los Angeles Downtown, Floor: 6th Floor, Broadway

Abstract

This study investigates the consistency and divergence between human raters and ChatGPT in assessing Chinese EFL learners' expository writing. Eighty-two compositions were evaluated on language, content, and organization by human raters and two independent ChatGPT sessions (ChatGPT1 and ChatGPT2). Rating patterns were analyzed with a repeated-measures two-way ANOVA and post-hoc t-tests. Results show that ChatGPT assigned higher scores for content and organization, while human raters scored language more favorably. No significant differences emerged between the two ChatGPT sessions, indicating strong internal consistency. The findings suggest that AI can serve as a reliable complementary tool in writing assessment and support the development of hybrid evaluation models that combine human and AI strengths in second language writing.
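For readers who want to see the analysis design concretely, the following is a minimal sketch in Python of a repeated-measures two-way ANOVA (rater by dimension, essays as subjects) followed by post-hoc paired t-tests, assuming scores are stored in long format. The file name, column names, and rater labels are illustrative assumptions, not the authors' actual materials.

    # Minimal sketch, assuming long-format data with one row per
    # (essay, rater, dimension); multiple human scores are assumed to be
    # averaged into a single "Human" rating beforehand. All names below
    # (ratings_long.csv, column labels) are hypothetical.
    import pandas as pd
    from scipy import stats
    from statsmodels.stats.anova import AnovaRM

    scores = pd.read_csv("ratings_long.csv")
    # Columns: essay_id, rater (Human / ChatGPT1 / ChatGPT2),
    #          dimension (language / content / organization), score

    # Repeated-measures two-way ANOVA: rater x dimension within essays
    anova = AnovaRM(data=scores, depvar="score", subject="essay_id",
                    within=["rater", "dimension"]).fit()
    print(anova)

    # Post-hoc paired t-tests, e.g., Human vs. ChatGPT1 per dimension
    wide = scores.pivot_table(index=["essay_id", "dimension"],
                              columns="rater", values="score").reset_index()
    for dim, grp in wide.groupby("dimension"):
        t, p = stats.ttest_rel(grp["Human"], grp["ChatGPT1"])
        print(f"{dim}: t = {t:.2f}, p = {p:.4f}")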
