This study investigated the latent preferences of a large language model (Qwen) versus human raters in essay scoring, analyzing how each differentially weighted 16 textual features across 505 Chinese high school English essays. While overall scores showed high consistency, the LLM exhibited a marked preference for language features (grammatical accuracy, lexical sophistication, syntactic complexity), especially in high-scoring essays. Human raters, by contrast, tolerated minor errors and placed greater emphasis on visual presentation and essay content. In the LLM's ratings, lexical complexity compensated for grammar and spelling errors, and the LLM demonstrated higher cross-group rating stability. The study advises caution when applying LLMs to essay scoring and feedback, stressing the need for greater transparency to improve test validity and the quality of AI-generated feedback.