Paper Summary

The Best of Two Worlds: An IRT-Enhanced Framework for Interpretable and Generalizable Automated Essay Scoring

Fri, April 10, 9:45 to 11:15am PDT, InterContinental Los Angeles Downtown, Floor: 7th Floor, Hollywood Ballroom I

Abstract

High-quality, interpretable, and generalizable scoring is essential for automated essay scoring (AES) in educational settings. However, existing AES systems face two primary challenges: (a) scoring models lack procedural interpretability, and (b) the models are mostly evaluated on single datasets, limiting generalization across languages and contexts. This paper proposes an IRT-enhanced AES framework (IRT-AESF) to address these issues. IRT-AESF consists of three modules: a pre-trained language model encodes the essay text, a multi-layer perceptron (MLP) maps the semantic representation to a latent trait score indicative of student writing ability, and an IRT-based scoring module predicts the essay score while estimating and presenting key psychometric parameters. Experiments conducted on three large-scale, multilingual, real-world essay datasets demonstrate that IRT-AESF significantly outperforms baseline models.
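
The following is a minimal sketch of the three-module pipeline described in the abstract, assuming PyTorch and Hugging Face Transformers. The specific IRT formulation (a graded-response-style model with a discrimination parameter and ordered score thresholds), the backbone model name, and all layer sizes are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of the IRT-AESF pipeline: PLM encoder -> MLP latent
    # trait -> IRT-style scoring head. Details are assumptions for illustration.
    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class IRTAESFSketch(nn.Module):
        def __init__(self, plm_name="bert-base-multilingual-cased", n_score_levels=6):
            super().__init__()
            # Module 1: pre-trained language model encodes the essay text.
            self.encoder = AutoModel.from_pretrained(plm_name)
            hidden = self.encoder.config.hidden_size
            # Module 2: MLP maps the semantic representation to a latent
            # writing-ability estimate (theta).
            self.mlp = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, 1))
            # Module 3: IRT-style head with interpretable psychometric parameters:
            # a discrimination (a) and ordered thresholds (b_k) between score levels.
            self.log_a = nn.Parameter(torch.zeros(1))
            self.thresholds = nn.Parameter(torch.linspace(-2.0, 2.0, n_score_levels - 1))

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]              # [batch, hidden]
            theta = self.mlp(cls).squeeze(-1)              # latent ability, [batch]
            a = self.log_a.exp()
            b = self.thresholds.sort().values              # keep thresholds ordered
            # Graded-response-style cumulative probabilities P(score >= k | theta).
            cum = torch.sigmoid(a * (theta.unsqueeze(-1) - b))
            upper = torch.cat([torch.ones_like(cum[:, :1]), cum], dim=-1)
            lower = torch.cat([cum, torch.zeros_like(cum[:, :1])], dim=-1)
            probs = upper - lower                          # P(score == k), [batch, K]
            return theta, probs

In this sketch, theta exposes the estimated writing ability and (a, b_k) expose discrimination and score thresholds, which is one way the abstract's claim of "estimating and presenting key psychometric parameters" could be realized; the predicted score would be the argmax (or expectation) over probs.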

Authors