Paper Summary

Investigating Measurement Equivalence Between AI and Human Scoring of Higher-Order Thinking Skills in Extended Response Items

Fri, April 25, 11:40am to 1:10pm MDT, The Colorado Convention Center, Floor: Ballroom Level, Mile High Ballroom 2A and 3A

Abstract

This three-study project addresses one key psychometric question: to what extent are the latent traits measured by human raters and by machine learning (ML)-based, AI-driven rubric scoring equivalent? Study 1 focuses on a qualitative comparison of rubrics designed by AI systems using ML-based models and by human experts, with the aim of exploring the clarity and comprehensiveness of the different rubrics and their alignment with educational standards. Study 2 will build on the rubric comparison to quantitatively assess the accuracy and consistency of the AI-based automated scoring approach against human scoring, which is treated as the gold standard. Study 3 will address the main question by examining the measurement and structural invariance of the two scoring methods.

Author