Paper Summary

Investigating Measurement Equivalence Between AI and Human Scoring of Higher-Order Thinking Skills in Extended Response Items

Fri, April 25, 11:40am to 1:10pm MDT, The Colorado Convention Center, Floor: Ballroom Level, Mile High Ballroom 2A and 3A

Abstract

This three-study project addresses one key psychometric question: to what extent are the latent traits measured by human raters and by machine learning (ML)-based, AI-driven rubric scoring equivalent? Study 1 focuses on a qualitative comparison of rubrics designed by AI systems using ML-based models and by human experts, with the aim of exploring the clarity and comprehensiveness of the different rubrics and their alignment with educational standards. Study 2 will build on the rubric comparison to quantitatively assess the accuracy and consistency of the AI-based automated scoring approach against human scoring, which is treated as the gold standard. Study 3 will address the main question by examining the measurement and structural invariance of the two scoring methods.

Author