Paper Summary

LLMs as Math Examinees: A Psychometric Analysis on TIMSS to Advance Equity

Sat, April 11, 1:45 to 3:15pm PDT, InterContinental Los Angeles Downtown, 7th Floor, Hollywood Ballroom I

Abstract

Large language models (LLMs) increasingly mediate mathematics education, yet their opaque evaluation risks perpetuating historical testing inequities. This study applies psychometric methods (factor analysis, multidimensional item response theory, and differential item functioning) to evaluate 300 LLMs on TIMSS, an internationally validated assessment. Treating the models as examinees, we will: (1) uncover latent dimensions of mathematical reasoning in AI systems, (2) map proficiency patterns across architectures and training resources, and (3) expose systematic biases that mirror linguistic and socioeconomic disparities. By revealing how AI encodes or amplifies assessment inequities, this work advances ability justice and aligns with the AERA 2026 theme, "Unforgetting Histories and Imagining Futures." Findings will guide educators in leveraging AI tools equitably, ensuring that technology serves rather than sorts diverse learners.
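For readers curious about the mechanics, the sketch below illustrates one of the named methods: a logistic-regression check for uniform differential item functioning (DIF) on a binary response matrix of LLM examinees by TIMSS-style items. It is a minimal sketch under stated assumptions, not the study's actual pipeline: the responses are simulated, the grouping variable is hypothetical (e.g., a training-language split), and the statsmodels-based implementation is one of several standard ways to run this test.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_models, n_items = 300, 40

# Simulate Rasch-style responses: latent "ability" per model examinee,
# one difficulty per item. All values here are illustrative.
theta = rng.normal(0, 1, n_models)      # latent ability per LLM
b = rng.normal(0, 1, n_items)           # item difficulties
group = rng.integers(0, 2, n_models)    # hypothetical grouping of models

# Inject uniform DIF on item 0: harder for group 1 at equal ability.
logits = theta[:, None] - b[None, :]
logits[:, 0] -= 0.8 * group
responses = (rng.random((n_models, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

# Matching variable: rest score (total score excluding the studied item).
item = 0
rest = responses.sum(axis=1) - responses[:, item]

# Uniform DIF test: does group membership still predict the item response
# after conditioning on the matching score?
X = sm.add_constant(np.column_stack([rest, group]))
fit = sm.Logit(responses[:, item], X).fit(disp=0)
print(f"group coefficient: {fit.params[2]:.3f}, p-value: {fit.pvalues[2]:.4f}")

A significant group coefficient flags uniform DIF for that item; the full study would extend this logic across all items and pair it with factor-analytic and multidimensional IRT models.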

Author