Large language models (LLMs) increasingly mediate mathematics education, yet their opaque evaluation risks perpetuating historical testing inequities. This study applies psychometric methods (factor analysis, multidimensional item response theory, and differential item functioning) to evaluate 300 LLMs using the Trends in International Mathematics and Science Study (TIMSS), an internationally validated assessment. Treating models as examinees, we will: (1) uncover latent dimensions of mathematical reasoning in AI systems, (2) map proficiency patterns across architectures and training resources, and (3) expose systematic biases mirroring linguistic and socioeconomic disparities. By revealing how AI encodes or amplifies assessment inequities, this work advances ability justice and aligns with AERA 2026's theme, "Unforgetting Histories and Imagining Futures." Findings will guide educators in leveraging AI tools equitably, ensuring technology serves rather than sorts diverse learners.
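To make the differential item functioning (DIF) component concrete, the sketch below screens a single TIMSS item for DIF between two groups of model examinees using a Swaminathan-Rogers style logistic-regression test. It is a minimal illustration, not the study's actual pipeline: the response data are simulated, and the grouping variable (models trained primarily on high- versus low-resource languages) is a hypothetical example of the linguistic disparities mentioned above.

```python
# Minimal sketch: logistic-regression DIF screen for one item, treating
# LLMs as examinees. All data and variable names are illustrative
# assumptions, not results from the study described in the abstract.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n_models = 300  # number of LLM examinees

# Hypothetical inputs: per-model total test score (the matching variable)
# and a binary grouping flag, e.g. 1 = trained mainly on high-resource
# languages, 0 = mainly on low-resource languages.
total_score = rng.integers(10, 60, size=n_models).astype(float)
group = rng.integers(0, 2, size=n_models)

# Hypothetical scored responses of each model to one item (1 = correct).
item_correct = rng.integers(0, 2, size=n_models)

# Nested logistic models: the baseline conditions only on total score;
# the augmented model adds group (uniform DIF) and the score-by-group
# interaction (non-uniform DIF).
X_base = sm.add_constant(np.column_stack([total_score]))
X_dif = sm.add_constant(
    np.column_stack([total_score, group, total_score * group])
)
fit_base = sm.Logit(item_correct, X_base).fit(disp=0)
fit_dif = sm.Logit(item_correct, X_dif).fit(disp=0)

# Likelihood-ratio test with 2 df for the two added DIF terms; a small
# p-value flags the item for closer review.
lr = 2 * (fit_dif.llf - fit_base.llf)
p_value = chi2.sf(lr, df=2)
print(f"LR = {lr:.2f}, p = {p_value:.3f}")
```

In practice such a screen would be run per item with a multiple-comparison correction, and flagged items inspected alongside the factor-analytic and multidimensional IRT results rather than interpreted in isolation.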