Paper Summary

Leveraging LLM Tool Calling for Automating Q-Matrix Construction in Cognitive Diagnosis Analysis

Fri, April 10, 1:45 to 3:15pm PDT, InterContinental Los Angeles Downtown, Floor: 5th Floor, Wilshire Grand Ballroom I

Abstract

Objectives
This study investigates how large language models (LLMs), using tool calling, can automate Q-matrix construction for diagnostic classification models (DCMs) in educational and psychological assessment. The research focuses on two questions: (1) Does integrating DCM results into an AI psychometric agent improve Q-matrix construction? (2) How does LLM-DCM Q-matrix construction compare to data-driven Q-matrix validation?
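The abstract does not report the exact tool schema used; the sketch below illustrates one way such a tool call could be structured with the OpenAI Chat Completions API. The function name propose_q_matrix, the attribute field names, and the prompt wording are illustrative assumptions, not the authors' implementation.

# Minimal sketch of LLM tool calling for Q-matrix construction.
# The function name, schema fields, and prompt are illustrative assumptions;
# the paper does not report its exact prompt or tool schema.
import json
from openai import OpenAI

client = OpenAI()

Q_MATRIX_TOOL = {
    "type": "function",
    "function": {
        "name": "propose_q_matrix",
        "description": "Map each item to the attributes it measures (1 = measured, 0 = not).",
        "parameters": {
            "type": "object",
            "properties": {
                "q_matrix": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "item_id": {"type": "integer"},
                            "public_performance": {"type": "integer", "enum": [0, 1]},
                            "close_scrutiny": {"type": "integer", "enum": [0, 1]},
                            "interaction": {"type": "integer", "enum": [0, 1]},
                        },
                        "required": ["item_id", "public_performance",
                                     "close_scrutiny", "interaction"],
                    },
                }
            },
            "required": ["q_matrix"],
        },
    },
}

def propose_q_matrix(item_stems: list[str]) -> list[dict]:
    """Ask the model to emit a structured Q-matrix via the tool schema above."""
    prompt = ("For each item below, decide which attributes it measures:\n"
              + "\n".join(f"{i + 1}. {stem}" for i, stem in enumerate(item_stems)))
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
        tools=[Q_MATRIX_TOOL],
        tool_choice={"type": "function", "function": {"name": "propose_q_matrix"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)["q_matrix"]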

Framework
Traditional Q-matrix development, in which items are mapped to cognitive attributes, is labor-intensive, subjective, and prone to bias. Data-driven methods (Köhn & Chiu, 2018) are more objective but require large response datasets, which are often unavailable during test development. Recent work on LLM-based Q-matrix construction (Aşiret & Sünbül, 2025) shows promise but typically does not incorporate item parameters estimated from response data, limiting how comprehensive the resulting Q-matrices can be. This study aims to enhance LLM-based Q-matrix construction by integrating psychometric results derived from response data.

Method
The study uses the Social Anxiety Disorder (SAD) scale, with 13 dichotomous items measuring three attributes: public performance, close scrutiny, and interaction. The latent structure is well established via DCM and CFA. The procedure is as follows: (1) initial Q-matrices are generated by GPT-4.1 and Claude Sonnet 4 from the item stems and attribute definitions, simulating expert review; (2) a log-linear cognitive diagnosis model (LCDM) is fitted to the response data using the initial Q-matrix; (3) the LLMs analyze the LCDM item parameters to revise the Q-matrix, adjusting item-attribute mappings based on statistical significance and model fit; (4) the refined (LLM-suggested) Q-matrix is compared to the true Q-matrix using Cohen's Kappa; and (5) results are also compared to data-driven Q-matrix validation based on the proportion of variance accounted for (PVAF; gdina-Qval; Nájera et al., 2019; Nájera et al., 2020; de la Torre & Chiu, 2016).
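As a concrete illustration of step (4), Cohen's Kappa between an LLM-suggested and a reference Q-matrix can be computed over the flattened item-by-attribute entries. The 13 x 3 matrices below are placeholders rather than the SAD scale's actual Q-matrix, and treating each cell as one rating pair is an assumption about how agreement was scored.

# Sketch of step (4): entry-wise Cohen's Kappa between a candidate Q-matrix
# and the reference ("true") Q-matrix. The matrices here are placeholders,
# not the SAD scale's actual Q-matrix.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
true_q = rng.integers(0, 2, size=(13, 3))   # placeholder reference Q-matrix (13 items x 3 attributes)
llm_q = true_q.copy()
llm_q[rng.integers(0, 13, size=4), rng.integers(0, 3, size=4)] ^= 1  # flip a few cells to mimic disagreement

# Treat each item-attribute cell as one dichotomous rating from each "rater".
kappa = cohen_kappa_score(true_q.ravel(), llm_q.ravel())
print(f"Cohen's Kappa (entry-wise agreement): {kappa:.3f}")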

Results
For RQ1, the results show that incorporating the DCM item parameters can slightly increase alignment with the true Q-matrix, relative to the initial Q-matrix, in the Claude conditions. For RQ2, the LLM-based method is comparable to the data-driven method (gdina-Qval) in terms of Cohen's Kappa coefficients. This suggests that LLMs, when properly guided, may effectively replicate expert reasoning in Q-matrix construction. However, the accuracy of the LLM-generated Q-matrix may vary across attributes, depending on how clearly each attribute is conceptually defined.

Significance
This study demonstrates that integrating psychometric results into LLM-based Q-matrix construction can improve the accuracy and alignment of the automatically generated Q-matrices, bringing them closer to expert and data-driven standards. The findings highlight the feasibility of using LLMs not only for content-based mapping but also for data-informed refinement, paving the way for scalable, efficient, and expert-level automation in cognitive diagnosis modeling. This approach has the potential to reduce human workload and bias in test development while maintaining high psychometric quality.
