Paper Summary

Leveraging LLM Tool Calling for Automating Q-Matrix Construction in Cognitive Diagnosis Analysis

Fri, April 10, 1:45 to 3:15pm PDT, InterContinental Los Angeles Downtown, Floor: 5th Floor, Wilshire Grand Ballroom I

Abstract

Objectives
This study investigates how large language models (LLMs), using tool calling, can automate Q-matrix construction for diagnostic classification models (DCMs) in educational and psychological assessment. The research focuses on two questions: (1) Does integrating DCM results into an AI psychometric agent improve Q-matrix construction? (2) How does LLM-DCM Q-matrix construction compare to data-driven Q-matrix validation?
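The abstract does not report the exact tool schema used; the sketch below illustrates one way such a tool call could be structured with the OpenAI Chat Completions API. The function name propose_q_matrix, the attribute field names, and the prompt wording are illustrative assumptions, not the authors' implementation.

# Minimal sketch of LLM tool calling for Q-matrix construction.
# The function name, schema fields, and prompt are illustrative assumptions;
# the paper does not report its exact prompt or tool schema.
import json
from openai import OpenAI

client = OpenAI()

Q_MATRIX_TOOL = {
    "type": "function",
    "function": {
        "name": "propose_q_matrix",
        "description": "Map each item to the attributes it measures (1 = measured, 0 = not).",
        "parameters": {
            "type": "object",
            "properties": {
                "q_matrix": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "item_id": {"type": "integer"},
                            "public_performance": {"type": "integer", "enum": [0, 1]},
                            "close_scrutiny": {"type": "integer", "enum": [0, 1]},
                            "interaction": {"type": "integer", "enum": [0, 1]},
                        },
                        "required": ["item_id", "public_performance",
                                     "close_scrutiny", "interaction"],
                    },
                }
            },
            "required": ["q_matrix"],
        },
    },
}

def propose_q_matrix(item_stems: list[str]) -> list[dict]:
    """Ask the model to emit a structured Q-matrix via the tool schema above."""
    prompt = ("For each item below, decide which attributes it measures:\n"
              + "\n".join(f"{i + 1}. {stem}" for i, stem in enumerate(item_stems)))
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
        tools=[Q_MATRIX_TOOL],
        tool_choice={"type": "function", "function": {"name": "propose_q_matrix"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)["q_matrix"]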

Framework
Traditional Q-matrix development, in which items are mapped to cognitive attributes, is labor-intensive, subjective, and prone to bias. Data-driven methods (Köhn & Chiu, 2018) are more objective but require large response datasets, which are often unavailable during test development. Recent work on LLM-based Q-matrix construction (Aşiret & Sünbül, 2025) shows promise but typically does not incorporate item parameters estimated from response data, limiting how comprehensive the resulting Q-matrices can be. This study aims to enhance LLM-based Q-matrix construction by integrating psychometric results derived from response data.

Method
The study uses the Social Anxiety Disorder (SAD) scale, with 13 dichotomous items measuring three attributes: public performance, close scrutiny, and interaction. The latent structure is well established via DCM and CFA. The procedure is as follows: (1) initial Q-matrices are generated by GPT-4.1 and Claude Sonnet 4 from the item stems and attribute definitions, simulating expert review; (2) a log-linear cognitive diagnosis model (LCDM) is fitted to the response data using the initial Q-matrix; (3) the LLMs analyze the LCDM item parameters to revise the Q-matrix, adjusting item-attribute mappings based on statistical significance and model fit; (4) the refined (LLM-suggested) Q-matrix is compared to the true Q-matrix using Cohen's Kappa; and (5) results are also compared to data-driven Q-matrix validation based on the proportion of variance accounted for (PVAF; gdina-Qval; Nájera et al., 2019; Nájera et al., 2020; de la Torre & Chiu, 2016).
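As a concrete illustration of step (4), Cohen's Kappa between an LLM-suggested and a reference Q-matrix can be computed over the flattened item-by-attribute entries. The 13 x 3 matrices below are placeholders rather than the SAD scale's actual Q-matrix, and treating each cell as one rating pair is an assumption about how agreement was scored.

# Sketch of step (4): entry-wise Cohen's Kappa between a candidate Q-matrix
# and the reference ("true") Q-matrix. The matrices here are placeholders,
# not the SAD scale's actual Q-matrix.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
true_q = rng.integers(0, 2, size=(13, 3))   # placeholder reference Q-matrix (13 items x 3 attributes)
llm_q = true_q.copy()
llm_q[rng.integers(0, 13, size=4), rng.integers(0, 3, size=4)] ^= 1  # flip a few cells to mimic disagreement

# Treat each item-attribute cell as one dichotomous rating from each "rater".
kappa = cohen_kappa_score(true_q.ravel(), llm_q.ravel())
print(f"Cohen's Kappa (entry-wise agreement): {kappa:.3f}")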

Results
For RQ1, the results show that incorporating the DCM item parameters can slightly increase alignment with the true Q-matrix, relative to the initial Q-matrix, in the Claude conditions. For RQ2, the LLM-based method is comparable to the data-driven method (gdina-Qval) in terms of Cohen's Kappa coefficients. This suggests that LLMs, when properly guided, may effectively replicate expert reasoning in Q-matrix construction. However, the accuracy of the LLM-generated Q-matrix may vary across attributes, depending on how clearly each attribute is conceptually defined.

Significance
This study demonstrates that integrating psychometric results into LLM-based Q-matrix construction can improve the accuracy and alignment of the automatically generated Q-matrices, bringing them closer to expert and data-driven standards. The findings highlight the feasibility of using LLMs not only for content-based mapping but also for data-informed refinement, paving the way for scalable, efficient, and expert-level automation in cognitive diagnosis modeling. This approach has the potential to reduce human workload and bias in test development while maintaining high psychometric quality.
