Paper Summary

Response Time-Informed Multiple Imputation: Comparing IRT-, MICE-, and Autoencoder-Based Approaches for Planned Missing Data

Sun, April 12, 9:45 to 11:15am PDT, InterContinental Los Angeles Downtown, Floor: 5th Floor, Boyle Heights

Abstract

Large-scale survey assessments (LSAs) often employ matrix sampling designs to reduce testing time and participant burden. While efficient, this design introduces a high proportion of planned missing data (often over 70%), which can compromise the validity of subsequent analyses if not properly handled. Traditionally, multiple imputation in LSAs uses item response theory (IRT) models, which estimate item parameters using observed item responses and latent regression parameters using background information, and then generate plausible values for latent proficiency (Mislevy, 1998). However, these methods rely on strong assumptions — unidimensionality, local independence, and positional invariance of item parameters — that may not always hold in practice.
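To make the traditional approach concrete, here is a minimal sketch of plausible-value generation under a 2PL IRT model for a single respondent. All item parameters and responses below are illustrative toy values, not PISA's, and a real latent-regression model would condition the prior on background variables rather than using a fixed N(0, 1):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 2PL item parameters and one respondent's 0/1 responses.
a = np.array([1.0, 1.4, 0.8, 1.2])        # discriminations
b = np.array([-0.5, 0.0, 0.5, 1.0])       # difficulties
x = np.array([1, 1, 0, 1])                # observed responses

theta = np.linspace(-4, 4, 401)           # quadrature grid for latent proficiency
# P(correct | theta) for each item (items x grid points).
p = 1 / (1 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
like = np.prod(np.where(x[:, None] == 1, p, 1 - p), axis=0)
prior = np.exp(-0.5 * theta**2)           # N(0,1) prior; latent regression would
                                          # condition this on background variables
post = like * prior
post /= post.sum()                        # discretised posterior over theta

# Draw 5 plausible values from the posterior.
pvs = rng.choice(theta, size=5, p=post)
```

The plausible values are posterior draws of latent proficiency, not point estimates, which is what allows downstream analyses to reflect measurement uncertainty.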

As an alternative, instead of imputing latent proficiency, missing item responses can be imputed directly for reporting following, for example, a market-basket approach (Mislevy, 1997). This can be done with IRT methods (e.g., Zwitser et al., 2017) or with Multivariate Imputation by Chained Equations (MICE), which relaxes IRT assumptions by treating missing values as dependent variables and using iterative regression on observed variables for imputation (van Buuren & Groothuis-Oudshoorn, 2011). Despite its flexibility, MICE can be computationally intensive and may struggle with highly complex or high-dimensional data structures.
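The chained-equations idea can be sketched in a few lines. This toy version uses linear regression of each item on all others with simulated binary data; it is a simplification for illustration only (real MICE software such as the mice R package or van Buuren's algorithm draws from posterior predictive distributions, e.g. via logistic models for binary items):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary item-response matrix with planned missingness.
# 1 = correct, 0 = incorrect, nan = not administered.
n_persons, n_items = 200, 6
ability = rng.normal(size=n_persons)
X = (ability[:, None] + rng.normal(size=(n_persons, n_items)) > 0).astype(float)
mask = rng.random(X.shape) < 0.5          # planned-missing indicator
X_obs = X.copy()
X_obs[mask] = np.nan

def mice_impute(X_obs, n_iter=10):
    """Minimal chained-equations sketch: regress each item on all others,
    cycling until the fills stabilise."""
    X_imp = X_obs.copy()
    miss = np.isnan(X_obs)
    col_means = np.nanmean(X_obs, axis=0)
    for j in range(X_obs.shape[1]):       # start from column means
        X_imp[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X_obs.shape[1]):
            others = np.delete(X_imp, j, axis=1)
            A = np.column_stack([np.ones(len(X_imp)), others])
            beta, *_ = np.linalg.lstsq(A[~miss[:, j]],
                                       X_obs[~miss[:, j], j], rcond=None)
            X_imp[miss[:, j], j] = np.clip(A[miss[:, j]] @ beta, 0, 1)
    return X_imp

X_imp = mice_impute(X_obs)
acc = np.mean((X_imp[mask] > 0.5) == X[mask].astype(bool))
```

Because every item is regressed on every other item in each cycle, the cost grows quickly with the number of items and respondents, which is the computational burden noted above.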

Recently, deep learning-based imputation methods, particularly Denoising Autoencoders (DAEs), have gained attention for their ability to learn low-dimensional latent representations and reconstruct missing data with fewer assumptions (Lall & Robinson, 2022). In parallel, the growing availability of process data, such as item response times (RT), presents new opportunities to improve imputation. Response time has been shown to correlate with accuracy and reflect cognitive effort during test-taking (van der Linden, 2007; Shin et al., 2022), suggesting its potential value in enhancing imputation performance.
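The RT-informed autoencoder idea can be illustrated with a deliberately small numpy sketch: a one-hidden-layer denoising autoencoder that takes mean-filled responses concatenated with log response times as input and reconstructs the response matrix. The data, architecture, and hyperparameters here are toy assumptions, not the study's actual model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: binary responses plus log response times (RT);
# higher ability -> more correct answers and faster responses.
n, k = 300, 8
theta = rng.normal(size=n)
R = (theta[:, None] + rng.normal(size=(n, k)) > 0).astype(float)
log_rt = -0.3 * theta[:, None] + rng.normal(scale=0.5, size=(n, k))
miss = rng.random((n, k)) < 0.6                     # planned missingness
col_mean = np.array([R[~miss[:, j], j].mean() for j in range(k)])
R_fill = np.where(miss, col_mean, R)                # mean-fill as starting point

X_in = np.hstack([R_fill, log_rt])                  # responses + RT features
obs = ~miss

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hidden-layer DAE trained by plain gradient descent; the loss is
# squared error on observed response cells only.
h, lr = 4, 0.1
W1 = rng.normal(scale=0.1, size=(2 * k, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, k));     b2 = np.zeros(k)
for _ in range(500):
    Xc = X_in * (rng.random(X_in.shape) > 0.2)      # denoise: drop 20% of inputs
    H = np.tanh(Xc @ W1 + b1)
    out = sigmoid(H @ W2 + b2)
    d_out = (out - R) * obs / n * out * (1 - out)   # masked-MSE gradient
    dH = (d_out @ W2.T) * (1 - H**2)
    W2 -= lr * (H.T @ d_out); b2 -= lr * d_out.sum(0)
    W1 -= lr * (Xc.T @ dH);  b1 -= lr * dH.sum(0)

# Impute: feed clean inputs through the trained network.
pred = sigmoid(np.tanh(X_in @ W1 + b1) @ W2 + b2)
R_imputed = np.where(miss, (pred > 0.5).astype(float), R)
```

The input corruption during training is what makes this a denoising autoencoder: the network must reconstruct responses from partial information, which mirrors the planned-missingness setting at imputation time.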

With this background, the current study investigates two key questions: (1) Do autoencoder-based methods outperform IRT- and MICE-based methods in imputing planned missing responses? (2) Does incorporating response-time data improve imputation performance across these approaches? To address these questions, we analyzed U.S. data from the 2022 PISA science assessment (N = 2,179). Several imputation conditions were compared, including IRT-, MICE-, and autoencoder-based models, with and without an RT component. For each method, 10 multiply imputed datasets were generated. These datasets were then compared to the original data using multiple evaluation criteria, including imputation accuracy, item mean recovery, preservation of correlational structure, and IRT parameter recovery. Computation time was also recorded for each model.
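Three of the evaluation criteria above can be computed directly once a complete reference matrix, an imputed matrix, and the planned-missing mask are available. The following sketch uses simulated stand-ins (the variable names and error rate are illustrative, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins: a "true" complete 0/1 response matrix, an imputed copy,
# and the planned-missing mask.
true = (rng.random((500, 10)) > 0.5).astype(float)
mask = rng.random(true.shape) < 0.7                 # ~70% planned missing
imputed = true.copy()
flip = mask & (rng.random(true.shape) < 0.2)        # simulate errors on 20%
imputed[flip] = 1 - imputed[flip]                   # of the masked cells

# 1) Cell-level imputation accuracy on planned-missing entries.
accuracy = np.mean(imputed[mask] == true[mask])

# 2) Item-mean recovery: absolute bias per item.
mean_bias = np.abs(imputed.mean(0) - true.mean(0))

# 3) Preservation of correlational structure: mean absolute difference
#    between the off-diagonal entries of the item correlation matrices.
corr_diff = np.abs(np.corrcoef(imputed.T) - np.corrcoef(true.T))
corr_dist = corr_diff[np.triu_indices_from(corr_diff, k=1)].mean()
```

IRT parameter recovery, the fourth criterion, would additionally require fitting the same IRT model to the original and imputed datasets and comparing the estimated item parameters.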

Results indicated that autoencoder-based methods yielded the highest imputation accuracy, followed by MICE- and IRT-based methods. However, IRT-based methods were the most computationally efficient and best preserved item means, correlational structure, and psychometric parameters. Moreover, including response time enhanced the imputation accuracy of autoencoder-based methods but slightly reduced the accuracy of MICE- and IRT-based methods.
These findings highlight the potential of deep learning—particularly autoencoder models enhanced with process data—for addressing the high levels of planned missingness inherent in LSAs. Future research can further explore cross-language generalizability, evaluate the role of background variables, refine machine-learning architectures, and apply these methods in adaptive testing environments as used, for example, in PISA.

Authors