Paper Summary

Unmodeled DIF in Latent Class Analysis: Consequences for Class Enumeration

Sat, April 11, 9:45 to 11:15am PDT, InterContinental Los Angeles Downtown, Floor: 5th Floor, Echo Park

Abstract

This paper builds on previous research investigating the effects of misspecified covariate relations in mixture models using a Monte Carlo simulation. Mixture models are widely used in social science research to identify distinct subgroups in a population. Latent class analysis (LCA) is a cross-sectional model within the mixture modeling family in which binary items serve as indicators of a categorical latent class variable. LCA is applied widely in social science research, and applications often involve covariate effects. A key issue in latent class enumeration arises when direct effects (DEs) of covariates on the indicators, known as differential item functioning (DIF), are left unmodeled, because such effects can alter both the enumeration result and the measurement model. The standard specification assumes that covariates influence the indicators only indirectly, through the latent class variable; when this assumption is violated (i.e., when a covariate has a DE on the indicators, or DIF), latent class parameter estimates become biased because the residual association between the covariates and the indicators is unmodeled. Previous research recommends completing latent class enumeration before incorporating auxiliary variables (covariates or distal outcomes) in order to preserve the measurement model. Specifically, misspecifying covariate relationships within the LCA measurement model can lead to the overextraction of classes, and specifying only an indirect path from the covariate to the indicators via the latent class variable during enumeration is the worst option (i.e., unmodeled DIF leads to inaccurate selection of the number of latent classes). Across all conditions and fit indices, the unconditional enumeration approach identified the correct number of latent classes more accurately; thus, the primary recommendation of that work is that latent class enumeration can and should be conducted without covariates.
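
To make the indirect-path versus direct-effect distinction concrete, the sketch below writes the two specifications in standard LCA notation; the symbols (latent class variable C with K classes, binary items y_1 through y_J, logit parameters nu and kappa) are illustrative rather than taken from the paper.

```latex
% Standard LCA with covariate x: x affects the items y_1,...,y_J only
% indirectly, through the latent class variable C (no DIF)
P(y_1,\dots,y_J \mid x)
  = \sum_{c=1}^{K} P(C = c \mid x) \prod_{j=1}^{J} P(y_j \mid C = c)

% Direct effect (DIF): within class, item j also depends on x,
% e.g. via a logistic item model with direct-effect slope \kappa_j
\operatorname{logit} P(y_j = 1 \mid C = c, x) = \nu_{jc} + \kappa_j x
```

Enumerating with only the indirect path specified while the data contain a nonzero direct effect is precisely the unmodeled-DIF scenario described above.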
Whether these recommendations hold under more complex conditions remains unclear, particularly when varying levels of DIF are present but unaccounted for. This paper builds on prior work by examining enumeration under more complex modeling conditions, including varying LCA measurement models (i.e., different numbers of classes and indicators), sample sizes, class proportions, DE magnitudes, class separation, and multiple covariates. Specifically, the study evaluates whether unconditional enumeration remains a reliable approach in these settings and identifies which fit indices perform consistently well across conditions. Consistent with previous research, the results indicate that the Bayesian Information Criterion (BIC) most consistently identifies the correct number of classes regardless of misspecification, followed by the Consistent Akaike Information Criterion (CAIC), although both are less accurate at smaller sample sizes. Increasing the sample size improves enumeration accuracy, but the gains diminish at large sample sizes for misspecified models. These findings have important implications for the currently recommended approach to incorporating auxiliary variables in mixture models and lay the foundation for continued work on the importance of detecting DEs in LCA.
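
As a concrete illustration of the unconditional enumeration step and of how BIC and CAIC are compared, the sketch below fits latent class (Bernoulli mixture) models with one through five classes by EM on hypothetical simulated binary data and reports both indices; it is a minimal illustration under assumed data and settings, not the authors' simulation code.

```python
# Minimal sketch of unconditional class enumeration: fit latent class
# (Bernoulli mixture) models with K = 1..5 classes by EM on simulated
# binary indicator data and compare BIC and CAIC. Illustrative only;
# this is not the authors' simulation code.
import numpy as np
from scipy.special import logsumexp

def fit_lca(Y, K, n_iter=200, seed=0):
    """EM for an unconditional K-class LCA model with binary items Y (N x J)."""
    rng = np.random.default_rng(seed)
    N, J = Y.shape
    pi = np.full(K, 1.0 / K)                      # class proportions
    theta = rng.uniform(0.25, 0.75, size=(K, J))  # item-response probabilities
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each respondent
        item_ll = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T  # N x K
        log_joint = np.log(pi) + item_ll
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))
        # M-step: update class proportions and item-response probabilities
        pi = resp.mean(axis=0)
        theta = np.clip((resp.T @ Y) / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    # Log-likelihood and information criteria at the final parameter values
    item_ll = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
    loglik = logsumexp(np.log(pi) + item_ll, axis=1).sum()
    n_par = (K - 1) + K * J                       # free parameters
    bic = -2 * loglik + n_par * np.log(N)
    caic = -2 * loglik + n_par * (np.log(N) + 1)
    return loglik, bic, caic

# Simulate a 3-class population with 8 binary indicators, then enumerate
rng = np.random.default_rng(1)
true_theta = rng.choice([0.15, 0.85], size=(3, 8))
cls = rng.integers(0, 3, size=1000)
Y = (rng.random((1000, 8)) < true_theta[cls]).astype(float)

for K in range(1, 6):
    loglik, bic, caic = fit_lca(Y, K)
    print(f"K={K}: logLik={loglik:.1f}  BIC={bic:.1f}  CAIC={caic:.1f}")
```

The class count that minimizes BIC (or CAIC) would be retained. In practice, enumeration would also use multiple random starts, a convergence criterion, and additional fit indices; the single start and fixed iteration count here only keep the sketch short.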

Authors