AERA Annual Meeting: Examining Model Complexity Due to Functional Form with Randomly Generated Data

In Event: Advancements and Challenges in Cognitive Diagnostic Models and Mixture Modeling

Thu, April 24, 5:25 to 6:55pm MDT, The Colorado Convention Center, Floor: Meeting Room Level, Room 302

Abstract

Purpose
Model complexity research has focused on the number of free parameters (Bonifay and Cai, 2017), but the impact of functional form among model structures still needs to be explored. To address this gap, we replicated and extended Bonifay and Cai's findings, concentrating on the following IRT models: the 2PL two-factor exploratory item factor analysis (EIFA) model, the unidimensional 3PL model, and the 2PL bifactor model. We also added a 3-class latent class analysis (LCA) model to the comparison. Our aim was to understand how these models fit randomly generated datasets and to characterize those datasets through their eigenvalues.

Theoretical Framework
Research on model complexity has focused primarily on the number of free parameters rather than on complexity due to functional form among model structures. In this study, we replicated and extended the findings of Bonifay and Cai (2017) using their data generation method. On this basis, we hypothesized that:
The EIFA model will fit as well as or better than the bifactor model.
The LCA model will outperform other models, including EIFA and bifactor, in fitting propensity, as measured by the M2 statistic.

Methods/Data Sources
We simulated random datasets, each with 10,000 observations, from the simplex of 128 response patterns, each with equal probability. Responses were generated for seven dichotomous items, giving 2^7 = 128 possible response patterns. Each of the 600 simulated datasets was fit with the models shown in Figure 1, and model fit was evaluated using the log-likelihood, AIC, BIC, and M2 statistics. The LCA model differs in functional form because it posits a categorical latent variable, with the items conditionally independent given class membership. A 3-class LCA for 7 items has 23 parameters (2 class probabilities and 21 conditional item probabilities). Although the LCA is not strictly comparable to the other models in its parameterization (it has two more parameters), we were interested in comparing its fitting propensity with theirs.
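The data generation step can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: we read "the simplex of 128 response patterns" as Bonifay and Cai's approach of drawing each dataset's pattern probabilities uniformly from the probability simplex (a flat Dirichlet, generated here by normalizing Exponential(1) draws) and then drawing 10,000 response vectors from those probabilities. The names `random_simplex_point` and `simulate_dataset` and the seed are hypothetical. If every pattern was instead assigned a fixed probability of 1/128, the sampled weights would simply be replaced by a constant vector.

```python
import itertools
import random

N_ITEMS = 7        # seven dichotomous items
N_OBS = 10_000     # observations per simulated dataset

# All 2**7 = 128 possible response patterns, e.g. (0, 1, 0, 0, 1, 1, 0).
PATTERNS = list(itertools.product((0, 1), repeat=N_ITEMS))


def random_simplex_point(k, rng):
    """Draw a probability vector uniformly from the simplex over k outcomes.

    Normalizing k independent Exponential(1) draws is equivalent to
    sampling from a flat Dirichlet(1, ..., 1) distribution. Using this
    scheme is an assumption about how the random data were generated.
    """
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_dataset(rng, n_obs=N_OBS):
    """Return (probs, data): sampled pattern probabilities and n_obs responses."""
    probs = random_simplex_point(len(PATTERNS), rng)
    data = rng.choices(PATTERNS, weights=probs, k=n_obs)
    return probs, data


rng = random.Random(2025)  # hypothetical seed
probs, data = simulate_dataset(rng)
print(len(PATTERNS), len(data))  # prints: 128 10000
```

Calling `simulate_dataset` once per replication, 600 times in all, would yield a set of datasets of the size described above; each could then be passed to whatever IRT and LCA estimation software is being used.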

Results
Consistent with Bonifay and Cai's findings, Table 1 shows that the EIFA and bifactor models have similar fitting propensities, with EIFA ultimately outperforming the bifactor model. Table 2 shows that the LCA model outperformed the other models in 96% of the datasets according to the log-likelihood, AIC, and BIC. By the M2 statistic, the LCA model fit best in 54% of the datasets and the bifactor model in 25%. Data suited to a one-factor model should have a large first eigenvalue, and data suited to a two-factor model should have large first and second eigenvalues; the randomly generated data instead had consistently flat eigenstructures with low eigenvalues (Figure 2 and Table 3).
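To illustrate what a flat eigenstructure looks like, the sketch below computes the eigenvalues of the inter-item correlation matrix for one simulated dataset. It rests on stated assumptions: it uses Pearson (phi) correlations of the 0/1 responses, although the study may have used another correlation type such as tetrachoric; it assumes the uniform-simplex generation scheme; and it finds eigenvalues with a plain cyclic Jacobi rotation method so that only the Python standard library is needed. All function names are hypothetical.

```python
import itertools
import math
import random


def simulate_random_data(rng, n_items=7, n_obs=10_000):
    """Random data under an assumed uniform-simplex generation scheme."""
    patterns = list(itertools.product((0, 1), repeat=n_items))
    weights = [rng.expovariate(1.0) for _ in patterns]  # flat Dirichlet up to scale
    return rng.choices(patterns, weights=weights, k=n_obs)


def phi_correlations(data):
    """Pearson (phi) correlation matrix of 0/1 item responses.

    Assumes no item is answered identically by every respondent.
    """
    n, n_items = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(n_items)]
    sds = [math.sqrt(m * (1.0 - m)) for m in means]
    corr = [[1.0] * n_items for _ in range(n_items)]
    for a in range(n_items):
        for b in range(a + 1, n_items):
            joint = sum(row[a] * row[b] for row in data) / n
            corr[a][b] = corr[b][a] = (joint - means[a] * means[b]) / (sds[a] * sds[b])
    return corr


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (rotate columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rotate rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


rng = random.Random(7)  # hypothetical seed
eigenvalues = jacobi_eigenvalues(phi_correlations(simulate_random_data(rng)))
print([round(ev, 3) for ev in eigenvalues])
```

Because a correlation matrix has 1s on its diagonal, the seven eigenvalues always sum to 7. A dominant factor would appear as one eigenvalue well above the rest, whereas a flat eigenstructure spreads that total more evenly across all seven.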

Scholarly Significance
The LCA model's class membership and conditional probabilities allow it to better identify shared variance among response patterns. That flexibility carries a risk: a model that overfits captures noise that is neither generalizable nor substantively relevant. Randomly generated data without underlying relationships have a small first eigenvalue, which leads factor models to show poor fit, poor generalization, and low discrimination slopes.
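The LCA's functional form can be made concrete with a short sketch. Under conditional independence, the probability of a response pattern x is P(x) = Σ_c π_c Π_j θ_cj^x_j (1 − θ_cj)^(1 − x_j), where π_c is the probability of class c and θ_cj is the probability of endorsing item j in class c. The code below evaluates this expression, counts the free parameters as (C − 1) + C·J, which gives 23 for 3 classes and 7 items, and shows how AIC and BIC penalize that count. The class and item probabilities are hypothetical values chosen for illustration, not estimates from the study, and the function names are our own.

```python
import itertools
import math


def lca_pattern_probability(pattern, class_probs, item_probs):
    """Marginal probability of one 0/1 response pattern under an LCA.

    class_probs[c] is pi_c; item_probs[c][j] is P(item j = 1 | class c).
    Items are conditionally independent given class, so the within-class
    probability is a product over items, which is then mixed over classes.
    """
    total = 0.0
    for pi_c, thetas in zip(class_probs, item_probs):
        within_class = 1.0
        for x, theta in zip(pattern, thetas):
            within_class *= theta if x == 1 else 1.0 - theta
        total += pi_c * within_class
    return total


def lca_n_parameters(n_classes, n_items):
    """(C - 1) free class probabilities plus C * J conditional item probabilities."""
    return (n_classes - 1) + n_classes * n_items


def log_likelihood(counts, class_probs, item_probs):
    """Log-likelihood of observed pattern counts {pattern: frequency}.

    The multinomial coefficient is omitted; it is the same for every model
    fit to the same data, so it does not change model comparisons.
    """
    return sum(
        n * math.log(lca_pattern_probability(p, class_probs, item_probs))
        for p, n in counts.items()
        if n > 0
    )


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical parameter values for a 3-class model of 7 items.
class_probs = [0.5, 0.3, 0.2]
item_probs = [[0.8] * 7, [0.5] * 7, [0.2] * 7]

k = lca_n_parameters(n_classes=3, n_items=7)  # 2 + 21 = 23
patterns = list(itertools.product((0, 1), repeat=7))
total_prob = sum(lca_pattern_probability(p, class_probs, item_probs) for p in patterns)
print(k, round(total_prob, 10))  # prints: 23 1.0
```

Given a dataset's pattern counts, `log_likelihood` combined with `aic` or `bic` yields indices of the kind compared in Table 2. In practice the LCA parameters would first be estimated, typically with an EM algorithm, which this sketch does not include.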