AERA Annual Meeting: Monte Carlo Simulation Methods for Class Enumeration With Latent Class Analysis Models

Paper Summary

Monte Carlo Simulation Methods for Class Enumeration With Latent Class Analysis Models

In Event: Machine Learning Algorithms, New Modeling Approaches, and Simulation Methods to Understand Model Performance

Fri, April 12, 4:55 to 6:25pm, Philadelphia Marriott Downtown, Floor: Level 4, Room 403

Abstract

An important issue when estimating finite mixture models is selecting the model with the correct number of classes underlying the data, a process generally referred to as class enumeration. Numerous statistical indices are available to researchers for class enumeration. Although many studies have examined the performance of various fit indices for class enumeration under different conditions, no single index performs optimally in all conditions that may be encountered. The lack of coherence in class enumeration practices for finite mixture models may be due to sampling error when estimating model fit indices in a single sample, which can increase false positive rates in model selection (Lubke & Campbell, 2016). The purpose of this simulation study is to examine the performance of Monte Carlo simulation methods with indices used for class enumeration when selecting the correct latent class analysis (LCA) model. In general, this process entails the following: (1) generate the data according to the correct population model; (2) fit several k-class models (e.g., 1-, 2-, 3-, and 4-class models) to the data, which are assumed to represent the original dataset a researcher would analyze in a mixture modeling study; (3) save the parameter estimates from each k-class model, including the class membership probabilities and the item response probabilities; (4) use the saved parameter estimates from each k-class model to generate data based on the respective k-class model parameters; and (5) repeat steps 2 through 4 a sufficient number of times (e.g., 50 times) to generate multiple replications of the enumeration indices. The distributions of the enumeration indices can then be compared to aid in selecting the correct number of classes.
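The steps above can be illustrated with a minimal sketch. It is one possible reading of the procedure, not the author's implementation. It assumes binary items, a basic EM estimator with a single random start, and the BIC as the recorded index. All function names (`simulate_lca`, `fit_lca`, `monte_carlo_enumeration`) are hypothetical, and step 5 is interpreted here as refitting each k-class model to data generated from its own saved estimates.

```python
import numpy as np

def simulate_lca(n, class_probs, item_probs, rng):
    """Draw n binary response vectors from an LCA model (binary items assumed).
    class_probs has shape (k,); item_probs has shape (k, J) and holds P(item = 1 | class)."""
    classes = rng.choice(len(class_probs), size=n, p=class_probs)
    return (rng.random((n, item_probs.shape[1])) < item_probs[classes]).astype(int)

def _log_joint(data, pi, theta):
    # log P(class) + log P(responses | class), shape (n, k)
    return np.log(pi) + data @ np.log(theta).T + (1 - data) @ np.log(1 - theta).T

def fit_lca(data, k, rng, n_iter=200, tol=1e-8):
    """Fit a k-class LCA with a basic EM algorithm (single random start; a sketch only)."""
    n_items = data.shape[1]
    pi = np.full(k, 1.0 / k)
    theta = rng.uniform(0.2, 0.8, size=(k, n_items))
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each observation
        lj = _log_joint(data, pi, theta)
        ll_i = np.logaddexp.reduce(lj, axis=1)
        loglik = ll_i.sum()
        if loglik - prev < tol:
            break
        prev = loglik
        post = np.exp(lj - ll_i[:, None])
        # M-step: update mixing proportions and item response probabilities
        pi = np.clip(post.mean(axis=0), 1e-6, None)
        pi /= pi.sum()
        theta = (post.T @ data) / (post.sum(axis=0)[:, None] + 1e-12)
        theta = np.clip(theta, 1e-6, 1 - 1e-6)
    # Log-likelihood evaluated at the returned parameter values
    loglik = np.logaddexp.reduce(_log_joint(data, pi, theta), axis=1).sum()
    return pi, theta, loglik

def bic(loglik, k, n_items, n):
    # Free parameters for binary items: (k - 1) mixing proportions + k * J item probabilities
    n_params = (k - 1) + k * n_items
    return -2.0 * loglik + n_params * np.log(n)

def monte_carlo_enumeration(data, ks=(1, 2, 3, 4), reps=50, seed=0):
    """For each candidate k, return an array of BIC values across Monte Carlo replications."""
    rng = np.random.default_rng(seed)
    n, n_items = data.shape
    out = {}
    for k in ks:
        pi, theta, _ = fit_lca(data, k, rng)       # steps 2 and 3: fit and save estimates
        values = []
        for _ in range(reps):                      # step 5: repeat
            sim = simulate_lca(n, pi, theta, rng)  # step 4: generate from saved estimates
            _, _, ll = fit_lca(sim, k, rng)
            values.append(bic(ll, k, n_items, n))
        out[k] = np.array(values)
    return out
```

Calling `monte_carlo_enumeration(data)` on an N-by-J array of 0/1 responses returns one array of BIC values per candidate k, whose distributions can then be compared. In practice, multiple random starts would be advisable, since a single EM start can converge to a local maximum.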

A preliminary set of results is described based on data generated from a 2-class LCA model with five items, with N = 200, 400, and 800; equal and unequal mixing proportions; and small, moderate, and large class separation, using 50 replications per condition. Information criteria were saved from each replication for each of the k-class models, including Akaike’s information criterion (AIC; Akaike, 1973), the Bayesian information criterion (BIC; Schwarz, 1978), the consistent AIC (CAIC; Bozdogan, 1987), and the sample-size adjusted BIC (ABIC; Sclove, 1987). The results suggest that the CAIC was the most accurate index, but its accuracy interacted with sample size, class separation, and mixing proportions.
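For reference, the four information criteria named above can be computed from a fitted model's log-likelihood using their standard definitions. The sketch below is illustrative only. The function names are hypothetical, and the parameter count assumes binary items, which the abstract does not state.

```python
import math

def lca_n_params(k, n_items):
    """Free parameters in a k-class LCA with binary items (an assumption):
    (k - 1) mixing proportions plus k * n_items item response probabilities."""
    return (k - 1) + k * n_items

def information_criteria(loglik, n_params, n):
    """Standard definitions; lower values indicate better relative fit."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "CAIC": deviance + n_params * (math.log(n) + 1),
        "ABIC": deviance + n_params * math.log((n + 2) / 24),
    }
```

The CAIC adds one extra unit of penalty per parameter relative to the BIC, so it favors models with fewer classes slightly more strongly. The ABIC replaces N with (N + 2) / 24, which lightens the penalty.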

Additional conditions will be examined in the final paper, including sample size (N = 200, 400, 800, 1200), class separation (small, moderate, and large), mixing proportions (equal and unequal), and number of latent classes (1, 2, 3, and 4). Additional bootstrap options will also be examined. It is hoped that this paper will inform applied researchers about the utility of Monte Carlo simulation methods to use with indices for class enumeration.

Author

Tiffany A. Whittaker, University of Texas at Austin