Paper Summary

Comparing the Effectiveness of Multilevel vs. Standard Machine Learning Algorithms on Nested Data (Poster 24)

Fri, April 12, 11:25am to 12:55pm, Pennsylvania Convention Center, Floor: Level 200, Exhibit Hall A

Abstract

"This study explores the performance of various machine learning (ML) algorithms, including standard and multilevel variants, on nested data, which are prevalent in educational research (e.g., students in classrooms). Dealing with nested data poses unique challenges that demand special analytical approaches. Despite the well-established role of multilevel modeling in inferential statistics, the potential of multilevel ML algorithms in predictive scenarios remains largely unexplored.
Addressing a gap in the literature, this study questions the suitability of standard ML algorithms designed for non-nested data and investigates when and how multilevel ML algorithms might outperform them.
First, this study uses a Monte Carlo simulation to evaluate the prediction performance of both standard and multilevel machine learning algorithms under conditions that emulate the complexities of real-world data settings. An empirical analysis is also conducted to illustrate the performance of each algorithm in practice, focusing on predicting STEM persistence using a large-scale education dataset from the Maryland Longitudinal Data System. This empirical application is significant for its potential to inform targeted interventions and support mechanisms for STEM students who might be at risk of not completing their studies.
In addition, the study aims to enhance the accessibility of these techniques. Future work will result in a user-friendly tool (an R Shiny application), enabling researchers, even those without extensive coding knowledge, to easily use advanced ML algorithms in their research endeavors. The hope is that the results from this study will guide researchers in choosing and using the best tools for their studies involving nested data."
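
The kind of Monte Carlo comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' simulation design or any of the algorithms they evaluated: the data-generating model (a random-intercept model with fixed coefficients 1.0 and 0.5), the two predictors (pooled least squares versus pooled least squares plus shrunken cluster offsets), the method-of-moments variance estimates, and all function and parameter names are assumptions chosen purely for illustration.

```python
import numpy as np

def simulate_nested(n_clusters, n_per_cluster, tau, sigma, rng):
    """Draw y_ij = 1 + 0.5 * x_ij + u_j + e_ij (assumed model, not the paper's)."""
    cluster = np.repeat(np.arange(n_clusters), n_per_cluster)
    x = rng.normal(size=cluster.size)
    u = rng.normal(scale=tau, size=n_clusters)  # cluster-level random intercepts
    y = 1.0 + 0.5 * x + u[cluster] + rng.normal(scale=sigma, size=cluster.size)
    return cluster, x, y

def fit_and_score(cluster, x, y, train, n_clusters):
    """Test RMSE of pooled OLS and of OLS plus shrunken cluster offsets."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X @ beta

    # Mean training residual per cluster (every cluster has training rows).
    resid = y[train] - pred[train]
    cl = cluster[train]
    counts = np.bincount(cl, minlength=n_clusters)
    means = np.bincount(cl, weights=resid, minlength=n_clusters) / counts

    # Rough method-of-moments estimates of the two variance components.
    within = np.sum((resid - means[cl]) ** 2) / (resid.size - n_clusters)
    between = max(np.var(means, ddof=1) - within / counts.mean(), 0.0)

    # Shrink each cluster mean toward zero, as a random-intercept model would.
    offsets = means * counts * between / (counts * between + within)

    test = ~train
    def rmse(p):
        return float(np.sqrt(np.mean((y[test] - p[test]) ** 2)))
    return rmse(pred), rmse(pred + offsets[cluster])

def monte_carlo(tau, sigma=1.0, reps=200, n_clusters=50, n_per_cluster=20, seed=0):
    """Average (pooled RMSE, cluster-adjusted RMSE) over simulated replications."""
    rng = np.random.default_rng(seed)
    # First half of each cluster is used for training, the rest for testing.
    pos = np.arange(n_clusters * n_per_cluster) % n_per_cluster
    train = pos < n_per_cluster // 2
    scores = [
        fit_and_score(*simulate_nested(n_clusters, n_per_cluster, tau, sigma, rng),
                      train, n_clusters)
        for _ in range(reps)
    ]
    return np.mean(scores, axis=0)

if __name__ == "__main__":
    for tau in (0.0, 0.5, 1.0):
        pooled, adjusted = monte_carlo(tau)
        print(f"tau={tau}: pooled RMSE={pooled:.3f}, adjusted RMSE={adjusted:.3f}")
```

In this toy setup, the gap between the two predictors would be expected to widen as the between-cluster standard deviation `tau` grows and to nearly vanish when `tau` is zero. It says nothing about how the algorithms in the study itself behave.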

Author