Paper Summary

Missing Data Imputation Using Machine Learning: A Comparative Study with Traditional Mean Imputation

Sat, April 11, 1:45 to 3:15pm PDT, JW Marriott Los Angeles L.A. LIVE, Floor: Ground Floor, Gold 4

Abstract

Missing data are a persistent challenge in educational assessments and often undermine the validity and reliability of statistical inferences. This study evaluates six imputation methods: traditional mean imputation, k-Nearest Neighbors (KNN), matrix factorization, deep autoencoders (DAE), multiple imputation by chained equations (MICE), and Random Forest. Using the PIRLS 2021 dataset, the study compares the accuracy of each method over multiple imputation iterations. Results show that MICE achieved the lowest RMSE (12.16), while Random Forest attained the highest accuracy (0.67), significantly outperforming traditional methods and other machine learning approaches. Mean imputation and KNN exhibited limited effectiveness, while matrix factorization and autoencoders struggled to generalize in this context.
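The kind of comparison described above can be illustrated with a minimal sketch: hide a fraction of known values, impute them, and score each method by RMSE on the hidden cells. This is not the authors' pipeline. It uses synthetic data rather than PIRLS 2021, covers only mean imputation and a simple KNN variant, and every function name, the masking rate, and the value of k are assumptions chosen for illustration.

```python
import math
import random


def mask_values(rows, rate, rng):
    """Copy rows, set a random fraction of cells to None, and record which cells were hidden."""
    masked = [list(r) for r in rows]
    hidden = []
    for i, row in enumerate(rows):
        for j in range(len(row)):
            if rng.random() < rate:
                masked[i][j] = None
                hidden.append((i, j))
    return masked, hidden


def mean_impute(masked):
    """Fill each missing cell with the mean of the observed values in its column."""
    n_cols = len(masked[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in masked if r[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in masked]


def knn_impute(masked, k=5):
    """Fill each missing cell with the average of that column over the k nearest rows.

    Distance is the mean squared difference over columns observed in both rows.
    Cells with no usable neighbour keep the column-mean fallback.
    """
    result = mean_impute(masked)
    for i, row in enumerate(masked):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(masked):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            if candidates:
                nearest = sorted(candidates, key=lambda c: c[0])[:k]
                result[i][j] = sum(val for _, val in nearest) / len(nearest)
    return result


def rmse(truth, imputed, hidden):
    """Root mean squared error computed only on the cells that were hidden."""
    total = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(total / len(hidden))


# Synthetic stand-in data (an assumption): four correlated score columns per row.
rng = random.Random(0)
rows = []
for _ in range(200):
    base = rng.gauss(500, 100)
    rows.append([base + rng.gauss(0, 20) for _ in range(4)])

masked, hidden = mask_values(rows, rate=0.2, rng=rng)
mean_filled = mean_impute(masked)
knn_filled = knn_impute(masked, k=5)

rmse_mean = rmse(rows, mean_filled, hidden)
rmse_knn = rmse(rows, knn_filled, hidden)
print(f"mean imputation RMSE: {rmse_mean:.2f}")
print(f"KNN imputation RMSE:  {rmse_knn:.2f}")
```

Scoring only the hidden cells keeps the comparison fair, since observed values are passed through unchanged by both methods. The accuracy figure reported in the abstract would need a separate classification-style metric, which this sketch does not cover.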

Authors