Paper Summary

Evaluating Machine Learning Model Performance on Educational Text Data: A Simulation Study

Sat, April 26, 5:10 to 6:40pm MDT, The Colorado Convention Center, Floor: Meeting Room Level, Room 106

Abstract

Text data are prevalent in education, appearing in academic records, interview transcripts, and survey responses. Traditional methods for analyzing text data are labor-intensive, difficult to scale, and rely on complicated classification procedures. Machine learning (ML) techniques have emerged in response, offering robust, automated classification of text into predefined categories to support actionable insights and decision-making. Using synthetic educational text data, this study evaluates the performance of ML models commonly used in research, such as Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB), using quantitative metrics (accuracy, precision, recall, and F1 score). The study aims to help education researchers understand how to use ML models for text analysis and thereby improve the validity of research based on educational text data.
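
For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how they are conventionally computed for a single positive class. It is not taken from the study: the function name `binary_metrics`, the labels `needs_support` and `other`, and the eight toy predictions are all hypothetical and serve only to illustrate the standard definitions.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells relative to the chosen positive label.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Fall back to 0.0 whenever a denominator would be zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true and predicted labels for eight survey responses.
y_true = ["needs_support", "needs_support", "needs_support", "other",
          "other", "other", "other", "needs_support"]
y_pred = ["needs_support", "needs_support", "other", "other",
          "needs_support", "needs_support", "other", "needs_support"]

metrics = binary_metrics(y_true, y_pred, positive="needs_support")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy and precision are lower than recall because the hypothetical classifier over-predicts the positive label, which is one reason a comparison of models usually reports all four metrics rather than accuracy alone.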

Authors