Paper Summary

Exploring Key Features for Effective Machine Learning Scoring of Science Constructed Responses

Sat, April 11, 11:45am to 1:15pm PDT, InterContinental Los Angeles Downtown, 6th Floor, Broadway

Abstract

With growing emphasis on scientific skills such as problem-solving, constructed response assessments are increasingly valued for revealing students’ knowledge-in-use. While machine learning (ML) offers promise for evaluating formative assessments, concerns about scoring reliability, shaped by question-level measurement features, may hinder its broader use in science education. This study investigates how specific measurement variables influence ML model performance. Using data from the Next Generation Concept Inventory, which assesses student understanding of food, energy, and water systems through open-ended responses, we trained 168 text classification ML models across nine items and five measurement variables. Results indicate that two of these variables, a question’s focus on water and its structure, significantly affect model performance, as measured by Cohen’s kappa.
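
A minimal sketch, assuming scikit-learn, of the general workflow the abstract describes: train a text classifier on human-scored constructed responses, then measure machine-human agreement with Cohen’s kappa. The responses, rubric scores, features, and model below are illustrative assumptions, not the study’s actual items or pipeline.

```python
# Illustrative sketch only: hypothetical data and a stand-in model,
# not the study's actual items, rubric, or classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical constructed responses with human-assigned rubric levels.
responses = [
    "Water evaporates, condenses, and returns as precipitation.",
    "Plants use energy from the sun to make their own food.",
    "Water is wet.",
    "Energy moves through a food web from producers to consumers.",
    "The sun.",
    "Plants need stuff to grow.",
]
human_scores = [2, 2, 1, 2, 1, 1]  # hypothetical rubric levels from human raters

X_train, X_test, y_train, y_test = train_test_split(
    responses, human_scores, test_size=2, stratify=human_scores, random_state=0
)

# TF-IDF features + logistic regression stand in for whatever text
# classification models the study trained.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Cohen's kappa: machine-human score agreement, corrected for chance.
print("Cohen's kappa:", cohen_kappa_score(y_test, model.predict(X_test)))
```

In practice, one such model would be trained per item (and per variable condition), with kappa computed on held-out responses for each; comparing kappa across conditions is what allows the effect of measurement variables on model performance to be tested.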

Authors