With growing emphasis on scientific skills such as problem-solving, constructed-response assessments are increasingly valued for revealing students’ knowledge-in-use. While machine learning (ML) offers promise for evaluating formative assessments, concerns about scoring reliability, shaped by question-level measurement features, may hinder its broader use in science education. This study investigates how specific measurement variables influence ML model performance. Using data from the Next Generation Concept Inventory, which assesses student understanding of food, energy, and water systems through open-ended responses, we trained 168 text classification ML models across nine items and five measurement variables. Results indicate that a question’s focus on water and its structure significantly affect model performance, as measured by Cohen’s Kappa.
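The abstract evaluates scoring models with Cohen’s Kappa, a chance-corrected agreement statistic between human and machine-assigned labels. A minimal sketch of how it is computed, with purely hypothetical label sequences (the study’s actual data and scoring rubric are not shown here):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    # observed agreement: fraction of items both raters label identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # expected agreement if the raters labeled independently at their
    # own marginal rates
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical scores: 1 = correct use of the concept, 0 = incorrect
human = [1, 0, 1, 1, 0, 1]   # human rater
model = [1, 0, 1, 0, 0, 1]   # ML classifier prediction
print(round(cohen_kappa(human, model), 3))  # → 0.667
```

Kappa near 1 indicates agreement well beyond chance, while values near 0 indicate agreement no better than chance, which is why it is preferred over raw accuracy for judging scoring reliability.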