Objectives
Assessing students' proficiency in scientific explanation, in which students integrate scientific knowledge and reasoning, presents challenges because such assessment items require evaluating students' ability to apply multiple dimensions of scientific knowledge to comprehend and analyze phenomena (Zhai et al., 2020; National Research Council [NRC], 2012). Machine learning (ML) can automate the scoring of short text explanations on college science assessments; prior text classification models in environmental science yielded an average Cohen's Kappa of 0.47 with an average misclassification rate of 11% (see Haudek et al., 2023). These results fall below recommended targets for model performance (Williamson et al., 2012), suggesting inherent challenges in classifying these content-rich and complex responses.
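For readers less familiar with these agreement metrics, the short sketch below shows how Cohen's Kappa and a misclassification rate are typically computed when comparing machine scores to human scores. It is illustrative only, not the study's scoring pipeline, and the scores shown are hypothetical.

# Illustrative only: typical computation of agreement metrics for an
# automated scoring model (hypothetical scores, not the study's data).
from sklearn.metrics import cohen_kappa_score, accuracy_score

# Hypothetical binary rubric scores: 1 = key idea present, 0 = absent.
human_scores   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
machine_scores = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(human_scores, machine_scores)      # chance-corrected agreement
misclassification_rate = 1 - accuracy_score(human_scores, machine_scores)

print(f"Cohen's Kappa: {kappa:.2f}")
print(f"Misclassification rate: {misclassification_rate:.1%}")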
Theoretical Perspective
Therefore, we propose extending ML systems by integrating ontology-based information extraction, which leverages semantic attributes derived from mapping domain-specific terms. Ontologies serve as a foundational framework for organizing and representing domain-specific knowledge (Gruber, 1993). Mapping each word from students' explanations onto corresponding domain-specific terms in a source ontology facilitates a shared understanding of semantic relationships across different texts.
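As a minimal sketch of this idea (not the study's implementation), the snippet below maps words from a student explanation onto the closest term in a tiny, hypothetical ontology. The term list and the stand-in string-similarity measure are assumptions for demonstration; the 0.70 threshold mirrors the cutoff described in the Methods.

# Minimal sketch, not the study's implementation: map each word in a student
# explanation onto the closest term in a small, hypothetical ontology so that
# different texts share a common semantic vocabulary.
from difflib import SequenceMatcher  # stand-in similarity measure for illustration

ontology_terms = ["reservoir", "evaporation", "irrigation", "hydroelectric energy"]

def map_to_ontology(word, threshold=0.70):
    """Return the best-matching ontology term for `word`, or None if no term
    exceeds the similarity threshold (0.70 mirrors the Methods cutoff)."""
    best_term, best_score = None, 0.0
    for term in ontology_terms:
        score = SequenceMatcher(None, word.lower(), term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score > threshold else None

response = "Water evaporates from the reservoir used for irrigating crops"
mapped = {w: map_to_ontology(w) for w in response.split()}
print({w: t for w, t in mapped.items() if t})  # only words that mapped to a term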
Methods & Data Sources
The present study is part of a larger project focused on automatically evaluating undergraduates' proficiency in explaining the complex Food-Energy-Water system through a Next Generation Concept Inventory (Haudek et al., 2023). We collected ~700 text responses to several assessment items administered at seven post-secondary institutions across the USA. As an example, we report on one item that examines connections between food, energy, and water resources in the context of a reservoir. Human raters coded responses using fourteen analytic rubrics targeting key ideas to build a training set.
First, we applied the environmental science ontology (EnvO, see reference) using Python 3.10.0, selecting words with a similarity score > 0.70. This step identified domain-specific words, a subset of ontology-based terms present in each student's response, from which we then calculated Term Frequency-Inverse Document Frequency (TF-IDF) metrics across the entire corpus of student text. Second, the TF-IDF metrics served as input for classification models, including Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and BERT, a transformer model. We compared accuracy metrics across the 14 rubric categories between models using the original/raw text and models using TF-IDF features after ontology-based extraction.
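To make the second step concrete, the sketch below computes TF-IDF over ontology-reduced response text and compares the three classical classifiers named above. It is a toy illustration under stated assumptions: `ontology_texts` and `labels` are hypothetical stand-ins for the coded corpus, BERT fine-tuning is omitted, and the tiny cross-validation setup is for demonstration only.

# Hedged sketch of TF-IDF feature extraction and classical classifiers
# (hypothetical toy data; BERT fine-tuning omitted).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

ontology_texts = [
    "reservoir evaporation irrigation",            # response 1, reduced to ontology terms
    "hydroelectric energy reservoir",              # response 2
    "irrigation evaporation",                      # response 3
    "reservoir hydroelectric energy irrigation",   # response 4
]
labels = [1, 0, 1, 0]  # hypothetical binary scores for one rubric category

X = TfidfVectorizer().fit_transform(ontology_texts)  # TF-IDF across the whole corpus

for name, model in [("NB", MultinomialNB()),
                    ("SVM", LinearSVC()),
                    ("RF", RandomForestClassifier(n_estimators=100))]:
    acc = cross_val_score(model, X, labels, cv=2, scoring="accuracy").mean()
    print(f"{name}: mean accuracy = {acc:.2f}")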
Results
The results (see Table 2) indicated that the BERT model consistently outperformed the other models, with improvements of ~3% to ~20%, regardless of whether raw text or ontology-based text was used. Furthermore, employing ontology-based text yielded better results for certain specific descriptors when using individual ML algorithms.
[Table 2]
Contribution
Our findings demonstrate the benefits of incorporating BERT and ontology-based approaches to enhance ML evaluation of science explanations. Integrating domain-specific terms from the source ontology provides semantic and contextual cues that increase scoring accuracy. Additionally, the observed discrepancies in performance across models, particularly when the ontology is applied, suggest that certain scoring descriptors may require semantic comprehension for accurate evaluation, while others may not.