Paper Summary

A Research of Machine Learning Automatic Assessment of Scientific Explanation in Chinese (Poster 8)

Thu, April 11, 10:50am to 12:20pm, Pennsylvania Convention Center, Floor: Level 100, Room 115B

Abstract

Objective
Machine learning automated scoring can address the heavy demands on human raters and time that the evaluation of scientific explanations otherwise imposes, and it has been applied successfully in English-language contexts (Zhai et al., 2022). Given the diversity of languages and cultures, however, whether these results transfer to other contexts remains an open question. This study systematically examines the accuracy of machine learning based automated assessment of scientific explanations in Chinese contexts, along with the factors that influence it.
Theoretical framework
The "Phenomenon-Theory-Data-Reasoning" (PTDR) model proposed by Yao et al. (2016) serves as the theoretical framework for scientific explanation in this study. The PTDR framework posits that the trunk of an explanation consists of theory and data connected through reasoning, and that the entire explanatory activity is directed at the phenomenon being explained (Figure 4). The PTDR model builds on and develops the CER model while taking into account the characteristics of Chinese science education and teaching practice.
Methods
In this study, 1,593 students in grades 8-12 (830 boys and 763 girls) in Beijing were randomly sampled, stratified by school level.
The questionnaire comprises 7 questions drawn from the research group's item bank, which have high reliability and validity (e.g., Yao et al., 2016; Guo et al., 2016). The scoring criteria have two parts: holistic criteria evaluate students' overall performance on each item, and analytical criteria apply the PTDR framework to score each element of students' explanations.
The automated scoring tool is LightSIDE, a machine learning program developed at Carnegie Mellon University. The data analysis workflow is shown in Figure 5.
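To make the workflow concrete, here is a minimal sketch of the feature-extraction-plus-classifier approach that tools like LightSIDE implement, using scikit-learn as an illustrative stand-in (not the study's actual tool or data). The responses and labels below are invented toy examples; real Chinese responses would additionally require word segmentation before bag-of-words features can be extracted.

```python
# Illustrative sketch only: bag-of-words features feeding a classifier,
# the general supervised-scoring recipe used by tools such as LightSIDE.
# Toy English responses stand in for real (segmented) Chinese data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical human-scored training responses (1 = adequate explanation).
responses = [
    "the ice melts because heat flows from the warm air to the ice",
    "the ice melts",
    "heat transfer from the surroundings raises the temperature above zero",
    "it just melts by itself",
]
labels = [1, 0, 1, 0]

# Unigram and bigram counts as features, logistic regression as the scorer.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(responses, labels)

# Score a new, unseen response (prediction is a 0/1 label).
prediction = model.predict(["warm air transfers heat to the ice"])[0]
```

A trained model like this is then compared against human raters on a held-out set, which is where agreement statistics such as Cohen's kappa come in.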
Analysis and Results
We found that: (1) In the Chinese context, machine learning can accurately score students' written scientific explanations (Cohen's kappa = 0.67). (2) The larger the training set, the higher the kappa; when the sample size exceeds 700, agreement reaches a substantial level (Cohen's kappa > 0.6). (3) Using analytical scoring criteria effectively improves the accuracy of machine learning automated scoring. (4) Using explanation-based test questions and specific scenario descriptions both effectively improve the accuracy of automated scoring.
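Cohen's kappa, the agreement statistic reported above, is the observed human-machine agreement corrected for the agreement expected by chance. A short self-contained computation, using hypothetical ratings (not the study's data):

```python
# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Proportion of items on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick the same category.
    expected = sum(counts_a[c] / n * counts_b[c] / n for c in counts_a)
    return (observed - expected) / (1 - expected)

# Hypothetical human vs. machine scores for ten responses (0-2 scale):
human   = [2, 1, 0, 2, 1, 1, 0, 2, 2, 1]
machine = [2, 1, 0, 1, 1, 1, 0, 2, 2, 0]
kappa = cohens_kappa(human, machine)  # about 0.70 for this toy example
```

Values above 0.6, like the 0.67 reported, are conventionally read as substantial agreement between machine and human scoring.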
Scientific or scholarly significance of the study or work
This study contributes to the field by applying automated scoring techniques in a non-English context. The findings can guide the design of automated scoring systems, dashboards, and instructional scaffolds to support assessment practice.

[Figure 4: The PTDR framework for scientific explanation]
[Figure 5: The machine learning data analysis process]
