Objectives
We are designing an automated formative assessment system that uses artificial intelligence (AI) to evaluate students' multi-modal responses, providing individualized feedback on their written and drawn answers. The system supports formative assessment practice within Interactions, a high school physical science curriculum aligned with the Next Generation Science Standards (NGSS Lead States, 2013). We investigate how this automated, multi-modal feedback influences students' development along a learning progression on electrical interactions (Kaldaras et al., 2021).
Theoretical framework
The project uses science and engineering practices (SEPs), such as modeling, to strengthen students' application of disciplinary core ideas (DCIs) and crosscutting concepts (CCCs) in making sense of compelling phenomena through multi-modal responses (Li et al., 2021). Drawing on multi-modality theory (Kress, 2009), which holds that information is processed efficiently across multiple sensory channels, we combine written and drawn responses so that students can express their understanding fully. The AI system provides immediate, personalized feedback while preserving the diversity of student representations, supporting a richer learning experience. Using AI increases the efficiency and scalability of formative assessment, transforming student learning through multi-modal evaluation.
Methods
We conducted content coding of student explanations and electronically drawn models using validated rubrics aligned with an integrated progression of the three dimensions of scientific knowledge (DCIs, SEPs, and CCCs; 3D knowledge) and with evidence-centered design (ECD) statements (Kaldaras et al., 2022; Mislevy & Haertel, 2006). Trained human coders first scored responses using these rubrics. We then employed natural language processing and convolutional neural networks in a supervised machine learning setting to develop automatic scoring models.
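To make the scoring-model step concrete, the following is a minimal sketch of one plausible supervised setup for the text branch, assuming scikit-learn; the example responses, rubric levels, and model choices are hypothetical stand-ins rather than the project's actual pipeline.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical human-scored explanations mapped to rubric levels.
    texts = [
        "The balloon gains electrons, so it attracts the wall.",
        "Opposite charges attract because charge transferred between the objects.",
    ]
    labels = [1, 2]

    # A linear classifier over TF-IDF n-grams stands in for the NLP scoring models.
    text_scorer = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    text_scorer.fit(texts, labels)
    print(text_scorer.predict(["Electrons move onto the rod and repel each other."]))

Electronically drawn models, rendered as images, could be scored by a small convolutional network along the following lines (again an illustrative sketch: the 64x64 input size, architecture, and four rubric levels are assumptions, not taken from the project).

    import torch
    import torch.nn as nn

    class DrawingScorer(nn.Module):
        """Toy CNN mapping a 64x64 RGB rendering of a student drawing to rubric-level logits."""
        def __init__(self, n_levels: int = 4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Linear(32 * 16 * 16, n_levels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(x).flatten(1))

    logits = DrawingScorer()(torch.rand(1, 3, 64, 64))  # one synthetic drawing
    print(logits.shape)  # torch.Size([1, 4])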
Data sources
We collected and human-scored ~1,400 text explanations and ~1,100 electronically drawn models from students in grades 9-12 across the United States who used the Interactions curriculum. The assessment items were embedded as formative assessments within the curriculum. Figure 1 presents a student's response to the Electroscope task.
Results
We report progress on three formative assessment items. The first is an explanation item. Building on previous work, we applied coding rubrics with high human-human interrater reliability and then developed text classification models with good to near-perfect performance (see Table 1). For the second item, in which students develop models and write a justification (Figure 1), we developed a coding rubric that incorporates modeling and DCI elements. Coders demonstrated high reliability, with Krippendorff's alpha coefficients ranging from 0.771 to 1.000. For a third explanation item, we have begun developing the rubric through a deconstruction process.
----------------------------------------------Table 1---------------------------------------------------------
----------------------------------------------Figure 1---------------------------------------------------------
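To illustrate the interrater reliability statistic reported above, the sketch below computes Krippendorff's alpha for two coders assigning nominal rubric codes with no missing data. The coder scores shown are invented for the example, not drawn from our dataset; in practice one would use an established implementation (e.g., the krippendorff Python package), but the logic of the statistic is compact enough to show directly.

    from collections import Counter
    from itertools import permutations

    def krippendorff_alpha_nominal(coder_a, coder_b):
        """Krippendorff's alpha for two coders, nominal data, no missing values."""
        # Coincidence matrix: each unit contributes both ordered pairs (a, b) and (b, a).
        coincidences = Counter()
        for a, b in zip(coder_a, coder_b):
            coincidences[(a, b)] += 1
            coincidences[(b, a)] += 1
        marginals = Counter()                     # per-category totals across both coders
        for (c, _), count in coincidences.items():
            marginals[c] += count
        n = sum(marginals.values())               # total pairable values (2 x units)
        observed = sum(v for (c, k), v in coincidences.items() if c != k) / n
        expected = sum(marginals[c] * marginals[k]
                       for c, k in permutations(marginals, 2)) / (n * (n - 1))
        return 1.0 - observed / expected          # alpha = 1 - D_o / D_e

    # Hypothetical rubric scores from two trained coders on ten responses.
    scores_a = [0, 1, 2, 2, 3, 1, 0, 2, 3, 1]
    scores_b = [0, 1, 2, 1, 3, 1, 0, 2, 3, 1]
    print(round(krippendorff_alpha_nominal(scores_a, scores_b), 3))  # ~0.871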
Scholarly significance of the work
We developed a rubric construction process that adheres to the ECD argument and to 3D assessment items and that applies to multi-modal items. These rubrics showed robust reliability in human coding, demonstrating their potential for training AI-based assessment systems on multi-modal tasks. By preserving the 3D nature of scoring and feedback, we underscore the critical role AI can play in evaluating the complexities of multi-modal responses. This approach aligns with the 3D vision of the Framework for K-12 Science Education (NRC, 2012), enhancing assessment's capacity to support a comprehensive understanding of student learning.