Toward Human-Centered Explainable AI Applications for Generating Feedback for Classroom Teachers from Large-Scale Assessments

Sun, April 12, 9:45 to 11:15am PDT, InterContinental Los Angeles Downtown, 5th Floor, Boyle Heights

Abstract

The National Assessment of Educational Progress (NAEP) provides critical insights into what students know and can do, informing educational policy and interventions at national, state, and large-district levels. This work builds on prior research demonstrating the potential of a human-centered AI (HAI) framework (see Figure 1) to extract meaningful patterns from multi-source data, including both test-taking process data and response data, and to generate scalable, actionable feedback. Variations in how students engage with assessment tasks offer valuable contextual information beyond performance scores, shedding light on learning-related skills such as motivation, time management, and self-regulation.

However, earlier approaches often relied on latent variables derived from deep learning models (e.g., the autoencoders shown in Figure 1) that compress temporal data in ways that are difficult for educators and researchers to interpret. Most critically, these methods may not sufficiently align feedback with classroom practice.

To address these limitations, this study investigates two central research questions:
• RQ-1: How can we develop explainable features from NAEP’s multi-source data that capture temporal information in students’ test-taking behavior? How can these features be visualized to support educators in understanding students’ learning states?
• RQ-2: How can generative AI be leveraged to produce rich, data-informed feedback that is practical and meaningful for teachers?

To explore RQ-1, we constructed student navigation plots and developed thirteen explainable features using data from the 2017 NAEP Math assessment. Figure 2 presents Spearman correlations among these features for a reference group of more than 2,500 students who took the released NAEP math booklet. Table 1 highlights several of the newly developed indicators and their interpretations.
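As a concrete illustration of this step, the sketch below computes pairwise Spearman correlations over a table of explainable features, in the spirit of Figure 2. It is a minimal sketch: the file name, the column layout (one row per student, one column per feature), and the use of pandas are our assumptions, not details from the paper.

```python
import pandas as pd

# Hypothetical feature table: one row per student in the ~2,500-student
# reference group, one column per explainable feature. The file name and
# column names are illustrative, not taken from the paper.
features = pd.read_csv("reference_group_features.csv")

# Pairwise Spearman rank correlations among the thirteen features,
# analogous to the matrix summarized in Figure 2.
corr = features.corr(method="spearman")
print(corr.round(2))
```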

Figures 3 and 4 illustrate the application of these features to an individual student (Student A). The dot plot (Figure 3) displays key feature values and the student’s relative standing within the reference group. The navigation plot (Figure 4) visualizes the student’s complete test-taking trajectory and performance.
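The "relative standing" shown in the dot plot can be read as a percentile rank within the reference group. The sketch below illustrates one way to compute it, reusing the same hypothetical feature table; treating the first row as Student A is a stand-in for illustration only.

```python
import pandas as pd
from scipy.stats import percentileofscore

# Same hypothetical feature table as above.
features = pd.read_csv("reference_group_features.csv")
student_a = features.iloc[0]  # stand-in row for Student A

# Percentile rank of each of Student A's feature values within the
# reference group: the "relative standing" displayed in the dot plot.
for name, value in student_a.items():
    pct = percentileofscore(features[name].dropna(), value)
    print(f"{name}: {pct:.0f}th percentile")
```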

The explainable features enable the translation of numerical data into natural language, facilitating the use of generative AI to address RQ-2. Specifically, AI agents with math-teacher and coach personas were used to generate individualized feedback narratives (Table 2), substantially enhancing the interpretability and instructional relevance of prior findings. Additional details will be shared during the presentation.
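To make this feature-to-feedback pipeline concrete, the sketch below passes a plain-language summary of a few feature values to a persona-prompted chat model. The paper does not specify the model or agent framework; the OpenAI-style API, the model name, the persona wording, and the feature summary are all assumptions made for illustration.

```python
from openai import OpenAI  # assumes an OpenAI-style chat API

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Plain-language translation of a few explainable features; the numbers
# and wording are invented for illustration.
feature_summary = (
    "Student A answered 12 of 15 items, revisited 4 items after a first "
    "pass, and spent 40% of the session on the last three items."
)

# A math-teacher persona steers the tone and focus of the feedback,
# mirroring the persona-based agents behind Table 2.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are an experienced middle-school math teacher. "
                "Write brief, actionable feedback on this student's "
                "test-taking behavior for their classroom teacher."
            ),
        },
        {"role": "user", "content": feature_summary},
    ],
)
print(response.choices[0].message.content)
```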

By leveraging large-scale, representative, and high-quality NAEP data, this project proposes a generalizable framework that integrates statistical, psychometric, and AI methodologies with human expertise. The goal is to generate actionable insights and reimagine the role of large-scale assessments (LSAs) in supporting teaching and learning. The work makes the invisible test-taking process visible to educators, offering nuanced perspectives on how students with varying proficiency levels navigate assessments. The enhanced narratives generated by AI agents are especially valuable for teachers with limited experience or resources. The proposed methods are designed to be adaptable across diverse LSAs with rich data environments.
