Paper Summary

“I’m Sorry, Dave. This Feedback is not Constructive”: Comparing Student Perceptions with Generative AI

Sun, April 27, 1:30 to 3:00pm MDT, The Colorado Convention Center, Floor: Terrace Level, Bluebird Ballroom Room 3B

Abstract

Objectives and Framework
The rise of generative AI continues to have profound implications for education, particularly within the area of performance feedback. Although feedback is one of the strongest influences on student learning, providing constructive feedback is often time-consuming and challenging for educators. In fact, AI-generated feedback (or large language models [LLM]) has already been used as a time-efficient and cost-effective solution (Meyer et al., 2024), particularly in the writing domain (i.e., automated writing evaluation or essay scoring; Fleckenstein et al., 2023); however, little has been done to understand what kind of feedback generative AI considers as constructive and to what extent it compares to students’ feedback perceptions. Our objective was to compare AI-generated and student-provided rankings of feedback statements and to assess which motivational dimensions of feedback predicted such rankings. Guided by self-determination theory (Ryan & Deci, 2020), we conceptualized motivational dimensions using competence- and autonomy-supportive strategies of providing specific and positive feedback (Fong & Schallert, 2023).

Method
In Phase 1, we created 46 statements representing varying levels of feedback constructiveness. A total of 508 U.S.-based undergraduates were presented with pairs of randomly selected statements and chose the statement they perceived as more constructive. Each time a statement was chosen as more constructive, it received one point, and a rank order was created from the point totals (Rank 1 had the most points). In Phase 2, after prompting ChatGPT 3.5 to describe what makes feedback constructive, we asked it to rank the same set of Phase 1 feedback statements from most to least constructive (Rank 1 was most constructive). To assess the similarity between the student and AI rankings, we computed a Spearman’s rank correlation. In Phase 3, a separate group of 48 undergraduates rated each statement’s level of specificity and positivity. Controlling for feedback statement word count, we then regressed the student and AI rankings (organized into four quartiles of statements) on student ratings of specificity and positivity in two separate ordinal regression models.
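The Phase 1–2 procedure — tallying pairwise-comparison wins into a rank order and then comparing two rankings with Spearman’s rank correlation — can be sketched as follows. This is a minimal illustration with hypothetical statement IDs, hypothetical pairwise choices, and a made-up AI ranking, not the study’s actual data or code.

```python
# Minimal sketch of the Phase 1-2 analysis with hypothetical data:
# tally pairwise-comparison wins into a rank order, then compare two
# rankings of the same statements with Spearman's rank correlation.
from collections import Counter
from scipy.stats import spearmanr

# Hypothetical pairwise choices: each tuple is (winner, loser) statement ID.
choices = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 2), (1, 3)]
statements = range(4)

# One point per win; more points -> lower (better) rank number.
wins = Counter(winner for winner, _ in choices)
ordered = sorted(statements, key=lambda s: -wins[s])
student_rank = {s: i + 1 for i, s in enumerate(ordered)}  # Rank 1 = most points

# A hypothetical AI ranking of the same four statements.
ai_rank = {0: 1, 1: 3, 2: 4, 3: 2}

# Spearman's rho between the two rankings.
rho, p = spearmanr([student_rank[s] for s in statements],
                   [ai_rank[s] for s in statements])
print(round(rho, 2))  # 0.8 for this toy data
```

With real data, the same two rank vectors (student and AI) over all 46 statements would be passed to `spearmanr` to obtain the reported correlation.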

Results
The Spearman correlation between student and AI rankings of feedback statements was ρ = .38 (p < .001), suggesting a small-to-moderate degree of overlap. For student-ranked quartiles, perceived specificity of a feedback statement was significantly associated with greater ranked constructiveness (β = −4.78, p < .001; because Rank 1 was most constructive, negative coefficients indicate greater constructiveness). In contrast, perceived positivity of a feedback statement was significantly associated with greater AI-ranked constructiveness (β = −2.93, p = .026).

Significance
Our exploratory study highlights differences between AI and college students in how they interpret constructive feedback. Although AI and student feedback perceptions overlap to a moderate degree, discrepancies may stem from the extent to which feedback specificity and positivity are present. Our findings suggest that AI appears to value positive and encouraging messages when evaluating constructive feedback, whereas students seem more focused on whether the feedback offers a specific pathway for improvement. This distinction has important implications for how AI-generated feedback might be designed to support students’ motivation and their implementation of feedback.
