Paper Summary

How Many Demonstrations of What Quality When Prompting Large Language Models for Science Essay Assessment

Fri, April 25, 1:30 to 3:00pm MDT, The Colorado Convention Center, Floor: Ballroom Level, Four Seasons Ballroom 4

Abstract

Scientific writing is a core practice in science education, yet teachers often find it challenging to provide comprehensive, constructive feedback in real time. Large Language Models (LLMs) have demonstrated assessment capabilities in a variety of educational settings. In this research, we investigate the effectiveness of prompting instruction-tuned LLMs to assess middle school science essays against a rubric of main ideas. Comparing Llama-3-8B with three GPT models, we found that prompting GPT-4o with three examples outperformed customized AI assessment tools. Results remained consistent when we varied the examples included in the prompt. Our findings add to a growing body of research demonstrating the potential benefits of LLMs in assessment, as well as the importance of prompt design.
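
The abstract does not reproduce the prompt itself, but the few-shot setup it describes (a scoring rubric plus three worked demonstrations sent to GPT-4o) can be sketched roughly as below. The rubric wording, example essays, and scores are illustrative placeholders, not the authors' materials; the sketch assumes the OpenAI chat completions API.

```python
# Minimal sketch of few-shot rubric prompting for essay assessment.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

# Placeholder rubric of "main ideas" -- not the study's actual rubric.
RUBRIC = (
    "Score the essay 0-3 on each main idea:\n"
    "1. States a claim about the phenomenon.\n"
    "2. Cites evidence from the investigation.\n"
    "3. Connects the evidence to the claim with scientific reasoning."
)

# Three worked demonstrations (essay + scored rubric), mirroring the
# three-example condition the abstract reports. All content is invented.
DEMONSTRATIONS = [
    ("Plants grew taller with more light because light drives photosynthesis...",
     "Idea 1: 3, Idea 2: 2, Idea 3: 3"),
    ("The plant in the dark was small.",
     "Idea 1: 1, Idea 2: 1, Idea 3: 0"),
    ("More light means more growth, as our data table shows...",
     "Idea 1: 2, Idea 2: 3, Idea 3: 1"),
]

def assess(essay: str) -> str:
    """Score one student essay against the rubric, few-shot style."""
    messages = [{"role": "system", "content": RUBRIC}]
    # Interleave each demonstration essay with its scored response.
    for demo_essay, demo_scores in DEMONSTRATIONS:
        messages.append({"role": "user", "content": demo_essay})
        messages.append({"role": "assistant", "content": demo_scores})
    messages.append({"role": "user", "content": essay})
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

print(assess("When we gave the plant more light it grew taller, so light helps growth."))
```

Varying which demonstrations appear in `DEMONSTRATIONS` corresponds to the example-variation condition the abstract mentions.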

Authors