Session Summary

Large Language Models in Assessment: Guidelines, Characteristics and Biases

Sun, April 14, 3:05 to 4:35pm, Convention Center, Floor: First, 121B

Session Type: Coordinated Paper Session

Abstract

In the rapidly evolving landscape of education, testing, and assessment, AI-powered large language models (LLMs) such as ChatGPT are becoming increasingly prevalent. This coordinated session explores various aspects of LLMs in order to better understand their properties, strengths, and weaknesses. While the focus is on writing assessments, we conjecture that many of the presented findings will generalize to other areas as well. The session features four talks. The first two explore the linguistic characteristics and overall quality of AI-generated essays: the first talk studies the differences between several state-of-the-art LLMs, both proprietary and open-source, while the second focuses on the effect of the sampling temperature, a simple yet consequential parameter. The remaining two talks focus on the detection of AI-generated essays: the third provides guidelines and discusses best practices for the use of such detectors, while the fourth investigates potential biases that detectors may show against, or in favor of, non-native speakers.
