Session Type: Coordinated Paper Session
In the rapidly evolving landscape of education, testing, and assessment, AI-powered large language models (LLMs) such as ChatGPT are becoming increasingly prevalent. This coordinated session explores various aspects of LLMs in order to better understand their properties, strengths, and weaknesses. While the focus is on writing assessments, we conjecture that many of the presented findings will generalize to other areas as well. The session features four talks. The first two explore the linguistic characteristics and overall quality of AI-generated essays: the first talk compares several state-of-the-art LLMs, both proprietary and open source, while the second examines the effect of the sampling temperature, a simple yet consequential parameter. The remaining two talks focus on the detection of AI-generated essays: the third provides guidelines and discusses best practices for using such detectors, while the fourth investigates potential biases that detectors might show against or in favor of non-native speakers.
Comparison of Essays Generated by Different LLMs - Yang Zhong, University of Pittsburgh; Jiangang Hao, ETS; Michael Fauss, ETS
Effects of Sampling Temperature on Writing Style and Quality - Michael Fauss, ETS; Jiangang Hao, ETS; Chen Li, ETS
Practical Considerations of Using Detectors of AI-Generated Essays: Dos and Don'ts - Jiangang Hao, ETS; Michael Fauss, ETS
Towards Investigating the Fairness of Detecting LLM-Generated Essays - Yang Jiang, ETS; Jiangang Hao, ETS; Michael Fauss, ETS; Chen Li, ETS