Whose Homework Is ChatGPT Copying?

Sat, April 13, 11:25am to 12:55pm, Philadelphia Marriott Downtown, Floor: Level 5, Salon J

Abstract

State-of-the-art large language models (LLMs), such as OpenAI's GPT and Google's Bard, can generate text that, for most purposes, is indistinguishable from human-written text. This "human-likeness" is learned from large collections of training documents scraped from websites and other electronic media. While these datasets contain an enormous wealth of human knowledge, experience, and creative expression, it is well known that, depending on the language, certain groups and demographics are significantly over- or underrepresented in them. Given the current trend toward using LLMs in education, testing, and assessment, such discrepancies in the training data can lead to subtle but noticeable biases, both in how LLMs generate different kinds of text and in our ability to detect machine-generated text.

This talk first examines the composition of commonly used English-language datasets and identifies groups that are likely over- or underrepresented. It then discusses whether and how such representation discrepancies can affect LLMs, which, unlike many other AI systems, do not interpolate or memorize the training data directly. Examples illustrate how biases in the training data do or do not surface in AI-generated responses. Prompt design is discussed as a simple means of reducing certain biases, and a brief overview of more advanced mitigation methods is given. The talk then offers an outlook on which limitations of current LLMs can, in principle, be overcome and which are likely to remain because they are of a more fundamental nature. It concludes with a discussion of how these developments are expected to affect test security over the medium to long term.
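The abstract itself contains no code, but as a rough sketch of the prompt-design idea it mentions, the example below contrasts a bare prompt with one that adds explicit constraints on audience, perspective, and register. All wording, function names, and constraint choices here are illustrative assumptions, not material from the talk.

```python
# Minimal sketch: prompt design as a lightweight bias-mitigation step.
# All prompt wording below is hypothetical, for illustration only.

def naive_prompt(topic: str) -> str:
    # A bare prompt: the model falls back on whatever register and
    # perspective dominate its training data.
    return f"Write a short essay about {topic}."

def bias_aware_prompt(topic: str, audience: str, perspective: str) -> str:
    # Explicit constraints steer the model away from the default
    # (majority) voice in its training distribution.
    return (
        f"Write a short essay about {topic} for {audience}. "
        f"Present the topic from {perspective}, avoid cultural "
        f"assumptions that hold only for some readers, and use "
        f"plain, regionally neutral English."
    )

if __name__ == "__main__":
    print(naive_prompt("standardized testing"))
    print(bias_aware_prompt(
        "standardized testing",
        audience="secondary-school students worldwide",
        perspective="multiple stakeholders, including test takers",
    ))
```

The point of the contrast is that the second prompt makes the intended audience and perspective explicit rather than leaving them to the training-data default; more advanced mitigation methods, as noted in the abstract, go beyond what prompt wording alone can fix.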
