Objectives
The presentation examines ChatGPT and related models from the perspective of potentially chaotic behavior. It sheds light on how vulnerable their ability to provide reasonable, stable, and interpretable responses is to manipulations of the input prompt. Notably, pretests show that changing even a small amount of input data can lead to dramatically different responses, even when the changes are difficult for humans to detect (e.g., inserted or deleted line breaks). The research on which the presentation focuses is particularly relevant in an educational context, where, for example, students may not have well-formulated questions that can be answered by large language models (see proposal by authors [4]).
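To illustrate the kind of manipulation described above, the following minimal Python sketch inserts line breaks into a prompt and quantifies how far two responses diverge with a crude surface similarity; the commented-out query_model calls are placeholders for whichever client the study actually uses and are assumptions, not part of the original work.

```python
# Minimal sketch (assumed, not the authors' code): insert line breaks that
# are hard for a human to notice and compare how much two responses differ.
import difflib

def insert_line_breaks(prompt: str, every_n_words: int = 12) -> str:
    """Insert a line break after every n-th word of the prompt."""
    words = prompt.split()
    pieces = []
    for i, word in enumerate(words, start=1):
        pieces.append(word)
        pieces.append("\n" if i % every_n_words == 0 else " ")
    return "".join(pieces).strip()

def response_similarity(a: str, b: str) -> float:
    """Crude surface similarity between two responses (1.0 = identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

original = ("Explain in two sentences why the sample mean is an unbiased "
            "estimator of the population mean.")
perturbed = insert_line_breaks(original)

# response_original = query_model(original)    # placeholder call
# response_perturbed = query_model(perturbed)  # placeholder call
# print(response_similarity(response_original, response_perturbed))
print(repr(perturbed))
```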
Framework
In the framework of Critical Online Reasoning (COR) (see proposal by authors [2,3]), current research focuses mostly on System 2-like reasoning processes to test the limits of generative language models. However, there is a level of linguistic structuring that is relevant to this research which, while semantically less complex, has implications for the use of language models, especially in (higher) education, as outlined in this presentation.
Methods & Analyses
Using response process and text data from the COR assessment (see proposal by authors [2,3]), a central focus of the presentation is therefore the type of linguistic information that causes the stochastic or even chaotic response behavior of the language models in question. This concerns the character level as well as the level of the logical document structure, the level of single words as well as the impact of changing phrases or applying paraphrasing technologies on a larger scale. Special attention is given to the influence of different parts of speech, in particular function words, which are known to be a reliable source for text classification and authorship attribution.
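The sketch below gives a concrete sense of perturbation operators at the linguistic levels mentioned here (character, word/function word, logical document structure); the operators and the function-word list are simplified assumptions for demonstration and do not reproduce the study's actual procedures.

```python
# Illustrative perturbation operators at different linguistic levels.
import random

FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "that", "which"}

def swap_adjacent_characters(text: str, seed: int = 0) -> str:
    """Character level: swap one pair of adjacent characters."""
    rng = random.Random(seed)
    chars = list(text)
    i = rng.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def drop_function_words(text: str) -> str:
    """Word level: remove function words, leaving content words in place."""
    return " ".join(w for w in text.split() if w.lower() not in FUNCTION_WORDS)

def shuffle_sentences(text: str, seed: int = 0) -> str:
    """Logical document structure: reorder the sentences of the prompt."""
    rng = random.Random(seed)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."

prompt = ("The report describes the sampling method. "
          "The results support the claim made in the abstract.")
for operator in (swap_adjacent_characters, drop_function_words, shuffle_sentences):
    print(operator.__name__, "->", operator(prompt))
```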
In this way, the presentation examines a wider range of linguistic effects that ultimately help to understand the levels at which generative models may currently fail to generate word sequences correctly, even though they are able to do so for the original text that was the subject of the change in question.
The presentation draws on several areas of quantitative, cognitive, and educational linguistics. This concerns the quantitative selection of the linguistic feature levels that are most relevant for the sensitivity analysis in question. It also concerns the inclusion of a wider range of language models, covering both pay-per-use and open-access models. As a result, GPT-3.5 and GPT-4 are included in the analysis, as well as models such as LLaMA or Alpaca.
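A comparison across pay-per-use and open-access models could be organized along the lines of the following harness; query_commercial and query_local are placeholders for the actual client code (e.g., a hosted API for GPT-3.5/GPT-4 and a local runner for LLaMA or Alpaca) and none of these calls are taken from the original study.

```python
# Sketch of a comparison harness over pay-per-use and open-access models.
from typing import Callable, Dict

def query_commercial(model: str, prompt: str) -> str:  # placeholder
    raise NotImplementedError("connect the provider's client here")

def query_local(model: str, prompt: str) -> str:  # placeholder
    raise NotImplementedError("connect the local inference runner here")

MODELS: Dict[str, Callable[[str, str], str]] = {
    "gpt-3.5-turbo": query_commercial,
    "gpt-4": query_commercial,
    "llama": query_local,
    "alpaca": query_local,
}

def collect_responses(prompt_variants: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Query every model with the original prompt and each perturbed variant."""
    return {
        name: {variant: query(name, text) for variant, text in prompt_variants.items()}
        for name, query in MODELS.items()
    }
```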
Finally, the presentation considers the impact of text length and relates it to the impact of different language levels on the stability or instability of prompt response behavior.
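One way to operationalize this relation, sketched below as an illustrative assumption rather than the study's actual analysis, is to bin prompts by length and average a surface similarity between the responses to the original and the perturbed prompt within each bin.

```python
# Sketch: relate prompt length to response stability via binned similarity.
import difflib
from statistics import mean
from typing import Dict, List, Tuple

def stability_by_length(
    records: List[Tuple[str, str, str]],  # (prompt, response_original, response_perturbed)
    bin_size: int = 50,
) -> Dict[int, float]:
    """Mean response similarity per prompt-length bin (bin = word count // bin_size)."""
    bins: Dict[int, List[float]] = {}
    for prompt, resp_orig, resp_pert in records:
        length_bin = (len(prompt.split()) // bin_size) * bin_size
        similarity = difflib.SequenceMatcher(None, resp_orig, resp_pert).ratio()
        bins.setdefault(length_bin, []).append(similarity)
    return {b: mean(values) for b, values in sorted(bins.items())}
```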
Significance
The presentation thus bridges research in higher education on students'/graduates' COR skills, quantitative linguistics (language levels), and AI (large language models), and motivates further fine-grained mixed-methods analyses of the multimodal COR data.