Paper Summary

Blurring Qualitative and Quantitative Approaches with AI Language Models: Applications to Clinical Trust

Sat, April 13, 3:05 to 4:35pm, Pennsylvania Convention Center, Floor: Level 200, Room 203AB

Abstract

Advances in artificial intelligence (AI) and natural language processing (NLP) promise new insights into the analysis of narrative data. Computational and algorithmic developments in deep learning neural networks have led to the advent of language models (LMs - including large language models, or LLMs) with unprecedented abilities to not only encode the meaning of words, sentences, and entire bodies of text, but also to use such encodings to generate new text. Within health professions education research, these LMs promise to bridge, and potentially blur, the distinction between qualitative analysis and quantitative measurement approaches.
Our research focuses on exploring the interpersonal dynamics of clinical entrustment by developing LM methodologies to extend the reach of traditional qualitative methods for analyzing large narrative datasets. We have employed LMs to assist narrative analysis in several ways, including: 1) to discover qualitative themes and measure associated constructs in unlabeled narratives, and 2) to uncover latent constructs underlying the classification of pre-labeled narratives. In 1), asking how supervisors and trainees may differentially approach entrustment decisions, we developed a transfer learning strategy (applying LLMs pretrained on datasets other than the study dataset) to identify and measure constructs in a large dataset of feedback narratives. The constructs identified algorithmically included features of clinical task performance and the sentiment of the language used in the feedback. The LLMs provided consistent measurement of these constructs across the entire dataset, enabling statistical analysis of differences in how supervisors and trainees reflect on entrustment decisions and respond to potential sources of bias. In 2), asking how entrustment decisions shape feedback, we trained an LM from scratch to predict entrustment ratings from feedback narratives, using a training set of narratives paired with entrustment ratings. By deconstructing the trained LM, we uncovered the latent constructs it used to make its predictions. These constructs included the narrative’s level of detail and the degree to which the feedback was reinforcing versus constructive.
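
As an illustration of the transfer-learning approach in 1), a pretrained sentiment model can be applied off the shelf to produce consistent scores across an entire narrative dataset. The minimal Python sketch below uses the Hugging Face transformers pipeline; the model name and feedback texts are hypothetical stand-ins, not the models or data used in this study.

# Minimal sketch, not the study's implementation: scoring sentiment in feedback
# narratives with a pretrained (transfer-learned) model. The model name and
# example narratives are illustrative placeholders.
from transformers import pipeline

sentiment_scorer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # assumed off-the-shelf model
)

feedback_narratives = [
    "Managed the airway confidently and asked for help at the right moment.",
    "Needed repeated prompting during the handoff; documentation was incomplete.",
]

for narrative, result in zip(feedback_narratives, sentiment_scorer(feedback_narratives)):
    # Each result carries a label (POSITIVE/NEGATIVE) and a confidence score,
    # giving a repeatable measurement that scales to the full dataset.
    print(f"{result['label']:8s} {result['score']:.3f}  {narrative}")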
While LMs offer the advantages of consistent construct measurement and applicability to large datasets, they also carry the disadvantages of algorithmic bias and lack of transparency. In 1), we identified gender bias in the LLM we had trained to measure sentiment; this bias originated in its training dataset. To mitigate it, we developed a strategy that masked gender-identifying words from the LLM during both training and sentiment measurement. This allowed us to identify small but significant biases in the study data itself, revealing that entrustment ratings appeared to be less susceptible to bias than the language used to convey entrustment. With respect to transparency, while our work in 2) enabled the deconstruction of an LM designed for a specific task (i.e., prediction of entrustment), the larger LLMs used in generative AI (including GPT-4 and LLaMA) currently lack the ability to trace their output to its sources. Our ongoing work focuses on developing LLM-based strategies that support transparency in narrative analysis, and on developing theory to characterize the epistemology and limitations of knowledge both represented within and derived from LLMs.
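
A minimal sketch of the masking idea follows, assuming a simple word-replacement step applied to each narrative before the LLM sees it during training and sentiment measurement; the word list and mask token are illustrative placeholders, not the study's actual procedure.

import re

# Hypothetical, abbreviated list of gender-identifying words; the study's full
# list and mask token are not reproduced here.
GENDER_TERMS = {"he", "she", "him", "her", "his", "hers", "himself", "herself",
                "mr", "ms", "mrs", "man", "woman"}
MASK_TOKEN = "[PERSON]"  # assumed placeholder token

def mask_gender(text: str) -> str:
    """Replace gender-identifying words so the model cannot condition on them."""
    def _sub(match: re.Match) -> str:
        return MASK_TOKEN if match.group(0).lower() in GENDER_TERMS else match.group(0)
    return re.sub(r"\b\w+\b", _sub, text)

print(mask_gender("She managed the airway, and I trusted her to close independently."))
# -> "[PERSON] managed the airway, and I trusted [PERSON] to close independently."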

Authors