Individual Submission Summary

Human versus LLM: Evaluating Effectiveness of Generative AI in Thematic Coding in the Interpretive Social Sciences

Tue, August 11, 8:00 to 9:30am, TBA

Abstract

The ease of use of generative AI models such as ChatGPT, and their success in some forms of research-related automation, have sparked interest in their potential for streamlining labor-intensive tasks such as the thematic coding of complex social phenomena in the interpretive social sciences. We test the coding capacities of several large language models (LLMs) by comparing human and automated coding of a relatively small dataset (N=356) of statements addressing racism issued by higher education institutions in 2020. We evaluate varying degrees of automation, including DistilBERT, a compact and efficient general-purpose language representation model; MPNet, a sentence-transformer model; and several contemporary generative AI systems, including ChatGPT and Gemini. Within the generative AI domain, we examine three distinct configurations that reflect varying levels of AI reliance: (1) a fully automated approach using a generative AI–produced codebook and generative AI coding; (2) a hybrid human-in-the-loop approach combining a generative AI–generated, human-refined codebook with generative AI coding; and (3) a human-developed codebook (augmented by diverse research personas) paired with generative AI coding. We compare the outputs of all models to a human ground truth, defined as a human-generated codebook and corresponding human coding. We conclude by discussing the implications of our findings for methodological practice and for the broader application of AI-assisted coding in social science research.

Authors