Paper Summary

Leveraging Large Language Models (LLMs) to Measure Aspiring School Leaders’ Capacities

Sat, April 11, 3:45 to 5:15pm PDT, JW Marriott Los Angeles L.A. LIVE, Floor: 4th Floor, Diamond 1

Abstract

Job performance assessments that simulate real-world tasks and elicit respondents’ reasoning are increasingly used to measure leaders’ capacities to engage in effective practices. This study examines the viability of using Large Language Models (LLMs) to evaluate the substantial text data these assessments often generate. Using responses from 189 aspiring principals in Tennessee to a teacher hiring scenario, we evaluate six LLMs across three prompting strategies with a structured codebook. LLMs produced valid responses for 96% of item scores and showed reduced variability with more detailed prompting. Higher-reasoning models (e.g., GPT-4o, Claude 3.7 Sonnet) demonstrated strong inter-rater reliability with trained human annotators. Findings suggest LLMs can scale assessment scoring while preserving fidelity to human norms, advancing the methodological toolkit for leadership researchers.
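
For readers interested in the reliability checks the abstract describes, the sketch below illustrates one common way to compare LLM-assigned rubric scores with human ratings. The score lists, the 0–3 scale, and the use of quadratic-weighted Cohen's kappa are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of checking LLM-human scoring agreement.
# Hypothetical data: rubric scores (0-3) assigned by a trained human annotator and
# by an LLM rater to the same responses; None marks an invalid LLM output.
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 1, 2, 0, 3, 2, 1, 2, 3]        # hypothetical human ratings
llm_scores   = [2, 3, 1, 2, 1, 3, 2, None, 2, 3]     # hypothetical LLM ratings

# Share of item scores the LLM returned in a usable (valid) form.
valid_rate = sum(s is not None for s in llm_scores) / len(llm_scores)

# Inter-rater reliability on the items both raters scored, using quadratic-weighted
# Cohen's kappa, a common choice for ordinal rubric scales.
pairs = [(h, l) for h, l in zip(human_scores, llm_scores) if l is not None]
h, l = zip(*pairs)
kappa = cohen_kappa_score(h, l, weights="quadratic")

print(f"Valid-response rate: {valid_rate:.0%}")
print(f"Quadratic-weighted kappa: {kappa:.2f}")
```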

Authors