Paper Summary

Leveraging Large Language Models (LLMs) to Measure Aspiring School Leaders’ Capacities

Sat, April 11, 3:45 to 5:15pm PDT, JW Marriott Los Angeles L.A. LIVE, Floor: 4th Floor, Diamond 1

Abstract

Job performance assessments that simulate real-world tasks and elicit respondents’ reasoning are increasingly used to measure leaders’ capacities to engage in effective practices. This study examines the viability of using Large Language Models (LLMs) to evaluate the substantial text data these assessments often generate. Using responses from 189 aspiring principals in Tennessee to a teacher hiring scenario, we evaluate six LLMs across three prompting strategies with a structured codebook. LLMs produced valid responses for 96% of item scores and showed reduced variability with more detailed prompting. Higher-reasoning models (e.g., GPT-4o, Claude 3.7 Sonnet) demonstrated strong inter-rater reliability with trained human annotators. Findings suggest LLMs can scale assessment scoring while preserving fidelity to human norms, advancing the methodological toolkit for leadership researchers.
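
For readers interested in the reliability checks the abstract describes, the sketch below illustrates one common way to compare LLM-assigned rubric scores with human ratings. The score lists, the 0–3 scale, and the use of quadratic-weighted Cohen's kappa are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of checking LLM-human scoring agreement.
# Hypothetical data: rubric scores (0-3) assigned by a trained human annotator and
# by an LLM rater to the same responses; None marks an invalid LLM output.
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 1, 2, 0, 3, 2, 1, 2, 3]        # hypothetical human ratings
llm_scores   = [2, 3, 1, 2, 1, 3, 2, None, 2, 3]     # hypothetical LLM ratings

# Share of item scores the LLM returned in a usable (valid) form.
valid_rate = sum(s is not None for s in llm_scores) / len(llm_scores)

# Inter-rater reliability on the items both raters scored, using quadratic-weighted
# Cohen's kappa, a common choice for ordinal rubric scales.
pairs = [(h, l) for h, l in zip(human_scores, llm_scores) if l is not None]
h, l = zip(*pairs)
kappa = cohen_kappa_score(h, l, weights="quadratic")

print(f"Valid-response rate: {valid_rate:.0%}")
print(f"Quadratic-weighted kappa: {kappa:.2f}")
```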

Authors