This study presents an integrated pipeline for large-scale qualitative analysis of educational video content, leveraging Automatic Speech Recognition (ASR) and Large Language Models (LLMs) for coreference resolution and Named Entity Recognition (NER) correction. Using 48 CrashCourse US History episodes, we benchmarked multiple ASR systems and applied LLM-based enhancements. Four episodes were manually annotated as gold standards to validate improvements. Results demonstrate that LLM-assisted coreference and NER significantly boost the accuracy and reliability of historical entity extraction, particularly for complex events, organizations, and laws. Topic modeling reveals that LLM-cleaned transcripts yield clearer, more coherent themes. Our findings highlight LLMs’ value in enhancing transcript fidelity, entity recognition, and thematic analysis, supporting more rigorous educational research across diverse contexts.
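One way to picture the validation step against the manually annotated gold-standard episodes is entity-level precision, recall, and F1. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: the function name `entity_prf`, the label names (`LAW`, `ORG`, `EVENT`, `PERSON`), the case-insensitive exact-match criterion, and the sample entities are all hypothetical.

```python
from collections import Counter


def entity_prf(predicted, gold):
    """Micro precision, recall, and F1 over (text, label) entity pairs.

    Matching is exact on label and case-insensitive on text; this criterion
    is an assumption made for illustration only.
    """
    pred_counts = Counter((text.lower(), label) for text, label in predicted)
    gold_counts = Counter((text.lower(), label) for text, label in gold)

    # Multiset intersection counts each gold entity as matched at most once.
    true_positives = sum((pred_counts & gold_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())

    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations for one episode (not data from the study).
gold_entities = [
    ("Stamp Act", "LAW"),
    ("Sons of Liberty", "ORG"),
    ("Boston Tea Party", "EVENT"),
]
# Hypothetical output from a raw ASR transcript: one mislabel, one miss.
raw_entities = [
    ("Stamp Act", "LAW"),
    ("Sons of Liberty", "PERSON"),
]

precision, recall, f1 = entity_prf(raw_entities, gold_entities)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function on entities extracted from the raw and the LLM-corrected transcripts of an episode would give a like-for-like comparison against its gold annotations.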