Paper Summary

Automatically Measuring Features of Teacher Discourse From Classroom Audio

Sat, April 18, 8:15 to 9:45am, Virtual Room

Abstract

Purpose and Theoretical Framework
We address a critical gap: teachers receive little quantitative, actionable feedback about the quality of their talk. We provide an approach for the automatic analysis of teacher discourse. Existing automated approaches have focused on measurement across coarse-grained windows, often an entire class session (Author, Wang et al., 2014; Ramakrishnan et al., 2019). Our Cyber-Enabled Teacher Analytics project is based on the hypothesis that fine-grained, utterance-level information is needed to provide actionable feedback that helps teachers improve their practice. Our emphasis is on question asking and the use of disciplinary language, drawing on theoretical frameworks of dialogic instruction in ELA classrooms by Nystrand and Gamoran (1997), teacher talk in ELA by Juzwik et al. (2013), and student engagement by Shernoff et al. (2003).

Data Sources
Our dataset consists of 167 audio recordings of class sessions in which teachers wore a wireless Samson AirLine 77 headset microphone: 127 observations from 16 teachers in Pennsylvania and 40 from 11 teachers in Wisconsin. We used the Watson speech recognizer to automatically transcribe the audio. The average word accuracy of the transcriptions was 60.2% across all utterances and 75.4% for utterances with more than two words.
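For readers unfamiliar with the metric, a minimal sketch of how transcription word accuracy can be computed is shown below, assuming word accuracy is one minus the word error rate obtained by aligning the ASR output with a reference transcript. The open-source jiwer package and the example strings are illustrative only, not our actual pipeline or data.

```python
# Minimal sketch: estimating ASR word accuracy from a reference transcript and
# the recognizer's hypothesis. Assumes word accuracy = 1 - word error rate (WER);
# the exact metric used in the project may differ.
import jiwer

reference = "what is the theme of the story"    # human transcript (illustrative)
hypothesis = "what is the team of the story"    # ASR output (illustrative)

wer = jiwer.wer(reference, hypothesis)          # (substitutions + insertions + deletions) / reference words
word_accuracy = 1.0 - wer
print(f"Word accuracy: {word_accuracy:.1%}")
```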

Two trained observers (reliability = 0.81 via Gwet's [2008] AC) coded a contiguous chunk of 200 utterances per observation (24,755 utterances total). Here, we focus on discriminating questions from statements and, for each, on whether the utterance is instructional or not and whether it is disciplinary or not. Instructional questions/statements pertain to the lesson and learning goals (e.g., "today we will read Moby Dick"), whereas disciplinary questions/statements focus on specific content or disciplinary practices (e.g., "what is the theme of the story?"). A sketch of the reliability coefficient follows this paragraph.
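For illustration, and assuming Gwet's AC here refers to the AC1 coefficient for two coders and a binary code, the sketch below shows how the statistic can be computed. The data are hypothetical; the reported 0.81 was computed on the actual coded corpus.

```python
# Minimal sketch of Gwet's AC1 for two coders and a binary code
# (e.g., 1 = question, 0 = statement). Toy data, not the project's corpus.
def gwet_ac1(coder_a, coder_b):
    n = len(coder_a)
    # Observed agreement: proportion of utterances both coders labeled the same.
    pa = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement based on the average prevalence of the positive category.
    pi = (sum(coder_a) + sum(coder_b)) / (2 * n)
    pe = 2 * pi * (1 - pi)
    return (pa - pe) / (1 - pe)

a = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical codes from observer A
b = [1, 0, 1, 0, 0, 0, 1, 0]   # hypothetical codes from observer B
print(round(gwet_ac1(a, b), 2))
```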

Method
Our goal was to automatically code utterances for the above features of talk as scored by the human coders. To do this, we extracted linguistic information (e.g., word types), n-grams (i.e., words/phrases), turn taking (e.g., pauses between utterances), and acoustics (e.g., loudness, pitch) from the utterances and submitted them to machine learning methods (Random Forest Classifiers) that learn how to identify each talk feature. For example, one classifier learned how to detect questions, another to detect disciplinary questions, and so on. Importantly, the classifiers were trained to generalize to new teachers rather than overfitting to the data (Author, 2018 provides details on the general method).
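As an illustration of this general setup (not a verbatim description of our pipeline), the sketch below trains a random forest on utterance-level features and evaluates it with teacher-level cross-validation folds, so accuracy is always measured on teachers held out from training. The feature values and teacher IDs are synthetic; the actual feature extraction and model tuning are described in Author (2018).

```python
# Minimal sketch: a random forest classifier for one talk feature (e.g.,
# question vs. statement), validated on held-out teachers via GroupKFold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((400, 20))            # e.g., n-gram, turn-taking, and acoustic features (synthetic)
y = rng.integers(0, 2, 400)          # human code: 1 = question, 0 = statement (synthetic)
teachers = rng.integers(0, 10, 400)  # teacher ID for each utterance (synthetic)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
cv = GroupKFold(n_splits=5)          # each fold holds out entire teachers
scores = cross_val_score(clf, X, y, groups=teachers, cv=cv, scoring="accuracy")
print(f"Held-out-teacher agreement: {scores.mean():.1%}")
```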

Results
We computed agreement between the computer's estimates and the human codes as a measure of accuracy. Overall, agreement was moderate, with the computer and human codes aligning 71% to 77% of the time: 77% for questions vs. statements, 74% for instructional questions, 76% for disciplinary questions, 71% for instructional statements, and 71% for disciplinary (content-specific) statements.

Significance
Our results confirm the feasibility of fully automated, utterance-level analysis of classroom discourse despite the noisy nature of real-world classrooms. Future work will extend our approach to utterance-level modeling of dialogic talk features, such as open-ended questions, uptake, and classroom discussion, and to feedback tools for teachers.
