Research in recent years has highlighted the interplay of cognition and affect in tutorial interaction (Grafsgaard et al., 2011; John & Woolf, 2006). This interplay has implications for the design of intelligent tutoring systems that aim to match or exceed the effectiveness of expert human tutors (Elliot & McGregor, 2001; Elliot & Pekrun, 2007; Ortony et al., 1990). Recent results indicate that meeting this goal requires understanding both the cognitive and affective nature of tutorial interaction. Prior investigations of facial expression in tutoring have identified links between particular facial movements and cognitive-affective states relevant to learning (D’Mello & Graesser, 2010).
Over the past five years we have constructed and analyzed hidden Markov models of task-oriented textual tutorial dialogue annotated with dialogue acts and with facial expressions coded from video. Facial movement combinations were annotated in a novel three-phase protocol to provide a rich affective representation of tutorial dialogue. Few studies have applied hidden Markov models (HMMs) to affect in the context of learning. A recent study of human-human tutoring that we conducted, which modeled student brow lowering (an indicator of confusion) with HMMs, provided both a predictive model and an analysis of confusion within the tutorial interaction. The work presented here builds on these prior findings by leveraging sixteen facial movements (including brow lowering) in a purely descriptive model built without additional constraints, resulting in a richer representation of affect.

A corpus of human-human tutorial dialogue was collected during a tutorial dialogue study in which students solved an introductory computer programming problem while engaging in computer-mediated textual dialogue with a human tutor. The corpus consists of 48 dialogues annotated with dialogue acts. Student facial video was recorded for later analysis. Seven of the highest-quality facial videos were selected for the present analysis, chosen for the extent to which the student’s entire face remained visible throughout the recording and for a near-even split across genders and tutors; these videos were annotated with facial expressions. Tutoring sessions ranged in duration from thirty minutes to over an hour.
The facial expression and dialogue data were merged into the sequences of observations needed to build a hidden Markov model. Each observation consisted of a facial expression (encoded as facial action units, or AUs), a dialogue act, or both. The Baum-Welch algorithm with a log-likelihood measure was used for model training, and ten random initializations were performed to reduce the chance of converging to a local maximum. A hyperparameter optimization outer loop produced candidate HMMs ranging from three to twenty-two hidden states. The average log-likelihood was computed across the candidate HMMs for each number of hidden states; the models with the best average log-likelihood had ten hidden states, and the best-fit model was the one with the highest log-likelihood among these.
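The sketch below illustrates one way the training and model-selection procedure described above could be implemented. It is a minimal sketch, assuming the observation sequences have already been integer-encoded (one symbol per AU/dialogue-act observation) and that hmmlearn's discrete CategoricalHMM is used; the names select_hmm and encoded_sessions are illustrative, not taken from the study.

import numpy as np
from hmmlearn import hmm

def select_hmm(encoded_sessions, min_states=3, max_states=22, n_restarts=10):
    """Train discrete HMMs with Baum-Welch (EM) and select the number of
    hidden states with the best average log-likelihood over random restarts."""
    X = np.concatenate(encoded_sessions).reshape(-1, 1)   # stacked integer symbols
    lengths = [len(s) for s in encoded_sessions]          # one length per session

    best_model, best_avg = None, -np.inf
    for n_states in range(min_states, max_states + 1):
        scores, models = [], []
        for seed in range(n_restarts):                    # ten random initializations
            model = hmm.CategoricalHMM(n_components=n_states,
                                       n_iter=100, random_state=seed)
            model.fit(X, lengths)                         # Baum-Welch training
            scores.append(model.score(X, lengths))        # log-likelihood of the data
            models.append(model)
        if np.mean(scores) > best_avg:                    # best average log-likelihood
            best_avg = np.mean(scores)
            best_model = models[int(np.argmax(scores))]   # best-fit model at that size
    return best_model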
The descriptive HMM learned from facial expression and task-oriented tutorial dialogue revealed five frequently occurring patterns of affective tutorial interaction. Each pattern modeled distinct and interpretable segments of the tutoring sessions. A closer inspection of the hidden state sequences as they unfolded within sessions revealed notable differences between sessions.
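As a hedged illustration of how such per-session hidden-state sequences might be inspected, the following snippet applies Viterbi decoding with the selected model; best_model and encoded_sessions are the illustrative names from the sketch above.

# Decode each session's observations into its most likely hidden-state path,
# then report per-state occupancy counts for side-by-side comparison.
for i, session in enumerate(encoded_sessions):
    states = best_model.predict(np.asarray(session).reshape(-1, 1))
    print(f"session {i}: state counts =",
          np.bincount(states, minlength=best_model.n_components))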