Introduction: Early childhood education is a critical setting for child development, and classroom observations are a key tool for improving instructional quality (Bronfenbrenner & Morris, 2007; Phillips et al., 2016). As policymakers, educators, and researchers seek more scalable and cost-effective ways to assess and support quality, video-based observation has emerged as a promising approach (Kane et al., 2020). Collecting observational data via video offers potential cost savings and new opportunities for professional learning, but open questions remain about whether video scores are sufficiently reliable and valid for high-stakes uses (Brunvand, 2010; Kane et al., 2020). This study provides new evidence on the reliability and score comparability of live and video-based observations across the two most commonly used classroom observation measures in early childhood education.
Research Questions: We address the following research questions using two observation tools commonly used in early childhood coaching and accountability systems: the Classroom Assessment Scoring System Pre-K-3rd (CLASS, 2nd edition; Pianta & Hamre, 2022) and the Early Childhood Environment Rating Scale (ECERS-3; Harms et al., 2014):
- Do observations of pre-K classroom quality demonstrate inter-rater reliability (IRR) when conducted over video?
- Do live and video scores vary systematically within an observation?
- Do differences between live and video scores vary by classroom quality level?
Methods: Certified observers conducted 100 CLASS and 56 ECERS-3 observations in 59 public pre-K and community-based childcare center classrooms across five states. Simultaneously with each live observation, the classroom was recorded using microphones worn by teachers and a stationary recording device that rotated to follow the lead teacher's movements; the resulting video was later coded by a different certified observer. To address the research questions, we first calculated IRR for video observations using percent-within-one agreement. Second, we estimated multivariate regression models with observation-level fixed effects to identify whether scores varied between live and video coding of the same observation, controlling for the observer and observer drift (i.e., days between when the observer was certified and when they coded the observation). Third, we re-estimated the models with interaction terms to test for moderation by the quality of the observation, operationalized as the unadjusted live observation score.
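To make the first step concrete: percent-within-one agreement counts a pair of live and video scores as agreeing whenever they fall within one scale point of each other. A minimal sketch in Python, using illustrative scores rather than study data:

```python
import numpy as np

def pct_within_one(a, b):
    """Share of paired scores that differ by no more than one scale point."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.mean(np.abs(a - b) <= 1.0)

# Hypothetical dimension scores from one observation (CLASS 1-7 scale).
live  = [5, 6, 4, 5, 7, 3]
video = [5, 5, 4, 6, 5, 3]
print(f"{pct_within_one(live, video):.0%} within one")  # -> 83% within one
```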
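The second and third analytic steps can be sketched in the same spirit. The snippet below uses statsmodels with toy data; the column names (obs_id, video, drift_days, live_quality) are hypothetical stand-ins for the study's variables, and this is an illustration of the modeling approach, not the authors' code:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: two rows per observation, one live and one
# video score, coded by different certified observers (as in the study design).
df = pd.DataFrame({
    "obs_id":      [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "observer_id": list("abbacbaccabc"),
    "video":       [0, 1] * 6,
    "drift_days":  [10, 30, 45, 12, 60, 5, 21, 33, 14, 8, 27, 19],
    "score":       [5.5, 5.2, 4.0, 4.3, 6.1, 5.4, 3.2, 3.5, 5.0, 4.8, 2.4, 2.9],
})

# Step 2: observation fixed effects (C(obs_id)) absorb everything constant
# within an observation, so `video` estimates the within-observation
# live-to-video score gap, net of observer identity and observer drift.
fe = smf.ols("score ~ video + drift_days + C(observer_id) + C(obs_id)",
             data=df).fit()
print(fe.params["video"])

# Step 3: moderation by quality, operationalized as the unadjusted live score.
# Its main effect is constant within an observation and thus absorbed by the
# fixed effects; only its interaction with the video indicator is estimable.
df = df.merge(df.loc[df.video == 0, ["obs_id", "score"]]
                .rename(columns={"score": "live_quality"}), on="obs_id")
mod = smf.ols("score ~ video + video:live_quality + drift_days"
              " + C(observer_id) + C(obs_id)", data=df).fit()
print(mod.params["video:live_quality"])
```

In a specification like this, a negative interaction coefficient would indicate that video scores fall increasingly below live scores as live-rated quality rises, which is the pattern the moderation test is designed to detect.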
Results: Both CLASS and ECERS-3 video observations exceeded the 80 percent-within-one agreement standard used to certify observers. Within an observation, most live and video scores did not differ significantly, though video scores were 0.15 SD lower than live scores for CLASS Emotional Support. However, across both CLASS and ECERS-3, significant interactions indicated that video scores were lower than live scores in the highest-scoring observations but higher than live scores in the lowest-scoring observations. This may reflect the difficulty of capturing on video the behaviors that yield extremely high or low scores.
Implications: The presentation will address implications for observations used in coaching and accountability systems. Findings generally support the use of video observations for coaching, given their strong reliability, but are less promising for accountability uses, since scores may vary systematically for the classrooms facing the highest stakes (e.g., the lowest-scoring classrooms).