Individual Submission Summary

Estimating Treatment Heterogeneity in Early Childhood Contexts: Lessons Learned and Implications for Study Design

Sat, March 23, 4:15 to 5:45pm, Hilton Baltimore, Floor: Level 2, Key 4

Integrative Statement

Research in education has seen increasing use of multi-site randomized controlled trials (RCTs). Multi-site RCTs are particularly common in studies of early childhood education (ECE) because each center serves only a small number of children and teachers, so many centers are needed to achieve adequate statistical power to detect effects. In addition, there is growing interest in whether treatment impacts vary across sites, given the variation in structure, quality, and populations served across ECE centers (e.g., Bloom & Weiland, 2015; Walters, 2014).
However, designing studies to detect treatment heterogeneity presents a challenge: multi-site RCTs are typically designed to detect average treatment impacts, and such designs may not be optimal for detecting impact variation (Bloom & Spybrook, 2017). This study conducts a series of power calculations within a large-scale, multi-site evaluation of an ECE professional development program to explore the trade-offs in sample size and design under three scenarios: (1) powering for average treatment effects with retrospective exploration of impact variation; (2) powering for treatment impact variation with retrospective exploration of average effects; and (3) powering for both average treatment effects and impact variation.
Data come from the National Center for Research on Early Childhood Education (NCRECE) Professional Development Study, in which 427 teachers in 238 preschool centers were randomly assigned to a 14-week course. Classroom quality was measured using the Classroom Assessment Scoring System (CLASS™; Pianta, La Paro, & Hamre, 2008). An earlier analysis indicated practical challenges in estimating fixed-intercept, random-coefficient (FIRC) models to examine variation in the impact of the NCRECE course on classroom quality.
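To make the modeling approach concrete, the sketch below shows one common way to specify a fixed-intercept, random-coefficient model in Python using simulated data. The design parameters, variable names, and use of statsmodels are illustrative assumptions, not the study's actual code or estimates.

```python
# Minimal FIRC-style sketch with simulated data (illustrative only; not the
# NCRECE analysis). Fixed effects for each center absorb center differences in
# mean quality; the treatment coefficient varies randomly across centers.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_centers, teachers_per_center = 60, 4                     # hypothetical design
center = np.repeat(np.arange(n_centers), teachers_per_center)
treat = np.tile([0, 1], n_centers * teachers_per_center // 2)
center_effect = rng.normal(0.0, 0.3, n_centers)[center]   # center intercepts
impact = rng.normal(0.25, 0.20, n_centers)[center]        # center-specific impacts
quality = center_effect + impact * treat + rng.normal(0.0, 1.0, center.size)
df = pd.DataFrame({"quality": quality, "treat": treat, "center": center})

# Fixed center intercepts plus a random treatment slope by center
# (no random intercept), i.e., a FIRC specification.
model = smf.mixedlm("quality ~ C(center) + treat", data=df,
                    groups=df["center"], re_formula="0 + treat")
result = model.fit()
print(result.summary())   # average impact ("treat") and cross-center slope variance
```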
We conduct a series of power calculations based on Bloom and Spybrook (2017) to examine design considerations when powering studies to detect treatment heterogeneity. First, we compare power to detect average treatment impacts in the NCRECE study with the anticipated and actual power to detect impact variation given the study design and data limitations. Next, we consider studies of similar scale designed primarily to estimate treatment heterogeneity, and calculate the number of centers required for a fixed total number of teachers. Finally, additional analyses will consider the number of centers and teachers per center required to power for both treatment heterogeneity and average treatment effects.
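As a concrete illustration of the first type of calculation, the sketch below computes the minimum detectable effect size (MDES) for the cross-site average impact using the standard multiplier approach from this literature. It assumes a standardized outcome, balanced assignment within centers, and no covariates; the parameter values are hypothetical placeholders rather than NCRECE estimates.

```python
# Illustrative MDES sketch for the average impact in a multi-site trial with
# teachers randomized within centers (hypothetical parameters, not study values).
import numpy as np
from scipy import stats

def mdes_average_impact(J, n, P=0.5, tau2=0.05, sigma2=1.0,
                        alpha=0.05, power=0.80):
    """MDES (in effect size units) for the cross-site average impact.

    J      : number of centers (sites)
    n      : teachers randomized per center
    P      : proportion of teachers assigned to treatment within a center
    tau2   : cross-site variance of treatment impacts (effect size units)
    sigma2 : within-center residual variance of the standardized outcome
    """
    df = J - 1                                     # df for the site-level test
    multiplier = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
    se = np.sqrt(tau2 / J + sigma2 / (J * n * P * (1 - P)))
    return multiplier * se

# Example: 238 centers with ~2 teachers each (roughly the NCRECE scale)
print(round(mdes_average_impact(J=238, n=2), 3))
```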
Preliminary results indicate that the small number of teachers per center in the NCRECE study, compounded by high rates of attrition, limited our power to detect treatment heterogeneity. However, findings indicate that a study with a similar total number of teachers, spread across a smaller number of centers, could be powered to detect moderate treatment heterogeneity. Holding the total number of teachers fixed, increasing the number of centers (and thus reducing the number of teachers per center) reduces power to detect treatment heterogeneity; this reduction is greater when attrition is substantial.
Findings also illustrate the trade-off between power to detect average treatment impacts and power to detect treatment heterogeneity: designs powered to detect smaller amounts of treatment heterogeneity (i.e., with teachers concentrated in fewer centers) have larger minimum detectable effect sizes for the average impact. Additional analyses will map the frontier of optimal designs that maximize power to detect both treatment heterogeneity and overall average treatment effects under fixed resource constraints.
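The sketch below illustrates this trade-off under hypothetical parameter values: holding the total number of teachers roughly fixed at the NCRECE scale, it sweeps the number of centers and compares the MDES for the average impact with simulated power to detect cross-site impact variation. The variation test here is a simple chi-square (Q) heterogeneity test on center-level impact estimates, used only as a stand-in for the model-based approach described above; none of the printed numbers are study results.

```python
# Hypothetical trade-off sweep (illustration only, not study findings): a fixed
# total number of teachers spread across more or fewer centers, comparing MDES
# for the average impact with simulated power of a Q-type variation test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def mdes_average(J, n, tau2=0.05, sigma2=1.0, alpha=0.05, power=0.80):
    """MDES for the cross-site average impact (balanced assignment, no covariates)."""
    df = J - 1
    M = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
    return M * np.sqrt(tau2 / J + sigma2 / (J * n * 0.25))

def power_variation(J, n, tau=0.25, sigma=1.0, alpha=0.05, reps=2000):
    """Monte Carlo power of a chi-square (Q) test for cross-site impact variance."""
    v = 4 * sigma**2 / n              # sampling variance of each center's impact estimate
    crit = stats.chi2.ppf(1 - alpha, df=J - 1)
    hits = 0
    for _ in range(reps):
        # Simulate center-level impact estimates directly (normal approximation)
        est = rng.normal(0.25, tau, J) + rng.normal(0.0, np.sqrt(v), J)
        q = np.sum((est - est.mean()) ** 2) / v
        hits += q > crit
    return hits / reps

total_teachers = 420
for J in (30, 60, 105, 210):
    n = total_teachers // J           # teachers per center
    print(f"J={J:>3} centers, n={n:>2}/center: "
          f"MDES={mdes_average(J, n):.3f}, variation power={power_variation(J, n):.2f}")
```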
