Coding is a common context for developing computational thinking (CT), and young children can learn to program tangible coding toys before they can read and write (e.g., Relkin et al., 2020). Yet, despite the growing body of research on CT and coding with tangible coding toys, there is still no agreed-upon definition of CT that is developmentally appropriate for early childhood, nor is there agreement on how to assess it. As part of a larger study, we operationalized what CT looks like in early childhood and the kinds of knowledge children use as they engage in CT tasks with tangible coding toys. We then designed a performance assessment, aligned to our model, that allows young children to demonstrate their understanding of CT without relying on reading or writing. In previous work, we described the design of the Computational and Spatial Thinking (CaST) assessment (Authors, 2021a). The purpose of the present study is to examine the validity and reliability of CaST with a sample of 272 children aged 4-8 (138 girls). Our work was guided by two research questions: 1) Is the CaST assessment a reliable and valid measure of CT for children aged 4-8? 2) Does the CaST assessment function equivalently for children regardless of gender and age?
Participants were recruited from five schools in the Intermountain West. In accordance with our IRB protocol, guardians provided consent and children provided assent. CaST was administered in a one-to-one interview format in a quiet area of the school, video-recorded (~16.4 minutes per child), and scored by two independent raters (κ = .91).
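For readers who want the computation behind the agreement statistic, Cohen's κ can be obtained directly from the two raters' score vectors. A minimal sketch using scikit-learn's cohen_kappa_score; the rater vectors below are hypothetical examples, not our data:

```python
# Minimal sketch: inter-rater agreement (Cohen's kappa) for two raters'
# binary item scores. The vectors here are hypothetical examples.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 1, 0]  # rater A's 0/1 scores per response
rater_b = [1, 0, 1, 1, 1, 1, 1, 0]  # rater B's scores for the same responses

print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
```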
To answer RQ1, we used item response theory (IRT). We checked unidimensionality, local dependence, and functional form, and then confirmed acceptable model and item fit across the 19 items. The two-parameter logistic (2PL) IRT model was the best fit. Item discrimination was high overall (M = 2.26, SD = 1.12), whereas item difficulty varied with item features and design, ranging from -2.40 to .92 (M = -.21, SD = .86). For reliability, we computed Cronbach's alpha and McDonald's omega and found high internal consistency (α = .91) and saturation (ω = .92). We also estimated IRT marginal reliability and found that CaST has high marginal reliability (rxx = .87) and measures children with medium to high CT ability most precisely.
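For reference, the 2PL model expresses the probability that child $j$ answers item $i$ correctly as a logistic function of the child's latent ability $\theta_j$, the item's discrimination $a_i$, and its difficulty $b_i$:

$$P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp[-a_i(\theta_j - b_i)]}$$

Cronbach's alpha can likewise be computed directly from the raw score matrix. A minimal sketch, assuming a NumPy array of binary item scores with one row per child (the function name and data layout are ours, not taken from the study):

```python
import numpy as np

def cronbachs_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (children x items) matrix of 0/1 item scores."""
    k = scores.shape[1]                          # number of items (19 in CaST)
    item_vars = scores.var(axis=0, ddof=1)       # per-item score variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of children's total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```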
To answer RQ2, we conducted differential item functioning (DIF) analyses and found no evidence of item bias based on gender or age, indicating that differences in item performance reflect true differences in ability rather than bias embedded in item design.
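The abstract does not specify the DIF procedure, but a widely used approach is logistic-regression DIF: regress each item's responses on the matching criterion (e.g., total score or estimated ability), a group indicator, and their interaction, and flag uniform or non-uniform DIF when the group or interaction term is significant. A minimal sketch under that assumption, using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item: np.ndarray, total: np.ndarray, group: np.ndarray) -> dict:
    """Logistic-regression DIF screen for one binary item.

    item:  0/1 responses to the item under study
    total: matching criterion (e.g., rest score or ability estimate)
    group: 0/1 indicator (e.g., gender group)
    """
    X = sm.add_constant(np.column_stack([total, group, total * group]))
    fit = sm.Logit(item, X).fit(disp=False)
    # Column order after add_constant: intercept, total, group, interaction.
    return {"uniform_dif_p": fit.pvalues[2],      # group term
            "nonuniform_dif_p": fit.pvalues[3]}   # interaction term
```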
Given that CT is a multi-faceted construct, a variety of validated assessments is needed. The present study contributes evidence that performance assessments of CT can offer valid and reliable ways to measure and understand what children know in a format other than multiple choice, one that does not rely on their reading, writing, or language skills.