Objective and Motivation
Single-word recognition is a key indicator for identifying dyslexia risk and reading difficulties (Wagner et al., 2022). Traditional screening methods—such as DIBELS, Woodcock-Johnson, and TOWRE—require a professionally trained proctor to administer oral word reading one-on-one with students. In our first lab-based study (Blinded for review), we transitioned this approach to a fully browser-based, automated lexical decision task. This innovation opened the potential for large-scale implementation in classrooms but also introduced new challenges and concerns voiced by school partners. In this talk, we present three improvement efforts—each sparked by practical concerns from educators, developed through school-based experiments and psychometric validation, and returned to classrooms as a more efficient, more accessible assessment tool.
Challenge 1: Can we make ROAR-Word more efficient? (Blinded for review)
Initially, ROAR-Word presented items in random order and took students about 20 minutes to complete. Through close research-practice partnerships, we developed ROAR-CAT, a shorter adaptive version (3–5 minutes) using item response theory and computerized adaptive testing. This improvement was (1) informed by a large-scale analysis showing that item difficulties were highly stable across age, socioeconomic status, and learning differences (N = 1,960) and (2) validated in two follow-up studies with students from public schools and specialized independent schools (N = 1,214). ROAR-CAT significantly reduced testing time while maintaining strong validity (r = .89) with screeners already used in schools.
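The abstract names item response theory and computerized adaptive testing but not the specific model, item-selection rule, or stopping criterion. The sketch below is a minimal illustration of how such a loop can work, assuming a Rasch (1PL) model, maximum-information item selection, and an EAP ability update; it is not the published ROAR-CAT procedure.

```python
# Illustrative CAT loop under a Rasch (1PL) model (assumed, not the
# published ROAR-CAT specification).
import numpy as np


def rasch_prob(theta, b):
    """P(correct response) given ability theta and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))


def item_information(theta, difficulties):
    """Fisher information of each item at the current ability estimate."""
    p = rasch_prob(theta, difficulties)
    return p * (1.0 - p)


def eap_estimate(responses, difficulties, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori ability estimate with a standard-normal prior."""
    posterior = np.exp(-0.5 * grid ** 2)          # unnormalized prior
    for correct, b in zip(responses, difficulties):
        p = rasch_prob(grid, b)
        posterior *= p if correct else (1.0 - p)
    posterior /= posterior.sum()
    return float(np.sum(grid * posterior))


def run_cat(item_bank, get_response, max_items=25):
    """item_bank: array of item difficulties; get_response(idx) -> bool."""
    theta = 0.0
    administered, responses = [], []
    available = np.ones(len(item_bank), dtype=bool)
    for _ in range(max_items):
        info = item_information(theta, item_bank)
        info[~available] = -np.inf                # never re-administer
        next_item = int(np.argmax(info))
        available[next_item] = False
        administered.append(item_bank[next_item])
        responses.append(get_response(next_item))
        theta = eap_estimate(responses, administered)
    return theta
```

The 25-item cap is a placeholder; an operational version would typically stop on a standard-error threshold or a fixed test length chosen during validation.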
Challenge 2: Is trial-by-trial feedback helpful or distracting for test engagement? (Blinded for review)
ROAR-Word originally included trial-by-trial auditory feedback to promote engagement. While some school partners raised concerns that this might disadvantage lower-performing students, disengagement is itself a critical threat to validity—especially in digital screening contexts with young learners. We addressed this through two large-scale experiments with 6,610 students (Grades 1–12) in Colombia and the U.S., randomly assigned to receive either informative feedback (correct/incorrect cues) or neutral feedback. The assessment was administered in two formats: adaptive English and random-order Spanish. Informative feedback led to higher compliance, reduced disengagement, shorter completion times, and improved concurrent validity—without compromising score interpretability. These results suggest that trial-by-trial feedback can enhance both the validity and efficiency of universal screening.
Challenge 3: Can we improve the testing experience for beginning and struggling readers while maintaining score sensitivity?
Floor effects are a common challenge in early literacy screening (Catts et al., 2009). In ROAR-Word, each word was originally presented for 350 ms—a design intended to assess reading automaticity. However, school partners raised concerns that the timing was too fast and potentially confusing for young or struggling readers. In collaboration with schools specializing in language-based learning differences, we tested varied presentation times with 435 students. We found that fast presentation preserves item difficulty, but including a set of easy words with unlimited presentation duration better differentiates struggling readers. Our solution is an adaptive format: all students begin with unlimited-time items, and only those who demonstrate reading ability transition to fast-timed trials. This approach improves the experience for beginning readers while maintaining score sensitivity for more advanced students.
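As a rough illustration of the two-stage format described above, the sketch below starts every student on untimed easy words and switches to fast (350 ms) trials only for students who pass a simple accuracy gate. The gate (6 of 8 correct), trial counts, and `run_trial` callback are hypothetical placeholders, not the cutoffs used in the study.

```python
# Two-stage presentation sketch: untimed easy items first, fast-timed items
# only after a (hypothetical) accuracy gate is passed.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

UNTIMED_MS: Optional[int] = None   # word stays on screen until a response
FAST_MS = 350                      # brief presentation probing automaticity


@dataclass
class Trial:
    word: str
    duration_ms: Optional[int]     # None means unlimited presentation time


def build_session(easy_words: List[str],
                  harder_words: List[str],
                  run_trial: Callable[[Trial], bool],
                  gate_correct: int = 6,
                  gate_total: int = 8) -> List[Tuple[Trial, bool]]:
    """run_trial(trial) -> True/False for a correct/incorrect response."""
    responses: List[Tuple[Trial, bool]] = []

    # Stage 1: untimed easy words for every student.
    stage1_correct = 0
    for word in easy_words[:gate_total]:
        trial = Trial(word, UNTIMED_MS)
        correct = run_trial(trial)
        responses.append((trial, correct))
        stage1_correct += int(correct)

    # Stage 2: fast-timed trials only for students who pass the gate;
    # others continue with untimed items to avoid floor effects.
    passed_gate = stage1_correct >= gate_correct
    duration = FAST_MS if passed_gate else UNTIMED_MS
    for word in harder_words:
        trial = Trial(word, duration)
        responses.append((trial, run_trial(trial)))

    return responses
```

Keeping struggling readers on untimed items preserves a usable score range at the low end, while the timed stage retains sensitivity to automaticity differences among stronger readers.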