The Secondary Instruments and Test group was tasked with identifying quantitative measures used with students in grades 7-12 and then describing existing validity evidence for those measures. To describe validity evidence, we drew on the definition of assessment validity put forward by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. In the Standards for Educational and Psychological Testing (AERA et al., 2014), these organizations define validity and describe six sources of evidence that should be addressed to some degree within a validation argument for an assessment: (a) test content, (b) response processes, (c) relationship to other variables, (d) internal structure, (e) consequences from testing and bias, and (f) reliability. Below we provide details of the instruments found, along with a brief overview of the validity evidence, followed by several notable findings from the review.
After we screened 1,747 articles for potential measures, 379 instruments were included in the final categorization framework. Of these 379 instruments, validity evidence was found for 223 (58.84%) across 301 different papers; for the remaining 156 instruments (41.16%), no validity evidence was found.
In total, 1,025 instances of evidence were identified in those 301 papers, of which 744 (72.59%) had an associated claim supported by the evidence and 281 (27.41%) did not. Disaggregated by type, the 1,025 pieces of evidence comprised 354 for test content, 268 for reliability, 165 for internal structure, 162 for relations to other variables, 53 for response processes, and 23 for consequences of testing. Further, 84 of the 301 papers (27.91%) included an interpretation statement for an associated measure and 149 (49.50%) included a use statement.
Several notable findings emerged from the data. First, the number of instruments without a name was alarming: of the 379 instruments identified for inclusion, 179 (47.23%) had no name associated with them. If the field is going to build cumulative knowledge about students' understanding of a construct, it is imperative that researchers name the instruments they develop. Similarly, only 42 (11.08%) of the instruments were large-scale; the remaining 337 (88.92%) were small-scale. Many of the instruments were used once and never again.
Most of the evidence concerned test content (34.54%) or reliability (26.14%), with very little evidence of response processes (5.17%) and even less of consequences of testing (2.24%). This highlights a gap in the validity arguments for many of these instruments. These findings have implications for future validation studies researchers might conduct on these and future secondary measures used with students in grades 7-12.
Erin E. Krupa, North Carolina State University
Katherine Burkett, North Carolina State University
Brianna Bentley, William Peace University
Cigdem Alagoz, University of the Virgin Islands
Daria Gerasimova, University of Kansas
Deborah M. La Torre, University of California, Los Angeles
Emily Toutkoushian, The American Board of Anesthesiology