The Secondary Instruments and Test group was tasked with identifying quantitative measures used with students in grades 7-12 and then describing existing validity evidence for those measures. To describe validity evidence, we drew on the definition of assessment validity put forward by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. In the Standards for Educational and Psychological Testing (AERA et al., 2014), these organizations define validity and describe six sources of evidence that should be addressed to some degree within a validation argument for an assessment: (a) test content, (b) response processes, (c) relationship to other variables, (d) internal structure, (e) consequences from testing and bias, and (f) reliability. Below we provide details of the instruments found, along with a brief overview of the validity evidence, followed by several notable findings from the review.
After we screened 1,747 articles for potential measures, 379 instruments were included in the final categorization framework. Of these 379 instruments, validity evidence was found for 223 (58.84%) across 301 different papers; for the remaining 156 instruments (41.16%), no validity evidence was found.
In total, 1,025 instances of evidence were identified in those 301 papers, of which 744 (72.59%) had an associated claim supported by the evidence and 281 (27.41%) did not. Disaggregated by type, the 1,025 pieces of evidence comprised 354 for test content, 268 for reliability, 165 for internal structure, 162 for relations to other variables, 53 for response processes, and 23 for consequences of testing. Further, 84 of the 301 papers (27.91%) included an interpretation statement for an associated measure and 149 (49.50%) included a use statement.
Several notable findings emerged from the data. First, the number of instruments without a name was alarming: of the 379 instruments identified for inclusion, 179 (47.23%) had no name associated with them. If the field is going to build cumulative knowledge about students' understanding of a construct, it is imperative that researchers name the instruments they develop. Similarly, only 42 (11.08%) of the instruments were large-scale; the remaining 337 (88.92%) were small-scale. Many of the instruments were used once and never again.
Most of the evidence concerned test content (34.54%) or reliability (26.14%), with very little evidence of response processes (5.17%) and even less of consequences of testing (2.24%). This highlights a gap in the validity arguments for many of these instruments. These findings have implications for future validation studies researchers might conduct on these and future secondary measures used with students in grades 7-12.
Erin E. Krupa, North Carolina State University
Katherine Burkett, North Carolina State University
Brianna Bentley, William Peace University
Cigdem Alagoz, University of the Virgin Islands
Daria Gerasimova, University of Kansas
Deborah M. La Torre, University of California, Los Angeles
Emily Toutkoushian, The American Board of Anesthesiology