Paper Summary

Rethinking Validity Evidence: Consequential Validity

Fri, April 12, 3:05 to 4:35pm, Philadelphia Marriott Downtown, Floor: Level 5, Salon C

Abstract

In assessment, the claims or interpretations we make about scores are substantiated by the evidence we collect to support them. The Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014) detail processes for defining the domain of interest and for interpreting assessment outcomes, and these processes have long been accepted. The guidance in the Standards is commonly framed in terms of several sources of validity evidence: test content, response processes, relations to other variables, and consequences (Cook & Lineberry, 2016; Kane, 1994; Messick, 1989; Randall et al., 2022). In professions education, most work has focused on test content and relations to other variables, with less research on the consequences of assessments (Cook & Lineberry, 2016; Randall et al., 2022).

The impact of assessments on various stakeholders provides the basis for accumulating evidence about assessment consequences. Cook and Lineberry (2016) proposed a framework for organizing consequences evidence that encourages viewing assessments as interventions. The dimensions they detailed include impact on stakeholders, impact of interpretations, intended and unintended uses, and benefits and harms associated with assessment claims and interpretations.

Randall et al. (2022) presented a framework of questions that examine consequences from the perspective of accumulating evidence to support justice-oriented interpretations. Building on the work of Kane (the interpretation/use argument) and Mislevy (sociocognitive validation), the authors identified specific questions that could be used to gather validity evidence. For consequences evidence, they posed both traditional and antiracist questions. The traditional question guiding evidence collection is whether items perform differentially across groups of test-takers. To gather evidence with social justice in mind, the authors encourage researchers to consider these dimensions: further marginalization of minoritized populations; the impact of structural racism on interpretations; which systems support “success” and whose systems define it; and who is advantaged by test administration.

These two frameworks will be used to critically examine validity evidence reported in Division I research presented at Annual Meetings. This critical framing can help set a research agenda for our Division and other assessment professionals.
