Individual Submission Summary
Caring about consequences: Considerations for designing large-scale assessments to monitor foundational learning (SDG 4.1.1a)

Mon, March 24, 2:45 to 4:00pm, Palmer House, Floor: 7th Floor, LaSalle 2

Proposal

Context:
“Are children learning?” is a central question for assessing nearly all education policies and initiatives. Unfortunately, the answer has remained largely unknown in many countries. Post-COVID projections suggest that 7 out of 10 children in low- and middle-income countries suffer from learning poverty: they cannot read and understand a simple text by age 10 (World Bank, 2022). Amid this need for large-scale data on foundational learning, the UNESCO Institute for Statistics, responsible for monitoring Sustainable Development Goal (SDG) 4 on equitable education, reported that data on learning outcomes at the grade 2/3 level are available for only 34 countries (UNESCO Institute for Statistics, 2024). The data gap is concentrated in low-income and lower-middle-income countries. Consequently, SDG 4.1.1a, the early primary grade learning indicator, was recently downgraded from a Tier I to a Tier II indicator; indicators without sufficient coverage risk being dropped if more data are not made available soon. This urgent need for countries, especially in the Global South, to design large-scale system-monitoring assessments motivates this paper.

Purpose:
Drawing on a case from India, this paper explores the critical role of validity evidence for large-scale assessments used in system monitoring, with an explicit focus on the consequences of testing. The study emphasizes the importance of developing a theory of action that clearly delineates both the direct and indirect mechanisms by which these assessments are expected to influence educational outcomes, and it argues that validity evidence must be systematically gathered and synthesized to support an assessment’s theory of action. Through an examination of the case, the Annual Status of Education Report (ASER) in India, and particularly three critical decisions: (1) where to conduct ASER assessments, (2) who will collect ASER assessment data, and (3) what methodology will be used to test students and report ASER results, the presentation illustrates how attending to the consequences included in the theory of action shapes design decisions and affects the effectiveness of large-scale assessment programs. The paper aims to offer insights and recommendations for education systems in the Global South as they develop large-scale assessments to monitor progress toward SDG 4 targets.

Methodology:
A literature review was conducted on validity evidence for test consequences and on theory of action for large-scale assessments. In addition, ASER reports; survey and assessment instruments; training and administration manuals; and technical and policy documents were reviewed.

Theoretical framework:
- Sources of validity evidence and consequences for large-scale assessments -
The current version of the Standards for Educational and Psychological Testing (American Educational Research Association [AERA] et al., 2014) stipulates five sources of evidence used to evaluate the validity of a proposed interpretation of test scores for a particular use. These sources are validity evidence based on (a) test content, (b) response processes, (c) internal structure, (d) relations to other variables, and (e) consequences of testing. While all these sources must be considered, evaluating consequences is integral to validating educational assessments for system monitoring purposes. Moreover, such an evaluation should include whether intended positive consequences are achieved and any potential unintended negative consequences are minimized (Lane, 2014).

- Theory of action -
Theory of action is a common notion in the program evaluation literature, yet it is seldom applied to assessment programs, presumably because assessments are not generally intended to cause change in individuals or institutions in the same sense as educational or social interventions (Bennett, 2010). However, large-scale assessments must attend to the chain of reasoning from testing to test scores to decisions and intended consequences (Sireci & Soto, 2016). In validity theory and its application, more attention has been directed at the technical quality of testing instruments than at their various effects. Yet in large-scale assessments for system monitoring, there is no bright line between technical concerns and practice or policy concerns (Haertel, 2013).

ASER in India: Theory of action, consequences, and design features:
ASER is a large-scale educational assessment and survey in India, started by the Indian nongovernmental organization Pratham in 2005. The thirteenth nationwide ASER, in 2022, reached 616 rural districts and assessed a representative sample of over 530,000 children across 19,060 villages (ASER Centre, 2023). ASER’s goals go beyond generating data on learning outcomes to stimulating a broad civic movement to improve them. ASER is meant to influence those who participate in the assessment (partner organizations, volunteers, and parents whose children are tested) as well as others who hold responsibility for children’s schooling and learning (government officials, teachers). Given these intended consequences, ASER differs from most large-scale assessments in crucial ways. It is conducted as a rapid household survey, ensuring the inclusion of out-of-school children and those in unrecognized schools. The oral, one-on-one assessment takes about 15 minutes per child and engages parents. ASER’s results are statistically representative at the national, state, and district levels, yet they are presented in a simple-to-understand format to facilitate public discussion and action at local levels. ASER is implemented independently through local organizations and volunteers as data collectors, advocating the public’s right to understand children’s learning (ASER Centre, 2015; Goodnight & Bobde, 2018; Goodnight, 2022).

Results:
Through an examination of the ASER case, this presentation demonstrates how a theory of action guides the design and implementation of assessments, supporting intended consequences and contributing to desired outcomes. Moreover, the study highlights the importance of systematically evaluating both intended and unintended consequences as an integral part of the validity argument. One significant negative consequence of large-scale assessments is the potential disengagement of key stakeholders, such as teachers and administrators. This disengagement can result from the technical inaccessibility of large-scale assessments, where overly complex designs and reporting methods may alienate those tasked with understanding and using the results. By anticipating this potential negative consequence in its design, ASER reduced the risk of stakeholder disengagement and enhanced public understanding of assessment results. The case findings provide insights and yield recommendations that may help other education systems in the Global South.

Authors