Paper Summary

Design of Next Generation Science Assessments: Measuring What Matters

Mon, April 20, 10:35am to 12:05pm, Virtual Room

Abstract

Objectives:
We present an overview of our systematic, scalable, and equity-focused approach to designing assessment items that measure student proficiency with new science learning goals that integrate disciplinary core ideas and crosscutting concepts with scientific practices. The assessment tasks are intended for formative use within classroom instruction.

Perspective and Theoretical Framework:
There is a tremendous need for such assessment design work, as assessment plays a central role in supporting implementation of the new directions in science education both in the U.S. and internationally (Author and others, 2014). Our approach to meeting this challenge uses principles of Evidence-Centered Design (ECD; e.g., Almond, Steinberg, & Mislevy, 2002). ECD emphasizes the evidentiary base for specifying coherent, logical relationships among the (1) learning goals that comprise the constructs to be measured (i.e., the claims we want to make about what students know and can do); (2) evidence in the form of observations, behaviors, or performances that should reveal the target constructs; and (3) features of tasks or situations that should elicit those behaviors or performances.

Methods:
We use ECD to systematically unpack science learning goals and synthesize the unpacking into multiple components that we call learning performances. Learning performances are knowledge-in-use statements that guide the development of assessment tasks and rubrics for measuring three-dimensional learning goals such as the performance expectations of the Next Generation Science Standards (NGSS). Figure 1 gives an overview of our design process, which involves three distinct phases: Domain Analysis, Domain Modeling, and Task and Rubric Development. While the figure depicts a linear process, the actual process is highly iterative and recursive (see Author and others, 2019).

Data sources:
We have used the design process to unpack multiple performance expectations (PEs) from the physical and life science disciplines and have created over 100 assessment tasks for use in middle school classrooms. The tasks are technology-enhanced (e.g., they use simulations, modeling software, and video), and many use non-textual representations to elicit student responses (e.g., through drawing or modeling).

Results:
Across a series of studies using multiple research methods, we have assembled data indicating that our tasks function as intended, minimize construct-irrelevant variance, and support teachers’ classroom practice. For example, classroom observations show that teachers use the assessments in a variety of modes, ranging from formative to summative use. Student cognitive lab studies provided data on both task comprehensibility and issues of equity and construct-irrelevant variance. Task performance studies have provided data on item features (e.g., difficulty) that affect student performance and on the utility of our rubric design in affording partial-credit scores based on the presence or absence of key knowledge targets in student responses.
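
To make the idea of presence/absence partial credit concrete, the following is a minimal sketch, not the authors' actual rubric. It assumes one point per knowledge target that a rater marks as present. The target names, the `RubricScore` type, and the `score_response` function are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    earned: int     # number of knowledge targets judged present
    possible: int   # total number of knowledge targets in the rubric
    missing: tuple  # names of targets not found in the response


def score_response(targets_present: dict) -> RubricScore:
    """Award one point per knowledge target that a rater marked present.

    Assumption: every target carries equal weight; a real rubric may not.
    """
    missing = tuple(name for name, present in targets_present.items() if not present)
    earned = len(targets_present) - len(missing)
    return RubricScore(earned=earned, possible=len(targets_present), missing=missing)


# Hypothetical rater judgments for one student response.
ratings = {
    "identifies_energy_transfer": True,
    "links_transfer_to_temperature_change": True,
    "uses_particle_model_as_evidence": False,
}
result = score_response(ratings)
print(f"{result.earned}/{result.possible}, missing: {result.missing}")
```

Under this assumed scheme, the score records which targets are missing as well as how many were met, which is the kind of information a teacher could act on in formative use.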

Scholarly significance:
Our work provides an example of one way to approach the development of high-quality assessments that elicit knowledge-in-use performance consistent with the NGSS. Having the right kinds of assessments is critically important because they guide what teachers and students attend to during instruction. High-quality assessments can help teachers implement new standards, help students learn more, and provide equitable opportunities for all students to develop their proficiencies within and across disciplines.

Authors