Paper Summary
Design and Validation of Instructionally Supportive Assessment: Examining Student Performance on Knowledge-in-Use Assessment Tasks

Tue, April 9, 10:25 to 11:55am, Fairmont Royal York Hotel, Floor: Mezzanine Level, Confederation 3

Abstract

To address the vision of the NGSS, instructionally supportive assessments are needed that evaluate students’ integrated, three-dimensional learning (NRC, 2014). Multi-dimensional, knowledge-in-use item formats allow students to demonstrate various aspects of proficiency (Authors, 2014). We will describe our method for developing three-dimensional assessments that can be used to support instruction (i.e., formatively) by indicating how students are making progress toward meeting a specific NGSS performance expectation. Further, we will report results from a validation study aimed at answering the question: Can model-based information about student and item performance be provided as warrants for the inferential validity of the assessment tasks, individually and collectively?

Developing NGSS assessments requires novel task designs, including technology-enhanced assessment tasks developed using a principled design approach such as evidence-centered design (ECD; Authors, 2018; Gorin & Mislevy, 2013; NRC, 2014). ECD links claims about student learning, the evidence from student work products needed to support those claims, and the design features of assessment tasks that elicit this evidence (Mislevy & Haertel, 2006). While ECD design specifications constitute one important source of validity evidence, they must be augmented with model-based evidence to ensure that teachers can make valid inferences about their students’ proficiency (Authors, 2016).

Using ECD, we developed three-dimensional, knowledge-in-use assessment tasks aligned with specific middle school performance expectations (Authors, 2014). A sample of students in their first year of learning with an NGSS-aligned curriculum completed assessment tasks on the topic of chemical reactions. The study was conducted in three school districts serving diverse student populations; 6th- and 7th-grade students were pseudo-randomly assigned a booklet of six assessment tasks and had approximately 60 minutes to complete them. Students’ constructed responses were scored by raters applying task-specific rubrics. Our scoring approach followed directly from our ECD framework, which is based on the focal knowledge, skills, and abilities hypothesized to underlie task performance (Authors, 2018).

We used the R ltm package (Rizopoulos, 2006) to fit a generalized partial credit model (GPCM). Figure 3 (left) displays category response functions for one assessment task, where each function represents the probability that an examinee with a given θ earns a total score of 0, 1, 2, or 3. These data show a desirable item response pattern: higher item scores are associated with progressively higher estimates of overall student proficiency. Figure 3 (right) displays the information function for each task, indicating the precision of proficiency estimation along the entire θ continuum. These functions show that the items vary in difficulty and are generally most precise at differentiating students with higher levels of proficiency.
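To make the category response functions concrete: under the GPCM, the probability of each score category is a logistic function of the cumulative sums of a(θ − b_j). The sketch below is a minimal Python illustration of that formula, not the authors' R/ltm code; the discrimination and step-difficulty values are purely hypothetical.

```python
import math

def gpcm_probs(theta, a, b):
    """Category response probabilities under the generalized partial
    credit model (GPCM).

    theta : examinee proficiency
    a     : item discrimination (illustrative value)
    b     : step difficulties b_1..b_K, giving K+1 score categories
    """
    # Cumulative sums of a*(theta - b_j); the score-0 category's sum is 0.
    cumsums = [0.0]
    for bj in b:
        cumsums.append(cumsums[-1] + a * (theta - bj))
    denom = sum(math.exp(c) for c in cumsums)
    return [math.exp(c) / denom for c in cumsums]

# Hypothetical item with scores 0-3 (a = 1, steps at -1, 0, 1):
# probabilities always sum to 1, and higher theta shifts probability
# mass toward the higher score categories.
p_low = gpcm_probs(0.0, 1.0, [-1.0, 0.0, 1.0])
p_high = gpcm_probs(2.0, 1.0, [-1.0, 0.0, 1.0])
```

Plotting `gpcm_probs` over a grid of θ values reproduces the shape of category response curves like those in Figure 3 (left): each curve peaks in the θ region where that score is the most likely outcome.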

NGSS implementation requires classroom assessments that provide actionable information to teachers about students’ three-dimensional learning. As such, assessments that allow for partial scores on different aspects of multi-dimensional, integrated performance are critical. These results provide evidence for inferential validity claims for our three-dimensional assessment tasks and associated rubrics.

Authors