1. Objectives/Purposes
We administered an “Anonymized” performance assessment (PA) of critical thinking (CT), “Migrants,” to college students in Switzerland, Colombia, and the US. We used G-Theory to bring empirical evidence to bear on some aspects of the cognitive comparability of the PA across countries. We asked: (a) Do students receive similar relative and absolute cognitive process scores in each country? (b) Do the person, outcome-measure, rater, and interaction variance components differ across countries, and if so, do the differences reflect the countries themselves or other sources (e.g., interactions)? (c) How reliable is the scoring within and across countries?
2. Perspective(s)/Theoretical Framework
CT is a multifaceted construct encompassing the process of conceptualizing, analyzing, synthesizing, evaluating, and applying information to solve problems, make decisions, find answers, or reach conclusions (Braun et al., 2020; Shavelson et al., 2019). Comparability is the extent to which: “... scores can be validly compared ... from measurements taken at different times, in different places, or using variations in assessment content and procedures. ...” (Berman et al., 2020, p. 14). In this study, we focus on one aspect of comparability, that of reliability (Shavelson & Webb, 2009). We use scores from think-aloud protocols that bear on cognitive validity (Braun et al., 2020; Ruiz-Primo et al., 2001; Smith, 2017).
3. Methods, Techniques/Modes of Inquiry
Students participated in the think-aloud process and answered follow-up questions. Their responses were recorded, and the recordings were transcribed and coded using the coding protocol designed by our research team. We use G-Theory and view a PA as a sample of student performance drawn from a complex universe defined by the combination of all possible raters and outcome measures and a fixed set of countries. We use the variance and covariance components for raters and outcome measures, and their combinations, to examine the comparability of scores, with countries forming a multivariate outcome vector. To address our research questions, we also estimate error variances for relative and absolute decisions, as well as generalizability (i.e., reliability) and dependability coefficients.
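The quantities named above can be illustrated with a minimal sketch. It assumes a much simpler design than the multivariate one used in the study: a single country, with persons fully crossed with raters and one score per cell. The function names `g_study_p_by_r` and `d_study` and the example scores are hypothetical and are not taken from the study's data or code.

```python
from statistics import mean


def g_study_p_by_r(scores):
    """Estimate variance components for a fully crossed persons x raters design.

    scores[p][r] is the score rater r gave person p (one score per cell).
    Components come from the ANOVA expected mean squares.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(row[r] for row in scores) for r in range(n_r)]

    ms_p = n_r * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in rater_means) / (n_r - 1)
    ss_res = sum(
        (scores[p][r] - person_means[p] - rater_means[r] + grand) ** 2
        for p in range(n_p)
        for r in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Negative estimates are set to zero, a common convention.
    return {
        "person": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": ms_res,  # person x rater interaction confounded with error
    }


def d_study(vc, n_raters):
    """Error variances and coefficients when averaging over n_raters raters."""
    rel_error = vc["residual"] / n_raters
    abs_error = (vc["rater"] + vc["residual"]) / n_raters
    return {
        "relative_error": rel_error,
        "absolute_error": abs_error,
        "generalizability": vc["person"] / (vc["person"] + rel_error),
        "dependability": vc["person"] / (vc["person"] + abs_error),
    }


# Hypothetical scores: 4 students (rows) rated by 2 raters (columns).
example_scores = [[3, 4], [2, 2], [5, 4], [1, 3]]
components = g_study_p_by_r(example_scores)
print(components)
print(d_study(components, n_raters=2))
```

In this sketch, a large rater component relative to the person component would indicate differences in rater leniency. These differences lower the dependability coefficient (absolute decisions) but not the generalizability coefficient (relative decisions).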
4. Data Sources, Evidence, Objects/Materials
We had performance scores for 34 undergraduate students from teacher education programs in Switzerland (10), Colombia (10), and the US (14) on overall CT and on the facets of analyzing/evaluating information, perspective-taking, and recognizing consequences.
5. Results and/or Substantiated Conclusions/Warrants for Arguments/Points of View
Preliminary results revealed differences in scorer leniency across countries. Scoring reliability within countries was high. Reliability is but one piece of evidence bearing on whether students' CT skills can be assessed with the PA in these countries.
6. Scientific/Scholarly Significance of the Study/Work
We aim to contribute part of the evidence needed to judge whether the ‘Migrant’ PA task can be administered in Switzerland, Colombia, and the US to support meaningful cross-country comparisons.