Paper Summary

Quantitative Measures Used by Researchers in Undergraduate Mathematics Education

Sun, April 27, 11:40am to 1:10pm MDT, The Colorado Convention Center, Floor: Meeting Room Level, Room 712

Abstract

In the area of Research in Undergraduate Mathematics Education (RUME), we reviewed 154 quantitative measures and their validity evidence. This corpus of measures resulted from a search of the 2000-2019 literature in journals likely to publish RUME articles. Recently, Artigue (2021) noted that qualitative methods remain the norm and criticized the "excessive predominance of small-scale qualitative studies, involving a very limited number of students or teachers" (p. 14). As RUME researchers, we have observed this trend as well.

As a field, RUME is at a place where many measures exist, and we are asking quantitative questions related to the generalizability of small-scale results (e.g., Author(s), 2019), determining the impact of instructional interventions (e.g., Johnson et al., 2020), and exploring relationships between different variables (e.g., Peters, 2013). Being able to answer such questions with confidence depends on the quality of the measures involved, which in turn depends on the strength of their validity arguments for the intended purposes. We explored: What quantitative measures were used in RUME from 2000 to 2019? What types of validity evidence are associated with these measures?


We organized our instruments into three overarching construct types: affective domain, cognitive domain, and learning environment. Most of the instruments measure the affective domain (45.5%) or the cognitive domain (43.5%); only about a tenth (9.7%) are designed to measure an undergraduate learning environment.


Through our search, we noted the sources of validity evidence presented by the RUME researchers (Table 4). On average, two forms of validity evidence (including reliability) were reported for each measure (M = 1.99, SD = 1.035). Only one measure, the Calculus Concept Readiness, provided all sources of validity and reliability evidence. Given that measures reported only about two sources of validity evidence on average, we identified the most prevalent sources. The most frequently reported form was test content (n = 94), followed by reliability (n = 86). Very few undergraduate mathematics measures reported validity evidence for consequences of testing (n = 4) or response processes (n = 18). This suggests that RUME researchers may need to become more aware of the different types of validity evidence, and it highlights the importance of reporting more than two strands of validity evidence for each measure.


While validity evidence for measures was generally scarce, there were notable exceptions within the set. In the presentation, we illustrate some robust validity arguments, in particular for the Conceptions of Mathematics Questionnaire (CMQ; Crawford et al., 1998), the Calculus Concept Readiness (Carlson et al., 2015), and the Graduate Student Instructor Observation Protocol (Rogers et al., 2020).
