Paper Summary

Using Automatic Item Generation to Transform Standardized Assessments From Paper-Based to Technologically Innovative Formats

Tue, April 17, 10:35am to 12:05pm, Sheraton Wall Centre, Floor: Third Level, South Pavilion Ballroom C

Abstract

Scholarly Significance. According to the National Research Council’s (2001) Knowing What Students Know, technology has the potential to transform educational assessments, including standardized tests, so that they better reflect what we know about how children learn and how they can best demonstrate their understanding. At a time when too many students feel disenfranchised in the educational system, technologically enhanced assessments may help energize students about learning. However, the widespread replacement of long-established, paper-based educational assessments with technologically enhanced ones has not yet occurred, likely because of practical constraints.

Objectives. The objective of the present paper is twofold: to review the conceptual basis of Automatic Item Generation (AIG), a new approach for developing technologically enhanced educational assessments, and to demonstrate its practical application through examples from the research literature. AIG focuses specifically on developing psychometrically defensible test items that not only measure traditional knowledge and skills but also enhance student engagement and positive affect. AIG reflects cutting-edge conceptual and empirical developments in the psychometric and educational testing community for developing technologically enhanced assessments (e.g., Bejar, 2011; Gierl & Lai, 2011). The paper also delineates the challenges of implementing AIG and reviews research studies that demonstrate how these challenges can be overcome.

Perspectives/Theoretical Framework. AIG provides the theoretical framework that structures the present paper. AIG grew out of advances in criterion-referenced testing in the 1960s (Drasgow, Luecht, & Bennett, 2006), when items began to be developed from standardized forms, or item templates, to measure homogeneous skills (Hively, Patterson, & Page, 1968). AIG also has roots in advances in instruction and learning during the 1980s, especially intelligent tutoring systems (Anderson, 1983; for a review, see Lajoie, 2000).

Modes of Inquiry and Data Sources. The paper includes a critical review of the empirical and theoretical research literature underlying the conceptual basis of AIG. This review covers: (1) the generation of items from cognitive and emotional-affective principles (strong theory) and from so-called “parent” items with known psychometric properties (weak theory), (2) the challenges of estimating reliability and validity when items are generated from strong and weak theories, (3) the links, by means of item templates, between AIG and both criterion-referenced testing and instruction and learning, and (4) practical examples of how AIG has been used to generate experimental items for large-scale testing programs (e.g., the Graduate Record Examinations) in specific academic domains such as mathematics and reading.

Conclusions. The paper concludes with a summary of the promise AIG holds for generating psychometrically defensible items for a variety of educational assessments. However, gaps remain in the research on AIG; in particular, more research is needed on generating items that promote student engagement and positive affect.
