Paper Summary

Exploiting Automated Scoring and Feedback to Support Effective Reading/Writing

Sun, April 15, 12:25 to 1:55pm, Sheraton Wall Centre, Floor: Grand Ballroom Level, North Grand Ballroom A

Abstract

Goals/Purposes. The NLP methods built into automated scoring technologies, together with newer technologies that capture behavioral data, offer the potential to contribute to formative feedback and to support an instructional technology environment, provided they are linked to specific construct traits rather than treated purely as predictors of overall essay score. This paper will provide an analysis of a wide range of existing features that contribute to predicting human assessments of writing quality and will examine their potential use in providing feedback to students and teachers within an integrated system that links formative assessment with digital reading and writing processes.

Perspectives/Theoretical Framework. A variety of natural language processing (NLP) techniques have been applied to the problem of automatically scoring student essays, including ETS’ e-rater scoring engine (Attali and Burstein, 2006). However, such methods focus more on matching human scores and achieving overall reliability than on analyzing how the features cover the construct. In ETS’ CBAL research initiative, which aims to develop a balanced assessment system that supports instruction effectively, significant emphasis has been placed upon employing Evidence-Centered Design (ECD) techniques (cf. Mislevy, Almond, and Lukas, 2003). We report on a line of research (cf. Deane, Quinlan, and Kostin, 2011) that seeks to develop a detailed mapping of NLP features as part of an evidence model within the CBAL framework.

Methods/Techniques. Natural language processing methods, including word-frequency and n-gram-based measures, parsing, and discourse analysis. Behavioral data collection methods, including keystroke logging. Statistical and psychometric analysis, including linear regression.
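To give a concrete sense of the lexical measures involved, the following is a minimal Python sketch of word-frequency and n-gram feature extraction. It is illustrative only: the function names, the frequency_rank lookup, and the choice of 4-grams for detecting borrowed language are assumptions for this sketch, not the operational CBAL feature set.

import re

def tokenize(text):
    # Crude lowercase word tokenizer; a production system would use a
    # proper NLP tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    # All contiguous n-token sequences in the essay.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_features(essay, frequency_rank):
    # frequency_rank: hypothetical map from word to its frequency rank in a
    # large reference corpus (rank 1 = most frequent).
    tokens = tokenize(essay)
    if not tokens:
        return {}
    ranks = [frequency_rank.get(t, len(frequency_rank) + 1) for t in tokens]
    tris = ngrams(tokens, 3)
    return {
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "mean_frequency_rank": sum(ranks) / len(ranks),
        "trigram_diversity": len(set(tris)) / len(tris) if tris else 0.0,
    }

def borrowed_proportion(essay_tokens, source_tokens, n=4):
    # One plausible way to estimate how much essay language is copied from
    # the source readings: the share of essay n-grams that also occur
    # verbatim in the sources.
    source_ngrams = set(ngrams(source_tokens, n))
    essay_ngrams = ngrams(essay_tokens, n)
    if not essay_ngrams:
        return 0.0
    return sum(g in source_ngrams for g in essay_ngrams) / len(essay_ngrams)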

Data Sources/Evidence. NLP features derived from analysis of student essays and of source reading materials. Timing features derived from analysis of interaction with computer delivery software, including keystroke logs. Human scores based upon rubrics using holistic scoring techniques. The responses analyzed were drawn from a large dataset of more than 1,000 essays for each of four writing prompts, together with reading materials provided to students to read online as part of the composition process. We will combine these data with a second set in which each student also took a reading test, in order to examine how automated scoring features for writing correlate with reading performance.
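As an illustration of how timing features might be derived from keystroke logs, the sketch below computes time on task, pausing, and burst measures from a simplified log represented as (timestamp_in_seconds, key) pairs; the log format and the 2-second pause threshold are assumptions for this sketch, not the actual logging instrumentation.

def timing_features(events, pause_threshold=2.0):
    # events: chronologically ordered (timestamp_seconds, key) pairs from a
    # hypothetical keystroke log.
    if len(events) < 2:
        return {}
    times = [t for t, _ in events]
    gaps = [b - a for a, b in zip(times, times[1:])]
    # A "burst" is a run of keystrokes uninterrupted by a long pause.
    bursts, current = [], 1
    for gap in gaps:
        if gap > pause_threshold:
            bursts.append(current)
            current = 1
        else:
            current += 1
    bursts.append(current)
    return {
        "time_on_task": times[-1] - times[0],
        "mean_interkey_interval": sum(gaps) / len(gaps),
        "mean_burst_length": sum(bursts) / len(bursts),
        "long_pause_count": sum(1 for g in gaps if g > pause_threshold),
    }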

Results. Various traits can be distinguished by aligning automated features with a competency model. For instance, on one 8th-grade prompt, keystroke timing and NLP features predict human essay scores in a linear regression (R² = .726). The contributing features include measures of productivity (amount produced, time on task, bursts of text production); lack of hesitancy; editing behaviors; proportion of original vs. borrowed language; clearly identifiable outline structure; word length and frequency; sentence variety; accuracy in mechanics and spelling; and lack of grammatical errors. Whatever one’s opinion of automated scoring in high-stakes assessments, such features align well with specific construct-relevant traits. We will discuss ways to deploy such features to provide dynamic formative feedback aligned with the writing process and will consider how an online system could be elaborated to provide direct support for teachers seeking to develop joint reading/writing strategies in their students.
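For readers who want the regression step spelled out, here is a minimal numpy sketch of fitting an ordinary least-squares model to a feature matrix and computing R². It shows the method in general form only: the reported R² = .726 comes from the authors’ full feature set, not from this toy code.

import numpy as np

def fit_and_score(X, y):
    # X: (n_essays, n_features) matrix of automated features;
    # y: vector of human holistic scores. Returns the model's R^2.
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

In practice, features like those sketched earlier (lexical measures, borrowed-language proportion, keystroke timing) would form the columns of X.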

Author