Automating Bias in Writing Evaluation: Sources, Barriers, and Recommendations

Sun, April 14, 9:35 to 11:05am, Pennsylvania Convention Center, Floor: Level 100, Room 119B

Abstract

Purpose
Writing assessment is a ubiquitous component of education, job seeking and promotion, and everyday communication. To serve these crucial activities, automated writing evaluation (AWE) technologies have been developed to facilitate and streamline assessment and feedback by providing faster, more consistent, and ostensibly less biased judgments than human evaluators.

The goal of this work is to provide a narrative review of human, computational, and institutional sources of bias in AWE and to discuss ways to alleviate those potential biases. This work is intended to propel future research toward more equitable development and use of AWE.

Theoretical framework
We consider scholarship on the phenomenon of standard language ideologies (Blommaert, 2005; Lippi-Green, 2012) that can give rise to bias in writing evaluation (Johnson & VanBrackle, 2012). Our work is also informed by scholarship on how biases manifest or are reinforced in automated systems, including (a) algorithmic bias (Buolamwini & Gebru, 2018; Eubanks, 2018; Noble, 2018; O'Neil, 2016), (b) biased training data (Hundt et al., 2020; Krishnamurthy, 2019), (c) problematic assumptions of linearity (Crossley et al., 2014), including aggregation of heterogeneous groups in training datasets (Bauer & Scheim, 2019; Foulds et al., 2020), and (d) the "black box" nature of algorithms, including properties such as lack of transparency (Blattner et al., 2021).

Methods
This work is a narrative review that synthesizes literature from fields spanning educational technology, algorithmic bias, standard language ideologies, and ethics.

Data
As a narrative review, this work did not collect or analyze empirical data.

Results
Biases and barriers we review include:
a) biased training data;
b) constrained analytical assumptions (e.g., aggregation of heterogeneous groups, assumptions of linearity); and
c) black box approaches, including lack of validation and lack of transparency.

These biases and barriers can be difficult to detect, and they create multiple pathways through which students may receive skewed feedback or evaluation arising from both human and machine biases.

We discuss several bias mitigation recommendations:
1) Intentionality in compiling inclusive and representative training datasets.
2) Rigorous training of human raters regarding standard language ideologies and anti-bias assessment practices.
3) Exploring training datasets for signs of bias (e.g., score discrepancies or disparities across subgroups).
4) Implementation of principled aggregation only after establishing equivalence between groups.
5) Inclusion of nonlinear relationships in analyses, allowing for complex associations between variables.
6) Extending validation procedures to assess accuracy within and across subgroups and populations (a minimal sketch follows this list).
7) Transparent and explainable reporting of algorithms, models, and underlying inferences.
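
To make recommendations 3 and 6 concrete, the sketch below illustrates one way to audit an AWE model's scores per subgroup. It is a minimal Python illustration, not a procedure drawn from the reviewed literature: the column names (human_score, machine_score), the subgroup variable, and the choice of quadratic weighted kappa (a common human-machine agreement metric in essay scoring) are assumptions made for the example.

import pandas as pd
from sklearn.metrics import cohen_kappa_score

def subgroup_validation(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Summarize human-machine score agreement within each subgroup.

    Assumes integer-scaled 'human_score' and 'machine_score' columns
    plus a subgroup label column (e.g., a hypothetical
    'language_background'). Low agreement or a large mean gap in one
    subgroup, relative to the others, signals possible bias.
    """
    rows = []
    for group, sub in df.groupby(group_col):
        # Quadratic weighted kappa: agreement measure that penalizes
        # large score discrepancies more heavily than near-misses.
        qwk = cohen_kappa_score(
            sub["human_score"], sub["machine_score"], weights="quadratic"
        )
        # Mean signed gap flags systematic over- or under-scoring.
        gap = (sub["machine_score"] - sub["human_score"]).mean()
        rows.append({"group": group, "n": len(sub), "qwk": qwk, "mean_gap": gap})
    return pd.DataFrame(rows)

# Hypothetical usage:
# report = subgroup_validation(scores_df, group_col="language_background")
# print(report)

Comparing the per-subgroup rows, rather than a single aggregate figure, is the point of recommendation 6: an overall accuracy number can mask poor performance for a specific population.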

Significance
This work highlights several ways in which bias may arise in interactions with educational technology, and it is intended to propel future research and applied solutions toward more equitable development and implementation of AWE.
