Individual Submission Summary

Using Machine Learning Predictions in Linear Models

Thu, August 29, 4:00 to 5:30pm, Marriott, Washington 3

Abstract

Data sets, especially administrative records, often lack an important dependent or independent variable; for example, some states do not record race in their voter files. To work around this problem, many political scientists now use machine learning techniques to impute the missing variables from auxiliary data (e.g., using surname to predict race; Imai & Khanna, 2016). However, without adjustment, using these predictions as either independent or dependent variables in linear models leads to substantial bias in the resulting estimates (e.g., attenuation bias). In this project, I first categorize the different measurement error biases that can arise in this situation. Then, building on earlier econometric work by Chen, Hong, & Tamer (2005), I derive simple adjustments for linear models that are both nonparametric and robust to all forms of measurement error. I demonstrate the simplicity and accuracy of this estimator by estimating turnout levels by race in states where race is not recorded.
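
The attenuation bias mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the abstract: a continuous regressor, classical (independent, additive) prediction error, and arbitrary values for the sample size, the coefficient `beta`, and the noise variances. It does not implement the adjustment proposed in the project; it only shows the textbook problem that motivates it.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


# All values below are illustrative assumptions, not taken from the abstract.
rng = np.random.default_rng(0)
n = 200_000
beta = 2.0

# True regressor, and a noisy stand-in playing the role of an ML prediction.
x_true = rng.normal(0.0, 1.0, n)
x_pred = x_true + rng.normal(0.0, 1.0, n)  # classical additive error

# The outcome depends on the true regressor only.
y = beta * x_true + rng.normal(0.0, 1.0, n)

slope_true = ols_slope(x_true, y)  # close to beta
slope_pred = ols_slope(x_pred, y)  # shrunk toward zero

# Under classical error the slope shrinks by var(x_true) / var(x_pred).
reliability = x_true.var() / x_pred.var()

print(f"slope using true x:      {slope_true:.3f}")
print(f"slope using predicted x: {slope_pred:.3f}")
print(f"beta * reliability:      {beta * reliability:.3f}")
```

With equal variances for the true regressor and the prediction error, the reliability ratio is about one half, so the slope estimated from the predicted regressor comes out near half of `beta`. Errors from real classifiers are generally not classical, which is one reason a correction that holds under more general forms of measurement error is of interest.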

Author