Individual Submission Summary

Don't Get Duped: Analyzing Near Duplicates in Survey Data

Wed, Nov 13, 3:30 to 4:50pm, Salon 5 - Lower B2 Level

Abstract

Research fraud has garnered increasing attention within the broader scientific community and within criminology specifically. Here, we adopt recently advocated approaches to identifying potentially fraudulent survey data, which use pairwise comparisons of substantive variables between every pair of observations in the data to detect “near-duplicate” responses. We demonstrate these methods on four criminological surveys collected in different international locations. Preliminary results suggest that typical features of self-report crime data (e.g., skewed distributions) are likely to produce substantial numbers of near-duplicate cases through non-fraudulent data-generating processes. Though these “forensic” methods may not provide conclusive evidence for or against data falsification, we recommend routinely adopting them in criminological data analysis workflows. They can detect obvious instances of data falsification, identify potentially problematic cases, and generally improve understanding of important features of one's data.
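As a rough illustration of the pairwise comparison described above, the following minimal Python sketch computes, for each respondent, the largest share of items on which that respondent exactly matches any other respondent. It is not the authors' implementation: the function names, the toy data, and the 0.85 flagging threshold are all hypothetical choices made for this example.

```python
from itertools import combinations


def max_percent_match(rows):
    """For each row, return the highest share of variables (0.0 to 1.0)
    on which it exactly matches any other row.

    Assumes every row has the same, nonzero number of variables.
    """
    best = [0.0] * len(rows)
    # Compare every unordered pair of observations once.
    for i, j in combinations(range(len(rows)), 2):
        a, b = rows[i], rows[j]
        share = sum(x == y for x, y in zip(a, b)) / len(a)
        best[i] = max(best[i], share)
        best[j] = max(best[j], share)
    return best


def flag_near_duplicates(rows, threshold=0.85):
    """Return indices of rows whose maximum match share meets the
    threshold. The default threshold is an arbitrary illustration."""
    return [i for i, s in enumerate(max_percent_match(rows)) if s >= threshold]


# Hypothetical toy data: 4 respondents answering 8 self-report items.
responses = [
    [0, 0, 1, 0, 2, 0, 0, 1],
    [0, 0, 1, 0, 2, 0, 0, 0],  # differs from respondent 0 on one item
    [3, 1, 0, 2, 1, 4, 2, 3],
    [0, 0, 0, 0, 0, 0, 0, 0],  # all-zero responder, common in skewed items
]

print(max_percent_match(responses))
print(flag_near_duplicates(responses))
```

In this toy data the all-zero respondent matches another respondent on 6 of 8 items without any falsification, which mirrors the point that skewed self-report items can produce high match rates through ordinary response patterns.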

Authors