Individual Submission Summary

Data-driven Data Provision: A Case Study from the Fragile Families Challenge

Sun, August 12, 8:30 to 10:10am, Philadelphia Marriott Downtown, Floor: Level 4, 404


Metadata provides critical support for researchers working with public datasets, but new methods at times outgrow what existing data infrastructure can support. This paper describes what happened when a large, heterogeneous group of researchers used a complex social dataset in a way that its creators had not originally envisioned. Using the Fragile Families Challenge as a case study, we identify five strategic areas where improving metadata (variable names, response codes, cross-questionnaire matching, concept tags, and release format) can make data use easier for everyone. More generally, we illustrate some of the unintentional and invisible barriers that prevent the use of machine learning methods in the social sciences, and suggest that data system design is a fundamental research problem for the field of computational social science.