Privacy, Ethics, and Computational Social Science: A Case Study of the Fragile Families Challenge

New sources of "big data" created by companies and governments hold great promise for advancing social science. Unfortunately, a fundamental barrier preventing researchers from achieving this promise is data access. Quite simply, most big data sources are not accessible to researchers. Therefore, developing procedures that enable safe and ethical data access represents an important methodological problem in computational social science. In this paper, we present our process for enabling data access during the Fragile Families Challenge, a scientific mass collaboration designed to improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party oversight. We also describe the ethical principles that formed the basis of our process. Ultimately, we hope that the approach we developed will be helpful to researchers who seek data access and to data custodians who wish to provide it.