APPAM Fall Research Conference: Optimizing Machine Learning in Tabular Social Science Datasets through Data Augmentation

Optimizing Machine Learning in Tabular Social Science Datasets through Data Augmentation

In Event: Governing Innovation: AI, Infrastructure, and Global Public Value

Friday, November 14, 1:45 to 3:15pm, Property: Hyatt Regency Seattle, Floor: 7th Floor, Room: 707 - Snoqualmie

Abstract

This paper proposes a data science technique called “time-shift data augmentation,” which repurposes information from multiple time periods to artificially increase a dataset’s sample size, thereby improving the performance of machine learning models. We demonstrate its effectiveness in the Predicting Fertility Data Challenge, an international data science competition in which dozens of participants from around the world, with expertise in data science, social science, and machine learning, competed to predict births and adoptions using Dutch survey data; the methodology produced the top-performing model. The strategy can be applied to many types of social science datasets, such as panel surveys and administrative data, and has the potential to improve the accuracy of predictive models in a wide range of social services applications.
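The abstract does not spell out the implementation, so the following is only an illustrative sketch of one way the idea could work, not the author's actual method. It assumes a long-format panel with one record per person per year, and it assumes the task is to predict an outcome in the year after a cutoff year from features observed at the cutoff. The function name `time_shift_augment`, the column names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence

Record = Dict[str, int]


def time_shift_augment(
    panel: Sequence[Record],
    feature_cols: Sequence[str],
    outcome_col: str,
    cutoff_years: Sequence[int],
) -> List[Record]:
    """Build one training row per (person, cutoff year).

    Features come from the cutoff year; the label is the outcome observed
    in the following year. Passing several cutoff years reuses earlier
    waves of the panel as extra training examples (assumed setup).
    """
    # Index records by (person_id, year) for direct lookup.
    by_key = {(r["person_id"], r["year"]): r for r in panel}
    rows: List[Record] = []
    for cutoff in cutoff_years:
        for (pid, year), rec in by_key.items():
            if year != cutoff:
                continue
            nxt = by_key.get((pid, cutoff + 1))
            if nxt is None:
                continue  # outcome year not observed for this person
            row = {c: rec[c] for c in feature_cols}
            row["person_id"] = pid
            row["cutoff_year"] = cutoff
            row["label"] = nxt[outcome_col]
            rows.append(row)
    return rows


# Toy panel (made-up values): 3 people observed yearly from 2018 to 2021.
births = {(1, 2020), (3, 2021)}
panel = [
    {
        "person_id": pid,
        "year": year,
        "age": base_age + (year - 2018),
        "new_child": int((pid, year) in births),
    }
    for pid, base_age in [(1, 27), (2, 31), (3, 24)]
    for year in range(2018, 2022)
]

# Without augmentation: only the most recent cutoff is used.
baseline = time_shift_augment(panel, ["age"], "new_child", [2020])
# With time shifting: earlier cutoffs contribute additional rows.
augmented = time_shift_augment(panel, ["age"], "new_child", [2018, 2019, 2020])
print(len(baseline), len(augmented))
```

In this sketch the same person contributes several rows, so any validation split would presumably need to group rows by `person_id` to avoid leaking information between training and evaluation sets.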

Author

Emily Cantrell, Princeton University
Presenting Author