Individual Submission Summary

Optimizing Machine Learning in Tabular Social Science Datasets through Data Augmentation

Friday, November 14, 1:45 to 3:15pm, Property: Hyatt Regency Seattle, Floor: 7th Floor, Room: 707 - Snoqualmie

Abstract

This paper proposes a data science technique called “time-shift data augmentation,” which repurposes information from multiple time periods to artificially increase a dataset’s sample size, thereby improving the performance of machine learning models. We demonstrate its effectiveness in the Predicting Fertility Data Challenge, an international data science competition in which dozens of participants from around the world with expertise in data science, social science, and machine learning competed to predict births and adoptions using Dutch survey data. The methodology produced the top-performing model in that competition. The strategy can be applied to many types of social science datasets, such as panel surveys and administrative data, and has the potential to improve the accuracy of predictive models in a wide range of social services applications.
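
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes a toy panel in which each person has one record per survey year, and it builds extra training rows by sliding the feature year and the outcome window back in time together. The names `wave`, `build_rows`, `time_shift_augment`, and the fields `age`, `partner`, and `birth` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy panel: panel[person_id][year] -> that year's survey record.
Panel = Dict[str, Dict[int, dict]]


def wave(age: int, partner: bool, birth: bool) -> dict:
    """Build one illustrative survey record (field names are assumptions)."""
    return {"age": age, "partner": partner, "birth": birth}


def build_rows(panel: Panel, feature_year: int, horizon: int) -> List[dict]:
    """One row per person: features from `feature_year`, label = any birth
    recorded in the following `horizon` years."""
    outcome_years = range(feature_year + 1, feature_year + horizon + 1)
    rows = []
    for person, waves in panel.items():
        # Skip people missing the feature wave or any wave in the outcome window.
        if feature_year not in waves or any(y not in waves for y in outcome_years):
            continue
        features = {k: v for k, v in waves[feature_year].items() if k != "birth"}
        label = any(waves[y]["birth"] for y in outcome_years)
        rows.append({"person": person, "feature_year": feature_year,
                     **features, "label": label})
    return rows


def time_shift_augment(panel: Panel, base_year: int, horizon: int,
                       shifts: List[int]) -> List[dict]:
    """Stack rows built at base_year, base_year - 1, ... so that earlier
    periods contribute additional (features, label) pairs."""
    rows = []
    for shift in shifts:
        rows.extend(build_rows(panel, base_year - shift, horizon))
    return rows


panel = {
    "a": {2017: wave(28, True, False), 2018: wave(29, True, False),
          2019: wave(30, True, True), 2020: wave(31, True, False)},
    "b": {2017: wave(35, False, False), 2018: wave(36, False, False),
          2019: wave(37, True, False), 2020: wave(38, True, False)},
    "c": {2018: wave(24, False, False), 2019: wave(25, True, False),
          2020: wave(26, True, True)},
}

original = build_rows(panel, feature_year=2019, horizon=1)
augmented = time_shift_augment(panel, base_year=2019, horizon=1, shifts=[0, 1, 2])
print(len(original), len(augmented))  # 3 8
```

In this toy example the same three respondents yield eight training rows instead of three. Because several rows then come from the same person, any validation split would presumably need to keep each person's rows together to avoid leakage between training and test folds.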

Author