Individual Submission Summary

Augmenting Political Data through Generative Adversarial Networks (GANs)

Thu, August 29, 4:00 to 5:30pm, Marriott, Washington 3


This paper examines whether Generative Adversarial Networks (GANs), which have come into increasing use in deep learning in recent years, can avoid or mitigate the problem of insufficient sample size, a major constraint in empirical political science research.
Empirical political science research has long been constrained by the difficulty of collecting enough samples to make highly accurate projections about the occurrence of rare events. A traditional approach using a simple linear model with few variables, for instance, inevitably results in lower prediction accuracy. Machine learning methods have begun to be employed in political science in recent years. Where sample sizes have not been sufficiently large, researchers have explored nonlinear approaches using relatively robust methods such as support vector machines (SVMs) and random forests. However, over-fitting arises when the sample is small and the correlations among variables are complicated. In such cases, attempts have been made to maintain generalization performance by strengthening regularization where possible.
In research using supervised deep learning methods, a sufficiently large number of samples is usually needed to construct an accurate prediction model. For this reason, data augmentation techniques have been actively used in medical image processing and other areas when only a small sample is available at the time a deep learning model is constructed. Most data augmentation methods cannot be applied to political data, but GANs may well be applicable. This paper details an attempt to generate additional samples of political data by training GANs to learn the distribution of the sample data.
Specifically, we verified the effectiveness of GANs using the data from a paper published by Beck, King, and Zeng in 2000, a pioneering political science study that utilized neural networks. The study contained 23,529 cases; we randomly reduced the sample to four sets of 2,353, 7,059, 11,765, and 16,470 cases and, for each set, generated a real-valued time series using a GAN to bring the number of cases back up to the original 23,529. We then compared the out-of-sample forecasting performance of the GAN-augmented samples with the findings of the original study. The results were mixed, but it appears that a GAN can generate samples that closely approximate the original when the reduced data satisfy certain conditions (for example, when the change score, defined by the trajectory matrix of each item, is close to average). Further, we compared the performance of the GAN with Cantú and Saiegh’s data generation method for diagnosing electoral fraud using vote counts. Cantú and Saiegh constructed a naive Bayes classifier by generating additional samples through the Monte Carlo method using Benford’s Law. At this stage, although the results are still mixed, we believe that further customizing the GAN to the specific situation may improve the predictive performance of our model.
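The reduction-and-augmentation design described above can be illustrated with a minimal Python sketch. It covers only the bookkeeping: how many cases each reduced set keeps and how many synthetic cases a generator would need to supply to return to 23,529. The case counts come from the abstract; the function names, the fixed seed, and the use of simple random sampling without replacement are assumptions made for illustration, and the GAN itself is not shown.

```python
import random

# Case counts taken from the abstract.
N_ORIGINAL = 23_529
SUBSAMPLE_SIZES = [2_353, 7_059, 11_765, 16_470]


def draw_subsample(n_total, n_keep, seed=0):
    """Return sorted row indices of a random subsample drawn without replacement.

    Hypothetical helper: the abstract says only that the sample was
    "randomly reduced", so plain uniform sampling is an assumption.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_keep))


def augmentation_plan(n_total, sizes):
    """For each reduced size, report its share of the original data and the
    number of synthetic cases needed to restore n_total cases."""
    return [
        {"kept": n, "share": round(n / n_total, 2), "to_generate": n_total - n}
        for n in sizes
    ]


plan = augmentation_plan(N_ORIGINAL, SUBSAMPLE_SIZES)
for row in plan:
    print(row)
```

Under these assumptions the four reduced sets correspond to roughly 10%, 30%, 50%, and 70% of the original cases, so the generator would have to supply between about 30% and 90% of the final sample, which may be one reason the results vary across sets.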