Individual Submission Summary

Big Data, Little Data: Scaling Social Science Research on Facebook

Sat, June 11, 11:00 to 12:15, Fukuoka Hilton, Navis C


Researchers in the social sciences increasingly have access to large datasets from various media that describe human behavior at an unprecedented scale and granularity. An impressive array of tools and approaches has simultaneously been developed to manage these datasets and to depict what is happening, how it happened, and what might happen next. Yet rather than being the ‘end of theory’, as Wired Magazine declared in 2008, big data gives social scientists an unprecedented opportunity to develop and test theory (González-Bailón, 2013). Sole reliance on these massive streams of data can, however, leave researchers in a position where behavior is described better than it is explained.

Take, for example, Facebook’s ‘Like’ button. Tracking usage of this ubiquitous feedback mechanism in relation to an entity, such as a popular musician, might show which age group likes the artist proportionally more, or who in a friend group liked the musician first. At the same time, it can’t tell you much about impression management – who is (or isn’t) liking the musician because of the impression it might give their friends. There are many cases like this, where explanations of human relationships are too complex to be read off the data at face value. Some approaches represent researchers scaling up – taking little data observations to the highest scale possible. But even in a world of perfect data, where everything is known, it’s also important to be able to scale down.

One answer to this is Facebook’s News Feed raters program. In this program, people rate stories in their News Feed, both on psychometric scales and through qualitative descriptions of what they like and don’t like. In particular, qualitative examination of descriptions and content provides the theory that aggregated masses of clicks can’t. These descriptions and ratings are used for linguistic analysis and paired with behavioral data to test and build on the theories they generate, bringing the process back to scale. We’ll talk about how the research team developed a more qualitative approach to get signal that big data can’t, and the ways that this additional information is streamed back into data at scale.


González-Bailón, S. (2013). “Social Science in the Era of Big Data.” Policy & Internet, 5(2), 147-160.