Individual Submission Summary

From the Social Sciences to Data Mining: Data Journalism’s Sense-Making of Expanding Data

Thu, September 5, 1:00 to 2:30pm, Sheraton New Orleans Hotel, Floor: Eight, Endymion


This paper argues that data-journalistic research and storytelling mimic the specific epistemological approach to data found in the work of data scientists and machine learning engineers, as opposed to the way database architects or statisticians think. While data journalists have historically been encouraged to follow the methodologies of the social sciences, the digital environment in which they investigate and write stories today increasingly prompts this professional community to adapt to real-time data flows and incomplete sets of data clusters, and to exchange retrieval and sampling for knowledge discovery and ‘associational mining.’ Consequently, the methodological toolbox of surveying, sampling, and strict research design, which I will call the school of ‘supervised learning,’ is gradually giving way to a new school, that of ‘unsupervised learning.’ Tracing this shift from one way of approaching data to another requires a brief overview of the organizational and technological histories of database management systems (DBMS) and of the transformations of the data engineering profession. This organizational genealogy concludes that the sense-making practices of data journalists have followed organizational and technological shifts in the government and corporate sectors, such as the adoption of specialized mainframe systems in federal agencies and of dynamic data structures in technology companies. As a result, statistical evaluation is giving way to neural-network-based pattern recognition and to the acceptance of appendage-based database systems in journalistic workflows. Ultimately, this paper proposes a rethinking of the reliance on knowledge discovery in investigative research and reporting.