Individual Submission Summary

Churnalism, Press Releases and Wire Copy: Detecting Textual Reuse in Large News Corpora

Sat, May 26, 11:00 to 12:15, Hilton Old Town, Floor: M, Dvorak I

Abstract

This paper introduces a method for detecting text shared between news articles at large scale. It evaluates and validates the method under two parameter settings that differ in sensitivity and specificity.

We present the results of applying the method to a new corpus of 313,450 news articles, showing the extent of textual overlap under both the more sensitive and the less sensitive settings. Contrary to expectations drawn from qualitative studies of journalists’ working practices, we find little evidence of large-scale recycling of press releases or of copy from competing news sources. We detect heavy use of wire copy, but find that the large majority of it is directly attributed.

We explore the causes of textual reuse in the corpus, differentiating several distinct sources of overlap.

We conclude that the method is robust and valid, and that both variants tested are suitable, each for different research questions.

Author