This paper introduces a method for detecting shared text between news articles at large scale. It evaluates and validates the method under two parameter settings that differ in sensitivity and specificity.
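The abstract does not specify the paper's algorithm. The sketch below therefore uses a common, generic technique for shared-text detection, word n-gram overlap with an inverted index, which is not necessarily the authors' method. It shows how a single parameter can trade sensitivity against specificity. The function names, the n-gram length `n`, and the threshold `min_shared` are hypothetical stand-ins for the "more and less sensitive" variants. The inverted index keeps the comparison to pairs that share at least one n-gram, so the code avoids checking every pair of articles.

```python
import re
from collections import defaultdict
from itertools import combinations


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into word tokens, and return its set of word n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_overlaps(articles: dict[str, str], n: int,
                  min_shared: int = 1) -> dict[tuple[str, str], int]:
    """Hypothetical helper: count shared n-grams for every pair of articles.

    Returns {(id_a, id_b): shared_ngram_count} for pairs sharing at least
    `min_shared` n-grams. Smaller `n` is more sensitive (short stock phrases
    match) but less specific; larger `n` flags only longer verbatim passages.
    """
    # Inverted index: n-gram -> ids of the articles that contain it.
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for doc_id, text in articles.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)

    # Only pairs that co-occur under some n-gram are ever counted.
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            counts[(a, b)] += 1

    return {pair: c for pair, c in counts.items() if c >= min_shared}
```

With `n=3`, only the two articles that share a verbatim sentence are paired. With `n=2`, the common phrase "on Tuesday" also links an unrelated article, which illustrates the loss of specificity at higher sensitivity:

```python
articles = {
    "a": "The minister said on Tuesday that the new policy would take effect next year.",
    "b": "Officials confirmed the minister said on Tuesday that the new policy would take effect next year.",
    "c": "Local team wins the cup on Tuesday after a dramatic final.",
}
print(find_overlaps(articles, n=3))
print(find_overlaps(articles, n=2))
```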
We present the results of applying the method to a new corpus of 313,450 news articles, measuring the extent of textual overlap under both the more and the less sensitive variants. Contrary to expectations drawn from qualitative studies of journalists' working practices, we find little evidence of large-scale recycling of press releases or of copy from competing news sources. We detect heavy use of wire copy, but find that the large majority of it is directly attributed.
We explore the causes of textual reuse in the corpus and distinguish several sources of overlap.
We conclude that the method is robust and valid, and that each of the two variants tested suits different research questions.