Individual Submission Summary

Understanding News Stories by Clustering Articles: An Information Retrieval Approach

Sat, May 26, 11:00 to 12:15, Hilton Old Town, Floor: M, Dvorak I


We present a new method for identifying linked news stories within a large collection of articles. The method uses Information Retrieval (IR) techniques to measure the textual closeness between pairs of articles, then applies a network approach based on Infomap to optimize the partition of the collection into distinct stories. We distinguish IR approaches from other popular approaches to the quantitative analysis of text, including dictionary-based, supervised, and automated clustering methods. We argue for the value of IR approaches as a means of quantitatively analysing textual data, particularly when trying to identify small amounts of relevant text within a very large corpus. This paper serves both as a demonstration of how a real-world research question can benefit from IR techniques and as a substantive contribution to computational research projects whose unit of analysis is the story rather than the article.
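
The two-stage idea described above (score textual closeness between pairs of articles, then partition the resulting network into stories) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the closeness measure, so TF-IDF cosine similarity is assumed here, and plain connected components over a thresholded similarity graph stand in for Infomap. The function names, the threshold value, and the sample articles are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_stories(docs, threshold=0.15):
    """Group article indices into stories.

    Assumption: an edge links two articles when their similarity reaches
    `threshold`, and each connected component is treated as one story.
    This is a simple stand-in for Infomap, not Infomap itself.
    """
    vectors = tfidf_vectors(docs)
    n = len(docs)
    parent = list(range(n))  # union-find forest over article indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sample articles: two about a resignation, two about a storm.
articles = [
    "minister resigns after budget scandal",
    "budget scandal forces minister to quit",
    "storm floods coastal towns overnight",
    "coastal towns flooded as storm hits",
]
stories = cluster_stories(articles)
print(stories)  # [[0, 1], [2, 3]]
```

A community-detection algorithm such as Infomap would replace the connected-components step in practice, since a single threshold tends to merge unrelated stories once the corpus is large and a few articles bridge different topics.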
