
How can one trace the spread of information, ideas, and narratives across the world using text data? Social scientists have long sought to answer this question, which requires identifying pairs of documents that contain statements with the same underlying meaning about the same subject. Past approaches, which rely on n-gram matching or topic modeling, have so far yielded only a loose approximation of this ideal. We propose a two-stage method to track the global diffusion of information. First, we apply a highly scalable technique called locality-sensitive hashing (LSH) to cross-language embedded representations of text, produced by a large language model (LLM), to generate a relatively small number of candidate pairs. Second, we fine-tune an instruct-trained LLM to identify which candidate pairs of sentences actually contain the same idea. It is extremely difficult to create a gold-standard labeled data set to evaluate performance on this pairwise problem. We do so by creating a data set of thousands of benchmark sentence pairs that contain iterations of equivalent and different statements about the same and different topics. Our method has far higher recall than verbatim text reuse methods and is more precise than topic modeling.
This approach can be applied to the study of propaganda, misinformation, and the diffusion of innovations. In this paper, we apply it to show how U.S. media sources reuse information from Russian state media in the context of the 2022 Russian invasion of Ukraine, for example, accusations that Ukraine is developing bioweapons.
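The candidate-generation stage described above can be illustrated with a minimal sketch. The abstract does not say which LSH family the authors use, so this example assumes random-hyperplane (sign) hashing, a common choice for cosine similarity over embeddings. The function names, the parameters `n_tables` and `n_bits`, and the toy vectors are all hypothetical. The second stage, in which a fine-tuned LLM judges each candidate pair, is not shown.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hyperplanes(dim, n_bits, rng):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """Sign pattern of vec against each hyperplane, as a tuple of 0/1 bits."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0.0 else 0
        for plane in hyperplanes
    )


def candidate_pairs(embeddings, n_tables=8, n_bits=6, seed=0):
    """Return index pairs (i, j), i < j, sharing a bucket in any hash table.

    Assumption: `embeddings` is a list of equal-length float vectors, such as
    sentence embeddings from a multilingual model (not computed here).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    pairs = set()
    for _ in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, rng)
        buckets = defaultdict(list)
        for idx, vec in enumerate(embeddings):
            buckets[signature(vec, planes)].append(idx)
        # Only vectors hashed to the same bucket are ever compared, which
        # avoids scoring every possible pair in the corpus.
        for members in buckets.values():
            pairs.update(combinations(members, 2))
    return pairs


# Toy data: vectors 0 and 1 are identical; vector 2 points the opposite way.
toy_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
]
print(sorted(candidate_pairs(toy_embeddings)))
```

In this sketch, more bits per table make buckets narrower and yield fewer candidates, while more tables give similar vectors more chances to collide. How that trade-off is tuned in the paper is not stated in the abstract.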
Hannah Waight, New York University
Megan Brown, University of Michigan
Jason Greenfield, New York University
Kevin Aslett, University of Central Florida
Margaret E Roberts, University of California, San Diego
Anton Shirikov, University of Kansas
Jonathan Nagler, New York University
Joshua A. Tucker, New York University
Solomon Messing, New York University