
Digital media companies have increasingly restricted access to data about public activity on their platforms, which limits scholarship, impairs platform accountability, and empowers abusive users. However, nearly all platforms concentrate users in a few high-volume places (such as pages, channels, or subreddits). We exploit this concentration to develop new, scalable methods to reconstruct most public user activity on digital platforms, with or without access to platform-provided APIs. We show that this approach works especially well because the most popular places are also the most stable over time, and because lower-engagement users participate overwhelmingly in popular channels. Platforms also show fractal self-similarity: subcategories of content mirror the concentration, stable popularity, and ladder of engagement seen across the platform as a whole.
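As a rough illustration of why concentration matters for collection, the sketch below computes the share of activity captured by a list of the k most active places, and the smallest list that reaches a target share. It is written in Python for illustration only and is not the authors' package; the function names and the activity counts are hypothetical.

```python
def coverage_of_top_k(activity_by_place, k):
    """Share of total activity captured by the k most active places."""
    counts = sorted(activity_by_place.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:k]) / total


def smallest_list_for_coverage(activity_by_place, target):
    """Smallest number of top places whose combined share reaches `target`."""
    counts = sorted(activity_by_place.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0
    running = 0
    for size, count in enumerate(counts, start=1):
        running += count
        if running / total >= target:
            return size
    return len(counts)


# Hypothetical activity counts (e.g. posts per channel), invented for this example.
example_activity = {"a": 500, "b": 300, "c": 100, "d": 60, "e": 40}

top2_share = coverage_of_top_k(example_activity, 2)
needed = smallest_list_for_coverage(example_activity, 0.85)
print(f"Top 2 places cover {top2_share:.0%} of activity")
print(f"Places needed for 85% coverage: {needed}")
```

With a heavily concentrated distribution like this invented one, a short list already covers most activity; a flatter distribution would require a much longer list for the same share.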
We deploy these methods in an R package, which can estimate total coverage for a scraping list or API collection list of a given size, and calculate how frequently it needs to be updated to minimize undercollection. Our approach makes it feasible to recover large segments of digital platform activity, both for "big picture" overviews of the highest-visibility content, and within smaller topics and