Individual Submission Summary

Vectorizing Social Theory: A Scalable LLM Pipeline for Operationalizing Interpretive Dimensions in Scene Theory

Sun, August 9, 2:00 to 3:30pm, TBA

Abstract

Translating interpretive concepts into scalable, comparable measures remains a core challenge in sociological methodology. Scene Theory takes up this challenge by scoring urban amenities across a 15-dimensional "periodic table" of place-based meanings, but its original approach relies on labor-intensive expert coding that is difficult to reproduce across research teams and contexts. To overcome this limitation, we develop and validate a fully reproducible, embedding-based pipeline that "vectorizes" social theory: each bipolar scene dimension becomes a directional axis within a shared semantic space, and amenities are scored by projecting their embeddings onto these theoretical axes.
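As a minimal sketch of the axis-and-projection idea, assuming embeddings are already available as NumPy arrays, a dimension axis can be taken as the normalized difference between two pole centroids, and an amenity's score as its scalar projection onto that axis. The toy 3-d vectors are hypothetical stand-ins for real model output, the function names are illustrative, and the coordinate-wise median is only an assumed choice of robust centroid estimator.

```python
import numpy as np

def pole_centroid(prompt_embs: np.ndarray) -> np.ndarray:
    """Centroid of one pole's prompt embeddings (one row per prompt).

    Assumption: coordinate-wise median as a simple robust estimator.
    """
    return np.median(prompt_embs, axis=0)

def dimension_axis(neg_prompts: np.ndarray, pos_prompts: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the negative-pole centroid to the positive one."""
    direction = pole_centroid(pos_prompts) - pole_centroid(neg_prompts)
    return direction / np.linalg.norm(direction)

def project_scores(amenity_embs: np.ndarray, axis: np.ndarray) -> np.ndarray:
    """Scalar projection of each amenity embedding (row) onto the unit axis."""
    return amenity_embs @ axis

# Hypothetical toy embeddings for the two poles of one dimension.
neg = np.array([[1.0, 0.0, 0.2], [0.9, 0.1, 0.0]])
pos = np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
axis = dimension_axis(neg, pos)

amenities = np.array([[0.2, 0.8, 0.0],   # lies nearer the positive pole
                      [0.8, 0.2, 0.0]])  # lies nearer the negative pole
scores = project_scores(amenities, axis)  # first score > 0 > second score
```

Because the axis has unit length, scores are comparable across dimensions in the sense that each is a signed distance along its own axis.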

Leveraging a hierarchical taxonomy of 1,451 Yelp amenity categories in Canada, our method treats sociological theory as a semantic anchor system. The pipeline embeds a set of prompts representing each theoretical pole, estimates a robust centroid for each pole, and defines each dimension as the unit vector pointing from one pole centroid to the other. To filter out the generic institutional noise inherent in transformer embeddings, we apply "All-But-The-Top" PCA residualization. Furthermore, to mitigate lexical artifacts from high-frequency container terms, we introduce a negative-control design: human-coded "neutral" midpoints are used to identify contaminant tokens, whose influence is then systematically suppressed through orthogonal projection.
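The two noise-suppression steps can be sketched in NumPy as follows. `abtt` centers a matrix of embeddings and removes its top-`d` principal directions, following the general All-But-The-Top recipe; `d=2` is an illustrative default, not a value from the paper. `project_out` removes each embedding's component in the span of flagged contaminant-term embeddings. How contaminants are selected from the neutral-coded items is not reproduced here: the flagged vectors are simply taken as input, and the random data is a placeholder.

```python
import numpy as np

def abtt(X: np.ndarray, d: int = 2) -> np.ndarray:
    """All-But-The-Top: center the rows of X, then remove the top-d principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data, largest first.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    top = vt[:d]                          # shape (d, dim), orthonormal rows
    return Xc - (Xc @ top.T) @ top

def project_out(X: np.ndarray, contaminants: np.ndarray) -> np.ndarray:
    """Remove from each row of X its component in the span of the contaminant rows."""
    q, _ = np.linalg.qr(contaminants.T)   # orthonormal basis, shape (dim, k)
    return X - (X @ q) @ q.T

# Placeholder data: 50 embeddings of dimension 10 and 2 flagged term vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
flagged = rng.normal(size=(2, 10))
cleaned = project_out(abtt(X, d=2), flagged)
```

Fitting ABTT on the stacked prompt and amenity embeddings, rather than on each set separately, keeps both in the same residualized space. With mean centroids, removing the contaminant span from the axis alone yields the same scores as removing it from every embedding, since the projector is symmetric and idempotent.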

Validation against a human-coded benchmark reveals that coders default to the neutral midpoint in 83% of cases, heavily depressing naive correlations computed over all cases. When evaluation is restricted to cases in which coders make a definitive directional judgment, however, alignment improves substantially (mean Spearman ρ ≈ 0.38; mean AUC ≈ 0.76). Performance is strongest for dimensions capturing theatricality (e.g., Glamorous, Exhibitionistic) and weakest for relational authenticity (notably Ethnic and State). This variation across dimensions clarifies where theoretical constructs drift or become underidentified when measured via terse category labels. Ultimately, this paper contributes a portable, auditable computational instrument and provides a generalized framework for testing construct separability in embedding spaces.
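The definitive-case evaluation can be sketched as below, assuming human codes on a 1 to 5 scale with 3 as the neutral midpoint (an assumption about the benchmark's coding scale, exposed as the `midpoint` parameter). Neutral cases are dropped, Spearman ρ is computed as the Pearson correlation of average ranks, and AUC uses the Mann-Whitney rank formula with codes above the midpoint as positives. Per-dimension values would then be averaged to obtain means like those reported above; the function names and toy data are hypothetical.

```python
import numpy as np

def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # average position of the tie block
        i = j + 1
    return ranks

def definitive_case_metrics(scores, codes, midpoint=3.0):
    """Spearman rho and AUC on cases where the human code is not the midpoint."""
    scores = np.asarray(scores, dtype=float)
    codes = np.asarray(codes, dtype=float)
    keep = codes != midpoint                     # drop neutral (midpoint) codes
    s, c = scores[keep], codes[keep]
    r = average_ranks(s)
    rho = np.corrcoef(r, average_ranks(c))[0, 1]
    positive = c > midpoint                      # directional label per case
    n_pos = int(positive.sum())
    n_neg = len(s) - n_pos
    # Mann-Whitney U for the positive class, scaled to [0, 1]; ties count one half.
    auc = (r[positive].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return rho, auc

# Hypothetical codes and model scores for one dimension; codes of 3 are ignored.
codes = [3, 3, 1, 2, 4, 5, 3]
scores = [0.0, 0.1, -0.5, -0.2, 0.3, 0.6, -0.1]
rho, auc = definitive_case_metrics(scores, codes)  # both ≈ 1: kept cases are perfectly ordered
```

Dropping midpoint codes before ranking mirrors the restriction described above: neutral cases carry no directional signal, so including them mainly adds ties to the human ranks.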

Author