Individual Submission Summary

Going with the flow when topic modeling: Exploratory data visualization, choosing k, and text chunking

Sun, August 9, 2:00 to 3:30pm, TBA

Abstract

Topic modeling is an important method for estimating semantic patterns in text corpora using word co-occurrence distributions. In all topic modeling projects, researchers face durable challenges when optimizing model fit and selecting options that best align with their research questions. Beyond parameters associated with initialization, convergence thresholds, priors, and text preprocessing, a substantial literature has focused on selecting the number of topics to include in the model (K). While metrics such as semantic coherence and exclusivity have defined a visual repertoire of model selection, they obscure word-level insights into patterns of semantic flow. This paper proposes a new and flexible method for visualizing the dynamics of word-level semantic patterns, Sankey diagrams using word overlap measures, and offers two preliminary demonstrations of this method at two related choice points in topic modeling projects, text chunking and the selection of K, using a novel corpus. By recentering words in the process of model evaluation and selection, we demonstrate how our visualization process drives semantically relevant insights into the ways text chunking and the selection of K combine to distribute meaning across a corpus.
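
As a rough illustration of how word overlap between topics could feed a Sankey diagram, the sketch below compares the top words of topics from two models fit with different K and turns each nonzero overlap into a flow. The abstract does not say which overlap measure is used; Jaccard similarity is an assumption here, and the topic labels, word lists, and the function names `jaccard` and `sankey_links` are hypothetical.

```python
from typing import Dict, List, Tuple


def jaccard(a: set, b: set) -> float:
    """Share of words two topics have in common (assumed overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sankey_links(
    model_a: Dict[str, List[str]],
    model_b: Dict[str, List[str]],
    min_overlap: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Return (source topic, target topic, overlap) for every topic pair
    whose overlap exceeds min_overlap. Each tuple is one Sankey flow."""
    links = []
    for src, words_a in model_a.items():
        for dst, words_b in model_b.items():
            score = jaccard(set(words_a), set(words_b))
            if score > min_overlap:
                links.append((src, dst, score))
    return links


# Invented toy data: top words per topic for a K=2 and a K=3 model.
k2 = {
    "k2_t1": ["vote", "party", "election", "tax"],
    "k2_t2": ["school", "teacher", "student", "tax"],
}
k3 = {
    "k3_t1": ["vote", "party", "election", "campaign"],
    "k3_t2": ["school", "teacher", "student", "class"],
    "k3_t3": ["tax", "budget", "income", "spend"],
}

links = sankey_links(k2, k3)
for src, dst, score in links:
    print(f"{src} -> {dst}: {score:.2f}")
```

In this toy case each K=2 topic flows mostly into one K=3 topic, while "tax" splits off into a topic of its own. The same comparison could be run between models fit on differently chunked versions of a corpus, and the resulting tuples passed to any Sankey plotting tool as source, target, and value.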

Authors