Reading Tea Leaves: How Humans Interpret Topic Models
Citations
Cites background from "Reading Tea Leaves: How Humans Inte..."
...on topical models has recently picked up pace, especially in the field of generative topic models such as Latent Dirichlet Allocation (Blei et al., 2003), their hierarchical extensions (Teh et al., 2006), topic quality assessment and visualisation (Chang et al., 2009; Blei and Lafferty, 2009)...
Additional excerpts
...The common intrusion-detection test [Chang et al., 2009] in topic models is a form of the forward simulation/prediction task: we ask the human to find the difference between the model’s true output and some corrupted output as a way to determine whether the human has correctly understood what the model’s true output is....
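The intrusion-detection setup described in this excerpt can be sketched in a few lines: show an annotator the top words of a topic plus one "intruder" word drawn from another topic, and score how often the intruder is identified. A minimal sketch; the topic word lists and the `model_precision` scoring below are illustrative, not the paper's data or exact implementation:

```python
import random

def build_intrusion_task(topic_top_words, other_topic_words, rng=random.Random(0)):
    """Build one word-intrusion instance: five high-probability words
    from one topic plus one 'intruder' taken from another topic,
    shuffled so the intruder's position carries no information."""
    choices = list(topic_top_words[:5])
    intruder = rng.choice(list(other_topic_words))
    items = choices + [intruder]
    rng.shuffle(items)
    return items, intruder

def model_precision(answers, intruders):
    """Fraction of tasks on which annotators picked the true intruder:
    high precision suggests the topic's top words cohere for humans."""
    correct = sum(a == b for a, b in zip(answers, intruders))
    return correct / len(intruders)

# Hypothetical topics for illustration only:
arts = ["film", "theater", "music", "dance", "opera", "gallery"]
finance = ["stock", "bond", "market", "fund", "bank", "yield"]

items, intruder = build_intrusion_task(arts, finance)
# A coherent arts topic should make the finance intruder easy to spot.
```

If the human cannot tell the intruder apart from the topic's own words, the topic is likely not interpretable, regardless of its held-out likelihood.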
References
"Reading Tea Leaves: How Humans Inte..." refers to background in this paper
...Amazon Mechanical Turk has been successfully used in the past to develop gold-standard data for natural language processing [22] and to label images [23]....
"Reading Tea Leaves: How Humans Inte..." refers to background, methods, or results in this paper
...The performance of pLSI degrades with larger numbers of topics, suggesting that overfitting [4] might affect interpretability as well as predictive power....
...Because the direct computation of the posterior is intractable, we employ variational inference [4] and set the symmetric Dirichlet prior parameter, α, to 1. CTM: In LDA, the components of θd are nearly independent (i.e., θd is statistically neutral)....
...Latent Dirichlet allocation (LDA) [4] and the correlated topic model (CTM) [5] treat each document’s topic assignment as a multinomial random variable drawn from a symmetric Dirichlet and logistic normal prior, respectively....
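The contrast between the two priors can be illustrated by sampling topic proportions. A minimal sketch of a logistic-normal draw, using a diagonal covariance for simplicity; the CTM itself uses a full covariance matrix, which is what lets topic proportions covary, unlike a Dirichlet:

```python
import math
import random

def draw_logistic_normal(mu, sigma, rng=random.Random(0)):
    """Sample topic proportions the CTM way: draw eta ~ N(mu, sigma^2)
    componentwise, then map through a softmax onto the simplex.
    (Simplification: a diagonal covariance; the CTM uses a full
    covariance matrix so that topics can be correlated.)"""
    eta = [rng.gauss(m, sigma) for m in mu]
    peak = max(eta)                       # subtract the max for numerical stability
    exps = [math.exp(e - peak) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

# Three-topic example with a flat mean: proportions sum to 1.
theta = draw_logistic_normal([0.0, 0.0, 0.0], 1.0)
```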
...Models either use measures based on held-out likelihood [4, 5] or an external task that is independent of the topic space such as sentiment detection [10] or information retrieval [11]....
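The held-out likelihood measure mentioned in this excerpt can be sketched as a per-word log likelihood of unseen words under a fitted model. A simplification for illustration: the topic proportions `theta` are taken as given here, whereas in practice they must themselves be inferred for each held-out document:

```python
import math

def heldout_log_likelihood(words, theta, topics):
    """Average per-word log likelihood of held-out words under a
    topic mixture: p(w) = sum_k theta_k * p(w | topic k).
    Higher is better; perplexity is exp(-value)."""
    ll = 0.0
    for w in words:
        pw = sum(t * topic.get(w, 0.0) for t, topic in zip(theta, topics))
        ll += math.log(pw)
    return ll / len(words)

# Toy topics and a held-out document of two words:
topics = [{"film": 0.5, "music": 0.5}, {"stock": 0.5, "market": 0.5}]
score = heldout_log_likelihood(["film", "stock"], [0.5, 0.5], topics)
```

The paper's central finding is precisely that this number can improve while human interpretability of the topics gets worse.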
...In this work we study three topic models: probabilistic latent semantic indexing (pLSI) [3], latent Dirichlet allocation (LDA) [4], and the correlated topic model (CTM) [5], which are all mixed membership models [17]....
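The mixed-membership generative story these models share can be sketched for LDA: each document draws its own topic proportions, then each word position draws a topic and a word. A toy illustration assuming two hypothetical topics; the Dirichlet draw uses the standard normalized-Gamma construction:

```python
import random

def draw_dirichlet(alpha, k, rng):
    """Sample from a symmetric Dirichlet(alpha) over k components
    by normalizing independent Gamma(alpha, 1) draws."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    return [g / total for g in gammas]

def generate_document(topics, alpha, length, rng=random.Random(0)):
    """LDA's generative process: theta ~ Dirichlet(alpha) per document,
    then for each word position draw a topic z ~ Multinomial(theta)
    and a word w ~ Multinomial(topics[z])."""
    k = len(topics)
    theta = draw_dirichlet(alpha, k, rng)
    words = []
    for _ in range(length):
        z = rng.choices(range(k), weights=theta)[0]
        vocab = list(topics[z])
        probs = [topics[z][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Two hypothetical topics over a toy vocabulary:
topics = [
    {"film": 0.5, "music": 0.3, "stage": 0.2},   # "arts"
    {"stock": 0.6, "market": 0.3, "bank": 0.1},  # "finance"
]
theta, doc = generate_document(topics, alpha=1.0, length=10)
```

"Mixed membership" is visible in `theta`: every document belongs to all topics at once, just in different proportions.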