Latent dirichlet allocation
Citations
848 citations
Cites methods from "Latent dirichlet allocation"
...Since LDA’s topics have no implicit ordering, we first must match them based upon the similarity of the words in the distribution....
[...]
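A minimal sketch of how two runs' topics could be matched by word-distribution similarity, assuming hypothetical (K, V) NumPy arrays `topics_a` and `topics_b` of topic-word probabilities (one row per topic, rows summing to 1); the greedy pairing and all names are illustrative, not the cited paper's exact procedure:

```python
# Sketch: align topics from two LDA runs by the similarity of their word
# distributions. topics_a and topics_b are assumed (K, V) probability arrays.
import numpy as np
from scipy.spatial.distance import jensenshannon

def match_topics(topics_a, topics_b):
    """Greedily pair each topic in A with its closest unused topic in B."""
    K = topics_a.shape[0]
    # Pairwise Jensen-Shannon distances between topic-word distributions.
    dist = np.array([[jensenshannon(a, b) for b in topics_b] for a in topics_a])
    pairs, used = [], set()
    for i in np.argsort(dist.min(axis=1)):   # handle the clearest matches first
        j = min((c for c in range(K) if c not in used), key=lambda c: dist[i, c])
        pairs.append((i, j, dist[i, j]))
        used.add(j)
    return pairs
```

A globally optimal one-to-one matching could instead use `scipy.optimize.linear_sum_assignment` on the same distance matrix; the greedy loop is just shorter to read.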
...In the case of LDA we find a significant increase in the accuracy of LDA with the randomly sampled data over the data from the Streaming API....
[...]
...We compare the topics drawn from the Streaming data with those drawn from the Firehose data using a widely-used topic modeling algorithm, latent Dirichlet allocation (LDA) (Blei, Ng, and Jordan 2003)....
[...]
...To get a sense of how the topics found in the Streaming data compare with those found with random samples, we compare with topics found by running LDA on random subsamples of the Firehose data....
[...]
...We also employed LDA to extract topics from the text....
[...]
836 citations
Cites methods from "Latent dirichlet allocation"
...Blei and Lafferty (2009) developed “Turbo Topics”, a method of identifying n-grams within LDA-inferred topics that, when listed in decreasing order of probability, provide users with extra information about the usage of terms within topics. This two-stage process yields good results on experimental data, although the resulting output is still simply a ranked list containing a mixture of terms and n-grams, and the usefulness of the method for topic interpretation was not tested in a user study. Newman et al. (2010) describe a method for ranking terms within topics to aid interpretability called Pointwise Mutual Information (PMI) ranking. Under PMI ranking of terms, each of the ten most probable terms within a topic is ranked in decreasing order of approximately how often it occurs in close proximity to the nine other most probable terms from that topic in some large, external “reference” corpus, such as Wikipedia or Google n-grams. Although this method correlated highly with human judgments of term importance within topics, it does not easily generalize to topic models fit to corpora that don’t have a readily available external source of word co-occurrences. In contrast, Taddy (2011) uses an intrinsic measure to rank terms within topics: a quantity called lift, defined as the ratio of a term’s probability within a topic to its marginal probability across the corpus....
[...]
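The two rankings described above reduce to short computations. Below is a hedged sketch of both, assuming a hypothetical (K, V) array `topic_word` of p(word | topic), a length-V array `word_freq` of corpus term counts, and stand-in `cooccur(a, b)`, `term_count`, and `total_windows` handles over an external reference corpus such as Wikipedia; none of these names come from the cited papers.

```python
# Sketch: lift (Taddy 2011) and PMI ranking (Newman et al. 2010) of a topic's terms.
import numpy as np

def lift_ranking(topic_word, word_freq, k, topn=10):
    """Lift: p(w | topic k) / p(w), applied to the topic's most probable terms."""
    p_w = word_freq / word_freq.sum()             # marginal term probabilities
    top = np.argsort(topic_word[k])[::-1][:topn]  # top terms by in-topic probability
    lift = topic_word[k, top] / p_w[top]
    return top[np.argsort(lift)[::-1]]            # same terms, re-ranked by lift

def pmi_ranking(top_terms, cooccur, term_count, total_windows):
    """PMI ranking: order a topic's top terms by their total PMI with the others."""
    def pmi(a, b):
        p_ab = cooccur(a, b) / total_windows
        p_a, p_b = term_count[a] / total_windows, term_count[b] / total_windows
        return np.log(p_ab / (p_a * p_b)) if p_ab > 0 else float("-inf")
    scores = {a: sum(pmi(a, b) for b in top_terms if b != a) for a in top_terms}
    return sorted(top_terms, key=scores.get, reverse=True)
```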
832 citations
Cites background or methods from "Latent dirichlet allocation"
...…collection, in the form of methods such as latent semantic analysis (Deerwester et al., 1990), probabilistic latent semantic analysis (Hofmann, 2001), random projection (Widdows and Ferraro, 2008), and more recently, latent Dirichlet allocation (Blei et al., 2003; Griffiths and Steyvers, 2004)....
[...]
..., 1990), probabilistic latent semantic analysis (Hofmann, 2001), random projection (Widdows and Ferraro, 2008), and more recently, latent Dirichlet allocation (Blei et al., 2003; Griffiths and Steyvers, 2004)....
[...]
...LDA is a Bayesian graphical model for text document collections represented by bags-of-words (see Blei et al. (2003), Griffiths and Steyvers (2004), Buntine and Jakulin (2004))....
[...]
828 citations
Cites background or methods from "Latent dirichlet allocation"
...For LDA, CTR and HFT, the number of topics K is selected from {5, 10, 20, 50, 100} using the validation set....
[...]
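That kind of model selection is a small loop. A sketch assuming the gensim library and hypothetical `train_corpus`, `valid_corpus` (bag-of-words corpora) and `dictionary` objects; held-out per-word likelihood is used here only as a placeholder criterion, since the cited paper selects K on its own validation metric.

```python
# Sketch: pick the number of topics K on a held-out validation set.
from gensim.models import LdaModel

best_K, best_score = None, float("-inf")
for K in (5, 10, 20, 50, 100):
    lda = LdaModel(corpus=train_corpus, id2word=dictionary,
                   num_topics=K, passes=10, random_state=0)
    score = lda.log_perplexity(valid_corpus)  # per-word bound; higher is better
    if score > best_score:
        best_K, best_score = K, score
```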
...To compare our model with topic modeling based recommender systems, we select three representative models: Latent Dirichlet Allocation (LDA) [5], Collaborative Topic Regression (CTR) [33] and Hidden Factor as Topic (HFT) [17], and (iii) deep recommender systems....
[...]
...Among those topic modeling based models (LDA, CTR and HFT), both HFT-10 and HFT-50 perform better on all three datasets....
[...]
...• LDA: Latent Dirichlet Allocation is a well-known topic modeling algorithm presented in [5]....
[...]
...We set K = 10 for LDA and CTR....
[...]
823 citations
Cites background from "Latent dirichlet allocation"
...Basically, LDA (i.e. Latent Dirichlet Allocation [4]) is a statistical generative model that relies on a hierarchical Bayesian network that relates words and messages through latent topics....
[...]
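The hierarchical Bayesian network referred to above corresponds to LDA's generative story. A minimal simulation of that story (smoothed LDA), with illustrative sizes and hyperparameter values that are assumptions, not values from the cited paper:

```python
# Sketch: the LDA generative process, not its inference.
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 1000, 5, 50        # topics, vocabulary size, documents, words/doc
alpha, eta = 0.1, 0.01             # Dirichlet hyperparameters (assumed values)

beta = rng.dirichlet(np.full(V, eta), size=K)      # one word distribution per topic
docs = []
for _ in range(D):
    theta = rng.dirichlet(np.full(K, alpha))       # per-document topic mixture
    z = rng.choice(K, size=N, p=theta)             # latent topic for each word slot
    words = [rng.choice(V, p=beta[k]) for k in z]  # word drawn from its topic
    docs.append(words)
```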
References
16,079 citations
"Latent dirichlet allocation" refers background in this paper
...Finally, Griffiths and Steyvers (2002) have presented a Markov chain Monte Carlo algorithm for LDA....
[...]
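The Markov chain Monte Carlo algorithm referred to is collapsed Gibbs sampling over the topic assignments. A compact sketch in that spirit, assuming `docs` is a hypothetical list of documents given as lists of word ids in [0, V); this is an illustration, not the authors' implementation.

```python
# Sketch: collapsed Gibbs sampling for LDA.
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # total words assigned to each topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):   # seed the count tables
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]          # take the word out of the counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z_dn = k | all other assignments)
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k          # resample and put the word back
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw                  # count tables from which estimates follow
```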
...Structures similar to that shown in Figure 1 are often studied in Bayesian statistical modeling, where they are referred to as hierarchical models (Gelman et al., 1995), or more precisely as conditionally independent hierarchical models (Kass and Steffey, 1989)....
[...]
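In LDA that conditional-independence structure is explicit: given the corpus-level parameters, each document's topic mixture is drawn once, and the words are conditionally independent given their topic assignments. The per-document joint distribution in the paper factorizes as:

```latex
% Joint distribution of a topic mixture \theta, topic assignments z_{1:N},
% and words w_{1:N}, given the corpus-level parameters \alpha and \beta:
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```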
12,443 citations
"Latent dirichlet allocation" refers methods in this paper
...To address these shortcomings, IR researchers have proposed several other dimensionality reduction techniques, most notably latent semantic indexing (LSI) (Deerwester et al., 1990)....
[...]
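For contrast with LDA, LSI reduces dimensionality by a truncated SVD of the term-document matrix. A minimal sketch, where `X` (terms × documents) and the rank `k` are assumptions for illustration:

```python
# Sketch: latent semantic indexing via truncated SVD of a term-document matrix X.
import numpy as np

def lsi(X, k):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]      # keep the top-k singular triples
    term_vectors = U_k                            # terms in the k-dim latent space
    doc_vectors = (np.diag(s_k) @ Vt_k).T         # documents in the same space
    return term_vectors, doc_vectors
```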
12,059 citations
"Latent dirichlet allocation" refers background or methods in this paper
...In the popular tf-idf scheme (Salton and McGill, 1983), a basic vocabulary of “words” or “terms” is chosen, and, for each document in the corpus, a count is formed of the number of occurrences of each word....
[...]
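The tf-idf scheme described above amounts to a term-frequency count reweighted by an inverse document frequency. A minimal sketch over a hypothetical list of tokenized documents; the normalization shown is one common convention, not necessarily Salton and McGill's exact formulation.

```python
# Sketch: tf-idf weights for a list of tokenized documents.
import math
from collections import Counter

def tfidf(docs):
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequencies
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(N / df[t])
                        for t, c in tf.items()})
    return weights
```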
...We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model....
[...]
7,086 citations