Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published on this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Proceedings Article
14 Jun 2009
TL;DR: A topic model is introduced that uses Dirichlet compound multinomial (DCM) distributions to model the burstiness of words and achieves better held-out likelihood than standard latent Dirichlet allocation (LDA).
Abstract: Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.

104 citations

Proceedings Article
09 Jul 2007
TL;DR: This work studies the representation of images by Latent Dirichlet Allocation (LDA) models for content-based image retrieval, and shows the suitability of the approach for large-scale databases.
Abstract: Online image repositories such as Flickr contain hundreds of millions of images and are growing quickly. Along with that, the need to support indexing, searching, and browsing is becoming more and more pressing. In this work we employ the image content as a source of information to retrieve images. We study the representation of images by Latent Dirichlet Allocation (LDA) models for content-based image retrieval. Image representations are learned in an unsupervised fashion, and each image is modeled as a mixture of the topics/object parts depicted in the image. This allows us to put images into subspaces for higher-level reasoning, which in turn can be used to find similar images. Different similarity measures based on the described image representation are studied. The presented approach is evaluated on a real-world image database consisting of more than 246,000 images and compared to image models based on probabilistic Latent Semantic Analysis (pLSA). Results show the suitability of the approach for large-scale databases. Finally, we incorporate active learning with user relevance feedback in our framework, which further boosts the retrieval performance.

104 citations

Proceedings Article
06 Aug 2009
TL;DR: A new model, ccLDA, is proposed that extends the Latent Dirichlet Allocation (LDA) and cross-collection mixture (ccMix) models on blogs and forums, with a qualitative and quantitative analysis of the model on cross-cultural data.
Abstract: This paper presents preliminary results on the detection of cultural differences from people's experiences in various countries from two perspectives: tourists and locals. Our approach is to develop probabilistic models that would provide a good framework for such studies. Thus, we propose here a new model, ccLDA, which extends over the Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and cross-collection mixture (ccMix) (Zhai et al., 2004) models on blogs and forums. We also provide a qualitative and quantitative analysis of the model on the cross-cultural data.

103 citations

Book Chapter
01 Jan 2015
TL;DR: This chapter gives an introduction to music recommender systems research, highlighting the distinctive characteristics of music, as compared to other kinds of media, and pointing to the most important challenges faced by music recommendation research.
Abstract: This chapter gives an introduction to music recommender systems research. We highlight the distinctive characteristics of music, as compared to other kinds of media. We then provide a literature survey of content-based music recommendation, contextual music recommendation, hybrid methods, and sequential music recommendation, followed by an overview of evaluation strategies and commonly used data sets. We conclude by pointing to the most important challenges faced by music recommendation research.

103 citations

Proceedings Article
01 Jan 2015
TL;DR: Experiments using a state-of-the-art LVCSR system showed that adaptation could yield relative perplexity reductions of 8% over the baseline RNNLM and small but consistent word error rate reductions.
Abstract: Copyright © 2015 ISCA. Recurrent neural network language models (RNNLMs) have recently become increasingly popular for many applications including speech recognition. In previous research RNNLMs have normally been trained on well-matched in-domain data. The adaptation of RNNLMs remains an open research area to be explored. In this paper, genre and topic based RNNLM adaptation techniques are investigated for a multi-genre broadcast transcription task. A number of techniques including Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation and Hierarchical Dirichlet Processes are used to extract show-level topic information. This information was then used as additional input to the RNNLM during training, which can facilitate unsupervised test-time adaptation. Experiments using a state-of-the-art LVCSR system trained on 1000 hours of speech and more than 1 billion words of text showed that adaptation could yield relative perplexity reductions of 8% over the baseline RNNLM and small but consistent word error rate reductions.

102 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446