Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic, also known as LDA. Over its lifetime, 5,351 publications have appeared within this topic, receiving 212,555 citations.

Papers

Proceedings Article

Replicated Softmax: an Undirected Topic Model

Geoffrey E. Hinton¹, Ruslan Salakhutdinov²

University of Toronto¹, Massachusetts Institute of Technology²

07 Dec 2009

TL;DR: This work introduces a two-layer undirected graphical model, called a "Replicated Softmax", that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents.

Abstract: We introduce a two-layer undirected graphical model, called a "Replicated Softmax", that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.

541 citations

Discovering object categories in image collections

Josef Sivic¹, Bryan Russell¹, Alexei A. Efros², Andrew Zisserman², William T. Freeman³

University of Oxford¹, Massachusetts Institute of Technology², Carnegie Mellon University³

25 Feb 2005

TL;DR: Given a set of images containing multiple object categories, this work seeks to discover those categories and their image locations without supervision using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA).

Abstract: Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of [7]. This work was sponsored in part by the EU Project CogViSys, the University of Oxford, Shell Oil, and the National Geospatial-Intelligence Agency.

524 citations
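
The "visual analogue of a word" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a codebook of cluster centres has already been learned (for example by k-means); each region descriptor is mapped to its nearest centre and the image becomes a histogram of visual-word counts, which a topic model can then treat like a bag-of-words document. The function names and the 2-D toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def quantize(descriptor, codebook):
    """Return the index of the codebook centre nearest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bag_of_visual_words(descriptors, codebook):
    """Count how often each visual word occurs in one image."""
    counts = Counter(quantize(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]


# Hypothetical 2-D toy data; real SIFT descriptors are 128-dimensional.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]

histogram = bag_of_visual_words(image_descriptors, codebook)
print(histogram)  # one count per visual word
```

The resulting count vector plays the role that a word-count vector plays for a text document.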

Journal Article

Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models

Xiaogang Wang¹, Xiaoxu Ma¹, W.E.L. Grimson¹

Massachusetts Institute of Technology¹

01 Mar 2009 - IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work proposes a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes with many kinds of co-occurring activities, built on three hierarchical Bayesian models that advance existing language models such as LDA and HDP.

Abstract: We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learnt in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models, Latent Dirichlet Allocation (LDA) mixture model, Hierarchical Dirichlet Process (HDP) mixture model, and Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking and human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.

522 citations

Proceedings Article

Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

Xuerui Wang¹, Andrew McCallum¹, Xing Wei¹

University of Massachusetts Amherst¹

28 Oct 2007

TL;DR: Topical n-grams is a probabilistic topic model that generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topic-specific unigram or bigram distribution.

Abstract: Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical n-grams, a topic model that discovers topics as well as topical phrases. The probabilistic model generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topic-specific unigram or bigram distribution. Thus our model can model "white house" as a special meaning phrase in the 'politics' topic, but not in the 'real estate' topic. Successive bigrams form longer phrases. We present experiments showing meaningful phrases and more interpretable topics from the NIPS data and improved information retrieval performance on a TREC collection.

510 citations
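
The generative process outlined in the Topical N-Grams abstract above can be sketched roughly as follows. This is a simplified illustration that follows only the ordering given in the abstract (topic, then unigram/bigram status, then word); it is not the authors' full model. All probability tables are hypothetical toy values, and conditioning the bigram status on the current topic and previous word is an assumption made to keep the sketch short.

```python
import random


def sample(table, rng):
    """Draw one key from a dict mapping outcomes to probabilities."""
    outcomes = list(table)
    return rng.choices(outcomes, weights=[table[o] for o in outcomes], k=1)[0]


# Hypothetical toy parameters (assumptions, not values from the paper).
doc_topics = {"politics": 0.5, "real estate": 0.5}
unigram = {
    "politics": {"white": 0.4, "senate": 0.6},
    "real estate": {"white": 0.3, "house": 0.4, "garden": 0.3},
}
# Bigram word distributions, indexed by (topic, previous word).
bigram = {("politics", "white"): {"house": 1.0}}
# Probability that the next word continues a bigram, per (topic, previous word).
p_bigram = {("politics", "white"): 0.9}


def generate(n_words, seed=0):
    """Generate (word, topic, is_bigram) triples in textual order."""
    rng = random.Random(seed)
    words, prev = [], None
    for _ in range(n_words):
        topic = sample(doc_topics, rng)                  # 1. sample a topic
        key = (topic, prev)
        is_bigram = key in bigram and rng.random() < p_bigram.get(key, 0.0)  # 2. status
        table = bigram[key] if is_bigram else unigram[topic]
        prev = sample(table, rng)                        # 3. sample the word
        words.append((prev, topic, is_bigram))
    return words


for word, topic, is_bigram in generate(10):
    print(f"{word:8s} topic={topic:12s} bigram={is_bigram}")
```

With these toy tables, "house" can follow "white" as a bigram only under the 'politics' topic, mirroring the "white house" example in the abstract.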

Proceedings Article

Latent Dirichlet allocation for tag recommendation

Ralf Krestel, Peter Fankhauser, Wolfgang Nejdl

23 Oct 2009

TL;DR: This paper introduces an approach based on Latent Dirichlet Allocation (LDA) for recommending tags of resources in order to improve search and shows that the approach achieves significantly better precision and recall than the use of association rules.

Abstract: Tagging systems have become major infrastructures on the Web. They allow users to create tags that annotate and categorize content and share them with other users, very helpful in particular for searching multimedia content. However, as tagging is not constrained by a controlled vocabulary and annotation guidelines, tags tend to be noisy and sparse. Especially new resources annotated by only a few users have often rather idiosyncratic tags that do not reflect a common perspective useful for search. In this paper we introduce an approach based on Latent Dirichlet Allocation (LDA) for recommending tags of resources in order to improve search. Resources annotated by many users and thus equipped with a fairly stable and complete tag set are used to elicit latent topics to which new resources with only a few tags are mapped. Based on this, other tags belonging to a topic can be recommended for the new resource. Our evaluation shows that the approach achieves significantly better precision and recall than the use of association rules, suggested in previous work, and also recommends more specific tags. Moreover, extending resources with these recommended tags significantly improves search for new resources.

500 citations
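
The idea in the tag recommendation abstract above, mapping a sparsely tagged resource to latent topics and then recommending other tags from those topics, can be sketched as below. The topic-tag table is a hand-written stand-in for a trained LDA model, and the topic mixture is estimated with a crude averaging rule rather than proper LDA inference; both are assumptions for illustration, not the authors' procedure. Candidate tags are scored as the sum over topics of P(tag | topic) times P(topic | resource).

```python
# Hypothetical topic-tag distributions, standing in for a trained LDA model.
topic_tags = {
    "photography": {"camera": 0.4, "lens": 0.35, "photo": 0.25},
    "travel": {"photo": 0.2, "beach": 0.4, "hotel": 0.4},
}


def topic_mixture(tags):
    """Crude estimate of P(topic | resource): average the normalised
    P(tag | topic) over the resource's known tags (uniform topic prior)."""
    mix = {z: 0.0 for z in topic_tags}
    known = [t for t in tags if any(t in table for table in topic_tags.values())]
    for t in known:
        likes = {z: topic_tags[z].get(t, 0.0) for z in topic_tags}
        total = sum(likes.values())
        for z in mix:
            mix[z] += likes[z] / total / len(known)
    return mix


def recommend(tags, n=2):
    """Rank tags not yet on the resource by sum_z P(tag | z) * P(z | resource)."""
    mix = topic_mixture(tags)
    scores = {}
    for z, table in topic_tags.items():
        for tag, p in table.items():
            if tag not in tags:
                scores[tag] = scores.get(tag, 0.0) + mix[z] * p
    return sorted(scores, key=scores.get, reverse=True)[:n]


suggested = recommend(["camera"])
print(suggested)
```

A resource tagged only "camera" is mapped entirely to the hypothetical 'photography' topic here, so the remaining tags of that topic rank first.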

Metrics

Papers: 6,513

Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446
