Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5351 publications have been published within this topic receiving 212555 citations. The topic is also known as: LDA.


Papers
Proceedings Article
11 Mar 2007
TL;DR: This paper proposes modeling the topics of words in the document as a Markov chain, and shows that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics.
Abstract: Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word-document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.
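The core idea above, topics forming a "sticky" Markov chain over sentences, can be sketched with the standard HMM forward algorithm. All numbers below are illustrative toy values, not parameters from the paper:

```python
# Toy sketch: sentence topics form a Markov chain whose transition matrix
# favours staying in the same topic. The forward algorithm then scores how
# probable a sequence of per-sentence observations is under that model.

def forward(pi, A, B, obs):
    """HMM forward algorithm: returns P(observation sequence).
    pi: initial topic distribution; A[j][k]: P(topic k | previous topic j);
    B[k][o]: P(observing symbol o | topic k)."""
    K = len(pi)
    alpha = [pi[k] * B[k][obs[0]] for k in range(K)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[j] * A[j][k] for j in range(K)) * B[k][obs[t]]
                 for k in range(K)]
    return sum(alpha)

# Two topics; persisting in a topic (0.9) is far likelier than switching (0.1).
pi = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [[0.8, 0.2],   # topic 0 mostly emits symbol 0
     [0.2, 0.8]]   # topic 1 mostly emits symbol 1

coherent = forward(pi, A, B, [0, 0, 0, 1, 1, 1])   # long topic runs
scattered = forward(pi, A, B, [0, 1, 0, 1, 0, 1])  # constant switching
```

Under the sticky transition matrix, the coherent sequence with long same-topic runs scores a higher likelihood than the constantly switching one, which is exactly the dependency the paper exploits.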

290 citations

Book ChapterDOI
01 Apr 2010
TL;DR: The role of hierarchical modeling in Bayesian nonparametrics is discussed, focusing on models in which the infinite-dimensional parameters are treated hierarchically, and the value of these hierarchical constructions is demonstrated in a wide range of practical applications.
Abstract: Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with distributions which may themselves introduce new parameters, and this construction recurses. In this review we discuss the role of hierarchical modeling in Bayesian nonparametrics, focusing on models in which the infinite-dimensional parameters are treated hierarchically. For example, we consider a model in which the base measure for a Dirichlet process is itself treated as a draw from another Dirichlet process. This yields a natural recursion that we refer to as a hierarchical Dirichlet process. We also discuss hierarchies based on the Pitman-Yor process and on completely random processes. We demonstrate the value of these hierarchical constructions in a wide range of practical applications, in problems in computational biology, computer vision and natural language processing.

290 citations

Proceedings Article
03 Dec 2007
TL;DR: A topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems, is proposed and used to discover objects from a collection of images.
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a "bag-of-words". It is also critical to properly design "words" and "documents" when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.
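SLDA's central design move is that spatial closeness, not a fixed partition, determines which "document" a visual word belongs to. A full SLDA sampler treats that assignment as a hidden variable; the toy below shows only the hard-assignment special case, binning (x, y) word locations into grid-cell documents:

```python
# Toy version of "documents from space": visual words that are close in
# image coordinates land in the same grid-cell document. This is a fixed
# assignment; SLDA instead makes the word-document mapping a hidden
# random variable with a spatial prior.

def words_to_documents(points, cell=10):
    """Group visual-word (x, y) locations into grid-cell 'documents'."""
    docs = {}
    for x, y in points:
        key = (x // cell, y // cell)   # which grid cell the word falls in
        docs.setdefault(key, []).append((x, y))
    return docs

points = [(1, 2), (3, 4), (25, 27), (26, 29), (99, 1)]
docs = words_to_documents(points)
```

The two nearby point clusters end up in their own documents while the isolated point forms a third, so co-occurrence statistics are computed over spatial neighbourhoods rather than whole images.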

287 citations

Journal ArticleDOI
TL;DR: This research develops a statistical framework to help discover semantically meaningful topics and functional regions based on the co-occurrence patterns of POI types and demonstrates the effectiveness of the proposed methodology by identifying distinctive types of latent topics and by extracting urban functional regions.
Abstract: Data about points of interest (POI) have been widely used in studying urban land use types and for sensing human behavior. However, it is difficult to quantify the correct mix or the spatial relations among different POI types indicative of specific urban functions. In this research, we develop a statistical framework to help discover semantically meaningful topics and functional regions based on the co-occurrence patterns of POI types. The framework applies the latent Dirichlet allocation (LDA) topic modeling technique and incorporates user check-in activities on location-based social networks. Using a large corpus of about 100,000 Foursquare venues and user check-in behavior in the 10 most populated urban areas of the US, we demonstrate the effectiveness of our proposed methodology by identifying distinctive types of latent topics and, further, by extracting urban functional regions using K-means clustering and Delaunay triangulation spatial constraints clustering. We show that a region can support multiple functions but with different probabilities, while the same type of functional region can span multiple geographically non-adjacent locations. Since each region can be modeled as a vector consisting of multinomial topic distributions, similar regions with regard to their thematic topic signatures can be identified. Compared with remote sensing images, which mainly uncover the physical landscape of urban environments, our popularity-based POI topic modeling approach can be seen as a complementary social sensing view on urban space based on human activities.
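Since the paper models each region as a vector of topic probabilities, regions can be compared by their thematic signatures regardless of where they sit on the map. A minimal sketch with made-up three-topic distributions (the topic labels and numbers are illustrative, not from the paper):

```python
# Regions as topic-probability vectors: cosine similarity on these vectors
# finds functionally similar regions even when they are not adjacent.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical topic mixes over [nightlife, office, residential]:
downtown  = [0.2, 0.7, 0.1]
tech_park = [0.1, 0.8, 0.1]   # geographically distant, functionally similar
suburb    = [0.1, 0.1, 0.8]

sim_function = cosine(downtown, tech_park)
sim_contrast = cosine(downtown, suburb)
```

The office-dominated downtown and tech park come out far more similar to each other than either is to the residential suburb, which is the "same function, non-adjacent locations" finding in miniature.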

283 citations

Proceedings ArticleDOI
20 Oct 2012
TL;DR: This paper formally justifies Nonnegative Matrix Factorization (NMF) as a main tool in this context, which is an analog of SVD where all vectors are nonnegative, and gives the first polynomial-time algorithm for learning topic models without the above two limitations.
Abstract: Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational works both in machine learning and in theory have suggested a probabilistic model for documents, whereby documents arise as a convex combination of (i.e. distribution on) a small number of topic vectors, each topic vector being a distribution on words (i.e. a vector of word-frequencies). Similar models have since been used in a variety of application areas; the Latent Dirichlet Allocation (LDA) model of Blei et al. is especially popular. Theoretical studies of topic modeling focus on learning the model's parameters assuming the data is actually generated from it. Existing approaches for the most part rely on Singular Value Decomposition (SVD), and consequently have one of two limitations: these works need to either assume that each document contains only one topic, or else can only recover the span of the topic vectors instead of the topic vectors themselves. This paper formally justifies Nonnegative Matrix Factorization (NMF) as a main tool in this context, which is an analog of SVD where all vectors are nonnegative. Using this tool we give the first polynomial-time algorithm for learning topic models without the above two limitations. The algorithm uses a fairly mild assumption about the underlying topic matrix called separability, which is usually found to hold in real-life data. Perhaps the most attractive feature of our algorithm is that it generalizes to yet more realistic models that incorporate topic-topic correlations, such as the Correlated Topic Model (CTM) and the Pachinko Allocation Model (PAM). We hope that this paper will motivate further theoretical results that use NMF as a replacement for SVD -- just as NMF has come to replace SVD in many applications.
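The separability assumption named above says each topic has an "anchor word" whose probability mass sits entirely in that one topic. A toy check of that property on a small word-topic matrix (rows are words, columns are topics; the numbers are illustrative, and real anchor-based algorithms do much more than this scan):

```python
# Detect anchor words in a word-topic matrix: a word is an anchor for a
# topic if it has nonzero probability under that topic and zero under all
# others. Separability = every topic has at least one anchor word.

def anchor_words(word_topic):
    """Return (word_index, topic_index) pairs for single-topic words."""
    anchors = []
    for w, row in enumerate(word_topic):
        nonzero = [k for k, p in enumerate(row) if p > 0]
        if len(nonzero) == 1:
            anchors.append((w, nonzero[0]))
    return anchors

# Each column is a topic's distribution over the four words (sums to 1).
W = [
    [0.5, 0.0],   # word 0: anchor for topic 0
    [0.0, 0.4],   # word 1: anchor for topic 1
    [0.3, 0.3],   # word 2: shared between both topics
    [0.2, 0.3],   # word 3: shared between both topics
]
anchors = anchor_words(W)
```

Here both topics have an anchor, so this toy matrix is separable; the paper's algorithm uses such anchors to pin down the actual topic vectors rather than just their span.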

277 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year | Papers
2023 | 323
2022 | 842
2021 | 418
2020 | 429
2019 | 473
2018 | 446