Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Journal ArticleDOI
TL;DR: This paper employs Latent Dirichlet Allocation (LDA) to build user profile signatures, assuming that any significant, unexplainable deviation from an individual user's normal activity is strongly correlated with fraudulent activity.

85 citations
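As a rough illustration of the approach above, here is a minimal sketch of LDA-based user-profile signatures for anomaly scoring, assuming gensim and scipy are available; the event vocabulary, threshold, and helper names are illustrative and not taken from the paper.

```python
# Hedged sketch: LDA user-profile signatures for anomaly scoring.
# Event tokens, threshold, and structure are illustrative, not from the paper.
import numpy as np
from gensim import corpora, models
from scipy.spatial.distance import jensenshannon

NUM_TOPICS = 5

def topic_signature(lda, dictionary, events):
    """Dense topic distribution for one bag of activity tokens."""
    bow = dictionary.doc2bow(events)
    probs = np.zeros(NUM_TOPICS)
    for topic_id, p in lda.get_document_topics(bow, minimum_probability=0.0):
        probs[topic_id] = p
    return probs

# Each "document" is one user-day of activity tokens (actions, resources, ...).
history = [["login", "read_mail", "read_mail"],
           ["login", "upload", "read_mail"]]
dictionary = corpora.Dictionary(history)
corpus = [dictionary.doc2bow(doc) for doc in history]
lda = models.LdaModel(corpus, num_topics=NUM_TOPICS, id2word=dictionary, passes=10)

# Profile signature = mean topic mixture over the user's own past days.
baseline = np.mean([topic_signature(lda, dictionary, d) for d in history], axis=0)
today = topic_signature(lda, dictionary, ["login", "upload", "upload"])

# A large divergence from the user's own baseline is flagged as suspicious.
if jensenshannon(baseline, today) > 0.5:  # threshold is illustrative
    print("possible fraudulent activity")
```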

Posted Content
TL;DR: The Structural Topic Model (STM) as mentioned in this paper provides a general way to incorporate corpus structure or document metadata (e.g., author, partisan affiliation, date) into the standard topic model.
Abstract: We develop the Structural Topic Model, which provides a general way to incorporate corpus structure or document metadata into the standard topic model. Document-level covariates enter the model through a simple generalized linear model framework in the prior distributions controlling either topical prevalence or topical content. We demonstrate the model's use in two applied problems: the analysis of open-ended responses in a survey experiment about immigration policy, and understanding differing media coverage of China's rise.

1 Topic Models and Social Science

Over the last decade probabilistic topic models, such as Latent Dirichlet Allocation (LDA), have become a common tool for understanding large text corpora [1]. Although originally developed for descriptive and exploratory purposes, social scientists are increasingly seeing the value of topic models as a tool for measurement of latent linguistic, political and psychological variables [2]. The defining element of this work is the presence of additional document-level information (e.g. author, partisan affiliation, date) on which variation in either topical prevalence or topical content is of theoretic interest. As a practical matter, this generally involves running an off-the-shelf implementation of LDA and then performing a post-hoc evaluation of variation with a covariate of interest. A better alternative to post-hoc comparisons is to build the additional information about the structure of the corpus into the model itself by altering the prior distributions to partially pool information amongst similar documents. Numerous special cases of this framework have been developed for particular types of corpus structure affecting both topical prevalence (e.g. time [3], author [4]) and topical content (e.g. ideology [5], geography [6]). Applied users have been slow to adopt these models because it is often difficult to find a model that exactly fits their specific corpus. We develop the Structural Topic Model (STM), which accommodates corpus structure through document-level covariates affecting topical prevalence and/or topical content. The central idea is to …

Footnotes: Prepared for the NIPS 2013 Workshop on Topic Models: Computation, Application, and Evaluation; a forthcoming R package implements the methods described here. The authors contributed equally. We assume a general familiarity with LDA throughout (see [1] for a review). By "topical prevalence" we mean the proportion of a document devoted to a given topic; by "topical content" we mean the rate of word use within a given topic.

85 citations
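A minimal sketch of the STM's topical-prevalence idea as stated in the abstract: document-level covariates shift the mean of a logistic-normal prior over topic proportions through a linear model. The dimensions, covariate values, and parameters below are invented for illustration.

```python
# Hedged sketch of covariate-driven topical prevalence (STM-style prior).
import numpy as np

rng = np.random.default_rng(0)
K, P = 4, 2                          # number of topics, number of covariates
X_d = np.array([1.0, 1.0])           # e.g. [intercept, partisan affiliation]
Gamma = rng.normal(size=(P, K - 1))  # prevalence coefficients (K-1 free dims)
Sigma = 0.5 * np.eye(K - 1)          # covariance between topics

# Logistic-normal draw whose mean is a linear function of the covariates.
eta = rng.multivariate_normal(X_d @ Gamma, Sigma)
eta = np.append(eta, 0.0)            # last topic is the reference category
theta = np.exp(eta) / np.exp(eta).sum()  # map onto the simplex
print(theta)                         # this document's topic proportions
```

Changing X_d moves the prior mean, which is how similar documents get partially pooled instead of being compared post hoc.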

Journal ArticleDOI
TL;DR: Correlation Explanation (CorEx) is introduced as an alternative approach to topic modeling that does not assume an underlying generative model and instead learns maximally informative topics through an information-theoretic framework, which generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions.
Abstract: While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.

85 citations
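A minimal usage sketch, assuming the open-source corextopic Python package (pip install corextopic); the toy corpus and anchor words are invented for illustration.

```python
# Hedged sketch: anchored CorEx topics on a tiny invented corpus.
import scipy.sparse as ss
from sklearn.feature_extraction.text import CountVectorizer
from corextopic import corextopic as ct

docs = [
    "the senate debated the immigration bill",
    "the team won the championship game",
    "new visa rules tighten border policy",
    "the striker scored in the final match",
]
vectorizer = CountVectorizer(binary=True)      # CorEx works on binary counts
X = ss.csr_matrix(vectorizer.fit_transform(docs))
words = list(vectorizer.get_feature_names_out())

# Anchor words nudge topics toward domain concepts with minimal supervision.
model = ct.Corex(n_hidden=2, seed=1)
model.fit(X, words=words,
          anchors=[["immigration", "visa"], ["game", "match"]],
          anchor_strength=3)

for i, topic in enumerate(model.get_topics()):
    print(f"topic {i}:", [w for w, *_ in topic])
```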

Journal ArticleDOI
TL;DR: Through applications involving real-data classification and image database categorization using visual words, it is shown that clustering via infinite mixture models offers more powerful and robust performance than classic finite mixtures.
Abstract: In this paper, we propose a clustering algorithm based on both Dirichlet processes and the generalized Dirichlet distribution, which has been shown to be very flexible for proportional data modeling. Our approach can be viewed as an extension of the finite generalized Dirichlet mixture model to the infinite case. The extension is based on nonparametric Bayesian analysis. This clustering algorithm does not require the number of mixture components to be specified in advance and estimates it in a principled manner. Our approach is Bayesian and relies on the estimation of the posterior distribution of clusterings using a Gibbs sampler. Through applications involving real-data classification and image database categorization using visual words, we show that clustering via infinite mixture models offers more powerful and robust performance than classic finite mixtures.

85 citations
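The paper's model is an infinite generalized Dirichlet mixture fit by Gibbs sampling, for which no off-the-shelf implementation is assumed here. The sketch below instead uses scikit-learn's variational Dirichlet-process Gaussian mixture, a different component distribution and inference method, only to illustrate the key practical property: the effective number of clusters is inferred rather than fixed in advance.

```python
# Hedged sketch: a Dirichlet-process mixture infers the number of clusters.
# (Gaussian components and variational inference, unlike the paper's model.)
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[3, 3], scale=0.3, size=(100, 2)),
    rng.normal(loc=[0, 3], scale=0.3, size=(100, 2)),
])

# n_components is only an upper bound; unused components get ~zero weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

effective = np.sum(dpgmm.weights_ > 0.01)
print(f"effective clusters: {effective}")   # typically 3 for this data
```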

Journal ArticleDOI
TL;DR: The Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections is introduced, and it is shown how "hub images" (images representative of an object or landmark) can easily be extracted from the authors' matching-graph representation.
Abstract: Given a large-scale collection of images our aim is to efficiently associate images which contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. This explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked. Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how "hub images" (images representative of an object or landmark) can easily be extracted from our matching graph representation. We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets of images taken of the Statue of Liberty (37K images) and Rome (1M+ images).

85 citations
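A minimal sketch of the matching-graph stage described above, using networkx; the pairwise verified-match counts are toy values standing in for the paper's visual-word matching with a spatial-consistency check, and the threshold is illustrative.

```python
# Hedged sketch: images as nodes, edges between pairs with enough
# geometrically verified matches; rough groups = connected components.
import networkx as nx

# Toy counts of spatially verified matches between 5 images (stand-ins for
# SIFT/visual-word matching plus a geometric verification step).
verified = {(0, 1): 45, (1, 2): 30, (3, 4): 55, (0, 2): 5}

def build_matching_graph(num_images, verified, min_matches=20):
    g = nx.Graph()
    g.add_nodes_from(range(num_images))
    for (i, j), n in verified.items():
        if n >= min_matches:
            g.add_edge(i, j)
    return g

graph = build_matching_graph(5, verified)
print(list(nx.connected_components(graph)))  # rough object groups: {0,1,2}, {3,4}

# A "hub image" is simply a high-degree node in the matching graph.
hub = max(graph.degree, key=lambda kv: kv[1])[0]
print("hub image:", hub)
```

Each connected component can then be refined with the gLDA model, which is what keeps the method tractable on the larger Statue of Liberty and Rome datasets.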


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
Number of papers in the topic in previous years:

Year  Papers
2023  323
2022  842
2021  418
2020  429
2019  473
2018  446