
Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
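
For context: LDA represents each document as a mixture of latent topics, each topic being a distribution over words. Below is a minimal sketch using scikit-learn's LatentDirichletAllocation; the toy corpus and parameter choices are illustrative assumptions, not drawn from any paper listed on this page.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus; real applications of LDA use thousands of documents.
docs = [
    "traffic crash counts and road safety analysis",
    "word classes induced from distributional evidence",
    "topic models for multi-label text classification",
    "image features for material recognition",
]

vec = CountVectorizer()
counts = vec.fit_transform(docs)  # LDA operates on raw term counts, not TF-IDF

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic proportions

# Top words per topic, ranked by the learned topic-word weights.
vocab = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in np.argsort(weights)[::-1][:3]]
    print(f"topic {k}: {', '.join(top)}")
```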


Papers
Journal ArticleDOI
TL;DR: A flexible Bayesian semiparametric approach to analyzing crash data of a hierarchical or multilevel nature; it is shown to improve model fit significantly for such data, which can have important policy implications for various safety management programs.
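
The "hierarchical or multilevel" structure can be pictured with a fully parametric hierarchical Poisson model in PyMC. This is a simpler stand-in, not the paper's Bayesian semiparametric model, and the sites, regions, and priors below are synthetic assumptions for illustration.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_regions, sites_per = 4, 25
region = np.repeat(np.arange(n_regions), sites_per)   # region index per site
exposure = rng.uniform(1.0, 10.0, size=region.size)   # traffic exposure per site
crashes = rng.poisson(exposure * np.exp(rng.normal(0.2 * region, 0.3)))

with pm.Model() as model:
    # Region-level means: partial pooling of sites across regions.
    mu_region = pm.Normal("mu_region", mu=0.0, sigma=1.0, shape=n_regions)
    sigma_site = pm.HalfNormal("sigma_site", sigma=0.5)
    # Site-level log crash rates drawn around their region's mean.
    log_rate = pm.Normal("log_rate", mu=mu_region[region], sigma=sigma_site,
                         shape=region.size)
    pm.Poisson("crashes", mu=exposure * pm.math.exp(log_rate), observed=crashes)
    idata = pm.sample(draws=500, tune=500, chains=2, random_seed=0)
```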

39 citations

Journal ArticleDOI
TL;DR: The proposed method successfully models dependencies online for large-scale multi-label datasets with many labels and improves over a baseline that does not model dependencies; the same layer-wise strategy makes the batch variant competitive with existing, more complex multi-label topic models.
Abstract: Multi-label text classification is an increasingly important field as large amounts of text data are available and extracting relevant information is important in many application contexts. Probabilistic generative models are the basis of a number of popular text mining methods such as Naive Bayes or Latent Dirichlet Allocation. However, Bayesian models for multi-label text classification often are overly complicated to account for label dependencies and skewed label frequencies while at the same time preventing overfitting. To solve this problem we employ the same technique that contributed to the success of deep learning in recent years: greedy layer-wise training. Applying this technique in the supervised setting prevents overfitting and leads to better classification accuracy. The intuition behind this approach is to learn the labels first and subsequently add a more abstract layer to represent dependencies among the labels. This allows using a relatively simple hierarchical topic model which can easily be adapted to the online setting. We show that our method successfully models dependencies online for large-scale multi-label datasets with many labels and improves over the baseline method not modeling dependencies. The same strategy, layer-wise greedy training, also makes the batch variant competitive with existing more complex multi-label topic models.
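
A rough sketch of the layer-wise idea using plain scikit-learn classifiers instead of the paper's hierarchical topic model (the data, features, and the stacking formulation are all illustrative assumptions): first learn each label independently, then fit a second, more abstract layer on the first layer's label scores so that dependencies among labels can be captured.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))              # toy document features
# Correlated toy labels: label 1 mostly co-occurs with label 0.
y0 = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
y1 = ((y0 == 1) & (X[:, 1] > 0)).astype(int)
Y = np.column_stack([y0, y1])

# Layer 1: learn each label independently ("learn the labels first").
layer1 = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
scores = layer1.predict_proba(X)            # per-label probabilities

# Layer 2: a more abstract layer over the label scores captures
# dependencies among labels (a stacking analogue of greedy
# layer-wise training).
layer2 = OneVsRestClassifier(LogisticRegression()).fit(
    np.hstack([X, scores]), Y)
```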

39 citations

Proceedings ArticleDOI
02 Aug 2009
TL;DR: A set of Bayesian methods for automatically extending the WordNet ontology with new concepts and annotating existing concepts with generic property fields, or attributes is presented.
Abstract: This paper presents a set of Bayesian methods for automatically extending the WordNet ontology with new concepts and annotating existing concepts with generic property fields, or attributes. We base our approach on Latent Dirichlet Allocation and evaluate along two dimensions: (1) the precision of the ranked lists of attributes, and (2) the quality of the attribute assignments to WordNet concepts. In all cases we find that the principled LDA-based approaches outperform previously proposed heuristic methods, greatly improving the specificity of attributes at each concept.
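
One way to picture the LDA-based attribute ranking (a toy illustration, not the paper's exact models): treat each concept as a pseudo-document made of the attributes observed with it in text, fit LDA, and score attributes by their probability under the concept's topic mixture.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy pseudo-documents: attributes observed with each noun in text.
# Real systems harvest these from patterns like "the <attr> of the <noun>".
noun_attrs = {
    "car":    "color size speed price color speed",
    "house":  "color size price location size",
    "idea":   "originality clarity importance clarity",
    "theory": "clarity importance originality",
}
vec = CountVectorizer()
counts = vec.fit_transform(noun_attrs.values())

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)            # noun -> topic proportions
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Rank attributes for "car": p(attr | noun) = sum_k theta[n, k] * phi[k, attr]
p = theta[0] @ phi
vocab = vec.get_feature_names_out()
print([vocab[i] for i in np.argsort(p)[::-1][:3]])
```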

39 citations

Patent
Ce Liu1
01 Apr 2010
TL;DR: A method of operating a computer system to perform material recognition based on multiple features extracted from an image is described. A combination of low-level features extracted directly from the image and multiple novel mid-level features extracted from transformed versions of the image is selected and used to assign a material category to a single image.
Abstract: A method of operating a computer system to perform material recognition based on multiple features extracted from an image is described. A combination of low-level features extracted directly from the image and multiple novel mid-level features extracted from transformed versions of the image are selected and used to assign a material category to a single image. The novel mid-level features include non-reflectance based features such as the micro-texture features micro jet and micro-SIFT and the shape feature curvature, and reflectance-based features including edge slice and edge ribbon. An augmented Latent Dirichlet Allocation (LDA) model is provided as an exemplary Bayesian framework for selecting a subset of features useful for material recognition of objects in an image.
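
A conventional bag-of-visual-words pipeline with a plain LDA, sketched below, shows where a topic model fits into image categorization. The patent's augmented LDA for feature selection is more elaborate, and the descriptors and labels here are synthetic stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for local descriptors (e.g. SIFT-like vectors) per image.
n_images, descr_per_image, d = 60, 50, 16
descriptors = rng.normal(size=(n_images, descr_per_image, d))
labels = rng.integers(0, 3, size=n_images)   # toy material categories

# Quantize descriptors into "visual words".
kmeans = KMeans(n_clusters=32, n_init=4, random_state=0)
kmeans.fit(descriptors.reshape(-1, d))
words = kmeans.predict(descriptors.reshape(-1, d)).reshape(n_images, -1)

# Bag-of-visual-words counts per image.
counts = np.zeros((n_images, 32), dtype=int)
for i, ws in enumerate(words):
    counts[i] = np.bincount(ws, minlength=32)

# LDA topic proportions as a compact image representation,
# fed to a simple classifier.
theta = LatentDirichletAllocation(n_components=5,
                                  random_state=0).fit_transform(counts)
clf = LogisticRegression(max_iter=1000).fit(theta, labels)
```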

39 citations

Proceedings Article
01 Nov 2011
TL;DR: It is shown that using LDA for word class induction scales better with the number of classes than the Brown algorithm and the resulting classes outperform Brown on the three tasks.
Abstract: Word classes automatically induced from distributional evidence have proved useful in many NLP tasks, including Named Entity Recognition, parsing and sentence retrieval. The Brown hard clustering algorithm is commonly used in this scenario. Here we propose to use Latent Dirichlet Allocation in order to induce soft, probabilistic word classes. We compare our approach against Brown in terms of efficiency. We also compare the usefulness of the induced Brown and LDA word classes for the semi-supervised learning of three NLP tasks: fine-grained Named Entity Recognition, Morphological Analysis and semantic Relation Classification. We show that using LDA for word class induction scales better with the number of classes than the Brown algorithm and that the resulting classes outperform Brown on the three tasks.
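
One common recipe for LDA-based word class induction (an illustrative sketch; the paper's exact setup may differ): build a pseudo-document per word type from its context words, fit LDA, and read each word's topic mixture as its soft class.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks rose as markets rallied",
    "markets fell and stocks dropped",
]

# One pseudo-document per word type: the bag of words appearing
# within a +/-2 token window around its occurrences.
contexts = {}
for sent in corpus:
    toks = sent.split()
    for i, w in enumerate(toks):
        window = toks[max(0, i - 2):i] + toks[i + 1:i + 3]
        contexts.setdefault(w, []).extend(window)

words = sorted(contexts)
docs = [" ".join(contexts[w]) for w in words]
counts = CountVectorizer().fit_transform(docs)

# Each word's topic mixture is its soft, probabilistic class.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
word_classes = lda.fit_transform(counts)
print(dict(zip(words, np.round(word_classes, 2))))
```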

39 citations


Network Information
Related Topics (5)

Topic                          Papers    Citations   Related
Cluster analysis               146.5K    2.9M        86%
Support vector machine         73.6K     1.7M        86%
Deep learning                  79.8K     2.1M        85%
Feature extraction             111.8K    2.1M        84%
Convolutional neural network   74.7K     2M          83%
Performance Metrics
Number of papers in the topic in previous years:

Year   Papers
2023   323
2022   842
2021   418
2020   429
2019   473
2018   446