Topic
Latent Dirichlet allocation
About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have appeared within this topic, receiving 212,555 citations. The topic is also known as: LDA.
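As context for the papers below, LDA models each document as a mixture of latent topics and each topic as a distribution over words. A minimal sketch using scikit-learn's implementation (the toy corpus and topic count are illustrative, not from any paper listed here):

```python
# Minimal LDA sketch with scikit-learn; corpus and n_components are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models learn latent structure in text",
    "dirichlet priors control topic sparsity",
    "crash data analysis uses bayesian models",
    "bayesian inference estimates posterior distributions",
]

# Bag-of-words counts, then fit a 2-topic LDA model.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row is a per-document topic mixture

print(doc_topics.shape)  # (4, 2)
```

Each row of `doc_topics` sums to 1, giving the inferred topic proportions for that document.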
Papers published on a yearly basis
Papers
TL;DR: A flexible Bayesian semiparametric approach to analyzing crash data of a hierarchical or multilevel nature is presented; it is shown to improve model fit significantly for such data, which can have important policy implications for various safety management programs.
39 citations
TL;DR: The proposed method is shown to successfully model label dependencies online for large-scale multi-label datasets with many labels and to improve over a baseline that does not model dependencies; the same layer-wise strategy also makes the batch variant competitive with existing, more complex multi-label topic models.
Abstract: Multi-label text classification is an increasingly important field as large amounts of text data are available and extracting relevant information is important in many application contexts. Probabilistic generative models are the basis of a number of popular text mining methods such as Naive Bayes or Latent Dirichlet Allocation. However, Bayesian models for multi-label text classification often are overly complicated to account for label dependencies and skewed label frequencies while at the same time preventing overfitting. To solve this problem we employ the same technique that contributed to the success of deep learning in recent years: greedy layer-wise training. Applying this technique in the supervised setting prevents overfitting and leads to better classification accuracy. The intuition behind this approach is to learn the labels first and subsequently add a more abstract layer to represent dependencies among the labels. This allows using a relatively simple hierarchical topic model which can easily be adapted to the online setting. We show that our method successfully models dependencies online for large-scale multi-label datasets with many labels and improves over the baseline method not modeling dependencies. The same strategy, layer-wise greedy training, also makes the batch variant competitive with existing more complex multi-label topic models.
39 citations
02 Aug 2009
TL;DR: A set of Bayesian methods for automatically extending the WordNet ontology with new concepts and annotating existing concepts with generic property fields, or attributes, is presented.
Abstract: This paper presents a set of Bayesian methods for automatically extending the WordNet ontology with new concepts and annotating existing concepts with generic property fields, or attributes. We base our approach on Latent Dirichlet Allocation and evaluate along two dimensions: (1) the precision of the ranked lists of attributes, and (2) the quality of the attribute assignments to WordNet concepts. In all cases we find that the principled LDA-based approaches outperform previously proposed heuristic methods, greatly improving the specificity of attributes at each concept.
39 citations
01 Apr 2010
TL;DR: A method of operating a computer system to perform material recognition based on multiple features extracted from an image is described; a combination of low-level features extracted directly from the image and several novel mid-level features extracted from transformed versions of the image is selected and used to assign a material category to a single image.
Abstract: A method of operating a computer system to perform material recognition based on multiple features extracted from an image is described. A combination of low-level features extracted directly from the image and multiple novel mid-level features extracted from transformed versions of the image are selected and used to assign a material category to a single image. The novel mid-level features include non-reflectance based features such as the micro-texture features micro jet and micro-SIFT and the shape feature curvature, and reflectance-based features including edge slice and edge ribbon. An augmented Latent Dirichlet Allocation (LDA) model is provided as an exemplary Bayesian framework for selecting a subset of features useful for material recognition of objects in an image.
39 citations
01 Nov 2011
TL;DR: It is shown that using LDA for word class induction scales better with the number of classes than the Brown algorithm, and the resulting classes outperform Brown on the three tasks.
Abstract: Word classes automatically induced from distributional evidence have proved useful in many NLP tasks, including Named Entity Recognition, parsing and sentence retrieval. The Brown hard clustering algorithm is commonly used in this scenario. Here we propose to use Latent Dirichlet Allocation in order to induce soft, probabilistic word classes. We compare our approach against Brown in terms of efficiency. We also compare the usefulness of the induced Brown and LDA word classes for the semi-supervised learning of three NLP tasks: fine-grained Named Entity Recognition, Morphological Analysis and semantic Relation Classification. We show that using LDA for word class induction scales better with the number of classes than the Brown algorithm and the resulting classes outperform Brown on the three tasks.
39 citations