Topic
Latent Dirichlet allocation
About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications on this topic have received 212,555 citations. The topic is also known as: LDA.
Papers published on a yearly basis
Papers
23 Aug 2004
TL;DR: A new finite mixture model based on a generalization of the Dirichlet distribution is presented; experiments compare the classification performance of Gaussian and generalized Dirichlet mixtures on several pattern-recognition data sets.
Abstract: This paper presents a new finite mixture model based on a generalization of the Dirichlet distribution. For the estimation of the parameters of this mixture we use a GEM (generalized expectation maximization) algorithm based on a Newton-Raphson step. The experimental results compare the performance of Gaussian and generalized Dirichlet mixtures in classifying several pattern-recognition data sets.
51 citations
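The estimation procedure described above can be sketched with a plain EM loop. The following is a minimal illustration on a two-component 1-D Gaussian mixture with synthetic data; it is not the paper's generalized Dirichlet mixture or its Newton-Raphson GEM step, just the basic E-step/M-step pattern such algorithms share:

```python
import math
import random

random.seed(0)
# Synthetic 1-D data from two well-separated Gaussians (illustrative only).
data = [random.gauss(-2, 0.5) for _ in range(200)] + \
       [random.gauss(3, 0.5) for _ in range(200)]

def normal_pdf(x, m, s):
    return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

# Initial parameters: mixing weights, means, standard deviations.
pi = [0.5, 0.5]
mu = [-1.0, 1.0]
sigma = [1.0, 1.0]

for _ in range(50):  # EM iterations
    # E-step: posterior responsibility of each component for each point.
    resp = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: re-estimate parameters from responsibility-weighted statistics.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        pi[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sigma[k] = math.sqrt(
            sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk)

print(sorted(round(m, 1) for m in mu))  # means recover roughly [-2.0, 3.0]
```

The generalized Dirichlet version replaces the Gaussian density and closed-form M-step with the generalized Dirichlet density and a Newton-Raphson update, but the alternation is the same.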
02 Oct 2016
TL;DR: This paper proposes a novel neural language model, Topic-based Skip-gram, to learn topic-based word embeddings for biomedical literature indexing with CNNs, and describes two multimodal CNN architectures that can employ different kinds of word embeddings at the same time for text classification.
Abstract: Recently, distributed word embeddings trained by neural language models are commonly used for text classification with Convolutional Neural Networks (CNNs). In this paper, we propose a novel neural language model, Topic-based Skip-gram, to learn topic-based word embeddings for biomedical literature indexing with CNNs. Topic-based Skip-gram leverages textual content with topic models, e.g., Latent Dirichlet Allocation (LDA), to capture precise topic-based word relationships and then integrates them into distributed word embedding learning. We then describe two multimodal CNN architectures, which can employ different kinds of word embeddings at the same time for text classification. Through extensive experiments on several real-world datasets, we demonstrate that the combination of our Topic-based Skip-gram and the multimodal CNN architectures outperforms state-of-the-art methods in biomedical literature indexing, clinical note annotation, and classification of general text benchmark datasets.
51 citations
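Since the model above builds on LDA, a minimal collapsed Gibbs sampler for LDA itself may help fix ideas. The toy corpus, topic count, and hyperparameters below are illustrative assumptions, not from the paper:

```python
import random
from collections import defaultdict

random.seed(1)
# Toy corpus with two obvious themes (sports vs. baking); purely illustrative.
docs = [
    ["ball", "goal", "team", "ball", "team"],
    ["goal", "team", "match", "ball"],
    ["oven", "flour", "sugar", "oven"],
    ["flour", "sugar", "recipe", "oven"],
]
K, alpha, beta = 2, 0.1, 0.1
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Count tables for the collapsed Gibbs sampler.
z = []                                      # topic assignment per token
ndk = [[0] * K for _ in docs]               # document-topic counts
nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
nk = [0] * K                                # tokens per topic

for d, doc in enumerate(docs):              # random initialization
    zd = []
    for w in doc:
        t = random.randrange(K)
        zd.append(t)
        ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    z.append(zd)

for _ in range(200):                        # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                     # remove token from counts
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            # Full conditional: p(t) proportional to (ndk+a)(nkw+b)/(nk+Vb)
            probs = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                     for k in range(K)]
            r = random.uniform(0, sum(probs))
            t, acc = 0, probs[0]
            while r > acc:                  # sample from the conditional
                t += 1
                acc += probs[t]
            z[d][i] = t                     # add token back under new topic
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

print([max(vocab, key=lambda w: nkw[k][w]) for k in range(K)])
```

On this corpus the sampler cleanly separates the two vocabularies into two topics; the resulting topic-word counts are the kind of signal Topic-based Skip-gram feeds into embedding learning.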
01 Jan 2014
TL;DR: A course recommendation system based on students' historical college grades is proposed; it can recommend available courses on sites such as Coursera, Udacity, and edX.
Abstract: In this paper we propose a course recommendation system based on students' historical college grades. Our model is able to recommend available courses on sites such as Coursera, Udacity, and edX. To do so, probabilistic topic models are used as follows. On one hand, a Latent Dirichlet Allocation (LDA) topic model infers topics from the content of a college course syllabus. On the other hand, topics are also extracted from a massive open online course (MOOC) syllabus. These two sets of topics, together with grading information, are matched using a content-based recommendation system so as to recommend relevant online courses to students. Preliminary results show the suitability of our approach.
51 citations
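The matching step, pairing a college course's inferred topic proportions against those of candidate MOOCs, can be sketched with cosine similarity. The topic vectors and MOOC names below are hypothetical placeholders, not data from the paper:

```python
import math

# Hypothetical LDA topic-proportion vectors for one college course and
# three candidate MOOCs (names and values are made up for illustration).
college_course = [0.7, 0.2, 0.1]
moocs = {
    "mooc_a": [0.65, 0.25, 0.10],
    "mooc_b": [0.10, 0.10, 0.80],
    "mooc_c": [0.30, 0.60, 0.10],
}

def cosine(u, v):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Rank MOOCs by topic similarity to the college course.
ranking = sorted(moocs, key=lambda m: cosine(college_course, moocs[m]),
                 reverse=True)
print(ranking)  # mooc_a's topic mix is closest, so it ranks first
```

A full system would also weight candidates by the student's grades in topically related courses, as the abstract describes.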
TL;DR: Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes.
Abstract: To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution closely approximates a gamma distribution, has an analytic Kullback-Leibler divergence, and admits a simple reparameterization via uniform noise, which helps efficiently compute the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora.
51 citations
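The uniform-noise reparameterization the abstract mentions is simple to state: if u ~ Uniform(0, 1), then w = λ(−ln u)^(1/k) is Weibull-distributed with shape k and scale λ, and w is a differentiable function of (k, λ) given u, which is what lets gradients flow through the sample. A quick numerical check against the analytic mean λΓ(1 + 1/k), with parameters chosen purely for illustration:

```python
import math
import random

random.seed(0)
k_shape, lam = 2.0, 1.5  # illustrative Weibull shape and scale

def sample_weibull(k, lam):
    # Reparameterization via uniform noise: w = lam * (-ln u)^(1/k).
    # 1 - random.random() lies in (0, 1], so the log is always defined.
    u = 1.0 - random.random()
    return lam * (-math.log(u)) ** (1.0 / k)

samples = [sample_weibull(k_shape, lam) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
analytic_mean = lam * math.gamma(1 + 1 / k_shape)  # lam * Gamma(1 + 1/k)
print(round(empirical_mean, 3), round(analytic_mean, 3))
```

WHAI exploits this: because the Weibull both approximates the gamma well and reparameterizes this cheaply, the inference network can use Weibull variational posteriors while the generative hierarchy stays gamma-distributed.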
11 Mar 2007
TL;DR: Experimental results show that these techniques make it possible to apply DP mixture models to very large data sets, and that search algorithms provide a practical alternative to expensive MCMC and variational techniques.
Abstract: Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate distributions are used. In the common case when one seeks only a maximum a posteriori assignment of data points to clusters, we show that search algorithms provide a practical alternative to expensive MCMC and variational techniques. When a true posterior sample is desired, the solution found by search can serve as a good initializer for MCMC. Experimental results show that using these techniques it is possible to apply DP mixture models to very large data sets.
51 citations
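The flavor of search-based MAP assignment can be illustrated with a greedy one-pass clustering rule in the spirit of DP-means: assign each point to its nearest cluster mean unless every mean is farther than a penalty, in which case open a new cluster. This is a sketch under assumed parameters, not the paper's actual search algorithm:

```python
# Greedy one-pass MAP-style clustering (DP-means-flavored sketch).
# `penalty` plays the role the DP concentration parameter plays in a
# proper DP mixture: larger values discourage opening new clusters.
def greedy_dp_cluster(points, penalty):
    means, counts, labels = [], [], []
    for x in points:
        if means:
            dists = [abs(x - m) for m in means]
            j = min(range(len(means)), key=lambda i: dists[i])
        if not means or dists[j] > penalty:
            # No existing cluster is close enough: open a new one.
            means.append(x)
            counts.append(1)
            labels.append(len(means) - 1)
        else:
            # Join the nearest cluster and update its running mean.
            counts[j] += 1
            means[j] += (x - means[j]) / counts[j]
            labels.append(j)
    return labels, means

data = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 10.2, 10.0]
labels, means = greedy_dp_cluster(data, penalty=2.0)
print(len(means))  # three well-separated groups yield three clusters
```

A real search procedure would score full assignments under the DP posterior and keep a beam of candidates rather than committing greedily, but the appeal is the same: a deterministic pass is far cheaper than MCMC, and its result can seed an MCMC chain when posterior samples are needed.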