Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have appeared within this topic, receiving 212,555 citations. The topic is also known as LDA.


Papers
Proceedings Article
31 Mar 2018
TL;DR: In this article, the authors provide general conditions for obtaining optimal risk bounds for point estimates acquired from mean-field variational Bayesian inference, which pertain to the existence of certain test functions for the distance metric on the parameter space and minimal assumptions on the prior.
Abstract: The article addresses a long-standing open problem on the justification of using variational Bayes methods for parameter estimation. We provide general conditions for obtaining optimal risk bounds for point estimates acquired from mean-field variational Bayesian inference. The conditions pertain to the existence of certain test functions for the distance metric on the parameter space and minimal assumptions on the prior. A general recipe for verification of the conditions is outlined which is broadly applicable to existing Bayesian models with or without latent variables. As illustrations, specific applications to Latent Dirichlet Allocation and Gaussian mixture models are discussed.

43 citations

Proceedings Article
01 Jul 2017
TL;DR: Salience Rank is proposed, a modification of TPR that needs to run PageRank only once, extracts comparable or better keyphrases on benchmark datasets, and has the flexibility to extract keyphrases with varying tradeoffs between topic specificity and corpus specificity.
Abstract: Topical PageRank (TPR) uses latent topic distribution inferred by Latent Dirichlet Allocation (LDA) to perform ranking of noun phrases extracted from documents. The ranking procedure consists of running PageRank K times, where K is the number of topics used in the LDA model. In this paper, we propose a modification of TPR, called Salience Rank. Salience Rank only needs to run PageRank once and extracts comparable or better keyphrases on benchmark datasets. In addition to quality and efficiency benefit, our method has the flexibility to extract keyphrases with varying tradeoffs between topic specificity and corpus specificity.

43 citations
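
The cost difference described in the abstract above (K PageRank runs versus one) can be sketched in code. What follows is a minimal illustration under stated assumptions, not the authors' implementation: it runs personalized PageRank by power iteration over a toy word graph, once per topic and then once with a single preference vector. The graph, the topic-word probabilities, the document-topic weights, and the function name `personalized_pagerank` are all hypothetical. The single preference vector here is a plain linear mixture of the topic preferences; the paper defines its own word salience score, which is not reproduced here.

```python
def personalized_pagerank(graph, preference, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted graph without dangling nodes.

    graph: dict mapping node -> dict of neighbor -> edge weight
    preference: dict mapping node -> non-negative teleport weight
    """
    nodes = list(graph)
    total = sum(preference.get(n, 0.0) for n in nodes)
    teleport = {n: preference.get(n, 0.0) / total for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Rank flowing into n from every node m that links to it.
            incoming = sum(rank[m] * graph[m][n] / out_weight[m]
                           for m in nodes if n in graph[m])
            new_rank[n] = (1 - damping) * teleport[n] + damping * incoming
        rank = new_rank
    return rank


# Toy undirected word co-occurrence graph (hypothetical weights).
graph = {
    "topic": {"model": 2, "word": 1},
    "model": {"topic": 2, "word": 1, "rank": 1},
    "word": {"topic": 1, "model": 1, "rank": 1},
    "rank": {"model": 1, "word": 1},
}

# Hypothetical LDA outputs for K = 2 topics:
# p(word | topic) per topic, and p(topic | document).
topic_word = [
    {"topic": 0.5, "model": 0.3, "word": 0.1, "rank": 0.1},
    {"topic": 0.1, "model": 0.1, "word": 0.3, "rank": 0.5},
]
doc_topic = [0.7, 0.3]

# K runs: one PageRank per topic, then combine with p(topic | document).
per_topic = [personalized_pagerank(graph, pw) for pw in topic_word]
combined = {w: sum(p * r[w] for p, r in zip(doc_topic, per_topic))
            for w in graph}

# One run: blend the preferences first, then run PageRank a single time.
blended = {w: sum(p * pw[w] for p, pw in zip(doc_topic, topic_word))
           for w in graph}
single = personalized_pagerank(graph, blended)

for w in sorted(graph, key=single.get, reverse=True):
    print(f"{w:6s} K runs: {combined[w]:.4f}  one run: {single[w]:.4f}")
```

In this sketch the two columns agree because the PageRank update is linear in the preference vector. A different single preference vector, such as a salience score, would generally produce a different ranking while keeping the one-run cost.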

Book Chapter
17 Aug 2012
TL;DR: This work proposes a novel topic detection technique that retrieves, in real time, the most emergent topics expressed by the community, using the Latent Dirichlet Allocation (LDA) model in place of the traditional VSM model.
Abstract: Microblogging is a recent social phenomenon of Web 2.0 technology, with applications in many domains. It is another form of social media, recognized as Real-Time Web Publishing, which has won impressive audience acceptance and surprisingly changed online expression and interaction for millions of users. It is observed that clustering by topic can be very helpful for the quick retrieval of desired information. We propose a novel topic detection technique that retrieves, in real time, the most emergent topics expressed by the community. Traditional text mining techniques have no special considerations for short and sparse microblog data. Keeping in view these special characteristics of the data, we adopt a Single-pass Clustering technique that uses the Latent Dirichlet Allocation (LDA) model in place of the traditional VSM model to extract the hidden microblog topic information. Experimental results on a real dataset showed that the proposed method decreased the probabilities of miss and false alarm, and reduced the normalized detection cost.

43 citations

Proceedings Article
27 Jun 2014
TL;DR: This paper presents a method to extract service evolution patterns by exploiting Latent Dirichlet Allocation (LDA) and time series prediction and shows that this approach leads to a higher precision than traditional collaborative filtering and content matching methods.
Abstract: Web service recommendation has become a critical problem as services become increasingly prevalent on the Internet. Some existing methods focus on content matching techniques such as keyword search and semantic matching, while others are based on Quality of Service (QoS) prediction. However, services and their mashups evolve over time through the publishing, perishing, and changing of interfaces. Therefore, a practical service recommendation approach should take into account the evolution of a service ecosystem. In this paper, we present a method to extract service evolution patterns by exploiting Latent Dirichlet Allocation (LDA) and time series prediction. A time-aware service recommendation framework for mashup creation is presented, combining service evolution, collaborative filtering, and content matching. Experiments on a real-world ProgrammableWeb data set show that our approach leads to a higher precision than traditional collaborative filtering and content matching methods.

43 citations

Proceedings Article
01 Oct 2019
TL;DR: This study applies LDA to tweet data to produce topic models, topic similarities, and visualizations of topic clusters, generating 4 topics (Economic, Military, Sports, Technology) in Indonesian.
Abstract: Twitter is a popular social media platform where users express thoughts and emotions as tweets, short texts limited to 140 characters. Twitter is a source of up-to-date information, and tweets are categorized as big data because they can serve as a data source for research. Latent Dirichlet Allocation (LDA) is an algorithm that can process large text data (big data). This study uses the LDA method to produce topic modeling, topic similarity, and visualization of topic clusters from the tweet data, generating 4 topics (Economic, Military, Sports, Technology) in Indonesian, where each topic has a different number of tweets. The LDA method was applied successfully to the tweet data and worked optimally in topic extraction, topic modeling, generating the index words in each topic cluster, and visualizing the topics. The LDA output shows optimal word-indexing performance on the Sports topic, with 1260 tweets and an accuracy of 98%, better than the LSI method in topic modeling.

43 citations
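
For readers unfamiliar with how LDA produces topic clusters and their index words, a minimal collapsed Gibbs sampling sketch in plain Python follows. It is not the pipeline from the study above: the toy English documents, the choice of 2 topics, the hyperparameters `alpha` and `beta`, and the function name `lda_gibbs` are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, sweeps=200, seed=0):
    """Collapsed Gibbs sampler for LDA over tokenized documents.

    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy corpus (hypothetical): two economy-like and two sports-like documents.
docs = [
    "market bank economy trade bank".split(),
    "economy trade market price".split(),
    "match goal team player goal".split(),
    "team player match league".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)

for k, counts in enumerate(topic_word):
    top_words = sorted(counts, key=counts.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

The highest-count words per topic play the role of the index words mentioned in the abstract. On real tweets, preprocessing such as tokenization and stop-word removal would normally come first, and a tested library implementation would usually be preferable to a hand-written sampler.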


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446