Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published on this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Journal ArticleDOI
TL;DR: This paper presents a comparative study of scientific unstructured text document classification (e-books) based on full text, applying the most popular topic modeling approaches (LDA, LSA) to cluster words into sets of topics that serve as keywords for classification.
Abstract: With the rapid growth of information technology, the amount of unstructured text data in digital libraries has increased rapidly, and analyzing, organizing, and automatically classifying text in e-research repositories has become a major challenge. Manual categorization of text documents requires substantial financial and human resources, so topic modeling is used to classify documents instead. This paper presents a comparative study of scientific unstructured text document classification (e-books) based on full text, applying the most popular topic modeling approaches (LDA, LSA) to cluster words into sets of topics that serve as keywords for classification. Our dataset consists of 300 books containing about 23 million words of full text. In the topic models used (LSA, LDA), each word in the corpus vocabulary is associated with one or more topics with a probability estimated by the model. Many LDA and LSA models were built with different coherence values, and the one producing the highest coherence value was selected. The results show that LDA performed better than LSA: the best LDA model achieved a coherence value of 0.592179 with 20 topics, while the best LSA coherence value was 0.5773026 with 10 topics.
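As a concrete illustration of the model-selection procedure described above, here is a minimal sketch (not the authors' code) of fitting LDA models with several topic counts and keeping the one with the highest coherence value, using the gensim library; the toy tokenized corpus is a placeholder standing in for the 300-book dataset.

```python
# Minimal sketch of coherence-based model selection: fit LDA models with
# several topic counts and keep the highest-coherence one.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy tokenized corpus standing in for the 300-book, 23-million-word dataset.
docs = [
    ["latent", "dirichlet", "allocation", "topic", "model", "text"],
    ["latent", "semantic", "analysis", "singular", "value", "decomposition"],
    ["topic", "coherence", "model", "selection", "evaluation", "text"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

best_model, best_coherence = None, float("-inf")
for num_topics in (5, 10, 15, 20):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, random_state=0)
    # c_v coherence, a common criterion for picking the number of topics.
    score = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                           coherence="c_v").get_coherence()
    if score > best_coherence:
        best_model, best_coherence = lda, score

print(f"best coherence: {best_coherence:.4f}")
```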

30 citations

Journal ArticleDOI
TL;DR: This paper approaches tag refinement from the angle of topic modeling and presents a novel graphical model, regularized latent Dirichlet allocation (rLDA), in which tag similarity and tag relevance are jointly estimated in an iterative manner so that they can benefit from each other.

30 citations

Journal ArticleDOI
24 Mar 2018
TL;DR: The results show that the proposed WSSE based on LDA and clustering outperforms the keyword-based search system.
Abstract: With the ever-increasing number of Web services, discovering an appropriate Web service requested by users has become a vital yet challenging task, and a scalable, efficient search engine is needed to deal with the large volume of Web services. The aim of this approach is to provide an efficient search engine that can retrieve the most relevant Web services in a short time. The proposed Web service search engine (WSSE) is based on probabilistic topic modeling and clustering techniques, which are integrated to support each other by discovering the semantic meaning of Web services and reducing the search space. Latent Dirichlet allocation (LDA) is used to extract topics from Web service descriptions, and these topics are used to group similar Web services together. Each Web service description is represented as a topic vector, so the topic model is an efficient technique for reducing the dimensionality of word vectors and discovering the semantic meaning hidden in Web service descriptions. The Web service description is also represented as a word vector to address the drawbacks of the keyword-based search system. The accuracy of the proposed WSSE is compared with the keyword-based search system, and precision and recall metrics are used to evaluate the performance of both. The results show that the proposed WSSE based on LDA and clustering outperforms the keyword-based search system.
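The pipeline the abstract outlines — LDA topic vectors followed by clustering to shrink the search space — can be sketched as follows. This is an illustrative example using scikit-learn rather than the paper's implementation, and the service descriptions are placeholder assumptions.

```python
# Hedged sketch of the WSSE idea: represent each Web service description
# as an LDA topic vector, then cluster similar services so a query only
# needs to search its nearest cluster.
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

descriptions = [  # placeholder Web service descriptions
    "weather forecast temperature service",
    "city weather alerts and temperature data",
    "currency exchange rate conversion service",
    "convert money between world currencies",
]
counts = CountVectorizer().fit_transform(descriptions)

# Each row of topic_vectors is a low-dimensional topic distribution,
# replacing the sparse word vector for similarity comparisons.
topic_vectors = LatentDirichletAllocation(
    n_components=2, random_state=0).fit_transform(counts)

# Services with similar topic distributions fall in the same cluster.
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(topic_vectors)
print(labels)
```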

30 citations

Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper proposes Independent Factor Topic Models (IFTM), which use linear latent variable models to uncover the hidden sources of correlation between topics, together with a fast Newton-Raphson-based variational inference algorithm.
Abstract: Topic models such as Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM) have recently emerged as powerful statistical tools for text document modeling. In this paper, we improve upon CTM and propose Independent Factor Topic Models (IFTM), which use linear latent variable models to uncover the hidden sources of correlation between topics. There are two main contributions of this work. First, by using a sparse source prior model, we can directly visualize sparse patterns of topic correlations. Second, the conditional independence assumption implied by the use of latent source variables allows the objective function to factorize, leading to a fast Newton-Raphson-based variational inference algorithm. Experimental results on synthetic and real data show that IFTM runs on average 3-5 times faster than CTM, while giving competitive performance as measured by perplexity and log-likelihood of held-out data.
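The generative structure IFTM assumes — a few independent sparse sources mixed linearly to produce correlated topic proportions — can be illustrated with a short sketch. This is an assumption-level illustration in NumPy, not the paper's inference code; the Laplace draw stands in for a generic sparse source prior.

```python
# Illustrative sketch of the IFTM generative idea: topic correlations come
# from a few independent latent sources pushed through a linear map,
# instead of a full covariance matrix as in CTM.
import numpy as np

rng = np.random.default_rng(0)
n_topics, n_sources = 10, 3

A = rng.normal(size=(n_topics, n_sources))    # linear mixing matrix
s = rng.laplace(size=n_sources)               # sparse, independent sources
noise = rng.normal(scale=0.1, size=n_topics)  # per-topic noise
eta = A @ s + noise                           # correlated topic logits

theta = np.exp(eta) / np.exp(eta).sum()       # topic proportions (softmax)
print(theta.round(3))
```

Because the sources are independent, the variational objective factorizes across them, which is what enables the fast Newton-Raphson updates the abstract reports.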

30 citations

Journal ArticleDOI
TL;DR: In the big data era, destination images have played an increasingly important role in tourism development as discussed by the authors; however, little tourism research has utilised big data analytics to examine them.
Abstract: In the big data era, destination images have played an increasingly important role in tourism development. However, little tourism research has utilised big data analytics to examine destination images...

30 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446