SciSpace (formerly Typeset)
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Proceedings Article (DOI)
09 Mar 2009
TL;DR: This paper presents a novel approach to extracting user context that casts context recovery as an unsupervised clustering problem; experimental results with real data validate the proposed unsupervised learning approach and demonstrate its applicability.
Abstract: This paper examines the recovery of user context in indoor environments with existing wireless infrastructures to enable assistive systems. We present a novel approach to the extraction of user context, casting the problem of context recovery as an unsupervised clustering problem. A well-known density-based clustering technique, DBSCAN, is adapted to recover user context, including the user's motion state and the significant places the user visits, from WiFi observations consisting of access point id and signal strength. Furthermore, user rhythms, or sequences of places the user visits periodically, are derived from these low-level contexts by employing a state-of-the-art probabilistic clustering technique, Latent Dirichlet Allocation (LDA), to enable a variety of application services. Experimental results with real data are presented to validate the proposed unsupervised learning approach and demonstrate its applicability.
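The DBSCAN step described in this abstract can be illustrated with a minimal sketch. It assumes each WiFi scan has already been turned into a fixed-length vector of signal strengths (dBm), one position per access point id, and it uses plain Euclidean distance. The paper's own adaptation of DBSCAN is not described here, so `dbscan`, `eps`, `min_samples` and the toy readings are illustrative assumptions. Dense groups of similar scans become clusters (candidate significant places) and isolated scans are labeled as noise (-1).

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two signal-strength vectors (assumed aligned by access point)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1

    def neighbors(i: int) -> List[int]:
        # The eps-neighborhood includes the point itself.
        return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                seeds.extend(j_nbrs)  # j is also core: keep growing the cluster
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    # Hypothetical RSSI readings (dBm) from three access points.
    scans = [
        [-40, -70, -90], [-42, -71, -89], [-41, -69, -91],  # place A
        [-85, -45, -60], [-84, -46, -62], [-86, -44, -61],  # place B
        [-60, -60, -60],                                    # scan taken while moving
    ]
    print(dbscan(scans, eps=5.0, min_samples=3))
```

With `eps=5.0` and `min_samples=3`, the three near-identical scans at each location form two clusters and the lone in-between scan is labeled as noise. With real data, `eps` would need tuning to the actual variance in signal strength.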

34 citations

01 Jan 1996
TL;DR: Bayesian nonparametric inference for a nonsequential change-point problem is studied using a mixture of products of Dirichlet processes as the prior distribution, and a Gibbs sampler is proposed to overcome analytic difficulties in computing the posterior distributions of interest.
Abstract: Bayesian nonparametric inference for a nonsequential change-point problem is studied. We use a mixture of products of Dirichlet processes as our prior distribution. This allows the data before and after the change-point to be dependent, even when the change-point is known. A Gibbs sampler algorithm is also proposed in order to overcome analytic difficulties in computing the posterior distributions of interest, some of which have support on the space of all distribution functions.

34 citations

01 Jan 2010
TL;DR: A new algorithm for topic modeling, text classification and retrieval, tailored to sequences of time-stamped documents, based on the auto-encoder architecture is presented, which demonstrates state-of-the-art information retrieval and classification results on the Reuters collection, as well as an application to volatility forecasting from financial news.
Abstract: We present a new algorithm for topic modeling, text classification and retrieval, tailored to sequences of time-stamped documents. Based on the auto-encoder architecture, our nonlinear multi-layer model is trained stage-wise to produce increasingly compact representations of bags-of-words at the document or paragraph level, thus performing a semantic analysis. It also incorporates simple temporal dynamics on the latent representations, to take advantage of the inherent structure of sequences of documents, and can simultaneously perform supervised classification or regression on document labels. The model is learned by maximizing its joint likelihood, using approximate gradient-based MAP inference. We demonstrate that by minimizing a weighted cross-entropy loss between histograms of word occurrences and their reconstruction, we directly minimize the topic-model perplexity, and show that our topic model obtains lower perplexity than Latent Dirichlet Allocation on the NIPS and State of the Union datasets. We illustrate how the dynamical constraints help learning while enabling visualization of the topic trajectory. Finally, we demonstrate state-of-the-art information retrieval and classification results on the Reuters collection, as well as an application to volatility forecasting from financial news.
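The claim that minimizing a weighted cross-entropy between word-count histograms and their reconstruction "directly" minimizes perplexity rests on one identity. If a document has word counts n_w (N words in total) and the model reconstructs a normalized word distribution p_w, then the count-weighted cross-entropy, -Σ n_w log p_w, is exactly the document's negative log-likelihood, and perplexity is exp of that quantity divided by N. Because exp is monotonic and N is fixed by the data, lowering one lowers the other. The sketch below assumes the weights are the raw counts and that `probs` already sums to 1; the abstract does not give the paper's exact weighting, and `perplexity` is a hypothetical helper.

```python
import math
from typing import Dict


def perplexity(counts: Dict[str, int], probs: Dict[str, float]) -> float:
    """Per-word perplexity of one bag-of-words document under a reconstructed
    word distribution `probs` (assumed normalized, with probs[w] > 0 for
    every observed word w)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document has no words")
    # Count-weighted cross-entropy between the histogram and the reconstruction;
    # this equals the negative log-likelihood of the document.
    nll = -sum(n * math.log(probs[w]) for w, n in counts.items() if n > 0)
    # Perplexity is the exponential of the average per-word negative log-likelihood.
    return math.exp(nll / total)


if __name__ == "__main__":
    counts = {"market": 2, "volatility": 1, "news": 1}
    uniform = {w: 0.25 for w in ["market", "volatility", "news", "union"]}
    print(perplexity(counts, uniform))  # approximately 4.0 for a uniform model over 4 words
```

Lower values mean the reconstruction assigns more probability to the observed words; comparing two models this way, as the abstract does against LDA, is only meaningful on the same documents and vocabulary.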

34 citations

Proceedings Article
11 Sep 2017
TL;DR: This study is an initial attempt to compare national cybersecurity strategies (NCSs) by using clustering and topic modelling methods to investigate their similarities and differences, and argues that the topic modeling method LDA can be used as an automated technique by policy makers and governments to analyze and understand textual documents when developing and reviewing national strategies and policies.
Abstract: The consequences of cybersecurity attacks can be severe for nation states and their people. Recently, many nations have revisited their national cybersecurity strategies (NCSs) to ensure that their cybersecurity capabilities are sufficient to protect their citizens and cyberspace. This study is an initial attempt to compare NCSs by using clustering and topic modelling methods to investigate the similarities and differences between them. We also aimed to identify the underlying topics that appear in NCSs. We collected and examined 60 NCSs developed between 2003 and 2016. Drawing on institutional theories, we found that membership in international institutions could be a determining factor for harmonization and integration between NCSs. By applying a hierarchical clustering method, we observed stronger similarities between NCSs developed by EU or NATO members. We also found that public-private partnerships, protection of critical infrastructure, and defending citizens and public IT systems are among the topics that have received considerable attention in the majority of NCSs. We also argue that the topic modeling method LDA can be used as an automated technique by policy makers and governments to analyze and understand textual documents when developing and reviewing national strategies and policies.
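Since several of these abstracts apply LDA to document collections without describing how it is fitted, a compact reference point may help. The following is a minimal sketch of the collapsed Gibbs sampler commonly used to fit LDA, with symmetric Dirichlet priors `alpha` (document-topic) and `beta` (topic-word). It is not the pipeline used in the study above; documents are assumed to be pre-tokenized into integer word ids, and the name `lda_gibbs` and its default hyperparameters are illustrative. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + alpha)(n_kw + beta) / (n_k + V*beta), where V is the vocabulary size, and the final counts give smoothed estimates of the document-topic (theta) and topic-word (phi) distributions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit LDA by collapsed Gibbs sampling; returns (theta, phi)."""
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * vocab_size for _ in topics]  # tokens of word w assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k
    z: List[List[int]] = []                   # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional for this token's topic, up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed point estimates from the final sample.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in topics]
    return theta, phi


if __name__ == "__main__":
    # Toy corpus: word ids 0-1 and 2-3 tend to co-occur.
    docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 3, 2]]
    theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4, iters=100)
    for d, row in enumerate(theta):
        print(d, [round(p, 2) for p in row])
```

A single final sample is used here for simplicity; averaging over several samples after burn-in is a common refinement.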

34 citations

Journal Article (DOI)
TL;DR: The authors performed topic modeling using all publicly available CSR (Corporate Social Responsibility) reports for all constituent firms of the major stock market indices of 15 industrialized countries of the world, including the United States.
Abstract: This paper performs topic modeling using all publicly available CSR (Corporate Social Responsibility) reports for all constituent firms of the major stock market indices of 15 industrialized countr...

34 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446