scispace - formally typeset
Search or ask a question
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5351 publications have been published within this topic receiving 212555 citations. The topic is also known as: LDA.


Papers
More filters
Proceedings ArticleDOI
30 Oct 2009
TL;DR: This paper proposes windowing the topic analysis to give a more nuanced view of the system's evolution and demonstrates that windowed topic analysis offers advantages over topic analysis applied to a project's lifetime because many topics are quite local.
Abstract: As development on a software project progresses, developers shift their focus between different topics and tasks many times. Managers and newcomer developers often seek ways of understanding what tasks have recently been worked on and how much effort has gone into each; for example, a manager might wonder what unexpected tasks occupied their team's attention during a period when they were supposed to have been implementing new features. Tools such as Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) can be used to extract a set of independent topics from a corpus of commit-log comments. Previous work in the area has created a single set of topics by analyzing comments from the entire lifetime of the project. In this paper, we propose windowing the topic analysis to give a more nuanced view of the system's evolution. By using a defined time-window of, for example, one month, we can track which topics come and go over time, and which ones recur. We propose visualizations of this model that allows us to explore the evolving stream of topics of development occurring over time. We demonstrate that windowed topic analysis offers advantages over topic analysis applied to a project's lifetime because many topics are quite local.

116 citations

Proceedings ArticleDOI
21 Aug 2005
TL;DR: This work proposes a novel Web recommendation system in which collaborative features such as navigation or rating data as well as the content features accessed by the users are seamlessly integrated under the maximum entropy principle.
Abstract: Web users display their preferences implicitly by navigating through a sequence of pages or by providing numeric ratings to some items. Web usage mining techniques are used to extract useful knowledge about user interests from such data. The discovered user models are then used for a variety of applications such as personalized recommendations. Web site content or semantic features of objects provide another source of knowledge for deciphering users' needs or interests. We propose a novel Web recommendation system in which collaborative features such as navigation or rating data as well as the content features accessed by the users are seamlessly integrated under the maximum entropy principle. Both the discovered user patterns and the semantic relationships among Web objects are represented as sets of constraints that are integrated to fit the model. In the case of content features, we use a new approach based on Latent Dirichlet Allocation (LDA) to discover the hidden semantic relationships among items and derive constraints used in the model. Experiments on real Web site usage data sets show that this approach can achieve better recommendation accuracy, when compared to systems using only usage information. The integration of semantic information also allows for better interpretation of the generated recommendations.

116 citations

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work explores spacetime structures of the topical content of short textual messages in a stream available from Twitter in Ireland using a streaming Latent Dirichlet Allocation topic model trained with an incremental variational Bayes method to quantify a proportion of novelty in the observed abnormal patterns.
Abstract: Human-generated textual data streams from services such as Twitter increasingly become geo-referenced. The spatial resolution of their coverage improves quickly, making them a promising instrument for sensing various aspects of evolution and dynamics of social systems. This work explores spacetime structures of the topical content of short textual messages in a stream available from Twitter in Ireland. It uses a streaming Latent Dirichlet Allocation topic model trained with an incremental variational Bayes method. The posterior probabilities of the discovered topics are post-processed with a spatial kernel density and subjected to comparative analysis. The identified prevailing topics are often found to be spatially contiguous. We apply Markov-modulated non-homogeneous Poisson processes to quantify a proportion of novelty in the observed abnormal patterns. A combined use of these techniques allows for real-time analysis of the temporal evolution and spatial variability of population's response to various stimuli such as large scale sportive, political or cultural events.

115 citations

Proceedings Article
01 Jan 2010
TL;DR: This work extends Latent Dirichlet Allocation by explicitly allowing for the encoding of side information in the distribution over words, which results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages.
Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model.

115 citations

Book ChapterDOI
02 Dec 2013
TL;DR: This paper proposes a novel approach that seamlessly integrates tagging data and WSDL documents through augmented Latent Dirichlet Allocation LDA and develops three strategies to preprocess tagging data before being integrated into the LDA framework for clustering.
Abstract: Clustering Web services that groups together services with similar functionalities helps improve both the accuracy and efficiency of the Web service search engines. An important limitation of existing Web service clustering approaches is that they solely focus on utilizing WSDL Web Service Description Language documents. There has been a recent trend of using user-contributed tagging data to improve the performance of service clustering. Nonetheless, these approaches fail to completely leverage the information carried by the tagging data and hence only trivially improve the clustering performance. In this paper, we propose a novel approach that seamlessly integrates tagging data and WSDL documents through augmented Latent Dirichlet Allocation LDA. We also develop three strategies to preprocess tagging data before being integrated into the LDA framework for clustering. Comprehensive experiments based on real data and the implementation of a Web service search engine demonstrate the effectiveness of the proposed LDA-based service clustering approach.

115 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
86% related
Support vector machine
73.6K papers, 1.7M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023323
2022842
2021418
2020429
2019473
2018446