Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.


Papers
Proceedings Article
01 Jan 2013
TL;DR: This paper uses Latent Dirichlet Allocation (LDA) to generate a topic distribution over each event and user, and draws on linked data sources to collect contextual information related to events and users and build enhanced profiles for them.
Abstract: In recent years, social networking services have gained phenomenal popularity. They allow us to explore the world and share our findings in a convenient way. Events are a critical component of social networks: a user can create, share, or join different events in their social circle. In this paper, we investigate the problem of event recommendation. We propose recommendation methods based on the similarity between an event’s content and a user’s interests in terms of topics. Specifically, we use Latent Dirichlet Allocation (LDA) to generate a topic distribution over each event and user. We also consider friend relationships and attendance history to increase recommendation accuracy. Moreover, we use linked data as our data source to collect contextual information related to events and users and to build enhanced profiles for them. As a reliable resource, linked data is used to find structured knowledge and linkages among different knowledge sources. Finally, we conduct comprehensive experiments on various datasets from both the academic community and popular social networking services.
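The core recommendation step lends itself to a short illustration. Below is a minimal sketch, not the authors' implementation: hypothetical event texts and a user profile are vectorized, scikit-learn's LatentDirichletAllocation infers per-document topic distributions, and events are ranked by cosine similarity to the user's distribution. Friend relationships, attendance history, and the linked-data profile enrichment are omitted.

```python
# Minimal sketch of topic-based event recommendation (illustrative data only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

events = [
    "open source hackathon on machine learning and data mining",
    "jazz concert downtown featuring local bands",
    "workshop on Bayesian inference and topic models",
]
# In the paper's setting, the user profile would aggregate attended events
# and linked-data context; here it is a single hypothetical text.
user_profile = "talk on probabilistic topic models and text mining"

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(events + [user_profile])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)                # per-document topic distributions

event_topics, user_topics = theta[:-1], theta[-1:]
scores = cosine_similarity(event_topics, user_topics).ravel()
for rank, i in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(float(scores[i]), 3), events[i])
```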

30 citations

Journal ArticleDOI
TL;DR: A non-parametric hierarchical Bayesian framework is developed for designing a classifier based on a mixture of simple (linear) classifiers; the model is extended to allow simultaneous design of classifiers on multiple data sets, termed multi-task learning.
Abstract: A non-parametric hierarchical Bayesian framework is developed for designing a classifier, based on a mixture of simple (linear) classifiers. Each simple classifier is termed a local "expert", and the number of experts and their construction are manifested via a Dirichlet process formulation. The simple form of the "experts" allows analytical handling of incomplete data. The model is extended to allow simultaneous design of classifiers on multiple data sets, termed multi-task learning, with this also performed non-parametrically via the Dirichlet process. Fast inference is performed using variational Bayesian (VB) analysis, and example results are presented for several data sets. We also perform inference via Gibbs sampling, to which we compare the VB results.
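The paper's mixture-of-linear-experts model with Dirichlet process priors is not available off the shelf, but the non-parametric ingredient can be illustrated. The sketch below uses scikit-learn's BayesianGaussianMixture with a (truncated) Dirichlet-process prior, fit by variational Bayes, to show how the number of active components is inferred from data; it is a loose analogue, not the paper's classifier.

```python
# Loose analogue of the paper's non-parametric idea: a Dirichlet-process
# mixture fit by variational Bayes, where unused components get weight ~0.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters; the truncation level (10) exceeds the truth.
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

dpmm = BayesianGaussianMixture(
    n_components=10,                                    # truncation level
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# Only a few components keep non-negligible weight: the inferred "experts".
print(np.round(dpmm.weights_, 3))
```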

30 citations

Journal ArticleDOI
TL;DR: A linear-time algorithm is proposed that defines a distributed predictive model for finite-state symbolic sequences representing the traces of the activity of a number of individuals within a group.
Abstract: To provide a parsimonious generative representation of the sequential activity of a number of individuals within a population, there is a necessary tradeoff between individual-specific and global representations. A linear-time algorithm is proposed that defines a distributed predictive model for finite-state symbolic sequences which represent the traces of the activity of a number of individuals within a group. The algorithm is based on a straightforward generalization of latent Dirichlet allocation to time-invariant Markov chains of arbitrary order. The modelling assumption made is that the possibly heterogeneous behavior of individuals may be represented by a relatively small number of simple and common behavioral traits which may interleave randomly according to an individual-specific distribution. The results of an empirical study on three different application domains indicate that this modelling approach provides an efficient, low-complexity, and intuitively interpretable representation scheme, which is reflected in improved prediction performance over comparable models.
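A toy generative sketch of the modelling assumption may help: shared behavioral "traits" are first-order Markov transition matrices, and each individual interleaves them according to an individual-specific, Dirichlet-distributed mixture. All parameters below are made up; the paper's inference procedure is not shown.

```python
# Toy generative sketch: sequences from a mixture of shared Markov "traits".
import numpy as np

rng = np.random.default_rng(0)
n_symbols, n_traits, seq_len = 4, 2, 20

# Shared traits: one row-stochastic transition matrix per trait.
traits = rng.dirichlet(np.ones(n_symbols), size=(n_traits, n_symbols))

def sample_sequence(alpha):
    """Sample one individual's symbol sequence from the trait mixture."""
    mix = rng.dirichlet(alpha)             # individual-specific trait weights
    seq = [int(rng.integers(n_symbols))]
    for _ in range(seq_len - 1):
        z = rng.choice(n_traits, p=mix)    # trait interleaved at this step
        seq.append(int(rng.choice(n_symbols, p=traits[z, seq[-1]])))
    return seq

print(sample_sequence(alpha=np.ones(n_traits)))
```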

30 citations

Journal ArticleDOI
TL;DR: A framework for document-level multi-topic sentiment classification of Email data is developed, and both latent Dirichlet allocation topic modeling and semantic text segmentation are applied to post-process Email documents.
Abstract: Email data has unique characteristics, involving multiple topics, lengthy replies, formal language, high variance in length, high duplication, anomalies, and indirect relationships, that distinguish it from other social media data. In order to better model Email documents and to capture complex sentiment structures in the content, we develop a framework for document-level multi-topic sentiment classification of Email data. Note that large volumes of labeled Email data are rarely publicly available. We introduce an optional data augmentation process to increase the size of datasets with synthetically labeled data, reducing the probability of overfitting and underfitting during the training process. To generate segments with topic embeddings and topic weighting vectors as inputs for our proposed model, we apply both latent Dirichlet allocation topic modeling and semantic text segmentation to post-process Email documents. Empirical results obtained with multiple sets of experiments, including performance comparison against various state-of-the-art algorithms with and without data augmentation and diverse parameter settings, are analyzed to demonstrate the effectiveness of our proposed framework.
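The segment-plus-topic-weights preprocessing can be sketched briefly. In the hypothetical snippet below, paragraph splits stand in for the paper's semantic text segmentation, and scikit-learn's LDA assigns each segment a topic-weight vector; the sentiment classifier, data augmentation, and topic embeddings are omitted.

```python
# Rough sketch of the preprocessing: segment an email, attach topic weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

email = (
    "Thanks for the quarterly report, the revenue numbers look great.\n\n"
    "On another note, last night's server outage was frustrating and the "
    "incident response was far too slow."
)
# Paragraph breaks as a crude stand-in for semantic text segmentation.
segments = [s for s in email.split("\n\n") if s.strip()]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(segments)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_weights = lda.fit_transform(X)     # one topic-weight vector per segment

for seg, w in zip(segments, topic_weights):
    print(w.round(2), seg[:50])
```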

30 citations

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A new latent Dirichlet language model (LDLM) is presented for modeling word sequences: a new Bayesian framework is introduced by merging Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events.
Abstract: Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates the document probability under a bag-of-words scheme without considering the order of words, discovering topic structure at the document level, which differs from the concern of word prediction in speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for modeling word sequences. A new Bayesian framework is introduced by merging Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events, and a robust topic-based language model is established accordingly. In the experiments, we implement LDLM for continuous speech recognition and obtain better performance than a probabilistic latent semantic analysis (PLSA) based language model.
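The key departure from bag-of-words LDA is that next-word probabilities are conditioned on history while marginalizing latent topics, roughly p(w_t | w_{t-1}) = sum_k p(w_t | w_{t-1}, k) p(k). The toy numbers below only illustrate that marginalization; the paper's Dirichlet priors and Bayesian inference are not reproduced.

```python
# Toy illustration of topic-mixed bigram prediction (made-up probabilities).
import numpy as np

vocab = ["stock", "market", "guitar", "solo"]
# One row-stochastic bigram matrix per latent topic (rows: previous word).
topic_bigrams = np.array([
    [[0.10, 0.70, 0.10, 0.10],        # topic 0: finance-flavored
     [0.60, 0.20, 0.10, 0.10],
     [0.25, 0.25, 0.25, 0.25],
     [0.25, 0.25, 0.25, 0.25]],
    [[0.25, 0.25, 0.25, 0.25],        # topic 1: music-flavored
     [0.25, 0.25, 0.25, 0.25],
     [0.10, 0.10, 0.20, 0.60],
     [0.10, 0.10, 0.70, 0.10]],
])
topic_weights = np.array([0.8, 0.2])  # topic mixture inferred for the text

prev = vocab.index("stock")
p_next = topic_weights @ topic_bigrams[:, prev, :]   # marginalize over topics
print(dict(zip(vocab, p_next.round(3))))
```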

30 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (86% related)
Support vector machine: 73.6K papers, 1.7M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Convolutional neural network: 74.7K papers, 2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446