Topic
Latent Dirichlet allocation
About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
Papers published on a yearly basis
Papers
01 Jan 2013
TL;DR: This paper uses Latent Dirichlet Allocation (LDA) to generate a topic distribution over each event and user, and uses linked data as a data source to collect contextual information related to events and users and to build enhanced profiles for them.
Abstract: In recent years, social networking services have gained phenomenal popularity. They allow us to explore the world and share our findings in a convenient way. Events are a critical component of social networks: a user can create, share, or join different events in their social circle. In this paper, we investigate the problem of event recommendation. We propose recommendation methods based on the similarity of an event’s content and a user’s interests in terms of topics. Specifically, we use Latent Dirichlet Allocation (LDA) to generate a topic distribution over each event and user. We also consider friend relationships and attendance history to increase recommendation accuracy. Moreover, we use linked data as our data source to collect contextual information related to events and users, and build an enhanced profile for them. As a reliable resource, linked data is used to find structured knowledge and linkages among different knowledge sources. Finally, we conduct comprehensive experiments on various datasets in both the academic community and popular social networking services.
30 citations
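The recommendation step described above, scoring events by the similarity between a user's and an event's LDA topic distributions and boosting events attended by friends, can be sketched as follows. The topic vectors, event names, and friend-bonus weight are hypothetical illustrations, not the paper's actual data or parameters:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two topic-distribution vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_events(user_topics, event_topics, friend_attended, friend_weight=0.2):
    """Rank events for one user: topic similarity plus a small bonus
    for events that the user's friends attended."""
    scores = {}
    for event_id, topics in event_topics.items():
        s = cosine(user_topics, topics)
        if event_id in friend_attended:
            s += friend_weight
        scores[event_id] = s
    return sorted(scores, key=scores.get, reverse=True)

# Toy 3-topic distributions; in the paper these come from LDA inference.
user = np.array([0.7, 0.2, 0.1])
events = {
    "concert":  np.array([0.6, 0.3, 0.1]),
    "workshop": np.array([0.1, 0.1, 0.8]),
}
ranking = score_events(user, events, friend_attended={"workshop"})
```

Here the user's topics align closely with "concert", so it outranks "workshop" even after the friend-attendance bonus.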
TL;DR: A non-parametric hierarchical Bayesian framework is developed for designing a classifier based on a mixture of simple (linear) classifiers; the framework is extended to allow simultaneous design of classifiers on multiple data sets, termed multi-task learning.
Abstract: A non-parametric hierarchical Bayesian framework is developed for designing a classifier, based on a mixture of simple (linear) classifiers. Each simple classifier is termed a local "expert", and the number of experts and their construction are manifested via a Dirichlet process formulation. The simple form of the "experts" allows analytical handling of incomplete data. The model is extended to allow simultaneous design of classifiers on multiple data sets, termed multi-task learning, with this also performed non-parametrically via the Dirichlet process. Fast inference is performed using variational Bayesian (VB) analysis, and example results are presented for several data sets. We also perform inference via Gibbs sampling, to which we compare the VB results.
30 citations
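The Dirichlet process formulation above lets the number of "expert" classifiers grow with the data rather than being fixed in advance. A minimal sketch of the stick-breaking construction that underlies this (not the paper's variational Bayes or Gibbs inference, just the prior over mixture weights):

```python
import numpy as np

def stick_breaking(alpha, num_sticks, rng):
    """Truncated stick-breaking draw of Dirichlet-process mixture weights:
    beta_k ~ Beta(1, alpha);  w_k = beta_k * prod_{j<k} (1 - beta_j).
    Smaller alpha concentrates mass on fewer components ("experts")."""
    betas = rng.beta(1.0, alpha, size=num_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=2.0, num_sticks=500, rng=rng)
```

With a generous truncation level, the weights are non-negative and sum to essentially one, so only the few components with appreciable weight act as effective experts.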
TL;DR: A linear-time algorithm is proposed that defines a distributed predictive model for finite state symbolic sequences which represent the traces of the activity of a number of individuals within a group.
Abstract: To provide a parsimonious generative representation of the sequential activity of a number of individuals within a population there is a necessary tradeoff between the definition of individual specific and global representations. A linear-time algorithm is proposed that defines a distributed predictive model for finite state symbolic sequences which represent the traces of the activity of a number of individuals within a group. The algorithm is based on a straightforward generalization of latent Dirichlet allocation to time-invariant Markov chains of arbitrary order. The modelling assumption made is that the possibly heterogeneous behavior of individuals may be represented by a relatively small number of simple and common behavioral traits which may interleave randomly according to an individual-specific distribution. The results of an empirical study on three different application domains indicate that this modelling approach provides an efficient low-complexity and intuitively interpretable representation scheme which is reflected by improved prediction performance over comparable models.
30 citations
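The modelling assumption above, that individual behavior is a random interleaving of a few shared behavioral traits, can be sketched with first-order Markov chains: each trait is a transition matrix, and an individual-specific mixture combines their predictions. The matrices and mixture below are illustrative, not the paper's:

```python
import numpy as np

def next_symbol_dist(theta, transition_mats, current_state):
    """Predictive distribution over the next symbol under a mixture of
    first-order Markov chains: p(next | cur) = sum_k theta_k * T_k[cur, next]."""
    return sum(t * T[current_state] for t, T in zip(theta, transition_mats))

# Two shared "behavioral traits" over a 3-symbol alphabet (illustrative).
T1 = np.array([[0.8, 0.1, 0.1],    # trait 1: tends to repeat the same symbol
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
T2 = np.array([[0.1, 0.45, 0.45],  # trait 2: tends to switch symbols
               [0.45, 0.1, 0.45],
               [0.45, 0.45, 0.1]])
theta = np.array([0.6, 0.4])       # individual-specific trait mixture
p = next_symbol_dist(theta, [T1, T2], current_state=0)
```

The result is a valid distribution over next symbols; the paper's model additionally places LDA-style Dirichlet priors over the mixtures and supports chains of arbitrary order.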
TL;DR: A framework for document-level multi-topic sentiment classification of Email data is developed, and both latent Dirichlet allocation topic modeling and semantic text segmentation are applied to post-process Email documents.
Abstract: Email data has unique characteristics, involving multiple topics, lengthy replies, formal language, high variance in length, high duplication, anomalies, and indirect relationships that distinguish it from other social media data. In order to better model Email documents and to capture complex sentiment structures in the content, we develop a framework for document-level multi-topic sentiment classification of Email data. Note that large volumes of labeled Email data are rarely publicly available. We introduce an optional data augmentation process to increase the size of datasets with synthetically labeled data to reduce the probability of overfitting and underfitting during the training process. To generate segments with topic embeddings and topic weighting vectors as inputs for our proposed model, we apply both latent Dirichlet allocation topic modeling and semantic text segmentation to post-process Email documents. Empirical results obtained with multiple sets of experiments, including performance comparison against various state-of-the-art algorithms with and without data augmentation and diverse parameter settings, are analyzed to demonstrate the effectiveness of our proposed framework.
30 citations
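The document-level, multi-topic idea above, where an email is segmented and each segment carries a topic weight, can be sketched as a weighted aggregation of per-segment sentiment scores. The segmentation, weights, and scores below are hypothetical stand-ins for the paper's learned model:

```python
def document_sentiment(segments):
    """Aggregate per-segment sentiment into a document-level score,
    weighting each segment by its topic weight (how strongly it expresses
    its topic).  `segments` is a list of (topic_weight, sentiment) pairs,
    with sentiment in [-1, +1]."""
    total_weight = sum(w for w, _ in segments)
    if total_weight == 0:
        return 0.0
    return sum(w * s for w, s in segments) / total_weight

# Three segments of one email (weights and scores are illustrative).
segments = [(0.5, +1.0),   # strongly on-topic, positive
            (0.3, -1.0),   # moderately on-topic, negative
            (0.2,  0.0)]   # off-topic filler, neutral
score = document_sentiment(segments)
```

Weighting by topic strength keeps off-topic filler, common in lengthy email replies, from diluting the sentiment signal.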
01 Dec 2008
TL;DR: A new latent Dirichlet language model (LDLM) is presented for modeling word sequences by merging the Dirichlet priors to characterize the uncertainty of latent topics of n-gram events, and a new Bayesian framework is introduced.
Abstract: Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates the document probability based on a bag-of-words scheme without considering the sequence of words. This model discovers the topic structure at the document level, which differs from the concern of word prediction in speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for modeling word sequences. A new Bayesian framework is introduced by merging the Dirichlet priors to characterize the uncertainty of latent topics of n-gram events. The robust topic-based language model is established accordingly. In the experiments, we implement LDLM for continuous speech recognition and obtain better performance than a probabilistic latent semantic analysis (PLSA) based language model.
30 citations
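The core topic-based word-prediction idea behind a model like LDLM can be sketched as mixing per-topic word distributions with a posterior over topics inferred from the n-gram history. The distributions and posterior below are illustrative, not the paper's trained parameters:

```python
import numpy as np

def topic_lm_prob(topic_word, topic_given_history):
    """Topic-based language model: p(w | h) = sum_k p(w | topic k) * p(topic k | h).
    topic_word: (K, V) matrix, each row a word distribution for one topic.
    topic_given_history: length-K posterior over topics given the history h."""
    return topic_given_history @ topic_word

# K=2 topics over a V=4 word vocabulary (illustrative distributions).
topic_word = np.array([[0.4, 0.3, 0.2, 0.1],
                       [0.1, 0.2, 0.3, 0.4]])
post = np.array([0.75, 0.25])   # topic posterior inferred from the history
p_next = topic_lm_prob(topic_word, post)
```

Because each row of `topic_word` and the topic posterior are proper distributions, the mixture is again a proper distribution over the vocabulary; in a speech recognizer this would typically be interpolated with a standard n-gram model.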