SciSpace (formerly Typeset)
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published on this topic, receiving 212,555 citations. The topic is also known as LDA.


Papers
Book Chapter · DOI
20 Mar 2016
TL;DR: Two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics are proposed and it is shown that the proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.
Abstract: Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, in order to understand such discussions, it is necessary to identify coherent topics that have been discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topical coherence and to determine how closely each of the metrics aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.
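A PMI-based coherence score can be illustrated with a minimal sketch. It assumes document-level co-occurrence (each background tweet is treated as one document) and averages PMI over all pairs of a topic's top words. The abstract does not give the paper's exact windowing or smoothing, so `pmi_coherence`, the `eps` constant and the toy `tweets` corpus are illustrative assumptions.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, background_docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words, estimated from a background corpus.

    Assumption: probabilities are document-level, i.e. p(w) is the fraction of
    background documents containing w and p(wi, wj) the fraction containing both.
    """
    doc_sets = [set(doc) for doc in background_docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of background documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(top_words, 2):
        # eps keeps log() finite for pairs that never co-occur or unseen words.
        scores.append(math.log((prob(wi, wj) + eps) / (prob(wi) * prob(wj) + eps)))
    return sum(scores) / len(scores)


# Hypothetical background corpus: each tweet is a list of tokens.
tweets = [
    ["lda", "topic", "model"],
    ["topic", "model", "twitter"],
    ["election", "vote"],
    ["vote", "poll", "election"],
]
print(pmi_coherence(["topic", "model"], tweets))  # ≈ 0.693 (log 2): the words always co-occur
print(pmi_coherence(["topic", "vote"], tweets))   # large negative: the words never co-occur
```

Using tweets as the background corpus, rather than a general-purpose collection, is what makes the score sensitive to word associations specific to Twitter.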

32 citations

Journal Article · DOI
TL;DR: By combining users' sentiments in review texts and their rating scores, the RAS model can learn more precise latent factors of users and items compared with the baseline models and alleviates the cold-start problem to some extent.

32 citations

Journal Article · DOI
TL;DR: The popular Latent Dirichlet Allocation model is extended by exploiting three different conditional Markovian assumptions, and experiments on real-world data show the performance advantages of these sequence-modeling approaches.
Abstract: Probabilistic topic models are widely used in different contexts to uncover the hidden structure in large text corpora. One of the main (and perhaps strong) assumptions of these models is that the generative process follows a bag-of-words assumption, i.e. each token is independent of the previous one. We extend the popular Latent Dirichlet Allocation model by exploiting three different conditional Markovian assumptions: (i) the token generation depends on the current topic and on the previous token; (ii) the topic associated with each observation depends on the topic associated with the previous one; (iii) the token generation depends on the current and previous topics. For each of these modeling assumptions we present a Gibbs sampling procedure for parameter estimation. Experimental evaluation over real-world data shows the performance advantages, in terms of recall and precision, of the sequence-modeling approaches.
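Assumption (i) can be sketched as a collapsed Gibbs sampler in which each token's topic is resampled in proportion to (document-topic count + α) times (count of this word following the same previous word under that topic + β), normalized over the vocabulary. This is a common form for bigram-style topic models, but the abstract does not give the paper's exact sampler; the function name, the start-of-document context `START` and the hyperparameter defaults are assumptions.

```python
import random
from collections import defaultdict


def gibbs_markov_token_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling where token w_i depends on topic z_i AND previous token w_{i-1}.

    docs: lists of integer token ids in [0, vocab_size).
    Returns per-token topic assignments and per-document topic counts.
    """
    rng = random.Random(seed)
    START = vocab_size  # hypothetical context id for the first token of each document
    n_dk = [[0] * n_topics for _ in docs]  # topic counts per document
    n_kvw = defaultdict(int)  # counts of (topic, previous token, token)
    n_kv = defaultdict(int)   # counts of (topic, previous token)

    def update(d, k, v, w, delta):
        n_dk[d][k] += delta
        n_kvw[k, v, w] += delta
        n_kv[k, v] += delta

    # Random initial topic assignments.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for i, w in enumerate(doc):
            update(d, z[d][i], doc[i - 1] if i > 0 else START, w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = doc[i - 1] if i > 0 else START
                update(d, z[d][i], v, w, -1)  # exclude the current token from the counts
                weights = [
                    (n_dk[d][k] + alpha) * (n_kvw[k, v, w] + beta) / (n_kv[k, v] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, z[d][i], v, w, +1)
    return z, n_dk


# Toy corpus of token ids (hypothetical data).
docs = [[0, 1, 2, 0, 1, 2], [3, 4, 3, 4]]
z, n_dk = gibbs_markov_token_lda(docs, n_topics=2, vocab_size=5, iters=50)
print(n_dk)
```

Assumptions (ii) and (iii) would change which counts enter the weights (previous topic instead of previous token), while the resample-and-update loop stays the same.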

32 citations

15 Mar 2015
TL;DR: This is the first study in the medical domain to use fuzzy set theory to express semantic properties of words and documents in terms of topics, and the experimental results showed major improvements.
Abstract: In the past several years, medical data have been growing explosively. For example, the number of papers published in PubMed increased from 112,177 in 1960 to 2,019,238 in 2013, and the annual average number of discharges between 2007 and 2010 was around 35 million. Recently, various text mining techniques have been introduced into the medical domain. One fundamental objective of those techniques is to process unstructured medical data into a proper format that can be better utilized to recognize explicit facts. Topic modeling with Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003) is a popular unsupervised method for discovering the latent semantic structure of a document collection. Topic modeling has been applied to medical data for different purposes, such as medical document categorization (Sarioglu, Yadav, & Choi, 2013) and medical document retrieval (Huang et al., 2014). Despite the usefulness of topic models for medical data analysis, existing topic models such as LDA still suffer from several critical issues. One issue is their computational complexity: almost all uses of topic models require probabilistic inference, which is arguably hard to achieve without approximate inference algorithms such as Gibbs sampling. Another issue is their expressive power in representing medical documents: the performance of various tasks using topic models, such as document classification, is still not satisfactory. In this paper, we propose to model medical documents using fuzzy set theory. Fuzzy set theory models the membership of objects using a possibility distribution. To the best of our knowledge, this is the first study in the medical domain to use fuzzy set theory to express semantic properties of words and documents in terms of topics. Compared with existing topic models such as LDA, fuzzy set theory is computationally efficient. We develop several efficient strategies to model medical documents using fuzzy set theory. Regarding expressive power, we adopt real medical document collections and compare the performance of our proposed method with LDA on document modeling. The experimental results showed major improvements.
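To make the computational-cost point concrete: in the standard collapsed Gibbs sampler for LDA (symmetric Dirichlet priors α and β, K topics, vocabulary size V), each token i in document d has its topic resampled from the conditional below, where the counts n exclude token i. The abstract does not say which LDA inference method it compares against, so this textbook form is given only for illustration.

```latex
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \left(n_{d,k}^{-i} + \alpha\right)
  \frac{n_{k,w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta},
\qquad k = 1, \dots, K
```

Evaluating this for all K topics at each of the N tokens costs O(NK) per sweep. For example, N = 10^7 tokens and K = 100 topics give 10^9 term evaluations per sweep, and the sampler typically needs many sweeps to converge.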

32 citations

Journal Article · DOI
TL;DR: This paper enhances the search and recommendation functionalities of Enterprise Social Software by extending their knowledge structures with the addition of underlying hidden topics which are discovered using probabilistic topic models.
Abstract: Highlights:
• Application of latent topic models for search and recommendation in Enterprise Social Software.
• Generation and refinement of enterprise knowledge structures with latent topic models.
• Item-to-item collaborative and content-based recommendations with Latent Dirichlet Allocation.
Enterprise Social Software refers to open and flexible organizational systems and tools which utilize Web 2.0 technologies to stimulate participation through informal interactions. A challenge in Enterprise Social Software is to discover, and maintain over time, the knowledge structure of topics found relevant to the organization. Knowledge structures, ranging in formality from ontologies to folksonomies, support user activity by enabling users to categorize and retrieve information resources. In this paper we enhance the search and recommendation functionalities of Enterprise Social Software by extending their knowledge structures with underlying hidden topics, which we discover using probabilistic topic models. We employ Latent Dirichlet Allocation to elicit hidden topics and use them to assess similarities in resource and tag recommendation, as well as for the expansion of query results. As an application of our approach, we have extended the search and recommendation facilities of an open-source Enterprise Social Software system, which we have deployed and evaluated in five knowledge-intensive small and medium enterprises.
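Content-based item-to-item recommendation with LDA can be sketched by comparing the document-topic distributions of resources. This minimal sketch assumes each resource already has a topic distribution from a fitted LDA model and uses Hellinger distance as the similarity measure; the abstract does not name the measure, and the resource ids and distributions below are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint support)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def recommend_similar(item_id, theta, top_n=3):
    """Rank other items by topic similarity (1 - Hellinger distance) to `item_id`.

    theta: dict mapping item id -> document-topic distribution (assumed to come from LDA).
    """
    target = theta[item_id]
    scored = [
        (other, 1.0 - hellinger(target, dist))
        for other, dist in theta.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical resources with 3-topic distributions.
theta = {
    "wiki:crm-guide": [0.8, 0.1, 0.1],
    "blog:sales-pipeline": [0.7, 0.2, 0.1],
    "doc:server-setup": [0.05, 0.05, 0.9],
}
print(recommend_similar("wiki:crm-guide", theta, top_n=2))  # blog:sales-pipeline ranks first
```

Hellinger distance is used here because it is symmetric and bounded for probability vectors; cosine similarity or Jensen-Shannon divergence would be equally plausible choices.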

32 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446