Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.

Book Chapter•DOI•

Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

[...]

Anjie Fang¹, Craig Macdonald¹, Iadh Ounis¹, Philip Habel¹•Institutions (1)

University of Glasgow¹

20 Mar 2016

TL;DR: Two new automatic coherence metrics that use Twitter as a separate background dataset to measure topic coherence are proposed, and the proposed Pointwise Mutual Information (PMI)-based metric is shown to provide the highest level of agreement with human preferences of topic coherence over two Twitter datasets.

Abstract: Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, in order to understand such discussions, it is necessary to identify coherent topics that have been discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topic coherence and to determine how closely each of the metrics aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.

32 citations
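
The abstract does not spell out the metric's formula. As a rough illustration of the general idea behind PMI-based coherence, the sketch below averages the pairwise PMI, log p(wi, wj) / (p(wi) p(wj)), over a topic's top words, estimating probabilities from document co-occurrence in a separate background corpus (here, a few toy "tweets"). The function name pmi_coherence, the epsilon smoothing, and the data are assumptions for illustration; this is not the authors' exact metric.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, background_docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words (illustrative sketch).

    Probabilities come from document-level co-occurrence in a background
    corpus: p(w) is the fraction of documents containing w, and p(w1, w2)
    the fraction containing both. eps (an assumed smoothing choice) avoids
    log(0) and division by zero for unseen words.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    doc_sets = [set(doc) for doc in background_docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of background documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = [
        math.log((prob(w1, w2) + eps) / (prob(w1) * prob(w2) + eps))
        for w1, w2 in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)


# Toy background corpus of tokenised "tweets" (made-up data).
background = [
    ["goal", "match", "team"],
    ["goal", "team"],
    ["election", "vote"],
    ["vote", "poll", "election"],
]
print(pmi_coherence(["goal", "team"], background))  # about 0.693 (log 2)
print(pmi_coherence(["goal", "vote"], background))  # strongly negative
```

Words that co-occur in the background tweets score higher than words that never do; with a real Twitter background corpus, the same kind of function could be used to rank the topics of an LDA model.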

Journal Article•DOI•

Rating prediction using review texts with underlying sentiments

[...]

Dongjin Yu¹, Yunlei Mu¹, Yike Jin¹•Institutions (1)

Hangzhou Dianzi University¹

01 Jan 2017-Information Processing Letters

TL;DR: By combining users' sentiments in review texts with their rating scores, the RAS model learns more precise latent factors of users and items than the baseline models and alleviates the cold-start problem to some extent.

32 citations

Journal Article•DOI•

Probabilistic topic models for sequence data

[...]

Nicola Barbieri¹, Giuseppe Manco², Ettore Ritacco², Marco Carnuccio³, Antonio Bevacqua³•Institutions (3)

Yahoo!¹, Institute for High Performance Computing and Networking of the Italian National Research Council (ICAR-CNR)², University of Calabria³

01 Oct 2013-Machine Learning

TL;DR: The popular Latent Dirichlet Allocation model is extended by exploiting three different conditional Markovian assumptions, and an experimental evaluation on real-world data shows the performance advantages of the resulting sequence-modeling approaches.

Abstract: Probabilistic topic models are widely used in different contexts to uncover the hidden structure in large text corpora. One of the main (and perhaps strongest) assumptions of these models is that the generative process follows a bag-of-words assumption, i.e. each token is independent of the previous one. We extend the popular Latent Dirichlet Allocation model by exploiting three different conditional Markovian assumptions: (i) the token generation depends on the current topic and on the previous token; (ii) the topic associated with each observation depends on the topic associated with the previous one; (iii) the token generation depends on the current and previous topic. For each of these modeling assumptions we present a Gibbs sampling procedure for parameter estimation. Experimental evaluation over real-world data shows the performance advantages, in terms of recall and precision, of the sequence-modeling approaches.

32 citations
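
The paper derives a Gibbs sampler for each of its Markovian variants; those derivations are not reproduced here. As a point of reference, the sketch below is the standard collapsed Gibbs sampler for plain bag-of-words LDA, whose full conditional is (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta). The Markovian variants presumably change one of these two factors (for example, under assumption (i) the word factor would also condition on the previous token). The function name lda_gibbs, the hyperparameter defaults, and the integer word-id input format are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard (bag-of-words) LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc-topic counts, topic-word counts, per-token topic assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # total tokens per topic
    z = []                                               # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | z_-i, w).
                # Bag-of-words: the word factor ignores the previous token.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


# Tiny corpus: words 0-1 and words 2-3 tend to appear together (made-up data).
docs = [[0, 0, 1, 1, 0], [2, 3, 2, 3, 3], [0, 1, 1, 0], [2, 2, 3, 3]]
n_dk, n_kw, z = lda_gibbs(docs, n_topics=2, vocab_size=4, iters=100)
print(n_dk)
```

Point estimates follow from the final counts, e.g. theta_dk ≈ (n_dk + alpha) / (N_d + K * alpha) for the document-topic proportions.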

A Fuzzy Approach Model for Uncovering Hidden Latent Semantic Structure in Medical Text Collections

[...]

Amir Karami, Aryya Gangopadhyay, Bin Zhou, Hadi Kharrazi

15 Mar 2015

TL;DR: This is the first study in the medical domain to use fuzzy set theory to express the semantic properties of words and documents in terms of topics, and experiments on real medical document collections showed major improvements over LDA in document modeling.

Abstract: In the past several years, medical data have been growing explosively. For example, the number of papers published in PubMed increased from 112,177 in 1960 to 2,019,238 in 2013, and the annual average number of discharges between 2007 and 2010 was around 35 million. Recently, various text mining techniques have been introduced into the medical domain. One fundamental objective of those techniques is to process unstructured medical data into a proper format so that explicit facts can be recognized and better used. Topic Modeling with Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003) is a popular unsupervised method for discovering the latent semantic structure of a document collection. Topic modeling has been applied to medical data for different purposes, such as medical document categorization (Sarioglu, Yadav, & Choi, 2013) and medical document retrieval (Huang et al., 2014). Despite the usefulness of topic models for medical data analysis, existing topic models such as LDA still suffer from several critical issues. One issue is their computational complexity: almost all uses of topic models require probabilistic inference, which is arguably hard to achieve without approximate inference algorithms such as Gibbs sampling. Another issue is their limited expressive power for representing medical documents: performance on tasks such as document classification with topic models is still not satisfactory. In this paper, we propose to model medical documents using fuzzy set theory, which models the membership of objects using a possibility distribution. To the best of our knowledge, this is the first study in the medical domain to use fuzzy set theory to express semantic properties of words and documents in terms of topics. Compared with existing topic models such as LDA, fuzzy set theory is computationally efficient. We develop several efficient strategies to model medical documents using fuzzy set theory. To evaluate expressive power, we use real medical document collections and compare the performance of our proposed method with LDA on document modeling. The experimental results showed major improvements.

32 citations

Journal Article•DOI•

Using latent topics to enhance search and recommendation in Enterprise Social Software

[...]

Konstantinos Christidis¹, Gregoris Mentzas¹, Dimitris Apostolou²•Institutions (2)

National Technical University of Athens¹, University of Piraeus²

01 Aug 2012-Expert Systems With Applications

TL;DR: This paper enhances the search and recommendation functionalities of Enterprise Social Software by extending their knowledge structures with the addition of underlying hidden topics which are discovered using probabilistic topic models.

Abstract: Highlights: application of latent topic models for search and recommendation in Enterprise Social Software; generation and refinement of enterprise knowledge structures with latent topic models; item-to-item collaborative and content-based recommendations with Latent Dirichlet Allocation. Enterprise Social Software refers to open and flexible organizational systems and tools that use Web 2.0 technologies to stimulate participation through informal interactions. A challenge in Enterprise Social Software is to discover, and maintain over time, the knowledge structure of topics found relevant to the organization. Knowledge structures, ranging in formality from ontologies to folksonomies, support user activity by enabling users to categorize and retrieve information resources. In this paper we enhance the search and recommendation functionalities of Enterprise Social Software by extending their knowledge structures with underlying hidden topics, which we discover using probabilistic topic models. We employ Latent Dirichlet Allocation to elicit hidden topics and use them to assess similarities in resource and tag recommendation, as well as to expand query results. As an application of our approach, we have extended the search and recommendation facilities of an open-source Enterprise Social Software system, which we have deployed and evaluated in five knowledge-intensive small and medium enterprises.

32 citations
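
The abstract says the LDA topics are used to assess similarities for resource recommendation but does not name the similarity measure. The sketch below assumes one common choice: representing each item by its LDA topic distribution and ranking other items by 1 minus the Hellinger distance. The functions hellinger_similarity and recommend_similar, the item names, and the topic distributions are hypothetical.

```python
import math


def hellinger_similarity(p, q):
    """1 - Hellinger distance between two discrete distributions, in [0, 1]."""
    # Bhattacharyya coefficient; equals 1.0 for identical distributions.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # max() guards against tiny negative values from floating-point round-off.
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def recommend_similar(item_topics, item_id, top_n=3):
    """Rank the other items by topic similarity to item_id.

    item_topics: dict mapping item id -> topic distribution (e.g. an LDA theta row).
    """
    target = item_topics[item_id]
    scored = [
        (other, hellinger_similarity(target, dist))
        for other, dist in item_topics.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical per-item topic distributions over three topics.
items = {
    "wiki_page": [0.8, 0.1, 0.1],
    "blog_post": [0.7, 0.2, 0.1],
    "forum_thread": [0.05, 0.05, 0.9],
}
print(recommend_similar(items, "wiki_page", top_n=2))
```

Hellinger distance is used here because topic vectors are probability distributions; cosine similarity over the same vectors would be an equally plausible choice.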

Metrics

Papers: 6,513
Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446