Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published on this topic, receiving 212,555 citations. The topic is also known as: LDA.
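
For orientation, the sketch below shows one common way to fit an LDA topic model in Python with the gensim library; the toy corpus and parameter values are placeholders chosen for illustration, not taken from any paper listed on this page.

from gensim import corpora, models

# Toy corpus: each document is a list of tokens (placeholder data).
texts = [
    ["topic", "model", "document", "word", "distribution"],
    ["gene", "protein", "cell", "expression", "sequence"],
    ["topic", "word", "dirichlet", "distribution", "prior"],
]

dictionary = corpora.Dictionary(texts)                 # map tokens to integer ids
corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors

# Fit LDA with an assumed (small) number of topics.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

for topic_id in range(lda.num_topics):
    print(lda.show_topic(topic_id, topn=4))            # top words per topic
print(lda.get_document_topics(corpus[0]))              # topic mixture of document 0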


Papers
Proceedings Article
11 Jul 2010
TL;DR: Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word co-occurrences, are presented and evaluated on datasets of human plausibility judgements; they perform very competitively, especially for infrequent predicate-argument combinations.
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
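
One way to read the scoring idea in this abstract: if each predicate is treated as a "document" whose "words" are the arguments observed with it, a fitted LDA model yields a plausibility score for a predicate-argument pair as p(arg | pred) = sum over topics z of p(z | pred) * p(arg | z). The sketch below illustrates that computation; the toy predicate-argument data, the model settings, and the plausibility helper are assumptions made for illustration, not the authors' code.

from gensim import corpora, models

# Toy data: each "document" is the list of arguments observed with one predicate.
predicate_args = {
    "eat":   ["pizza", "bread", "apple", "pizza"],
    "drive": ["car", "truck", "car", "bus"],
    "read":  ["book", "paper", "book", "article"],
}

predicates = list(predicate_args)
texts = [predicate_args[p] for p in predicates]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

def plausibility(pred, arg):
    """Hypothetical score: p(arg | pred) = sum_z p(z | pred) * p(arg | z)."""
    bow = dictionary.doc2bow(predicate_args[pred])
    doc_topics = lda.get_document_topics(bow, minimum_probability=0.0)
    topic_word = lda.get_topics()              # shape: (num_topics, vocab_size)
    arg_id = dictionary.token2id[arg]
    return sum(p_z * topic_word[z, arg_id] for z, p_z in doc_topics)

print(plausibility("eat", "pizza"))   # expected: relatively high
print(plausibility("eat", "truck"))   # expected: relatively low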

114 citations

Proceedings Article
24 Jul 2008
TL;DR: This article uses Latent Dirichlet Allocation to capture the events covered by the documents and forms the summary from sentences representing those events; the algorithms gave significantly better ROUGE-1 recall measures than the DUC 2002 winners.
Abstract: Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary. In this article we use Latent Dirichlet Allocation to capture the events being covered by the documents and form the summary with sentences representing these different events. Our approach is distinguished from existing approaches in that we use mixture models to capture the topics and pick up the sentences without paying attention to the details of grammar and structure of the documents. Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
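
A rough sketch of the sentence-selection step described above, under the assumption that LDA topics stand in for "events" and that each topic contributes its highest-weighted sentence to the extractive summary; the toy sentences and parameters are placeholders, and the authors' actual weighting scheme may differ.

from gensim import corpora, models

# Toy candidate sentences pooled from a "document collection" (placeholders).
sentences = [
    "the flood damaged hundreds of homes in the region",
    "rescue teams evacuated residents from the flooded areas",
    "the election results were announced late on sunday",
    "the new government promised economic reforms",
]
texts = [s.split() for s in sentences]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

# For each topic ("event"), pick the sentence whose topic mixture gives that
# topic the largest weight; the selected sentences form the summary.
# (For brevity, duplicate selections are not removed in this sketch.)
summary = []
for topic_id in range(lda.num_topics):
    def topic_weight(i):
        mix = dict(lda.get_document_topics(corpus[i], minimum_probability=0.0))
        return mix.get(topic_id, 0.0)
    best = max(range(len(sentences)), key=topic_weight)
    summary.append(sentences[best])

print(summary)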

114 citations

Book Chapter
23 May 2006
TL;DR: A novel combination of statistical topic models and named-entity recognizers is presented to jointly analyze the entities mentioned and the topics discussed in a collection of 330,000 New York Times news articles.
Abstract: Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a novel combination of statistical topic models and named-entity recognizers to jointly analyze entities mentioned (persons, organizations and places) and topics discussed in a collection of 330,000 New York Times news articles. We demonstrate an analytic framework which automatically extracts from a large collection: topics; topic trends; and topics that relate entities.
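
The sketch below approximates the pipeline in this abstract by running a pretrained named-entity recognizer (spaCy is assumed here) alongside a gensim LDA model and then associating each extracted entity with the dominant topic of its article. The paper models entities and topics jointly, so this post-hoc association is only an illustrative simplification, and the sample articles are placeholders.

import spacy
from collections import defaultdict
from gensim import corpora, models

# Placeholder articles; a real run would use the full news collection.
articles = [
    "Barack Obama met leaders in Washington to discuss health care reform.",
    "Apple released a new iPhone in California amid strong market demand.",
]

nlp = spacy.load("en_core_web_sm")     # pretrained NER model (must be installed)

docs_tokens, docs_entities = [], []
for text in articles:
    doc = nlp(text)
    docs_tokens.append([t.lemma_.lower() for t in doc if t.is_alpha and not t.is_stop])
    docs_entities.append([ent.text for ent in doc.ents
                          if ent.label_ in {"PERSON", "ORG", "GPE"}])

dictionary = corpora.Dictionary(docs_tokens)
corpus = [dictionary.doc2bow(t) for t in docs_tokens]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

# Associate each entity with the dominant topic of the article mentioning it.
entity_topics = defaultdict(list)
for bow, entities in zip(corpus, docs_entities):
    mix = lda.get_document_topics(bow, minimum_probability=0.0)
    top_topic = max(mix, key=lambda zt: zt[1])[0]
    for ent in entities:
        entity_topics[ent].append(top_topic)

print(dict(entity_topics))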

114 citations

Journal Article
TL;DR: A framework to train and validate Latent Dirichlet Allocation (LDA), the simplest and most popular topic modeling algorithm, on e-petition data is described; the findings have significant implications for developing LDA tools and for assuring the validity and interpretability of LDA content analysis.
Abstract: E-petitions have become a popular vehicle for political activism, but studying them has been difficult because efficient methods for analyzing their content are currently lacking. Researchers have used topic modeling for content analysis, but current practices carry some serious limitations. While modeling may be more efficient than manually reading each petition, it generally relies on unsupervised machine learning and so requires a dependable training and validation process. And so this paper describes a framework to train and validate Latent Dirichlet Allocation (LDA), the simplest and most popular topic modeling algorithm, using e-petition data. With rigorous training and evaluation, 87% of LDA-generated topics made sense to human judges. Topics also aligned well with results from an independent content analysis by the Pew Research Center, and were strongly associated with corresponding social events. Computer-assisted content analysts can benefit from our guidelines to supervise every process of training and evaluation of LDA. Software developers can benefit from learning the demands of social scientists when using LDA for content analysis. These findings have significant implications for developing LDA tools and assuring validity and interpretability of LDA content analysis. In addition, LDA topics can have some advantages over subjects extracted by manual content analysis by reflecting multiple themes expressed in texts, by extracting new themes that are not highlighted by human coders, and by being less prone to human bias.
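
The paper's validation relies on human judges and an external content analysis; as a purely automatic complement, topic coherence is a common proxy for checking an LDA configuration. The sketch below compares candidate topic counts by c_v coherence with gensim; the toy petitions and the candidate values of k are assumptions for illustration, not the paper's procedure.

from gensim import corpora, models
from gensim.models import CoherenceModel

# Placeholder tokenized petitions; a real run would use the e-petition corpus.
texts = [
    ["ban", "plastic", "bags", "environment"],
    ["increase", "school", "funding", "education"],
    ["protect", "wildlife", "environment", "habitat"],
    ["teacher", "salary", "education", "budget"],
]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Compare candidate numbers of topics by coherence, one automatic check that
# can be used alongside human judgement of topic interpretability.
for k in (2, 3, 4):
    lda = models.LdaModel(corpus, num_topics=k, id2word=dictionary,
                          passes=20, random_state=0)
    cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                        coherence="c_v")
    print(k, cm.get_coherence())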

113 citations

Journal Article
TL;DR: This work proposes an ontology- and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification; it achieves an accuracy of 93%, showing that the proposed approach is effective for sentiment classification.
Abstract: Social networks play a key role in providing a new approach to collecting information regarding mobility and transportation services. To study this information, sentiment analysis can make decent observations to support intelligent transportation systems (ITSs) in examining traffic control and management systems. However, sentiment analysis faces technical challenges: extracting meaningful information from social network platforms, and the transformation of extracted data into valuable information. In addition, accurate topic modeling and document representation are other challenging tasks in sentiment analysis. We propose an ontology and latent Dirichlet allocation (OLDA)-based topic modeling and word embedding approach for sentiment classification. The proposed system retrieves transportation content from social networks, removes irrelevant content to extract meaningful information, and generates topics and features from extracted data using OLDA. It also represents documents using word embedding techniques, and then employs lexicon-based approaches to enhance the accuracy of the word embedding model. The proposed ontology and the intelligent model are developed using Web Ontology Language and Java, respectively. Machine learning classifiers are used to evaluate the proposed word embedding system. The method achieves accuracy of 93%, which shows that the proposed approach is effective for sentiment classification.
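
As a generic illustration of combining topic features with word-embedding features for sentiment classification (not the paper's OLDA system, which also uses an ontology and lexicon-based refinement), the sketch below concatenates LDA topic proportions with averaged Word2Vec vectors and trains a logistic-regression classifier; the labelled posts and all parameter values are placeholders.

import numpy as np
from gensim import corpora, models
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Placeholder labelled transport-related posts (1 = positive, 0 = negative).
posts = [
    ("the new bus service is fast and reliable", 1),
    ("train delays again this morning terrible service", 0),
    ("smooth traffic on the highway today", 1),
    ("the roadworks caused awful congestion downtown", 0),
]
texts = [p.split() for p, _ in posts]
labels = [y for _, y in posts]

# Topic features from LDA.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

def topic_vec(bow):
    mix = dict(lda.get_document_topics(bow, minimum_probability=0.0))
    return [mix.get(z, 0.0) for z in range(lda.num_topics)]

# Word-embedding features (a tiny Word2Vec trained on the same toy data).
w2v = Word2Vec(texts, vector_size=25, min_count=1, epochs=50)

def embed_vec(tokens):
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

# Concatenate both feature sets and fit a classifier.
X = np.array([topic_vec(bow) + list(embed_vec(t))
              for bow, t in zip(corpus, texts)])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))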

113 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    323
2022    842
2021    418
2020    429
2019    473
2018    446