Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over the lifetime, 5351 publications have been published within this topic receiving 212555 citations. The topic is also known as: LDA.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Sentiment Classification Based on AS-LDA Model☆

[...]

Jiguang Liang¹, Ping Liu¹, Jianlong Tan¹, Shuo Bai¹•Institutions (1)

Chinese Academy of Sciences¹

01 Jan 2014-Procedia Computer Science

TL;DR: A sentiment classification method called AS LDA is introduced, which assumes that words in subjective documents consists of two parts: sentiment element words and auxiliary words which are sampled accordingly from sentiment topics and auxiliary topics.

...read moreread less

32 citations

Proceedings Article•

Multi-field Correlated Topic Modeling.

[...]

Konstantin Salomatin¹, Yiming Yang¹, Abhimanyu Lad•Institutions (1)

Carnegie Mellon University¹

01 Jan 2009

TL;DR: A new extension of the CTM method to enable modeling with multi-field topics in a global graphical structure, and a mean-field variational algorithm to allow joint learning of multinomial topic models from discrete data and Gaussianstyle topic models for real-valued data are proposed.

...read moreread less

Abstract: Popular methods for probabilistic topic modeling like the Latent Dirichlet Allocation (LDA, [1]) and Correlated Topic Models (CTM, [2]) share an important property, i.e., using a common set of topics to model all the data. This property can be too restrictive for modeling complex data entries where multiple fields of heterogeneous data jointly provide rich information about each object or event. We propose a new extension of the CTM method to enable modeling with multi-field topics in a global graphical structure, and a mean-field variational algorithm to allow joint learning of multinomial topic models from discrete data and Gaussianstyle topic models for real-valued data. We conducted experiments with both simulated and real data, and observed that the multi-field CTM outperforms a conventional CTM in both likelihood maximization and perplexity reduction. A deeper analysis on the simulated data reveals that the superior performance is the result of successful discovery of the mapping among field-specific topics and observed data.

...read moreread less

32 citations

Journal Article•DOI•

An enhanced guided LDA model augmented with BERT based semantic strength for aspect term extraction in sentiment analysis

[...]

Manju Venugopalan, Deepa Gupta

01 Mar 2022-Knowledge Based Systems

TL;DR: This paper proposed an unsupervised approach for aspect term extraction, a guided Latent Dirichlet Allocation (LDA) model that uses minimal aspect seed words from each aspect category to guide the model in identifying the hidden topics of interest to the user.

...read moreread less

Abstract: Aspect level sentiment analysis is a fine-grained task in sentiment analysis. It extracts aspects and their corresponding sentiment polarity from opinionated text. The first subtask of identifying the opinionated aspects is called aspect extraction, which is the focus of the work. Social media platforms are an enormous resource of unlabeled data. However, data annotation for fine-grained tasks is quite expensive and laborious. Hence unsupervised models would be highly appreciated. The proposed model is an unsupervised approach for aspect term extraction, a guided Latent Dirichlet Allocation (LDA) model that uses minimal aspect seed words from each aspect category to guide the model in identifying the hidden topics of interest to the user. The guided LDA model is enhanced by guiding inputs using regular expressions based on linguistic rules. The model is further enhanced by multiple pruning strategies, including a BERT based semantic filter, which incorporates semantics to strengthen situations where co-occurrence statistics might fail to serve as a differentiator. The thresholds for these semantic filters have been estimated using Particle Swarm Optimization strategy. The proposed model is expected to overcome the disadvantage of basic LDA models that fail to differentiate the overlapping topics that represent each aspect category. The work has been evaluated on the restaurant domain of SemEval 2014, 2015 and 2016 datasets and has reported an F-measure of 0.81, 0.74 and 0.75 respectively, which is competitive in comparison to the state of art unsupervised baselines and appreciable even with respect to the supervised baselines.

...read moreread less

32 citations

Proceedings Article•DOI•

LDA-based keyword selection in text categorization

[...]

Serafettin Tasci¹, Tunga Güngör¹•Institutions (1)

Boğaziçi University¹

23 Oct 2009

TL;DR: It is observed that almost in all metrics, information gain performs best at all keyword numbers while the LDA-based metrics perform similar to chi-square and document frequency thresholding.

...read moreread less

Abstract: Text categorization is the task of automatically assigning unlabeled text documents to some predefined category labels by means of an induction algorithm. Since the data in text categorization are high-dimensional, feature selection is broadly used in text categorization systems for reducing the dimensionality. In the literature, there are some widely known metrics such as information gain and document frequency thresholding. Recently, a generative graphical model called latent dirichlet allocation (LDA) that can be used to model and discover the underlying topic structures of textual data, was proposed. In this paper, we use the hidden topic analysis of LDA for feature selection and compare it with the classical feature selection metrics in text categorization. For the experiments, we use SVM as the classifier and tf∗idf weighting for weighting the terms. We observed that almost in all metrics, information gain performs best at all keyword numbers while the LDA-based metrics perform similar to chi-square and document frequency thresholding.

...read moreread less

32 citations

Posted Content•DOI•

TClustVID: A Novel Machine Learning Classification Model to Investigate Topics and Sentiment in COVID-19 Tweets

[...]

Md. Shahriare Satu¹, Md. Imran Khan, Mufti Mahmud², Shahadat Uddin³, Matthew A. Summers⁴, Matthew A. Summers⁵, Julian M.W. Quinn⁶, Julian M.W. Quinn⁵, Mohammad Ali Moni⁷, Mohammad Ali Moni⁵ - Show less +6 more•Institutions (7)

Noakhali Science and Technology University¹, Nottingham Trent University², University of Sydney³, University of Cambridge⁴, Garvan Institute of Medical Research⁵, Royal North Shore Hospital⁶, University of New South Wales⁷

04 Aug 2020-medRxiv

TL;DR: An intelligent clustering-based classification and topics extracting model (named TClustVID) that analyze COVID-19-related public tweets to extract significant sentiments with high accuracy and showed higher performance compared to the traditional classifiers determined by clustering criteria.

...read moreread less

Abstract: COVID-19, caused by the SARS-Cov2, varies greatly in its severity but represent serious respiratory symptoms with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals, and remains uncertainty over key aspects of its infectivity, no effective remedy yet exists and this disease causes severe economic effects globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms including Facebook and Twitter. These public forums substantially impact on public opinions in some cases and exacerbate widespread panic and misinformation spread during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topics extracting model (named TClustVID) that analyze COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classifications were employed to Twitter datasets which enabled exploration of the performance of traditional and TclustVID classification methods. TClustVID showed higher performance compared to the traditional classifiers determined by clustering criteria. Finally, we extracted significant topic clusters from TClustVID, split them into positive, neutral and negative clusters and implemented latent dirichlet allocation for extraction of popular COVID-19 topics. This approach identified common prevailing public opinions and concerns related to COVID-19, as well as attitudes to infection prevention strategies held by people from different countries concerning the current pandemic situation.

...read moreread less

32 citations

Collapse

Network Information

Performance

Metrics

6,513

Papers

245,225

Citations

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446

Latent Dirichlet allocation

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics