
Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have been published within this topic, receiving 212,555 citations. The topic is also known as: LDA.
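For readers unfamiliar with the model itself: LDA treats each document as a mixture of topics and each topic as a distribution over words. A minimal collapsed Gibbs sampler conveys the idea; the toy corpus, hyperparameters, and variable names below are illustrative assumptions, not taken from any paper on this page.

```python
# Minimal collapsed Gibbs sampler for LDA on a toy corpus.
import random
from collections import defaultdict

docs = [["cat", "dog", "pet", "cat"],
        ["stock", "market", "trade", "stock"],
        ["dog", "pet", "cat", "pet"],
        ["market", "trade", "stock", "market"]]

K, alpha, beta = 2, 0.1, 0.01          # topics, doc-topic prior, topic-word prior
vocab = sorted({w for d in docs for w in d})
V = len(vocab)
random.seed(0)

# z[d][i]: topic of word i in doc d; counts track doc-topic and topic-word usage.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]               # doc   -> topic counts
nkw = [defaultdict(int) for _ in range(K)]  # topic -> word counts
nk = [0] * K                                # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

for _ in range(200):                        # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                     # remove the current assignment
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            # full conditional: p(t) proportional to
            # (ndk + alpha) * (nkw + beta) / (nk + V*beta)
            weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                       for k in range(K)]
            t = random.choices(range(K), weights)[0]
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Documents about the same theme should end up dominated by the same topic.
top = [max(range(K), key=lambda k: ndk[d][k]) for d in range(len(docs))]
print(top)
```

On this cleanly separated corpus the sampler assigns the two "pets" documents to one topic and the two "markets" documents to the other.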


Papers
Proceedings Article
Liang Yao, Yin Zhang, Baogang Wei, Zhe Jin, Rui Zhang, Yangyang Zhang, Qinfei Chen
12 Feb 2017
TL;DR: This paper proposes a novel knowledge-based topic model that incorporates knowledge graph embeddings into topic modeling, significantly improving semantic coherence and capturing a better representation of a document in the topic space.
Abstract: Probabilistic topic models could be used to extract low-dimensional topics from document collections. However, such models without any human knowledge often produce topics that are not interpretable. In recent years, a number of knowledge-based topic models have been proposed, but they could not process fact-oriented triple knowledge in knowledge graphs. Knowledge graph embeddings, on the other hand, automatically capture relations between entities in knowledge graphs. In this paper, we propose a novel knowledge-based topic model by incorporating knowledge graph embeddings into topic modeling. By combining latent Dirichlet allocation, a widely used topic model, with knowledge encoded by entity vectors, we improve the semantic coherence significantly and capture a better representation of a document in the topic space. Our evaluation results demonstrate the effectiveness of our method.

54 citations
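One simple way entity embeddings can inform topic quality, in the spirit of the paper above, is to score a topic's coherence as the average pairwise similarity of its words' knowledge-graph embeddings. The toy vectors and the scoring mechanism below are illustrative assumptions, not the paper's exact model.

```python
# Score topic coherence with (toy) knowledge-graph entity embeddings.
import math

# Toy 3-d "entity embeddings" (in practice learned by TransE-style methods).
embeddings = {
    "paris":  [0.9, 0.1, 0.0],
    "france": [0.8, 0.2, 0.0],
    "tennis": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def coherence(words):
    """Average pairwise cosine similarity of a topic's word embeddings."""
    vecs = [embeddings[w] for w in words]
    sims = [cosine(u, v) for i, u in enumerate(vecs) for v in vecs[i + 1:]]
    return sum(sims) / len(sims)

# A topic mixing geography with sport is less coherent than a pure one.
print(coherence(["paris", "france"]) > coherence(["paris", "tennis"]))  # True
```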

01 Aug 2012
TL;DR: This article presents a general method to use information retrieved from the Latent Dirichlet Allocation (LDA) topic model for Text Segmentation: using topic assignments instead of words in two well-known Text Segmentation algorithms, namely TextTiling and C99, leads to significant improvements.
Abstract: This article presents a general method to use information retrieved from the Latent Dirichlet Allocation (LDA) topic model for Text Segmentation: Using topic assignments instead of words in two well-known Text Segmentation algorithms, namely TextTiling and C99, leads to significant improvements. Further, we introduce our own algorithm called TopicTiling, which is a simplified version of TextTiling (Hearst, 1997). In our study, we evaluate and optimize parameters of LDA and TopicTiling. A further contribution to improve the segmentation accuracy is obtained through stabilizing topic assignments by using information from all LDA inference iterations. Finally, we show that TopicTiling outperforms previous Text Segmentation algorithms on two widely used datasets, while being computationally less expensive than other algorithms.

54 citations
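The core TopicTiling idea described above can be sketched compactly: represent each sentence by the topic IDs assigned to its words, then look for boundaries where adjacent sentences have dissimilar topic vectors. The topic assignments below are hand-made placeholders, not output of a real LDA inference run.

```python
# Segment boundary detection from topic assignments (TopicTiling-style sketch).
import math
from collections import Counter

def topic_vector(assignments, num_topics):
    """Count how often each topic ID occurs in one sentence's assignments."""
    counts = Counter(assignments)
    return [counts.get(t, 0) for t in range(num_topics)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Topic IDs assigned to the words of four consecutive sentences (illustrative).
sentences = [[0, 0, 1], [0, 1, 0], [2, 2, 3], [3, 2, 2]]
num_topics = 4

# Low similarity between adjacent sentence vectors suggests a segment boundary.
vecs = [topic_vector(s, num_topics) for s in sentences]
gaps = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
boundary = min(range(len(gaps)), key=lambda i: gaps[i])
print(boundary)  # 1: the topics shift between sentences 1 and 2
```

The paper's stabilization trick, using assignments aggregated over all inference iterations rather than a single sample, would simply make these topic vectors less noisy.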

Journal ArticleDOI
TL;DR: In this paper, the authors examined trends in academic research on personal information privacy using Scopus DB and extracted 2356 documents covering journal articles, reviews, book chapters, conference papers and working papers published between 1972 and August 2015.

53 citations

Proceedings ArticleDOI
01 Apr 2014
TL;DR: This work explores topic adaptation on a diverse data set and presents a new bilingual variant of Latent Dirichlet Allocation to compute topic-adapted, probabilistic phrase translation features, and dynamically infer document-specific translation probabilities for test sets of unknown origin.
Abstract: Translating text from diverse sources poses a challenge to current machine translation systems which are rarely adapted to structure beyond corpus level. We explore topic adaptation on a diverse data set and present a new bilingual variant of Latent Dirichlet Allocation to compute topic-adapted, probabilistic phrase translation features. We dynamically infer document-specific translation probabilities for test sets of unknown origin, thereby capturing the effects of document context on phrase translations. We show gains of up to 1.26 BLEU over the baseline and 1.04 over a domain adaptation benchmark. We further provide an analysis of the domain-specific data and show additive gains of our model in combination with other types of topic-adapted features.

53 citations
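The topic-adapted translation features described above boil down to a mixture: a document-specific phrase translation probability is the per-topic probability weighted by the document's inferred topic distribution, roughly p(e | f, doc) = sum over z of p(e | f, z) * p(z | doc). All numbers below are illustrative assumptions.

```python
# Topic-adapted phrase translation probability as a topic mixture (sketch).

# Per-topic probability of translating source phrase f into target phrase e.
p_e_given_f_z = [0.8, 0.2]   # topic 0 favors this translation, topic 1 does not

def adapted_prob(per_topic_probs, doc_topic_dist):
    """Document-specific translation probability as a topic mixture."""
    return sum(p * w for p, w in zip(per_topic_probs, doc_topic_dist))

news_doc = [0.9, 0.1]    # document dominated by topic 0
legal_doc = [0.1, 0.9]   # document dominated by topic 1

print(adapted_prob(p_e_given_f_z, news_doc))   # 0.74
print(adapted_prob(p_e_given_f_z, legal_doc))  # 0.26
```

This is how the same phrase can receive different translation scores depending on the test document's inferred topics, even when its origin domain is unknown.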

Proceedings Article
01 Nov 2011
TL;DR: This paper presents subjLDA, a hierarchical Bayesian model based on latent Dirichlet allocation (LDA) for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts.
Abstract: This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.

53 citations
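The prior-seeding mechanism described in the abstract above, incorporating lexical clues by modifying the Dirichlet priors of topic-word distributions, can be sketched as follows. The vocabulary, clue list, and prior values are illustrative assumptions, not the paper's actual settings.

```python
# Seed a "subjective" topic's Dirichlet prior with subjectivity clue words.
vocab = ["great", "terrible", "report", "said", "love"]
subjectivity_clues = {"great", "terrible", "love"}

num_topics = 2    # topic 0: subjective, topic 1: objective
base_beta = 0.01  # symmetric base prior over the vocabulary
boost = 1.0       # extra prior mass for clue words under the subjective topic

# Start from a symmetric prior, then add mass for clue words in topic 0 only.
beta = [[base_beta] * len(vocab) for _ in range(num_topics)]
for w, word in enumerate(vocab):
    if word in subjectivity_clues:
        beta[0][w] += boost

print(beta[0])  # clue words carry far more prior mass in the subjective topic
```

During inference, the boosted prior pulls clue words toward the subjective topic, which in turn pulls the sentences containing them toward a subjective label.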


Network Information

Related Topics (5):
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance Metrics

No. of papers in the topic in previous years:

Year  Papers
2023     323
2022     842
2021     418
2020     429
2019     473
2018     446