Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have appeared within this topic, receiving 212,555 citations. The topic is also known as LDA.

Papers

Proceedings Article

Im2Text and Text2Im: Associating Images and Texts for Cross-Modal Retrieval

Yashaswi Verma¹, C. V. Jawahar¹

International Institute of Information Technology, Hyderabad¹

01 Jan 2014

TL;DR: This paper studies two complementary cross-modal prediction tasks: predicting text given an image (“Im2Text”), and predicting image(s) given a piece of text (“Text2Im”), and proposes a novel Structural SVM based unified formulation for these two tasks.

Abstract: Building bilateral semantic associations between images and texts is among the fundamental problems in computer vision. In this paper, we study two complementary cross-modal prediction tasks: (i) predicting text(s) given an image (“Im2Text”), and (ii) predicting image(s) given a piece of text (“Text2Im”). We make no assumption on the specific form of text; i.e., it could be either a set of labels, phrases, or even captions. We pose both these tasks in a retrieval framework. For Im2Text, given a query image, our goal is to retrieve a ranked list of semantically relevant texts from an independent text corpus (i.e., texts with no corresponding images). Similarly, for Text2Im, given a query text, we aim to retrieve a ranked list of semantically relevant images from a collection of unannotated images (i.e., images without any associated textual meta-data). We propose a novel Structural SVM based unified formulation for these two tasks. For both visual and textual data, two types of representations are investigated. These are based on: (1) unimodal probability distributions over topics learned using latent Dirichlet allocation, and (2) explicitly learned multi-modal correlations using canonical correlation analysis. Extensive experiments on three popular datasets (two medium and one web-scale) demonstrate that our framework gives promising results compared to existing models under various settings, thus confirming its efficacy for both tasks.

52 citations

Book Chapter

Hierarchical latent tree analysis for topic detection

Tengfei Liu¹, Nevin L. Zhang¹, Peixian Chen¹

Hong Kong University of Science and Technology¹

15 Sep 2014

TL;DR: A new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics, is proposed using a hierarchy of discrete latent variables.

Abstract: In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, high frequency words in one topic may also be used with high frequency in other topics. Thus they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word co-occurrence and co-occurrences of those patterns using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and they are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches.
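
The idea of characterizing a topic by words that are frequent inside it and rare elsewhere can be illustrated with a small sketch. The abstract does not spell out the paper's selection criterion, so the scoring rule below (in-cluster document frequency minus the highest document frequency in any other cluster) is an assumption made for illustration, as are the function names and the toy clusters.

```python
from collections import Counter

def relative_freq(docs):
    """Fraction of documents in `docs` that contain each word."""
    counts = Counter(word for doc in docs for word in set(doc))
    return {w: c / len(docs) for w, c in counts.items()}

def characteristic_words(clusters, target, top_n=3):
    """Rank words by in-cluster frequency minus the highest frequency elsewhere.

    This difference score is an illustrative assumption, not the paper's criterion.
    """
    inside = relative_freq(clusters[target])
    outside = [relative_freq(d) for name, d in clusters.items() if name != target]

    def score(word):
        elsewhere = max((f.get(word, 0.0) for f in outside), default=0.0)
        return inside[word] - elsewhere

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(inside, key=lambda w: (-score(w), w))[:top_n]

# Toy clusters (hypothetical data): "game" is frequent in both clusters.
clusters = {
    "sports": [["game", "team", "win"], ["team", "score", "game"], ["game", "play"]],
    "tech": [["game", "software", "code"], ["code", "data", "game"], ["software", "game", "data"]],
}
top = characteristic_words(clusters, "sports", top_n=2)
most_frequent = max(relative_freq(clusters["sports"]).items(), key=lambda kv: kv[1])[0]
```

In this toy data the most frequent word in the "sports" cluster is "game", yet it is excluded from the characteristic words because it is equally common in "tech"; "team" ranks first instead.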

52 citations

Proceedings Article

Nonparametric discovery of human routines from sensor data

Feng-Tso Sun¹, Yeh Yi-Ting², Heng-Tze Cheng¹, Cynthia Kuo¹, Martin L. Griss¹

Carnegie Mellon University¹, Stanford University²

24 Mar 2014

TL;DR: Experimental results show that the nonparametric framework presented can automatically learn the appropriate model parameters from sensor data without any form of model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks.

Abstract: People engage in routine behaviors. Automatic routine discovery goes beyond low-level activity recognition such as sitting or standing and analyzes human behaviors at a higher level (e.g., commuting to work). With recent developments in ubiquitous sensor technologies, it becomes easier to acquire a massive amount of sensor data. One main line of research is to mine human routines from sensor data using parametric topic models such as latent Dirichlet allocation. The main shortcoming of parametric models is that they assume a fixed, pre-specified parameter regardless of the data. Choosing an appropriate parameter usually requires an inefficient trial-and-error model selection process. Furthermore, it is even more difficult to find optimal parameter values in advance for personalized applications. In this paper, we present a novel nonparametric framework for human routine discovery that can infer high-level routines without knowing the number of latent topics beforehand. Our approach is evaluated on public datasets in two routine domains: a 34-daily-activity dataset and a transportation mode dataset. Experimental results show that our nonparametric framework can automatically learn the appropriate model parameters from sensor data without any form of model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks.
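
To show how a model can avoid fixing the number of clusters in advance, here is a minimal sketch of the Chinese restaurant process, a standard building block of nonparametric Bayesian models. The abstract does not say which prior the paper uses, so treating this process as representative is an assumption; the function name, `alpha`, and `seed` are likewise illustrative.

```python
import random

def chinese_restaurant_process(n_items, alpha, seed=0):
    """Sample cluster sizes for n_items under a CRP with concentration alpha."""
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of items currently assigned to cluster k
    for i in range(n_items):
        # Existing cluster k is chosen with probability sizes[k] / (i + alpha);
        # a new cluster is opened with probability alpha / (i + alpha).
        r = rng.uniform(0.0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:
            # No existing cluster was selected, so open a new one.
            sizes.append(1)
    return sizes

# The number of clusters is an outcome of sampling, not an input.
sizes = chinese_restaurant_process(n_items=200, alpha=2.0)
```

The only tuning knob is the concentration `alpha`: larger values tend to open more clusters, while the cluster count itself grows with the amount of data.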

52 citations

Book Chapter

Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT)

Animashree Anandkumar¹, Rong Ge², Daniel Hsu³, Sham M. Kakade⁴, Matus Telgarsky⁵

University of California, Irvine¹, Microsoft², Columbia University³, Rutgers University⁴, University of Michigan⁵

04 Oct 2015

TL;DR: This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).

Abstract: This note is a short version of that in [1]. It is intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain orthogonal decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches similar to the case of matrices. A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
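
The power-iteration step mentioned in the abstract can be sketched on a toy tensor. This is plain power iteration on an exactly orthogonally decomposable third-order tensor, not the robust variant the survey analyzes; the function names, weights, and basis vectors are assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import math
import random

def tensor_apply(T, v):
    """Return the vector T(I, v, v): contract a 3-way tensor with v twice."""
    n = len(v)
    return [
        sum(T[i][j][k] * v[j] * v[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def power_iteration(T, n_iter=50, seed=0):
    """Find one (eigenvalue, eigenvector) pair of a symmetric tensor T."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(T))])
    for _ in range(n_iter):
        v = normalize(tensor_apply(T, v))
    # The eigenvalue estimate is T(v, v, v).
    lam = sum(a * b for a, b in zip(tensor_apply(T, v), v))
    return lam, v

def build_tensor(weights, vectors):
    """Build T as the weighted sum of triple outer products of each vector."""
    n = len(vectors[0])
    T = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for w, u in zip(weights, vectors):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    T[i][j][k] += w * u[i] * u[j] * u[k]
    return T

# Toy input (illustrative only): orthonormal components with weights 3, 2, 1.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = build_tensor([3.0, 2.0, 1.0], basis)
lam, v = power_iteration(T)
```

Which component the iteration converges to depends on the starting vector, so a full decomposition would need several random restarts plus deflation (subtracting each recovered component from the tensor) to collect all pairs.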

51 citations

Journal Article

Incorporating word embeddings into topic modeling of short text

Wang Gao¹, Min Peng¹, Hua Wang², Yanchun Zhang², Qianqian Xie¹, Gang Tian¹

Wuhan University¹, Victoria University, Australia²

01 Nov 2019 - Knowledge and Information Systems

TL;DR: A novel model for short text topic modeling, referred to as Conditional Random Field regularized Topic Model (CRFTM), which not only develops a generalized solution to alleviate the sparsity problem by aggregating short texts into pseudo-documents, but also leverages a Conditional Random Field regularized model that encourages semantically related words to share the same topic assignment.

Abstract: Short texts have become the prevalent format of information on the Internet. Inferring the topics of this type of message has become a critical and challenging task for many applications. Due to the length of short texts, conventional topic models (e.g., latent Dirichlet allocation and its variants) suffer from a severe data sparsity problem, which makes topic modeling of short texts difficult and unreliable. Recently, word embeddings have proved effective at capturing semantic and syntactic information about words, which can be used to induce similarity measures and semantic correlations among words. Inspired by this, in this paper, we design a novel model for short text topic modeling, referred to as Conditional Random Field regularized Topic Model (CRFTM). CRFTM not only develops a generalized solution to alleviate the sparsity problem by aggregating short texts into pseudo-documents, but also leverages a Conditional Random Field regularized model that encourages semantically related words to share the same topic assignment. Experimental results on two real-world datasets show that our method can extract more coherent topics, and significantly outperforms state-of-the-art baselines on several evaluation metrics.

51 citations

Network Information

Performance

Metrics

Papers: 6,513

Citations: 245,225

No. of papers in the topic in previous years
Year	Papers
2023	323
2022	842
2021	418
2020	429
2019	473
2018	446