scispace - formally typeset
Topic

Latent Dirichlet allocation

About: Latent Dirichlet allocation is a research topic. Over its lifetime, 5,351 publications have appeared within this topic, receiving 212,555 citations. The topic is also known as LDA.


Papers
Journal Article
TL;DR: A new Gibbs sampler is developed for a linear mixed model with a Dirichlet process random effect term. It is easily extended to a generalized linear mixed model with a probit link function and is shown to be an improvement, in terms of operator norm and efficiency, over other commonly used MCMC algorithms.
Abstract: We develop a new Gibbs sampler for a linear mixed model with a Dirichlet process random effect term, which is easily extended to a generalized linear mixed model with a probit link function. Our Gibbs sampler exploits the properties of the multinomial and Dirichlet distributions, and is shown to be an improvement, in terms of operator norm and efficiency, over other commonly used MCMC algorithms. We also investigate methods for the estimation of the precision parameter of the Dirichlet process, finding that maximum likelihood may not be desirable, but a posterior mode is a reasonable approach. Examples are given to show how these models perform on real data. Our results complement both the theoretical basis of the Dirichlet process nonparametric prior and the computational work that has been done to date.

47 citations

Proceedings Article
01 Jan 2018
TL;DR: A novel approach called GraphBTM is proposed that represents biterms as graphs and uses Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms; an amortized variational inference method for GraphBTM is also presented.
Abstract: Discovering the latent topics within texts has been a fundamental task for many applications. However, conventional topic models suffer from different problems in different settings. Latent Dirichlet Allocation (LDA) may not work well for short texts due to data sparsity (i.e., the sparse word co-occurrence patterns in short documents). The Biterm Topic Model (BTM) learns topics by modeling the word pairs, named biterms, in the whole corpus. This assumption is very strong when documents are long with rich topic information and do not exhibit the transitivity of biterms. In this paper, we propose a novel approach called GraphBTM that represents biterms as graphs, and we design Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. To overcome the data sparsity of LDA and the strong assumption of BTM, we sample a fixed number of documents to form a mini-corpus as a sample. We also propose a dataset called All News, extracted from 15 news publishers, in which documents are much longer than those in 20 Newsgroups. We present an amortized variational inference method for GraphBTM. Our method generates more coherent topics compared with previous approaches. Experiments show that the sampling strategy improves performance by a large margin.

47 citations

Proceedings Article
22 Oct 2012
TL;DR: This paper mines typical behavioral patterns from small-group face-to-face interactions and links them to social-psychological group variables, defining group speaking and looking cues by aggregating automatically extracted cues at the individual and dyadic levels.
Abstract: This paper addresses the task of mining typical behavioral patterns from small group face-to-face interactions and linking them to social-psychological group variables. Towards this goal, we define group speaking and looking cues by aggregating automatically extracted cues at the individual and dyadic levels. Then, we define a bag of nonverbal patterns (Bag-of-NVPs) to discretize the group cues. The topics learnt using the Latent Dirichlet Allocation (LDA) topic model are then interpreted by studying the correlations with group variables such as group composition, group interpersonal perception, and group performance. Our results show that both group behavior cues and topics have significant correlations with (and predictive information for) all the above variables. For our study, we use interactions with unacquainted members, i.e., newly formed groups.

47 citations

Journal Article
TL;DR: This work takes advantage of a relevance feedback mechanism to balance the contributions of multiple topic models with specified numbers of topics and proposes a multitopic model to improve retrieval performance.
Abstract: View-based 3D model retrieval uses a set of views to represent each object. Discovering the complex relationship between multiple views remains challenging in 3D object retrieval. Recent progress in the latent Dirichlet allocation (LDA) model leads us to propose its use for 3D object retrieval. This LDA approach explores the hidden relationships between extracted primordial features of these views. Since LDA is limited to a fixed number of topics, we further propose a multitopic model to improve retrieval performance. We take advantage of a relevance feedback mechanism to balance the contributions of multiple topic models with specified numbers of topics. We demonstrate our improved retrieval performance over the state-of-the-art approaches.

46 citations

Proceedings Article
01 Oct 2014
TL;DR: The self-disclosure topic model (SDTM) significantly outperforms several comparable methods on classifying the level of self-disclosure, and the analysis of longitudinal data using SDTM uncovers a significant and positive correlation between self-disclosure and conversation frequency and length.
Abstract: Self-disclosure, the act of revealing oneself to others, is an important social behavior that strengthens interpersonal relationships and increases social support. Although there are many social science studies of self-disclosure, they are based on manual coding of small datasets and questionnaires. We conduct a computational analysis of self-disclosure with a large dataset of naturally-occurring conversations, a semi-supervised machine learning algorithm, and a computational analysis of the effects of self-disclosure on subsequent conversations. We use a longitudinal dataset of 17 million tweets, all of which occurred in conversations that consist of five or more tweets directly replying to the previous tweet, and from dyads with twenty or more conversations each. We develop a self-disclosure topic model (SDTM), a variant of latent Dirichlet allocation (LDA), for automatically classifying the level of self-disclosure for each tweet. We take the results of SDTM and analyze the effects of self-disclosure on subsequent conversations. Our model significantly outperforms several comparable methods on classifying the level of self-disclosure, and the analysis of the longitudinal data using SDTM uncovers a significant and positive correlation between self-disclosure and conversation frequency and length.

46 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    323
2022    850
2021    420
2020    429
2019    473
2018    447