Open Access Proceedings Article

Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

Feng Yan, +2 more
Vol. 22, pp. 2134–2142
TLDR
A novel data partitioning scheme is proposed that effectively reduces the memory cost of parallelizing two inference methods for latent Dirichlet allocation (LDA) models on GPUs: collapsed Gibbs sampling and collapsed variational Bayesian inference.
Abstract
The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods on GPUs for latent Dirichlet Allocation (LDA) models, collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB). To address limited memory constraints on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. This partitioning scheme also balances the computational cost on each multiprocessor and enables us to easily avoid memory access conflicts. We use data streaming to handle extremely large datasets. Extensive experiments showed that our parallel inference methods consistently produced LDA models with the same predictive power as sequential training methods did but with 26x speedup for CGS and 196x speedup for CVB on a GPU with 30 multiprocessors. The proposed partitioning scheme and data streaming make our approach scalable with more multiprocessors. Furthermore, they can be used as general techniques to parallelize other machine learning models.
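To illustrate the idea described in the abstract, the sketch below shows a conflict-free block schedule in the spirit of the paper's data partitioning scheme: documents and words are each split into P disjoint parts, and in every sub-step the P concurrently processed blocks share no document part and no word part, so document-topic and word-topic count updates cannot collide. This is a minimal sketch, not the authors' code; the names (make_partitions, run_epoch, sample_block) are illustrative assumptions.

import numpy as np

def make_partitions(num_docs, num_words, num_procs):
    """Split document ids and word ids into num_procs disjoint parts each,
    inducing a num_procs x num_procs grid of data blocks."""
    doc_parts = [set(p.tolist()) for p in np.array_split(np.arange(num_docs), num_procs)]
    word_parts = [set(p.tolist()) for p in np.array_split(np.arange(num_words), num_procs)]
    return doc_parts, word_parts

def run_epoch(tokens, doc_parts, word_parts, sample_block, num_procs):
    """One pass over the corpus in num_procs sub-steps. In sub-step `shift`,
    processor p handles block (p, (p + shift) % num_procs): no two concurrent
    blocks share a document part or a word part, so their per-document and
    per-word count updates never touch the same counters."""
    for shift in range(num_procs):
        for p in range(num_procs):                      # conceptually parallel
            d_part = doc_parts[p]
            w_part = word_parts[(p + shift) % num_procs]
            block = [(d, w) for (d, w) in tokens if d in d_part and w in w_part]
            sample_block(block)                          # e.g. a CGS sweep over this block
        # global topic counts would be synchronized between sub-steps

In this toy version the blocks are only roughly balanced; per the abstract, the paper's scheme explicitly balances the computational cost across multiprocessors and combines the partitioning with data streaming for corpora that exceed GPU memory.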



Citations
Proceedings Article

Online Learning for Latent Dirichlet Allocation

TL;DR: An online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA) is developed, based on online stochastic optimization with a natural gradient step, and is shown to converge to a local optimum of the VB objective function.
Journal Article

Probabilistic Topic Models

TL;DR: This paper reviews probabilistic topic models, which can be used to summarize a large collection of documents with a smaller number of distributions over words.
Book Chapter

Mining Text Data

TL;DR: Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining.
Journal Article

PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing

TL;DR: Data placement, pipeline processing, word bundling, and priority-based scheduling are proposed to improve the scalability of LDA, significantly reducing the unparallelizable communication bottleneck and achieving good load balancing.
Proceedings Article

Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees

TL;DR: This work proposes two methods to construct parallel Gibbs samplers guaranteed to draw from the target distribution: the Chromatic sampler and the Splash sampler, the latter a complementary strategy for when the variables are tightly coupled.
References
Book

Convex Optimization

TL;DR: This book gives a comprehensive introduction to convex optimization, with the focus not on the optimization problems themselves but on recognizing convex optimization problems and then finding the most appropriate technique for solving them.
Journal Article

Finding scientific topics

TL;DR: A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model; the approach is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
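For context, the collapsed Gibbs sampling update presented in this reference and parallelized in the main paper assigns topic k to the i-th token of document j with probability proportional to a word-topic term times a document-topic term. A standard form, with symmetric priors \alpha and \beta, vocabulary size V, and counts n that exclude the current token (notation mine, not copied from either paper), is:

P(z_{ji} = k \mid \mathbf{z}^{\neg ji}, \mathbf{w}) \;\propto\; \frac{n^{\neg ji}_{w_{ji} k} + \beta}{n^{\neg ji}_{\cdot k} + V\beta}\,\bigl(n^{\neg ji}_{jk} + \alpha\bigr)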
Report

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation

TL;DR: This paper proposes the collapsed variational Bayesian inference algorithm for LDA, and shows that it is computationally efficient, easy to implement, and significantly more accurate than standard variational Bayesian inference for LDA.
Posted Content

On Smoothing and Inference for Topic Models

TL;DR: Using the insights gained from this comparative study, it is shown how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Proceedings Article

On smoothing and inference for topic models

TL;DR: In this article, the authors compare the performance of topic models with collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and find that the main differences are attributable to the amount of smoothing applied to the counts.
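For comparison, the zeroth-order collapsed variational update (CVB0) discussed in this comparative study has the same form as the Gibbs conditional above, but with expected counts (sums of the variational responsibilities) in place of hard counts; again, the notation here is mine rather than copied from the paper:

\gamma_{jik} \;\propto\; \frac{\mathbb{E}\!\left[n^{\neg ji}_{w_{ji} k}\right] + \beta}{\mathbb{E}\!\left[n^{\neg ji}_{\cdot k}\right] + V\beta}\,\Bigl(\mathbb{E}\!\left[n^{\neg ji}_{jk}\right] + \alpha\Bigr)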