Modeling Documents with Deep Boltzmann Machines

Open Access, Posted Content

Nitish Srivastava, +2 more

- 26 Sep 2013 -

arXiv: Learning

TLDR

This paper introduces a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents, and shows that the model assigns better log probability to unseen data than the Replicated Softmax model.

Abstract:

We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

Citations

Book

Deep Learning

Ian Goodfellow, +2 more

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.

Proceedings Article

Distributed Representations of Sentences and Documents

Quoc V. Le, +1 more

TL;DR: Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.

Posted Content

Distributed Representations of Sentences and Documents

Quoc V. Le, +1 more

- 16 May 2014 -

arXiv: Computation and Language

TL;DR: The authors proposed paragraph vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and achieved new state-of-the-art results on several text classification and sentiment analysis tasks.

Proceedings Article

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Kai Sheng Tai, +2 more

TL;DR: The authors introduced the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies, which outperformed all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).

Book

Neural Networks and Deep Learning

Charu C. Aggarwal

References

Journal Article

Latent Dirichlet Allocation

David M. Blei, +2 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Journal Article

Finding scientific topics

Thomas L. Griffiths, +1 more

- 06 Apr 2004 -

Proceedings of the National Academy of S...

TL;DR: This work describes the generative model for documents introduced by Blei, Ng, and Jordan, presents a Markov chain Monte Carlo algorithm for inference in this model, and uses the algorithm to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.

Journal Article

Training products of experts by minimizing contrastive divergence

Geoffrey E. Hinton

- 01 Aug 2002 -

Neural Computation

TL;DR: A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary; training a PoE by maximizing likelihood is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule.

Journal Article

Probabilistic topic models

David M. Blei

- 01 Apr 2012 -

Communications of The ACM

TL;DR: A survey of probabilistic topic models, a suite of algorithms that offer a solution to managing large document archives and are well suited to handling large amounts of data.
