Open Access · Proceedings Article

Joint Training Deep Boltzmann Machines for Classification

TLDR
This work introduces a new method for training deep Boltzmann machines jointly, and shows that this approach performs competitively for classification and outperforms previous methods in terms of accuracy of approximate inference and classification with missing inputs.
Abstract
We introduce a new method for training deep Boltzmann machines (DBMs) jointly. Prior methods of training DBMs either require an initial learning pass that trains the model greedily, one layer at a time, or do not perform well on classification tasks. In our approach, we train all layers of the DBM simultaneously, using a novel training procedure called multi-prediction training. The resulting model can either be interpreted as a single generative model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent networks that share parameters and may be approximately averaged together using a novel technique we call the multi-inference trick. We show that our approach performs competitively for classification and outperforms previous methods in terms of accuracy of approximate inference and classification with missing inputs.

1 Deep Boltzmann machines

A deep Boltzmann machine (Salakhutdinov and Hinton, 2009) is a probabilistic model consisting of many layers of random variables, most of which are latent. Typically, a DBM contains a set of D input features v that are called the visible units because they are always observed during both training and evaluation. The DBM is usually applied to classification problems and thus often represents the class label with a one-of-k code in the form of a discrete-valued label unit y, which is observed (on examples for which it is available) during training. The DBM also contains several latent variables that are never observed. These hidden units are usually organized into L layers h^(i) of size N_i, i = 1, ..., L, with each unit in a layer conditionally independent of the other units in the layer given the neighboring layers. These conditional independence properties allow fast Gibbs sampling because an entire layer of units can be sampled at a time. Likewise, mean field inference with fixed-point equations is fast because each fixed-point equation gives a solution to roughly half of the variational parameters. Inference proceeds by alternating between updating all of the even-numbered layers and updating all of the odd-numbered layers. A DBM defines a probability distribution by exponentiating and normalizing an energy function.
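For concreteness, the energy function referred to above can be written out for a common two-hidden-layer DBM with a label unit. The block below is an illustrative sketch of the standard parameterization (following Salakhutdinov and Hinton, 2009), with bias terms omitted for brevity; it is not copied from the paper.

```latex
% Illustrative energy function for a DBM with visible units v, hidden
% layers h^{(1)}, h^{(2)}, and a one-of-k label y (bias terms omitted).
% W^{(1)}, W^{(2)}, W^{(3)} denote the weight matrices between adjacent layers.
\begin{align}
E\!\left(v, h^{(1)}, h^{(2)}, y\right)
  &= -\,v^{\top} W^{(1)} h^{(1)}
     - {h^{(1)}}^{\top} W^{(2)} h^{(2)}
     - y^{\top} W^{(3)} h^{(2)}, \\
P\!\left(v, h^{(1)}, h^{(2)}, y\right)
  &= \frac{1}{Z}\,\exp\!\left(-E\!\left(v, h^{(1)}, h^{(2)}, y\right)\right)
\end{align}
```

where Z is the partition function obtained by summing the exponentiated negative energy over all joint configurations.

The alternating even/odd mean-field updates described above can likewise be sketched in a few lines. The following is a minimal illustration for a DBM with two hidden layers and the visible layer clamped; the names (mean_field_inference, W1, W2, b1, b2, n_iters) are illustrative assumptions rather than the paper's notation, and the label unit is left out to keep the example short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_inference(v, W1, W2, b1, b2, n_iters=10):
    """Block mean-field inference for a two-hidden-layer DBM.

    v      -- observed visible vector, shape (D,)
    W1     -- weights between v and h1, shape (D, N1)
    W2     -- weights between h1 and h2, shape (N1, N2)
    b1, b2 -- hidden biases, shapes (N1,) and (N2,)
    Returns variational parameters (mu1, mu2) approximating P(h1, h2 | v).
    """
    # Start the variational parameters at 0.5 (an uninformative guess).
    mu1 = np.full(W1.shape[1], 0.5)
    mu2 = np.full(W2.shape[1], 0.5)
    for _ in range(n_iters):
        # Odd layer (h1): fixed-point update given v and the current mu2.
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T + b1)
        # Even layer (h2): fixed-point update given the current mu1.
        mu2 = sigmoid(mu1 @ W2 + b2)
    return mu1, mu2
```

In the classification DBM the top hidden layer would also receive input from the label unit y; that connection is omitted here for brevity.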


Citations
Journal ArticleDOI

Deep learning for visual understanding

TL;DR: The state of the art in deep learning algorithms for computer vision is reviewed, highlighting the contributions and challenges from over 210 recent research papers, and future trends and challenges in designing and training deep neural networks are summarized.
Proceedings Article

Maxout Networks

TL;DR: A simple new model called maxout is defined, designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique.
Posted Content

Deep Learning of Representations: Looking Forward

TL;DR: In this paper, the authors examine some of the challenges of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data.
Journal ArticleDOI

A Survey of Deep Learning Techniques: Application in Wind and Solar Energy Resources

TL;DR: Different types of deep learning algorithms applied in the field of solar and wind energy resources are discussed, their performance is evaluated through a novel taxonomy, and a comprehensive state-of-the-art review of the literature is presented.
References
Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.
Proceedings Article

Deep Boltzmann machines

TL;DR: A new learning algorithm is presented for Boltzmann machines that contain many layers of hidden variables; it is made more efficient by using a layer-by-layer "pre-training" phase that allows variational inference to be initialized with a single bottom-up pass.
Journal ArticleDOI

Annealed importance sampling

TL;DR: In this article, it is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, while the use of importance weights ensures that the estimates found converge to the correct values as the number of annealing runs increases.
Proceedings ArticleDOI

Training restricted Boltzmann machines using approximations to the likelihood gradient

TL;DR: A new algorithm for training Restricted Boltzmann Machines is introduced, which is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data.
