Open Access · Proceedings Article

Joint Training Deep Boltzmann Machines for Classification

TLDR
This work introduces a new method for training deep Boltzmann machines jointly, and shows that this approach performs competitively for classification and outperforms previous methods in terms of accuracy of approximate inference and classification with missing inputs.
Abstract
We introduce a new method for training deep Boltzmann machines (DBMs) jointly. Prior methods of training DBMs either require an initial learning pass that trains the model greedily, one layer at a time, or do not perform well on classification tasks. In our approach, we train all layers of the DBM simultaneously, using a novel training procedure called multi-prediction training. The resulting model can either be interpreted as a single generative model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent networks that share parameters and may be approximately averaged together using a novel technique we call the multi-inference trick. We show that our approach performs competitively for classification and outperforms previous methods in terms of accuracy of approximate inference and classification with missing inputs.

1 Deep Boltzmann machines

A deep Boltzmann machine (Salakhutdinov and Hinton, 2009) is a probabilistic model consisting of many layers of random variables, most of which are latent. Typically, a DBM contains a set of D input features v that are called the visible units because they are always observed during both training and evaluation. The DBM is usually applied to classification problems and thus often represents the class label with a one-of-k code in the form of a discrete-valued label unit y, which is observed (on examples for which it is available) during training. The DBM also contains several latent variables that are never observed. These hidden units are usually organized into L layers h^(i) of size N_i, i = 1, ..., L, with each unit in a layer conditionally independent of the other units in the layer given the neighboring layers. These conditional independence properties allow fast Gibbs sampling because an entire layer of units can be sampled at a time. Likewise, mean field inference with fixed-point equations is fast because each fixed-point equation gives a solution to roughly half of the variational parameters. Inference proceeds by alternating between updating all of the even-numbered layers and updating all of the odd-numbered layers. A DBM defines a probability distribution by exponentiating and normalizing an energy function.
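For concreteness, the energy function referred to above can be written out for a common two-hidden-layer DBM with a label unit. The block below is an illustrative sketch of the standard parameterization (following Salakhutdinov and Hinton, 2009), with bias terms omitted for brevity; it is not copied from the paper.

```latex
% Illustrative energy function for a DBM with visible units v, hidden
% layers h^{(1)}, h^{(2)}, and a one-of-k label y (bias terms omitted).
% W^{(1)}, W^{(2)}, W^{(3)} denote the weight matrices between adjacent layers.
\begin{align}
E\!\left(v, h^{(1)}, h^{(2)}, y\right)
  &= -\,v^{\top} W^{(1)} h^{(1)}
     - {h^{(1)}}^{\top} W^{(2)} h^{(2)}
     - y^{\top} W^{(3)} h^{(2)}, \\
P\!\left(v, h^{(1)}, h^{(2)}, y\right)
  &= \frac{1}{Z}\,\exp\!\left(-E\!\left(v, h^{(1)}, h^{(2)}, y\right)\right)
\end{align}
```

where Z is the partition function obtained by summing the exponentiated negative energy over all joint configurations.

The alternating even/odd mean-field updates described above can likewise be sketched in a few lines. The following is a minimal illustration for a DBM with two hidden layers and the visible layer clamped; the names (mean_field_inference, W1, W2, b1, b2, n_iters) are illustrative assumptions rather than the paper's notation, and the label unit is left out to keep the example short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_inference(v, W1, W2, b1, b2, n_iters=10):
    """Block mean-field inference for a two-hidden-layer DBM.

    v      -- observed visible vector, shape (D,)
    W1     -- weights between v and h1, shape (D, N1)
    W2     -- weights between h1 and h2, shape (N1, N2)
    b1, b2 -- hidden biases, shapes (N1,) and (N2,)
    Returns variational parameters (mu1, mu2) approximating P(h1, h2 | v).
    """
    # Start the variational parameters at 0.5 (an uninformative guess).
    mu1 = np.full(W1.shape[1], 0.5)
    mu2 = np.full(W2.shape[1], 0.5)
    for _ in range(n_iters):
        # Odd layer (h1): fixed-point update given v and the current mu2.
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T + b1)
        # Even layer (h2): fixed-point update given the current mu1.
        mu2 = sigmoid(mu1 @ W2 + b2)
    return mu1, mu2
```

In the classification DBM the top hidden layer would also receive input from the label unit y; that connection is omitted here for brevity.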


Citations
Journal ArticleDOI

Deep learning for visual understanding

TL;DR: The state of the art in deep learning algorithms for computer vision is reviewed, highlighting the contributions and challenges from over 210 recent research papers, and future trends and challenges in designing and training deep neural networks are summarized.
Proceedings Article

Maxout Networks

TL;DR: A simple new model called maxout is defined, designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique.
Posted Content

Deep Learning of Representations: Looking Forward

TL;DR: In this paper, the authors examine some of the challenges of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data.
Journal ArticleDOI

A Survey of Deep Learning Techniques: Application in Wind and Solar Energy Resources

TL;DR: Different types of deep learning algorithms applied in the field of solar and wind energy resources are discussed, their performance is evaluated through a novel taxonomy, and a comprehensive state-of-the-art review of the literature is presented.
References
Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.
Proceedings Article

Deep Boltzmann machines

TL;DR: A new learning algorithm is presented for Boltzmann machines that contain many layers of hidden variables; it is made more efficient by using a layer-by-layer "pre-training" phase that allows variational inference to be initialized with a single bottom-up pass.
Journal ArticleDOI

Annealed importance sampling

TL;DR: In this article, it is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, while the use of importance weights ensures that the estimates found converge to the correct values as the number of annealing runs increases.
Proceedings ArticleDOI

Training restricted Boltzmann machines using approximations to the likelihood gradient

TL;DR: A new algorithm for training Restricted Boltzmann Machines is introduced, which is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data.
