Multimodal Learning with Deep Boltzmann Machines

Open Access Proceedings Article

Nitish Srivastava, +1 more

- Vol. 25, pp 2222-2230

TLDR

In this paper, a Deep Boltzmann Machine (DBM) is proposed for learning a generative model of data that consists of multiple and diverse input modalities, which can be used to extract a unified representation that fuses modalities together.

Abstract:

A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains.
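
The fill-in step described in the abstract can be illustrated with a toy sketch. The Python code below is not the authors' model. It assumes a single shared hidden layer (a joint RBM rather than a multi-layer DBM), binary units for both modalities, and hand-picked weights in place of learned ones; every function and variable name is hypothetical. It clamps the observed image vector, runs alternating Gibbs sampling to draw the missing text vector from its conditional distribution, and then reads off the hidden-unit probabilities as the fused representation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_bernoulli(probs, rng):
    # Draw one binary sample per probability.
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v_img, v_txt, W_img, W_txt, b_h):
    # P(h_j = 1 | v_img, v_txt) for a hidden layer shared by both modalities.
    probs = []
    for j in range(len(b_h)):
        act = b_h[j]
        act += sum(v_img[i] * W_img[i][j] for i in range(len(v_img)))
        act += sum(v_txt[k] * W_txt[k][j] for k in range(len(v_txt)))
        probs.append(sigmoid(act))
    return probs


def text_probs(h, W_txt, b_txt):
    # P(v_txt_k = 1 | h); given h, the text units do not depend on the image.
    return [
        sigmoid(b_txt[k] + sum(h[j] * W_txt[k][j] for j in range(len(h))))
        for k in range(len(b_txt))
    ]


def fill_in_text(v_img, W_img, W_txt, b_h, b_txt, steps=50, seed=0):
    """Sample the missing text modality with the image clamped.

    Returns (sampled text vector, hidden probabilities used as the
    fused representation). Toy assumption: binary units, one hidden layer.
    """
    rng = random.Random(seed)
    v_txt = sample_bernoulli([0.5] * len(b_txt), rng)  # random start
    for _ in range(steps):
        h = sample_bernoulli(hidden_probs(v_img, v_txt, W_img, W_txt, b_h), rng)
        v_txt = sample_bernoulli(text_probs(h, W_txt, b_txt), rng)
    representation = hidden_probs(v_img, v_txt, W_img, W_txt, b_h)
    return v_txt, representation


# Hand-picked illustrative parameters (4 image units, 3 text units, 2 hidden).
W_img = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0], [0.2, -0.3]]
W_txt = [[1.5, -0.5], [-0.5, 1.5], [0.3, 0.3]]
b_h = [0.0, 0.0]
b_txt = [-0.2, -0.2, 0.0]

image_only_query = [1, 1, 0, 0]
filled_text, representation = fill_in_text(image_only_query, W_img, W_txt, b_h, b_txt)
```

Here `filled_text` stands in for the missing modality and `representation` is the vector that would be passed to a downstream classifier or retrieval step. Because the sampler is seeded, repeated calls with the same arguments return the same result; averaging over several seeds would give a smoother estimate.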

Citations

Journal Article

Representation Learning: A Review and New Perspectives

Yoshua Bengio, +2 more

- 01 Aug 2013 -

IEEE Transactions on Pattern Analysis an...

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

Posted Content

Conditional Generative Adversarial Nets

Mehdi Mirza, +1 more

- 06 Nov 2014 -

arXiv: Learning

TL;DR: The conditional version of generative adversarial nets is introduced, which can be constructed by simply feeding the data, y, to the generator and discriminator, and it is shown that this model can generate MNIST digits conditioned on class labels.

Proceedings Article

Neural Collaborative Filtering

Xiangnan He, +5 more

TL;DR: This work strives to develop techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback, and presents a general framework named NCF, short for Neural network-based Collaborative Filtering.

Book

Deep Learning: Methods and Applications

Li Deng, +1 more

TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

Posted Content

Least Squares Generative Adversarial Networks

Xudong Mao, +5 more

- 13 Nov 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator, and shows that minimizing the objective function of LSGAN amounts to minimizing the Pearson χ² divergence.

References

Journal Article

Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

Aude Oliva, +1 more

- 01 May 2001 -

International Journal of Computer Vision

TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.

Journal Article

Training products of experts by minimizing contrastive divergence

Geoffrey E. Hinton

- 01 Aug 2002 -

Neural Computation

TL;DR: A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary because it is hard even to approximate the derivatives of the renormalization term in the combination rule.

Proceedings Article

VLFeat: an open and portable library of computer vision algorithms

Andrea Vedaldi, +1 more

TL;DR: VLFeat is an open and portable library of computer vision algorithms that includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization.

Proceedings Article

Multimodal Deep Learning

Jiquan Ngiam, +5 more

TL;DR: This work presents a series of tasks for multimodal learning and shows how to train deep networks that learn features to address these tasks, and demonstrates cross modality feature learning, where better features for one modality can be learned if multiple modalities are present at feature learning time.

Proceedings Article

Deep Boltzmann machines

Ruslan Salakhutdinov, +1 more

TL;DR: A new learning algorithm is presented for Boltzmann machines that contain many layers of hidden variables; it is made more efficient by a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass.

Related Papers (5)

ImageNet Classification with Deep Convolutional Neural Networks

Reducing the Dimensionality of Data with Neural Networks

A fast learning algorithm for deep belief nets

Deep Residual Learning for Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition