Showing papers by "Ruslan Salakhutdinov" published in 2013


Journal ArticleDOI
TL;DR: Efficient learning and inference algorithms for the HDP-DBM model are presented and it is shown that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
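
Schematically (a sketch, not the paper's exact parameterization; biases and layer sizes are elided), the DBM component defines a joint distribution over an input $\mathbf{v}$ and hidden layers $\mathbf{h}^1,\dots,\mathbf{h}^L$,

$$P(\mathbf{v},\mathbf{h}^1,\dots,\mathbf{h}^L) \propto \exp\Big(\mathbf{v}^\top W^1 \mathbf{h}^1 + \sum_{l=2}^{L} (\mathbf{h}^{l-1})^\top W^l \mathbf{h}^l\Big),$$

while the HDP component replaces the DBM's implicit prior over the top-layer activities $\mathbf{h}^L$ with a category hierarchy of Dirichlet processes, e.g.

$$G^{(0)} \sim \mathrm{DP}(\eta, H), \qquad G^{\text{super}} \mid G^{(0)} \sim \mathrm{DP}(\gamma, G^{(0)}), \qquad G^{\text{cat}} \mid G^{\text{super}} \sim \mathrm{DP}(\alpha, G^{\text{super}}),$$

so the distribution over top-level feature activity for a new category is drawn from its super-category's base measure, which is how a novel concept can borrow statistical strength and be learned from very few examples.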

236 citations


Proceedings Article
05 Dec 2013
TL;DR: A Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image.
Abstract: People can learn a new visual class from just one example, yet machine learning algorithms typically require hundreds or thousands of examples to tackle the same problems. Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. We evaluated performance on a challenging one-shot classification task, where our model achieved a human-level error rate while substantially outperforming two deep learning models. We also tested the model on another conceptual task, generating new examples, by using a "visual Turing test" to show that our model produces human-like performance.

230 citations


Proceedings Article
05 Dec 2013
TL;DR: This work proposes a method for improving classification performance of high-capacity classifiers on infrequent classes by discovering similar classes and transferring knowledge among them: the classes are organized into a tree hierarchy that imposes a prior over the classifier's parameters, and an algorithm is proposed for learning the underlying tree structure.
Abstract: High capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier's parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, making deep neural networks work well on infrequent classes as well. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set.
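
A minimal sketch of how such a tree-based prior can act on the last-layer weights (the Gaussian form, names, and shapes below are illustrative assumptions, not the paper's exact parameterization): each class's weight vector is pulled toward a parameter vector attached to its parent node in the tree, so infrequent classes borrow statistical strength from related, better-represented ones.

import numpy as np

def tree_prior_penalty(class_weights, parent_of, node_params, sigma2=1.0):
    """Negative log of a Gaussian tree prior over last-layer class weights.

    class_weights: dict class_id -> weight vector (np.ndarray)
    parent_of:     dict class_id -> parent node id in the tree
    node_params:   dict node_id  -> parent-node parameter vector (same shape)
    Each class vector is assumed drawn from N(theta_parent, sigma2 * I).
    """
    penalty = 0.0
    for c, w in class_weights.items():
        theta = node_params[parent_of[c]]
        penalty += np.sum((w - theta) ** 2) / (2.0 * sigma2)
    return penalty

# Illustrative usage: two rare classes share a parent, so their weights
# are pulled toward a common centroid informed by both.
rng = np.random.default_rng(0)
weights = {"lion": rng.normal(size=8), "tiger": rng.normal(size=8)}
parents = {"lion": "big_cat", "tiger": "big_cat"}
params = {"big_cat": (weights["lion"] + weights["tiger"]) / 2.0}
print(tree_prior_penalty(weights, parents, params))

In training, a penalty of this form would simply be added to the network's discriminative loss, with the parent parameters updated alongside the class weights.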

217 citations


Proceedings Article
05 Dec 2013
TL;DR: A stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables is proposed that achieves superior performance on synthetic and facial expression datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks.
Abstract: Multilayer perceptrons (MLPs), or neural networks, are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g., Gaussian). For tasks involving structured prediction, the conditional distribution should be multimodal, resulting in one-to-many mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are not efficient and are unsuitable for modeling real-valued data. In this paper, we propose a stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. Our model achieves superior performance on synthetic and facial expression datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks. In addition, the latent features of our model improve classification and can learn to generate colorful textures of objects.
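
A minimal sketch of a hidden layer that mixes deterministic and stochastic units, with repeated sampling used to expose the multimodal predictive distribution (layer sizes, the sigmoid nonlinearity, and the linear output are illustrative assumptions; the paper's Generalized EM / importance-sampling training is not shown).

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hybrid_layer(x, W_det, W_sto):
    """One hidden layer with deterministic sigmoid units and stochastic
    Bernoulli units whose firing probabilities come from a sigmoid."""
    h_det = sigmoid(x @ W_det)                         # deterministic part
    p = sigmoid(x @ W_sto)
    h_sto = (rng.random(p.shape) < p).astype(float)    # stochastic part
    return np.concatenate([h_det, h_sto], axis=-1)

def sample_outputs(x, W_det, W_sto, V, n_samples=20):
    """Draw several output samples; their spread reflects the multimodality
    induced by the stochastic hidden units."""
    return np.stack([hybrid_layer(x, W_det, W_sto) @ V for _ in range(n_samples)])

x = rng.normal(size=(1, 5))
W_det, W_sto = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
V = rng.normal(size=(8, 3))
print(sample_outputs(x, W_det, W_sto, V).shape)        # (20, 1, 3)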

141 citations


Proceedings Article
11 Aug 2013
TL;DR: A type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This enables an efficient pretraining algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.
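
Schematically, such a document DBM is a Boltzmann machine over word-count (softmax) visible units with two hidden layers; writing $\tilde v_k$ for the count of dictionary word $k$ in a document, the energy has the general form

$$E(\mathbf{V}, \mathbf{h}^1, \mathbf{h}^2) = -\sum_{k}\sum_{j} \tilde v_k\, W^{(1)}_{kj} h^1_j \;-\; \sum_{j}\sum_{m} h^1_j W^{(2)}_{jm} h^2_m,$$

where bias terms (which Replicated Softmax-style models scale with the document length) are omitted. This is only a generic sketch; the paper's specific contribution, the parameter tying that makes pretraining and inference efficient, is not reproduced here.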

123 citations


Posted Content
TL;DR: A Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

108 citations


Posted Content
TL;DR: A deep-learning-based generative framework using attention that can robustly attend to the face region of novel test subjects and can learn generative models of new faces from a novel dataset of large images where the face locations are not known.
Abstract: Attention has long been proposed by psychologists as important for effectively dealing with the enormous sensory stimulus available in the neocortex. Inspired by visual attention models in computational neuroscience and by the need for object-centric data in generative models, we describe a generative learning framework using attentional mechanisms. Attentional mechanisms can propagate signals from a region of interest in a scene to an aligned canonical representation, where generative modeling takes place. By ignoring background clutter, generative models can concentrate their resources on the object of interest. Our model is a proper graphical model in which the 2D similarity transformation is part of the top-down process. A ConvNet is employed to provide good initializations during posterior inference, which is based on Hamiltonian Monte Carlo. After learning on images of faces, our model can robustly attend to the face regions of novel test subjects. More importantly, our model can learn generative models of new faces from a novel dataset of large images where the face locations are not known.
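
A minimal sketch of the attentional read step, assuming a 2D similarity transform (scale s, rotation theta, translation tx, ty) and bilinear sampling into a fixed canonical grid; the grid size, coordinate convention, and interpolation are illustrative choices, not the paper's implementation.

import numpy as np

def similarity_read(image, s, theta, tx, ty, canon=24):
    """Warp the region selected by a 2D similarity transform into a
    canonical canon x canon window using bilinear interpolation."""
    H, W = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, canon),
                         np.linspace(-1, 1, canon), indexing="ij")
    # Map canonical coordinates into normalized image coordinates.
    c, si = np.cos(theta), np.sin(theta)
    u = s * (c * xs - si * ys) + tx
    v = s * (si * xs + c * ys) + ty
    # Convert to pixel coordinates and bilinearly interpolate.
    u = (u + 1) * 0.5 * (W - 1)
    v = (v + 1) * 0.5 * (H - 1)
    u0 = np.clip(np.floor(u).astype(int), 0, W - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, H - 2)
    du, dv = u - u0, v - v0
    patch = ((1 - du) * (1 - dv) * image[v0, u0]
             + du * (1 - dv) * image[v0, u0 + 1]
             + (1 - du) * dv * image[v0 + 1, u0]
             + du * dv * image[v0 + 1, u0 + 1])
    return patch

face = np.random.rand(64, 64)
print(similarity_read(face, s=0.4, theta=0.1, tx=0.2, ty=-0.1).shape)  # (24, 24)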

70 citations


Proceedings Article
16 Jun 2013
TL;DR: An efficient way of sampling from the posterior distribution over factor values is described and it is demonstrated that these samples can be used in the EM algorithm for learning interesting mixture models of natural image patches.
Abstract: Factor Analysis is a statistical method that seeks to explain linear variations in data by using unobserved latent variables. Due to its additive nature, it is not suitable for modeling data that is generated by multiple groups of latent factors which interact multiplicatively. In this paper, we introduce Tensor Analyzers, a multilinear generalization of Factor Analyzers. We describe an efficient way of sampling from the posterior distribution over factor values and we demonstrate that these samples can be used in the EM algorithm for learning interesting mixture models of natural image patches. Tensor Analyzers can also accurately recognize a face under significant pose and illumination variations when given only one previous image of that face. We also show that Tensor Analyzers can be trained in unsupervised, semi-supervised, or fully supervised settings.
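
A minimal sketch of the multilinear generative mapping, assuming two groups of factors (say, identity and pose) interacting through a third-order tensor; shapes and the noise level are illustrative.

import numpy as np

rng = np.random.default_rng(0)

D, K1, K2 = 16, 3, 4               # data dim, sizes of the two factor groups
T = rng.normal(size=(D, K1, K2))   # interaction tensor (generalizes the FA loading matrix)

def generate(z1, z2, noise=0.1):
    """Factor Analysis:  x = W z + eps        (linear in one factor group)
    Tensor Analyzer:     x = T x2 z1 x3 z2 + eps  (bilinear in two groups)."""
    mean = np.einsum("dij,i,j->d", T, z1, z2)
    return mean + noise * rng.normal(size=D)

x = generate(rng.normal(size=K1), rng.normal(size=K2))
print(x.shape)   # (16,)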

67 citations


Proceedings Article
05 Dec 2013
TL;DR: In this article, the similarity between x and x′ is approximated as the Hamming distance between f(x) and g(x′) for two distinct code maps f and g, rather than as the Hamming distance between f(x) and f(x′), which yields shorter and more accurate hashes even when the similarity is symmetric.
Abstract: When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps, i.e., by approximating the similarity between x and x′ as the Hamming distance between f(x) and g(x′), for two distinct binary codes f, g, rather than as the Hamming distance between f(x) and f(x′).
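
A minimal sketch of the asymmetric scheme, assuming linear threshold code maps; in the paper the two maps are learned, whereas the random projections below are stand-ins.

import numpy as np

rng = np.random.default_rng(0)
d, bits = 32, 8
A, B = rng.normal(size=(d, bits)), rng.normal(size=(d, bits))  # two distinct maps

def f(x):  # code map used for one side (e.g., the query)
    return (x @ A > 0).astype(np.uint8)

def g(x):  # a distinct code map used for the other side (e.g., the database)
    return (x @ B > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.sum(a != b))

x, x_prime = rng.normal(size=d), rng.normal(size=d)
# Symmetric:  similarity(x, x') ~ hamming(f(x), f(x'))
# Asymmetric: similarity(x, x') ~ hamming(f(x), g(x'))
print(hamming(f(x), f(x_prime)), hamming(f(x), g(x_prime)))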

66 citations


Proceedings ArticleDOI
05 Dec 2013
TL;DR: A novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions is presented and an asymptotically optimal piecewise linear schedule is derived.
Abstract: Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and the intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions. We analyze the asymptotic performance of both the geometric and moment averages paths and derive an asymptotically optimal piecewise linear schedule. AIS with moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models.
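
Concretely, for an exponential-family model $p_\theta(x) = h(x)\exp(\theta^\top g(x) - A(\theta))$, the two families of intermediate distributions can be written as

Geometric averages: $p_\beta(x) \propto p_0(x)^{1-\beta}\, p_1(x)^{\beta}$, which for exponential families amounts to averaging the natural parameters, $\theta_\beta = (1-\beta)\theta_0 + \beta\theta_1$.

Moment averages: choose $\theta_\beta$ so that the expected sufficient statistics (moments) are averaged, $\mathbb{E}_{p_{\theta_\beta}}[g(x)] = (1-\beta)\,\mathbb{E}_{p_{\theta_0}}[g(x)] + \beta\,\mathbb{E}_{p_{\theta_1}}[g(x)]$.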

61 citations


Posted Content
TL;DR: The results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.
Abstract: Deep learning methods have recently made notable advances in the tasks of classification and representation learning. These tasks are important for brain imaging and neuroscience discovery, making the methods attractive for porting to a neuroimager's toolbox. Success of these methods is, in part, explained by the flexibility of deep learning models. However, this flexibility makes the process of porting to new areas a difficult parameter optimization problem. In this work we demonstrate our results (and feasible parameter ranges) in applying deep learning methods to structural and functional brain imaging data. We also describe a novel constraint-based approach to visualizing high-dimensional data, and use it to analyze the effect of parameter choices on data transformations. Our results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.

Posted Content
TL;DR: It is shown that even if the similarity is symmetric, shorter and more accurate hashes can be obtained by using two distinct code maps, approximating the similarity between x and x′ as the Hamming distance between f(x) and g(x′) for two distinct binary codes f and g.
Abstract: When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. That is, we approximate the similarity between $x$ and $x'$ as the Hamming distance between $f(x)$ and $g(x')$, for two distinct binary codes $f, g$, rather than as the Hamming distance between $f(x)$ and $f(x')$.

Journal ArticleDOI
Samy Bengio, Li Deng, Hugo Larochelle, Honglak Lee, Ruslan Salakhutdinov
TL;DR: There has been a resurgence of research in the design of deep architecture models and learning algorithms, i.e., methods that rely on the extraction of a multilayer representation of the data.
Abstract: There has been a resurgence of research in the design of deep architecture models and learning algorithms, i.e., methods that rely on the extraction of a multilayer representation of the data. Often referred to as deep learning, this topic of research has been building on and contributing to many different research topics, such as neural networks, graphical models, feature learning, unsupervised learning, optimization, pattern recognition, and signal processing. Deep learning is also motivated and inspired by neuroscience and has had a tremendous impact on various applications such as computer vision, speech recognition, and natural language processing. The clearly multidisciplinary nature of deep learning led to a call for papers for a special issue dedicated to learning deep architectures.

26 Apr 2013
TL;DR: This work introduces a type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents and proposes an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We propose an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods. Even though the model has two hidden layers, it can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.