Showing papers by "Ruslan Salakhutdinov" published in 2013


Journal ArticleDOI
TL;DR: Efficient learning and inference algorithms for the HDP-DBM model are presented and it is shown that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
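
Schematically (a sketch, not the paper's exact parameterization; biases and layer sizes are elided), the DBM component defines a joint distribution over an input $\mathbf{v}$ and hidden layers $\mathbf{h}^1,\dots,\mathbf{h}^L$,

$$P(\mathbf{v},\mathbf{h}^1,\dots,\mathbf{h}^L) \propto \exp\Big(\mathbf{v}^\top W^1 \mathbf{h}^1 + \sum_{l=2}^{L} (\mathbf{h}^{l-1})^\top W^l \mathbf{h}^l\Big),$$

while the HDP component replaces the DBM's implicit prior over the top-layer activities $\mathbf{h}^L$ with a category hierarchy of Dirichlet processes, e.g.

$$G^{(0)} \sim \mathrm{DP}(\eta, H), \qquad G^{\text{super}} \mid G^{(0)} \sim \mathrm{DP}(\gamma, G^{(0)}), \qquad G^{\text{cat}} \mid G^{\text{super}} \sim \mathrm{DP}(\alpha, G^{\text{super}}),$$

so the distribution over top-level feature activity for a new category is drawn from its super-category's base measure, which is how a novel concept can borrow statistical strength and be learned from very few examples.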

236 citations


Proceedings Article
05 Dec 2013
TL;DR: A Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image.
Abstract: People can learn a new visual class from just one example, yet machine learning algorithms typically require hundreds or thousands of examples to tackle the same problems. Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. We evaluated performance on a challenging one-shot classification task, where our model achieved a human-level error rate while substantially outperforming two deep learning models. We also tested the model on another conceptual task, generating new examples, by using a "visual Turing test" to show that our model produces human-like performance.

230 citations


Proceedings Article
05 Dec 2013
TL;DR: This work proposes a method for improving classification performance of high-capacity classifiers on infrequent classes by discovering similar classes and transferring knowledge among them: the classes are organized into a tree hierarchy that imposes a prior over the classifier's parameters, and an algorithm is proposed for learning the underlying tree structure.
Abstract: High capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier's parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, making deep neural networks work well on infrequent classes as well. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set.
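
A minimal sketch of how such a tree-based prior can act on the last-layer weights (the Gaussian form, names, and shapes below are illustrative assumptions, not the paper's exact parameterization): each class's weight vector is pulled toward a parameter vector attached to its parent node in the tree, so infrequent classes borrow statistical strength from related, better-represented ones.

import numpy as np

def tree_prior_penalty(class_weights, parent_of, node_params, sigma2=1.0):
    """Negative log of a Gaussian tree prior over last-layer class weights.

    class_weights: dict class_id -> weight vector (np.ndarray)
    parent_of:     dict class_id -> parent node id in the tree
    node_params:   dict node_id  -> parent-node parameter vector (same shape)
    Each class vector is assumed drawn from N(theta_parent, sigma2 * I).
    """
    penalty = 0.0
    for c, w in class_weights.items():
        theta = node_params[parent_of[c]]
        penalty += np.sum((w - theta) ** 2) / (2.0 * sigma2)
    return penalty

# Illustrative usage: two rare classes share a parent, so their weights
# are pulled toward a common centroid informed by both.
rng = np.random.default_rng(0)
weights = {"lion": rng.normal(size=8), "tiger": rng.normal(size=8)}
parents = {"lion": "big_cat", "tiger": "big_cat"}
params = {"big_cat": (weights["lion"] + weights["tiger"]) / 2.0}
print(tree_prior_penalty(weights, parents, params))

In training, a penalty of this form would simply be added to the network's discriminative loss, with the parent parameters updated alongside the class weights.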

217 citations


Proceedings Article
05 Dec 2013
TL;DR: A stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables is proposed that achieves superior performance on synthetic and facial expression datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks.
Abstract: Multilayer perceptrons (MLPs), or neural networks, are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g., Gaussian). For tasks involving structured prediction, the conditional distribution should be multimodal, resulting in one-to-many mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are not efficient and are unsuitable for modeling real-valued data. In this paper, we propose a stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. Our model achieves superior performance on synthetic and facial expression datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks. In addition, the latent features of our model improve classification and can learn to generate colorful textures of objects.
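
A minimal sketch of a hidden layer that mixes deterministic and stochastic units, with repeated sampling used to expose the multimodal predictive distribution (layer sizes, the sigmoid nonlinearity, and the linear output are illustrative assumptions; the paper's Generalized EM / importance-sampling training is not shown).

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hybrid_layer(x, W_det, W_sto):
    """One hidden layer with deterministic sigmoid units and stochastic
    Bernoulli units whose firing probabilities come from a sigmoid."""
    h_det = sigmoid(x @ W_det)                         # deterministic part
    p = sigmoid(x @ W_sto)
    h_sto = (rng.random(p.shape) < p).astype(float)    # stochastic part
    return np.concatenate([h_det, h_sto], axis=-1)

def sample_outputs(x, W_det, W_sto, V, n_samples=20):
    """Draw several output samples; their spread reflects the multimodality
    induced by the stochastic hidden units."""
    return np.stack([hybrid_layer(x, W_det, W_sto) @ V for _ in range(n_samples)])

x = rng.normal(size=(1, 5))
W_det, W_sto = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
V = rng.normal(size=(8, 3))
print(sample_outputs(x, W_det, W_sto, V).shape)        # (20, 1, 3)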

141 citations


Proceedings Article
11 Aug 2013
TL;DR: A type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This enables an efficient pretraining algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.
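
Schematically, such a document DBM is a Boltzmann machine over word-count (softmax) visible units with two hidden layers; writing $\tilde v_k$ for the count of dictionary word $k$ in a document, the energy has the general form

$$E(\mathbf{V}, \mathbf{h}^1, \mathbf{h}^2) = -\sum_{k}\sum_{j} \tilde v_k\, W^{(1)}_{kj} h^1_j \;-\; \sum_{j}\sum_{m} h^1_j W^{(2)}_{jm} h^2_m,$$

where bias terms (which Replicated Softmax-style models scale with the document length) are omitted. This is only a generic sketch; the paper's specific contribution, the parameter tying that makes pretraining and inference efficient, is not reproduced here.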

123 citations


Posted Content
TL;DR: A Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

108 citations


Posted Content
TL;DR: A deep-learning-based generative framework using attention that can robustly attend to the face region of novel test subjects and can learn generative models of new faces from a novel dataset of large images where the face locations are not known.
Abstract: Attention has long been proposed by psychologists as important for effectively dealing with the enormous sensory stimulus available in the neocortex. Inspired by visual attention models in computational neuroscience and by the need for object-centric data in generative models, we describe a generative learning framework using attentional mechanisms. Attentional mechanisms can propagate signals from a region of interest in a scene to an aligned canonical representation, where generative modeling takes place. By ignoring background clutter, generative models can concentrate their resources on the object of interest. Our model is a proper graphical model in which the 2D similarity transformation is part of the top-down process. A ConvNet is employed to provide good initializations during posterior inference, which is based on Hamiltonian Monte Carlo. After learning on images of faces, our model can robustly attend to the face regions of novel test subjects. More importantly, our model can learn generative models of new faces from a novel dataset of large images where the face locations are not known.
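
A minimal sketch of the attentional read step, assuming a 2D similarity transform (scale s, rotation theta, translation tx, ty) and bilinear sampling into a fixed canonical grid; the grid size, coordinate convention, and interpolation are illustrative choices, not the paper's implementation.

import numpy as np

def similarity_read(image, s, theta, tx, ty, canon=24):
    """Warp the region selected by a 2D similarity transform into a
    canonical canon x canon window using bilinear interpolation."""
    H, W = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, canon),
                         np.linspace(-1, 1, canon), indexing="ij")
    # Map canonical coordinates into normalized image coordinates.
    c, si = np.cos(theta), np.sin(theta)
    u = s * (c * xs - si * ys) + tx
    v = s * (si * xs + c * ys) + ty
    # Convert to pixel coordinates and bilinearly interpolate.
    u = (u + 1) * 0.5 * (W - 1)
    v = (v + 1) * 0.5 * (H - 1)
    u0 = np.clip(np.floor(u).astype(int), 0, W - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, H - 2)
    du, dv = u - u0, v - v0
    patch = ((1 - du) * (1 - dv) * image[v0, u0]
             + du * (1 - dv) * image[v0, u0 + 1]
             + (1 - du) * dv * image[v0 + 1, u0]
             + du * dv * image[v0 + 1, u0 + 1])
    return patch

face = np.random.rand(64, 64)
print(similarity_read(face, s=0.4, theta=0.1, tx=0.2, ty=-0.1).shape)  # (24, 24)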

70 citations


Proceedings Article
16 Jun 2013
TL;DR: An efficient way of sampling from the posterior distribution over factor values is described and it is demonstrated that these samples can be used in the EM algorithm for learning interesting mixture models of natural image patches.
Abstract: Factor Analysis is a statistical method that seeks to explain linear variations in data by using unobserved latent variables. Due to its additive nature, it is not suitable for modeling data that is generated by multiple groups of latent factors which interact multiplicatively. In this paper, we introduce Tensor Analyzers, a multilinear generalization of Factor Analyzers. We describe an efficient way of sampling from the posterior distribution over factor values and we demonstrate that these samples can be used in the EM algorithm for learning interesting mixture models of natural image patches. Tensor Analyzers can also accurately recognize a face under significant pose and illumination variations when given only one previous image of that face. We also show that Tensor Analyzers can be trained in unsupervised, semi-supervised, or fully supervised settings.
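
A minimal sketch of the multilinear generative mapping, assuming two groups of factors (say, identity and pose) interacting through a third-order tensor; shapes and the noise level are illustrative.

import numpy as np

rng = np.random.default_rng(0)

D, K1, K2 = 16, 3, 4               # data dim, sizes of the two factor groups
T = rng.normal(size=(D, K1, K2))   # interaction tensor (generalizes the FA loading matrix)

def generate(z1, z2, noise=0.1):
    """Factor Analysis:  x = W z + eps        (linear in one factor group)
    Tensor Analyzer:     x = T x2 z1 x3 z2 + eps  (bilinear in two groups)."""
    mean = np.einsum("dij,i,j->d", T, z1, z2)
    return mean + noise * rng.normal(size=D)

x = generate(rng.normal(size=K1), rng.normal(size=K2))
print(x.shape)   # (16,)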

67 citations


Proceedings Article
05 Dec 2013
TL;DR: In this article, the similarity between x and x′ is approximated as the Hamming distance between f(x) and g(x′) for two distinct code maps f and g, rather than as the Hamming distance between f(x) and f(x′), which yields shorter and more accurate hashes even when the similarity is symmetric.
Abstract: When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps, i.e., by approximating the similarity between x and x′ as the Hamming distance between f(x) and g(x′), for two distinct binary codes f, g, rather than as the Hamming distance between f(x) and f(x′).
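
A minimal sketch of the asymmetric scheme, assuming linear threshold code maps; in the paper the two maps are learned, whereas the random projections below are stand-ins.

import numpy as np

rng = np.random.default_rng(0)
d, bits = 32, 8
A, B = rng.normal(size=(d, bits)), rng.normal(size=(d, bits))  # two distinct maps

def f(x):  # code map used for one side (e.g., the query)
    return (x @ A > 0).astype(np.uint8)

def g(x):  # a distinct code map used for the other side (e.g., the database)
    return (x @ B > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.sum(a != b))

x, x_prime = rng.normal(size=d), rng.normal(size=d)
# Symmetric:  similarity(x, x') ~ hamming(f(x), f(x'))
# Asymmetric: similarity(x, x') ~ hamming(f(x), g(x'))
print(hamming(f(x), f(x_prime)), hamming(f(x), g(x_prime)))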

66 citations


Proceedings ArticleDOI
05 Dec 2013
TL;DR: A novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions is presented and an asymptotically optimal piecewise linear schedule is derived.
Abstract: Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and the intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families defined by averaging the moments of the initial and target distributions. We analyze the asymptotic performance of both the geometric and moment averages paths and derive an asymptotically optimal piecewise linear schedule. AIS with moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models.
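
Concretely, for an exponential-family model $p_\theta(x) = h(x)\exp(\theta^\top g(x) - A(\theta))$, the two families of intermediate distributions can be written as

Geometric averages: $p_\beta(x) \propto p_0(x)^{1-\beta}\, p_1(x)^{\beta}$, which for exponential families amounts to averaging the natural parameters, $\theta_\beta = (1-\beta)\theta_0 + \beta\theta_1$.

Moment averages: choose $\theta_\beta$ so that the expected sufficient statistics (moments) are averaged, $\mathbb{E}_{p_{\theta_\beta}}[g(x)] = (1-\beta)\,\mathbb{E}_{p_{\theta_0}}[g(x)] + \beta\,\mathbb{E}_{p_{\theta_1}}[g(x)]$.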

61 citations


Posted Content
TL;DR: The results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.
Abstract: Deep learning methods have recently made notable advances in the tasks of classification and representation learning. These tasks are important for brain imaging and neuroscience discovery, making the methods attractive for porting to a neuroimager's toolbox. Success of these methods is, in part, explained by the flexibility of deep learning models. However, this flexibility makes the process of porting to new areas a difficult parameter optimization problem. In this work we demonstrate our results (and feasible parameter ranges) in applying deep learning methods to structural and functional brain imaging data. We also describe a novel constraint-based approach to visualizing high-dimensional data, and use it to analyze the effect of parameter choices on data transformations. Our results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.

Posted Content
TL;DR: It is shown that even if the similarity is symmetric, shorter and more accurate hashes can be obtained by using two distinct code maps, approximating the similarity between x and x′ as the Hamming distance between f(x) and g(x′) for two distinct binary codes f and g.
Abstract: When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. That is, we approximate the similarity between $x$ and $x'$ as the Hamming distance between $f(x)$ and $g(x')$, for two distinct binary codes $f, g$, rather than as the Hamming distance between $f(x)$ and $f(x')$.

Journal ArticleDOI
Samy Bengio, Li Deng, Hugo Larochelle, Honglak Lee, Ruslan Salakhutdinov
TL;DR: There has been a resurgence of research in the design of deep architecture models and learning algorithms, i.e., methods that rely on the extraction of a multilayer representation of the data.
Abstract: There has been a resurgence of research in the design of deep architecture models and learning algorithms, i.e., methods that rely on the extraction of a multilayer representation of the data. Often referred to as deep learning, this topic of research has been building on and contributing to many different research topics, such as neural networks, graphical models, feature learning, unsupervised learning, optimization, pattern recognition, and signal processing. Deep learning is also motivated and inspired by neuroscience and has had a tremendous impact on various applications such as computer vision, speech recognition, and natural language processing. The clearly multidisciplinary nature of deep learning led to a call for papers for a special issue dedicated to learning deep architectures.

26 Apr 2013
TL;DR: This work introduces a type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents and proposes an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We propose an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods. Even though the model has two hidden layers, it can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.