Showing papers by "Ruslan Salakhutdinov published in 2009"

PDF

Open Access

Proceedings Article•

[...]

Ruslan Salakhutdinov¹, Geoffrey E. Hinton¹•Institutions (1)

15 Apr 2009

TL;DR: A new learning algorithm for Boltzmann machines that contain many layers of hidden variables that is made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottomup pass.

...read moreread less

Abstract: We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and dataindependent expectations are approximated using persistent Markov chains The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters The learning can be made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottomup pass We present results on the MNIST and NORB datasets showing that deep Boltzmann machines learn good generative models and perform well on handwritten digit and visual object recognition tasks

...read moreread less

2,221 citations

Journal Article•DOI•

Semantic hashing

[...]

Ruslan Salakhutdinov¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

01 Jul 2009-International Journal of Approximate Reasoning

TL;DR: In this paper, a deep graphical model of the word-count vectors obtained from a large set of documents is proposed. But the model is restricted to the deep layer of the deep neural network and cannot handle large numbers of documents.

...read moreread less

1,266 citations

Proceedings Article•DOI•

Evaluation methods for topic models

[...]

Hanna Wallach¹, Iain Murray², Ruslan Salakhutdinov², David Mimno¹•Institutions (2)

University of Massachusetts Amherst¹, University of Toronto²

14 Jun 2009

TL;DR: It is demonstrated experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and two alternative methods that are both accurate and efficient are proposed.

...read moreread less

Abstract: A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean method and empirical likelihood method. In this paper, we demonstrate experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and propose two alternative methods that are both accurate and efficient.

...read moreread less

877 citations

Proceedings Article•

Replicated Softmax: an Undirected Topic Model

[...]

Geoffrey E. Hinton¹, Ruslan Salakhutdinov²•Institutions (2)

University of Toronto¹, Massachusetts Institute of Technology²

07 Dec 2009

TL;DR: This work introduces a two-layer undirected graphical model, called a "Replicated Softmax", that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents.

...read moreread less

Abstract: We introduce a two-layer undirected graphical model, called a "Replicated Softmax", that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.

...read moreread less

541 citations

Journal Article•DOI•

Learning deep generative models

[...]

Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

01 Jan 2009-Social Science Research Network

TL;DR: The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks.

...read moreread less

Abstract: Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many AI related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing. The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks. In addition, similar methods can be used for nonlinear dimensionality reduction. The first part of the thesis focuses on analysis and applications of probabilistic generative models called Deep Belief Networks. We show that these deep hierarchical models can learn useful feature representations from a large supply of unlabeled sensory inputs. The learned high-level representations capture a lot of structure in the input data, which is useful for subsequent problem-specific tasks, such as classification, regression or information retrieval, even though these tasks are unknown when the generative model is being trained. In the second part of the thesis, we introduce a new learning algorithm for a different type of hierarchical probabilistic model, which we call a Deep Boltzmann Machine. Like Deep Belief Networks, Deep Boltzmann Machines have the potential of learning internal representations that become increasingly complex at higher layers, which is a promising way of solving object and speech recognition problems. Unlike Deep Belief Networks and many existing models with deep architectures, the approximate inference procedure, in addition to a fast bottom-up pass, can incorporate top-down feedback. This allows Deep Boltzmann Machines to better propagate uncertainty about ambiguous inputs.

...read moreread less

370 citations

Proceedings Article•

Modelling Relational Data using Bayesian Clustered Tensor Factorization

[...]

Ilya Sutskever¹, Joshua B. Tenenbaum², Ruslan Salakhutdinov²•Institutions (2)

University of Toronto¹, Massachusetts Institute of Technology²

07 Dec 2009

TL;DR: The Bayesian Clustered Tensor Factorization (BCTF) model is introduced, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework that is fully Bayesian but scales well to large data sets.

...read moreread less

Abstract: We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us "understand" a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: cluster-based models yield more easily interpretable representations, while factorization-based approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data.

...read moreread less

275 citations

Proceedings Article•

Learning in Markov Random Fields using Tempered Transitions

[...]

Ruslan Salakhutdinov¹•Institutions (1)

Massachusetts Institute of Technology¹

07 Dec 2009

TL;DR: This paper shows that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF's.

...read moreread less

Abstract: Markov random fields (MRF's), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF's is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF's. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

...read moreread less

119 citations

Proceedings Article•DOI•

Learning nonlinear dynamic models

[...]

John Langford¹, Ruslan Salakhutdinov², Tong Zhang³•Institutions (3)

Yahoo!¹, University of Toronto², Rutgers University³

14 Jun 2009

TL;DR: In this article, a novel approach for learning nonlinear dynamic models is presented, which leads to a new set of tools capable of solving problems that are otherwise difficult to solve, such as motion capture and high-dimensional video data.

...read moreread less

Abstract: We present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives

...read moreread less

33 citations

Posted Content•

Learning Nonlinear Dynamic Models

[...]

John Langford¹, Ruslan Salakhutdinov², Tong Zhang³•Institutions (3)

Yahoo!¹, University of Toronto², Rutgers University³

20 May 2009-arXiv: Artificial Intelligence

TL;DR: A novel approach for learning nonlinear dynamic models leads to a new set of tools capable of solving problems that are otherwise difficult, and is applied to motion capture and high-dimensional video data, yielding results superior to standard alternatives.

...read moreread less

Abstract: We present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult. We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives.

...read moreread less

32 citations

Proceedings Article•DOI•

Workshop summary: Workshop on learning feature hierarchies

[...]