
Showing papers by "Ruslan Salakhutdinov published in 2010"


Proceedings Article
31 Mar 2010
TL;DR: A new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables, which learns a separate “recognition” model used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers.
Abstract: We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers. We show that using such a recognition model, followed by a combined top-down and bottom-up pass, it is possible to efficiently learn a good generative model of high-dimensional, highly structured sensory input. We show that the additional computations required by incorporating top-down feedback play a critical role in the performance of a DBM, both as a generative and a discriminative model. Moreover, inference is at most three times slower than approximate inference in a Deep Belief Network (DBN), making large-scale learning of DBM’s practical. Finally, we demonstrate that DBM’s trained using the proposed approximate inference algorithm perform well compared to DBN’s and SVM’s on the MNIST handwritten digit, OCR English letter, and NORB visual object recognition tasks.
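The abstract describes a two-stage scheme: a single bottom-up pass through a separate recognition model to initialize all hidden layers, followed by a few combined bottom-up/top-down mean-field updates on the DBM itself. Below is a minimal numpy sketch of that idea for a two-hidden-layer DBM; the recognition weights R1/R2, generative weights W1/W2, layer sizes, and omitted biases are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognition_init(v, R1, R2):
    """Single bottom-up pass through a separate recognition model
    (hypothetical weights R1, R2) that initializes the mean-field
    parameters of both hidden layers."""
    mu1 = sigmoid(v @ R1)      # layer-1 initialization
    mu2 = sigmoid(mu1 @ R2)    # layer-2 initialization
    return mu1, mu2

def mean_field_refine(v, mu1, mu2, W1, W2, n_steps=3):
    """A few combined bottom-up / top-down mean-field updates using the
    DBM's generative weights W1 (visible-hidden1) and W2 (hidden1-hidden2)."""
    for _ in range(n_steps):
        # layer 1 gets bottom-up input from v and top-down input from mu2
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        # layer 2 gets bottom-up input from the refreshed mu1
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

# toy usage with random weights (biases omitted for brevity)
rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=(1, 784)).astype(float)
W1, W2 = rng.normal(0, 0.01, (784, 500)), rng.normal(0, 0.01, (500, 1000))
R1, R2 = rng.normal(0, 0.01, (784, 500)), rng.normal(0, 0.01, (500, 1000))
mu1, mu2 = mean_field_refine(v, *recognition_init(v, R1, R2), W1, W2)
```

The key design point the abstract emphasizes is the top-down term (mu2 @ W2.T) in the layer-1 update: it is the extra computation relative to a DBN's purely bottom-up pass, and the claimed source of the performance gains.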

374 citations


Posted Content
TL;DR: In this article, a weighted version of the trace-norm regularizer is proposed for matrix completion with non-uniform sampling, and the experimental results demonstrate that the weighted trace-norm regularization indeed yields significant gains on the (highly non-uniformly sampled) Netflix dataset.
Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly. We introduce a weighted version of the trace-norm regularizer that works well also with non-uniform sampling. Our experimental results demonstrate that the weighted trace-norm regularization indeed yields significant gains on the (highly non-uniformly sampled) Netflix dataset.
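As a sketch of the object being proposed: a weighted trace-norm rescales row i of X by sqrt(p_row[i]) and column j of X by sqrt(p_col[j]) before taking the nuclear norm, where p_row and p_col are the marginal row/column sampling probabilities. The minimal numpy illustration below assumes that form; the paper's exact weighting and normalization may differ.

```python
import numpy as np

def weighted_trace_norm(X, p_row, p_col):
    """Nuclear norm of X after scaling row i by sqrt(p_row[i]) and
    column j by sqrt(p_col[j]); p_row/p_col are (empirical) marginal
    sampling probabilities and should each sum to 1."""
    scaled = np.sqrt(p_row)[:, None] * X * np.sqrt(p_col)[None, :]
    return np.linalg.norm(scaled, ord='nuc')  # sum of singular values
```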

205 citations


Proceedings Article
06 Dec 2010
TL;DR: It is shown that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling.
Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling. We show that the weighted trace-norm regularization indeed yields significant gains on the highly non-uniformly sampled Netflix dataset.
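In practice, trace-norm problems at Netflix scale are optimized through the factorization X = U V^T, using the variational form ||X||_tr = min over X = U V^T of (||U||_F^2 + ||V||_F^2) / 2. Penalizing ||U[i]||^2 + ||V[j]||^2 once per observed entry weights each row and column by its empirical sampling frequency, which gives a weighted trace-norm with empirical marginals. The SGD sketch below assumes that setup; the rank, regularization strength, and learning rate are placeholder choices, not the paper's experimental configuration.

```python
import numpy as np

def sgd_weighted_tracenorm(rows, cols, vals, n, m, rank=10,
                           lam=0.05, lr=0.01, epochs=20, seed=0):
    """SGD on the factorized matrix-completion objective
        sum over observed (i, j): (U[i] . V[j] - y_ij)^2
                                   + lam * (||U[i]||^2 + ||V[j]||^2).
    Because each row/column is penalized once per *observed* entry,
    summing over the training set weights row i by its empirical
    sampling frequency, i.e. this per-entry penalty corresponds to a
    weighted trace-norm with empirical marginals."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.01, (n, rank))
    V = rng.normal(0, 0.01, (m, rank))
    for _ in range(epochs):
        for i, j, y in zip(rows, cols, vals):
            err = U[i] @ V[j] - y
            u = U[i].copy()                      # pre-update copy for V's gradient
            U[i] -= lr * (err * V[j] + lam * U[i])
            V[j] -= lr * (err * u + lam * V[j])
    return U, V
```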

169 citations


Proceedings Article
06 Dec 2010
TL;DR: This work uses a factorization technique of Burer and Monteiro to devise scalable first-order algorithms for convex programs involving the max-norm, and applies these algorithms to huge collaborative filtering, graph cut, and clustering problems.
Abstract: The max-norm was proposed as a convex matrix regularizer in [1] and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro [2] to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas.
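A rough sketch of the Burer-Monteiro idea for the max-norm: write X = U V^T and impose the max-norm bound as per-row norm constraints on the factors, which a first-order method can maintain by projecting each touched row after its gradient step. The matrix-completion-style loss, rank, and radius B below are placeholder choices for illustration, not the algorithms or tuning from the paper.

```python
import numpy as np

def project_rows(M, B):
    """Scale any row of M whose squared norm exceeds B back onto the
    ball of radius sqrt(B) -- this enforces max_i ||M[i]||^2 <= B,
    the constraint form of a max-norm bound on U V^T."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M * np.minimum(1.0, np.sqrt(B) / np.maximum(norms, 1e-12))

def maxnorm_sgd(rows, cols, vals, n, m, rank=10, B=2.0,
                lr=0.05, epochs=20, seed=0):
    """Projected SGD on the Burer-Monteiro factorization X = U V^T:
    a gradient step on the squared error of each observed entry,
    then projection of the touched rows back into the constraint set."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.1, (n, rank))
    V = rng.normal(0, 0.1, (m, rank))
    for _ in range(epochs):
        for i, j, y in zip(rows, cols, vals):
            u, v = U[i].copy(), V[j].copy()
            err = u @ v - y
            U[i] = project_rows((u - lr * err * v)[None, :], B)[0]
            V[j] = project_rows((v - lr * err * u)[None, :], B)[0]
    return U, V
```

The appeal of this formulation is that the projection is a cheap per-row rescaling, so each update costs O(rank) regardless of the matrix size.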

167 citations


Proceedings Article
21 Jun 2010
TL;DR: This paper first shows a close connection between Fast PCD and adaptive MCMC, and develops a Coupled Adaptive Simulated Tempering algorithm that can be used to better explore a highly multimodal energy landscape.
Abstract: When modeling high-dimensional richly structured data, it is often the case that the distribution defined by the Deep Boltzmann Machine (DBM) has a rough energy landscape with many local minima separated by high energy barriers. The commonly used Gibbs sampler tends to get trapped in one local mode, which often results in unstable learning dynamics and leads to poor parameter estimates. In this paper, we concentrate on learning DBM's using adaptive MCMC algorithms. We first show a close connection between Fast PCD and adaptive MCMC. We then develop a Coupled Adaptive Simulated Tempering algorithm that can be used to better explore a highly multimodal energy landscape. Finally, we demonstrate that the proposed algorithm considerably improves parameter estimates, particularly when learning large-scale DBM's.
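To make the tempering ingredient concrete, here is a toy stand-in: plain adaptive simulated tempering on a 1-D bimodal density, where a single chain moves in (state, temperature) space and the log temperature weights are adapted online so every temperature level gets visited. This is only a schematic of the adaptive-MCMC idea, not the paper's Coupled Adaptive Simulated Tempering algorithm or its DBM setting; the target, temperature ladder, and adaptation schedule are made up for illustration.

```python
import numpy as np

def log_p(x):
    """Toy bimodal target: equal mixture of N(-4, 1) and N(4, 1),
    two modes separated by a high energy barrier."""
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

def adaptive_simulated_tempering(n_iter=50000, betas=(1.0, 0.5, 0.25, 0.1),
                                 step=1.0, gamma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    g = np.zeros(len(betas))          # adaptive log-weights per temperature
    x, k = 0.0, 0                     # state and current temperature index
    samples = []
    for t in range(n_iter):
        # Metropolis move in x at the current inverse temperature betas[k]
        x_new = x + step * rng.normal()
        if np.log(rng.random()) < betas[k] * (log_p(x_new) - log_p(x)):
            x = x_new
        # propose moving to a neighboring temperature level
        k_new = k + rng.choice([-1, 1])
        if 0 <= k_new < len(betas):
            log_acc = (betas[k_new] - betas[k]) * log_p(x) + g[k_new] - g[k]
            if np.log(rng.random()) < log_acc:
                k = k_new
        # adapt: down-weight the level just visited so visits flatten out
        g[k] -= gamma / (1.0 + t * gamma)
        if k == 0:                    # keep only samples drawn at beta = 1
            samples.append(x)
    return np.array(samples)
```

At high temperatures (small beta) the flattened landscape lets the chain hop between modes, and the adapted weights g keep it cycling through all levels instead of getting stuck, which is the mechanism the abstract credits for the improved parameter estimates.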

92 citations