
Showing papers by "Ilya Sutskever published in 2010"


Proceedings Article
31 Mar 2010
TL;DR: This paper analyzes the CD1 update rule for Restricted Boltzmann Machines with binary variables, and shows that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.
Abstract: Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD1 update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.
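As an illustration of the CD1 update analyzed in the abstract, here is a minimal sketch for a binary RBM. This is a standard textbook form of one-step Contrastive Divergence, not code from the paper; all names, shapes, and the numpy implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.1):
    """One CD1 step for a binary RBM.

    W: (n_visible, n_hidden) weights; b: visible biases; c: hidden biases.
    v0: a batch of binary data vectors, shape (n_samples, n_visible).
    """
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of block Gibbs sampling (the "1" in CD1).
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # CD1 update: data statistics minus one-step reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

The paper's point is precisely that this update, despite resembling a gradient step, is not the gradient of any objective function.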

106 citations


Journal ArticleDOI
TL;DR: The Temporal-Kernel Recurrent Neural Network (TKRNN) is introduced, a variant of the RNN that copes with long-term dependencies much more easily than a standard RNN, and it is shown that the TKRNN develops short-term memory that successfully solves the serial recall task by representing the input string with a stable state of its hidden units.

49 citations


Proceedings Article
31 Mar 2010
TL;DR: A new Markov Chain transition operator is introduced that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables; a formal equivalence between pairwise MRFs and a particular type of Restricted Boltzmann Machine implies that the latter can be learned in place of the former without any loss of modeling power.
Abstract: Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo methods. In this paper we introduce a new Markov Chain transition operator that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables. The proposed MCMC operator is extremely simple to implement and to parallelize. This is achieved by a formal equivalence result between arbitrary pairwise MRFs and a particular type of Restricted Boltzmann Machine. This result also implies that the latter can be learned in place of the former without any loss of modeling power, a possibility we explore in experiments.
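The parallelism that the MRF-to-RBM equivalence buys can be seen in the RBM's bipartite structure: given one layer, all units of the other layer are conditionally independent and can be sampled in a single vectorized step. Below is a sketch of this standard block Gibbs transition for a binary RBM; it illustrates the parallel-update setting, not the paper's exact auxiliary-Gaussian operator, and all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs_step(W, b, c, v):
    """One block Gibbs transition for a binary RBM.

    Because the RBM graph is bipartite, every hidden unit is
    conditionally independent of the others given v, so the whole
    hidden layer is sampled in one parallel (vectorized) step,
    and likewise for the visible layer given h.
    """
    ph = sigmoid(v @ W + c)                       # all hiddens at once
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T + b)                     # all visibles at once
    v_new = (rng.random(pv.shape) < pv).astype(float)
    return v_new, h
```

Iterating this transition yields samples from the RBM's stationary distribution; the paper's equivalence result is what lets a pairwise MRF inherit this kind of fully parallel update.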

22 citations