
Showing papers by "Ilya Sutskever published in 2010"


Proceedings Article
31 Mar 2010
TL;DR: This paper analyzes the CD1 update rule for Restricted Boltzmann Machines with binary variables, and shows that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.
Abstract: Contrastive Divergence (CD) is a popular method for estimating the parameters of Markov Random Fields (MRFs) by rapidly approximating an intractable term in the gradient of the log probability. Despite CD’s empirical success, little is known about its theoretical convergence properties. In this paper, we analyze the CD1 update rule for Restricted Boltzmann Machines (RBMs) with binary variables. We show that this update is not the gradient of any function, and construct a counterintuitive “regularization function” that causes CD learning to cycle indefinitely. Nonetheless, we show that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer’s fixed point theorem.
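As an illustration of the CD1 update analyzed in the abstract, here is a minimal sketch for a binary RBM. This is a standard textbook form of one-step Contrastive Divergence, not code from the paper; all names, shapes, and the numpy implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.1):
    """One CD1 step for a binary RBM.

    W: (n_visible, n_hidden) weights; b: visible biases; c: hidden biases.
    v0: a batch of binary data vectors, shape (n_samples, n_visible).
    """
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of block Gibbs sampling (the "1" in CD1).
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # CD1 update: data statistics minus one-step reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

The paper's point is precisely that this update, despite resembling a gradient step, is not the gradient of any objective function.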

106 citations


Journal ArticleDOI
TL;DR: The Temporal-Kernel Recurrent Neural Network (TKRNN) is introduced, a variant of the RNN that copes with long-term dependencies much more easily than a standard RNN, and it is shown that the TKRNN develops short-term memory that successfully solves the serial recall task by representing the input string with a stable state of its hidden units.

49 citations


Proceedings Article
31 Mar 2010
TL;DR: A new Markov Chain transition operator is introduced that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables; a formal equivalence between pairwise MRFs and a particular type of Restricted Boltzmann Machine implies that the latter can be learned in place of the former without any loss of modeling power.
Abstract: Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo methods. In this paper we introduce a new Markov Chain transition operator that updates all the variables of a pairwise MRF in parallel by using auxiliary Gaussian variables. The proposed MCMC operator is extremely simple to implement and to parallelize. This is achieved by a formal equivalence result between arbitrary pairwise MRFs and a particular type of Restricted Boltzmann Machine. This result also implies that the latter can be learned in place of the former without any loss of modeling power, a possibility we explore in experiments.
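The parallelism that the MRF-to-RBM equivalence buys can be seen in the RBM's bipartite structure: given one layer, all units of the other layer are conditionally independent and can be sampled in a single vectorized step. Below is a sketch of this standard block Gibbs transition for a binary RBM; it illustrates the parallel-update setting, not the paper's exact auxiliary-Gaussian operator, and all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs_step(W, b, c, v):
    """One block Gibbs transition for a binary RBM.

    Because the RBM graph is bipartite, every hidden unit is
    conditionally independent of the others given v, so the whole
    hidden layer is sampled in one parallel (vectorized) step,
    and likewise for the visible layer given h.
    """
    ph = sigmoid(v @ W + c)                       # all hiddens at once
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T + b)                     # all visibles at once
    v_new = (rng.random(pv.shape) < pv).astype(float)
    return v_new, h
```

Iterating this transition yields samples from the RBM's stationary distribution; the paper's equivalence result is what lets a pairwise MRF inherit this kind of fully parallel update.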

22 citations