
Showing papers by "Ilya Sutskever published in 2018"


Proceedings Article
27 Sep 2018
TL;DR: This paper uses Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density and demonstrates the approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
Abstract: A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
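The core trick the abstract describes is Hutchinson's estimator: for a random vector v with E[vvᵀ] = I (e.g. Rademacher), E[vᵀAv] = tr(A), which lets the Jacobian trace be estimated from vector-Jacobian products alone. A minimal NumPy sketch on a toy dense matrix standing in for the Jacobian (the matrix and sample count are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.normal(size=(d, d))  # toy stand-in for a Jacobian

def hutchinson_trace(A, n_samples=10000, rng=rng):
    # E[v^T A v] = tr(A) when E[v v^T] = I; Rademacher v gives low variance.
    v = rng.choice([-1.0, 1.0], size=(n_samples, A.shape[0]))
    # Per-sample quadratic form v_b^T A v_b, averaged over samples.
    return np.mean(np.einsum('bi,ij,bj->b', v, A, v))

est = hutchinson_trace(A)
exact = np.trace(A)
```

In the paper's setting, A is never materialized: each vᵀAv is computed with a vector-Jacobian product via automatic differentiation, which is what makes the estimate scalable to unrestricted architectures.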

344 citations


Posted Content
TL;DR: In this paper, the authors use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
Abstract: A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.

106 citations


Posted Content
TL;DR: Two new meta reinforcement learning algorithms, E-MAML and E-$\text{RL}^2$, are introduced and evaluated on a novel environment called `Krazy World' and a set of maze environments; both deliver better performance on tasks where exploration is important.
Abstract: We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call `Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.

73 citations


Proceedings Article
27 Sep 2018
TL;DR: A system called GamePad is introduced that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant and addresses position evaluation and tactic prediction tasks, which arise naturally in tactic-based theorem proving.
Abstract: In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in tactic-based theorem proving.
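The two tasks named in the abstract are ordinary supervised problems over proof states: a regression (steps left) and a classification (next tactic). A hedged sketch with toy feature vectors standing in for embedded Coq proof states (all names and baselines here are illustrative, not GamePad's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_tactics = 200, 16, 5
states = rng.normal(size=(n, d))  # toy proof-state embeddings

# Position evaluation: regress the number of proof steps left from the state
# (least squares as a minimal baseline).
steps_left = np.abs(states @ rng.normal(size=d)).round()
w_eval = np.linalg.lstsq(states, steps_left, rcond=None)[0]

# Tactic prediction: classify which tactic is applied next
# (nearest-centroid classifier as a minimal baseline).
tactics = rng.integers(0, n_tactics, size=n)
centroids = np.array([states[tactics == k].mean(axis=0) for k in range(n_tactics)])

def predict_tactic(state):
    return int(np.argmin(((centroids - state) ** 2).sum(axis=1)))
```

The paper's baseline models replace these toy predictors with neural networks over structured proof-state representations, but the input/output contract of each task is the same.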

51 citations


Posted Content
TL;DR: In this paper, the authors introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.
Abstract: In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in tactic-based theorem proving.

42 citations


Proceedings Article
01 Jan 2018
TL;DR: E-MAML and E-$\text{RL}^2$ are evaluated on a new environment the authors call `Krazy World', a difficult high-dimensional gridworld, and on a set of maze environments; both deliver better performance than baseline algorithms.
Abstract: We interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. This interpretation leads to the development of two new meta-reinforcement learning algorithms: E-MAML and E-$\text{RL}^2$. Results are presented on a new environment we call `Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance than baseline algorithms on both tasks.

16 citations


Patent
01 Feb 2018
TL;DR: In this article, a value neural network is trained to generate a value score for the state of an environment that represents a predicted long-term reward resulting from the environment being in the state.
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.
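The claimed operations form a four-stage pipeline: supervised policy training, parameter transfer to an RL policy of identical architecture, RL training, and value-network regression against long-term reward. A hedged sketch using linear models as stand-ins for the networks (all data, update rules, and shapes here are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expert data: 4-feature observations with linear expert action scores.
X = rng.normal(size=(64, 4))
expert_scores = X @ np.array([1.0, -0.5, 0.3, 0.8])

# 1) Train the supervised-learning policy network on expert data
#    (least squares stands in for supervised training).
sl_policy = np.linalg.lstsq(X, expert_scores, rcond=None)[0]

# 2) Initialize the RL policy network, which has the same architecture,
#    to the trained values of the SL policy's parameters.
rl_policy = sl_policy.copy()

# 3) Train the RL policy on second training data
#    (one REINFORCE-flavoured update shown, purely illustrative).
obs, reward = rng.normal(size=4), 1.0
rl_policy += 0.01 * reward * obs

# 4) Train the value network to predict the long-term reward of a state
#    (regression against toy return targets).
returns = X @ rl_policy + rng.normal(scale=0.01, size=64)
value_net = np.linalg.lstsq(X, returns, rcond=None)[0]
```

The transfer in step 2 is the point of requiring identical architectures: the RL policy starts from the supervised solution rather than from scratch.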

13 citations


Patent
07 Feb 2018
TL;DR: A recurrent neural network with one logistic regression node per condition in a predetermined set processes a temporal sequence and, at each time step, produces a future condition score predicting the likelihood that the corresponding condition will be satisfied.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting the likelihood of conditions being satisfied using a recurrent neural network, are disclosed. One of the systems is configured to process a temporal sequence comprising a respective input at each of a plurality of time steps and includes one or more recurrent neural network layers and one or more logistic regression nodes, where each logistic regression node corresponds to a respective condition in a predetermined set of conditions. For each of the plurality of time steps, each logistic regression node receives the network internal state for the time step and processes it, in accordance with current values of a set of parameters of the logistic regression node, to generate a future condition score for the corresponding condition for the time step.
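The architecture in this patent is a recurrent layer whose per-step internal state feeds one sigmoid (logistic regression) node per condition. A minimal NumPy sketch with an Elman-style recurrence standing in for the patent's recurrent layers (sizes and weights are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
hidden, n_inputs, n_conditions = 8, 3, 2

# Simple recurrent layer (stand-in for the patent's recurrent layers).
W_x = rng.normal(size=(hidden, n_inputs)) * 0.1
W_h = rng.normal(size=(hidden, hidden)) * 0.1
# One logistic regression node per condition in the predetermined set.
W_cond = rng.normal(size=(n_conditions, hidden)) * 0.1

def condition_scores(sequence):
    h = np.zeros(hidden)
    scores = []
    for x in sequence:                    # respective input at each time step
        h = np.tanh(W_x @ x + W_h @ h)    # network internal state for the step
        scores.append(sigmoid(W_cond @ h))  # future condition score per condition
    return np.array(scores)

seq = rng.normal(size=(5, n_inputs))
scores = condition_scores(seq)  # one score in (0, 1) per step per condition
```

Each row of `scores` is the set of future condition scores for one time step, which is the per-step output the claims describe.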

4 citations