
Showing papers by "Ilya Sutskever published in 2018"


Proceedings Article
27 Sep 2018
TL;DR: This paper uses Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density and demonstrates the approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
Abstract: A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
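The core trick the abstract describes is Hutchinson's estimator: for a random vector v with E[vvᵀ] = I (e.g. Rademacher), E[vᵀAv] = tr(A), which lets the Jacobian trace be estimated from vector-Jacobian products alone. A minimal NumPy sketch on a toy dense matrix standing in for the Jacobian (the matrix and sample count are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.normal(size=(d, d))  # toy stand-in for a Jacobian

def hutchinson_trace(A, n_samples=10000, rng=rng):
    # E[v^T A v] = tr(A) when E[v v^T] = I; Rademacher v gives low variance.
    v = rng.choice([-1.0, 1.0], size=(n_samples, A.shape[0]))
    # Per-sample quadratic form v_b^T A v_b, averaged over samples.
    return np.mean(np.einsum('bi,ij,bj->b', v, A, v))

est = hutchinson_trace(A)
exact = np.trace(A)
```

In the paper's setting, A is never materialized: each vᵀAv is computed with a vector-Jacobian product via automatic differentiation, which is what makes the estimate scalable to unrestricted architectures.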

344 citations


Posted Content
TL;DR: In this paper, the authors use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
Abstract: A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.

106 citations


Posted Content
TL;DR: Two new meta reinforcement learning algorithms, E-MAML and E-$\text{RL}^2$, are introduced and evaluated on a novel environment called `Krazy World' and a set of maze environments; both deliver better performance on tasks where exploration is important.
Abstract: We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call `Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.

73 citations


Proceedings Article
27 Sep 2018
TL;DR: A system called GamePad is introduced that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant and addresses position evaluation and tactic prediction tasks, which arise naturally in tactic-based theorem proving.
Abstract: In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in tactic-based theorem proving.
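The two tasks named in the abstract are ordinary supervised problems over proof states: a regression (steps left) and a classification (next tactic). A hedged sketch with toy feature vectors standing in for embedded Coq proof states (all names and baselines here are illustrative, not GamePad's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_tactics = 200, 16, 5
states = rng.normal(size=(n, d))  # toy proof-state embeddings

# Position evaluation: regress the number of proof steps left from the state
# (least squares as a minimal baseline).
steps_left = np.abs(states @ rng.normal(size=d)).round()
w_eval = np.linalg.lstsq(states, steps_left, rcond=None)[0]

# Tactic prediction: classify which tactic is applied next
# (nearest-centroid classifier as a minimal baseline).
tactics = rng.integers(0, n_tactics, size=n)
centroids = np.array([states[tactics == k].mean(axis=0) for k in range(n_tactics)])

def predict_tactic(state):
    return int(np.argmin(((centroids - state) ** 2).sum(axis=1)))
```

The paper's baseline models replace these toy predictors with neural networks over structured proof-state representations, but the input/output contract of each task is the same.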

51 citations


Posted Content
TL;DR: In this paper, the authors introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.
Abstract: In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in tactic-based theorem proving.

42 citations


Proceedings Article
01 Jan 2018
TL;DR: E-MAML and E-$\text{RL}^2$ are evaluated on a new environment the authors call `Krazy World', a difficult high-dimensional gridworld, and on a set of maze environments; both deliver better performance than baseline algorithms.
Abstract: We interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. This interpretation leads to the development of two new meta-reinforcement learning algorithms: E-MAML and E-$\text{RL}^2$. Results are presented on a new environment we call `Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance than baseline algorithms on both tasks.

16 citations


Patent
01 Feb 2018
TL;DR: In this article, a value neural network is trained to generate a value score for the state of an environment that represents a predicted long-term reward resulting from the environment being in the state.
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.
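The claimed operations form a four-stage pipeline: supervised policy training, parameter transfer to an RL policy of identical architecture, RL training, and value-network regression against long-term reward. A hedged sketch using linear models as stand-ins for the networks (all data, update rules, and shapes here are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expert data: 4-feature observations with linear expert action scores.
X = rng.normal(size=(64, 4))
expert_scores = X @ np.array([1.0, -0.5, 0.3, 0.8])

# 1) Train the supervised-learning policy network on expert data
#    (least squares stands in for supervised training).
sl_policy = np.linalg.lstsq(X, expert_scores, rcond=None)[0]

# 2) Initialize the RL policy network, which has the same architecture,
#    to the trained values of the SL policy's parameters.
rl_policy = sl_policy.copy()

# 3) Train the RL policy on second training data
#    (one REINFORCE-flavoured update shown, purely illustrative).
obs, reward = rng.normal(size=4), 1.0
rl_policy += 0.01 * reward * obs

# 4) Train the value network to predict the long-term reward of a state
#    (regression against toy return targets).
returns = X @ rl_policy + rng.normal(scale=0.01, size=64)
value_net = np.linalg.lstsq(X, returns, rcond=None)[0]
```

The transfer in step 2 is the point of requiring identical architectures: the RL policy starts from the supervised solution rather than from scratch.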

13 citations


Patent
07 Feb 2018
TL;DR: A recurrent neural network with one logistic regression node per condition in a predetermined set processes a temporal sequence and, at each time step, produces a future condition score predicting the likelihood that the corresponding condition will be satisfied.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting the likelihood of conditions being satisfied using a recurrent neural network, are disclosed. One of the systems is configured to process a temporal sequence comprising a respective input at each of a plurality of time steps and includes one or more recurrent neural network layers and one or more logistic regression nodes, where each logistic regression node corresponds to a respective condition in a predetermined set of conditions. For each of the plurality of time steps, each logistic regression node receives the network internal state for the time step and processes it, in accordance with current values of a set of parameters of the logistic regression node, to generate a future condition score for the corresponding condition for the time step.
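The architecture in this patent is a recurrent layer whose per-step internal state feeds one sigmoid (logistic regression) node per condition. A minimal NumPy sketch with an Elman-style recurrence standing in for the patent's recurrent layers (sizes and weights are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
hidden, n_inputs, n_conditions = 8, 3, 2

# Simple recurrent layer (stand-in for the patent's recurrent layers).
W_x = rng.normal(size=(hidden, n_inputs)) * 0.1
W_h = rng.normal(size=(hidden, hidden)) * 0.1
# One logistic regression node per condition in the predetermined set.
W_cond = rng.normal(size=(n_conditions, hidden)) * 0.1

def condition_scores(sequence):
    h = np.zeros(hidden)
    scores = []
    for x in sequence:                    # respective input at each time step
        h = np.tanh(W_x @ x + W_h @ h)    # network internal state for the step
        scores.append(sigmoid(W_cond @ h))  # future condition score per condition
    return np.array(scores)

seq = rng.normal(size=(5, n_inputs))
scores = condition_scores(seq)  # one score in (0, 1) per step per condition
```

Each row of `scores` is the set of future condition scores for one time step, which is the per-step output the claims describe.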

4 citations