Top 15 papers published by Ilya Sutskever from OpenAI in 2017

Journal Article

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton²

24 May 2017 - Communications of the ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 classes; a recently developed regularization method called "dropout" proved very effective at reducing overfitting in the fully connected layers.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
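
The top-1 and top-5 error rates quoted in this abstract count an example as correct when its true label is among the 1 or 5 highest-scoring classes. The following is a minimal sketch of that metric; the function name `top_k_error` and the toy scores are illustrative assumptions, not taken from the paper or its code.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy scores for 3 examples over 4 classes (made-up numbers, not ImageNet results).
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.3, 0.1, 0.1, 0.5],
]
labels = [1, 2, 1]

top1 = top_k_error(scores, labels, k=1)  # 2 of 3 examples missed
top2 = top_k_error(scores, labels, k=2)  # 1 of 3 examples missed
```

With 1000 classes, the same function with `k=5` would give the top-5 error rate the abstract reports.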

33,301 citations

Posted Content

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Tim Salimans, Jonathan Ho, Xi Chen, Ilya Sutskever

10 Mar 2017 - arXiv: Machine Learning

TL;DR: This work explores the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black box optimization technique.

Abstract: We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
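
The idea of communicating only scalars can be illustrated with a small single-process sketch. It assumes that every worker can regenerate any perturbation from a shared integer seed, so only (seed, return) pairs would need to be exchanged. The names `perturbation`, `es_step`, and `fitness`, the toy quadratic objective, the mean-centering of returns, and all hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import random


def perturbation(seed, dim):
    """Regenerate a Gaussian perturbation from its seed alone (no vector is transmitted)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def es_step(theta, fitness, seeds, sigma=0.1, lr=0.05):
    """One ES update; each seed stands in for one worker reporting a scalar return."""
    returns = []
    for seed in seeds:
        eps = perturbation(seed, len(theta))
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        returns.append(fitness(candidate))

    # Subtract the mean return (a simple variance-reduction choice made for this sketch).
    mean_ret = sum(returns) / len(returns)

    # Rebuild every perturbation from its seed and weight it by the centered return.
    grad = [0.0] * len(theta)
    for seed, ret in zip(seeds, returns):
        eps = perturbation(seed, len(theta))
        for i, e in enumerate(eps):
            grad[i] += (ret - mean_ret) * e

    scale = lr / (len(seeds) * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]


def fitness(x):
    """Toy objective (hypothetical): negative squared distance to a fixed target."""
    target = [1.0, -2.0, 0.5]
    return -sum((a - b) ** 2 for a, b in zip(x, target))


theta = [0.0, 0.0, 0.0]
start = fitness(theta)
for step in range(300):
    # 50 simulated workers per step, each identified by a unique seed.
    seeds = [step * 50 + i for i in range(50)]
    theta = es_step(theta, fitness, seeds)
end = fitness(theta)
```

In a distributed setting, the two loops in `es_step` would run on different machines, with only the scalar returns crossing the network.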

1,218 citations

Proceedings Article

One-Shot Imitation Learning

Yan Duan¹, Marcin Andrychowicz², Bradly C. Stadie¹, Jonathan Ho³, Jonas Schneider³, Ilya Sutskever³, Pieter Abbeel¹, Wojciech Zaremba³

University of California, Berkeley¹, University of Warsaw², OpenAI³

21 Mar 2017

TL;DR: This paper proposes one-shot imitation learning, a meta-learning framework in which a robot learns from very few demonstrations of a given task and instantly generalizes to new situations of the same task, without requiring task-specific engineering.

Abstract: Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large (maybe infinite) set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. Our experiments show that the use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks.

531 citations

Posted Content

Learning To Generate Reviews and Discovering Sentiment

Alec Radford, Rafal Jozefowicz, Ilya Sutskever

05 Apr 2017 - arXiv: Learning

TL;DR: This work explores the properties of byte-level recurrent language models and finds a single unit that performs sentiment analysis; the learned representations achieve state of the art on the binary subset of the Stanford Sentiment Treebank.

Abstract: We explore the properties of byte-level recurrent language models. When given sufficient amounts of capacity, training data, and compute time, the representations learned by these models include disentangled features corresponding to high-level concepts. Specifically, we find a single unit which performs sentiment analysis. These representations, learned in an unsupervised manner, achieve state of the art on the binary subset of the Stanford Sentiment Treebank. They are also very data efficient. When using only a handful of labeled examples, our approach matches the performance of strong baselines trained on full datasets. We also demonstrate the sentiment unit has a direct influence on the generative process of the model. Simply fixing its value to be positive or negative generates samples with the corresponding positive or negative sentiment.

452 citations

Posted Content

Emergent Complexity via Multi-Agent Competition

Trapit Bansal¹, Jakub Pachocki², Szymon Sidor³, Ilya Sutskever⁴, Igor Mordatch²

University of Massachusetts Amherst¹, OpenAI², Massachusetts Institute of Technology³, Google⁴

10 Oct 2017 - arXiv: Artificial Intelligence

TL;DR: This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics and points out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty.

Abstract: Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself. We also point out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty. This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple. The skills include behaviors such as running, blocking, ducking, tackling, fooling opponents, kicking, and defending using both arms and legs. A highlight of the learned behaviors can be found here: this https URL

271 citations

Posted Content

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

Maruan Al-Shedivat¹, Trapit Bansal², Yuri Burda³, Ilya Sutskever⁴, Igor Mordatch³, Pieter Abbeel⁵

Carnegie Mellon University¹, University of Massachusetts Amherst², OpenAI³, Google⁴, University of California, Berkeley⁵

10 Oct 2017 - arXiv: Learning

TL;DR: In this article, a gradient-based meta-learning algorithm is proposed for continuous adaptation in dynamically changing and adversarial scenarios, and the authors design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies.

Abstract: Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.

209 citations

Proceedings Article

Third-Person Imitation Learning

Bradly C. Stadie¹, Pieter Abbeel¹, Ilya Sutskever²

University of California, Berkeley¹, OpenAI²

06 Mar 2017

TL;DR: The authors present a method for unsupervised third-person imitation learning, in which the agent learns from demonstrations of a teacher achieving the same goal from a different viewpoint, without being given a correspondence between teacher states and student states.

Abstract: Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, hitherto imitation learning methods tend to require that demonstrations are supplied in the first-person: the agent is provided with a sequence of states and a specification of the actions that it should have taken. While powerful, this kind of imitation learning is limited by the relatively hard problem of collecting first-person demonstrations. Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task, and accomplish the same task themselves. In this paper, we present a method for unsupervised third-person imitation learning. Here third-person refers to training an agent to correctly achieve a simple goal in a simple environment when it is provided a demonstration of a teacher achieving the same goal but from a different viewpoint; and unsupervised refers to the fact that the agent receives only these third-person demonstrations, and is not provided a correspondence between teacher states and student states. Our method's primary insight is that recent advances from domain confusion can be utilized to yield domain agnostic features which are crucial during the training process. To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and an inverted pendulum domain.

135 citations

Posted Content

Third-Person Imitation Learning

Bradly C. Stadie¹, Pieter Abbeel¹, Ilya Sutskever²

University of California, Berkeley¹, OpenAI²

06 Mar 2017 - arXiv: Learning

TL;DR: The authors present a method for unsupervised third-person imitation learning, in which the agent learns from demonstrations of a teacher achieving the same goal from a different viewpoint, without being given a correspondence between teacher states and student states.

Abstract: Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, hitherto imitation learning methods tend to require that demonstrations are supplied in the first-person: the agent is provided with a sequence of states and a specification of the actions that it should have taken. While powerful, this kind of imitation learning is limited by the relatively hard problem of collecting first-person demonstrations. Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task, and accomplish the same task themselves. In this paper, we present a method for unsupervised third-person imitation learning. Here third-person refers to training an agent to correctly achieve a simple goal in a simple environment when it is provided a demonstration of a teacher achieving the same goal but from a different viewpoint; and unsupervised refers to the fact that the agent receives only these third-person demonstrations, and is not provided a correspondence between teacher states and student states. Our method's primary insight is that recent advances from domain confusion can be utilized to yield domain agnostic features which are crucial during the training process. To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and an inverted pendulum domain.

128 citations

Proceedings Article

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

Maruan Al-Shedivat¹, Trapit Bansal², Yuri Burda³, Ilya Sutskever⁴, Igor Mordatch³, Pieter Abbeel⁵

Carnegie Mellon University¹, University of Massachusetts Amherst², OpenAI³, Google⁴, University of California, Berkeley⁵

10 Oct 2017

TL;DR: A simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios is developed, and meta-learning is shown to enable significantly more efficient adaptation than reactive baselines in the few-shot regime.

Abstract: Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.

77 citations

Proceedings Article

Emergent Complexity via Multi-Agent Competition

Trapit Bansal¹, Jakub Pachocki², Szymon Sidor³, Ilya Sutskever⁴, Igor Mordatch²

University of Massachusetts Amherst¹, OpenAI², Massachusetts Institute of Technology³, Google⁴

10 Oct 2017

TL;DR: In this paper, the authors introduce several competitive multi-agent environments where agents compete in a 3D world with simulated physics, and the trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple.

Abstract: Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself. We also point out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty. This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple. The skills include behaviors such as running, blocking, ducking, tackling, fooling opponents, kicking, and defending using both arms and legs. A highlight of the learned behaviors can be found here: this https URL

53 citations

Posted Content

One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba

21 Mar 2017 - arXiv: Artificial Intelligence

TL;DR: This work proposes a meta-learning framework for one-shot imitation learning, in which robots learn from very few demonstrations of a given task and instantly generalize to new situations of the same task, without requiring task-specific engineering.

Abstract: Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. The use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks. Videos available at this https URL.

Proceedings Article

Learning online alignments with continuous rewards policy gradient

Yuping Luo¹, Chung-Cheng Chiu², Navdeep Jaitly², Ilya Sutskever

Tsinghua University¹, Google²

05 Mar 2017

TL;DR: This work presents a new method for solving sequence-to-sequence problems using hard online alignments instead of soft offline alignments, which achieves encouraging performance on TIMIT and Wall Street Journal speech recognition datasets.

Abstract: Sequence-to-sequence models with soft attention had significant success in machine translation, speech recognition, and question answering. Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition. To address this problem, we present a new method for solving sequence-to-sequence problems using hard online alignments instead of soft offline alignments. The online alignments model is able to start producing outputs without the need to first process the entire input sequence. A highly accurate online sequence-to-sequence model is useful because it can be used to build an accurate voice-based instantaneous translator. Our model uses hard binary stochastic decisions to select the timesteps at which outputs will be produced. The model is trained to produce these stochastic decisions using a standard policy gradient method. In our experiments, we show that this model achieves encouraging performance on TIMIT and Wall Street Journal (WSJ) speech recognition datasets.
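
Training a hard binary stochastic decision with a policy gradient can be illustrated with a one-parameter toy. The sketch below applies the standard score-function (REINFORCE) update to a single Bernoulli "emit or wait" decision. The reward function, the learning rate, and the name `train_emit_probability` are hypothetical; the paper's model is a recurrent network making such a decision at every timestep, which this sketch does not attempt to reproduce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_emit_probability(reward_fn, steps=500, lr=0.5, seed=0):
    """Learn the logit of one Bernoulli emit decision with a score-function gradient."""
    rng = random.Random(seed)
    logit = 0.0
    for _ in range(steps):
        p = sigmoid(logit)
        # Hard decision: sample 1 (emit) or 0 (wait) rather than using p directly.
        emit = 1 if rng.random() < p else 0
        reward = reward_fn(emit)
        # For a Bernoulli with sigmoid parameterization,
        # d/dlogit log P(emit) = emit - p.
        logit += lr * reward * (emit - p)
    return sigmoid(logit)


# Hypothetical reward: emitting is the right call in this toy setting.
p_emit = train_emit_probability(lambda emit: 1.0 if emit == 1 else 0.0)
```

Because the sampled decision is not differentiable, the update weights the gradient of the log-probability by the observed reward instead of backpropagating through the sample.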

Patent

Reinforcement learning using advantage estimates

Shixiang Gu¹, Timothy P. Lillicrap¹, Ilya Sutskever¹, Sergey Levine¹

Google¹

09 Feb 2017

Posted Content

An online sequence-to-sequence model for noisy speech recognition

Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

16 Jun 2017 - arXiv: Computation and Language

TL;DR: An improvement to online sequence-to-sequence model training, and its application to noisy settings with mixed speech from two speakers are highlighted.

Abstract: Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative - discriminative models called Sequence-to-Sequence models, that can almost match the accuracy of state of the art generative models. While these models are easy to train as they can be trained end-to-end in a single step, they have a practical limitation that they can only be used for offline recognition. This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition. To address this problem, online sequence-to-sequence models were recently introduced. These models are able to start producing outputs as data arrives, and the model feels confident enough to output partial transcripts. These models, like sequence-to-sequence are causal - the output produced by the model until any time, $t$, affects the features that are computed subsequently. This makes the model inherently more powerful than generative models that are unable to change features that are computed from the data. This paper highlights two main contributions - an improvement to online sequence-to-sequence model training, and its application to noisy settings with mixed speech from two speakers.

Patent

Recurrent neural networks for online sequence generation

Chung-Cheng Chiu¹, Navdeep Jaitly¹, Ilya Sutskever¹, Yuping Luo¹

Google¹

19 May 2017

Showing papers by "Ilya Sutskever" published in 2017