Ilya Sutskever
Researcher at OpenAI
Publications - 137
Citations - 294374
Ilya Sutskever is an academic researcher at OpenAI. The author has contributed to research on topics including artificial neural networks and reinforcement learning. The author has an h-index of 75 and has co-authored 131 publications receiving 235539 citations. Previous affiliations of Ilya Sutskever include Google and the University of Toronto.
Papers
Proceedings Article
One-Shot Imitation Learning
Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
TL;DR: This paper proposes one-shot imitation learning, a meta-learning framework for learning from very few demonstrations of any given task and instantly generalizing to new situations of the same task, without requiring task-specific engineering.
Proceedings Article
Multi-task Sequence to Sequence Learning
TL;DR: The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.
Posted Content
Learning To Generate Reviews and Discovering Sentiment
TL;DR: This paper explores the properties of byte-level recurrent language models and finds a single unit that performs sentiment analysis; the learned representations achieve state of the art on the binary subset of the Stanford Sentiment Treebank.
Proceedings Article
The Recurrent Temporal Restricted Boltzmann Machine
TL;DR: The Recurrent TRBM is introduced, which is a very slight modification of the TRBM for which exact inference is very easy and exact gradient learning is almost tractable.
Proceedings Article
Continuous deep Q-learning with model-based acceleration
TL;DR: This paper derives a continuous variant of the Q-learning algorithm, which it calls normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.