Filip Wolski
Researcher at OpenAI
Publications - 7
Citations - 12039
Filip Wolski is an academic researcher at OpenAI. The author has contributed to research on reinforcement learning and hindsight experience replay, has an h-index of 6, and has co-authored 7 publications receiving 7,372 citations.
Papers
Posted Content
Proximal Policy Optimization Algorithms
TL;DR: A new family of policy gradient methods for reinforcement learning is proposed, which alternates between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
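The surrogate objective the TL;DR refers to is PPO's clipped per-timestep term, which can be sketched in a few lines (`eps=0.2` matches the paper's default; `ratio` is the new-to-old policy probability ratio and `advantage` an estimate such as GAE — the helper name here is illustrative, not the paper's code):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: take the pessimistic minimum of the
    unclipped and clipped probability-ratio terms, so large policy
    updates are not rewarded beyond the clipping range."""
    unclipped = ratio * advantage
    clipped = max(1.0 - eps, min(1.0 + eps, ratio)) * advantage
    return min(unclipped, clipped)
```

In training, the mean of this quantity over a batch of sampled timesteps is maximized by stochastic gradient ascent, alternating with fresh environment rollouts under the updated policy.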
Proceedings Article
Hindsight Experience Replay
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba +9 more
TL;DR: A novel technique is presented which allows sample-efficient learning from rewards that are sparse and binary, and therefore avoids the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
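The core idea — replaying each transition with goals that were actually achieved later in the episode, so failed episodes still produce successful training signal — can be sketched as follows (a minimal sketch assuming the "future" replay strategy; the function names and transition layout are illustrative, not the paper's code):

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling: for each transition, also store copies
    whose goal is a state achieved later in the episode, so a sparse
    binary reward_fn yields success signal even on failed episodes."""
    # episode: list of (state, action, next_state, goal) tuples
    relabeled = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        # original transition, scored against the original goal
        relabeled.append((s, a, s_next, goal, reward_fn(s_next, goal)))
        # k hindsight copies, scored against goals achieved afterwards
        future = episode[t:]
        for _ in range(min(k, len(future))):
            _, _, achieved, _ = random.choice(future)
            relabeled.append((s, a, s_next, achieved, reward_fn(s_next, achieved)))
    return relabeled
```

The relabeled transitions can then be fed to any off-policy algorithm's replay buffer, which is why the technique composes with methods like DQN or DDPG.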
Posted Content
Dota 2 with Large Scale Deep Reinforcement Learning
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemyslaw Debiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Christopher Hesse, Rafal Jozefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Ponde de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang +24 more
TL;DR: By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
Posted Content
Hindsight Experience Replay
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba +9 more
TL;DR: In this paper, a technique called hindsight experience replay is proposed to learn from rewards that are sparse and binary, avoiding the need for complicated reward engineering; it can be combined with an arbitrary off-policy algorithm and may be seen as a form of implicit curriculum.
Proceedings Article
Evolved Policy Gradients
Rein Houthooft, Richard Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel +6 more
TL;DR: Empirical results show that the evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments than an off-the-shelf policy gradient method, that its learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular metalearning algorithms.
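EPG evolves the parameters of a learned loss function with an evolution-strategies outer loop: each perturbation of the loss parameters is scored by the return an agent achieves after training with that loss. A toy sketch of such an outer loop (the quadratic `fitness` in the test is a stand-in for "final return after inner-loop training"; all names and hyperparameters here are illustrative, not the paper's):

```python
import numpy as np

def es_outer_loop(fitness, dim, iters=200, pop=20, sigma=0.1, lr=0.05, seed=0):
    """Evolution-strategies outer loop: perturb the loss parameters
    with Gaussian noise, score each perturbation, and move the
    parameters along the fitness-weighted average of the noise."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(dim)  # parameters of the learned loss
    for _ in range(iters):
        noise = rng.standard_normal((pop, dim))
        scores = np.array([fitness(phi + sigma * n) for n in noise])
        # rank-free score normalization stabilizes the update scale
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)
        phi += lr / (pop * sigma) * noise.T @ scores
    return phi
```

In the actual method, `fitness` is expensive (it runs a full inner-loop training), which is why the population evaluations are parallelized across workers.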