Pieter Abbeel

Researcher at University of California, Berkeley

Publications: 672
Citations: 99,807

Pieter Abbeel is an academic researcher at the University of California, Berkeley. His research focuses on reinforcement learning and computer science. He has an h-index of 126 and has co-authored 589 publications receiving 70,911 citations. His previous affiliations include Facebook and the University of California.

Papers
Proceedings Article

Model-agnostic meta-learning for fast adaptation of deep networks

TL;DR: Proposes an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent, and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
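A minimal sketch of the MAML inner/outer loop in PyTorch on a toy regression model; the functional linear model, task format, and step sizes below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal MAML sketch (illustrative): one inner-loop gradient step per
# task, then an outer update on the post-adaptation (query-set) loss.
import torch

def forward(params, x):
    # Tiny linear model written functionally so we can evaluate it
    # with adapted parameters inside the inner loop.
    w, b = params
    return x @ w + b

def maml_step(params, tasks, inner_lr=0.01, meta_lr=0.001):
    """One meta-update over a batch of tasks; each task is a tuple
    (x_support, y_support, x_query, y_query)."""
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        # Inner loop: one gradient step on the support set, keeping the
        # graph so gradients flow back to the shared initialization.
        loss = ((forward(params, x_s) - y_s) ** 2).mean()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer objective: loss of the adapted parameters on the query set.
        meta_loss = meta_loss + ((forward(adapted, x_q) - y_q) ** 2).mean()
    # Outer loop: update the initialization (second-order through adaptation).
    meta_grads = torch.autograd.grad(meta_loss, params)
    return [(p - meta_lr * g).detach().requires_grad_(True)
            for p, g in zip(params, meta_grads)]
```

For a toy usage, `params = [torch.randn(3, 1, requires_grad=True), torch.zeros(1, requires_grad=True)]` and each task supplies support/query tensors of matching shapes.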
Proceedings Article

Trust Region Policy Optimization

TL;DR: Describes a procedure for optimizing control policies with guaranteed monotonic improvement; making several approximations to this theoretically justified scheme yields a practical algorithm, called Trust Region Policy Optimization (TRPO).
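A sketch of the constrained surrogate objective at TRPO's core, under illustrative names: maximize the importance-weighted advantage subject to a bound on the KL divergence from the old policy. The full method solves this constrained problem with conjugate gradient plus a backtracking line search; the KL term below is a simple sample-based estimate.

```python
# Sketch of TRPO's core quantities (illustrative, not the full algorithm).
import torch

def surrogate_loss(new_logp, old_logp, advantages):
    # E[ pi_new(a|s) / pi_old(a|s) * A(s, a) ], estimated from samples
    # drawn under the old policy.
    ratio = torch.exp(new_logp - old_logp)
    return (ratio * advantages).mean()

def kl_within_trust_region(old_logp, new_logp, max_kl=0.01):
    # Sample-based estimate of KL(pi_old || pi_new) <= delta.
    return (old_logp - new_logp).mean() <= max_kl
```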
Posted Content

Trust Region Policy Optimization

TL;DR: Introduces Trust Region Policy Optimization (TRPO), an iterative procedure for optimizing policies with guaranteed monotonic improvement; it is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks.
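Because TRPO is closely related to natural policy gradient methods, its search direction solves F x = g, where F is the Fisher information matrix and g the policy gradient. A sketch of the conjugate-gradient solver typically used so that F never has to be formed explicitly; the `fvp` Fisher-vector-product callable is an assumed helper.

```python
# Conjugate gradient for the natural gradient direction (illustrative).
import torch

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Solve F x = g given only Fisher-vector products fvp(v) = F @ v."""
    x = torch.zeros_like(g)
    r = g.clone()   # residual g - F x (x starts at zero)
    p = g.clone()   # search direction
    rs_old = r.dot(r)
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / p.dot(Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r.dot(r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```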
Posted Content

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

TL;DR: Proposes an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, in which the actor aims to maximize expected reward while also maximizing entropy.
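A sketch of the maximum-entropy actor objective at the heart of this framework: the actor minimizes E[alpha * log pi(a|s) - Q(s, a)], trading off expected value against entropy via the temperature alpha. The `policy.rsample` interface and the names below are assumptions for illustration, not the paper's code.

```python
# Entropy-regularized actor loss (illustrative sketch).
import torch

def actor_loss(policy, critic, states, alpha=0.2):
    # Reparameterized sampling (assumed interface) so gradients flow
    # through the sampled action into the policy parameters.
    actions, log_probs = policy.rsample(states)
    q_values = critic(states, actions)
    # Minimizing this maximizes Q while keeping the policy stochastic.
    return (alpha * log_probs - q_values).mean()
```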
Proceedings Article

Apprenticeship learning via inverse reinforcement learning

TL;DR: Models the expert as maximizing a reward function that is expressible as a linear combination of known features, and gives an algorithm that learns the task demonstrated by the expert, using inverse reinforcement learning to recover the unknown reward function.
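A sketch of the projection variant of this scheme, assuming a `solve_mdp` helper that returns the feature expectations of an optimal policy for reward r(s) = w . phi(s); the initialization and stopping details are illustrative. The loop alternates between picking reward weights that separate the expert from the learner and running RL under those weights.

```python
# Apprenticeship learning via IRL, projection method (illustrative).
import numpy as np

def apprenticeship_learning(mu_expert, solve_mdp, eps=1e-3, max_iters=100):
    # Initialize with the feature expectations of an arbitrary policy.
    mu = solve_mdp(np.random.randn(*mu_expert.shape))
    mu_bar = mu.copy()
    for _ in range(max_iters):
        w = mu_expert - mu_bar            # candidate reward weights
        if np.linalg.norm(w) <= eps:      # expert matched to within eps
            break
        mu = solve_mdp(w)                 # RL step under reward w . phi
        # Projection step: move mu_bar toward the new feature expectations.
        d = mu - mu_bar
        mu_bar = mu_bar + (d @ w) / (d @ d) * d
    return w
```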