
Garrett Thomas

Researcher at Stanford University

Publications - 15
Citations - 1254

Garrett Thomas is an academic researcher at Stanford University. The author has contributed to research on topics including Reinforcement learning and Task (project management). The author has an h-index of 9 and has co-authored 12 publications receiving 741 citations. Previous affiliations of Garrett Thomas include the University of California, Berkeley.

Papers
Proceedings Article

MOPO: Model-based Offline Policy Optimization

TL;DR: Model-based offline policy optimization (MOPO), as discussed by the authors, modifies existing model-based RL methods by training the policy on rewards artificially penalized by the uncertainty of the learned dynamics, and shows theoretically that the algorithm maximizes a lower bound of the policy's return under the true MDP.
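
A minimal sketch of the uncertainty-penalized reward idea described above, assuming an ensemble of learned dynamics models whose disagreement stands in for the uncertainty term; the function and argument names are illustrative, not taken from the authors' released code.

```python
# Illustrative sketch only: penalize the model-predicted reward by an
# uncertainty estimate u(s, a) derived from ensemble disagreement.
import numpy as np

def penalized_reward(reward, ensemble_next_state_preds, lam=1.0):
    """Return r(s, a) - lambda * u(s, a).

    ensemble_next_state_preds: array of shape (ensemble_size, state_dim)
    holding each dynamics model's predicted next state for the same (s, a).
    The per-dimension standard deviation across the ensemble is collapsed
    to a scalar with a Euclidean norm and used as the uncertainty u(s, a).
    """
    uncertainty = np.linalg.norm(np.std(ensemble_next_state_preds, axis=0))
    return reward - lam * uncertainty
```
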
Posted Content

Value Iteration Networks

TL;DR: The Value Iteration Network (VIN) as discussed by the authors is a differentiable approximation of the value iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation.
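
A rough sketch of the value-iteration-as-convolution idea summarized above, on a 2D grid world; the per-action transition kernels below are hand-supplied placeholders, whereas the network in the paper learns the corresponding parameters end-to-end.

```python
# Illustrative sketch only: each Bellman backup is a convolution of the current
# value map with a per-action transition kernel, followed by a max over actions.
import numpy as np
from scipy.signal import convolve2d

def value_iteration_conv(reward_map, action_kernels, gamma=0.99, iters=40):
    """reward_map: (H, W) grid of rewards; action_kernels: list of (3, 3)
    kernels, one per action, approximating that action's transition dynamics."""
    value = np.zeros(reward_map.shape)
    for _ in range(iters):
        q_values = np.stack([
            reward_map + gamma * convolve2d(value, k, mode="same", boundary="fill")
            for k in action_kernels
        ])
        value = q_values.max(axis=0)  # max over actions, as in value iteration
    return value
```
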
Proceedings Article

Value iteration networks

TL;DR: The value iteration network (VIN), as mentioned in this paper, is a differentiable approximation of the value iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation.
Proceedings ArticleDOI

Learning Robotic Assembly from CAD

TL;DR: This work exploits the fact that in modern assembly domains, geometric information about the task is readily available in CAD design files, and proposes a neural network architecture that learns to track the resulting motion plan, thereby generalizing the assembly controller to changes in object positions.
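
A highly simplified sketch of a plan-tracking controller in the spirit of the summary above; the architecture, dimensions, and names are assumptions for illustration, not the authors' actual network.

```python
# Illustrative sketch only: a small policy network that conditions on the current
# robot state plus a window of upcoming waypoints from a precomputed motion plan
# (which, per the paper's setting, can be derived from CAD geometry).
import torch
import torch.nn as nn

class PlanTrackingPolicy(nn.Module):
    def __init__(self, state_dim, waypoint_dim, horizon, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + horizon * waypoint_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state, waypoints):
        # state: (batch, state_dim); waypoints: (batch, horizon, waypoint_dim)
        flat_plan = waypoints.flatten(start_dim=1)
        return self.net(torch.cat([state, flat_plan], dim=-1))
```
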
Posted Content

MOPO: Model-based Offline Policy Optimization

TL;DR: A new model-based offline RL algorithm is proposed that applies the variance of a Lipschitz-regularized model as a penalty to the reward function; this algorithm is found to outperform both standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on existing offline RL benchmarks, as well as on two challenging continuous control tasks.