Author

Scott Fujimoto

Bio: Scott Fujimoto is an academic researcher from McGill University who has contributed to research on reinforcement learning and computer science. The author has an h-index of 9 and has co-authored 16 publications receiving 2,232 citations.

Papers

Posted Content•

Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto¹, Herke van Hoof², David Meger¹•Institutions (2)

McGill University¹, University of Amsterdam²

26 Feb 2018-arXiv: Artificial Intelligence

TL;DR: This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.

Abstract: In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
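
The two mechanisms named in the abstract, taking the minimum over a pair of critics and delaying policy updates, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the discount factor `gamma`, and the `policy_delay` value are assumptions chosen for the example, and scalar values stand in for network outputs.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bellman target built from the smaller of two critic estimates.

    q1_next and q2_next are the two critics' value estimates for the next
    state-action pair. gamma is an assumed discount factor.
    """
    # Taking the minimum limits overestimation from either critic.
    min_q = min(q1_next, q2_next)
    # No bootstrapping past a terminal transition.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min_q


def should_update_actor(step, policy_delay=2):
    """Return True on steps where the (delayed) policy update would run.

    policy_delay is a hypothetical setting: the critic is updated every
    step, the actor only every policy_delay steps.
    """
    return step % policy_delay == 0
```

For example, with `reward=1.0`, `gamma=0.5`, and critic estimates of 10.0 and 8.0, the target uses the lower estimate and evaluates to 5.0.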

1,968 citations

Proceedings Article•

Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto¹, Herke van Hoof², David Meger¹•Institutions (2)

McGill University¹, University of Amsterdam²

03 Jul 2018

TL;DR: In this paper, the authors show that the overestimation bias persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.

954 citations

Proceedings Article•

Off-Policy Deep Reinforcement Learning without Exploration

Scott Fujimoto¹, David Meger¹, Doina Precup¹•Institutions (1)

McGill University¹

24 May 2019

TL;DR: This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.

Abstract: Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data. We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks.

853 citations

Posted Content•

Benchmarking Batch Deep Reinforcement Learning Algorithms.

Scott Fujimoto¹, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau•Institutions (1)

McGill University¹

03 Oct 2019-arXiv: Learning

TL;DR: This paper benchmarks the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy, and finds that many of these algorithms underperform DQN trained online with the same amount of data.

Abstract: Widely-used deep reinforcement learning algorithms have been shown to fail in the batch setting--learning from a fixed data set without interaction with the environment. Following this result, there have been several papers showing reasonable performances under a variety of environments and batch settings. In this paper, we benchmark the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy. We find that under these conditions, many of these algorithms underperform DQN trained online with the same amount of data, as well as the partially-trained behavioral policy. To introduce a strong baseline, we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting, and show it outperforms all existing algorithms at this task.
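
One way to picture a batch-constrained rule in a discrete-action setting is sketched below. It is a hedged illustration, not the algorithm as published: the function name, the `threshold` parameter, and the use of relative behavioral probabilities to decide which actions are allowed are all assumptions made for the example.

```python
def constrained_greedy_action(q_values, behavior_probs, threshold=0.3):
    """Pick the highest-valued action among those the batch data supports.

    q_values: estimated value of each discrete action in the current state.
    behavior_probs: estimated probability that the data-collecting policy
        would take each action (assumed to sum to 1).
    threshold: hypothetical cutoff on probability relative to the most
        likely action; actions below it are excluded.
    """
    max_prob = max(behavior_probs)
    # Keep only actions that are reasonably likely under the batch data.
    allowed = [a for a, p in enumerate(behavior_probs)
               if p / max_prob > threshold]
    # Act greedily with respect to Q, but only within the allowed set.
    return max(allowed, key=lambda a: q_values[a])
```

With `q_values=[1.0, 5.0, 3.0]` and `behavior_probs=[0.6, 0.01, 0.39]`, the highest-valued action (index 1) is rarely seen in the data and is filtered out, so the rule selects index 2.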

150 citations

Proceedings Article•

GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects

Edward J. Smith¹, Scott Fujimoto¹, Adriana Romero¹, David Meger¹•Institutions (1)

McGill University¹

31 Jan 2019

TL;DR: This paper proposes a graph convolutional update that preserves vertex information and an adaptive splitting heuristic that allows detail to emerge, evaluated on 3D object reconstruction from images with the ShapeNet dataset.

Abstract: Mesh models are a promising approach for encoding the structure of 3D objects. Current mesh reconstruction systems predict uniformly distributed vertex locations of a predetermined graph through a series of graph convolutions, leading to compromises with respect to performance or resolution. In this paper, we argue that the graph representation of geometric objects allows for additional structure, which should be leveraged for enhanced reconstruction. Thus, we propose a system which properly benefits from the advantages of the geometric structure of graph encoded objects by introducing (1) a graph convolutional update preserving vertex information; (2) an adaptive splitting heuristic allowing detail to emerge; and (3) a training objective operating both on the local surfaces defined by vertices as well as the global structure defined by the mesh. Our proposed method is evaluated on the task of 3D object reconstruction from images with the ShapeNet dataset, where we demonstrate state of the art performance, both visually and numerically, while having far smaller space requirements by generating adaptive meshes.

49 citations

Cited by

On robust estimation of the location parameter

Frederick R. Forst

01 Jan 1980

3,652 citations

Proceedings Article•

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja¹, Aurick Zhou¹, Pieter Abbeel¹, Sergey Levine¹•Institutions (1)

University of California, Berkeley¹

03 Jul 2018

TL;DR: This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.

Abstract: Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
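
The idea of maximizing expected reward while also maximizing entropy can be illustrated with a small numerical sketch. The abstract concerns continuous control; the discrete action distribution here is used only to keep the example short, and the function names and the `temperature` weight are assumptions for illustration, not the paper's formulation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(probs, q_values, temperature=0.2):
    """Expected value of the policy plus an entropy bonus.

    probs: the policy's probability for each action in one state.
    q_values: estimated value of each action.
    temperature: assumed weight trading off reward against randomness.
    """
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + temperature * entropy(probs)
```

Under this objective a deterministic policy earns no entropy bonus, so a more random policy can score higher when the action values are close and the temperature is large enough.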

1,500 citations

Posted Content•

Occupancy Networks: Learning 3D Reconstruction in Function Space

Lars Mescheder¹, Michael Oechsle¹, Michael Niemeyer¹, Sebastian Nowozin², Andreas Geiger¹•Institutions (2)

University of Tübingen¹, Google²

10 Dec 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction methods that encodes a description of the 3D output at infinite resolution without excessive memory footprint, and validates that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.

Abstract: With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
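
Representing a surface as the decision boundary of a classifier can be pictured with a toy example. In the sketch below a hand-written function for a sphere stands in for the trained neural network; the function names, `radius`, `sharpness`, and the 0.5 threshold are all assumptions for illustration and do not come from the paper.

```python
import math


def toy_occupancy(x, y, z, radius=1.0, sharpness=10.0):
    """Return a value in (0, 1): the 'probability' that a point is inside.

    A sigmoid of the signed distance to a sphere of the given radius,
    used here as a stand-in for a learned occupancy classifier.
    """
    dist = math.sqrt(x * x + y * y + z * z)
    return 1.0 / (1.0 + math.exp(-sharpness * (radius - dist)))


def is_inside(x, y, z, threshold=0.5):
    """Classify a point; the surface is where occupancy equals threshold."""
    return toy_occupancy(x, y, z) > threshold
```

Because the function can be queried at any continuous coordinate, the shape is not tied to a fixed voxel grid: a finer reconstruction only requires evaluating more points.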

1,212 citations

Posted Content•

Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

13 Dec 2018-arXiv: Learning

TL;DR: Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample-efficiency and asymptotic performance.

Abstract: Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. In this framework, the actor aims to simultaneously maximize expected return and entropy. That is, to succeed at the task while acting as randomly as possible. We extend SAC to incorporate a number of modifications that accelerate training and improve stability with respect to the hyperparameters, including a constrained formulation that automatically tunes the temperature hyperparameter. We systematically evaluate SAC on a range of benchmark tasks, as well as real-world challenging tasks such as locomotion for a quadrupedal robot and robotic manipulation with a dexterous hand. With these improvements, SAC achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample-efficiency and asymptotic performance. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving similar performance across different random seeds. These results suggest that SAC is a promising candidate for learning in real-world robotics tasks.

1,209 citations

Proceedings Article•DOI•

Occupancy Networks: Learning 3D Reconstruction in Function Space

Lars Mescheder¹, Michael Oechsle¹, Michael Niemeyer¹, Sebastian Nowozin², Andreas Geiger¹•Institutions (2)

University of Tübingen¹, Google²

15 Jun 2019

TL;DR: In this paper, the authors propose Occupancy Networks, a representation for learning-based 3D reconstruction methods that implicitly represents the 3D surface as the continuous decision boundary of a deep neural network classifier.

1,192 citations
