Home
/
Authors
/
Simon Schmitt

Author

Simon Schmitt

Bio: Simon Schmitt is an academic researcher from Google. The author has contributed to research in topics: Reinforcement learning & Artificial neural network. The author has an hindex of 8, co-authored 14 publications receiving 920 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

[...]

Julian Schrittwieser, Ioannis Antonoglou¹, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel¹, Timothy P. Lillicrap, David Silver¹ - Show less +8 more•Institutions (1)

University College London¹

19 Nov 2019-arXiv: Learning

TL;DR: The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

...read moreread less

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.

...read moreread less

855 citations

Journal Article•DOI•

Mastering Atari, Go, chess and shogi by planning with a learned model

[...]

University College London¹

23 Dec 2020-Nature

TL;DR: MuZero as discussed by the authors is a reinforcement learning algorithm that combines a tree-based search with a learned model to achieve state-of-the-art performance in high-performance planning and visually complex domains.

...read moreread less

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess1 and Go2, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games3—the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled4—the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi—canonical environments for high-performance planning—the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm5 that was supplied with the rules of the game. A reinforcement-learning algorithm that combines a tree-based search with a learned model achieves superhuman performance in high-performance planning and visually complex domains, without any knowledge of their underlying dynamics.

...read moreread less

679 citations

Journal Article•DOI•

Multi-task Deep Reinforcement Learning with PopArt

[...]

Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Marian Czarnecki, Simon Schmitt, Hado van Hasselt - Show less +2 more

17 Jul 2019

TL;DR: This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.

...read moreread less

Abstract: The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at the time, each new task requiring to train a brand new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequentialdecision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state of the art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state of the art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.

...read moreread less

244 citations

Posted Content•

Kickstarting Deep Reinforcement Learning

[...]

Simon Schmitt, J. J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech Marian Czarnecki, Joel Z. Leibo, Heinrich Küttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami - Show less +7 more

10 Mar 2018-arXiv: Learning

TL;DR: It is shown that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design.

...read moreread less

Abstract: We present a method for using previously-trained 'teacher' agents to kickstart the training of a new 'student' agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design. We also show that the same kickstarting pipeline can allow a single student agent to leverage multiple 'expert' teachers which specialize on individual tasks. In this setting kickstarting yields surprisingly large gains, with the kickstarted agent matching the performance of an agent trained from scratch in almost 10x fewer steps, and surpassing its final performance by 42 percent. Kickstarting is conceptually simple and can easily be incorporated into reinforcement learning experiments.

...read moreread less

96 citations

Proceedings Article•

Off-Policy Actor-Critic with Shared Experience Replay

[...]

Simon Schmitt, Matteo Hessel, Karen Simonyan

12 Jul 2020

TL;DR: This work analyzes the bias-variance tradeoffs in V- Trace, a form of importance sampling for actor-critic methods, and argues for mixing experience sampled from replay with on-policy experience, and proposes a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable.

...read moreread less

Abstract: We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability of off-policy learning where agents learn from other agents behaviour. We employ those insights to accelerate hyper-parameter sweeps in which all participating agents run concurrently and share their experience via a common replay module. To this end we analyze the bias-variance tradeoffs in V-trace, a form of importance sampling for actor-critic methods. Based on our analysis, we then argue for mixing experience sampled from replay with on-policy experience, and propose a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable. We provide extensive empirical validation of the proposed solution. We further show the benefits of this setup by demonstrating state-of-the-art data efficiency on Atari among agents trained up until 200M environment frames.

...read moreread less

39 citations

1
2
3
4
…

Cited by

PDF

Open Access

More filters

Posted Content•

A Survey on Multi-Task Learning

[...]

Yu Zhang¹, Qiang Yang¹•Institutions (1)

Hong Kong University of Science and Technology¹

25 Jul 2017-arXiv: Learning

TL;DR: Multi-task learning (MTL) as mentioned in this paper is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks.

...read moreread less

Abstract: Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey for MTL from the perspective of algorithmic modeling, applications and theoretical analyses. For algorithmic modeling, we give a definition of MTL and then classify different MTL algorithms into five categories, including feature learning approach, low-rank approach, task clustering approach, task relation learning approach and decomposition approach as well as discussing the characteristics of each approach. In order to improve the performance of learning tasks further, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, we review online, parallel and distributed MTL models as well as dimensionality reduction and feature hashing to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance and we review representative works in this paper. Finally, we present theoretical analyses and discuss several future directions for MTL.

...read moreread less

1,202 citations

Journal Article•DOI•

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

[...]

University College London¹

19 Nov 2019-arXiv: Learning

...read moreread less

855 citations

Journal Article•DOI•

Mastering Atari, Go, chess and shogi by planning with a learned model

[...]

University College London¹

23 Dec 2020-Nature

...read moreread less

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess1 and Go2, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games3—the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled4—the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi—canonical environments for high-performance planning—the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm5 that was supplied with the rules of the game. A reinforcement-learning algorithm that combines a tree-based search with a learned model achieves superhuman performance in high-performance planning and visually complex domains, without any knowledge of their underlying dynamics.

...read moreread less

679 citations

Journal Article•DOI•

Toward Causal Representation Learning

[...]

Bernhard Schölkopf¹, Francesco Locatello¹, Stefan Bauer¹, Nan Rosemary Ke, Nal Kalchbrenner², Anirudh Goyal, Yoshua Bengio - Show less +3 more•Institutions (2)

Max Planck Society¹, Google²

26 Feb 2021

TL;DR: The authors reviewed fundamental concepts of causal inference and related them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research.

...read moreread less

Abstract: The two fields of machine learning and graphical causality arose and are developed separately. However, there is, now, cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

...read moreread less

601 citations

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

[...]

Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, Sergey Levine - Show less +3 more

24 Oct 2019

TL;DR: An open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks is proposed to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks.

...read moreread less

487 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse