Home
/
Authors
/
Todd Hester

Author

Todd Hester

Other affiliations: Spaulding Rehabilitation Hospital, Amazon.com, University of Texas at Austin ...read more

Bio: Todd Hester is an academic researcher from Google. The author has contributed to research in topics: Reinforcement learning & Robot. The author has an hindex of 26, co-authored 65 publications receiving 3176 citations. Previous affiliations of Todd Hester include Spaulding Rehabilitation Hospital & Amazon.com.

Topics: Reinforcement learning, Robot, Markov decision process, Setpoint, Robot learning ...read more

Papers published on a yearly basis

2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006

Papers

PDF

Open Access

More filters

Posted Content•

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

[...]

Matej Vecerík, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller - Show less +6 more

27 Jul 2017-arXiv: Artificial Intelligence

TL;DR: A general and model-free approach for Reinforcement Learning on real robotics with sparse rewards built upon the Deep Deterministic Policy Gradient algorithm to use demonstrations that out-performs DDPG, and does not require engineered rewards.

...read moreread less

Abstract: We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards, and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations out-performs DDPG, and does not require engineered rewards. Finally, we demonstrate the method on a real robotics task consisting of inserting a clip (flexible object) into a rigid object.

...read moreread less

514 citations

Posted Content•

Deep Q-learning from Demonstrations

[...]

Todd Hester, Matej Vecerík, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John P. Agapiou, Joel Z. Leibo, Audrunas Gruslys - Show less +10 more

12 Apr 2017-arXiv: Artificial Intelligence

TL;DR: Deep Q-learning from Demonstrations (DQfD) as mentioned in this paper leverages small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstrating data while learning thanks to a prioritized replay mechanism.

...read moreread less

Abstract: Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator's actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it takes PDD DQN 83 million steps to catch up to DQfD's performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.

...read moreread less

481 citations

Posted Content•

Challenges of Real-World Reinforcement Learning

[...]

Gabriel Dulac-Arnold¹, Daniel J. Mankowitz, Todd Hester•Institutions (1)

Google¹

29 Apr 2019-arXiv: Learning

TL;DR: A set of nine unique challenges that must be addressed to productionize RL to real world problems are presented and an example domain that has been modified to present these challenges as a testbed for practical RL research is presented.

...read moreread less

Abstract: Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are often hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL to real world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify some metrics for evaluating that challenge. An approach that addresses all nine challenges would be applicable to a large number of real world problems. We also present an example domain that has been modified to present these challenges as a testbed for practical RL research.

...read moreread less

380 citations

Proceedings Article•

Deep Q-learning From Demonstrations.

[...]

Todd Hester¹, Matej Vecerík¹, Olivier Pietquin¹, Marc Lanctot¹, Tom Schaul¹, Bilal Piot¹, Dan Horgan¹, John Quan¹, Andrew Sendonaris¹, Ian Osband¹, Gabriel Dulac-Arnold¹, John P. Agapiou¹, Joel Z. Leibo¹, Audrunas Gruslys¹ - Show less +10 more•Institutions (1)

Google¹

29 Apr 2018

TL;DR: Deep Q-learning from Demonstrations (DQfD) as discussed by the authors leverages small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstrating data while learning thanks to a prioritized replay mechanism.

...read moreread less

Abstract: Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator’s actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it takes PDD DQN 83 million steps to catch up to DQfD’s performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.

...read moreread less

351 citations

Posted Content•

Safe Exploration in Continuous Action Spaces

[...]

Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerík, Todd Hester, Cosmin Paduraru, Yuval Tassa - Show less +2 more

26 Jan 2018-arXiv: Artificial Intelligence

TL;DR: This work addresses the problem of deploying a reinforcement learning agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated, and directly adds to the policy a safety layer that analytically solves an action correction formulation per each state.

...read moreread less

Abstract: We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems and enable RL algorithms to never violate constraints during learning. Our technique is to directly add to the policy a safety layer that analytically solves an action correction formulation per each state. The novelty of obtaining an elegant closed-form solution is attained due to a linearized model, learned on past trajectories consisting of arbitrary actions. This is to mimic the real-world circumstances where data logs were generated with a behavior policy that is implausible to describe mathematically; such cases render the known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new representative physics-based environments, and prevail where reward shaping fails by maintaining zero constraint violations.

...read moreread less

266 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13

Collapse

Cited by

PDF

Open Access

More filters

On robust estimation of the location parameter

[...]

Frederick R. Forst

01 Jan 1980

3,652 citations

Journal Article•DOI•

The Pharmacological Basis of Therapeutics

[...]

J. H. Gaddum

01 Dec 1941-Nature

TL;DR: The Pharmacological Basis of Therapeutics, by Prof. Louis Goodman and Prof. Alfred Gilman, New York: The Macmillan Company, 1941, p.

...read moreread less

Abstract: The Pharmacological Basis of Therapeutics A Textbook of Pharmacology, Toxicology and Therapeutics for Physicians and Medical Students. By Prof. Louis Goodman and Prof. Alfred Gilman. Pp. xiii + 1383. (New York: The Macmillan Company, 1941.) 50s. net.

...read moreread less

2,686 citations

Journal Article•DOI•

Reinforcement learning in robotics: A survey

[...]

Jens Kober¹, J. Andrew Bagnell², Jan Peters³•Institutions (3)

Bielefeld University¹, Carnegie Mellon University², Max Planck Society³

01 Sep 2013-The International Journal of Robotics Research

TL;DR: This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots by highlighting both key challenges in robot reinforcement learning as well as notable successes.

...read moreread less

Abstract: Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.

...read moreread less

2,391 citations

Journal Article•DOI•

A review of wearable sensors and systems with application in rehabilitation.

[...]

Shyamal Patel¹, Shyamal Patel², Hyung-Kyu Park³, Paolo Bonato⁴, Paolo Bonato², Leighton Chan³, Mary M. Rodgers⁵, Mary M. Rodgers³ - Show less +4 more•Institutions (5)

Northeastern University¹, Spaulding Rehabilitation Hospital², National Institutes of Health³, Massachusetts Institute of Technology⁴, University of Maryland, Baltimore⁵

20 Apr 2012-Journal of Neuroengineering and Rehabilitation

TL;DR: In this paper, a review of wearable sensors and systems that are relevant to the field of rehabilitation is presented, focusing on health and wellness, safety, home rehabilitation, assessment of treatment efficacy, and early detection of disorders.

...read moreread less

Abstract: The aim of this review paper is to summarize recent developments in the field of wearable sensors and systems that are relevant to the field of rehabilitation. The growing body of work focused on the application of wearable technology to monitor older adults and subjects with chronic conditions in the home and community settings justifies the emphasis of this review paper on summarizing clinical applications of wearable technology currently undergoing assessment rather than describing the development of new wearable sensors and systems. A short description of key enabling technologies (i.e. sensor technology, communication technology, and data analysis techniques) that have allowed researchers to implement wearable systems is followed by a detailed description of major areas of application of wearable technology. Applications described in this review paper include those that focus on health and wellness, safety, home rehabilitation, assessment of treatment efficacy, and early detection of disorders. The integration of wearable and ambient sensors is discussed in the context of achieving home monitoring of older adults and subjects with chronic conditions. Future work required to advance the field toward clinical deployment of wearable sensors and systems is discussed.

...read moreread less

1,826 citations

Journal Article•DOI•

Deep Reinforcement Learning: A Brief Survey

[...]

Kai Arulkumaran¹, Marc Peter Deisenroth¹, Miles Brundage², Anil A. Bharath¹•Institutions (2)

Imperial College London¹, University of Oxford²

09 Nov 2017-IEEE Signal Processing Magazine

TL;DR: Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world as discussed by the authors.

...read moreread less

Abstract: Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higherlevel understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.

...read moreread less

1,743 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse