Information theoretic MPC for model-based reinforcement learning

doi:10.1109/ICRA.2017.7989202

Home
/
Papers
/
Information theoretic MPC for model-based reinforcement learning

Proceedings Article•DOI•

Information theoretic MPC for model-based reinforcement learning

Grady Williams¹, Nolan Wagener¹, Brian Goldfain¹, Paul Drews¹, James M. Rehg¹, Byron Boots¹, Evangelos A. Theodorou¹ - Show less +3 more•Institutions (1)

Georgia Institute of Technology¹

01 May 2017-pp 1714-1721

TL;DR: An information theoretic model predictive control algorithm capable of handling complex cost criteria and general nonlinear dynamics and using multi-layer neural networks as dynamics models to solve model-based reinforcement learning tasks is introduced.

read less

Abstract: We introduce an information theoretic model predictive control (MPC) algorithm capable of handling complex cost criteria and general nonlinear dynamics. The generality of the approach makes it possible to use multi-layer neural networks as dynamics models, which we incorporate into our MPC algorithm in order to solve model-based reinforcement learning tasks. We test the algorithm in simulation on a cart-pole swing up and quadrotor navigation task, as well as on actual hardware in an aggressive driving task. Empirical results demonstrate that the algorithm is capable of achieving a high level of performance and does so only utilizing data collected from the system.

...read moreread less

Citations

PDF

Open Access

More filters

Posted Content•

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

[...]

Kurtland Chua¹, Roberto Calandra², Rowan McAllister², Sergey Levine²•Institutions (2)

Princeton University¹, University of California, Berkeley²

30 May 2018-arXiv: Learning

TL;DR: This article proposed a probabilistic ensembles with trajectory sampling (PETS) algorithm, which combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation to match the asymptotic performance of model-based and model-free deep RL algorithms.

...read moreread less

Abstract: Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap, by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task).

...read moreread less

391 citations

Proceedings Article•DOI•

Learning to Drive in a Day

[...]

Alex Kendall, Jeffrey Hawke, David Janz, Przemysław Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah - Show less +5 more

20 May 2019

TL;DR: In this paper, the authors demonstrate the first application of deep reinforcement learning to autonomous driving using a single monocular image as input, and provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control.

...read moreread less

Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.

...read moreread less

318 citations

Posted Content•

Learning to Drive in a Day

[...]

Alex Kendall, Jeffrey Hawke, David Janz, Przemysław Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah - Show less +5 more

01 Jul 2018-arXiv: Learning

TL;DR: This work demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision and provides a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control.

...read moreread less

291 citations

Cites methods from "Information theoretic MPC for model..."

...In autonomous driving, deep learning has been used to learn dynamics models for model-based reinforcement learning using off-line data [24]....
[...]

Posted Content•

MOReL : Model-Based Offline Reinforcement Learning

[...]

Rahul Kidambi¹, Aravind Rajeswaran², Praneeth Netrapalli³, Thorsten Joachims¹•Institutions (3)

Cornell University¹, University of Washington², Microsoft³

12 May 2020-arXiv: Learning

TL;DR: Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL, and through experiments, it matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.

...read moreread less

Abstract: In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline can greatly expand the applicability of RL, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; and (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g. generative modeling, uncertainty estimation, planning etc.) to directly translate into advances for offline RL.

...read moreread less

250 citations

Cites methods from "Information theoretic MPC for model..."

...A number of algorithms based on MPC [23, 64], search-based planning [65, 25], dynamic programming [49, 26], or policy optimization [27, 51, 66, 67] can be used to approximately realize this....
[...]
...Furthermore, MBRL can draw upon the rich literature on model-based planning including model predictive control (MPC) [23, 24, 64, 72], search based planning [25, 65], dynamic programming [26, 81], and policy optimization [82, 76, 66, 27, 51]....
[...]

Posted Content•

Deep Dynamics Models for Learning Dexterous Manipulation

[...]

Anusha Nagabandi, Kurt Konoglie, Sergey Levine, Vikash Kumar

25 Sep 2019-arXiv: Robotics

TL;DR: It is shown that improvements in learned dynamics models, together with improvements in online model-predictive control, can indeed enable efficient and effective learning of flexible contact-rich dexterous manipulation skills -- and that too, on a 24-DoF anthropomorphic hand in the real world, using just 4 hours of purely real-world data to learn to simultaneously coordinate multiple free-floating objects.

...read moreread less

Abstract: Dexterous multi-fingered hands can provide robots with the ability to flexibly perform a wide range of manipulation skills. However, many of the more complex behaviors are also notoriously difficult to control: Performing in-hand object manipulation, executing finger gaits to move objects, and exhibiting precise fine motor skills such as writing, all require finely balancing contact forces, breaking and reestablishing contacts repeatedly, and maintaining control of unactuated objects. Learning-based techniques provide the appealing possibility of acquiring these skills directly from data, but current learning approaches either require large amounts of data and produce task-specific policies, or they have not yet been shown to scale up to more complex and realistic tasks requiring fine motor skills. In this work, we demonstrate that our method of online planning with deep dynamics models (PDDM) addresses both of these limitations; we show that improvements in learned dynamics models, together with improvements in online model-predictive control, can indeed enable efficient and effective learning of flexible contact-rich dexterous manipulation skills -- and that too, on a 24-DoF anthropomorphic hand in the real world, using just 4 hours of purely real-world data to learn to simultaneously coordinate multiple free-floating objects. Videos can be found at this https URL

...read moreread less

201 citations

Cites background from "Information theoretic MPC for model..."

...Other work has focused on learning these models using high-capacity function approximators [17, 18, 3, 19, 20, 21, 22] and probabilistic dynamics models [23, 24, 25, 26]....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

A survey of industrial model predictive control technology

[...]

S. Joe Qin¹, Thomas A. Badgwell²•Institutions (2)

University of Texas at Austin¹, Aspen Technology²

01 Jul 2003-Control Engineering Practice

TL;DR: An overview of commercially available model predictive control (MPC) technology, both linear and nonlinear, based primarily on data provided by MPC vendors, is provided in this article, where a brief history of industrial MPC technology is presented first, followed by results of our vendor survey of MPC control and identification technology.

...read moreread less

4,819 citations

"Information theoretic MPC for model..." refers background in this paper

...The theory of model predictive control for linear systems is well understood and has many successful applications in the process industry [15]....
[...]

Journal Article•DOI•

Reinforcement learning in robotics: A survey

[...]

Jens Kober¹, J. Andrew Bagnell², Jan Peters³•Institutions (3)

Bielefeld University¹, Carnegie Mellon University², Max Planck Society³

01 Sep 2013-The International Journal of Robotics Research

TL;DR: This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots by highlighting both key challenges in robot reinforcement learning as well as notable successes.

...read moreread less

Abstract: Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.

...read moreread less

2,391 citations

"Information theoretic MPC for model..." refers background in this paper

...The types of reinforcement learning problems encountered in robotic tasks are frequently in the continuous state-action space and high dimensional [1]....
[...]

Book•

Model Predictive Control

[...]

David Q. Mayne¹•Institutions (1)

Imperial College London¹

01 Dec 2004

TL;DR: This paper recalls a few past achievements in Model Predictive Control, gives an overview of some current developments and suggests a few avenues for future research.

...read moreread less

Abstract: This paper recalls a few past achievements in Model Predictive Control, gives an overview of some current developments and suggests a few avenues for future research.

...read moreread less

1,897 citations

"Information theoretic MPC for model..." refers background in this paper

...For nonlinear systems, MPC is an increasingly active area of research in control theory [16]....
[...]

Reinforcement Learning in Robotics: A Survey.

[...]

Jens Kober, Jan Peters

01 Jan 2012

TL;DR: A survey of work in reinforcement learning for behavior generation in robots can be found in this article, where the authors highlight key challenges in robot reinforcement learning as well as notable successes and discuss the role of algorithms, representations and prior knowledge in achieving these successes.

...read moreread less

1,513 citations

Journal Article•DOI•

The GRASP Multiple Micro-UAV Testbed

[...]

Nathan Michael¹, Daniel Mellinger², Quentin Lindsey¹, Vijay Kumar³•Institutions (3)

University of Pennsylvania¹, North Carolina State University², Ohio State University³

09 Sep 2010-IEEE Robotics & Automation Magazine

TL;DR: In the last five years, advances in materials, electronics, sensors, and batteries havefueled a growth in the development of microunmanned aerial vehicles (MAVs) that are between 0.1 and 0.5 m in length and0.1-0.5 kg in mass.

...read moreread less

Abstract: In the last five years, advances in materials, electronics, sensors, and batteries have fueled a growth in the development of microunmanned aerial vehicles (MAVs) that are between 0.1 and 0.5 m in length and 0.1-0.5 kg in mass [1]. A few groups have built and analyzed MAVs in the 10-cm range [2], [3]. One of the smallest MAV is the Picoftyer with a 60-mmpropellor diameter and a mass of 3.3 g [4]. Platforms in the 50-cm range are more prevalent with several groups having built and flown systems of this size [5]-[7]. In fact, there are severalcommercially available radiocontrolled (PvC) helicopters and research-grade helicopters in this size range [8].

...read moreread less

806 citations