Safe Model-based Reinforcement Learning with Stability Guarantees

Proceedings Article

Felix Berkenkamp¹, Matteo Turchetta², Angela P. Schoellig³, Andreas Krause¹

ETH Zurich¹, Institute of Robotics and Intelligent Systems², University of Toronto³

01 Jan 2017, Vol. 30, pp. 908-918

TL;DR: In this paper, the authors present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.

Abstract: Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.
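The idea of a stability certificate obtained from a statistical model can be illustrated with a toy one-dimensional sketch. This is not the authors' algorithm: it assumes a convex Lyapunov candidate v, a model that returns a mean prediction and a standard deviation for the next state, and a hypothetical confidence scaling beta. A grid state counts as certified when the worst-case value of v over the model's confidence interval for the next state is strictly below v at the current state. The estimated safe region is then the largest level set of v in which every sampled non-origin state is certified. A rigorous version would also have to cover the states between grid points, which this sketch ignores.

```python
from typing import Callable, List


def certified_level(
    grid: List[float],
    v: Callable[[float], float],
    mean: Callable[[float], float],
    std: Callable[[float], float],
    beta: float = 2.0,
) -> float:
    """Largest level c such that every nonzero grid state with v(x) <= c
    has a certified decrease of v under the model's confidence interval.

    Simplified illustration: assumes v is convex, so its maximum over the
    interval [mean - beta*std, mean + beta*std] is attained at an endpoint.
    """

    def certified(x: float) -> bool:
        lo = mean(x) - beta * std(x)
        hi = mean(x) + beta * std(x)
        worst_next = max(v(lo), v(hi))  # upper bound on v(next state)
        return worst_next < v(x)        # strict decrease required

    # v cannot strictly decrease at the origin, so it is excluded.
    nonzero = [x for x in grid if x != 0.0]
    failing = [v(x) for x in nonzero if not certified(x)]
    bound = min(failing, default=float("inf"))
    # Keep only levels strictly below the smallest uncertified level.
    inside = [v(x) for x in nonzero if v(x) < bound]
    return max(inside, default=0.0)


# Hypothetical model: the next state is predicted as 0.5 * x, with an
# uncertainty that grows away from the origin.
grid = [i / 10 for i in range(-40, 41)]
level = certified_level(
    grid,
    v=lambda x: x * x,
    mean=lambda x: 0.5 * x,
    std=lambda x: 0.08 * x * x,
)
print(round(level, 2))  # 9.61, i.e. the grid states with |x| <= 3.1
```

Shrinking the model uncertainty (for example after collecting more data near the boundary) enlarges the certified level, which mirrors, in a very reduced form, the abstract's point that safe data collection can expand the safe region.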

Citations

Posted Content

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, Aviral Kumar, George Tucker, Justin Fu

04 May 2020-arXiv: Learning

TL;DR: This tutorial article aims to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.

Abstract: In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection. Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines. Effective offline reinforcement learning methods would be able to extract policies with the maximum possible utility out of the available data, thereby allowing automation of a wide range of decision-making domains, from healthcare and education to robotics. However, the limitations of current algorithms make this difficult. We will aim to provide the reader with an understanding of these challenges, particularly in the context of modern deep reinforcement learning methods, and describe some potential solutions that have been explored in recent work to mitigate these challenges, along with recent applications, and a discussion of perspectives on open problems in the field.

950 citations

Posted Content

Deep Reinforcement Learning: An Overview

Yuxi Li

25 Jan 2017-arXiv: Learning

TL;DR: This work discusses core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration, and important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.

Abstract: We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, natural language processing, including dialogue systems, machine translation, and text generation, computer vision, neural architecture design, business management, finance, healthcare, Industry 4.0, smart grid, intelligent transportation systems, and computer systems. We mention topics not reviewed yet, and list a collection of RL resources. After presenting a brief summary, we close with discussions. Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.

935 citations

Proceedings Article

Data-Enabled Predictive Control: In the Shallows of the DeePC

Jeremy Coulson¹, John Lygeros¹, Florian Dörfler¹

ETH Zurich¹

25 Jun 2019

TL;DR: In this paper, a data-enabled predictive control (DeePC) algorithm is presented that computes optimal and safe control policies using real-time feedback driving the unknown system along a desired trajectory while satisfying system constraints.

Abstract: We consider the problem of optimal trajectory tracking for unknown systems. A novel data-enabled predictive control (DeePC) algorithm is presented that computes optimal and safe control policies using real-time feedback driving the unknown system along a desired trajectory while satisfying system constraints. Using a finite number of data samples from the unknown system, our proposed algorithm uses a behavioural systems theory approach to learn a non-parametric system model used to predict future trajectories. The DeePC algorithm is shown to be equivalent to the classical and widely adopted Model Predictive Control (MPC) algorithm in the case of deterministic linear time-invariant systems. In the case of nonlinear stochastic systems, we propose regularizations to the DeePC algorithm. Simulations are provided to illustrate performance and compare the algorithm with other methods.

411 citations
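The DeePC entry above replaces an identified model with Hankel matrices built from recorded data. The following minimal sketch is an illustration, not the authors' implementation: it builds a depth-L Hankel matrix from a scalar signal and forms a linear combination H g of its columns. For a deterministic linear time-invariant system, every column is a valid length-L trajectory, so by linearity any such combination is one too (the converse, that all trajectories are reachable this way, additionally requires persistently exciting data). The full algorithm stacks input and output Hankel matrices, pins the most recent samples as an initial condition, and optimizes over g; none of that is shown. The system y[t+1] = 0.5 * y[t] and the helper names are made up for the example.

```python
from typing import List, Sequence


def hankel(w: Sequence[float], depth: int) -> List[List[float]]:
    """Depth-`depth` Hankel matrix of a scalar signal: entry (i, j) is w[i + j]."""
    cols = len(w) - depth + 1
    if cols < 1:
        raise ValueError("signal is shorter than the Hankel depth")
    return [[w[i + j] for j in range(cols)] for i in range(depth)]


def combine(h: List[List[float]], g: Sequence[float]) -> List[float]:
    """Matrix-vector product H g, i.e. a linear combination of Hankel columns."""
    return [sum(h_ij * g_j for h_ij, g_j in zip(row, g)) for row in h]


# Recorded data from the hypothetical system y[t+1] = 0.5 * y[t].
data = [8.0, 4.0, 2.0, 1.0, 0.5]
H = hankel(data, depth=3)  # [[8.0, 4.0, 2.0], [4.0, 2.0, 1.0], [2.0, 1.0, 0.5]]
traj = combine(H, [1.0, -1.0, 2.0])
print(traj)  # [8.0, 4.0, 2.0]

# The combination still obeys the recursion y[t+1] = 0.5 * y[t].
assert all(b == 0.5 * a for a, b in zip(traj, traj[1:]))
```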

Journal Article

Data-Driven Model Predictive Control With Stability and Robustness Guarantees

Julian Berberich¹, Johannes Köhler¹, Matthias A. Müller², Frank Allgöwer¹

University of Stuttgart¹, Leibniz University of Hanover²

18 Apr 2021-IEEE Transactions on Automatic Control

TL;DR: The presented results provide the first (theoretical) analysis of the closed-loop properties of a simple, purely data-driven MPC scheme, including a robust variant with a regularized slack variable in the cost.

Abstract: We propose a robust data-driven model predictive control (MPC) scheme to control linear time-invariant systems. The scheme uses an implicit model description based on behavioral systems theory and past measured trajectories. In particular, it does not require any prior identification step, but only an initially measured input–output trajectory as well as an upper bound on the order of the unknown system. First, we prove exponential stability of a nominal data-driven MPC scheme with terminal equality constraints in the case of no measurement noise. For bounded additive output measurement noise, we propose a robust modification of the scheme, including a slack variable with regularization in the cost. We prove that the application of this robust MPC scheme in a multistep fashion leads to practical exponential stability of the closed loop w.r.t. the noise level. The presented results provide the first (theoretical) analysis of closed-loop properties, resulting from a simple, purely data-driven MPC scheme.

381 citations

Journal Article

Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications

Wojciech Samek¹, Grégoire Montavon, Sebastian Lapuschkin¹, Christopher J. Anders, Klaus-Robert Müller

Heinrich Hertz Institute¹

04 Mar 2021

TL;DR: In this paper, the authors provide a timely overview of post hoc explanations and their theoretical foundations, put interpretability algorithms to the test from both a theoretical and a comparative-evaluation perspective using extensive simulations, and demonstrate successful usage of XAI in a representative selection of application scenarios.

Abstract: With the broader and highly successful usage of machine learning (ML) in industry and the sciences, there has been a growing demand for explainable artificial intelligence (XAI). Interpretability and explanation methods for gaining a better understanding of the problem-solving abilities and strategies of nonlinear ML, in particular, deep neural networks, are, therefore, receiving increased attention. In this work, we aim to: 1) provide a timely overview of this active emerging field, with a focus on “post hoc” explanations, and explain its theoretical foundations; 2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations; 3) outline best practice aspects, i.e., how to best include interpretation methods into the standard usage of ML; and 4) demonstrate successful usage of XAI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of ML.

321 citations

References

Book

Reinforcement Learning: An Introduction

Richard S. Sutton¹, Andrew G. Barto

Massachusetts Institute of Technology¹

01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations

Journal Article

Human-level control through deep reinforcement learning

Volodymyr Mnih¹, Koray Kavukcuoglu¹, David Silver¹, Andrei Rusu¹, Joel Veness¹, Marc G. Bellemare¹, Alex Graves¹, Martin Riedmiller¹, Andreas K. Fidjeland¹, Georg Ostrovski¹, Stig Petersen¹, Charles Beattie¹, Amir Sadik¹, Ioannis Antonoglou¹, Helen King¹, Dharshan Kumaran¹, Daan Wierstra¹, Shane Legg¹, Demis Hassabis¹

Google¹

26 Feb 2015-Nature

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. 
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

23,074 citations
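The deep Q-network entry above learns by regressing Q-values onto temporal-difference targets. The following is a minimal sketch of that target computation, assuming a batch of transitions whose next-state Q-values come from a separate, periodically updated target network (a component of the Nature DQN); the function names and argument layout are made up. Each target is the reward plus the discounted maximum next-state Q-value, with no bootstrap term when the episode ended. Replay memory, the convolutional network, and Atari preprocessing are omitted, and the plain squared loss is a simplification.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'), or y = r for terminal transitions."""
    targets = []
    for r, q_row, done in zip(rewards, next_q, dones):
        bootstrap = 0.0 if done else gamma * max(q_row)
        targets.append(r + bootstrap)
    return targets


def squared_td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) of the actions taken and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Two hypothetical transitions; the second one ends its episode.
y = dqn_targets([1.0, 0.0], next_q=[[0.5, 2.0], [3.0, 1.0]], dones=[False, True], gamma=0.5)
print(y)  # [2.0, 0.0]
```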

Posted Content

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek G. Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay K. Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

01 Jan 2015-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper describes the TensorFlow interface and an implementation of that interface built at Google, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.

Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations

Book

Support Vector Machines

Ingo Steinwart, Andreas Christmann

12 Aug 2008

TL;DR: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications and provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature.

Abstract: This book explains the principles that make support vector machines (SVMs) a successful modelling and prediction tool for a variety of applications. The authors present the basic ideas of SVMs together with the latest developments and current research questions in a unified style. They identify three reasons for the success of SVMs: their ability to learn well with only a very small number of free parameters, their robustness against several types of model violations and outliers, and their computational efficiency compared to several other methods. Since their appearance in the early nineties, support vector machines and related kernel-based methods have been successfully applied in diverse fields of application such as bioinformatics, fraud detection, construction of insurance tariffs, direct marketing, and data and text mining. As a consequence, SVMs now play an important role in statistical machine learning and are used not only by statisticians, mathematicians, and computer scientists, but also by engineers and data analysts. The book provides a unique in-depth treatment of both fundamental and recent material on SVMs that so far has been scattered in the literature. The book can thus serve as both a basis for graduate courses and an introduction for statisticians, mathematicians, and computer scientists. It further provides a valuable reference for researchers working in the field. The book covers all important topics concerning support vector machines such as: loss functions and their role in the learning process; reproducing kernel Hilbert spaces and their properties; a thorough statistical analysis that uses both traditional uniform bounds and more advanced localized techniques based on Rademacher averages and Talagrand's inequality; a detailed treatment of classification and regression; a detailed robustness analysis; and a description of some of the most recent implementation techniques. 
To make the book self-contained, an extensive appendix is added which provides the reader with the necessary background from statistics, probability theory, functional analysis, convex analysis, and topology.

4,664 citations

Proceedings Article

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

Niranjan Srinivas¹, Andreas Krause¹, Matthias Seeger², Sham M. Kakade³

California Institute of Technology¹, Saarland University², University of Pennsylvania³

21 Jun 2010

TL;DR: This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of the maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.

Abstract: Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.

1,876 citations
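For the GP-UCB entry above, here is a minimal pure-Python sketch of one selection step. It fits a Gaussian-process posterior to the observations so far and picks the candidate that maximizes the upper confidence bound mu(x) + sqrt(beta) * sigma(x). The squared-exponential kernel, its unit lengthscale, the noise level, and the fixed beta are illustrative assumptions, and the helper names are hypothetical. The paper analyzes a schedule beta_t that grows over time, which is not reproduced here.

```python
import math
from typing import List, Sequence


def rbf(a: float, b: float, lengthscale: float = 1.0) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m, for symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution for L z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def solve_upper_t(low: List[List[float]], z: Sequence[float]) -> List[float]:
    """Back substitution for L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_ucb_select(
    xs: Sequence[float],
    ys: Sequence[float],
    candidates: Sequence[float],
    beta: float = 4.0,
    noise: float = 0.01,
) -> float:
    """Return the candidate maximizing mu(x) + sqrt(beta) * sigma(x)."""
    k_xx = [[rbf(a, b) + (noise ** 2 if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(k_xx)
    alpha = solve_upper_t(low, solve_lower(low, ys))  # (K + noise^2 I)^-1 y

    def ucb(x: float) -> float:
        k_star = [rbf(x, a) for a in xs]
        mu = sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(low, k_star)
        var = max(rbf(x, x) - sum(t * t for t in v), 0.0)  # clip round-off
        return mu + math.sqrt(beta) * math.sqrt(var)

    return max(candidates, key=ucb)


xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 5.0]
print(gp_ucb_select(xs, ys, candidates, beta=0.0))  # 1.0: pure exploitation
print(gp_ucb_select(xs, ys, candidates, beta=4.0))  # 5.0: far from data, high uncertainty
```

With beta = 0 the rule reduces to greedily maximizing the posterior mean; a positive beta trades that off against posterior uncertainty, which is the exploration mechanism whose regret the paper bounds.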