Posted Content

Performance and safety of Bayesian model predictive control: Scalable model-based RL with guarantees

05 Jun 2020
TL;DR: This work proposes a cautious model-based reinforcement learning algorithm that can be efficiently implemented as a standard MPC controller and bounds the expected number of unsafe learning episodes using an exact penalty soft-constrained MPC formulation.
Abstract: Despite the success of reinforcement learning (RL) in various research fields, relatively few algorithms have been applied to industrial control applications. This unexplored potential is partly due to the significant tuning effort required, the large number of learning episodes, i.e. experiments, needed, and the limited availability of RL methods that can address high-dimensional, safety-critical dynamical systems with continuous state and action spaces. By building on model predictive control (MPC) concepts, we propose a cautious model-based reinforcement learning algorithm to mitigate these limitations. While the underlying policy of the approach can be efficiently implemented in the form of a standard MPC controller, data-efficient learning is achieved through posterior sampling techniques. We provide a rigorous performance analysis of the resulting `Bayesian MPC' algorithm by establishing Lipschitz continuity of the corresponding future reward function, and we bound the expected number of unsafe learning episodes using an exact penalty soft-constrained MPC formulation. The efficiency and scalability of the method are illustrated on a 100-dimensional server cooling example and a nonlinear 10-dimensional drone example by comparing performance against nominal posterior MPC, which is commonly used for data-driven control of constrained dynamical systems.
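The posterior-sampling idea at the core of Bayesian MPC can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it uses a scalar linear system, a Gaussian prior over a single unknown parameter, and a one-step certainty-equivalent control law standing in for the full MPC optimization. All names, constants, and the dynamics are illustrative assumptions.

```python
import random

random.seed(0)

# Ground truth (hidden from the controller): x_next = a*x + u + w
A_TRUE, SIGMA = 0.8, 0.05

# Gaussian prior over the unknown parameter a
prior_mean, prior_prec = 0.0, 1.0
sxx, sxy = 0.0, 0.0   # sufficient statistics: sum(x^2) and sum(x*(x_next - u))

def posterior():
    """Conjugate Bayesian linear-regression posterior over a."""
    prec = prior_prec + sxx / SIGMA**2
    mean = (prior_prec * prior_mean + sxy / SIGMA**2) / prec
    return mean, prec**-0.5

for episode in range(20):
    # Posterior (Thompson) sampling: draw one model from the posterior
    # and keep it fixed for the whole episode.
    m, s = posterior()
    a_hat = random.gauss(m, s)
    x = 1.0
    for t in range(10):
        u = -a_hat * x                 # certainty-equivalent control under the sample
        w = random.gauss(0.0, SIGMA)
        x_next = A_TRUE * x + u + w
        sxx += x * x                   # update the sufficient statistics
        sxy += x * (x_next - u)
        x = x_next

mean, std = posterior()
print(mean, std)
```

Sampling a full model per episode, rather than always using the posterior mean, is what drives exploration; the posterior contracts as closed-loop data accumulates.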
Citations

Journal ArticleDOI
TL;DR: This work demonstrates the feasibility of applying RL to practical control problems by developing several critical implementation strategies, illustrated on a multivariable, multi-modal, hybrid three-tank (HTT) physical process.

17 citations

Posted Content
TL;DR: In this paper, the authors conducted a systematic literature review (SLR) of research papers published between 2015 and 2020, covering topics related to the certification of ML systems, and identified 217 papers covering topics considered to be the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification.
Abstract: Context: Machine Learning (ML) has been at the heart of many innovations over the past years. However, including it in so-called 'safety-critical' systems such as automotive or aeronautic ones has proven to be very challenging, since the shift in paradigm that ML brings completely changes traditional certification approaches. Objective: This paper aims to elucidate challenges related to the certification of ML-based safety-critical systems, as well as the solutions proposed in the literature to tackle them, answering the question 'How to Certify Machine Learning Based Safety-critical Systems?'. Method: We conduct a Systematic Literature Review (SLR) of research papers published between 2015 and 2020, covering topics related to the certification of ML systems. In total, we identified 217 papers covering topics considered to be the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provided summaries of the papers extracted. Results: The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and types of models. They also emphasized the need to further develop connections between academia and industry to deepen the study of the domain. Finally, they illustrated the necessity of building connections between the above-mentioned main pillars, which are for now mainly studied separately. Conclusion: We highlighted current efforts deployed to enable the certification of ML-based software systems, and discussed some future research directions.

12 citations

Posted Content
23 Nov 2020
TL;DR: This work proposes Kernel Predictive Control (KPC), a learning-based predictive control strategy that enjoys deterministic guarantees of safety and presents a relaxation strategy that exploits on-line data to weaken the optimization problem constraints while preserving safety.
Abstract: We propose Kernel Predictive Control (KPC), a learning-based predictive control strategy that enjoys deterministic guarantees of safety. Noise-corrupted samples of the unknown system dynamics are used to learn several models through the formalism of non-parametric kernel regression. By treating each prediction step individually, we dispense with the need to propagate sets through highly non-linear maps, a procedure that often involves multiple conservative approximation steps. Finite-sample error bounds are then used to enforce state feasibility by employing an efficient robust formulation. We then present a relaxation strategy that exploits on-line data to weaken the optimization problem constraints while preserving safety. Two numerical examples are provided to illustrate the applicability of the proposed control method.
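The mechanism of pairing a kernel-regression model with an error bound that tightens constraints can be sketched in a few lines. This is not KPC itself: the sketch uses simple Nadaraya-Watson kernel regression on a scalar map, and `EPS` is a hand-picked stand-in for a finite-sample error bound; the data, kernel width, and constraint are all illustrative.

```python
import math

# Noisy samples of an unknown scalar map f (ground truth f(x) = sin(x), hidden)
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
NOISE = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.005]
Y = [math.sin(x) + n for x, n in zip(X, NOISE)]

def k(a, b, ell=0.5):
    """Gaussian kernel with length scale ell."""
    return math.exp(-(a - b) ** 2 / (2 * ell ** 2))

def predict(x):
    """Nadaraya-Watson kernel-regression estimate of f(x)."""
    w = [k(x, xi) for xi in X]
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)

EPS = 0.1     # stand-in for a finite-sample bound |f(x) - predict(x)| <= EPS
X_MAX = 0.9   # constraint on the successor state: f(x) <= X_MAX

def is_safe(x):
    """Robust feasibility check: worst-case prediction must satisfy the constraint."""
    return predict(x) + EPS <= X_MAX

print(predict(1.0), is_safe(0.5), is_safe(1.5))
```

Tightening the constraint by the model error (`predict(x) + EPS <= X_MAX` instead of `predict(x) <= X_MAX`) is what converts a nominal feasibility check into a robust one; as more on-line data shrinks the bound, the tightening can be relaxed.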

11 citations


Cites background from "Performance and safety of Bayesian ..."

  • ...For decades, this has been a major concern when control systems incorporate forms of adaptation or learning (Anderson, 2005; García and Fernández, 2015; Hewing et al., 2020; Wabersich and Zeilinger, 2020)....

Journal ArticleDOI
01 Jan 2022
TL;DR: In this article, a multi-objective optimization problem is formulated based on rewards for both the search process and the terminal condition, and solutions are found through Bayesian learning model predictive control (BLMPC).
Abstract: Classical source seeking algorithms aim to make the robot reach the source location eventually. This letter proposes a process-aware source seeking approach that finds an informative trajectory to reach the source location. A multi-objective optimization problem is formulated based on rewards for both the search process and the terminal condition. Because the source location is unknown, solutions are found through Bayesian learning model predictive control (BLMPC). Both the consistency of the Bayesian estimator and the convergence of the proposed algorithm are proved. The performance of the algorithm is evaluated through simulation results. The process-aware source seeking algorithm demonstrates improvements over other classical source seeking algorithms.

5 citations

References
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
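The temporal-difference methods from Part II of the book are compactly illustrated by its own 5-state random-walk example, where the true state values under the random policy are k/6 for state k. The sketch below implements tabular TD(0) policy evaluation on that example; the step size and episode count are illustrative choices.

```python
import random

random.seed(1)

# Sutton & Barto's 5-state random walk: states 1..5, terminals 0 and 6.
# Reward 1 when terminating on the right, 0 otherwise; undiscounted.
N, ALPHA, EPISODES = 5, 0.05, 20000
V = [0.0] * (N + 2)   # V[0] and V[N+1] are terminal states and stay 0

for _ in range(EPISODES):
    s = (N + 1) // 2              # start in the middle state
    while 0 < s < N + 1:
        s2 = s + random.choice((-1, 1))
        r = 1.0 if s2 == N + 1 else 0.0
        # TD(0) update: move V(s) toward the bootstrapped target r + V(s')
        V[s] += ALPHA * (r + V[s2] - V[s])
        s = s2

# True values under the random policy are k/6 for state k = 1..5
print([round(v, 2) for v in V[1:-1]])
```

The bootstrapped target `r + V(s')` is what distinguishes TD learning from Monte Carlo estimation, which would wait for the full return of each episode before updating.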

37,989 citations


"Performance and safety of Bayesian ..." refers methods in this paper

  • ...By relying on approximate stochastic dynamic programming [16], these MPC-based techniques are closely related to reinforcement learning concepts, see e....

Journal ArticleDOI
TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.
Abstract: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research. Chapter 12 concludes the book with some commentary about the scientific contributions of MTS. The Taguchi method for design of experiment has generated considerable controversy in the statistical community over the past few decades. The MTS/MTGS method seems to lead another source of discussions on the methodology it advocates (Montgomery 2003). As pointed out by Woodall et al. (2003), the MTS/MTGS methods are considered ad hoc in the sense that they have not been developed using any underlying statistical theory. Because the "normal" and "abnormal" groups form the basis of the theory, some sampling restrictions are fundamental to the applications. First, it is essential that the "normal" sample be uniform, unbiased, and/or complete so that a reliable measurement scale is obtained. Second, the selection of "abnormal" samples is crucial to the success of dimensionality reduction when OAs are used. For example, if each abnormal item is really unique in the medical example, then it is unclear how the statistical distance MD can be guaranteed to give a consistent diagnosis measure of severity on a continuous scale when the larger-the-better type S/N ratio is used. Multivariate diagnosis is not new to Technometrics readers and is now becoming increasingly more popular in statistical analysis and data mining for knowledge discovery. As a promising alternative that assumes no underlying data model, The Mahalanobis-Taguchi Strategy does not provide sufficient evidence of gains achieved by using the proposed method over existing tools. Readers may be very interested in a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods. Overall, although the idea of MTS/MTGS is intriguing, this book would be more valuable had it been written in a rigorous fashion as a technical reference. There is some lack of precision even in several mathematical notations. Perhaps a follow-up with additional theoretical justification and careful case studies would answer some of the lingering questions.

11,507 citations


"Performance and safety of Bayesian ..." refers background in this paper

  • ...) [42] with given hyper-parameters Wi and unknown parameters C....

Journal ArticleDOI
TL;DR: A comprehensive description of the primal-dual interior-point algorithm with a filter line-search method for nonlinear programming is provided, including the feasibility restoration phase for the filter method, second-order corrections, and inertia correction of the KKT matrix.
Abstract: We present a primal-dual interior-point algorithm with a filter line-search method for nonlinear programming. Local and global convergence properties of this method were analyzed in previous work. Here we provide a comprehensive description of the algorithm, including the feasibility restoration phase for the filter method, second-order corrections, and inertia correction of the KKT matrix. Heuristics are also considered that allow faster performance. This method has been implemented in the IPOPT code, which we demonstrate in a detailed numerical study based on 954 problems from the CUTEr test set. An evaluation is made of several line-search options, and a comparison is provided with two state-of-the-art interior-point codes for nonlinear programming.
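The interior-point idea that IPOPT implements can be shown on a one-dimensional toy problem. The sketch below is a plain log-barrier method with damped Newton centering, much simpler than IPOPT's primal-dual filter line-search algorithm; the problem, starting point, and parameters are all illustrative.

```python
# Minimize x^2 subject to x >= 1 (optimum x* = 1) with a log-barrier method,
# the basic mechanism behind interior-point solvers such as IPOPT.
def barrier_solve(t_init=1.0, mu=10.0, tol=1e-8):
    x, t = 2.0, t_init                  # strictly feasible starting point
    while 1.0 / t > tol:                # 1/t bounds the suboptimality (one constraint)
        for _ in range(50):             # centering: Newton on t*x^2 - log(x - 1)
            g = 2.0 * t * x - 1.0 / (x - 1.0)          # gradient
            h = 2.0 * t + 1.0 / (x - 1.0) ** 2          # Hessian (positive)
            step = g / h
            while x - step <= 1.0:      # damp the step to stay strictly feasible
                step *= 0.5
            x -= step
            if abs(step) < 1e-12:
                break
        t *= mu                         # tighten the barrier and re-center
    return x

x_star = barrier_solve()
print(x_star)
```

Each outer iteration re-centers on the barrier-augmented objective for a larger `t`, tracing the central path toward the constrained optimum while every iterate stays strictly inside the feasible region.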

7,966 citations


"Performance and safety of Bayesian ..." refers methods in this paper

  • ...All examples are implemented using the Casadi framework [39] together with the IPOPT solver [40]....

Journal ArticleDOI
TL;DR: In this article, a theoretical basis for model predictive control (MPC) has started to emerge, and many practical problems, such as control objective prioritization and symptom-aided diagnosis, can be integrated into the MPC framework by expanding the problem formulation to include integer variables, yielding a mixed-integer quadratic or linear program.

2,320 citations


"Performance and safety of Bayesian ..." refers background in this paper

  • ...Particularly in the case of general, complex, and safety-critical control problems, model predictive control (MPC) techniques [5, 6, 7] have shown significant impact on both industrial and research-driven applications, see also Figure 1....