Journal ArticleDOI

Reachability-Based Trajectory Safeguard (RTS): A Safe and Fast Reinforcement Learning Safety Layer for Continuous Control

04 Mar 2021, Vol. 6, Iss. 2, pp. 3663-3670
TL;DR: In this paper, a Reachability-based Trajectory Safeguard (RTS) algorithm is proposed to ensure safety during training and operation of a robot in a safety-critical environment.
Abstract: Reinforcement Learning (RL) algorithms have achieved remarkable performance in decision making and control tasks by reasoning about long-term, cumulative reward using trial and error. However, during RL training, applying this trial-and-error approach to real-world robots operating in safety-critical environments may lead to collisions. To address this challenge, this letter proposes a Reachability-based Trajectory Safeguard (RTS), which leverages reachability analysis to ensure safety during training and operation. Given a known (but uncertain) model of a robot, RTS precomputes a Forward Reachable Set (FRS) of the robot tracking a continuum of parameterized trajectories. At runtime, the RL agent selects from this continuum in a receding-horizon way to control the robot; the FRS is used to determine whether the agent's choice is safe, and to adjust unsafe choices. The efficacy of this method is illustrated in static environments on three nonlinear robot models, including a 12-D quadrotor drone, in simulation and in comparison with state-of-the-art safe motion planning methods.
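For intuition, here is a minimal Python sketch of the safety-layer logic the abstract describes: accept the agent's trajectory parameter if its precomputed reachable set is collision-free, otherwise adjust to the nearest safe parameter. The function names, the 1-D parameter space, and the discrete candidate list are illustrative assumptions, not the paper's implementation (RTS reasons over a continuum of parameters).

import numpy as np

def rts_safety_layer(k_rl, is_safe, candidate_params):
    # Accept the RL agent's parameter if its FRS is collision-free.
    if is_safe(k_rl):
        return k_rl
    # Otherwise adjust: choose the safe candidate closest to the proposal.
    safe = [k for k in candidate_params if is_safe(k)]
    if not safe:
        return None  # no safe trajectory this horizon: fall back to a fail-safe maneuver
    return min(safe, key=lambda k: np.linalg.norm(k - k_rl))

# Hypothetical demo: 1-D parameter, choices above 0.5 intersect an obstacle.
is_safe = lambda k: k[0] <= 0.5
candidates = [np.array([v]) for v in np.linspace(-1.0, 1.0, 21)]
print(rts_safety_layer(np.array([0.9]), is_safe, candidates))  # -> [0.5]
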
Citations
Posted Content
11 Dec 2020
TL;DR: The proposed hierarchical multi-rate control architecture maximizes the probability of satisfying the high-level specifications while guaranteeing state and input constraint satisfaction and is tested in simulations and experiments on examples inspired by the Mars exploration mission.
Abstract: In this paper we present a hierarchical multi-rate control architecture for nonlinear autonomous systems operating in partially observable environments. Control objectives are expressed using syntactically co-safe Linear Temporal Logic (LTL) specifications and the nonlinear system is subject to state and input constraints. At the highest level of abstraction, we model the system-environment interaction using a discrete Mixed Observable Markov Decision Problem (MOMDP), where the environment states are partially observed. The high-level control policy is used to update the constraint sets and cost function of a Model Predictive Controller (MPC), which plans a reference trajectory. Afterwards, the MPC-planned trajectory is fed to a low-level, high-frequency tracking controller, which leverages Control Barrier Functions (CBFs) to guarantee bounded tracking errors. Our strategy is based on model abstractions of increasing complexity and layers running at different frequencies. We show that the proposed hierarchical multi-rate control architecture maximizes the probability of satisfying the high-level specifications while guaranteeing state and input constraint satisfaction. Finally, we test the proposed strategy in simulations and experiments on examples inspired by the Mars exploration mission, where only partial environment observations are available.
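The multi-rate structure is easiest to see as nested loops running at different frequencies. The following Python sketch is a stand-in under assumed rates and trivial dynamics; high_level_policy, mpc_reference, and cbf_tracker are hypothetical stubs for the MOMDP policy, the MPC planner, and the CBF-filtered tracking controller.

import numpy as np

def high_level_policy(belief):
    # MOMDP policy stub: map the environment belief to a discrete mode
    return 0 if belief < 0.5 else 1

def mpc_reference(x, mode):
    # Mid-level planner stub: step the reference toward a mode-dependent goal
    goal = np.zeros(2) if mode == 0 else np.ones(2)
    return x + 0.1 * (goal - x)

def cbf_tracker(x, ref):
    # Low-level tracker stub, standing in for the CBF-based controller
    # that guarantees bounded tracking error
    return 5.0 * (ref - x)

x, belief, mode, ref = np.zeros(2), 0.8, 0, np.zeros(2)
dt = 0.001  # assume the tracking layer runs at 1 kHz
for k in range(3000):
    if k % 1000 == 0:            # ~1 Hz: update the discrete mode
        mode = high_level_policy(belief)
    if k % 100 == 0:             # ~10 Hz: replan the MPC reference
        ref = mpc_reference(x, mode)
    u = cbf_tracker(x, ref)      # 1 kHz: tracking control
    x = x + dt * u               # single-integrator stand-in dynamics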

17 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a holistic perspective on the state of the art in the design of guidance, navigation, and control systems for autonomous multi-rotor small unmanned aerial systems (sUAS).

7 citations

Posted Content
TL;DR: The effectiveness of the proposed data-driven hierarchical control framework is demonstrated in a two-car collision avoidance scenario through simulations and experiments on a 1/10-scale autonomous car platform, where the strategy-guided approach outperforms a model predictive control baseline in both cases.
Abstract: We present a hierarchical control approach for maneuvering an autonomous vehicle (AV) in tightly-constrained environments where other moving AVs and/or human driven vehicles are present. A two-level hierarchy is proposed: a high-level data-driven strategy predictor and a lower-level model-based feedback controller. The strategy predictor maps an encoding of a dynamic environment to a set of high-level strategies via a neural network. Depending on the selected strategy, a set of time-varying hyperplanes in the AV's position space is generated online and the corresponding halfspace constraints are included in a lower-level model-based receding horizon controller. These strategy-dependent constraints drive the vehicle towards areas where it is likely to remain feasible. Moreover, the predicted strategy also informs switching between a discrete set of policies, which allows for more conservative behavior when prediction confidence is low. We demonstrate the effectiveness of the proposed data-driven hierarchical control framework in a two-car collision avoidance scenario through simulations and experiments on a 1/10 scale autonomous car platform where the strategy-guided approach outperforms a model predictive control baseline in both cases.
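To make the strategy-to-constraint mapping concrete, here is a small Python sketch: a predicted strategy label selects a halfspace a @ p <= b relative to an opposing vehicle, and that halfspace would then be included as a position constraint in the lower-level controller. The two strategy labels and the axis-aligned geometry are invented for illustration.

import numpy as np

def halfspace_from_strategy(strategy, opponent_pos):
    # "left" constrains the ego to stay on the +y side of the opponent,
    # "right" on the -y side; returns (a, b) for the constraint a @ p <= b.
    a = np.array([0.0, -1.0]) if strategy == "left" else np.array([0.0, 1.0])
    b = float(a @ opponent_pos)
    return a, b

def satisfies(p, a, b):
    return float(a @ p) <= b

a, b = halfspace_from_strategy("left", opponent_pos=np.array([2.0, 0.0]))
print(satisfies(np.array([2.0, 0.5]), a, b))   # True: ego passes on the left
print(satisfies(np.array([2.0, -0.5]), a, b))  # False: would cut to the right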

7 citations

Journal ArticleDOI
TL;DR: In this article, a path-planning algorithm for connected and non-connected automated road vehicles on multilane motorways is derived from the opportune formulation of an optimal control problem, in which the objective function to be minimized contains appropriate terms to reflect the goals of vehicle advancement, passenger comfort, and avoidance of collisions with other vehicles and of road departures.
Abstract: A path-planning algorithm for connected and non-connected automated road vehicles on multilane motorways is derived from the opportune formulation of an optimal control problem. In this framework, the objective function to be minimized contains appropriate respective terms to reflect: the goals of vehicle advancement; passenger comfort; and avoidance of collisions with other vehicles and of road departures. Connectivity implies, within the present work, that connected vehicles can exchange with each other (V2V) real-time information about their last generated short-term path. For the numerical solution of the optimal control problem, an efficient feasible direction algorithm (FDA) is used. To ensure high-quality local minima, a simplified Dynamic Programming (DP) algorithm is also conceived to deliver the initial guess trajectory for the start of the FDA iterations. Thanks to very low computation times, the approach is readily executable within a model predictive control (MPC) framework. The proposed MPC-based approach is embedded within the Aimsun microsimulation platform, which enables the evaluation of a plethora of realistic vehicle driving and advancement scenarios under different vehicle mixes. Results obtained on a multilane motorway stretch indicate higher efficiency of the optimally controlled vehicles in driving closer to their desired speed, compared to ordinary manually driven vehicles. Increased penetration rates of automated vehicles are found to increase the efficiency of the overall traffic flow, benefiting manual vehicles as well. Moreover, connected controlled vehicles appear to be more efficient in achieving their desired speed, compared also to the corresponding non-connected controlled vehicles, due to the improved real-time information and short-term prediction achieved via V2V communication.
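As a rough illustration of the four term families named above, the following Python cost function penalizes deviation from a desired speed (advancement), acceleration (comfort), proximity to other vehicles (collision), and leaving the lane bounds (road departure). The functional forms and weights are assumptions for the sketch, not the paper's objective.

import numpy as np

def path_cost(pos, vel, acc, v_des, obstacles, lane_y, w=(1.0, 0.1, 10.0, 10.0)):
    # pos, vel, acc: arrays of shape (T, 2); lane_y = (y_min, y_max)
    advancement = np.sum((np.linalg.norm(vel, axis=1) - v_des) ** 2)
    comfort = np.sum(acc ** 2)
    collision = sum(np.sum(np.exp(-np.linalg.norm(pos - o, axis=1) ** 2))
                    for o in obstacles)
    departure = np.sum(np.clip(pos[:, 1] - lane_y[1], 0.0, None) ** 2
                       + np.clip(lane_y[0] - pos[:, 1], 0.0, None) ** 2)
    return (w[0] * advancement + w[1] * comfort
            + w[2] * collision + w[3] * departure)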

6 citations

Proceedings ArticleDOI
23 May 2022
TL;DR: In this paper, a shielding mechanism is proposed to ensure ISO-verified human safety while training and deploying RL algorithms on manipulators; the shield also improves RL performance by preventing episode-ending collisions.
Abstract: Deep reinforcement learning (RL) has shown promising results in the motion planning of manipulators. However, no existing method guarantees safety around highly dynamic obstacles, such as humans, in RL-based manipulator control. This lack of formal safety assurances prevents the application of RL for manipulators in real-world human environments. Therefore, we propose a shielding mechanism that ensures ISO-verified human safety while training and deploying RL algorithms on manipulators. We utilize a fast reachability analysis of humans and manipulators to guarantee that the manipulator comes to a complete stop before a human is within its range. Our proposed method guarantees safety and significantly improves RL performance by preventing episode-ending collisions. We demonstrate the performance of our proposed method in simulation using human motion capture data.
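The stopping-based safety argument can be sketched with ball over-approximations: the action is allowed only if the human's reachable set over the robot's stopping time cannot meet the robot's stopping envelope. This Python stand-in uses crude spheres rather than the paper's ISO-verified reachable sets; every parameter name is an assumption.

import numpy as np

def action_is_safe(robot_pos, human_pos, t_stop, v_human_max, r_stop_envelope):
    # Radius the human could cover while the robot brakes to a full stop.
    human_reach = v_human_max * t_stop
    gap = np.linalg.norm(np.asarray(robot_pos) - np.asarray(human_pos))
    # Safe only if the two over-approximated sets cannot intersect.
    return gap > human_reach + r_stop_envelope

# If the check fails, the shield overrides the RL action with a stop command.
if not action_is_safe([0.0, 0.0, 1.0], [1.0, 0.0, 1.0], 0.3, 2.0, 0.5):
    print("shield: execute fail-safe stop")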

5 citations

References
Posted Content
TL;DR: This work uses new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combines some of them to achieve state-of-the-art results: 43.5% AP on the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100.
Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch normalization and residual connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100. Source code is at this https URL

5,709 citations

Proceedings Article
06 Jul 2015
TL;DR: A method for optimizing control policies with guaranteed monotonic improvement; making several approximations to the theoretically justified scheme yields a practical algorithm called Trust Region Policy Optimization (TRPO).
Abstract: In this article, we describe a method for optimizing control policies with guaranteed monotonic improvement. By making several approximations to the theoretically justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
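The core update is a KL-constrained maximization of a surrogate objective. A compact numpy sketch under assumed interfaces (grad_surrogate returns the surrogate gradient, fisher_vec_prod computes Fisher-vector products Fv): the step direction is F^{-1}g from conjugate gradients, scaled so the quadratic KL estimate equals delta; the full algorithm also backtracks via line search, which is omitted here.

import numpy as np

def conjugate_gradient(Avp, b, iters=10):
    # Approximately solve A x = b given only matrix-vector products Avp.
    x = np.zeros_like(b)
    r, p = b.copy(), b.copy()
    rr = r @ r
    for _ in range(iters):
        Ap = Avp(p)
        alpha = rr / (p @ Ap + 1e-8)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / (rr + 1e-8)) * p
        rr = rr_new
    return x

def trpo_step(theta, grad_surrogate, fisher_vec_prod, delta=0.01):
    g = grad_surrogate(theta)
    x = conjugate_gradient(fisher_vec_prod, g)           # x ~ F^{-1} g
    scale = np.sqrt(2.0 * delta / (x @ fisher_vec_prod(x) + 1e-8))
    return theta + scale * x  # TRPO then backtracks until the true KL <= delta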

3,479 citations

Book
01 Jan 2003
TL;DR: This article provides a comprehensive introduction to the field of robotic mapping, with a focus on indoor mapping, and describes and compares various probabilistic techniques as they are presently being applied to a vast array of mobile robot mapping problems.
Abstract: This article provides a comprehensive introduction to the field of robotic mapping, with a focus on indoor mapping. It describes and compares various probabilistic techniques, as they are presently being applied to a vast array of mobile robot mapping problems. The history of robotic mapping is also detailed, along with an extensive list of open research problems.

1,584 citations

Proceedings Article
03 Jul 2018
TL;DR: In this paper, the authors show that the overestimation bias persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
Abstract: In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
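The two mechanisms named above have a compact form: bootstrap from the minimum of two target critics (to limit overestimation) after smoothing the target action with clipped noise, and update the actor less frequently than the critics. A numpy sketch of the target computation, with hypothetical callables standing in for the target networks:

import numpy as np

def td3_target(r, s_next, done, q1_target, q2_target, actor_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: perturb the target action with clipped noise.
    a_next = actor_target(s_next)
    noise = np.clip(np.random.normal(0.0, noise_std, np.shape(a_next)),
                    -noise_clip, noise_clip)
    a_next = a_next + noise
    # Clipped double Q-learning: bootstrap from the minimum of two critics.
    q_min = np.minimum(q1_target(s_next, a_next), q2_target(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min

# Delayed policy updates: e.g., update the actor once per two critic updates.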

954 citations

Proceedings Article
06 Aug 2017
TL;DR: Constrained Policy Optimization (CPO), as discussed by the authors, is the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees of near-constraint satisfaction at each iteration.
Abstract: For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al., 2016; Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety.
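CPO's per-iteration subproblem linearizes the return and the constraint around the current policy inside a KL trust region. The following numpy sketch replaces CPO's analytic dual solution (and the Fisher metric) with a Euclidean trust-region step plus a projection onto the linearized constraint, so it only conveys the shape of the update; g, b, and c stand for the objective gradient, constraint gradient, and current constraint violation.

import numpy as np

def cpo_like_step(theta, g, b, c, delta=0.01):
    # Unconstrained trust-region step along the objective gradient.
    step = np.sqrt(2.0 * delta) * g / (np.linalg.norm(g) + 1e-8)
    # If the linearized constraint c + b @ step <= 0 is violated,
    # project the step onto the constraint boundary.
    viol = c + b @ step
    if viol > 0.0:
        step = step - (viol / (b @ b + 1e-8)) * b
    return theta + step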

768 citations