Open access · Journal Article · DOI: 10.1109/LRA.2021.3063989

Reachability-Based Trajectory Safeguard (RTS): A Safe and Fast Reinforcement Learning Safety Layer for Continuous Control

04 Mar 2021 · Vol. 6, Iss. 2, pp. 3663-3670
Abstract: Reinforcement Learning (RL) algorithms have achieved remarkable performance in decision making and control tasks by reasoning about long-term, cumulative reward using trial and error. However, during RL training, applying this trial-and-error approach to real-world robots operating in safety-critical environments may lead to collisions. To address this challenge, this letter proposes a Reachability-based Trajectory Safeguard (RTS), which leverages reachability analysis to ensure safety during training and operation. Given a known (but uncertain) model of a robot, RTS precomputes a Forward Reachable Set (FRS) of the robot tracking a continuum of parameterized trajectories. At runtime, the RL agent selects from this continuum in a receding-horizon fashion to control the robot; the FRS is used to identify whether the agent's choice is safe and to adjust unsafe choices. The efficacy of this method is illustrated in static environments on three nonlinear robot models, including a 12-D quadrotor drone, in simulation and in comparison with state-of-the-art safe motion planning methods.


Topics: Reinforcement learning (55%), Motion planning (51%), Reachability (51%)
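
The mechanism described in the abstract can be pictured as a safety filter wrapped around the RL policy: the agent proposes a trajectory parameter, the parameter's precomputed reachable set is checked against the obstacles, and an unsafe choice is replaced by the closest safe one. The Python sketch below illustrates that receding-horizon loop under strong simplifying assumptions; it is not the authors' implementation, and every function, constant, and the interval reachable-set stand-in are hypothetical.

# Minimal sketch of a reachability-style safety layer for a 2-D point robot. The "FRS"
# below is a crude interval over-approximation of positions reachable while tracking a
# constant-velocity trajectory parameter; the actual method precomputes a much tighter
# parameterized reachable set offline. All names and constants here are hypothetical.
import numpy as np
HORIZON = 1.0                                  # planning horizon [s]
TRACK_ERR = 0.2                                # assumed bound on tracking error [m]
OBSTACLES = [(np.array([2.0, 0.0]), 0.5)]      # (center, radius) pairs
def frs_box(pos, v_param):
    """Axis-aligned box containing every position reached over the horizon."""
    end = pos + v_param * HORIZON
    return np.minimum(pos, end) - TRACK_ERR, np.maximum(pos, end) + TRACK_ERR
def is_safe(pos, v_param):
    """A trajectory parameter is safe if its FRS box misses every obstacle."""
    lo, hi = frs_box(pos, v_param)
    for center, radius in OBSTACLES:
        closest = np.clip(center, lo, hi)      # box point nearest the obstacle center
        if np.linalg.norm(closest - center) <= radius:
            return False
    return True
def safeguard(pos, v_rl, candidates):
    """Keep the RL action if safe; otherwise pick the nearest safe candidate or stop."""
    if is_safe(pos, v_rl):
        return v_rl
    safe = [v for v in candidates if is_safe(pos, v)]
    return min(safe, key=lambda v: np.linalg.norm(v - v_rl)) if safe else np.zeros(2)
# Receding-horizon loop with a placeholder "policy" that always drives toward +x.
grid = [np.array([vx, vy]) for vx in np.linspace(-1, 1, 9) for vy in np.linspace(-1, 1, 9)]
pos = np.zeros(2)
for step in range(5):
    v_rl = np.array([1.0, 0.0])                # stand-in for the learned policy's output
    v = safeguard(pos, v_rl, grid)
    pos = pos + v * HORIZON
    print(step, pos, "kept" if np.allclose(v, v_rl) else "adjusted")
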
Citations

5 results found


Open access · Posted Content
11 Dec 2020
Abstract: In this paper we present a hierarchical multi-rate control architecture for nonlinear autonomous systems operating in partially observable environments. Control objectives are expressed using syntactically co-safe Linear Temporal Logic (LTL) specifications and the nonlinear system is subject to state and input constraints. At the highest level of abstraction, we model the system-environment interaction using a discrete Mixed Observable Markov Decision Problem (MOMDP), where the environment states are partially observed. The high level control policy is used to update the constraint sets and cost function of a Model Predictive Controller (MPC) which plans a reference trajectory. Afterwards, the MPC planned trajectory is fed to a low-level high-frequency tracking controller, which leverages Control Barrier Functions (CBFs) to guarantee bounded tracking errors. Our strategy is based on model abstractions of increasing complexity and layers running at different frequencies. We show that the proposed hierarchical multi-rate control architecture maximizes the probability of satisfying the high-level specifications while guaranteeing state and input constraint satisfaction. Finally, we tested the proposed strategy in simulations and experiments on examples inspired by the Mars exploration mission, where only partial environment observations are available.
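
The architecture layers three controllers running at different rates. The Python sketch below only illustrates that multi-rate structure: a slow high-level policy updating the goal, a mid-rate planner recomputing a reference, and a fast tracker following it. The MOMDP policy, MPC planner, and CBF tracker are all replaced by trivial hypothetical stand-ins, so this is a schematic rather than the proposed controller.

# Schematic of a multi-rate hierarchy: high-level decisions every 100 steps, planning
# every 10 steps, feedback control every step. All three layers are toy stand-ins.
import numpy as np
DT = 0.01                # low-level control period [s]
PLAN_EVERY = 10          # planner period, in low-level steps
DECIDE_EVERY = 100       # high-level policy period, in low-level steps
def high_level_policy(step):
    """Stand-in for the MOMDP policy: alternate between two goal points."""
    return np.array([1.0, 0.0]) if (step // DECIDE_EVERY) % 2 == 0 else np.array([0.0, 1.0])
def plan_reference(state, goal, n_steps):
    """Stand-in for the MPC planner: straight-line reference toward the goal."""
    return np.linspace(state, goal, n_steps)
def track(state, ref_point, k=5.0):
    """Stand-in for the CBF-based tracker: proportional feedback to the reference."""
    return k * (ref_point - state)
state = np.zeros(2)
goal = reference = None
for step in range(300):
    if step % DECIDE_EVERY == 0:
        goal = high_level_policy(step)                        # slowest layer
    if step % PLAN_EVERY == 0:
        reference = plan_reference(state, goal, PLAN_EVERY)   # middle layer
    u = track(state, reference[step % PLAN_EVERY])            # fastest layer
    state = state + DT * u                                    # single-integrator dynamics
print("final state:", state)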


8 Citations


Open access · Posted Content
01 Nov 2020 · arXiv: Robotics
Abstract: We present a hierarchical control approach for maneuvering an autonomous vehicle (AV) in tightly-constrained environments where other moving AVs and/or human driven vehicles are present. A two-level hierarchy is proposed: a high-level data-driven strategy predictor and a lower-level model-based feedback controller. The strategy predictor maps an encoding of a dynamic environment to a set of high-level strategies via a neural network. Depending on the selected strategy, a set of time-varying hyperplanes in the AV's position space is generated online and the corresponding halfspace constraints are included in a lower-level model-based receding horizon controller. These strategy-dependent constraints drive the vehicle towards areas where it is likely to remain feasible. Moreover, the predicted strategy also informs switching between a discrete set of policies, which allows for more conservative behavior when prediction confidence is low. We demonstrate the effectiveness of the proposed data-driven hierarchical control framework in a two-car collision avoidance scenario through simulations and experiments on a 1/10 scale autonomous car platform where the strategy-guided approach outperforms a model predictive control baseline in both cases.
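
The key interface in this approach is the mapping from a predicted strategy to time-varying halfspace constraints a^T p <= b imposed on the low-level controller. The sketch below shows only that interface, with a hand-coded rule standing in for the neural strategy predictor and a one-step projection standing in for the receding-horizon controller; all names and numbers are illustrative assumptions, not the authors' code.

# Sketch of strategy-conditioned halfspace constraints: a hand-coded "strategy
# predictor" picks pass-left or pass-right, the strategy defines a separating
# hyperplane, and a desired position is projected to satisfy a @ p <= b.
import numpy as np
def predict_strategy(ego, other):
    """Stand-in for the neural strategy predictor: pass on the side with more room."""
    return "left" if ego[1] >= other[1] else "right"
def strategy_halfspace(other, strategy, margin=0.5):
    """Return (a, b) such that safe ego positions p satisfy a @ p <= b."""
    if strategy == "left":
        return np.array([0.0, -1.0]), -(other[1] + margin)    # enforce p_y >= other_y + margin
    return np.array([0.0, 1.0]), other[1] - margin            # enforce p_y <= other_y - margin
def project_to_halfspace(p, a, b):
    """Project a desired position onto the halfspace {p : a @ p <= b}."""
    violation = a @ p - b
    return p if violation <= 0 else p - violation * a / (a @ a)
ego, other = np.array([0.0, 0.1]), np.array([2.0, 0.0])
desired = np.array([2.0, 0.0])                 # naive target: drive straight at the other car
strategy = predict_strategy(ego, other)
a, b = strategy_halfspace(other, strategy)
print(strategy, project_to_halfspace(desired, a, b))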


3 Citations


Open access · Posted Content
Astghik Hakobyan, Insoon Yang
03 May 2021 · arXiv: Robotics
Abstract: This paper proposes a novel safety specification tool, called the distributionally robust risk map (DR-risk map), for a mobile robot operating in a learning-enabled environment. Given the robot's position, the map aims to reliably assess the conditional value-at-risk (CVaR) of collision with obstacles whose movements are inferred by Gaussian process regression (GPR). Unfortunately, the inferred distribution is subject to errors, making it difficult to accurately evaluate the CVaR of collision. To overcome this challenge, this tool measures the risk under the worst-case distribution in a so-called ambiguity set that characterizes allowable distribution errors. To resolve the infinite-dimensionality issue inherent in the construction of the DR-risk map, we derive a tractable semidefinite programming formulation that provides an upper bound of the risk, exploiting techniques from modern distributionally robust optimization. As a concrete application for motion planning, a distributionally robust RRT* algorithm is considered using the risk map that addresses distribution errors caused by GPR. Furthermore, a motion control method is devised using the DR-risk map in a learning-based model predictive control (MPC) formulation. In particular, a neural network approximation of the risk map is proposed to reduce the computational cost in solving the MPC problem. The performance and utility of the proposed risk map are demonstrated through simulation studies that show its ability to ensure the safety of mobile robots despite learning errors.
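
The quantity at the heart of the risk map is a CVaR of collision under the GPR-inferred obstacle distribution, which the paper then bounds over an ambiguity set via a semidefinite program. The sketch below shows only the non-robust ingredient, an empirical CVaR of a simple collision loss with a Gaussian standing in for the learned predictive distribution; the distributionally robust SDP reformulation is not reproduced, and all names are hypothetical.

# Empirical CVaR of a collision loss under a Gaussian obstacle-position distribution
# standing in for the GPR-inferred prediction. The paper's DR-risk map instead upper
# bounds this risk over an ambiguity set of distributions via a semidefinite program.
import numpy as np
def collision_loss(robot_pos, obstacle_samples, safe_dist=0.5):
    """Positive loss means the sampled obstacle lies inside the safety distance."""
    return safe_dist - np.linalg.norm(obstacle_samples - robot_pos, axis=1)
def cvar(losses, alpha=0.95):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of the losses."""
    var = np.quantile(losses, alpha)           # value-at-risk threshold
    return losses[losses >= var].mean()
rng = np.random.default_rng(0)
obstacle_mean, obstacle_cov = np.array([1.0, 0.0]), 0.1 * np.eye(2)
samples = rng.multivariate_normal(obstacle_mean, obstacle_cov, size=10_000)
for robot_pos in (np.array([0.0, 0.0]), np.array([0.8, 0.0])):
    risk = cvar(collision_loss(robot_pos, samples))
    print(robot_pos, "CVaR of collision loss:", round(float(risk), 3))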


Topics: Robust optimization (53%), Semidefinite programming (53%), Motion planning (52%)

3 Citations


Open access · Journal Article · DOI: 10.1016/J.ARCONTROL.2021.10.013
Abstract: This survey paper presents a holistic perspective on the state-of-the-art in the design of guidance, navigation, and control systems for autonomous multi-rotor small unmanned aerial systems (sUAS). By citing more than 300 publications, this work recalls fundamental results that enabled the design of these systems, describes some of the latest advances, and compares the performance of several techniques. This paper also lists some techniques that, although already employed by different classes of mobile robots, have not been employed yet on sUAS, but may lead to satisfactory results. Furthermore, this publication highlights some limitations in the theoretical and technological solutions underlying existing guidance, navigation, and control systems for sUAS and places special emphasis on some of the most relevant gaps that hinder the integration of these three systems. In light of the surveyed results, this paper provides recommendations for macro-research areas that would improve the overall quality of autopilots for autonomous sUAS and would facilitate the transition of existing results from sUAS to larger autonomous aircraft for payload delivery and commercial transportation.



Open access · Proceedings Article · DOI: 10.1109/ICRA48506.2021.9561417
30 May 2021
Abstract: We present a hierarchical control approach for maneuvering an autonomous vehicle (AV) in tightly-constrained environments where other moving AVs and/or human driven vehicles are present. A two-level hierarchy is proposed: a high-level data-driven strategy predictor and a lower-level model-based feedback controller. The strategy predictor maps an encoding of a dynamic environment to a set of high-level strategies via a neural network. Depending on the selected strategy, a set of time-varying hyperplanes in the AV’s position space is generated online and the corresponding halfspace constraints are included in a lower-level model-based receding horizon controller. These strategy-dependent constraints drive the vehicle towards areas where it is likely to remain feasible. Moreover, the predicted strategy also informs switching between a discrete set of policies, which allows for more conservative behavior when prediction confidence is low. We demonstrate the effectiveness of the proposed data-driven hierarchical control framework in a two-car collision avoidance scenario through simulations and experiments on a 1/10 scale autonomous car platform where the strategy-guided approach outperforms a model predictive control baseline in both cases.


References

28 results found


Open access · Proceedings Article
John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, et al.
06 Jul 2015
Abstract: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
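
For a discrete-action policy, the two quantities TRPO trades off can be written down compactly: the importance-sampled surrogate advantage and the mean KL divergence to the old policy. The sketch below computes both on synthetic data; the full algorithm, which maximizes the surrogate subject to a KL trust-region constraint via a conjugate-gradient step and line search, is not reproduced here, and all names are illustrative.

# Surrogate advantage and mean KL divergence for a discrete-action policy, the two
# quantities TRPO balances. Data here is synthetic; the trust-region optimization
# itself (conjugate gradient plus line search) is not reproduced.
import numpy as np
def surrogate(pi_new, pi_old, actions, advantages):
    """Mean importance-weighted advantage: E[(pi_new(a|s)/pi_old(a|s)) * A(s, a)]."""
    idx = np.arange(len(actions))
    return np.mean(pi_new[idx, actions] / pi_old[idx, actions] * advantages)
def mean_kl(pi_old, pi_new):
    """Average KL(pi_old || pi_new) over the sampled states."""
    return np.mean(np.sum(pi_old * np.log(pi_old / pi_new), axis=1))
rng = np.random.default_rng(0)
n_samples, n_actions = 64, 4
pi_old = rng.dirichlet(np.ones(n_actions), size=n_samples)   # old per-state action distributions
pi_new = 0.9 * pi_old + 0.1 / n_actions                      # slightly smoothed candidate policy
actions = np.array([rng.choice(n_actions, p=p) for p in pi_old])
advantages = rng.normal(size=n_samples)
print("surrogate advantage:", surrogate(pi_new, pi_old, actions, advantages))
print("mean KL(old || new):", mean_kl(pi_old, pi_new))       # kept below a trust-region bound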


Topics: Trust region (58%)

2,271 Citations


Open access · Book
Sebastian Thrun
01 Jan 2003
Abstract: This article provides a comprehensive introduction to the field of robotic mapping, with a focus on indoor mapping. It describes and compares various probabilistic techniques as they are presently being applied to a vast array of mobile robot mapping problems. The history of robotic mapping is also detailed, along with an extensive list of open research problems.


Topics: Robotic mapping (63%), Mobile robot (53%)

1,550 Citations


Open access · Posted Content
Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at this https URL
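
Of the listed features, Mosaic data augmentation is simple enough to sketch: four training images are stitched into one around a random center so each sample mixes objects, contexts, and scales. The sketch below is only an illustration of the image-side operation under assumed sizes; it omits the corresponding bounding-box remapping and is not the reference implementation.

# Illustrative Mosaic augmentation: four images are placed into the four quadrants
# around a random center of the output canvas. Real implementations also resize the
# inputs and remap/clip the bounding-box labels, which this sketch omits.
import numpy as np
def mosaic(images, out_size=416, rng=None):
    """Combine four HxWx3 uint8 images into one out_size x out_size mosaic."""
    rng = rng or np.random.default_rng()
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))  # random mosaic center
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    corners = [(0, 0, cy, cx), (0, cx, cy, out_size),         # (top, left, bottom, right)
               (cy, 0, out_size, cx), (cy, cx, out_size, out_size)]
    for img, (t, l, b, r) in zip(images, corners):
        crop = img[:b - t, :r - l]                            # naive crop of the source image
        canvas[t:t + crop.shape[0], l:l + crop.shape[1]] = crop
    return canvas
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic(imgs, rng=rng).shape)                            # -> (416, 416, 3)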


1,513 Citations


Open access · Proceedings Article
Scott Fujimoto, Herke van Hoof, David Meger
03 Jul 2018
Abstract: In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
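
The fix described here (clipped double Q-learning with target policy smoothing and delayed policy updates) centers on how the critic target is computed. The sketch below shows that target computation with arbitrary stand-in functions in place of the target networks, so it is a schematic of the idea rather than a working agent; hyperparameter values are illustrative assumptions.

# Schematic of the clipped double-Q critic target with target policy smoothing: add
# clipped noise to the target action, evaluate both target critics, take the minimum.
# The "networks" below are arbitrary stand-in functions, not trained models.
import numpy as np
def td3_target(r, s_next, done, pi_t, q1_t, q2_t, gamma=0.99,
               noise_std=0.2, noise_clip=0.5, act_limit=1.0, rng=None):
    rng = rng or np.random.default_rng()
    a_next = pi_t(s_next)
    noise = np.clip(rng.normal(0.0, noise_std, size=a_next.shape), -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -act_limit, act_limit)   # target policy smoothing
    q_min = np.minimum(q1_t(s_next, a_next), q2_t(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min                   # clipped double-Q bootstrap
pi = lambda s: np.tanh(s.sum(axis=1, keepdims=True))          # stand-in target actor (1-D action)
q1 = lambda s, a: s.sum(axis=1, keepdims=True) - 0.5 * a      # stand-in target critics
q2 = lambda s, a: s.sum(axis=1, keepdims=True) - 0.3 * a
s_next = np.random.default_rng(0).normal(size=(5, 3))
r, done = np.ones((5, 1)), np.zeros((5, 1))
print(td3_target(r, s_next, done, pi, q1, q2))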


944 Citations


Open access · Proceedings Article
06 Aug 2017
Abstract: For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al., 2016; Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety.
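
CPO's trust-region update is too involved to reproduce in a few lines, but the constrained objective it optimizes, maximizing expected reward subject to an expected-cost budget, can be illustrated with a much simpler technique: Lagrangian relaxation with dual ascent on a toy one-parameter policy. The sketch below is explicitly that simpler stand-in, not CPO, and carries none of its near-constraint-satisfaction guarantees; all functions and constants are invented for the example.

# Toy constrained-policy problem solved by Lagrangian relaxation with dual ascent:
# maximize expected reward over a single policy parameter p subject to an expected-cost
# budget. This is a simpler stand-in for CPO's trust-region update, not CPO itself.
import numpy as np
def expected_reward(p):
    return 2.0 * p            # riskier behavior (larger p) earns more reward ...
def expected_cost(p):
    return p ** 2             # ... but incurs more constraint cost
budget, lam, p = 0.25, 0.0, 0.9
lr_policy, lr_dual = 0.05, 0.1
for _ in range(200):
    grad_p = 2.0 - lam * 2.0 * p                                 # d/dp [reward(p) - lam * cost(p)]
    p = float(np.clip(p + lr_policy * grad_p, 0.0, 1.0))         # primal (policy) ascent step
    lam = max(0.0, lam + lr_dual * (expected_cost(p) - budget))  # dual ascent on the multiplier
print("p =", round(p, 3), "cost =", round(expected_cost(p), 3), "budget =", budget)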


454 Citations


Performance Metrics
No. of citations received by the paper in previous years:
2021: 3
2020: 2
Network Information
Related Papers (5)
Reachable Sets for Safe, Real-Time Manipulator Trajectory Design · 12 Jul 2020

Patrick Holmes, Shreyas Kousik +5 more

76% related
RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators from RL Policies · 25 Jul 2019

Hao-Tien Lewis Chiang, Jasmine Hsu +3 more

74% related
An MPC Framework for Online Motion Planning in Human-Robot Collaborative Tasks · 01 Sep 2019

Marco Faroni, Manuel Beschi +1 more

74% related
Optimal Robot Motion Planning in Constrained Workspaces Using Reinforcement Learning · 24 Oct 2020

Panagiotis Rousseas, Charalampos P. Bechlioulis +1 more

74% related