Showing papers by "Mykel J. Kochenderfer" published in 2018


Proceedings ArticleDOI
26 Jun 2018
TL;DR: It is shown that DRL can find more likely failure scenarios than MCTS with fewer calls to the simulator and can be easily applied to other scenarios given the appropriate models of the vehicle and the environment.
Abstract: This paper presents a method for testing the decision making systems of autonomous vehicles. Our approach involves perturbing stochastic elements in the vehicle's environment until the vehicle is involved in a collision. Instead of applying direct Monte Carlo sampling to find collision scenarios, we formulate the problem as a Markov decision process and use reinforcement learning algorithms to find the most likely failure scenarios. This paper presents Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL) solutions that can scale to large environments. We show that DRL can find more likely failure scenarios than MCTS with fewer calls to the simulator. A simulation scenario involving a vehicle approaching a crosswalk is used to validate the framework. Our proposed approach is very general and can be easily applied to other scenarios given the appropriate models of the vehicle and the environment.

140 citations


Proceedings Article
01 Jan 2018
TL;DR: Two new algorithms, POMCPOW and PFT-DPW, are proposed and evaluated. They use weighted particle filtering to overcome the belief collapse that makes double progressive widening alone insufficient, and simulation results show that they succeed where previous approaches fail.
Abstract: Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.

120 citations
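The weighted particle filtering mentioned in the abstract above can be illustrated with a minimal sketch. This is not POMCPOW or PFT-DPW; it only shows a single weighted belief update, in which each particle is propagated through a generative model and reweighted by the observation likelihood. The 1-D transition and observation models (`toy_transition`, `toy_obs_likelihood`) are hypothetical and chosen purely for illustration.

```python
import math
import random


def weighted_particle_update(particles, weights, action, observation,
                             transition, obs_likelihood, rng):
    """Propagate each particle and reweight it by the observation likelihood."""
    new_particles = [transition(s, action, rng) for s in particles]
    new_weights = [w * obs_likelihood(observation, s_next)
                   for w, s_next in zip(weights, new_particles)]
    total = sum(new_weights)
    if total == 0.0:
        # Degenerate case: no particle explains the observation, fall back to uniform.
        n = len(new_particles)
        return new_particles, [1.0 / n] * n
    return new_particles, [w / total for w in new_weights]


# Hypothetical 1-D model: the state moves by the action plus Gaussian noise.
def toy_transition(state, action, rng):
    return state + action + rng.gauss(0.0, 0.1)


# Hypothetical sensor: unnormalized Gaussian likelihood centered on the state.
def toy_obs_likelihood(observation, state, sigma=0.5):
    return math.exp(-0.5 * ((observation - state) / sigma) ** 2)


rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)

particles, weights = weighted_particle_update(
    particles, weights, 1.0, 2.0, toy_transition, toy_obs_likelihood, rng)

# Weighted mean of the updated belief.
belief_mean = sum(w * s for w, s in zip(weights, particles))
```

Because the weights carry the observation information, the belief keeps many distinct particles rather than collapsing to a single one, which is the failure mode the abstract attributes to progressive widening alone.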


Proceedings Article
18 May 2018
TL;DR: The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons and is able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.
Abstract: The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.

104 citations


Posted Content
TL;DR: HG-DAgger is proposed, a variant of DAgger that is more suitable for interactive imitation learning from human experts in real-world systems and learns a safety threshold for a model-uncertainty-based risk metric that can be used to predict the performance of the fully trained novice in different regions of the state space.
Abstract: Imitation learning has proven to be useful for many real-world problems, but approaches such as behavioral cloning suffer from data mismatch and compounding error issues. One attempt to address these limitations is the DAgger algorithm, which uses the state distribution induced by the novice to sample corrective actions from the expert. Such sampling schemes, however, require the expert to provide action labels without being fully in control of the system. This can decrease safety and, when using humans as experts, is likely to degrade the quality of the collected labels due to perceived actuator lag. In this work, we propose HG-DAgger, a variant of DAgger that is more suitable for interactive imitation learning from human experts in real-world systems. In addition to training a novice policy, HG-DAgger also learns a safety threshold for a model-uncertainty-based risk metric that can be used to predict the performance of the fully trained novice in different regions of the state space. We evaluate our method on both a simulated and real-world autonomous driving task, and demonstrate improved performance over both DAgger and behavioral cloning.

87 citations


Journal ArticleDOI
TL;DR: In this paper, a deep neural network is used to approximate the numeric table that represents an aircraft collision avoidance strategy. The high dimensionality of the state space leads to very large tables, and storing only the network parameters reduces the required storage space by a factor of 1000.
Abstract: One approach to designing decision making logic for an aircraft collision avoidance system frames the problem as a Markov decision process and optimizes the system using dynamic programming. The resulting collision avoidance strategy can be represented as a numeric table. This methodology has been used in the development of the Airborne Collision Avoidance System X (ACAS X) family of collision avoidance systems for manned and unmanned aircraft, but the high dimensionality of the state space leads to very large tables. To improve storage efficiency, a deep neural network is used to approximate the table. With the use of an asymmetric loss function and a gradient descent algorithm, the parameters for this network can be trained to provide accurate estimates of table values while preserving the relative preferences of the possible advisories for each state. By training multiple networks to represent subtables, the network also decreases the required runtime for computing the collision avoidance advisory. Simulation studies show that the network improves the safety and efficiency of the collision avoidance system. Because only the network parameters need to be stored, the required storage space is reduced by a factor of 1000, enabling the collision avoidance system to operate using current avionics systems.

85 citations
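The abstract above mentions an asymmetric loss that preserves the relative preferences of the advisories. The exact loss is not given in this listing, so the following is only a sketch of one possible form, under two assumptions: a higher table value means a more preferred advisory, and errors that shrink the gap between the best advisory and the others are weighted by a hypothetical factor `penalty`.

```python
def asymmetric_squared_loss(predicted, target, penalty=4.0):
    """Squared error that weights ranking-threatening errors more heavily.

    Assumption: higher values are preferred. Underestimating the best
    advisory, or overestimating any other advisory, could change which
    advisory is selected, so those errors are multiplied by `penalty`.
    """
    best = max(range(len(target)), key=target.__getitem__)
    total = 0.0
    for i, (p, t) in enumerate(zip(predicted, target)):
        err = p - t
        harmful = (i == best and err < 0) or (i != best and err > 0)
        total += (penalty if harmful else 1.0) * err * err
    return total / len(target)


# Same error magnitude on the best advisory, opposite signs.
loss_under = asymmetric_squared_loss([0.5, 0.0], [1.0, 0.0])
loss_over = asymmetric_squared_loss([1.5, 0.0], [1.0, 0.0])
```

Here underestimating the best advisory costs more than overestimating it by the same amount, which pushes a trained approximator toward keeping the table's ordering of advisories.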


Proceedings ArticleDOI
01 Oct 2018
TL;DR: This paper extends Generative Adversarial Imitation Learning (GAIL) with a parameter-sharing approach grounded in curriculum learning so that learned driver models generalize to multi-agent driving scenarios. Compared with single-agent GAIL policies, the resulting PS-GAIL policies interact more stably in a multi-agent setting and better capture the emergent behavior of human drivers.
Abstract: Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.

78 citations


Proceedings Article
Rui Shu, Hung Bui, Shengjia Zhao, Mykel J. Kochenderfer, Stefano Ermon
23 May 2018
TL;DR: This paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
Abstract: The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.

67 citations


Proceedings ArticleDOI
21 May 2018
TL;DR: A decomposition method is demonstrated that leverages the optimal avoidance strategy for a single user in a partially observable Markov decision process to bypass the computational cost of scaling the formulation to avoiding multiple road users.
Abstract: Autonomous driving in urban areas requires avoiding other road users with only partial observability of the environment. Observations are only partial because obstacles can occlude the field of view of the sensors. The problem of robust and efficient navigation under uncertainty can be framed as a partially observable Markov decision process (POMDP). In order to bypass the computational cost of scaling the formulation to avoiding multiple road users, this paper demonstrates a decomposition method that leverages the optimal avoidance strategy for a single user. We evaluate the performance of two POMDP solution techniques augmented with the decomposition method for scenarios involving a pedestrian crosswalk and an intersection.

65 citations
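The decomposition idea in the abstract above can be sketched as follows: a value function solved for a single road user is evaluated once per user, and the per-user values are fused to pick one action. The fusion rule is not stated in this listing, so taking the minimum (a worst-case rule) or the sum is an assumption, and `toy_single_q` is a hypothetical stand-in for a solved single-user policy.

```python
ACTIONS = ["brake", "keep", "accelerate"]
SPEED = {"brake": 0.0, "keep": 1.0, "accelerate": 2.0}


def toy_single_q(distance, action):
    """Hypothetical single-user value: reward progress, penalize speed near the user."""
    speed = SPEED[action]
    return speed - 10.0 * speed / distance


def decomposed_action(single_q, entity_states, actions, fuse=min):
    """Choose the action with the best fused value over all road users.

    `fuse` combines the per-user values of one action; `min` treats the
    most dangerous user as decisive, while `sum` adds the values up.
    """
    def fused_value(action):
        return fuse(single_q(state, action) for state in entity_states)
    return max(actions, key=fused_value)


# One user far away and one close by: the close user dominates the decision.
choice_near = decomposed_action(toy_single_q, [50.0, 4.0], ACTIONS)
# Both users far away: progress dominates.
choice_far = decomposed_action(toy_single_q, [50.0, 40.0], ACTIONS)
# The same query with additive fusion instead of the worst-case rule.
choice_sum = decomposed_action(toy_single_q, [50.0, 4.0], ACTIONS, fuse=sum)
```

The cost of this scheme grows linearly with the number of road users, since only the single-user problem is ever solved, at the price of ignoring interactions between users.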


Posted Content
TL;DR: In this paper, two deep reinforcement learning approaches for training decentralized controllers that accommodate the high dimensionality and uncertainty inherent in the problem of forest fire coverage are presented, where aircraft collaborate on a map of the wildfire's state and maintain a time history of locations visited.
Abstract: Teams of autonomous unmanned aircraft can be used to monitor wildfires, enabling firefighters to make informed decisions. However, controlling multiple autonomous fixed-wing aircraft to maximize forest fire coverage is a complex problem. The state space is high dimensional, the fire propagates stochastically, the sensor information is imperfect, and the aircraft must coordinate with each other to accomplish their mission. This work presents two deep reinforcement learning approaches for training decentralized controllers that accommodate the high dimensionality and uncertainty inherent in the problem. The first approach controls the aircraft using immediate observations of the individual aircraft. The second approach allows aircraft to collaborate on a map of the wildfire's state and maintain a time history of locations visited, which are used as inputs to the controller. Simulation results show that both approaches allow the aircraft to accurately track wildfire expansions and outperform an online receding horizon controller. Additional simulations demonstrate that the approach scales with different numbers of aircraft and generalizes to different wildfire shapes.

54 citations


Proceedings ArticleDOI
08 Jan 2018
TL;DR: This paper extends adaptive stress testing (AST) to differential analysis by searching two simulators simultaneously and maximizing the difference between their outcomes. The resulting approach, differential adaptive stress testing (DAST), is applied to compare a prototype of ACAS X against TCAS.
Abstract: The next-generation Airborne Collision Avoidance System (ACAS X) is currently being developed and tested to replace the Traffic Alert and Collision Avoidance System (TCAS) as the next international standard for collision avoidance. To validate the safety of the system, stress testing in simulation is one of several approaches for analyzing near mid-air collisions (NMACs). Understanding how NMACs can occur is important for characterizing risk and informing development of the system. Recently, adaptive stress testing (AST) has been proposed as a way to find the most likely path to a failure event. The simulation-based approach accelerates search by formulating stress testing as a sequential decision process then optimizing it using reinforcement learning. The approach has been successfully applied to stress test a prototype of ACAS X in various simulated aircraft encounters. In some applications, we are not as interested in the system's absolute performance as its performance relative to another system. Such situations arise, for example, during regression testing or when deciding whether a new system should replace an existing system. In our collision avoidance application, we are interested in finding cases where ACAS X fails but TCAS succeeds in resolving a conflict. Existing approaches do not provide an efficient means to perform this type of analysis. This paper extends the AST approach to differential analysis by searching two simulators simultaneously and maximizing the difference between their outcomes. We call this approach differential adaptive stress testing (DAST). We apply DAST to compare a prototype of ACAS X against TCAS and show examples of encounters found by the algorithm.

47 citations


Posted Content
TL;DR: The increasing use of deep neural networks for safety-critical applications, such as autonomous driving and flight control, raises concerns about their safety and reliability. This work-in-progress report gives an overview of work on extending formal verification beyond small systems by devising scalable verification techniques and identifying design choices that make deep learning systems more amenable to verification.
Abstract: The increasing use of deep neural networks for safety-critical applications, such as autonomous driving and flight control, raises concerns about their safety and reliability. Formal verification can address these concerns by guaranteeing that a deep learning system operates as intended, but the state-of-the-art is limited to small systems. In this work-in-progress report we give an overview of our work on mitigating this difficulty, by pursuing two complementary directions: devising scalable verification techniques, and identifying design choices that result in deep learning systems that are more amenable to verification.

Proceedings ArticleDOI
26 Jun 2018
TL;DR: A modified value sensitive design methodology is applied to the development of an autonomous vehicle speed control algorithm for safely navigating an occluded pedestrian crosswalk. The problem is modeled as a partially observable Markov decision process, and dynamic programming is used to compute an optimal policy that controls the longitudinal acceleration of the vehicle based on the belief of a pedestrian crossing.
Abstract: Human drivers navigate the roadways by balancing values such as safety, legality, and mobility. The public will likely judge an autonomous vehicle by similar values. The iterative methodology of value sensitive design formalizes the connection of human values to engineering specifications. We apply a modified value sensitive design methodology to the development of an autonomous vehicle speed control algorithm to safely navigate an occluded pedestrian crosswalk. The first iteration presented here models the problem as a partially observable Markov decision process and uses dynamic programming to compute an optimal policy to control the longitudinal acceleration of the vehicle based on the belief of a pedestrian crossing. The speed control algorithm is then tested in real-time on an experimental vehicle on a closed road course.

Posted Content
TL;DR: Compared with single-agent GAIL policies, policies generated by the PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.
Abstract: Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.

Proceedings ArticleDOI
18 Oct 2018
TL;DR: In this article, the authors extend the two-player game formulation of robust reinforcement learning to a semi-competitive setting and demonstrate that the resulting adversary better captures meaningful disturbances, which leads to better overall performance. The resulting robust policy improves driving efficiency while reducing collision rates compared to baseline control policies produced by traditional reinforcement learning methods.
Abstract: To improve efficiency and reduce failures in autonomous vehicles, research has focused on developing robust and safe learning methods that take into account disturbances in the environment. Existing literature in robust reinforcement learning poses the learning problem as a two player game between the autonomous system and disturbances. This paper examines two different algorithms to solve the game, Robust Adversarial Reinforcement Learning and Neural Fictitious Self Play, and compares performance on an autonomous driving scenario. We extend the game formulation to a semi-competitive setting and demonstrate that the resulting adversary better captures meaningful disturbances that lead to better overall performance. The resulting robust policy exhibits improved driving efficiency while effectively reducing collision rates compared to baseline control policies produced by traditional reinforcement learning methods.

Posted Content
Rui Shu, Hung Bui, Shengjia Zhao, Mykel J. Kochenderfer, Stefano Ermon
TL;DR: In this paper, the authors leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model.
Abstract: The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.

Posted Content
TL;DR: In this paper, a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data is presented, grounded in Koopman theory, which produces stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons.
Abstract: The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.

Proceedings Article
09 Jul 2018
TL;DR: A novel, multi-agent reinforcement learning formulation of multi-object tracking is presented that treats creating, propagating, and terminating object tracks as actions in a sequential decision-making problem, with a shared policy parameterized by a multi-layer neural network.
Abstract: We present a novel, multi-agent reinforcement learning formulation of multi-object tracking that treats creating, propagating, and terminating object tracks as actions in a sequential decision-making problem. In our formulation, each agent tracks a single object at a time by updating a Bayesian filter according to a discrete set of actions. At each timestep, the reward received is dependent on the joint actions taken by all agents and the ground truth object tracks. We optimize for different tracking metrics directly while propagating covariance information about each object's state. We use trust region policy optimization (TRPO) to train a shared policy across all agents, parameterized by a multi-layer neural network. Our experiments show an improvement in tracking accuracy over similar state-of-the-art, rule-based approaches on a popular multi-object tracking dataset.

Proceedings Article
21 Aug 2018
TL;DR: This paper presents an approach that can greatly reduce the complexity of computing the optimal strategy in problems where only some of the dimensions of the problem are controllable.
Abstract: Deciding when and how to avoid collision in stochastic environments requires accounting for the likelihood and relative costs of future sequences of outcomes in response to different sequences of actions. Prior work has investigated formulating the problem as a Markov decision process, discretizing the state space, and solving for the optimal strategy using dynamic programming. Experiments have shown that such an approach can be very effective, but scaling to higher-dimensional problems can be challenging due to the exponential growth of the discrete state space. This paper presents an approach that can greatly reduce the complexity of computing the optimal strategy in problems where only some of the dimensions of the problem are controllable. The approach is demonstrated on an airborne collision avoidance problem where the system must recommend maneuvers to an imperfect pilot.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: In this article, the interaction between human drivers and pedestrians is modeled, and its influence on map estimation is used as a proxy for detection. Incorporating people as sensors into mapping frameworks improves overall environment awareness and outperforms standard mapping techniques.
Abstract: Despite growing attention in autonomy, there are still many open problems, including how autonomous vehicles will interact and communicate with other agents, such as human drivers and pedestrians. Unlike most approaches that focus on pedestrian detection and planning for collision avoidance, this paper considers modeling the interaction between human drivers and pedestrians and how it might influence map estimation, as a proxy for detection. We take a mapping inspired approach and incorporate people as sensors into mapping frameworks. By taking advantage of other agents' actions, we demonstrate how we can impute portions of the map that would otherwise be occluded. We evaluate our framework in human driving experiments and on real-world data, using occupancy grids and landmark-based mapping approaches. Our approach significantly improves overall environment awareness and outperforms standard mapping techniques.

Book ChapterDOI
10 Sep 2018
TL;DR: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability, and proposes maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term.
Abstract: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term. We also give asymptotic guarantees on the exploration effect of the algorithm, regardless of the prior misspecification. We show by various numerical examples that our approach also outperforms existing techniques in the literature in practice.

Posted Content
TL;DR: In this paper, the authors focus on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability, assuming that we only have access to a noise-corrupted version of the function and that function evaluations are costly.
Abstract: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term. We also give asymptotic guarantees on the exploration effect of the algorithm, regardless of the prior misspecification. We show by various numerical examples that our approach also outperforms existing techniques in the literature in practice.

Proceedings Article
01 Jan 2018
TL;DR: In this paper, the authors propose grammar-based decision trees for classifying high-dimensional, heterogeneous time series data and use them to analyze and categorize near mid-air collisions (NMACs) in simulated aircraft encounters.
Abstract: We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.

Proceedings Article
09 Jul 2018
TL;DR: An approach inspired by multi-fidelity optimization is proposed to learn a correction term with a neural network representation. The correction leads to a significant improvement over the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.
Abstract: Decomposition methods have been proposed in the past to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used where each individual entity is considered independently. The individual utility functions are then combined in real time to solve the global problem. Although these techniques can perform well empirically, they sacrifice optimality. This paper proposes an approach inspired from multi-fidelity optimization to learn a correction term with a neural network representation. Learning this correction can significantly improve performance. We demonstrate this approach on a pedestrian avoidance problem for autonomous driving. By leveraging strategies to avoid a single pedestrian, the decomposition method can scale to avoid multiple pedestrians. We verify empirically that the proposed correction method leads to a significant improvement over the decomposition method alone and outperforms a policy trained on the full scale problem without utility decomposition.


Proceedings ArticleDOI
08 Jan 2018
TL;DR: This work presents a system based on a multirotor UAV that significantly outperforms previous methods, localizing the RF source in the same time it takes previous methods to make a single measurement.
Abstract: Localizing radio frequency (RF) sources with an unmanned aerial vehicle (UAV) has many important applications. As a result, UAV-based localization has been the focus of much research. However, previous approaches rely heavily on custom electronics and specialized knowledge, are not robust and require extensive calibration, or are inefficient with measurements and waste energy on a battery-constrained platform. In this work, we present a system based on a multirotor UAV that addresses these shortcomings. Our system measures signal strength received by two antennas to update a probability distribution over possible transmitter locations. An information-theoretic controller is used to direct the UAV's search. Signal strength is measured with low-cost, commercial-off-the-shelf components. We demonstrate our system using three transmitters: a continuous signal in the UHF band, a wildlife collar pulsing in the VHF band, and a cell phone making a voice call over LTE. Our system significantly outperforms previous methods, localizing the RF source in the same time it takes previous methods to make a single measurement.

Posted Content
TL;DR: Adaptive stress testing (AST) is a framework for finding the most likely path to a failure event in simulation. It is simulation-based and does not require internal knowledge of the system, which makes it suitable for black-box testing of large systems.
Abstract: Finding the most likely path to a set of failure states is important to the analysis of safety-critical systems that operate over a sequence of time steps, such as aircraft collision avoidance systems and autonomous cars. In many applications such as autonomous driving, failures cannot be completely eliminated due to the complex stochastic environment in which the system operates. As a result, safety validation is not only concerned about whether a failure can occur, but also discovering which failures are most likely to occur. This article presents adaptive stress testing (AST), a framework for finding the most likely path to a failure event in simulation. We consider a general black box setting for partially observable and continuous-valued systems operating in an environment with stochastic disturbances. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, making it suitable for black-box testing of large systems. We present formulations for fully observable and partially observable systems. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where we are concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision.

Proceedings Article
09 Jul 2018
TL;DR: In this article, the authors extend InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over an extended period of time, by including burn-in demonstrations upon which policies are conditioned at test time.
Abstract: Recent work on imitation learning has generated policies that reproduce expert behavior from multi-modal data. However, past approaches have focused only on recreating a small number of distinct, expert maneuvers, or have relied on supervised learning techniques that produce unstable policies. This work extends InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over an extended period of time. Our approach involves reformulating the typical imitation learning setting to include "burn-in demonstrations" upon which policies are conditioned at test time. We demonstrate that our approach outperforms standard InfoGAIL in maximizing the mutual information between predicted and unseen style labels in road scene simulations, and we show that our method leads to policies that imitate expert autonomous driving systems over long time horizons.

Journal ArticleDOI
01 Jun 2018
TL;DR: By solving for policies over a wide range of problem formulations, this paper is able to provide high-level guidance for manufacturers and regulators on issues relating to the testing of safety-critical systems.
Abstract: Manufacturers of safety-critical systems must make the case that their product is sufficiently safe for public deployment. Much of this case often relies upon critical event outcomes from real-world testing, requiring manufacturers to be strategic about how they allocate testing resources in order to maximize their chances of demonstrating system safety. This paper frames the partially observable and the belief-dependent problem of test scheduling as a Markov decision process, which can be solved efficiently to yield closed-loop manufacturer testing policies. By solving for policies over a wide range of problem formulations, we are able to provide high-level guidance for manufacturers and regulators on issues relating to the testing of safety-critical systems. This guidance spans an array of topics, including circumstances under which manufacturers should continue testing despite observed incidents, when manufacturers should test aggressively, and when regulators should increase or reduce the real-world testing requirements for safety-critical systems.

Proceedings ArticleDOI
01 May 2018
TL;DR: The omnidirectional antenna serves to normalize measurements made by the directional antenna, yielding “pseudo-bearing” measurements, which are less informative than bearing measurements but do not require a full rotation, leading to more measurements and faster localization.
Abstract: Localizing radio frequency (RF) sources is an important application for unmanned aerial vehicles (UAVs). Localization is often carried out by estimating bearing to an RF source, which can be achieved by rotating a directional antenna in place. Multirotor UAVs are well-suited for this sensing modality because they can efficiently rotate in place. However, a full rotation from a single location is needed to account for scale factors affecting the directional antenna's measurements. Although easy to perform, these rotations tend to be slow and delay localization. In this paper, we equip a multirotor UAV with a directional antenna and an omnidirectional antenna. The omnidirectional antenna serves to normalize measurements made by the directional antenna, yielding “pseudo-bearing” measurements. These bearing-like measurements are less informative than bearing measurements but do not require a full rotation, leading to more measurements and faster localization. We validate the normalization with antenna theory and ground tests. Claims of improved localization are validated with simulations and flight tests on a multirotor UAV. Our setup significantly reduces localization time compared to a multirotor UAV equipped with only a directional antenna.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: An encouraging result is presented that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem.
Abstract: A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed by both a theoretical analysis as well as a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.
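Illustrative sketch (not taken from the paper): QMDP plans over the state space by solving the underlying fully observable MDP and then weighting the resulting action values by the current belief. The code below shows where a reduced planning discount factor would enter. The two-state toy problem, the array layout (`T[a, s, s2]`, `R[s, a]`), and the values 0.9 and 0.5 are assumptions chosen for illustration, and the sketch makes no claim about which discount performs better on this toy problem.

```python
import numpy as np

def qmdp_q_values(T, R, gamma, iters=500):
    """Value iteration on the underlying MDP.

    T[a, s, s2] holds transition probabilities and R[s, a] holds rewards.
    Returns Q[s, a] for the given planning discount factor gamma.
    """
    n_actions, n_states, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)
        # Expected next-state value for every (state, action) pair.
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return Q

def qmdp_action(Q, belief):
    """QMDP rule: pick the action maximizing the belief-weighted Q-values."""
    return int(np.argmax(belief @ Q))

if __name__ == "__main__":
    # Hypothetical problem: two absorbing states, action i is rewarded in state i.
    T = np.stack([np.eye(2), np.eye(2)])
    R = np.eye(2)
    belief = np.array([0.8, 0.2])
    for gamma_plan in (0.9, 0.5):  # original discount vs. a reduced one
        Q = qmdp_q_values(T, R, gamma_plan)
        print(gamma_plan, qmdp_action(Q, belief))
```

The only change needed to try a reduced discount is the `gamma` argument passed at planning time; the resulting policy would still be evaluated under the original problem's discount factor.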