Top 53 papers published by Mykel J. Kochenderfer from Stanford University in 2018

Proceedings Article•DOI•

Adaptive Stress Testing for Autonomous Vehicles

Mark Koren¹, Saud Al-Saif¹, Ritchie Lee², Mykel J. Kochenderfer¹•Institutions (2)

Stanford University¹, Carnegie Mellon University²

26 Jun 2018

TL;DR: It is shown that DRL can find more likely failure scenarios than MCTS with fewer calls to the simulator and can be easily applied to other scenarios given the appropriate models of the vehicle and the environment.

Abstract: This paper presents a method for testing the decision making systems of autonomous vehicles. Our approach involves perturbing stochastic elements in the vehicle's environment until the vehicle is involved in a collision. Instead of applying direct Monte Carlo sampling to find collision scenarios, we formulate the problem as a Markov decision process and use reinforcement learning algorithms to find the most likely failure scenarios. This paper presents Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL) solutions that can scale to large environments. We show that DRL can find more likely failure scenarios than MCTS with fewer calls to the simulator. A simulation scenario involving a vehicle approaching a crosswalk is used to validate the framework. Our proposed approach is very general and can be easily applied to other scenarios given the appropriate models of the vehicle and the environment.
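
The abstract frames failure search as a Markov decision process whose objective favors likely disturbances that still end in a collision. The sketch below shows one common way to score an episode under that framing. The Gaussian disturbance model, the `ast_return` helper, and the penalty constant are illustrative assumptions, not the paper's exact reward.

```python
import math
from typing import Sequence


def gaussian_logpdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Log-density of a scalar Gaussian disturbance."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def ast_return(disturbances: Sequence[float], collided: bool,
               final_miss_distance: float, penalty: float = 1e4,
               sigma: float = 1.0) -> float:
    """Return of one episode in an AST-style search (hypothetical helper).

    Every step adds the log-likelihood of the sampled disturbance, so a higher
    return means a more likely disturbance sequence. Episodes that end without
    a collision pay a large penalty that grows with the remaining miss
    distance, which steers the search toward failures.
    """
    total = sum(gaussian_logpdf(d, sigma=sigma) for d in disturbances)
    if not collided:
        total -= penalty * (1.0 + final_miss_distance)
    return total


# Two colliding episodes: the one driven by smaller disturbances scores higher.
likely = ast_return([0.1, -0.2, 0.05], collided=True, final_miss_distance=0.0)
unlikely = ast_return([2.5, -3.0, 2.0], collided=True, final_miss_distance=0.0)
print(likely > unlikely)
```

Either MCTS or a DRL agent can then be used to choose disturbances that maximize this return.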

140 citations

Proceedings Article•

Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces.

Zachary N. Sunberg¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

01 Jan 2018

TL;DR: Two new algorithms, POMCPOW and PFT-DPW, are proposed and evaluated; they use weighted particle filtering to keep the belief representations in the search tree from collapsing to a single particle, and simulation results show that they succeed where previous approaches fail.

Abstract: Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.
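
Double progressive widening limits how many action and observation children a search-tree node may have as a function of how often it has been visited. A minimal sketch of the widening test, assuming the usual form |C| <= k * N^alpha with illustrative constants k = 2 and alpha = 0.5; the `Node` class and `visit` helper are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Node:
    visits: int = 0
    children: List[Any] = field(default_factory=list)


def should_widen(node: Node, k: float, alpha: float) -> bool:
    """Progressive widening test: allow a new child while |C| <= k * N^alpha."""
    return len(node.children) <= k * node.visits ** alpha


def visit(node: Node, sample_child: Callable[[], Any], rng: random.Random,
          k: float = 2.0, alpha: float = 0.5) -> Any:
    """Visit a node, possibly adding a freshly sampled child, then pick one.

    For brevity this picks uniformly among existing children; a real solver
    would use a UCB-style rule for actions and visit counts for observations.
    """
    node.visits += 1
    if should_widen(node, k, alpha):
        node.children.append(sample_child())
    return rng.choice(node.children)


rng = random.Random(0)
root = Node()
for _ in range(100):
    visit(root, lambda: rng.uniform(-1.0, 1.0), rng)  # continuous actions
print(root.visits, len(root.children))  # children grow roughly like 2*sqrt(N)
```

Without widening, every visit in a continuous space would create a new child and no child would ever be revisited; the cap lets value estimates accumulate on a growing but sublinear set of children.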

120 citations

Proceedings Article•

Deep Dynamical Modeling and Control of Unsteady Fluid Flows

Jeremy Morton¹, Antony Jameson¹, Mykel J. Kochenderfer¹, Freddie D. Witherden¹•Institutions (1)

Stanford University¹

18 May 2018

TL;DR: The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons and to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.

Abstract: The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.
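
The paper learns a Koopman-based model with deep networks. As a much simpler stand-in, the sketch below fits a linear operator directly to state snapshots by least squares (the idea behind dynamic mode decomposition), assuming the raw state already serves as the observables. It illustrates the linear evolution and the eigenvalue stability check, not the paper's architecture; the synthetic data replaces CFD output.

```python
import numpy as np


def fit_linear_operator(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares A with Y ≈ A @ X; columns of X are snapshots x_t, of Y are x_{t+1}."""
    return Y @ np.linalg.pinv(X)


def rollout(A: np.ndarray, x0: np.ndarray, steps: int) -> np.ndarray:
    """Propagate x_{t+1} = A x_t and stack the states as columns."""
    xs = [x0]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.stack(xs, axis=1)


# Synthetic data from a damped rotation (a stand-in for CFD snapshots).
theta = 0.1
A_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]])
traj = rollout(A_true, np.array([1.0, 0.0]), 50)
A_hat = fit_linear_operator(traj[:, :-1], traj[:, 1:])
# Eigenvalues inside the unit circle indicate a stable linear model.
print(np.abs(np.linalg.eigvals(A_hat)))
```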

104 citations

Posted Content•

HG-DAgger: Interactive Imitation Learning with Human Experts

Michael A. Kelly¹, Chelsea Sidrane¹, Katherine Driggs-Campbell¹, Mykel J. Kochenderfer²•Institutions (2)

Stanford University¹, University of Illinois at Urbana–Champaign²

05 Oct 2018-arXiv: Robotics

TL;DR: HG-DAgger, a variant of DAgger better suited to interactive imitation learning from human experts in real-world systems, is proposed; in addition to training a novice policy, it learns a safety threshold for a model-uncertainty-based risk metric that can be used to predict the performance of the fully trained novice in different regions of the state space.

Abstract: Imitation learning has proven to be useful for many real-world problems, but approaches such as behavioral cloning suffer from data mismatch and compounding error issues. One attempt to address these limitations is the DAgger algorithm, which uses the state distribution induced by the novice to sample corrective actions from the expert. Such sampling schemes, however, require the expert to provide action labels without being fully in control of the system. This can decrease safety and, when using humans as experts, is likely to degrade the quality of the collected labels due to perceived actuator lag. In this work, we propose HG-DAgger, a variant of DAgger that is more suitable for interactive imitation learning from human experts in real-world systems. In addition to training a novice policy, HG-DAgger also learns a safety threshold for a model-uncertainty-based risk metric that can be used to predict the performance of the fully trained novice in different regions of the state space. We evaluate our method on both a simulated and real-world autonomous driving task, and demonstrate improved performance over both DAgger and behavioral cloning.
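
The distinguishing idea is that the human takes full control when they judge the situation unsafe, and corrective labels are collected only during those interventions. The sketch below shows that gating on a hypothetical 1-D toy system; the dynamics, both policies, and the fixed gate threshold stand in for a real vehicle and a human expert, and the learned risk threshold and retraining step are omitted.

```python
from typing import Callable, List, Tuple


def hg_dagger_rollout(x0: float,
                      novice: Callable[[float], float],
                      expert: Callable[[float], float],
                      gate: Callable[[float], bool],
                      steps: int = 60,
                      dt: float = 0.1) -> Tuple[List[Tuple[float, float]], List[float]]:
    """One human-gated rollout on a toy 1-D system x_{t+1} = x_t + dt * a.

    The novice drives while gate(x) is False. When the gate fires, the expert
    has full control, and only those expert-controlled (state, action) pairs
    are recorded for aggregation into the training set.
    """
    data: List[Tuple[float, float]] = []
    xs = [x0]
    x = x0
    for _ in range(steps):
        if gate(x):
            a = expert(x)
            data.append((x, a))  # label gathered while the expert is in control
        else:
            a = novice(x)  # no label: the expert is not driving
        x = x + dt * a
        xs.append(x)
    return data, xs


# Hypothetical setup: a drifting novice, a centering expert, and a human who
# intervenes whenever the lateral offset exceeds 1.0.
data, xs = hg_dagger_rollout(0.0, novice=lambda x: 0.5, expert=lambda x: -x,
                             gate=lambda x: abs(x) > 1.0)
print(len(data), max(xs))
```

Because the expert is actually driving whenever a label is recorded, the labels do not suffer from the perceived actuator lag that the abstract attributes to standard DAgger.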

87 citations

Journal Article•DOI•

Deep Neural Network Compression for Aircraft Collision Avoidance Systems

Kyle D. Julian¹, Mykel J. Kochenderfer¹, Michael P. Owen²•Institutions (2)

Stanford University¹, Massachusetts Institute of Technology²

09 Oct 2018-arXiv: Learning

TL;DR: A deep neural network is trained to approximate the very large numeric table that encodes the ACAS X collision avoidance logic, preserving the relative preferences among advisories while reducing the required storage by a factor of 1000 so that the system can run on current avionics.

Abstract: One approach to designing decision making logic for an aircraft collision avoidance system frames the problem as a Markov decision process and optimizes the system using dynamic programming. The resulting collision avoidance strategy can be represented as a numeric table. This methodology has been used in the development of the Airborne Collision Avoidance System X (ACAS X) family of collision avoidance systems for manned and unmanned aircraft, but the high dimensionality of the state space leads to very large tables. To improve storage efficiency, a deep neural network is used to approximate the table. With the use of an asymmetric loss function and a gradient descent algorithm, the parameters for this network can be trained to provide accurate estimates of table values while preserving the relative preferences of the possible advisories for each state. By training multiple networks to represent subtables, the network also decreases the required runtime for computing the collision avoidance advisory. Simulation studies show that the network improves the safety and efficiency of the collision avoidance system. Because only the network parameters need to be stored, the required storage space is reduced by a factor of 1000, enabling the collision avoidance system to operate using current avionics systems.
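
The abstract mentions an asymmetric loss that keeps table values accurate while preserving the ranking of advisories. One plausible form, assumed here rather than taken from the paper, up-weights the errors that could change which advisory ranks best: underestimating the table's best advisory or overestimating any other. Scores are assumed to be "higher is better", and `weight=40.0` is an arbitrary illustrative value.

```python
import numpy as np


def asymmetric_mse(pred: np.ndarray, target: np.ndarray, weight: float = 40.0) -> float:
    """Asymmetric squared error over advisory scores (hypothetical form).

    pred, target: arrays of shape (batch, n_advisories); the table's best
    advisory in each row is the argmax of target. Errors that could flip the
    ranking (the best advisory underestimated, or another one overestimated)
    are scaled by `weight`; all other errors get weight 1.
    """
    err = pred - target
    best = np.zeros(target.shape, dtype=bool)
    best[np.arange(target.shape[0]), np.argmax(target, axis=1)] = True
    harmful = (best & (err < 0)) | (~best & (err > 0))
    w = np.where(harmful, weight, 1.0)
    return float(np.mean(w * err ** 2))


target = np.array([[1.0, 0.0]])
print(asymmetric_mse(np.array([[0.9, 0.0]]), target),   # best advisory underestimated
      asymmetric_mse(np.array([[1.1, 0.0]]), target))   # best advisory overestimated
```

The two predictions have the same absolute error, but only the first risks promoting a worse advisory, so only the first is penalized heavily.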

85 citations

Proceedings Article•DOI•

Multi-Agent Imitation Learning for Driving Simulation

Raunak P. Bhattacharyya¹, Derek J. Phillips¹, Blake Wulfe¹, Jeremy Morton¹, Alex Kuefler, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

01 Oct 2018

TL;DR: The authors extend Generative Adversarial Imitation Learning (GAIL) to multi-agent driving through a parameter-sharing approach grounded in curriculum learning, and show that the resulting PS-GAIL policies interact more stably in a multi-agent setting and better capture the emergent behavior of human drivers than single-agent GAIL policies.

Abstract: Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.

78 citations

Proceedings Article•

Amortized Inference Regularization

Rui Shu¹, Hung Bui², Shengjia Zhao¹, Mykel J. Kochenderfer¹, Stefano Ermon¹•Institutions (2)

Stanford University¹, Google²

23 May 2018

TL;DR: This paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.

Abstract: The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests preferring an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
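
AIR controls the smoothness of the inference model. One way to picture this, assumed here for illustration rather than taken from the paper's exact objective, is to replace the encoder with its average over Gaussian input noise, which is the intuition behind training the encoder on noise-corrupted inputs. The sketch computes that smoothed output for any callable `encoder`; the noise level and sample count are arbitrary.

```python
from typing import Callable, Optional

import numpy as np


def smoothed_encoder(encoder: Callable[[np.ndarray], np.ndarray], x: np.ndarray,
                     noise_std: float = 0.1, n_samples: int = 2000,
                     rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Monte Carlo estimate of E[encoder(x + eps)] with eps ~ N(0, noise_std^2 I).

    Averaging over input noise yields a smoother function of x than the raw
    encoder, which is the effect a smoothness regularizer aims for.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.normal(0.0, noise_std, size=(n_samples,) + x.shape)
    outputs = np.stack([encoder(x + e) for e in eps])
    return outputs.mean(axis=0)


# |x| has a kink at 0; its smoothed version is rounded there and strictly positive.
print(smoothed_encoder(np.abs, np.array([0.0])))
```

For a linear encoder the smoothing changes nothing in expectation; it only matters where the encoder bends sharply, which is where an overly expressive inference model can overfit.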

67 citations

Proceedings Article•DOI•

Scalable Decision Making with Sensor Occlusions for Autonomous Driving

Maxime Bouton¹, Alireza Nakhaei², Kikuo Fujimura², Mykel J. Kochenderfer¹•Institutions (2)

Stanford University¹, Honda²

21 May 2018

TL;DR: A decomposition method is demonstrated that leverages the optimal avoidance strategy for a single user in a partially observable Markov decision process to bypass the computational cost of scaling the formulation to avoiding multiple road users.

Abstract: Autonomous driving in urban areas requires avoiding other road users with only partial observability of the environment. Observations are only partial because obstacles can occlude the field of view of the sensors. The problem of robust and efficient navigation under uncertainty can be framed as a partially observable Markov decision process (POMDP). In order to bypass the computational cost of scaling the formulation to avoiding multiple road users, this paper demonstrates a decomposition method that leverages the optimal avoidance strategy for a single user. We evaluate the performance of two POMDP solution techniques augmented with the decomposition method for scenarios involving a pedestrian crosswalk and an intersection.
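
The decomposition evaluates a single-user policy once per road user and then combines the results. The abstract does not state the combination rule; taking the minimum over users, so that the most threatening road user dominates each action's value, is one common choice and is assumed here. The action values are made up.

```python
import numpy as np


def fused_action_values(single_user_q: np.ndarray) -> np.ndarray:
    """Combine per-road-user action values with a min (assumed fusion rule).

    single_user_q has shape (n_users, n_actions); row i holds Q(b_i, a), the
    single-user POMDP's action values evaluated on road user i's belief b_i.
    """
    return single_user_q.min(axis=0)


def select_action(single_user_q: np.ndarray) -> int:
    return int(np.argmax(fused_action_values(single_user_q)))


# Hypothetical values for actions [accelerate, maintain, brake] and two pedestrians.
q = np.array([[1.0, 5.0, 3.0],
              [4.0, -2.0, 2.5]])
print(select_action(q))  # 2: braking is the best action in the worst case
```

The cost of this step grows linearly with the number of road users, whereas solving the joint POMDP grows exponentially.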

65 citations

Posted Content•

Distributed Wildfire Surveillance with Autonomous Aircraft using Deep Reinforcement Learning

Kyle D. Julian¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

09 Oct 2018-arXiv: Robotics

TL;DR: Two deep reinforcement learning approaches are presented for training decentralized controllers that handle the high dimensionality and uncertainty of forest fire coverage; in the second approach, aircraft collaborate on a map of the wildfire's state and maintain a time history of locations visited.

Abstract: Teams of autonomous unmanned aircraft can be used to monitor wildfires, enabling firefighters to make informed decisions. However, controlling multiple autonomous fixed-wing aircraft to maximize forest fire coverage is a complex problem. The state space is high dimensional, the fire propagates stochastically, the sensor information is imperfect, and the aircraft must coordinate with each other to accomplish their mission. This work presents two deep reinforcement learning approaches for training decentralized controllers that accommodate the high dimensionality and uncertainty inherent in the problem. The first approach controls the aircraft using immediate observations of the individual aircraft. The second approach allows aircraft to collaborate on a map of the wildfire's state and maintain a time history of locations visited, which are used as inputs to the controller. Simulation results show that both approaches allow the aircraft to accurately track wildfire expansions and outperform an online receding horizon controller. Additional simulations demonstrate that the approach scales with different numbers of aircraft and generalizes to different wildfire shapes.

54 citations

Proceedings Article•DOI•

Differential Adaptive Stress Testing of Airborne Collision Avoidance Systems

Ritchie Lee¹, Ole J. Mengshoel¹, Anshu Saksena², Ryan W. Gardner², Daniel Genin², Jeffrey S. Brush², Mykel J. Kochenderfer³•Institutions (3)

Carnegie Mellon University¹, Johns Hopkins University Applied Physics Laboratory², Stanford University³

08 Jan 2018

TL;DR: This paper presents a scalable method to efficiently search for the most likely state trajectory leading to an event given only a simulator of a system using Monte Carlo Tree Search (MCTS), and presents results for both single and multi-threat encounters.

Abstract: The next-generation Airborne Collision Avoidance System (ACAS X) is currently being developed and tested to replace the Traffic Alert and Collision Avoidance System (TCAS) as the next international standard for collision avoidance. To validate the safety of the system, stress testing in simulation is one of several approaches for analyzing near mid-air collisions (NMACs). Understanding how NMACs can occur is important for characterizing risk and informing development of the system. Recently, adaptive stress testing (AST) has been proposed as a way to find the most likely path to a failure event. The simulation-based approach accelerates search by formulating stress testing as a sequential decision process and then optimizing it using reinforcement learning. The approach has been successfully applied to stress test a prototype of ACAS X in various simulated aircraft encounters. In some applications, we are less interested in the system's absolute performance than in its performance relative to another system. Such situations arise, for example, during regression testing or when deciding whether a new system should replace an existing system. In our collision avoidance application, we are interested in finding cases where ACAS X fails but TCAS succeeds in resolving a conflict. Existing approaches do not provide an efficient means to perform this type of analysis. This paper extends the AST approach to differential analysis by searching two simulators simultaneously and maximizing the difference between their outcomes. We call this approach differential adaptive stress testing (DAST). We apply DAST to compare a prototype of ACAS X against TCAS and show examples of encounters found by the algorithm.
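
DAST replays the same disturbances in two simulators and rewards divergent outcomes. The scoring function below is a minimal sketch under assumed conventions: two hypothetical simulator callables that return a failure flag, a log-probability for each disturbance, and a fixed bonus when system A fails but system B does not (the ACAS X versus TCAS case described above).

```python
from typing import Callable, Sequence


def dast_score(disturbances: Sequence[float],
               sim_a: Callable[[Sequence[float]], bool],
               sim_b: Callable[[Sequence[float]], bool],
               log_prob: Callable[[float], float],
               bonus: float = 1e3) -> float:
    """Score one disturbance sequence replayed in two simulators (hypothetical form).

    sim_a and sim_b return True if their system fails (e.g. an NMAC). The score
    favors likely disturbances and adds a large bonus only when system A
    fails while system B resolves the same encounter.
    """
    score = sum(log_prob(d) for d in disturbances)
    if sim_a(disturbances) and not sim_b(disturbances):
        score += bonus
    return score


# Toy systems: A fails once the summed disturbance exceeds 1, B only beyond 2.
sim_a = lambda ds: sum(ds) > 1.0
sim_b = lambda ds: sum(ds) > 2.0
log_prob = lambda d: -0.5 * d * d
print(dast_score([0.8, 0.7], sim_a, sim_b, log_prob))  # only A fails: bonus applies
```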

47 citations

Posted Content•

Toward Scalable Verification for Safety-Critical Deep Networks

Lindsey Kuper, Guy Katz, Justin Gottschlich, Kyle D. Julian, Clark Barrett, Mykel J. Kochenderfer

18 Jan 2018-arXiv: Artificial Intelligence

TL;DR: Because deep neural networks are increasingly used in safety-critical applications such as autonomous driving and flight control, this work-in-progress report outlines efforts to make formal verification scale to such networks, both by devising scalable verification techniques and by identifying design choices that make deep learning systems more amenable to verification.

Abstract: The increasing use of deep neural networks for safety-critical applications, such as autonomous driving and flight control, raises concerns about their safety and reliability. Formal verification can address these concerns by guaranteeing that a deep learning system operates as intended, but the state-of-the-art is limited to small systems. In this work-in-progress report we give an overview of our work on mitigating this difficulty, by pursuing two complementary directions: devising scalable verification techniques, and identifying design choices that result in deep learning systems that are more amenable to verification.

Proceedings Article•DOI•

Value Sensitive Design for Autonomous Vehicle Motion Planning

Sarah M. Thornton¹, Francis E. Lewis¹, Vivian Zhang¹, Mykel J. Kochenderfer¹, J. Christian Gerdes¹•Institutions (1)

Stanford University¹

26 Jun 2018

TL;DR: A modified value sensitive design methodology is applied to the development of an autonomous vehicle speed control algorithm that safely navigates an occluded pedestrian crosswalk; the problem is modeled as a partially observable Markov decision process, and dynamic programming is used to compute an optimal policy for the vehicle's longitudinal acceleration based on the belief that a pedestrian is crossing.

Abstract: Human drivers navigate the roadways by balancing values such as safety, legality, and mobility. The public will likely judge an autonomous vehicle by similar values. The iterative methodology of value sensitive design formalizes the connection of human values to engineering specifications. We apply a modified value sensitive design methodology to the development of an autonomous vehicle speed control algorithm to safely navigate an occluded pedestrian crosswalk. The first iteration presented here models the problem as a partially observable Markov decision process and uses dynamic programming to compute an optimal policy to control the longitudinal acceleration of the vehicle based on the belief of a pedestrian crossing. The speed control algorithm is then tested in real-time on an experimental vehicle on a closed road course.
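
The policy acts on the belief that a pedestrian is crossing. A minimal Bayes update for that binary belief, assuming a simple detector model with made-up detection and false-alarm probabilities; occlusion could be represented by lowering `p_detect_if_crossing` while the crosswalk is hidden.

```python
def update_crossing_belief(prior: float, observed: bool,
                           p_detect_if_crossing: float = 0.8,
                           p_false_alarm: float = 0.05) -> float:
    """Bayes update of P(pedestrian crossing) after one detector reading.

    observed: whether the (hypothetical) detector reported a pedestrian.
    """
    if observed:
        like_crossing, like_not = p_detect_if_crossing, p_false_alarm
    else:
        like_crossing, like_not = 1.0 - p_detect_if_crossing, 1.0 - p_false_alarm
    numerator = like_crossing * prior
    return numerator / (numerator + like_not * (1.0 - prior))


belief = 0.5
belief = update_crossing_belief(belief, observed=True)
print(round(belief, 3))  # detection raises the belief sharply
```

A dynamic-programming policy over a discretized belief can then map this value, together with the vehicle's position and speed, to a longitudinal acceleration.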

Posted Content•

Multi-Agent Imitation Learning for Driving Simulation

Raunak P. Bhattacharyya¹, Derek J. Phillips¹, Blake Wulfe¹, Jeremy Morton¹, Alex Kuefler, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

02 Mar 2018-arXiv: Artificial Intelligence

TL;DR: Compared with single-agent GAIL policies, policies generated by the PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.

Abstract: Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.

Proceedings Article•DOI•

Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning

Xiaobai Ma¹, Katherine Driggs-Campbell¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

18 Oct 2018

TL;DR: The authors extend the two-player robust reinforcement learning game to a semi-competitive setting and show that the resulting adversary better captures meaningful disturbances, yielding a policy with better driving efficiency and lower collision rates than baseline policies produced by traditional reinforcement learning methods.

Abstract: To improve efficiency and reduce failures in autonomous vehicles, research has focused on developing robust and safe learning methods that take into account disturbances in the environment. Existing literature in robust reinforcement learning poses the learning problem as a two-player game between the autonomous system and disturbances. This paper examines two different algorithms to solve the game, Robust Adversarial Reinforcement Learning and Neural Fictitious Self Play, and compares performance on an autonomous driving scenario. We extend the game formulation to a semi-competitive setting and demonstrate that the resulting adversary better captures meaningful disturbances that lead to better overall performance. The resulting robust policy exhibits improved driving efficiency while effectively reducing collision rates compared to baseline control policies produced by traditional reinforcement learning methods.

Posted Content•

Amortized Inference Regularization.

Rui Shu¹, Hung Bui², Shengjia Zhao¹, Mykel J. Kochenderfer¹, Stefano Ermon¹•Institutions (2)

Stanford University¹, Google²

23 May 2018-arXiv: Machine Learning

TL;DR: In this paper, the authors leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model.

Abstract: The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests preferring an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.

Posted Content•

Deep Dynamical Modeling and Control of Unsteady Fluid Flows

Jeremy Morton, Freddie D. Witherden, Antony Jameson, Mykel J. Kochenderfer

18 May 2018-arXiv: Computational Engineering, Finance, and Science

TL;DR: In this paper, a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data is presented, grounded in Koopman theory, which produces stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons.

Abstract: The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.

Proceedings Article•

Multi-Agent Reinforcement Learning for Multi-Object Tracking

Pol Rosello¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

09 Jul 2018

TL;DR: A novel multi-agent reinforcement learning formulation of multi-object tracking treats creating, propagating, and terminating object tracks as actions in a sequential decision-making problem, with a single policy shared across all agents and parameterized by a multi-layer neural network.

Abstract: We present a novel, multi-agent reinforcement learning formulation of multi-object tracking that treats creating, propagating, and terminating object tracks as actions in a sequential decision-making problem. In our formulation, each agent tracks a single object at a time by updating a Bayesian filter according to a discrete set of actions. At each timestep, the reward received depends on the joint actions taken by all agents and the ground truth object tracks. We optimize for different tracking metrics directly while propagating covariance information about each object's state. We use trust region policy optimization (TRPO) to train a shared policy across all agents, parameterized by a multi-layer neural network. Our experiments show an improvement in tracking accuracy over similar state-of-the-art, rule-based approaches on a popular multi-object tracking dataset.
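
Each agent maintains a Bayesian filter for one object and chooses discrete actions that act on that filter. The abstract does not name the filter; the sketch assumes a linear Kalman filter with a constant-velocity model and shows how "propagate" and "update" actions could change a track's mean and covariance. The action names and model matrices are hypothetical, and creating or terminating tracks is left out.

```python
import numpy as np


def kalman_predict(x, P, F, Q):
    """Propagate a track's mean and covariance one time step."""
    return F @ x, F @ P @ F.T + Q


def kalman_update(x, P, z, H, R):
    """Fuse a detection z into the track estimate."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new


def step_track(action, x, P, z, F, Q, H, R):
    """Apply one track-level action from a hypothetical action set."""
    if action == "propagate":           # coast without a detection
        return kalman_predict(x, P, F, Q)
    if action == "update":              # predict, then fuse the assigned detection
        x, P = kalman_predict(x, P, F, Q)
        return kalman_update(x, P, z, H, R)
    raise ValueError(f"unsupported action: {action}")


# Constant-velocity model: state = [position, velocity], position measured.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x0, P0 = np.zeros(2), np.eye(2)
_, P_coast = step_track("propagate", x0, P0, None, F, Q, H, R)
x_upd, P_upd = step_track("update", x0, P0, np.array([1.0]), F, Q, H, R)
print(np.trace(P_coast), np.trace(P_upd))  # the update shrinks the covariance
```

Because the covariance is carried along with the mean, a learned policy can condition on how uncertain each track is when deciding whether to coast, update, or terminate it.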

Proceedings Article•

Partially-Controlled Markov Decision Processes for Collision Avoidance Systems

Mykel J. Kochenderfer¹, James P. Chryssanthacopoulos²•Institutions (2)

Stanford University¹, University of Chicago²

21 Aug 2018

TL;DR: This paper presents an approach that can greatly reduce the complexity of computing the optimal strategy in problems where only some of the dimensions of the problem are controllable.

Abstract: Deciding when and how to avoid collision in stochastic environments requires accounting for the likelihood and relative costs of future sequences of outcomes in response to different sequences of actions. Prior work has investigated formulating the problem as a Markov decision process, discretizing the state space, and solving for the optimal strategy using dynamic programming. Experiments have shown that such an approach can be very effective, but scaling to higher-dimensional problems can be challenging due to the exponential growth of the discrete state space. This paper presents an approach that can greatly reduce the complexity of computing the optimal strategy in problems where only some of the dimensions of the problem are controllable. The approach is demonstrated on an airborne collision avoidance problem where the system must recommend maneuvers to an imperfect pilot.
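
When only some state dimensions respond to the system's actions, the uncontrolled dimensions can be handled outside the dynamic program. One way to exploit this, assumed here for illustration: solve a smaller problem over the controlled dimensions once for each value of an uncontrolled variable (for example, a time until closest approach), then weight those utilities online by that variable's probability distribution. The names, shapes, and numbers below are hypothetical.

```python
import numpy as np


def expected_action_utilities(U_by_tau: np.ndarray, p_tau: np.ndarray) -> np.ndarray:
    """Weight precomputed utilities by the distribution of the uncontrolled variable.

    U_by_tau: shape (n_tau, n_actions); row k holds the action utilities for
        the current controlled state if the uncontrolled variable equals tau_k
        (computed offline by dynamic programming over the controlled dimensions).
    p_tau: shape (n_tau,); nonnegative weights for each tau_k, normalized here.
    """
    p = p_tau / p_tau.sum()
    return p @ U_by_tau


# Hypothetical utilities for actions [no alert, climb].
U_by_tau = np.array([[0.0, -10.0],     # conflict far away: alerting only costs
                     [-100.0, -2.0]])  # conflict imminent: not alerting is costly
u = expected_action_utilities(U_by_tau, np.array([0.9, 0.1]))
print(u, int(np.argmax(u)))
```

Even a 10% chance of an imminent conflict makes the climb advisory preferable here, and the offline table only has to cover the controlled dimensions for each tau rather than the full joint state space.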

Proceedings Article•DOI•

People as Sensors: Imputing Maps from Human Actions

Oladapo Afolabi¹, Katherine Driggs-Campbell², Roy Dong¹, Mykel J. Kochenderfer², S. Shankar Sastry¹•Institutions (2)

University of California, Berkeley¹, Stanford University²

01 Oct 2018

TL;DR: The interaction between human drivers and pedestrians is modeled, and its influence on map estimation is used as a proxy for detection; treating people as sensors lets the framework impute otherwise occluded portions of the map, which improves overall environment awareness and outperforms standard mapping techniques.

Abstract: Despite growing attention to autonomy, there are still many open problems, including how autonomous vehicles will interact and communicate with other agents, such as human drivers and pedestrians. Unlike most approaches that focus on pedestrian detection and planning for collision avoidance, this paper considers modeling the interaction between human drivers and pedestrians and how it might influence map estimation, as a proxy for detection. We take a mapping-inspired approach and incorporate people as sensors into mapping frameworks. By taking advantage of other agents' actions, we demonstrate how we can impute portions of the map that would otherwise be occluded. We evaluate our framework in human driving experiments and on real-world data, using occupancy grids and landmark-based mapping approaches. Our approach significantly improves overall environment awareness and outperforms standard mapping techniques.
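
Treating people as sensors means an observed human action (say, a driver yielding at a crosswalk) is converted into evidence about cells the vehicle cannot see. The sketch uses the standard log-odds occupancy-grid update with a hypothetical inverse sensor model that maps the observed action to an occupancy probability of 0.7 for the occluded cells; the paper's actual models are not reproduced here.

```python
import numpy as np


def logit(p: float) -> float:
    return float(np.log(p / (1.0 - p)))


def update_cells(log_odds: np.ndarray, cells, p_occupied: float) -> np.ndarray:
    """Log-odds occupancy update for the listed cells (prior 0.5, i.e. log-odds 0)."""
    out = log_odds.copy()
    out[cells] += logit(p_occupied)
    return out


def occupancy_probability(log_odds: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-log_odds))


grid = np.zeros(5)             # five cells, all unknown (p = 0.5)
occluded_crosswalk = [2, 3]    # cells hidden from the vehicle's own sensors
# A human driver is seen yielding twice; each observation is treated as evidence.
for _ in range(2):
    grid = update_cells(grid, occluded_crosswalk, p_occupied=0.7)
print(occupancy_probability(grid).round(3))
```

Cells the human behavior says nothing about stay at the 0.5 prior, so the inferred evidence only fills gaps the vehicle's sensors leave.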

Book Chapter•DOI•

Robust Super-Level Set Estimation Using Gaussian Processes

Andrea Zanette¹, Junzi Zhang¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

10 Sep 2018

TL;DR: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability, and proposes maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term.

Abstract: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term. We also give asymptotic guarantees on the exploration effect of the algorithm, regardless of the prior misspecification. We show by various numerical examples that our approach also outperforms existing techniques in the literature in practice.
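
The paper's acquisition rule maximizes the expected volume of the super-level set predicted by a Gaussian process, with a variance term for robustness. The sketch below is a simpler, swapped-in heuristic that shares the same ingredients: a GP posterior, a confidence-adjusted test mean - beta * std > threshold, and a query at the most uncertain unresolved point. The kernel, its length scale, beta, and the test function are all assumptions.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, length: float = 0.2) -> np.ndarray:
    """Squared-exponential kernel with unit signal variance on 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, length=0.2):
    """Posterior mean and standard deviation of a zero-mean GP regression."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    Ks = rbf(x_train, x_test, length)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))


def classify_and_query(mean, std, threshold, beta=2.0):
    """Label points confidently above the threshold and pick the next query.

    A point counts as above if mean - beta*std > threshold. Among points that
    are neither confidently above nor confidently below, the one with the
    largest predictive std is queried next (a simple heuristic).
    """
    above = mean - beta * std > threshold
    below = mean + beta * std < threshold
    ambiguous = ~(above | below)
    if not ambiguous.any():
        return above, None
    return above, int(np.argmax(np.where(ambiguous, std, -np.inf)))


x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + 0.05 * np.random.default_rng(0).normal(size=15)
x_test = np.array([0.25, 0.75, 3.0])
mean, std = gp_posterior(x_train, y_train, x_test)
above, next_idx = classify_and_query(mean, std, threshold=0.5)
print(above, next_idx)
```

The point at x = 3.0 lies far from every observation, so its posterior is close to the prior and it is the one left unresolved and queried next.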

Posted Content•

Robust Super-Level Set Estimation using Gaussian Processes

Andrea Zanette¹, Junzi Zhang¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

25 Nov 2018-arXiv: Machine Learning

TL;DR: In this paper, the authors focus on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability, assuming that we only have access to a noise-corrupted version of the function and that function evaluations are costly.

Abstract: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term. We also give asymptotic guarantees on the exploration effect of the algorithm, regardless of the prior misspecification. We show by various numerical examples that our approach also outperforms existing techniques in the literature in practice.

Proceedings Article•

Interpretable Categorization of Heterogeneous Time Series Data

Ritchie Lee¹, Mykel J. Kochenderfer², Ole J. Mengshoel¹, Joshua Silbermann³•Institutions (3)

Carnegie Mellon University¹, Stanford University², Johns Hopkins University³

01 Jan 2018

TL;DR: In this paper, the authors propose grammar-based decision trees for classifying high-dimensional, heterogeneous time series data and use them to categorize near mid-air collisions (NMACs) in simulated aircraft encounters.

Abstract: We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.
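
The sketch below makes the idea of a grammar-derived decision expression concrete by choosing a single tree node. A tiny hypothetical context-free grammar generates temporal expressions over time series. Each expression is evaluated as a Boolean split, and the one with the lowest weighted Gini impurity is kept. The grammar, the variables `sep` and `vrate`, and the data are illustrative assumptions. Exhaustive enumeration stands in for the search-based learning algorithms the paper evaluates.

```python
import itertools

# Hypothetical toy grammar (not the paper's grammar):
#   Expr     -> Temporal "(" Var Cmp Const ")"
#   Temporal -> "always" | "eventually"
#   Var      -> "sep" | "vrate"
#   Cmp      -> "<" | ">"
#   Const    -> 100 | 500
GRAMMAR = {
    "Temporal": ["always", "eventually"],
    "Var": ["sep", "vrate"],
    "Cmp": ["<", ">"],
    "Const": [100, 500],
}

def derive():
    """Enumerate every expression the toy grammar can derive, as (temporal, var, cmp, const)."""
    return list(itertools.product(*GRAMMAR.values()))

def evaluate(expr, series):
    """Evaluate a derived expression on one time series (a dict of variable -> samples)."""
    temporal, var, cmp, const = expr
    hits = [(v < const) if cmp == "<" else (v > const) for v in series[var]]
    return all(hits) if temporal == "always" else any(hits)

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_impurity(expr, data):
    """Weighted Gini impurity of the true/false partition induced by expr."""
    left = [label for series, label in data if evaluate(expr, series)]
    right = [label for series, label in data if not evaluate(expr, series)]
    n = len(data)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(data):
    return min(derive(), key=lambda expr: split_impurity(expr, data))

def render(expr):
    temporal, var, cmp, const = expr
    return f"{temporal}({var} {cmp} {const})"

# Hypothetical encounters: separation and vertical rate over time; label 1 = NMAC.
data = [
    ({"sep": [900, 400, 80], "vrate": [0, 10, 20]}, 1),
    ({"sep": [800, 300, 90], "vrate": [5, 5, 5]}, 1),
    ({"sep": [900, 700, 600], "vrate": [0, 10, 20]}, 0),
    ({"sep": [1000, 800, 550], "vrate": [5, 5, 5]}, 0),
]
print(render(best_split(data)))  # → always(sep > 100)
```

A full tree would apply `best_split` recursively to each side of the partition. The rendered expression at each node gives a domain expert a readable explanation.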

Proceedings Article•

Utility Decomposition with Deep Corrections for Scalable Planning under Uncertainty

Maxime Bouton¹, Kyle D. Julian¹, Alireza Nakhaei², Kikuo Fujimura², Mykel J. Kochenderfer¹•Institutions (2)

Stanford University¹, Honda²

09 Jul 2018

TL;DR: This paper proposes an approach, inspired by multi-fidelity optimization, that learns a correction term with a neural network representation; the correction leads to a significant improvement over the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.

Abstract: Decomposition methods have been proposed in the past to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used where each individual entity is considered independently. The individual utility functions are then combined in real time to solve the global problem. Although these techniques can perform well empirically, they sacrifice optimality. This paper proposes an approach inspired by multi-fidelity optimization to learn a correction term with a neural network representation. Learning this correction can significantly improve performance. We demonstrate this approach on a pedestrian avoidance problem for autonomous driving. By leveraging strategies to avoid a single pedestrian, the decomposition method can scale to avoid multiple pedestrians. We verify empirically that the proposed correction method leads to a significant improvement over the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.
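
Utility decomposition combines per-entity utilities with a fusion function, and this paper adds a learned correction on top. The sketch below assumes a hypothetical single-pedestrian utility `q_single` and uses `min` as the fusion function. A hand-written `crowd_correction` stands in for the neural-network correction the paper learns.

```python
from typing import Callable, Sequence

ACTIONS = ("accelerate", "maintain", "brake")

def q_single(dist: float, action: str) -> float:
    """Hypothetical utility for avoiding ONE pedestrian at distance `dist` (metres)."""
    risk = {"accelerate": 3.0, "maintain": 1.0, "brake": 0.0}[action]
    progress = {"accelerate": 1.0, "maintain": 0.5, "brake": 0.0}[action]
    return progress - risk * max(0.0, 10.0 - dist) / 10.0

def q_decomposed(dists: Sequence[float], action: str) -> float:
    """Utility decomposition: fuse per-pedestrian utilities (here with min)."""
    return min(q_single(d, action) for d in dists)

Correction = Callable[[Sequence[float], str], float]

def q_corrected(dists: Sequence[float], action: str, correction: Correction) -> float:
    """Decomposed utility plus a correction term (learned by a network in the paper)."""
    return q_decomposed(dists, action) + correction(dists, action)

def greedy(q: Callable[[Sequence[float], str], float], dists: Sequence[float]) -> str:
    return max(ACTIONS, key=lambda a: q(dists, a))

# Hand-written stand-in for the learned correction: discourage accelerating near crowds.
def crowd_correction(dists: Sequence[float], action: str) -> float:
    return -0.5 * (len(dists) - 1) if action == "accelerate" else 0.0

dists = [20.0, 9.0]
print(greedy(q_decomposed, dists))  # → accelerate
print(greedy(lambda d, a: q_corrected(d, a, crowd_correction), dists))  # → maintain
```

In this toy setup, the uncorrected decomposition accelerates because each pedestrian is scored independently. The correction term captures an effect of having several pedestrians that the independent per-pedestrian utilities cannot express.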

Proceedings Article•DOI•

Autonomous Distributed Wildfire Surveillance using Deep Reinforcement Learning

Kyle D. Julian¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

08 Jan 2018

Proceedings Article•DOI•

Efficient and Low-cost Localization of Radio Signals with a Multirotor UAV

Louis Dressel¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

08 Jan 2018

TL;DR: This work presents a system based on a multirotor UAV that significantly outperforms previous methods, localizing the RF source in the same time it takes previous methods to make a single measurement.

Abstract: Localizing radio frequency (RF) sources with an unmanned aerial vehicle (UAV) has many important applications. As a result, UAV-based localization has been the focus of much research. However, previous approaches rely heavily on custom electronics and specialized knowledge, are not robust and require extensive calibration, or are inefficient with measurements and waste energy on a battery-constrained platform. In this work, we present a system based on a multirotor UAV that addresses these shortcomings. Our system measures signal strength received by two antennas to update a probability distribution over possible transmitter locations. An information-theoretic controller is used to direct the UAV's search. Signal strength is measured with low-cost, commercial-off-the-shelf components. We demonstrate our system using three transmitters: a continuous signal in the UHF band, a wildlife collar pulsing in the VHF band, and a cell phone making a voice call over LTE. Our system significantly outperforms previous methods, localizing the RF source in the same time it takes previous methods to make a single measurement.
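
The abstract's core loop updates a probability distribution over transmitter locations from signal-strength measurements, which is a discrete Bayes filter. The sketch below runs one over a grid of candidate cells. It assumes a hypothetical measurement model (`expected_meas`, a cosine of the relative bearing) and Gaussian noise. The real two-antenna model and the information-theoretic controller are not reproduced. `entropy` is the kind of quantity such a controller would try to reduce.

```python
import math

def bearing(frm, to):
    return math.atan2(to[1] - frm[1], to[0] - frm[0])

def expected_meas(uav, heading, src):
    """Hypothetical measurement model: antenna-pair strength difference (dB),
    largest when the UAV's heading points at the source."""
    return 10.0 * math.cos(bearing(uav, src) - heading)

def bayes_update(belief, uav, heading, z, sigma=2.0):
    """Discrete Bayes filter step over candidate transmitter cells (Gaussian noise assumed)."""
    post = {c: p * math.exp(-0.5 * ((z - expected_meas(uav, heading, c)) / sigma) ** 2)
            for c, p in belief.items()}
    total = sum(post.values())
    return {c: p / total for c, p in post.items()}

def entropy(belief):
    """Shannon entropy of the belief, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

cells = [(10.0 * i, 10.0 * j) for i in range(5) for j in range(5)]
belief = {c: 1.0 / len(cells) for c in cells}
src = (40.0, 30.0)  # hypothetical true transmitter location
for uav in [(0.0, 0.0), (40.0, 0.0)]:
    for heading in [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]:
        z = expected_meas(uav, heading, src)  # noise-free simulated reading
        belief = bayes_update(belief, uav, heading, z)
print(entropy(belief), max(belief, key=belief.get))
```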

Posted Content•

Adaptive Stress Testing: Finding Failure Events with Reinforcement Learning

Ritchie Lee, Ole J. Mengshoel¹, Anshu Saksena², Ryan W. Gardner², Daniel Genin², Joshua Silbermann², Michael P. Owen³, Mykel J. Kochenderfer⁴•Institutions (4)

Norwegian University of Science and Technology¹, Johns Hopkins University Applied Physics Laboratory², Massachusetts Institute of Technology³, Stanford University⁴

06 Nov 2018-arXiv: Artificial Intelligence

TL;DR: This article presents adaptive stress testing (AST), a simulation-based framework for finding the most likely path to a failure event; because it requires no internal knowledge of the system, it is suitable for black-box testing of large systems.

Abstract: Finding the most likely path to a set of failure states is important to the analysis of safety-critical systems that operate over a sequence of time steps, such as aircraft collision avoidance systems and autonomous cars. In many applications such as autonomous driving, failures cannot be completely eliminated due to the complex stochastic environment in which the system operates. As a result, safety validation is not only concerned about whether a failure can occur, but also discovering which failures are most likely to occur. This article presents adaptive stress testing (AST), a framework for finding the most likely path to a failure event in simulation. We consider a general black box setting for partially observable and continuous-valued systems operating in an environment with stochastic disturbances. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, making it suitable for black-box testing of large systems. We present formulations for fully observable and partially observable systems. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where we are concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision.
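
In the AST literature, the MDP formulation is commonly paired with a reward with two parts. Each step earns the log-likelihood of the sampled disturbance, and episodes that end without reaching a failure receive a large penalty. Maximizing return therefore favors the most likely failure paths. The sketch below applies that reward to a toy one-dimensional simulator. The horizon, threshold, penalty, and distance term are illustrative assumptions. Plain Monte Carlo random search stands in for the Monte Carlo tree search and reinforcement learning methods the paper actually uses.

```python
import math
import random

T, THRESHOLD, PENALTY = 5, 3.0, 1e4  # hypothetical horizon, failure threshold, miss penalty

def log_normal_pdf(w, sigma=1.0):
    """Log-density of a zero-mean Gaussian disturbance."""
    return -0.5 * (w / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def rollout(disturbances):
    """Toy black-box simulator with an AST-style reward; returns (return, failed)."""
    x, total = 0.0, 0.0
    for w in disturbances:
        total += log_normal_pdf(w)  # per-step reward: log-likelihood of the disturbance
        x += w
        if x >= THRESHOLD:          # failure event reached: episode ends
            return total, True
    # No failure by the horizon: large penalty plus distance to the failure set.
    return total - PENALTY - (THRESHOLD - x), False

def random_search(n=2000, seed=0):
    """Plain Monte Carlo search over disturbance sequences (a stand-in for MCTS or DRL)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n):
        ws = [rng.gauss(0.0, 1.0) for _ in range(T)]
        ret, failed = rollout(ws)
        if best is None or ret > best[0]:
            best = (ret, failed, ws)
    return best

ret, failed, ws = random_search()
print(failed, round(ret, 2))
```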

Proceedings Article•

Burn-In Demonstrations for Multi-Modal Imitation Learning

Alex Kuefler, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

09 Jul 2018

TL;DR: In this article, the authors extend InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over extended periods of time by including burn-in demonstrations on which policies are conditioned at test time.

Abstract: Recent work on imitation learning has generated policies that reproduce expert behavior from multi-modal data. However, past approaches have focused only on recreating a small number of distinct, expert maneuvers, or have relied on supervised learning techniques that produce unstable policies. This work extends InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over an extended period of time. Our approach involves reformulating the typical imitation learning setting to include "burn-in demonstrations" upon which policies are conditioned at test time. We demonstrate that our approach outperforms standard InfoGAIL in maximizing the mutual information between predicted and unseen style labels in road scene simulations, and we show that our method leads to policies that imitate expert autonomous driving systems over long time horizons.

Journal Article•DOI•

Closed-Loop Policies for Operational Tests of Safety-Critical Systems

Jeremy Morton¹, Tim A. Wheeler¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

01 Jun 2018

TL;DR: By solving for policies over a wide range of problem formulations, this paper is able to provide high-level guidance for manufacturers and regulators on issues relating to the testing of safety-critical systems.

Abstract: Manufacturers of safety-critical systems must make the case that their product is sufficiently safe for public deployment. Much of this case often relies upon critical event outcomes from real-world testing, requiring manufacturers to be strategic about how they allocate testing resources in order to maximize their chances of demonstrating system safety. This paper frames the partially observable, belief-dependent problem of test scheduling as a Markov decision process, which can be solved efficiently to yield closed-loop manufacturer testing policies. By solving for policies over a wide range of problem formulations, we are able to provide high-level guidance for manufacturers and regulators on issues relating to the testing of safety-critical systems. This guidance spans an array of topics, including circumstances under which manufacturers should continue testing despite observed incidents, when manufacturers should test aggressively, and when regulators should increase or reduce the real-world testing requirements for safety-critical systems.
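
The abstract frames test scheduling as an MDP over the manufacturer's belief about system safety. The sketch below is a hedged illustration of how solving such a model yields a closed-loop policy. It uses a Beta-Bernoulli belief summarized by (tests run, incidents seen) and finite-horizon dynamic programming. Every constant and the deploy rule are hypothetical, and the paper's actual state, reward, and observation model may differ substantially.

```python
from functools import lru_cache

A0, B0 = 1.0, 1.0      # hypothetical Beta prior on the per-test incident probability
HORIZON = 10           # hypothetical maximum number of tests
TEST_COST = 1.0        # hypothetical cost of running one test
DEPLOY_REWARD = 20.0   # hypothetical payoff for deploying a system judged safe
SAFE_MEAN = 0.15       # deploy only if the posterior mean incident rate is below this

def posterior_mean(n: int, k: int) -> float:
    """Beta-Bernoulli posterior mean after n tests with k incidents."""
    return (A0 + k) / (A0 + B0 + n)

@lru_cache(maxsize=None)
def value(n: int, k: int):
    """Optimal (expected value, action) at belief state (n tests, k incidents)."""
    stop = DEPLOY_REWARD if posterior_mean(n, k) < SAFE_MEAN else 0.0
    if n == HORIZON:
        return stop, "stop"
    p = posterior_mean(n, k)  # predictive probability that the next test sees an incident
    test = -TEST_COST + p * value(n + 1, k + 1)[0] + (1.0 - p) * value(n + 1, k)[0]
    return (test, "test") if test > stop else (stop, "stop")

# The resulting policy is closed-loop: the action depends on what has been observed so far.
for state in [(0, 0), (1, 1), (4, 0), (5, 0)]:
    print(state, value(*state)[1])
```

With these numbers, the policy keeps testing while no incidents occur and deploys after five clean tests. After any incident it stops, because the deploy criterion can no longer be met within the horizon.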

Proceedings Article•DOI•

Pseudo-bearing Measurements for Improved Localization of Radio Sources with Multirotor UAVs

Louis Dressel¹, Mykel J. Kochenderfer¹•Institutions (1)

Stanford University¹

01 May 2018

TL;DR: The omnidirectional antenna serves to normalize measurements made by the directional antenna, yielding “pseudo-bearing” measurements, which are less informative than bearing measurements but do not require a full rotation, leading to more measurements and faster localization.

Abstract: Localizing radio frequency (RF) sources is an important application for unmanned aerial vehicles (UAVs). Localization is often carried out by estimating bearing to an RF source, which can be achieved by rotating a directional antenna in place. Multirotor UAVs are well-suited for this sensing modality because they can efficiently rotate in place. However, a full rotation from a single location is needed to account for scale factors affecting the directional antenna's measurements. Although easy to perform, these rotations tend to be slow and delay localization. In this paper, we equip a multirotor UAV with a directional antenna and an omnidirectional antenna. The omnidirectional antenna serves to normalize measurements made by the directional antenna, yielding “pseudo-bearing” measurements. These bearing-like measurements are less informative than bearing measurements but do not require a full rotation, leading to more measurements and faster localization. We validate the normalization with antenna theory and ground tests. Claims of improved localization are validated with simulations and flight tests on a multirotor UAV. Our setup significantly reduces localization time compared to a multirotor UAV equipped with only a directional antenna.
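
The normalization step can be illustrated in a few lines. In dB, any factor common to both antennas (transmit power, path loss) adds equally to each reading, so subtracting the omnidirectional reading removes it. The sketch below assumes a hypothetical cardioid-like directional pattern and a flat omnidirectional pattern.

```python
import math

def directional_gain_db(theta: float) -> float:
    """Hypothetical directional pattern (dB): 0 dB facing the source, -20 dB facing away."""
    return 10.0 * math.log10(0.01 + 0.99 * (0.5 + 0.5 * math.cos(theta)) ** 2)

OMNI_GAIN_DB = 0.0  # hypothetical omnidirectional antenna with a flat pattern

def received_db(scale_db: float, gain_db: float) -> float:
    """In dB, an unknown scale factor (transmit power, path loss) simply adds to the gain."""
    return scale_db + gain_db

def pseudo_bearing(theta: float, scale_db: float) -> float:
    """Directional reading normalized by the omni reading: the common scale cancels."""
    directional = received_db(scale_db, directional_gain_db(theta))
    omni = received_db(scale_db, OMNI_GAIN_DB)
    return directional - omni

# Same relative bearing, very different (unknown) signal strengths: identical pseudo-bearing.
print(pseudo_bearing(0.5, -40.0), pseudo_bearing(0.5, -80.0))
```

In this toy pattern the pseudo-bearing is symmetric in `theta`, so a single reading cannot tell left from right. This is one illustration of why such measurements carry less information than a true bearing.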

Proceedings Article•DOI•

Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors

Yi-Chun Chen¹, Mykel J. Kochenderfer², Matthijs T. J. Spaan³•Institutions (3)

University of California, Los Angeles¹, Stanford University², Delft University of Technology³

01 Oct 2018

TL;DR: Reducing the discount factor while planning in the state space can significantly improve performance when the resulting policy is evaluated on the original problem.

Abstract: A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result that shows that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed both by a theoretical analysis and by a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.
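
QMDP, one of the state-space methods named in the abstract, solves the underlying fully observable MDP and then acts on the belief-weighted Q-values. The sketch below exposes the planning discount as a parameter, so it can be set below the problem's own discount. The classic tiger benchmark is used only for illustration and is not necessarily one of the paper's test problems.

```python
def qmdp_q_values(T, R, gamma, iters=500):
    """Value iteration on the fully observable MDP underlying a POMDP.

    T[a][s][s2] is a transition probability and R[a][s] an immediate reward.
    Returns Q[s][a]. `gamma` is the planning discount, which may be set
    below the problem's own discount factor.
    """
    n_s, n_a = len(R[0]), len(R)
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[a][s] + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return Q

def qmdp_action(Q, belief):
    """QMDP: pick the action maximizing the belief-weighted state-action values."""
    n_a = len(Q[0])
    return max(range(n_a), key=lambda a: sum(b * Q[s][a] for s, b in enumerate(belief)))

# Tiger problem. States: 0 = tiger behind left door, 1 = tiger behind right door.
# Actions: 0 = listen, 1 = open left, 2 = open right (opening resets the tiger uniformly).
stay = [[1.0, 0.0], [0.0, 1.0]]
reset = [[0.5, 0.5], [0.5, 0.5]]
T = [stay, reset, reset]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]

for gamma in (0.95, 0.7):  # e.g. the original discount vs. a reduced planning discount
    Q = qmdp_q_values(T, R, gamma)
    print(gamma, qmdp_action(Q, [0.5, 0.5]), qmdp_action(Q, [0.95, 0.05]))
```

This snippet only shows where the reduced discount enters. Whether a smaller `gamma` helps on a given problem is the theoretical and empirical question the paper studies.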
