Showing papers by "Sebastian Elbaum" published in 2021


Proceedings ArticleDOI
22 May 2021
TL;DR: SelfChecker is a self-checking system that monitors DNN outputs and triggers an alarm when the model's internal layer features are inconsistent with its final prediction; it also provides advice in the form of an alternative prediction.
Abstract: The widespread adoption of Deep Neural Networks (DNNs) in important domains raises questions about the trustworthiness of DNN outputs. Even a highly accurate DNN will make mistakes some of the time, and in settings like self-driving vehicles these mistakes must be quickly detected and properly dealt with in deployment. Just as our community has developed effective techniques and mechanisms to monitor and check programmed components, we believe it is now necessary to do the same for DNNs. In this paper we present DNN self-checking as a process by which internal DNN layer features are used to check DNN predictions. We detail SelfChecker, a self-checking system that monitors DNN outputs and triggers an alarm if the internal layer features of the model are inconsistent with the final prediction. SelfChecker also provides advice in the form of an alternative prediction. We evaluated SelfChecker on four popular image datasets and three DNN models and found that SelfChecker triggers correct alarms on 60.56% of wrong DNN predictions, and false alarms on 2.04% of correct DNN predictions. This is a substantial improvement over prior work (SelfOracle, Dissector, and ConfidNet). In experiments with self-driving car scenarios, SelfChecker triggers more correct alarms than SelfOracle for two DNN models (DAVE-2 and Chauffeur) with comparable false alarms. Our implementation is available as open source.
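
A minimal sketch of the layer-feature consistency idea the abstract describes, assuming per-class kernel density estimates over each internal layer's features and a simple majority vote; the function names and the KDE/majority-vote choices are illustrative, not SelfChecker's exact algorithm.

```python
# Sketch only: per-layer, per-class density models fit on training data,
# then a consistency check of layer-level "votes" against the final prediction.
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_layer_densities(layer_feats, labels, n_classes, bandwidth=1.0):
    """layer_feats: list over layers of [n_samples, d_layer] training-set features; labels: np.ndarray."""
    densities = []
    for feats in layer_feats:
        densities.append([KernelDensity(bandwidth=bandwidth).fit(feats[labels == c])
                          for c in range(n_classes)])
    return densities

def self_check(densities, test_feats, final_pred):
    """Alarm if most layers' most-likely class disagrees with the DNN's final prediction."""
    votes = []
    for per_class, feat in zip(densities, test_feats):
        scores = [kde.score_samples(feat.reshape(1, -1))[0] for kde in per_class]
        votes.append(int(np.argmax(scores)))
    alarm = sum(v != final_pred for v in votes) > len(votes) / 2
    advice = max(set(votes), key=votes.count)   # alternative prediction to report
    return alarm, advice
```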

21 citations


Book ChapterDOI
18 Jul 2021
TL;DR: In this paper, the authors present DNNV, a framework for reducing the burden on DNN verifier researchers, developers, and users by standardizing input and output formats, including a simple yet expressive DSL for specifying DNN properties.
Abstract: Despite the large number of sophisticated deep neural network (DNN) verification algorithms, DNN verifier developers, users, and researchers still face several challenges. First, verifier developers must contend with the rapidly changing DNN field to support new DNN operations and property types. Second, verifier users have the burden of selecting a verifier input format to specify their problem. Due to the many input formats, this decision can greatly restrict the verifiers that a user may run. Finally, researchers face difficulties in re-using benchmarks to evaluate and compare verifiers, due to the large number of input formats required to run different verifiers. Existing benchmarks are rarely in formats supported by verifiers other than the one for which the benchmark was introduced. In this work we present DNNV, a framework for reducing the burden on DNN verifier researchers, developers, and users. DNNV standardizes input and output formats, includes a simple yet expressive DSL for specifying DNN properties, and provides powerful simplification and reduction operations to facilitate the application, development, and comparison of DNN verifiers. We show how DNNV increases the support of verifiers for existing benchmarks from 30% to 74%.
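
To make the property DSL mentioned above concrete, here is a sketch of a local robustness specification in the style of DNNV's Python-embedded property language; the class and operator names (Network, Image, Parameter, Forall, Implies, argmax) are assumptions drawn from the paper's description rather than verbatim DNNV syntax.

```python
# Sketch of a DNN property in a Python-embedded DSL of the kind DNNV provides;
# names and syntax are assumptions. Property files of this kind are interpreted
# by the framework rather than executed as plain Python.
from dnnv.properties import *

N = Network("N")                            # the network under analysis
x = Image("inputs/sample_0.npy")            # a concrete input from the dataset
epsilon = Parameter("epsilon", float, default=0.01)

Forall(
    x_,
    Implies(
        (x - epsilon <= x_) & (x_ <= x + epsilon),   # every input in the L-infinity ball
        argmax(N(x_)) == argmax(N(x)),               # is classified like the original
    ),
)
```

A property like this would be paired with an ONNX network and translated by the framework into whichever verifier's native input format the user selects.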

15 citations


Proceedings ArticleDOI
22 May 2021
TL;DR: This article proposes a semantics-preserving reduction of multiple safety property types, which subsume prior work, into a set of equivalid correctness problems amenable to adversarial attacks.
Abstract: Deep Neural Networks (DNN) are increasingly being deployed in safety-critical domains, from autonomous vehicles to medical devices, where the consequences of errors demand techniques that can provide stronger guarantees about behavior than just high test accuracy. This paper explores broadening the application of existing adversarial attack techniques for the falsification of DNN safety properties. We contend and later show that such attacks provide a powerful repertoire of scalable algorithms for property falsification. To enable the broad application of falsification, we introduce a semantics-preserving reduction of multiple safety property types, which subsume prior work, into a set of equivalid correctness problems amenable to adversarial attacks. We evaluate our reduction approach as an enabler of falsification on a range of DNN correctness problems and show its cost-effectiveness and scalability.
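
As a sketch of the reduction idea, the wrapper below turns an output-halfspace safety property into a two-"class" network on which a standard gradient attack (here, a hand-rolled PGD loop) searches for a violating input; the construction and the attack are illustrative, not the paper's exact reduction or tooling.

```python
# Sketch: property "c^T N(x) <= b for all x in [lo, hi]" holds iff the wrapped
# network never prefers output 1; finding such an input falsifies the property.
import torch

class PropertyNet(torch.nn.Module):
    def __init__(self, n, c, b):
        super().__init__()
        self.n, self.c, self.b = n, c, b

    def forward(self, x):
        margin = self.n(x) @ self.c - self.b            # > 0 means the property is violated
        return torch.stack([-margin, margin], dim=-1)   # two-"class" reformulation

def pgd_falsify(wrapped, x0, lo, hi, steps=100, step_size=0.01):
    """Searches the input box [lo, hi] for an input labeled with the violating class."""
    x = x0.clone().requires_grad_(True)
    for _ in range(steps):
        loss = wrapped(x)[..., 1].sum()                 # push toward the violating output
        loss.backward()
        with torch.no_grad():
            x += step_size * x.grad.sign()
            x.clamp_(lo, hi)
            x.grad.zero_()
    return x.detach() if wrapped(x).argmax(dim=-1).item() == 1 else None
```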

13 citations


Proceedings ArticleDOI
30 May 2021
TL;DR: In this paper, the authors explore whether fuzzing, an automated test input generation technique, can more quickly find failure-inducing inputs in mobile robots, finding 56.5% more than uniform random input selection and 7.0% more than BASE-FUZZ during 7 days of testing.
Abstract: Testing mobile robots is difficult and expensive, and many faults go undetected. In this work we explore whether fuzzing, an automated test input generation technique, can more quickly find failure inducing inputs in mobile robots. We developed a simple fuzzing adaptation, BASE-FUZZ, and one specialized for fuzzing mobile robots, PHYS-FUZZ. PHYS-FUZZ is unique in that it accounts for physical attributes such as the robot dimensions, estimated trajectories, and time to impact measures to guide the test input generation process. The results of evaluating PHYS-FUZZ suggest that it has the potential to speed up the discovery of input scenarios that reveal failures, finding 56.5% more than uniform random input selection and 7.0% more than BASE-FUZZ during 7 days of testing.
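
The sketch below illustrates the kind of physically informed guidance the abstract names, scoring candidate obstacle placements by their clearance to the robot's estimated trajectory and a crude time-to-impact measure; the scoring function and parameter ranges are assumptions, not PHYS-FUZZ's actual heuristics.

```python
# Sketch: pick the next obstacle placement most likely to stress the robot, using
# its footprint, estimated trajectory, and a rough time-to-impact estimate.
import math
import random

def time_to_impact(robot_pos, robot_vel, obstacle_pos):
    """Time until a robot moving at its current speed covers the distance to the obstacle."""
    speed = math.hypot(*robot_vel) or 1e-6
    return math.dist(robot_pos, obstacle_pos) / speed

def score(obstacle_pos, trajectory, robot_vel, robot_radius):
    clearance = min(math.dist(obstacle_pos, p) for p in trajectory) - robot_radius
    tti = time_to_impact(trajectory[0], robot_vel, obstacle_pos)
    return -(max(clearance, 0.0) + 0.5 * tti)   # smaller clearance and sooner impact score higher

def next_test_input(trajectory, robot_vel, robot_radius, n_candidates=50):
    candidates = [(random.uniform(-10.0, 10.0), random.uniform(-10.0, 10.0))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda c: score(c, trajectory, robot_vel, robot_radius))
```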

8 citations


Proceedings ArticleDOI
30 May 2021
TL;DR: In this article, the authors present a novel approach, world-in-the-loop (WIL) simulation, which integrates sensing data from simulation and the real world to provide the AS with a mixed reality.
Abstract: Simulation is at the core of validating autonomous systems (AS), enabling the detection of faults at a lower cost and earlier in the development life cycle. However, simulation can only produce an approximation of the real world, leading to a gap between simulation and reality where undesirable system behaviors can go unnoticed. To address that gap, we present a novel approach, world-in-the-loop (WIL) simulation, which integrates sensing data from simulation and the real world to provide the AS with a mixed-reality. The approach executes multiple instances of the AS in parallel, one in the real world and at least one in simulation, performs configurable transformations, filtering, and merging operations on the body of sensed data in order to integrate it, and provides the pipelines to distribute the original sensor data and the integrated sensor data back to the executing AS. We present a study on multiple scenarios and two simulators that demonstrates how WIL reduces the simulation-reality gap and increases the chances of exposing failures before deployment.
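
A hedged sketch of one merge operation such a pipeline might perform: combining a simulated and a real 2D laser scan so obstacles from either world are visible to the autonomous system. The per-beam minimum rule is an illustrative choice, not necessarily the paper's merge operator.

```python
# Sketch: element-wise merge of two equally sized laser range arrays (meters).
import numpy as np

def merge_scans(real_ranges, sim_ranges, max_range=10.0):
    real = np.clip(np.nan_to_num(real_ranges, nan=max_range), 0.0, max_range)
    sim = np.clip(np.nan_to_num(sim_ranges, nan=max_range), 0.0, max_range)
    return np.minimum(real, sim)       # nearest obstacle from either world wins

# In the pipeline, this function would sit between the sensor topics coming from
# the simulator and the physical robot and the topic the AS actually consumes.
real = np.array([4.8, 5.1, np.nan, 6.0])
sim = np.array([4.9, 2.0, 3.5, 6.2])   # simulated obstacles injected at beams 1-2
print(merge_scans(real, sim))           # -> [4.8 2.  3.5 6. ]
```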

6 citations


Proceedings ArticleDOI
20 Aug 2021
TL;DR: In this article, the authors propose Swarmbug, a swarm debugging system that automatically diagnoses and fixes buggy behaviors caused by misconfiguration; it abstracts the impact of environment configurations (e.g., obstacles) on the drones in a swarm via behavior causal analysis.
Abstract: Swarm robotics collectively solve problems that are challenging for individual robots, from environmental monitoring to entertainment. The algorithms enabling swarms allow individual robots of the swarm to plan, share, and coordinate their trajectories and tasks to achieve a common goal. Such algorithms rely on a large number of configurable parameters that can be tailored to target particular scenarios. This large configuration space, the complexity of the algorithms, and the dependencies with the robots’ setup and performance make debugging and fixing swarm configuration bugs extremely challenging. This paper proposes Swarmbug, a swarm debugging system that automatically diagnoses and fixes buggy behaviors caused by misconfiguration. The essence of Swarmbug is a novel concept called the degree of causal contribution (Dcc), which abstracts the impact of environment configurations (e.g., obstacles) on the drones in a swarm via behavior causal analysis. Swarmbug automatically generates, validates, and ranks fixes for configuration bugs. We evaluate Swarmbug on four diverse swarm algorithms. Swarmbug successfully fixes four configuration bugs in the evaluated algorithms, showing that it is generic and effective. We also conduct a real-world experiment with physical drones to show that Swarmbug’s fix is effective in the real world.
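
A minimal sketch of estimating a degree-of-causal-contribution style score by intervention: rerun the (simulated) swarm with one configuration parameter perturbed and measure how much the drones' trajectories change. The perturbation scheme and distance metric are assumptions; Swarmbug's actual behavior causal analysis is more involved.

```python
# Sketch: intervene on one configuration parameter and measure the behavioral change.
import numpy as np

def behavior_distance(traj_a, traj_b):
    """Mean positional deviation between two trajectory sets shaped [time, drones, 3]."""
    return float(np.mean(np.linalg.norm(np.asarray(traj_a) - np.asarray(traj_b), axis=-1)))

def causal_contribution(run_swarm, config, param, deltas=(-0.5, 0.5)):
    """run_swarm(config) -> trajectories; larger scores mean the parameter matters more."""
    baseline = run_swarm(config)
    scores = [behavior_distance(baseline,
                                run_swarm(dict(config, **{param: config[param] * (1 + d)})))
              for d in deltas]
    return float(np.mean(scores))
```

Candidate fixes could then be drawn from the highest-scoring parameters and validated by rerunning the swarm, mirroring the generate, validate, and rank loop described in the abstract.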

5 citations


Proceedings ArticleDOI
30 May 2021
TL;DR: In this paper, a complete end-to-end pipeline that enables precise, aggressive and agile maneuvers for multi-rotor UASs under real and challenging outdoor environments is presented, leveraging state-of-the-art optimal methods from the literature for trajectory planning and control, such that designing and executing dynamic paths is fast, robust and easy to customize for a particular application.
Abstract: Several independent approaches exist for state estimation and control of multirotor unmanned aerial systems (UASs) that address specific and constrained operational conditions. This work presents a complete end-to-end pipeline that enables precise, aggressive and agile maneuvers for multirotor UASs under real and challenging outdoor environments. We leverage state-of-the-art optimal methods from the literature for trajectory planning and control, such that designing and executing dynamic paths is fast, robust and easy to customize for a particular application. The complete pipeline, built entirely using commercially available components, is made open-source and fully documented to facilitate adoption. We demonstrate its performance in a variety of operational settings, such as hovering at a spot under dynamic wind speeds of up to 5–6 m/s (12–15 mi/h) while staying within 12 cm of 3D error. We also characterize its capabilities in flying high-speed trajectories outdoors, and enabling fast aerial docking with a moving target with planning and interception occurring in under 8 s.
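
As a small, hedged illustration of the hover-accuracy figure quoted above, the snippet computes the 3D position error of logged state estimates against a hover setpoint and checks it against the 12 cm bound; the log format and names are assumptions, not the paper's evaluation code.

```python
# Sketch: 3D hover error of position estimates (meters) against a fixed setpoint.
import numpy as np

def hover_error(positions, setpoint):
    """positions: [t, 3] array of estimated positions; returns per-sample 3D error."""
    return np.linalg.norm(np.asarray(positions) - np.asarray(setpoint), axis=1)

log = np.array([[0.05, -0.03, 1.52], [0.08, 0.02, 1.48], [-0.04, 0.06, 1.50]])
err = hover_error(log, [0.0, 0.0, 1.5])
print(err.max() <= 0.12)    # True: within the 12 cm error bound
```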

5 citations


Proceedings ArticleDOI
30 May 2021
TL;DR: This work presents the first automated approach for reducing the environment in which a robot failed: it systematically partitions the environment space that caused a failure, executes the robot in each partition containing a reduced environment, and further partitions reduced environments that still lead to a failure.
Abstract: Complex environments can cause robots to fail. Identifying the key elements of the environment associated with such failures is critical for faster fault isolation and, ultimately, debugging those failures. In this work we present the first automated approach for reducing the environment in which a robot failed. Similar to software debugging techniques, our approach systematically performs a partition of the environment space causing a failure, executes the robot in each partition containing a reduced environment, and further partitions reduced environments that still lead to a failure. The technique is novel in the spatial-temporal partition strategies it employs, and in how it manages the potential different robot behaviors occurring under the same environments. Our study of a ground robot on three failure scenarios finds that environment reductions of over 95% are achievable within a 2-hour window.
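
The loop below sketches the partition-execute-recurse structure described in the abstract for a 2D environment of obstacle positions; the quadrant split and stopping rule are illustrative stand-ins for the paper's spatial-temporal partition strategies.

```python
# Sketch: recursively shrink the failing environment to the smallest spatial region
# (and obstacle subset) that still reproduces the robot's failure.
def reduce_environment(region, obstacles, fails, min_size=1.0):
    """region: (xmin, ymin, xmax, ymax); fails(obstacles) reruns the robot and returns True on failure."""
    xmin, ymin, xmax, ymax = region
    if xmax - xmin <= min_size and ymax - ymin <= min_size:
        return region, obstacles
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    for q in [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
              (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]:
        kept = [o for o in obstacles if q[0] <= o[0] <= q[2] and q[1] <= o[1] <= q[3]]
        if kept and fails(kept):            # the failure still occurs in the reduced environment
            return reduce_environment(q, kept, fails, min_size)
    return region, obstacles                # no single quadrant reproduces it; stop here
```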

3 citations


DOI
14 Jul 2021
TL;DR: This work introduces the first approach that leverages environmental models to focus DNN falsification and verification on the relevant input space: it automatically builds an input distribution model using unsupervised learning and prefixes that model to the DNN so that all inputs come from the learned distribution.
Abstract: DNN validation and verification approaches that are input distribution agnostic waste effort on irrelevant inputs and report false property violations. Drawing on the large body of work on model-based validation and verification of traditional systems, we introduce the first approach that leverages environmental models to focus DNN falsification and verification on the relevant input space. Our approach, DFV, automatically builds an input distribution model using unsupervised learning, prefixes that model to the DNN to force all inputs to come from the learned distribution, and reformulates the property to the input space of the distribution model. This transformed verification problem allows existing DNN falsification and verification tools to target the input distribution – avoiding consideration of infeasible inputs. Our study of DFV with 7 falsification and verification tools, two DNNs defined over different data sets, and 93 distinct distribution models, provides clear evidence that the counterexamples found by the tools are much more representative of the data distribution, and it shows how the performance of DFV varies across domains, models, and tools.
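
A minimal sketch of the core construction, assuming the learned distribution model is a VAE-style decoder composed in front of the network; the class and names are illustrative, and the paper's reformulation of the property over the latent space involves more than is shown.

```python
# Sketch: prefix a learned input-distribution model to the DNN so falsifiers and
# verifiers search over latent codes and only distribution-like inputs reach the net.
import torch

class PrefixedModel(torch.nn.Module):
    def __init__(self, decoder, dnn):
        super().__init__()
        self.decoder, self.dnn = decoder, dnn

    def forward(self, z):
        x = self.decoder(z)      # latent code -> input that follows the learned distribution
        return self.dnn(x)       # original network applied to the generated input

# The original property over inputs x is then restated over latent codes z (e.g., by
# bounding z to the region the decoder was trained on), and the composed model is
# exported (e.g., to ONNX) for existing falsification and verification tools to target.
```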

2 citations


Journal ArticleDOI
TL;DR: In this paper, the authors conduct an empirical study of 97 developers using 20 randomly selected code artifacts from the robotics domain containing physical unit types, and find that subjects select the correct physical type with just 51% accuracy and that a single correct annotation takes about 2 minutes on average.
Abstract: Type annotations connect variables to domain-specific types. They enable the power of type checking and can detect faults early. In practice, type annotations have a reputation of being burdensome to developers. We lack, however, an empirical understanding of how and why they are burdensome. Hence, we seek to measure the baseline accuracy and speed for developers making type annotations to previously unseen code. We also study the impact of one or more type suggestions. We conduct an empirical study of 97 developers using 20 randomly selected code artifacts from the robotics domain containing physical unit types. We find that subjects select the correct physical type with just 51% accuracy, and a single correct annotation takes about 2 minutes on average. Showing subjects a single suggestion has a strong and significant impact on accuracy both when correct and incorrect, while showing three suggestions retains the significant benefits without the negative effects. We also find that suggestions do not come with a time penalty. We require subjects to explain their annotation choices, and we qualitatively analyze their explanations. We find that identifier names and reasoning about code operations are the primary clues for selecting a type. We also examine two state-of-the-art automated type annotation systems and find opportunities for their improvement.
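
For readers unfamiliar with physical unit annotations, the following illustrative example (in Python for consistency with the other sketches; the study's artifacts come from robotics code) shows how connecting variables to domain-specific unit types lets a checker catch swapped arguments that plain numeric types would accept.

```python
# Illustrative only: domain-specific unit types as lightweight type annotations.
from typing import NewType

Meters = NewType("Meters", float)
MetersPerSecond = NewType("MetersPerSecond", float)
Seconds = NewType("Seconds", float)

def time_to_goal(distance: Meters, speed: MetersPerSecond) -> Seconds:
    return Seconds(distance / speed)

# A unit-aware type checker flags time_to_goal(speed, distance) as a type error,
# whereas with plain floats the swapped call would run and silently yield a wrong unit.
d, v = Meters(3.2), MetersPerSecond(0.5)
print(time_to_goal(d, v))   # 6.4 (seconds)
```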

1 citation


Journal ArticleDOI
TL;DR: This paper proposes an automated numerical state space exploration method, called ACT, which leverages the model-agnostic nature of random testing and greatly improves its efficiency by guiding random testing with feedback obtained iteratively during a test.
Abstract: The significant impact of TCP congestion control on the Internet highlights the importance of testing congestion control algorithm implementations (CCAIs) in various network environments. Many CCAI testing problems can be solved by exploring the numerical state space of CCAIs, which is defined by a group of numerical (and nonnumerical) state variables of the CCAIs. However, the current practices for automated numerical state space exploration are either limited by approximate abstract CCAI models or inefficient due to the large space of network environment parameters and the complicated relation between the CCAI states and network environment parameters. In this paper, we propose an automated numerical state space exploration method, called ACT, which leverages the model-agnostic nature of random testing and greatly improves its efficiency by guiding random testing with feedback obtained iteratively during a test. Our experiments on five representative Linux TCP CCAIs show that ACT can more efficiently explore a large numerical state space than manual testing, undirected random testing, and symbolic execution based testing, without requiring an abstract CCAI model. ACT detects multiple design and implementation bugs in these Linux TCP CCAIs, including some new bugs not reported before.
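
The loop below sketches what feedback-guided random exploration of a CCAI's numerical state space could look like: sample network-environment parameters, observe which quantized states a test reaches, and bias later tests toward parameter settings that reached new states. Parameter names, ranges, quantization, and the run_test interface are all assumptions, not ACT's implementation.

```python
# Sketch: feedback-guided random exploration of a numerical state space.
import random

def quantize(state, step=10):
    return tuple(v // step for v in state)       # e.g., (cwnd, ssthresh) buckets

def explore(run_test, iterations=1000):
    """run_test(params) -> list of numerical CCAI states observed during the test."""
    covered, seeds = set(), []
    for _ in range(iterations):
        if seeds and random.random() < 0.7:      # mostly mutate a productive seed
            base = random.choice(seeds)
            params = {k: v * random.uniform(0.8, 1.2) for k, v in base.items()}
        else:                                    # occasionally restart from scratch
            params = {"bandwidth_mbps": random.uniform(0.1, 1000),
                      "rtt_ms": random.uniform(1, 500),
                      "loss_rate": random.uniform(0, 0.1)}
        new_states = {quantize(s) for s in run_test(params)} - covered
        if new_states:                           # feedback: this test reached unseen states
            covered |= new_states
            seeds.append(params)
    return covered
```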

Proceedings ArticleDOI
20 Aug 2021
TL;DR: In this article, a type system that can automatically infer variables' frame types and in turn detect any type inconsistencies and violations of frame conventions is presented; it is evaluated on a set of 180 publicly available ROS projects.
Abstract: A robotic system continuously measures its own motions and the external world during operation. Such measurements are with respect to some frame of reference, i.e., a coordinate system. A nontrivial robotic system has a large number of different frames, and data have to be translated back and forth from one frame to another. The onus is on the developers to get such translation right. However, this is very challenging and error-prone, as evidenced by the large number of questions and issues related to frame use on developer forums. Since any state variable can be associated with some frame, reference frames can be naturally modeled as variable types. We hence develop a novel type system that can automatically infer variables' frame types and in turn detect any type inconsistencies and violations of frame conventions. The evaluation on a set of 180 publicly available ROS projects shows that our system can detect 190 inconsistencies with 154 true positives. We reported 52 to developers and received 18 responses so far, with 15 fixed/acknowledged. Our technique also finds 45 violations of common practices.
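
An illustrative example (in Python for consistency with the other sketches; the tool targets ROS projects) of the kind of frame misuse a frame-aware type system can catch: combining coordinates expressed in different reference frames without transforming between them first.

```python
# Illustrative only: mixing reference frames in arithmetic, the class of bug a
# frame type checker can reject by inferring frame types for each variable.
goal_in_map = (12.0, 4.0)            # frame: map (world-fixed)
obstacle_in_base_link = (0.8, 0.1)   # frame: base_link (robot-relative)

# Inconsistent: subtracting a base_link point from a map point mixes frames.
# A frame-aware checker infers frame(goal_in_map) = map and
# frame(obstacle_in_base_link) = base_link and rejects the expression; the fix
# is to transform one point into the other's frame (e.g., via tf) first.
dx = goal_in_map[0] - obstacle_in_base_link[0]
dy = goal_in_map[1] - obstacle_in_base_link[1]
```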

Posted Content
TL;DR: In this paper, the authors present DNNV, a framework for reducing the burden on DNN verifier researchers, developers, and users by standardizing input and output formats, including a simple yet expressive DSL for specifying DNN properties.
Abstract: Despite the large number of sophisticated deep neural network (DNN) verification algorithms, DNN verifier developers, users, and researchers still face several challenges. First, verifier developers must contend with the rapidly changing DNN field to support new DNN operations and property types. Second, verifier users have the burden of selecting a verifier input format to specify their problem. Due to the many input formats, this decision can greatly restrict the verifiers that a user may run. Finally, researchers face difficulties in re-using benchmarks to evaluate and compare verifiers, due to the large number of input formats required to run different verifiers. Existing benchmarks are rarely in formats supported by verifiers other than the one for which the benchmark was introduced. In this work we present DNNV, a framework for reducing the burden on DNN verifier researchers, developers, and users. DNNV standardizes input and output formats, includes a simple yet expressive DSL for specifying DNN properties, and provides powerful simplification and reduction operations to facilitate the application, development, and comparison of DNN verifiers. We show how DNNV increases the support of verifiers for existing benchmarks from 30% to 74%.

Posted Content
TL;DR: SelfChecker is a self-checking system that monitors DNN outputs and triggers an alarm if the internal layer features of the model are inconsistent with the final prediction; it also provides advice in the form of an alternative prediction.
Abstract: The widespread adoption of Deep Neural Networks (DNNs) in important domains raises questions about the trustworthiness of DNN outputs. Even a highly accurate DNN will make mistakes some of the time, and in settings like self-driving vehicles these mistakes must be quickly detected and properly dealt with in deployment. Just as our community has developed effective techniques and mechanisms to monitor and check programmed components, we believe it is now necessary to do the same for DNNs. In this paper we present DNN self-checking as a process by which internal DNN layer features are used to check DNN predictions. We detail SelfChecker, a self-checking system that monitors DNN outputs and triggers an alarm if the internal layer features of the model are inconsistent with the final prediction. SelfChecker also provides advice in the form of an alternative prediction. We evaluated SelfChecker on four popular image datasets and three DNN models and found that SelfChecker triggers correct alarms on 60.56% of wrong DNN predictions, and false alarms on 2.04% of correct DNN predictions. This is a substantial improvement over prior work (SELFORACLE, DISSECTOR, and ConfidNet). In experiments with self-driving car scenarios, SelfChecker triggers more correct alarms than SELFORACLE for two DNN models (DAVE-2 and Chauffeur) with comparable false alarms. Our implementation is available as open source.

Proceedings ArticleDOI
25 May 2021
TL;DR: In this article, the authors present an artifact to accompany "Reducing DNN Properties to Enable Falsification with Adversarial Attacks," which includes the DNNF tool, data, and scripts to facilitate replication of its study.
Abstract: We present an artifact to accompany Reducing DNN Properties to Enable Falsification with Adversarial Attacks which includes the DNNF tool, data and scripts to facilitate the replication of its study. The artifact is both reusable and available. DNNF is available on Github, and we provide an artifact to reproduce our study as a VirtualBox virtual machine image. Full replication of the study requires 64GB of memory and 8 CPU cores. Users should know how to use VirtualBox, as well as have basic knowledge of the bash shell.