Showing papers by "Eric Bodden" published in 2019


Journal ArticleDOI
02 Jan 2019
TL;DR: The concept of synchronized pushdown systems (SPDS) is introduced, which can provide high precision and further improve scalability, in particular when used in analyses that expose rather local data flows.
Abstract: Precise static analyses are context-, field- and flow-sensitive. Context- and field-sensitivity are both expressible as context-free language (CFL) reachability problems. Solving both CFL problems along the same data-flow path is undecidable, which is why most flow-sensitive data-flow analyses over-approximate field-sensitivity through k-limited access paths, or through access graphs. Unfortunately, as our experience and this paper show, both representations do not scale very well when used to analyze programs with recursive data structures. Any single CFL-reachability problem is efficiently solvable, by means of a pushdown system. This work thus introduces the concept of synchronized pushdown systems (SPDS). SPDS encode both procedure calls/returns and field stores/loads as separate but “synchronized” CFL reachability problems. An SPDS solves both individual problems precisely, and approximation occurs only in corner cases that are apparently rare in practice: at statements where both problems are satisfied but not along the same data-flow path. SPDS are also efficient: formal complexity analysis shows that SPDS shift the complexity from |F|^{3k} under k-limiting to |S||F|^2, where F is the set of fields and S the set of statements involved in a data-flow. Our evaluation using DaCapo shows this shift to pay off in practice: SPDS are almost as efficient as k-limiting with k=1 although their precision equals k=∞. For a typestate analysis SPDS accelerate the analysis up to 83× for data-flows of objects that involve many field accesses but span rather few methods. We conclude that SPDS can provide high precision and further improve scalability, in particular when used in analyses that expose rather local data flows.

59 citations
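
As a rough illustration of why field-sensitivity can be phrased as a CFL-reachability problem, the toy sketch below treats a field store as an opening bracket and a field load as the matching closing bracket along a single data-flow path. It is not the SPDS algorithm from the paper; the function name `remaining_access_path` and the event encoding are assumptions made only for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: each event is ("store", field) or ("load", field),
# listed in the order it occurs along ONE data-flow path.
Event = Tuple[str, str]


def remaining_access_path(path: List[Event]) -> Optional[List[str]]:
    """Return the fields still wrapped around the tracked value, or None
    if some load does not match the most recent unmatched store."""
    stack: List[str] = []
    for kind, field in path:
        if kind == "store":
            # The value is now reachable only through this field.
            stack.append(field)
        elif kind == "load":
            # In this toy model a load must undo the latest unmatched store.
            if not stack or stack[-1] != field:
                return None
            stack.pop()
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return stack


if __name__ == "__main__":
    print(remaining_access_path([("store", "f"), ("store", "g"), ("load", "g")]))
    print(remaining_access_path([("store", "f"), ("load", "g")]))
```

The stack here can grow without bound, which is the part that k-limiting truncates to at most k fields; the abstract's point is that a pushdown system can represent such stacks without that cut-off.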


Book ChapterDOI
06 Apr 2019
TL;DR: The design and implementation of the LLVM-based static analysis framework PhASAR for C/C++ code is described and its abstractions and their implementations are found to provide a whole-program analysis that scales well to real-world programs.
Abstract: Static program analysis is used to automatically determine program properties, or to detect bugs or security vulnerabilities in programs. It can be used as a stand-alone tool or to aid compiler optimization as an intermediary step. Developing precise, inter-procedural static analyses, however, is a challenging task, due to the algorithmic complexity, implementation effort, and the threat of state explosion which leads to unsatisfactory performance. Software written in C and C++ is notoriously hard to analyze because of the deliberately unsafe type system, unrestricted use of pointers, and (for C++) virtual dispatch. In this work, we describe the design and implementation of the LLVM-based static analysis framework PhASAR for C/C++ code. PhASAR allows data-flow problems to be solved in a fully automated manner. It provides class hierarchy, call-graph, points-to, and data-flow information, hence requiring analysis developers only to specify a definition of the data-flow problem. PhASAR thus hides the complexity of static analysis behind a high-level API, making static program analysis more accessible and easy to use. PhASAR is available as an open-source project. We evaluate PhASAR’s scalability during whole-program analysis. Analyzing 12 real-world programs using a taint analysis written in PhASAR, we found PhASAR’s abstractions and their implementations to provide a whole-program analysis that scales well to real-world programs. Furthermore, we peek into the details of analysis runs, discuss our experience in developing static analyses for C/C++, and present possible future improvements. Data or code related to this paper is available at: [34].

48 citations


Proceedings ArticleDOI
10 Nov 2019
TL;DR: This work designs COVA, an analysis tool that computes partial path constraints describing the circumstances under which taint flows may actually occur in practice, and shows that few taint flows are guarded by multiple different kinds of conditions simultaneously, so tools that seek to confirm true positives dynamically can concentrate on one kind at a time.
Abstract: In the past, researchers have developed a number of popular taint-analysis approaches, particularly in the context of Android applications. Numerous studies have shown that automated code analyses are adopted by developers only if they yield a good "signal to noise ratio", i.e., high precision. Many previous studies have reported analysis precision quantitatively, but this gives little insight into what can and should be done to increase precision further. To guide future research on increasing precision, we present a comprehensive study that evaluates static Android taint-analysis results on a qualitative level. To unravel the exact nature of taint flows, we have designed COVA, an analysis tool to compute partial path constraints that inform about the circumstances under which taint flows may actually occur in practice. We have conducted a qualitative study on the taint flows reported by FlowDroid in 1,022 real-world Android applications. Our results reveal several key findings: Many taint flows occur only under specific conditions, e.g., environment settings, user interaction, I/O. Taint analyses should consider the application context to discern such situations. COVA shows that few taint flows are guarded by multiple different kinds of conditions simultaneously, so tools that seek to confirm true positives dynamically can concentrate on one kind at a time, e.g., only simulating user interactions. Lastly, many false positives arise due to a too liberal source/sink configuration. Taint analyses must be more carefully configured, and their configuration could benefit from better tool assistance.

21 citations


Proceedings ArticleDOI
13 Mar 2019
TL;DR: ACMiner as discussed by the authors combines program and text analysis techniques to generate a rich set of authorization checks, mines the corresponding protection policy for each service entry point, and uses association rule mining at a service granularity to identify inconsistencies that may correspond to vulnerabilities.
Abstract: Billions of users rely on the security of the Android platform to protect phones, tablets, and many different types of consumer electronics. While Android's permission model is well studied, the enforcement of the protection policy has received relatively little attention. Much of this enforcement is spread across system services, taking the form of hard-coded checks within their implementations. In this paper, we propose Authorization Check Miner (ACMiner), a framework for evaluating the correctness of Android's access control enforcement through consistency analysis of authorization checks. ACMiner combines program and text analysis techniques to generate a rich set of authorization checks, mines the corresponding protection policy for each service entry point, and uses association rule mining at a service granularity to identify inconsistencies that may correspond to vulnerabilities. We used ACMiner to study the AOSP version of Android 7.1.1 to identify 28 vulnerabilities relating to missing authorization checks. In doing so, we demonstrate ACMiner's ability to help domain experts process thousands of authorization checks scattered across millions of lines of code.

18 citations
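
To make the idea of association rule mining over authorization checks more concrete, here is a minimal sketch under stated assumptions. The entry-point names and check names are invented for illustration and are not Android APIs, and the code is not ACMiner's implementation; it only computes the confidence of one rule and lists the entry points that break it.

```python
from typing import Dict, FrozenSet, List

# Hypothetical data: entry point name -> set of authorization checks found on it.
ChecksByEntry = Dict[str, FrozenSet[str]]


def rule_confidence(data: ChecksByEntry,
                    antecedent: FrozenSet[str],
                    consequent: FrozenSet[str]) -> float:
    """Confidence of the rule antecedent -> consequent over all entry points."""
    with_antecedent = [c for c in data.values() if antecedent <= c]
    if not with_antecedent:
        return 0.0
    with_both = [c for c in with_antecedent if consequent <= c]
    return len(with_both) / len(with_antecedent)


def violations(data: ChecksByEntry,
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> List[str]:
    """Entry points that satisfy the antecedent but lack the consequent."""
    return sorted(name for name, checks in data.items()
                  if antecedent <= checks and not consequent <= checks)


# Made-up service with four entry points (names are illustrative only).
SERVICE: ChecksByEntry = {
    "getData": frozenset({"checkUid", "checkPermission"}),
    "setData": frozenset({"checkUid", "checkPermission"}),
    "deleteData": frozenset({"checkUid", "checkPermission"}),
    "resetData": frozenset({"checkUid"}),
}

if __name__ == "__main__":
    a, c = frozenset({"checkUid"}), frozenset({"checkPermission"})
    print(rule_confidence(SERVICE, a, c), violations(SERVICE, a, c))
```

An entry point that violates a high-confidence rule is only a candidate inconsistency; as the abstract notes, such findings still need a domain expert to decide whether they correspond to a vulnerability.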


Proceedings ArticleDOI
19 Sep 2019
TL;DR: It is found that, in general, the experience of developers in using JCA does not correlate with their performance, and none of the factors such as the number or frequency of committed lines of code, the number of JCA APIs developers use, or the number of projects they are involved in correlate with developer performance in this domain.
Abstract: Background: Previous research has shown that crypto APIs are hard for developers to understand and difficult for them to use. They consequently rely on unvalidated boilerplate code from online resources where security vulnerabilities are common. Aims and method: We analyzed 2,324 open-source Java projects that rely on Java Cryptography Architecture (JCA) to understand how crypto APIs are used in practice, and what factors account for the performance of developers in using these APIs. Results: We found that, in general, the experience of developers in using JCA does not correlate with their performance. In particular, none of the factors such as the number or frequency of committed lines of code, the number of JCA APIs developers use, or the number of projects they are involved in correlate with developer performance in this domain. Conclusions: We call for qualitative studies to shed light on the reasons underlying the success of developers who are expert in using cryptography. Also, detailed investigation at API level is necessary to further clarify developers' obstacles in this domain.

16 citations


Proceedings ArticleDOI
22 Jun 2019
TL;DR: The tool SootDiff is proposed that uses Soot's intermediate representation Jimple, in combination with code clone detection techniques, to reduce dissimilarities introduced by different compilers, and to identify clones.
Abstract: Different Java compilers and compiler versions, e.g., javac or ecj, produce different bytecode from the same source code. This makes it hard to trace if the bytecode of an open-source library really matches the provided source code. Moreover, it prevents one from detecting which open-source libraries have been re-compiled and rebundled into a single jar, which is a common way to distribute an application. Such rebundling is problematic because it prevents one from checking whether the jar file contains open-source libraries with known vulnerabilities. To cope with these problems, we propose the tool SootDiff that uses Soot's intermediate representation Jimple, in combination with code clone detection techniques, to reduce dissimilarities introduced by different compilers, and to identify clones. Our results show that SootDiff successfully identifies clones in 102 of 144 cases, whereas bytecode comparison succeeds in 58 cases only.

12 citations
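
The reported counts are easier to compare as rates. The short sketch below only redoes the arithmetic on the two figures quoted in the abstract (102 and 58 out of 144 cases); the helper name `success_rate` is an assumption for this example.

```python
def success_rate(successes: int, total: int) -> float:
    """Fraction of cases in which clones were identified."""
    return successes / total


TOTAL_CASES = 144  # number of cases reported in the abstract
sootdiff_rate = success_rate(102, TOTAL_CASES)
bytecode_rate = success_rate(58, TOTAL_CASES)

if __name__ == "__main__":
    # Prints: SootDiff: 70.8%, bytecode comparison: 40.3%
    print(f"SootDiff: {sootdiff_rate:.1%}, bytecode comparison: {bytecode_rate:.1%}")
```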


Proceedings ArticleDOI
01 Jan 2019
TL;DR: MagpieBridge - a general approach to integrating static analyses into IDEs and editors, which reduces the m×n complexity problem of integrating m analyses into n IDEs to m+n complexity because each analysis and type of plugin need be done just once for MagpieBridge itself.
Abstract: In the past, many static analyses have been created in academia, but only a few of them have found widespread use in industry. Those analyses which are adopted by developers usually have IDE support in the form of plugins, without which developers have no convenient mechanism to use the analysis. Hence, the key to making static analyses more accessible to developers is to integrate the analyses into IDEs and editors. However, integrating static analyses into IDEs is non-trivial: different IDEs have different UI workflows and APIs, expertise in those matters is required to write such plugins, and analysis experts are not typically familiar with doing this. As a result, especially in academia, most analysis tools are headless and only have command-line interfaces. To make static analyses more usable, we propose MagpieBridge - a general approach to integrating static analyses into IDEs and editors. MagpieBridge reduces the m×n complexity problem of integrating m analyses into n IDEs to m+n complexity because each analysis and type of plugin need be done just once for MagpieBridge itself. We demonstrate our approach by integrating two existing analyses, Ariadne and CogniCrypt, into IDEs; these two analyses illustrate the generality of MagpieBridge, as they are based on different program analysis frameworks - WALA and Soot respectively - for different application areas - machine learning and security - and different programming languages - Python and Java. We show further generality of MagpieBridge by using multiple popular IDEs and editors, such as Eclipse, IntelliJ, PyCharm, Jupyter, Sublime Text and even Emacs and Vim.

9 citations
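
The m×n versus m+n argument can be checked with the analyses and editors named in the abstract. The sketch below is only a counting exercise and assumes, for illustration, that every analysis would be offered in every listed editor; the function names are made up for this example.

```python
ANALYSES = ["Ariadne", "CogniCrypt"]
EDITORS = ["Eclipse", "IntelliJ", "PyCharm", "Jupyter", "Sublime Text", "Emacs", "Vim"]


def pairwise_integrations(m: int, n: int) -> int:
    """One dedicated plugin per (analysis, editor) pair."""
    return m * n


def bridged_integrations(m: int, n: int) -> int:
    """One adapter per analysis plus one plugin per editor via a shared bridge."""
    return m + n


if __name__ == "__main__":
    m, n = len(ANALYSES), len(EDITORS)
    # Prints: 14 pairwise vs 9 bridged
    print(f"{pairwise_integrations(m, n)} pairwise vs {bridged_integrations(m, n)} bridged")
```

The gap widens as either list grows, since the pairwise count scales with the product and the bridged count with the sum.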


Proceedings ArticleDOI
10 Jul 2019
TL;DR: This paper presents SWAN, a fully-automated machine-learning approach to detect sources, sinks, validators, and authentication methods for Java programs and introduces SWANAssist, an extension to SWAN that allows analysis users to refine the classifications.
Abstract: More and more companies use static analysis to perform regular code reviews to detect security vulnerabilities in their code, configuring them to detect various types of bugs and vulnerabilities such as the SANS top 25 or the OWASP top 10. For such analyses to be as precise as possible, they must be adapted to the code base they scan. The particular challenge we address in this paper is to provide analyses with the correct security-relevant methods (SRM): sources, sinks, etc. We present SWAN, a fully-automated machine-learning approach to detect sources, sinks, validators, and authentication methods for Java programs. SWAN further classifies the SRM into specific vulnerability classes of the SANS top 25. To further adapt the lists detected by SWAN to the code base and to improve its precision, we also introduce SWANAssist, an extension to SWAN that allows analysis users to refine the classifications. On twelve popular Java frameworks, SWAN achieves an average precision of 0.826, which is better than or comparable to existing approaches. Our experiments show that SWANAssist requires a relatively low effort from the developer to significantly improve its precision.

9 citations


Proceedings ArticleDOI
01 Nov 2019
TL;DR: This paper summarizes the main design challenges for building usable static analysis tools and shows that they revolve around the notion of explainability, a subarea of usability, which leads to proposed lines of future work in explainability for static analysis.
Abstract: Static code analysis is widely used to support the development of high-quality software. It helps developers detect potential bugs and security vulnerabilities in a program's source code without executing it. While the potential benefits of static analysis tools are beyond question, their usability is often criticised and prevents software developers from using static analysis to its full potential. In the past decade, researchers have studied developer needs and contrasted them to available static analysis tool functionalities. In this paper, we summarize the main design challenges for building usable static analysis tools, and show that they revolve around the notion of explainability, which is a subarea of usability. We present existing analysis tools and current research in static analysis usability, and detail how they approach those challenges. This leads us to propose potential lines of future work in explainability for static analysis, namely turning static analysis tools into assistants and teachers.

6 citations


Proceedings ArticleDOI
22 Jun 2019
TL;DR: This work presents some of the insights gained by instrumenting the LLVM-based static analysis framework PhASAR for C/C++ code and shows the broad area of applications at which flexible instrumentation supports analysis and framework developers.
Abstract: The development of a high-quality data-flow analysis, one that is precise and scalable, is a challenging task. A concrete client analysis not only requires data-flow but, in addition, type-hierarchy, points-to, and call-graph information, all of which need to be obtained by wisely chosen and correctly parameterized algorithms. Therefore, many static analysis frameworks have been developed that provide analysis writers with generic data-flow solvers as well as those additional pieces of information. Such frameworks ease the development of an analysis by requiring only a description of the data-flow problem to be solved and a set of framework parameters. Yet, analysis writers often struggle when an analysis does not behave as expected on real-world code. It is usually not apparent what causes a failure due to the complex interplay of the several algorithms and the client analysis code within such frameworks. In this work, we present some of the insights we gained by instrumenting the LLVM-based static analysis framework PhASAR for C/C++ code and show the broad area of applications at which flexible instrumentation supports analysis and framework developers. We present five cases in which instrumentation gave us valuable insights to debug and improve both the concrete analyses and the underlying PhASAR framework.

4 citations


Posted Content
TL;DR: In this article, the authors analyzed 2,324 open-source Java projects that rely on Java Cryptography Architecture (JCA) to understand how crypto APIs are used in practice, and what factors account for the performance of developers in using these APIs.
Abstract: Previous research has shown that crypto APIs are hard for developers to understand and difficult for them to use. They consequently rely on unvalidated boilerplate code from online resources where security vulnerabilities are common. We analyzed 2,324 open-source Java projects that rely on Java Cryptography Architecture (JCA) to understand how crypto APIs are used in practice, and what factors account for the performance of developers in using these APIs. We found that, in general, the experience of developers in using JCA does not correlate with their performance. In particular, none of the factors such as the number or frequency of committed lines of code, the number of JCA APIs developers use, or the number of projects they are involved in correlate with developer performance in this domain. We call for qualitative studies to shed light on the reasons underlying the success of developers who are expert in using cryptography. Also, detailed investigation at API level is necessary to further clarify developers' obstacles in this domain.

Proceedings ArticleDOI
25 Mar 2019
TL;DR: A generic approach to analyze system behavior on architecture level using the principles of Runtime Verification, which allows an analyst to easily verify or refute hypotheses about system behavior regarding the interaction of components, without the need to inspect the source code.
Abstract: Analyzing runtime behavior is an important part of developing and verifying software systems. This is especially true for complex component-based systems used in the vehicle industry. Here, locating the actual cause of (mis-)behavior can be time-consuming, because the analysis is usually not performed on the architecture level, where the system has initially been designed. Instead, it often relies on source code debugging or visualizing signals and events. The results must then be correlated to what is expected regarding the architecture. With an ever-growing complexity of the systems, the advent of model-based development, code generators and the distributed nature of the development process, this becomes increasingly difficult. This paper therefore presents Architectural Runtime Verification (ARV), a generic approach to analyze system behavior on architecture level using the principles of Runtime Verification. It relies on the architecture description and on the runtime information that is collected in simulation-based tests. This allows an analyst to easily verify or refute hypotheses about system behavior regarding the interaction of components, without the need to inspect the source code. We have instantiated ARV as a framework that allows a client to make queries about architectural elements using a timed LTL-based constraint language. From this, ARV generates a Runtime Verification monitor and applies it to runtime data stored in a database. We demonstrate the applicability of this approach with a running example from the automotive industry.

Proceedings ArticleDOI
10 Nov 2019
TL;DR: This work presents the semi-automated tool SWAN_ASSIST, which aids the configuration with an IntelliJ plugin based on active machine learning and integrates the novel automated machine-learning approach SWAN, which identifies and classifies Java SRM.
Abstract: To detect specific types of bugs and vulnerabilities, static analysis tools must be correctly configured with security-relevant methods (SRM), e.g., sources, sinks, sanitizers and authentication methods, which is usually a very labour-intensive and error-prone process. This work presents the semi-automated tool SWAN_ASSIST, which aids the configuration with an IntelliJ plugin based on active machine learning. It integrates our novel automated machine-learning approach SWAN, which identifies and classifies Java SRM. SWAN_ASSIST further integrates user feedback through iterative learning. SWAN_ASSIST aids developers by asking them to classify at each point in time exactly those methods whose classification best impacts the classification result. Our experiments show that SWAN_ASSIST classifies SRM with a high precision, and requires a relatively low effort from the user. A video demo of SWAN_ASSIST can be found at https://youtu.be/fSyD3V6EQOY. The source code is available at https://github.com/secure-software-engineering/swan.

Book ChapterDOI
07 Oct 2019
TL;DR: This work implemented AuthCheck for the Spring framework and identified four types of mistakes that developers can make when using Spring Security, and analyzed an existing open-source Spring application with inserted vulnerable code and detected the vulnerabilities.
Abstract: According to security rankings such as the SANS Top 25 and the OWASP Top 10, access-control vulnerabilities are still highly relevant. Even though developers use web frameworks such as Spring and Struts, which handle the entire access-control mechanism, their implementation can still be vulnerable because of misuses, errors, or inconsistent implementation from the design specification. We propose AuthCheck, a static analysis that tracks the program’s state using a finite state machine to report illegal states caused by vulnerable implementation. We implemented AuthCheck for the Spring framework and identified four types of mistakes that developers can make when using Spring Security. With AuthCheck, we analyzed an existing open-source Spring application with inserted vulnerable code and detected the vulnerabilities.
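
The finite-state-machine idea in this abstract can be sketched in a few lines. The states, events, and transition table below are hypothetical and are not taken from AuthCheck or Spring Security; the sketch also walks a concrete event sequence, whereas the paper describes a static analysis over program code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical transition table: (state, event) -> next state.
# Any pair that is missing here is treated as an illegal step.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("anonymous", "authenticate"): "authenticated",
    ("authenticated", "authorize"): "authorized",
    ("anonymous", "access_public"): "anonymous",
    ("authenticated", "access_public"): "authenticated",
    ("authorized", "access_public"): "authorized",
    ("authorized", "access_protected"): "authorized",
}


def find_illegal_steps(events: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (index, state, event) for every event with no legal transition."""
    state = "anonymous"
    illegal: List[Tuple[int, str, str]] = []
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            # Report the step and stay in the current state.
            illegal.append((index, state, event))
        else:
            state = next_state
    return illegal


if __name__ == "__main__":
    print(find_illegal_steps(["authenticate", "authorize", "access_protected"]))
    print(find_illegal_steps(["authenticate", "access_protected"]))
```

In this toy model, reaching a protected resource without first passing through the authorization step is reported as an illegal step, which mirrors the kind of misuse the abstract describes at a very coarse level.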

Posted Content
TL;DR: This paper proposes Authorization Check Miner (ACMiner), a framework for evaluating the correctness of Android's access control enforcement through consistency analysis of authorization checks, and uses ACMiner to study the AOSP version of Android 7.1.1 to identify 28 vulnerabilities relating to missing authorization checks.
Abstract: Billions of users rely on the security of the Android platform to protect phones, tablets, and many different types of consumer electronics. While Android's permission model is well studied, the enforcement of the protection policy has received relatively little attention. Much of this enforcement is spread across system services, taking the form of hard-coded checks within their implementations. In this paper, we propose Authorization Check Miner (ACMiner), a framework for evaluating the correctness of Android's access control enforcement through consistency analysis of authorization checks. ACMiner combines program and text analysis techniques to generate a rich set of authorization checks, mines the corresponding protection policy for each service entry point, and uses association rule mining at a service granularity to identify inconsistencies that may correspond to vulnerabilities. We used ACMiner to study the AOSP version of Android 7.1.1 to identify 28 vulnerabilities relating to missing authorization checks. In doing so, we demonstrate ACMiner's ability to help domain experts process thousands of authorization checks scattered across millions of lines of code.