scispace - formally typeset
Search or ask a question

Showing papers by "Jens Dietrich published in 2017"


Journal ArticleDOI
TL;DR: This work presents XCorpus, a set of 76 executable, real-world Java programs, including a subset of 70 programs from the Qualitas Corpus, which uses a harness that is a combination of built-in and generated test cases, resulting in a branch coverage that is significantly better than what is available from DaCapo.
Abstract: Empirical studies on code require standardized datasets of significant size extracted from real-world programs in order to be reproducible and generalisable. We argue that there is a need for such data sets that are executable and can therefore be used for experiments using static and dynamic analysis. A harness for such a data set should have high coverage in order to facilitate the construction of comprehensive models of program execution. We present XCorpus, a set of 76 executable, real-world Java programs, including a subset of 70 programs from the Qualitas Corpus. XCorpus uses a harness that is a combination of built-in and generated test cases, resulting in a branch coverage that is significantly better than what is available from DaCapo.

30 citations


Journal ArticleDOI
TL;DR: It is found that only a small number of tools suitable to analyse API evolution exist, and those tools are only infrequently maintained by small communities.
Abstract: The development of software components with independent release cycles is nowadays widely supported by multiple languages and frameworks. A critical feature of any such platform is to safeguard composition by ensuring backward compatibility of substituted components. In recent years, some tooling has been developed to help developers and DevOps engineers to establish whether components are backward compatible by means of static analysis. We investigate the state of the art in this space by benchmarking such tools for Java. For this purpose, we have developed a compact benchmark data set of less than 200KB. Using this dataset, we study possible API changes of Java libraries, and whether the tools investigated can detect them. We find that only a small number of tools suitable to analyse API evolution exist. Those tools are only infrequently maintained by small communities. All tools investigated have some shortcomings in that they fail to detect certain API incompatibilities.

18 citations


Journal ArticleDOI
TL;DR: This paper presents an approach to this problem in description logic learning by computing partial descriptions of both positive and negative examples and combining them, and demonstrates that this algorithm provides significantly better results than standard approaches on some typical data sets.
Abstract: In machine learning, one often encounters data sets where a general pattern is violated by a relatively small number of exceptions (for example, a rule that says that all birds can fly is violated by examples such as penguins). This complicates the concept learning process and may lead to the rejection of some simple and expressive rules that cover many cases. In this paper we present an approach to this problem in description logic learning by computing partial descriptions (which are not necessarily entirely complete) of both positive and negative examples and combining them. Our Symmetric Parallel Class Expression Learning approach enables the generation of general rules with exception patterns included. We demonstrate that this algorithm provides significantly better results (in terms of metrics such as accuracy, search space covered, and learning time) than standard approaches on some typical data sets. Further, the approach has the added benefit that it can be parallelised relatively simply, leading to much faster exploration of the search tree on modern computers.

15 citations


Proceedings ArticleDOI
13 May 2017
TL;DR: A new type of serialisation-related vulnerabilities for Java that exploit the topology of object graphs constructed from classes of the standard library in a way that deserialisation leads to resource exhaustion, facilitating denial of service attacks is investigated.
Abstract: In recent years, multiple vulnerabilities exploiting the serialisation APIs of various programming languages, including Java, have been discovered. These vulnerabilities can be used to devise in- jection attacks, exploiting the presence of dynamic programming language features like reflection or dynamic proxies. In this paper, we investigate a new type of serialisation-related vulnerabilit- ies for Java that exploit the topology of object graphs constructed from classes of the standard library in a way that deserialisation leads to resource exhaustion, facilitating denial of service attacks. We analyse three such vulnerabilities that can be exploited to exhaust stack memory, heap memory and CPU time. We discuss the language and library design features that enable these vulnerabilities, and investigate whether these vulnerabilities can be ported to C#, Java- Script and Ruby. We present two case studies that demonstrate how the vulnerabilities can be used in attacks on two widely used servers, Jenkins deployed on Tomcat and JBoss. Finally, we propose a mitigation strategy based on contract injection.

10 citations


Proceedings ArticleDOI
18 Jun 2017
TL;DR: A methodology that can be used to check the (un)soundness of a particular static analysis, call-graph construction, based on soundness oracles is discussed, which can also be used for hybrid analyses.
Abstract: One of the inherent advantages of static analysis is that it can create and reason about models of an entire program. However, mainstream languages such as Java use numerous dynamic language features designed to boost programmer productivity, but these features are notoriously difficult to capture by static analysis, leading to unsoundness in practice. While existing research has focused on providing sound handling for selected language features (mostly reflection) based on anecdotal evidence and case studies, there is little empirical work to investigate the extent to which particular features cause unsoundness of static analysis in practice. In this paper, we (1) discuss language features that may cause unsoundness and (2) discuss a methodology that can be used to check the (un)soundness of a particular static analysis, call-graph construction, based on soundness oracles. These oracles can also be used for hybrid analyses.

9 citations


Proceedings ArticleDOI
01 Dec 2017
TL;DR: This paper investigates the question whether information harvested from stack traces obtained from the GitHub issue tracker and Stack Overflow Q&A forums can be used in order to complement statically built call graphs, and finds edges that Doop misses when analysing real-world programs, even when reflection analysis is enabled.
Abstract: Static program analysis is a cornerstone of modern software engineering - it is used to detect bugs and security vulnerabilities early before software is deployed. While there is a large body of research into the scalability and the precision of static analysis, the (un) soundness of static analysis is a critical issue that has not attracted the same level of attention by the research community. In this paper we investigate the question whether information harvested from stack traces obtained from the GitHub issue tracker and Stack Overflow Q&A forums can be used in order to complement statically built call graphs. For this purpose, we extract reflective call graph edges from parsed stack traces, and check whether these edges are correctly computed by Doop, a widely used tool for static analysis with built-in support for reflection analysis. We do find edges that Doop misses when analysing real-world programs, even when reflection analysis is enabled. This suggests that mining techniques are a useful tool to test and improve the soundness of static analysis.

7 citations


DOI
01 Jan 2017
TL;DR: This artefact allows users to witness how the serialisation-based vulnerabilities result in behavior that can be used in security attacks and supports the repeatability of the case study experiments and the benchmark for the mitigation measures proposed in the paper.
Abstract: This artefact demonstrates the effects of the serialisation vulnerabilities described in the companion paper. It is composed of three components: scripts, including source code, for Java, Ruby and C# serialisation-vulnerabilities, two case studies that demonstrate attacks based on the vulnerabilities, and a contracts-based mitigation strategy for serialisation-based attacks on Java applications. The artefact allows users to witness how the serialisation-based vulnerabilities result in behavior that can be used in security attacks. It also supports the repeatability of the case study experiments and the benchmark for the mitigation measures proposed in the paper. Instructions for running the tasks are provided along with a description of the artefact setup.

6 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: The use of formal contracts has long been advocated as an approach to develop programs that are provably correct as mentioned in this paper, however, the adoption of contracts has been slow in practice, despite the fact that built-in features of the language (e.g. assertions and exceptions) can be used for this.
Abstract: The use of formal contracts has long been advocated as an approach to develop programs that are provably correct. However, the reality is that adoption of contracts has been slow in practice. Despite this, the adoption of lightweight contracts — typically utilising runtime checking — has progressed. In the case of Java, built-in features of the language (e.g. assertions and exceptions) can be used for this. Furthermore, a number of libraries which facilitate contract checking have arisen. In this paper, we catalogue 25 techniques and tools for lightweight contract checking in Java, and present the results of an empirical study looking at a dataset extracted from the 200 most popular projects found on Maven Central, constituting roughly 351,034 KLOC. We examine (1) the extent to which contracts are used and (2) what kind of contracts are used. We then investigate how contracts are used to safeguard code, and study problems in the context of two types of substitutability that can be guarded by contracts: (3) unsafe evolution of APIs that may break client programs and (4) violations of Liskovs Substitution Principle (LSP) when methods are overridden. We find that: (1) a wide range of techniques and constructs are used to represent contracts, and often the same program uses different techniques at the same time; (2) overall, contracts are used less than expected, with significant differences between programs; (3) projects that use contracts continue to do so, and expand the use of contracts as they grow and evolve; and, (4) there are cases where the use of contracts points to unsafe subtyping (violations of Liskov's Substitution Principle) and unsafe evolution.

5 citations


DOI
01 Jan 2017
TL;DR: This artefact contains a dataset of open-source programs obtained from the Maven Central Repository and scripts that first extract contracts from these programs and then perform several analyses on the contracts extracted, showing how contracts are used in real-world program, and how their usage changes between versions and within inheritance hierarchies.
Abstract: This artefact contains a dataset of open-source programs obtained from the Maven Central Repository and scripts that first extract contracts from these programs and then perform several analyses on the contracts extracted. The extraction and analysis is fully automated and directly produces the tables presented in the accompanying paper. The results show how contracts are used in real-world program, and how their usage changes between versions and within inheritance hierarchies.

1 citations