Showing papers on "Program transformation published in 2019"


Proceedings ArticleDOI
10 Nov 2019
TL;DR: A framework for transformation inference is proposed, where programs are represented as hypergraphs to enable fine-grained generalization of transformations, and a transformation inference approach, GENPAT, is designed that infers a program transformation based on code context and statistics from a big code corpus.
Abstract: Inferring program transformations from concrete program changes has many potential uses, such as applying systematic program edits, refactoring, and automated program repair. Existing work for inferring program transformations usually relies on statistical information over a potentially large set of program-change examples. However, in many practical scenarios we do not have such a large set of program-change examples. In this paper, we address the challenge of inferring a program transformation from one single example. Our core insight is that "big code" can provide an effective guide for the generalization of a concrete change into a program transformation, i.e., code elements appearing in many files are general and should not be abstracted away. We first propose a framework for transformation inference, where programs are represented as hypergraphs to enable fine-grained generalization of transformations. We then design a transformation inference approach, GENPAT, that infers a program transformation based on code context and statistics from a big code corpus. We have evaluated GENPAT under two distinct application scenarios, systematic editing and program repair. The evaluation on systematic editing shows that GENPAT significantly outperforms a state-of-the-art approach, SYDIT, correctly transforming up to 5.5× as many cases. The evaluation on program repair suggests that GENPAT has the potential to be integrated into advanced program repair tools: GENPAT successfully repaired 19 real-world bugs in the Defects4J benchmark by simply applying transformations inferred from existing patches, 4 of which had never been repaired by any existing technique. Overall, the evaluation results suggest that GENPAT is effective for transformation inference and can potentially be adopted for many different applications.

52 citations
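
To make the "big code" insight concrete, the following minimal Python sketch (my illustration, not GENPAT's hypergraph-based implementation; the toy corpus, tokens, and frequency threshold are invented) generalizes a single edit by keeping corpus-frequent tokens literal and abstracting rare, example-specific ones into pattern variables:

```python
from collections import Counter

def build_freq(corpus_files):
    """Count, for each token, how many corpus files it appears in."""
    freq = Counter()
    for tokens in corpus_files:
        freq.update(set(tokens))
    return freq

def generalize(edit_tokens, freq, threshold=2):
    """Keep corpus-frequent tokens literal; abstract rare ones to variables."""
    pattern, binding = [], {}
    for tok in edit_tokens:
        if freq[tok] >= threshold:
            pattern.append(tok)                       # general: keep as-is
        else:
            var = binding.setdefault(tok, f"$V{len(binding)}")
            pattern.append(var)                       # specific: abstract away
    return pattern

corpus = [["list", ".", "size", "(", ")"],
          ["userList", ".", "isEmpty", "(", ")"],
          ["list", ".", "size", "(", ")"]]
freq = build_freq(corpus)
# One concrete change site: `userList.size() == 0`
print(generalize(["userList", ".", "size", "(", ")", "==", "0"], freq))
# ['$V0', '.', 'size', '(', ')', '$V1', '$V2'] -- receiver abstracted, API kept
```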


Proceedings ArticleDOI
08 Jun 2019
TL;DR: This work develops the language and type system, formalizes the constant-time transformation, and presents an empirical evaluation that uses FaCT to implement core crypto routines from several open-source projects including OpenSSL, libsodium, and curve25519-donna.
Abstract: Real-world cryptographic code is often written in a subset of C intended to execute in constant-time, thereby avoiding timing side channel vulnerabilities. This C subset eschews structured programming as we know it: if-statements, looping constructs, and procedural abstractions can leak timing information when handling sensitive data. The resulting obfuscation has led to subtle bugs, even in widely-used high-profile libraries like OpenSSL. To address the challenge of writing constant-time cryptographic code, we present FaCT, a crypto DSL that provides high-level but safe language constructs. The FaCT compiler uses a secrecy type system to automatically transform potentially timing-sensitive high-level code into low-level, constant-time LLVM bitcode. We develop the language and type system, formalize the constant-time transformation, and present an empirical evaluation that uses FaCT to implement core crypto routines from several open-source projects including OpenSSL, libsodium, and curve25519-donna. Our evaluation shows that FaCT’s design makes it possible to write readable, high-level cryptographic code, with efficient, constant-time behavior.

48 citations
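
The shape of the transformation FaCT automates can be sketched as follows (a hand-written Python analogy; Python itself offers no timing guarantees, whereas FaCT emits constant-time LLVM bitcode):

```python
def insecure_select(secret_bit, a, b):
    if secret_bit:            # branching on a secret can leak it via timing
        return a
    return b

def ct_select(secret_bit, a, b):
    # Both operands are combined with a mask derived from the secret, so the
    # same straight-line instructions run regardless of its value (32-bit view).
    mask = -secret_bit & 0xFFFFFFFF          # 0x00000000 or 0xFFFFFFFF
    return (a & mask) | (b & ~mask & 0xFFFFFFFF)

assert ct_select(1, 7, 9) == 7
assert ct_select(0, 7, 9) == 9
```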


Journal ArticleDOI
TL;DR: A compositional program transformation from the simply-typed lambda-calculus to itself augmented with a notion of linear negation is defined, and it is proved that this computes the gradient of the source program with the same efficiency as first-order backpropagation.
Abstract: Backpropagation is a classic automatic differentiation algorithm computing the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most notably machine learning, where it is the key for efficiently training (deep) neural networks. Recent years have witnessed the quick growth of a research field called differentiable programming, the aim of which is to express computational graphs more synthetically and modularly by resorting to actual programming languages endowed with control flow operators and higher-order combinators, such as map and fold. In this paper, we extend the backpropagation algorithm to a paradigmatic example of such a programming language: we define a compositional program transformation from the simply-typed lambda-calculus to itself augmented with a notion of linear negation, and prove that this computes the gradient of the source program with the same efficiency as first-order backpropagation. The transformation is completely effect-free and thus provides a purely logical understanding of the dynamics of backpropagation.

32 citations
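
A minimal sketch of the compositional idea (mine, in Python rather than the paper's lambda-calculus; the combinators are invented for illustration): every term is transformed into a pair of its value and a backpropagator, the role played by linear negation in the paper.

```python
def const(c):
    return lambda x: (c, lambda g: 0.0)       # constants absorb no gradient

def ident():
    return lambda x: (x, lambda g: g)         # d(x)/dx = 1

def mul(f, h):
    def t(x):
        (v1, b1), (v2, b2) = f(x), h(x)
        # product rule, expressed by composing backpropagators
        return v1 * v2, lambda g: b1(g * v2) + b2(g * v1)
    return t

def add(f, h):
    def t(x):
        (v1, b1), (v2, b2) = f(x), h(x)
        return v1 + v2, lambda g: b1(g) + b2(g)
    return t

# f(x) = x*x + 3x ; its gradient at x=2 is 2x + 3 = 7
f = add(mul(ident(), ident()), mul(const(3.0), ident()))
value, backprop = f(2.0)
print(value, backprop(1.0))                   # 10.0 7.0
```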


Journal ArticleDOI
20 Dec 2019
TL;DR: Stacked Borrows is proposed, an operational semantics for memory accesses in Rust that defines an aliasing discipline and declares programs violating it to have undefined behavior, meaning the compiler does not have to consider such programs when performing optimizations.
Abstract: Type systems are useful not just for the safety guarantees they provide, but also for helping compilers generate more efficient code by simplifying important program analyses. In Rust, the type system imposes a strict discipline on pointer aliasing, and it is an express goal of the Rust compiler developers to make use of that alias information for the purpose of program optimizations that reorder memory accesses. The problem is that Rust also supports unsafe code, and programmers can write unsafe code that bypasses the usual compiler checks to violate the aliasing discipline. To strike a balance between optimizations and unsafe code, the language needs to provide a set of rules such that unsafe code authors can be sure, if they are following these rules, that the compiler will preserve the semantics of their code despite all the optimizations it is doing. In this work, we propose Stacked Borrows, an operational semantics for memory accesses in Rust. Stacked Borrows defines an aliasing discipline and declares programs violating it to have undefined behavior, meaning the compiler does not have to consider such programs when performing optimizations. We give formal proofs (mechanized in Coq) showing that this rules out enough programs to enable optimizations that reorder memory accesses around unknown code and function calls, based solely on intraprocedural reasoning. We also implemented this operational model in an interpreter for Rust and ran large parts of the Rust standard library test suite in the interpreter to validate that the model permits enough real-world unsafe Rust code.

29 citations
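
A drastically simplified toy model of the discipline (my sketch, covering only the push/pop intuition for mutable borrows, not the paper's full semantics): each location carries a stack of borrow tags, using a tag invalidates everything above it, and using an invalidated tag is undefined behavior.

```python
class Location:
    def __init__(self, owner_tag):
        self.stack = [owner_tag]          # borrow stack for this location

    def retag(self, parent, new_tag):
        self.use(parent)                  # the new borrow derives from `parent`
        self.stack.append(new_tag)

    def use(self, tag):
        if tag not in self.stack:
            raise RuntimeError(f"undefined behavior: tag {tag!r} invalidated")
        while self.stack[-1] != tag:      # using `tag` kills borrows above it
            self.stack.pop()

loc = Location("owner")
loc.retag("owner", "ref1")                # e.g. creating a mutable reference
loc.use("ref1")                           # fine: ref1 is on top
loc.use("owner")                          # fine, but pops (invalidates) ref1
try:
    loc.use("ref1")                       # ref1 is gone: the program has UB
except RuntimeError as e:
    print(e)
```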


Proceedings ArticleDOI
27 Oct 2019
TL;DR: Gerenuk is developed, a compiler and runtime that aims to enable a JVM-based data-parallel system to achieve near-native efficiency by transforming a set of statements in the system for direct execution over inlined native bytes.
Abstract: Big Data systems are typically implemented in object-oriented languages such as Java and Scala due to the quick development cycle they provide. These systems are executed on top of a managed runtime such as the Java Virtual Machine (JVM), which requires each data item to be represented as an object before it can be processed. This representation is the direct cause of many kinds of severe inefficiencies. We developed Gerenuk, a compiler and runtime that aims to enable a JVM-based data-parallel system to achieve near-native efficiency by transforming a set of statements in the system for direct execution over inlined native bytes. The key insight leading to Gerenuk's success is two-fold: (1) analytics workloads often use immutable and confined data types. If we speculatively optimize the system and user code with this assumption, the transformation can be made tractable. (2) The flow of data starts at a deserialization point where objects are created from a sequence of native bytes and ends at a serialization point where they are turned back into a byte sequence to be sent to the disk or network. This flow naturally defines a speculative execution region (SER) to be transformed. Gerenuk compiles a SER speculatively into a version that can operate directly over native bytes that come from the disk or network. The Gerenuk runtime aborts the SER execution upon violations of the immutability and confinement assumption and switches to the slow path by deserializing the bytes and re-executing the original SER. Our evaluation on Spark and Hadoop demonstrates promising results.

20 citations
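
The fast-path/slow-path structure can be sketched in a few lines of Python (hypothetical record format and function names; Gerenuk itself operates on JVM bytecode):

```python
import struct

class SpeculationAbort(Exception):
    pass

def fast_sum(buf):
    """Speculative path: operate directly on native bytes
    (records assumed to be 4-byte big-endian ints)."""
    if len(buf) % 4 != 0:                 # assumption violated -> abort
        raise SpeculationAbort("unexpected record layout")
    return sum(struct.unpack(f">{len(buf)//4}i", buf))

def slow_sum(buf):
    """Original path: deserialize into objects, then compute."""
    records = [int.from_bytes(buf[i:i+4], "big", signed=True)
               for i in range(0, len(buf) - len(buf) % 4, 4)]
    return sum(records)

def run_ser(buf):
    try:
        return fast_sum(buf)              # speculative execution region
    except SpeculationAbort:
        return slow_sum(buf)              # fall back, re-executing the SER

data = struct.pack(">3i", 1, 2, 3)
print(run_ser(data))                      # 6, via the fast path
```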


Proceedings ArticleDOI
23 Sep 2019
TL;DR: It is argued in this paper that one can improve program comprehension by applying particular transformations to introduce lambda expressions, and that state-of-the-art models for estimating program readability are not able to capture the benefits of a program transformation that introduces lambda expressions.
Abstract: Background: Version eight of the Java programming language introduced a number of features that encourage a functional style of programming, including support for lambda expressions and the Stream API. Currently, there is a common wisdom that refactoring legacy code to introduce lambda expressions, besides other potential benefits, simplifies the code and improves program comprehension. Aims: The purpose of this paper is to investigate this belief, conducting an in-depth study to evaluate the effect of introducing lambda expressions on program comprehension. Method: We conduct this research using a mixed-method study. First, we quantitatively analyze 66 pairs of real code snippets, where each pair corresponds to the body of a method before and after the introduction of lambda expressions. We computed two metrics related to source code complexity (number of lines of code and cyclomatic complexity) and two metrics that estimate the readability of the source code. Second, we conduct a survey with practitioners to collect their perceptions of the benefits to program comprehension of introducing lambda expressions. Each practitioner evaluates between three and six pairs of code snippets, answering questions about possible improvements. Results: We found contradictory results in our research. Based on the quantitative assessment, we could not find evidence that the introduction of lambda expressions improves software readability--one of the components of program comprehension. In contrast, our findings from the qualitative assessment suggest that the introduction of lambda expressions does improve program comprehension. Implications: We argue in this paper that one can improve program comprehension by applying particular transformations to introduce lambda expressions (e.g., replacing anonymous inner classes with lambda expressions). In addition, the opinions of the participants highlight the situations in which a transformation introducing a lambda might be advantageous. This might support the implementation of effective tools for automatic program transformations. Finally, our results suggest that state-of-the-art models for estimating program readability are not able to capture the benefits of a program transformation that introduces lambda expressions.

18 citations


Proceedings ArticleDOI
10 Nov 2019
TL;DR: An extensive set of experiments demonstrates that AuCS can automate 20 out of 24 erroneous synchronization scenarios.
Abstract: While CUDA has been the most popular parallel computing platform and programming model for general-purpose GPU computing, CUDA synchronization poses significant challenges for GPU programmers due to its intricate parallel computing mechanism and coding practices. In this paper, we propose AuCS, the first general framework to automate synchronization for CUDA kernel functions. AuCS transforms the original LLVM-level CUDA program control flow graph in a semantic-preserving manner to explore the possible barrier function locations. Accordingly, AuCS develops mechanisms to correctly place barrier functions for automating synchronization in multiple erroneous (and challenging-to-detect) synchronization scenarios, including data race, barrier divergence, and redundant barrier functions. To evaluate the effectiveness and efficiency of AuCS, we conducted an extensive set of experiments; the results demonstrate that AuCS can automate 20 out of 24 erroneous synchronization scenarios.

15 citations


Proceedings ArticleDOI
16 Feb 2019
TL;DR: The design and the implementation of Locus are discussed, a system and a language to orchestrate the optimization of applications that is intended to help experts in the optimization process, especially for complex, long-lived applications that are to be executed on different environments.
Abstract: We discuss the design and the implementation of Locus, a system and a language to orchestrate the optimization of applications. The increasing complexity of machines and the large space of program variants, produced by the many transformations available, conspire to make compilers deliver unsatisfactory performance. As a result, optimization experts must intervene to manually explore the space of program variants seeking the best version for each target machine. This intervention is unproductive, and maintaining and managing sequences of transformations as new architectures are adopted and new application features are incorporated is challenging. Locus allows collections of program transformation sequences to be specified separately from the application code. The language is able to represent in a clear notation complex collections of transformations that are applied to code regions selected by the programmer. The system integrates multiple optimization modules as well as search modules that facilitate the efficient traversal of the space of program variants. Locus is intended to help experts in the optimization process, especially for complex, long-lived applications that are to be executed on different environments. Four examples are presented to illustrate the power and simplicity of the language. Although not the primary focus of this paper, the examples also show that exploring the space of variants typically leads to better performing codes than those produced by conventional compiler optimizations that are based on heuristics.

12 citations
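
A toy rendering of the idea in Python (names, transformations, and the cost model are invented; Locus is a dedicated language, not a Python library): transformation sequences are specified apart from the application code and searched for the cheapest variant.

```python
import itertools

def tile(factor):
    return lambda code: f"tile({factor}, {code})"

def unroll(factor):
    return lambda code: f"unroll({factor}, {code})"

def best_variant(region, sequences, cost):
    """Apply each candidate transformation sequence; keep the cheapest variant."""
    variants = []
    for seq in sequences:
        v = region
        for transform in seq:
            v = transform(v)
        variants.append(v)
    return min(variants, key=cost)

# The search space of program variants, kept separate from the application.
search_space = [[tile(t), unroll(u)]
                for t, u in itertools.product([16, 32], [2, 4])]
# `cost` would be measurements or a model; string length is a stand-in here.
print(best_variant("matmul_loop_nest", search_space, cost=len))
```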


Journal ArticleDOI
TL;DR: In this article, the authors focus on characterizing plastic code regions in Java programs, i.e., the code regions that are modifiable while maintaining functional correctness, according to a test suite.
Abstract: Neutral program variants are alternative implementations of a program, yet equivalent with respect to the test suite. Techniques such as approximate computing or genetic improvement share the intuition that potential for enhancements lies in these acceptable behavioral differences (e.g., enhanced performance or reliability). Yet, the automatic synthesis of neutral program variants through program transformations remains a key challenge. This work aims at characterizing plastic code regions in Java programs, i.e., the code regions that are modifiable while maintaining functional correctness, according to a test suite. Our empirical study relies on automatic variations of 6 real-world Java programs. First, we transform these programs with three state-of-the-art program transformations: add, replace and delete statements. We get a pool of 23,445 neutral variants, from which we gather the following novel insights: developers naturally write code that supports fine-grain behavioral changes; statement deletion is a surprisingly effective program transformation; high-level design decisions, such as the choice of a data structure, are natural points that can evolve while keeping functionality. Second, we design 3 novel program transformations, targeted at specific plastic regions. New experiments reveal that respectively 60%, 58% and 73% of the synthesized variants (175,688 in total) are neutral and exhibit execution traces that are different from the original.

11 citations
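
The "delete statement" transformation and the test-suite-relative notion of neutrality can be illustrated with a small, self-contained Python sketch (my example program and tests, not the study's subjects):

```python
import ast, copy

src = """
def clamp(x, lo, hi):
    x = float(x)
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""

def passes_tests(tree):
    env = {}
    exec(compile(tree, "<variant>", "exec"), env)
    clamp = env["clamp"]
    try:
        return clamp(5, 0, 10) == 5 and clamp(-1, 0, 10) == 0
    except Exception:
        return False

base = ast.parse(src)
stmts = base.body[0].body
neutral = []
for i in range(len(stmts)):
    variant = copy.deepcopy(base)
    del variant.body[0].body[i]          # the "delete statement" transformation
    if variant.body[0].body and passes_tests(variant):
        neutral.append(i)
print(f"statements whose deletion is neutral w.r.t. these tests: {neutral}")
# [0, 2]: the cast and the (untested) upper bound are deletable "neutrally"
```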


Book ChapterDOI
08 Apr 2019
TL;DR: This work designs, implements, and proves correct a program instrumentation phase, as part of the formally verified compiler CompCert, that enforces a sandboxing security property a priori, eliminating the need for a binary verifier and instead leveraging the soundness proof of the compiler to prove the security of the sandboxing transformation.
Abstract: Software Fault Isolation (SFI) is a security-enhancing program transformation for instrumenting an untrusted binary module so that it runs inside a dedicated isolated address space, called a sandbox. To ensure that the untrusted module cannot escape its sandbox, existing approaches such as Google’s Native Client rely on a binary verifier to check that all memory accesses are within the sandbox. Instead of relying on a posteriori verification, we design, implement and prove correct a program instrumentation phase as part of the formally verified compiler CompCert that enforces a sandboxing security property a priori. This eliminates the need for a binary verifier and, instead, leverages the soundness proof of the compiler to prove the security of the sandboxing transformation. The technical contributions are a novel sandboxing transformation that has a well-defined C semantics and which supports arbitrary function pointers, and a formally verified C compiler that implements SFI. Experiments show that our formally verified technique is a competitive way of implementing SFI.
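
The core sandboxing instrumentation being verified can be sketched as follows (illustrative constants and Python stand-ins for the compiler-inserted masking; CompCert of course emits machine code): every memory access is rewritten so the effective address is forced into the sandbox.

```python
SANDBOX_BASE = 0x2000_0000          # sandbox start (aligned to its size)
SANDBOX_MASK = 0x000F_FFFF          # 1 MiB sandbox

memory = bytearray(SANDBOX_MASK + 1)

def sfi_store(addr, byte):
    safe = SANDBOX_BASE | (addr & SANDBOX_MASK)   # compiler-inserted masking
    memory[safe - SANDBOX_BASE] = byte

def sfi_load(addr):
    safe = SANDBOX_BASE | (addr & SANDBOX_MASK)
    return memory[safe - SANDBOX_BASE]

sfi_store(0xDEAD_BEEF, 42)          # a wild pointer...
print(sfi_load(0xDEAD_BEEF))        # ...still lands inside the sandbox: 42
```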

Proceedings ArticleDOI
25 Jun 2019
TL;DR: A compositional proof principle for proving that a transformation is IFP is proposed and it is shown how a translation validation technique can be used to automatically verify and even close information-flow leaks introduced by standard compiler passes such as dead-store elimination and register allocation.
Abstract: Correct compilers perform program transformations preserving input/output behaviours of programs. Yet, correctness does not prevent program optimisations from introducing information-flow leaks that would make the target program more vulnerable to side-channel attacks than the source program. To tackle this problem, we propose a notion of Information-Flow Preserving (IFP) program transformation which ensures that a target program is no more vulnerable to passive side-channel attacks than a source program. To protect against a wide range of attacks, we model an attacker who is granted arbitrary memory accesses for a pre-defined set of observation points. We propose a compositional proof principle for proving that a transformation is IFP. Using this principle, we show how a translation validation technique can be used to automatically verify and even close information-flow leaks introduced by standard compiler passes such as dead-store elimination and register allocation. The technique has been experimentally validated on the CompCert C compiler.

Journal ArticleDOI
20 Aug 2019
TL;DR: In this paper, a property-based abstraction technique is proposed to control polyvariance in program specialisation in a standard online specialisation algorithm, using a set of properties to define a finite set of abstract versions of predicates, ensuring that only a finite number of specialised versions is generated.
Abstract: In this paper we show that property-based abstraction, an established technique originating in software model checking, is a flexible method of controlling polyvariance in program specialisation in a standard online specialisation algorithm. Specialisation is a program transformation that transforms a program with respect to given constraints that restrict its behaviour. Polyvariant specialisation refers to the generation of two or more specialised versions of the same program code. The same program point can be reached more than once during a computation, with different constraints applying in each case, and polyvariant specialisation allows different specialisations to be realised. A property-based abstraction uses a finite set of properties to define a finite set of abstract versions of predicates, ensuring that only a finite number of specialised versions is generated. The particular choice of properties is critical for polyvariance; too few versions can result in insufficient specialisation, while too many can result in an increase of code size with no corresponding efficiency gains. Using examples, we show the flexibility of specialisation with property-based abstraction and discuss its application in control flow refinement, verification, termination analysis and dimension-based specialisation.
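
A minimal sketch of the mechanism (my Python illustration; the properties and constraint-store encoding are invented) shows how a finite property set bounds the number of specialised versions: constraint stores reaching a program point are abstracted to a property vector, so at most 2^|properties| versions can ever be created.

```python
properties = [
    ("x_nonneg", lambda c: c.get("x_min", float("-inf")) >= 0),
    ("x_small",  lambda c: c.get("x_max", float("inf")) <= 255),
]

versions = {}   # abstract signature -> specialised version name

def specialise(point, constraint_store):
    sig = tuple(p(constraint_store) for _, p in properties)
    if sig not in versions:
        versions[sig] = f"{point}__v{len(versions)}"   # make a new version
    return versions[sig]

print(specialise("p/2", {"x_min": 0, "x_max": 10}))   # p/2__v0
print(specialise("p/2", {"x_min": 3, "x_max": 200}))  # same properties -> v0
print(specialise("p/2", {"x_min": -5}))               # new signature -> v1
```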

Posted Content
TL;DR: Venkman is a system that employs program transformation to completely thwart Spectre attacks that poison entries in the Branch Target Buffer and the Return Stack Buffer, allowing safe sharing and reuse among programs while maintaining strong protection against Spectre attacks.
Abstract: Side-channel attacks such as Spectre that utilize speculative execution to steal application secrets pose a significant threat to modern computing systems. While program transformations can mitigate some Spectre attacks, more advanced attacks can divert control flow speculatively to bypass these protective instructions, rendering existing defenses useless. In this paper, we present Venkman: a system that employs program transformation to completely thwart Spectre attacks that poison entries in the Branch Target Buffer (BTB) and the Return Stack Buffer (RSB). Venkman transforms code so that all valid targets of a control-flow transfer have an identical alignment in the virtual address space; it further transforms all branches to ensure that all entries added to the BTB and RSB are properly aligned. By transforming all code this way, Venkman ensures that, in any program wanting Spectre defenses, all control-flow transfers, including speculative ones, do not skip over the protective instructions Venkman adds to the code segment to mitigate Spectre attacks. Unlike existing defenses, Venkman does not reduce sharing of the BTB and RSB and does not flush these structures, allowing safe sharing and reuse among programs while maintaining strong protection against Spectre attacks. We built a prototype of Venkman on an IBM POWER8 machine. Our evaluation on the SPEC benchmarks and selected applications shows that Venkman increases execution time to 3.47× on average and increases code size to 1.94× on average when it is used to ensure that fences are executed to mitigate Spectre attacks. Our evaluation also shows that Spectre-resistant Software Fault Isolation (SFI) built using Venkman incurs a geometric mean of 2.42× space overhead and 1.68× performance overhead.

Proceedings ArticleDOI
01 Apr 2019
TL;DR: This paper proposes CFHider, a hardware-assisted method to protect control flow confidentiality by combining program transformation and Intel Software Guard Extensions (SGX) technology; it moves branch statement conditions to an opaque and trusted memory space, i.e., the enclave.
Abstract: When a program is executed on an untrusted cloud, the confidentiality of the program's logic needs to be protected. Control flow obfuscation is a direct approach to achieve this goal. However, existing methods in this direction cannot achieve both high confidentiality and low overhead. In this paper, we propose CFHider, a hardware-assisted method to protect control flow confidentiality. By combining program transformation and Intel Software Guard Extensions (SGX) technology, CFHider moves branch statement conditions to an opaque and trusted memory space, i.e., the enclave, thereby offering guaranteed control flow confidentiality. Based on the design of CFHider, we developed a prototype system targeting Java applications. Our analysis and experimental results indicate that CFHider is effective in protecting control flow confidentiality and incurs a much lower performance overhead than existing software-based solutions (by a factor of 8.8).

Journal ArticleDOI
TL;DR: An alternative, more permissive answer set semantics, called the determining inference (DI) semantics, is presented; it induces a DI-semantics for simple disjunctive programs and leads to a satisfactory solution to the open problem presented by Hitzler and Seda about characterizing split normal derivatives.

Proceedings ArticleDOI
16 Feb 2019
TL;DR: Poly-prof is developed, an end-to-end infrastructure for dynamic binary analysis, which produces feedback about the potential to apply complex program rescheduling and can handle both inter- and intra-procedural aspects of the program in a unified way, thus providing interprocedural transformation feedback.
Abstract: Profiling feedback is an important technique used by developers for performance debugging, where it is usually used to pinpoint performance bottlenecks and also to find optimization opportunities. Assessing the validity and potential benefit of a program transformation requires accurate knowledge of the data flow and dependencies, which can be uncovered by profiling a particular execution of the program. In this work we develop poly-prof, an end-to-end infrastructure for dynamic binary analysis, which produces feedback about the potential to apply complex program rescheduling. Our tool can handle both inter- and intraprocedural aspects of the program in a unified way, thus providing interprocedural transformation feedback.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper implemented an innovative algorithm based on computing reaching definitions, only assuming that global variables and formal parameters are defined at the beginning of the program, and compared it to a well-known dominance frontiers-based algorithm in the Clang/LLVM compiler framework.
Abstract: The Static Single Assignment (SSA) form is an intermediate representation used for the analysis and optimization of programs in modern compilers. The ϕ-function placement is the most computationally expensive part of converting any program into its SSA form. The most widely-used ϕ-function placement algorithms are based on computing dominance frontiers. However, this kind of algorithms works under the limiting assumption that all variables are defined at the beginning of the program, which is not the case for local variables. In this paper, we introduce an innovative algorithm based on computing reaching definitions, only assuming that global variables and formal parameters are defined at the beginning of the program. We implemented our algorithm and compared it to a well-known dominance frontiers-based algorithm in the Clang/LLVM compiler framework by performing experiments on a benchmarking suite for Perl. The results of our experiments show that, besides a few computationally expensive cases, our algorithm is fairly efficient, and most notably it produces up to 169% and on an average 74% fewer ϕ-functions than the reference dominance frontiers-based algorithm.
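
The premise can be illustrated with a small sketch (mine; the toy CFG and the placement criterion are simplifications of the paper's algorithm): compute reaching definitions with a fixpoint, then request a ϕ-function only where definitions from more than one predecessor meet.

```python
cfg = {"entry": ["then", "else"],                  # block -> successors
       "then": ["join"], "else": ["join"], "join": []}
defs = {"entry": "d0", "then": "d1"}               # definitions of v, by block
preds = {b: [p for p in cfg if b in cfg[p]] for b in cfg}

out = {b: set() for b in cfg}        # defs of v reaching each block's exit
changed = True
while changed:
    changed = False
    for b in cfg:
        reach_in = set().union(*(out[p] for p in preds[b])) if preds[b] else set()
        new = {defs[b]} if b in defs else reach_in  # a def kills; else pass through
        if new != out[b]:
            out[b], changed = new, True

for b in cfg:
    reach_in = set().union(*(out[p] for p in preds[b])) if preds[b] else set()
    if len(reach_in) > 1:                          # >1 defs meet: phi needed
        print(f"phi for v at {b}, merging {sorted(reach_in)}")
# phi for v at join, merging ['d0', 'd1']
```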

Proceedings ArticleDOI
25 May 2019
TL;DR: This paper introduces a novel global optimization framework that achieves significantly higher worst-case accuracy than the state-of-the-art numerical optimization tool and applies it to real-world code to successfully detect numerical bugs that have been confirmed by developers.
Abstract: Numerical code is often applied in safety-critical but resource-limited areas. Hence, it is crucial for it to be correct and efficient, both of which are difficult to ensure. On one hand, accumulated rounding errors in numerical programs can cause system failures. On the other hand, arbitrary/infinite-precision arithmetic, although accurate, is infeasible in practice, especially in resource-limited scenarios, because it performs thousands of times slower than floating-point arithmetic. Thus, it has been a significant challenge to obtain high-precision, easy-to-maintain, and efficient numerical code. This paper introduces a novel global optimization framework to tackle this challenge. Using our framework, a developer simply writes the infinite-precision numerical program directly following the problem's mathematical requirement specification. The resulting code is correct and easy to maintain, but inefficient. Our framework then optimizes the program in a global fashion (i.e., considering the whole program, rather than individual expressions or statements as in prior work), the key technical difficulty this work solves. To this end, it analyzes the program's numerical value flows across different statements through a symbolic trace extraction algorithm, and generates optimized traces via stochastic algebraic transformations guided by effective rule selection. We first evaluate our technique on numerical benchmarks from the literature; results show that our global optimization achieves significantly higher worst-case accuracy than the state-of-the-art numerical optimization tool. Second, we show that our framework is also effective on benchmarks having complicated program structures, which are challenging for numerical optimization. Finally, we apply our framework to real-world code and successfully detect numerical bugs that have been confirmed by developers.
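
A one-rule illustration of accuracy-oriented algebraic rewriting (far simpler than the paper's stochastic, whole-program search): log(1+x) for small x loses precision, and an algebraically equal rewrite recovers it.

```python
import math

def naive(x):
    return math.log(1.0 + x)        # 1.0 + x rounds away most of a tiny x

def rewritten(x):
    return math.log1p(x)            # algebraically equal, numerically better

x = 1e-16
print(naive(x), rewritten(x))       # 0.0 vs 1e-16: the rewrite keeps the value
```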

Journal ArticleDOI
20 Dec 2019
TL;DR: The seminaïve transformation is extended to higher-order programs written in the Datafun language, which extends Datalog with features like first-class relations, higher-order functions, and datatypes like sum types.
Abstract: One of the workhorse techniques for implementing bottom-up Datalog engines is seminaive evaluation. This optimization improves the performance of Datalog's most distinctive feature: recursively defined predicates. These are computed iteratively, and under a naive evaluation strategy, each iteration recomputes all previous values. Seminaive evaluation computes a safe approximation of the difference between iterations. This can asymptotically improve the performance of Datalog queries. Seminaive evaluation is defined partly as a program transformation and partly as a modified iteration strategy, and takes advantage of the first-order nature of Datalog code. This paper extends the seminaive transformation to higher-order programs written in the Datafun language, which extends Datalog with features like first-class relations, higher-order functions, and datatypes like sum types.
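
The first-order version of the optimization is easy to show in miniature (a standard textbook formulation in Python, which the paper generalizes to Datafun's higher-order setting): each iteration joins only the newly derived facts instead of recomputing from the whole relation.

```python
def transitive_closure(edges):
    total = set(edges)
    delta = set(edges)                # facts derived in the last iteration
    while delta:
        # join only the *delta* against the base edges, not total x total
        new = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2}
        delta = new - total           # keep only genuinely new facts
        total |= delta
    return total

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```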

Proceedings ArticleDOI
03 Dec 2019
TL;DR: The first systematic study of match-action program representations is provided, in order to assist network programmers in navigating this vast design space; it finds that normalization generally improves the capacity of the control plane to program the data plane and to observe its state, while having negligible, or even positive, performance impact.
Abstract: Packet processing programs may have multiple semantically equivalent representations in terms of the match-action abstraction exposed by the underlying data plane. Some representations may encode the entire packet processing program into one large table allowing packets to be matched in a single lookup, while others may encode the same functionality decomposed into a pipeline of smaller match-action tables, maximizing modularity at the cost of increased lookup latency. In this paper, we provide the first systematic study of match-action program representations in order to assist network programmers in navigating this vast design space. Borrowing from relational database and formal language theory, we define a framework for the equivalent transformation of match-action programs to obtain certain irredundant representations that we call "normal forms". We find that normalization generally improves the capacity of the control plane to program the data plane and to observe its state, while having negligible, or even positive, performance impact.

Journal ArticleDOI
10 Oct 2019
TL;DR: A formal model, based on a high-level intermediate analysis language, a practical realization in a prototype tool that analyzes C code, and an experimental evaluation that demonstrates competitive results on a series of benchmarks are presented.
Abstract: Despite decades of progress, static analysis tools still have great difficulty dealing with programs that combine arithmetic, loops, dynamic memory allocation, and linked data structures. In this paper we draw attention to two fundamental reasons for this difficulty: First, typical underlying program abstractions are low-level and inherently scalar, characterizing compound entities like data structures or results computed through iteration only indirectly. Second, to ensure termination, analyses typically project away the dimension of time, and merge information per program point, which incurs a loss in precision. As a remedy, we propose to make collective operations first-class in program analysis – inspired by Σ-notation in mathematics, and also by the success of high-level intermediate languages based on map/reduce operations in program generators and aggressive optimizing compilers for domain-specific languages (DSLs). We further propose a novel structured heap abstraction that preserves a symbolic dimension of time, reflecting the program’s loop structure and thus unambiguously correlating multiple temporal points in the dynamic execution with a single point in the program text. This paper presents a formal model, based on a high-level intermediate analysis language, a practical realization in a prototype tool that analyzes C code, and an experimental evaluation that demonstrates competitive results on a series of benchmarks. Remarkably, our implementation achieves these results in a fully semantics-preserving strongest-postcondition model, which is a worst-case for analysis/verification. The underlying ideas, however, are not tied to this model and would equally apply in other settings, e.g., demand-driven invariant inference in a weakest-precondition model. Given its semantics-preserving nature, our implementation is not limited to analysis for verification, but can also check program equivalence, and translate legacy C code to high-performance DSLs.

Journal ArticleDOI
TL;DR: An approach to scenario-oriented program slicing is proposed that combines constraint logic programming and program transformation; it has been implemented, and its effectiveness and efficiency are demonstrated on three open-source projects from GitHub.
Abstract: Program slicing, as a technique of program decomposition, is widely used in program testing, model checking, software verification, symbolic execution, and other fields. However, traditional approaches to program slicing tend to produce overly large slices, and static program analyses are hard to make precise enough. Scenario-oriented program slicing, which considers the actual usage of software, gives a more precise perspective on program slicing. In this paper, we propose an approach to scenario-oriented program slicing that combines constraint logic programming and program transformation. Based on the observation that the output of a program transformation is a semantically equivalent program in which the properties of interest are preserved, we can apply a sequence of transformations, more powerful than those needed for program specialization, refining the slice to the desired degree of precision. Constraint logic programming, in turn, has been shown to be a powerful, flexible formalism for reasoning about the correctness of programs. The novel contributions of this paper are as follows: 1) converting the problem of program slicing into one of program transformation and retrieval; 2) presenting a set of constraint handling rules for scenario-oriented program slicing in constraint logic programs; and 3) deriving a scenario-oriented program slicing algorithm. The method has been implemented, and we have demonstrated its effectiveness and efficiency on three open-source software projects from GitHub.

Book ChapterDOI
08 Oct 2019
TL;DR: A polyvariant semi-inversion algorithm is presented for conditional constructor term rewriting systems, which can model logic and functional languages; it makes use of local inversion and a simple but effective heuristic.
Abstract: Inversion is an important and useful program transformation and has been studied in various programming language paradigms. Semi-inversion is more general than just swapping the input and output of a program; instead, parts of the input and output can be freely swapped. In this paper, we present a polyvariant semi-inversion algorithm for conditional constructor term rewriting systems. These systems can model logic and functional languages, which have the advantage that semi-inversion, as well as partial and full inversion, can be studied across different programming paradigms. The semi-inverter makes use of local inversion and a simple but effective heuristic and is proven to be correct. A Prolog implementation is applied to several problems, including inversion of a simple encrypter and of a program inverter for a reversible language.
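
Partial inversion, one instance of what semi-inversion generalizes, can be sketched on Peano-style addition (my Python rendering; the paper works on conditional constructor term rewriting systems): each rule is inverted locally, with part of the original output becoming an input.

```python
def add(x, y):
    # add(0, y) = y ; add(S x, y) = S (add(x, y))   -- as rewrite rules
    return y if x == 0 else 1 + add(x - 1, y)

def add_inv1(result, y):
    # Local inversion of each rule, with inputs (result, y) and output x.
    # (Assumes result >= y; a real semi-inverter checks rule applicability.)
    if result == y:                       # inverted base rule: x = 0
        return 0
    return 1 + add_inv1(result - 1, y)    # inverted recursive rule

print(add_inv1(add(3, 4), 4))             # recovers the first argument: 3
```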

Proceedings ArticleDOI
15 Jul 2019
TL;DR: Coccinelle4J as discussed by the authors is an extension to Coccinelle, which is a program transformation tool designed for widespread changes in C code, in order to work on Java source code.
Abstract: Developing software often requires code changes that are widespread and applied to multiple locations. There are tools for Java that allow developers to specify patterns for program matching and source-to-source transformation. However, to our knowledge, none allows for transforming code based on its control-flow context. We prototype Coccinelle4J, an extension to Coccinelle, which is a program transformation tool designed for widespread changes in C code, in order to work on Java source code. We adapt Coccinelle to be able to apply scripts written in the Semantic Patch Language (SmPL), a language provided by Coccinelle, to Java source files. As a case study, we demonstrate the utility of Coccinelle4J with the task of API migration. We show 6 semantic patches to migrate from deprecated Android API methods on several open source Android projects. We describe how SmPL can be used to express several API migrations and justify several of our design decisions.

Posted Content
TL;DR: Two different batching algorithms are presented: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implements recursion directly and can batch across it.
Abstract: We present a general approach to batching arbitrary computations for accelerators such as GPUs. We show orders-of-magnitude speedups using our method on the No U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transforming a single-example implementation into a form that explicitly tracks the current program point for each batch member, and only steps forward those in the same place. We present two different batching algorithms: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implements recursion directly and can batch across it. We implement these batching methods as a general program transformation on Python source. Both the batching system and the NUTS implementation presented here are available as part of the popular TensorFlow Probability software package.
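
The program-counter transformation can be sketched by hand on a tiny example (mine; the paper derives such code mechanically from single-example Python source): a branchy per-example function becomes a loop over explicit program points, and only the batch members at the same point are stepped together.

```python
def collatz_steps_batched(values):
    pc = ["test"] * len(values)          # per-member program point
    xs, steps = list(values), [0] * len(values)
    while any(p != "done" for p in pc):
        # step every member currently at the "test" point, in lockstep
        for i in range(len(xs)):
            if pc[i] != "test":
                continue
            if xs[i] == 1:
                pc[i] = "done"           # this member has finished
            else:
                xs[i] = xs[i] // 2 if xs[i] % 2 == 0 else 3 * xs[i] + 1
                steps[i] += 1
    return steps

print(collatz_steps_batched([1, 2, 3]))  # [0, 1, 7]
```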

Journal ArticleDOI
TL;DR: A novel approach to program optimisation is presented, based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture; the effectiveness of the approach is demonstrated by comparison with a commercial toolchain.
Abstract: In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs in the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages, and the cost model, and we demonstrate the effectiveness of our approach by comparison with a commercial toolchain.

Journal ArticleDOI
TL;DR: A Linear Temporal Logic (LTL) model checking approach is presented to verify a dataflow program transformation, using three LTL properties to identify cyclostatic actors in dynamic dataflow programs.