
Showing papers on "Program transformation published in 2017"


Proceedings ArticleDOI
20 May 2017
TL;DR: Refazer, as presented in this paper, is a technique for automatically learning program transformations; it builds on the observation that code edits performed by developers, such as fixes to incorrect programming assignment submissions, can serve as input-output examples from which the transformations are synthesized.
Abstract: Automatic program transformation tools can be valuable for programmers to help them with refactoring tasks, and for Computer Science students in the form of tutoring systems that suggest repairs to programming assignments. However, manually creating catalogs of transformations is complex and time-consuming. In this paper, we present Refazer, a technique for automatically learning program transformations. Refazer builds on the observation that code edits performed by developers can be used as input-output examples for learning program transformations. Example edits may share the same structure but involve different variables and subexpressions, which must be generalized in a transformation at the right level of abstraction. To learn transformations, Refazer leverages state-of-the-art programming-by-example methodology using the following key components: (a) a novel domain-specific language (DSL) for describing program transformations, (b) domain-specific deductive algorithms for efficiently synthesizing transformations in the DSL, and (c) functions for ranking the synthesized transformations. We instantiate and evaluate Refazer in two domains. First, given examples of code edits used by students to fix incorrect programming assignment submissions, we learn program transformations that can fix other students' submissions with similar faults. In our evaluation conducted on 4 programming tasks performed by 720 students, our technique helped to fix incorrect submissions for 87% of the students. In the second domain, we use repetitive code edits applied by developers to the same project to synthesize a program transformation that applies these edits to other locations in the code. In our evaluation conducted on 56 scenarios of repetitive edits taken from three large C# open-source projects, Refazer learns the intended program transformation in 84% of the cases using only 2.9 examples on average.
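A minimal sketch of the kind of generalization the abstract describes: two concrete edits that share structure but mention different variables, and a template abstracting over them. The edits, the template, and the regular-expression rendering below are hypothetical; Refazer itself operates on abstract syntax trees through its own DSL rather than on program text.

#include <iostream>
#include <regex>
#include <string>

// Example edit 1:  total = total + items.size();   ->  total += items.size();
// Example edit 2:  count = count + errors.size();  ->  count += errors.size();
// Learned template: <v> = <v> + <expr>;            ->  <v> += <expr>;
std::string applyTemplate(const std::string& stmt) {
    static const std::regex pat(R"((\w+)\s*=\s*\1\s*\+\s*(.+);)");
    return std::regex_replace(stmt, pat, "$1 += $2;");
}

int main() {
    std::cout << applyTemplate("total = total + items.size();") << "\n";  // total += items.size();
    std::cout << applyTemplate("score = score + bonus;") << "\n";         // applies to new code too
}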

191 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper presents a relational model of a type-and-effect system for a higher-order, concurrent programming language that supports both effect-based optimizations and data abstraction, and proves that the semantic invariants expressed by the effect annotations are strong enough to justify advanced program transformations.
Abstract: Recently we have seen a renewed interest in programming languages that tame the complexity of state and concurrency through refined type systems with more fine-grained control over effects. In addition to simplifying reasoning and eliminating whole classes of bugs, statically tracking effects opens the door to advanced compiler optimizations. In this paper we present a relational model of a type-and-effect system for a higher-order, concurrent programming language. The model precisely captures the semantic invariants expressed by the effect annotations. We demonstrate that these invariants are strong enough to prove advanced program transformations, including automatic parallelization of expressions with suitably disjoint effects. The model also supports refinement proofs between abstract data type implementations with different internal data representations, including proofs that fine-grained concurrent algorithms refine their coarse-grained counterparts. This is the first model for such an expressive language that supports both effect-based optimizations and data abstraction. The logical relation is defined in Iris, a state-of-the-art higher-order concurrent separation logic. This greatly simplifies proving well-definedness of the logical relation and also provides us with a powerful logic for reasoning in the model.
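As a rough illustration of the effect-based parallelization mentioned above, the sketch below runs two assignments in parallel because their (hypothetical) effect annotations touch disjoint regions; the regions r1 and r2 and the C++ rendering are illustrative assumptions, since the paper works with a typed higher-order language rather than C++.

#include <cassert>
#include <thread>

int main() {
    int a = 0, b = 0;          // imagine a lives in region r1 and b in region r2
    // Sequential program: a := 1; b := 2, with effects wr(r1) and wr(r2).
    // Because the effects are disjoint, running the two expressions in
    // parallel cannot change the observable result.
    std::thread t1([&] { a = 1; });
    std::thread t2([&] { b = 2; });
    t1.join();
    t2.join();
    assert(a == 1 && b == 2);  // same outcome as the sequential version
}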

39 citations


Journal ArticleDOI
TL;DR: This article defines a set of transformation rules allowing the generation, under certain conditions and in polynomial time, of larger expressions by performing limited formal computations, possibly across several iterations of a loop, in order to improve the numerical accuracy of the program results.
Abstract: The dangers of programs performing floating-point computations are well known. This is due to the sensitivity of the results to the way formulae are written. In recent years, several techniques have been proposed for transforming arithmetic expressions in order to improve their numerical accuracy and, in this article, we go one step further by automatically transforming larger pieces of code containing assignments and control structures. We define a set of transformation rules allowing the generation, under certain conditions and in polynomial time, of larger expressions by performing limited formal computations, possibly across several iterations of a loop. These larger expressions are better suited to improve, by reparsing, the numerical accuracy of the program results. We use abstract interpretation-based static analysis techniques to over-approximate the round-off errors in programs and during the transformation of expressions. A tool has been implemented and experimental results are presented concerning classical numerical algorithms and algorithms for embedded systems.
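A tiny illustration of why reparsing an expression can improve accuracy (the example and its values are not taken from the paper; they are chosen so that single-precision rounding is visible): two mathematically equivalent summation orders give noticeably different results.

#include <cstdio>

int main() {
    float big = 1.0e8f;
    float naive = big;
    for (int i = 0; i < 10000; ++i) naive += 0.1f;    // each small term is absorbed by big
    float grouped = 0.0f;
    for (int i = 0; i < 10000; ++i) grouped += 0.1f;  // accumulate the small terms first
    grouped += big;                                    // then add the large term once
    // The exact result is 1.0e8 + 1000; the regrouped order is much closer to it.
    std::printf("naive   = %.1f\n", naive);
    std::printf("grouped = %.1f\n", grouped);
}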

30 citations


Journal ArticleDOI
27 Dec 2017
TL;DR: A closed-form solution for modeling misses in a set-associative cache hierarchy is developed; it can enable program transformation choices at compile time that minimize cache misses.
Abstract: Optimizing compilers implement program transformation strategies aimed at reducing data movement to or from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses, very approximate cost models are used, due to the lack of precise compile-time models for misses for hierarchical caches. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size, as well as to the number of distinct cache configurations of interest to be evaluated. This paper takes a fundamentally different approach, by focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, a closed-form solution for modeling misses in a set-associative cache hierarchy is developed. This solution can enable program transformation choice at compile time to optimize cache misses. A tool implementing the approach has been developed and used for validation of the framework.
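The closed form in the paper covers set-associative hierarchies and arbitrary polyhedral loop nests; as a deliberately degenerate illustration of the idea that miss counts can be computed rather than simulated, the cold (compulsory) misses of a single streaming loop already have a simple closed form. The array length and line size below are assumed values, not taken from the paper.

#include <cstdio>

int main() {
    // A loop reading A[0], A[1], ..., A[N-1] once, with nothing cached initially.
    const long N = 1 << 20;                          // number of doubles (assumed)
    const long lineBytes = 64;                       // cache line size (assumed)
    const long bytes = N * (long)sizeof(double);
    const long coldMisses = (bytes + lineBytes - 1) / lineBytes;  // ceil(bytes / line)
    std::printf("predicted cold misses: %ld\n", coldMisses);      // 131072 for these values
}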

29 citations


Journal ArticleDOI
TL;DR: A restricted form of higher-order refinement types where refinement predicates can refer to functions is introduced, and a systematic program transformation is formalized to reduce type checking/inference for higher-order refinement types to that for first-order refinement types, so that the latter can be automatically solved by using an existing software model checker.

15 citations


Journal ArticleDOI
TL;DR: In this article, a transformation technique called predicate pairing is introduced to transform a set of clauses into an equisatisfiable set whose satisfiability can be proved by finding an 𝓐-definable model, and hence can be effectively verified by a state-of-the-art CHC solver.
Abstract: It is well-known that the verification of partial correctness properties of imperative programs can be reduced to the satisfiability problem for constrained Horn clauses (CHCs). However, state-of-the-art solvers for constrained Horn clauses (or CHC solvers) based on predicate abstraction are sometimes unable to verify satisfiability because they look for models that are definable in a given class 𝓐 of constraints, called 𝓐-definable models. We introduce a transformation technique, called Predicate Pairing, which is able, in many interesting cases, to transform a set of clauses into an equisatisfiable set whose satisfiability can be proved by finding an 𝓐-definable model, and hence can be effectively verified by a state-of-the-art CHC solver. In particular, we prove that, under very general conditions on 𝓐, the unfold/fold transformation rules preserve the existence of an 𝓐-definable model, that is, if the original clauses have an 𝓐-definable model, then the transformed clauses have an 𝓐-definable model. The converse does not hold in general, and we provide suitable conditions under which the transformed clauses have an 𝓐-definable model if and only if the original ones have an 𝓐-definable model. Then, we present a strategy, called Predicate Pairing, which guides the application of the transformation rules with the objective of deriving a set of clauses whose satisfiability problem can be solved by looking for 𝓐-definable models. The Predicate Pairing (PP) strategy introduces a new predicate defined by the conjunction of two predicates occurring in the original set of clauses, together with a conjunction of constraints. We will show through some examples that an 𝓐-definable model may exist for the new predicate even if it does not exist for its defining atomic conjuncts. We will also present some case studies showing that Predicate Pairing plays a crucial role in the verification of relational properties of programs, that is, properties relating two programs (such as program equivalence) or two executions of the same program (such as non-interference). Finally, we perform an experimental evaluation of the proposed techniques to assess the effectiveness of Predicate Pairing in increasing the power of CHC solving.

14 citations


Journal ArticleDOI
TL;DR: This article proposes a method capable of symbolically proving CTL* properties of (infinite-state) integer programs and demonstrates the viability of the approach in practice, thus leading to a new class of fully-automated tools capable of proving crucial properties that no tool could previously prove.
Abstract: Temporal logic is a formal system for specifying and reasoning about propositions qualified in terms of time. It offers a unified approach to program verification as it applies to both sequential and parallel programs and provides a uniform framework for describing a system at any level of abstraction. Thus, a number of automated systems have been proposed to exclusively reason about either Computation-Tree Logic (CTL) or Linear Temporal Logic (LTL) in the infinite-state setting. Unfortunately, these logics have significantly reduced expressiveness as they restrict the interplay between temporal operators and path quantifiers, thus disallowing the expression of many practical properties, for example, "along some future an event occurs infinitely often." In contrast, CTL*, a superset of both CTL and LTL, can facilitate the interplay between path-based and state-based reasoning. CTL* thus allows the expression of properties involving existential system stabilization and "possibility" properties. Until now, there have not existed automated systems that allow for the verification of such expressive CTL* properties over infinite-state systems. This article proposes a method capable of such a task, thus introducing the first known fully automated tool for symbolically proving CTL* properties of (infinite-state) integer programs. The method uses an internal encoding that admits reasoning about the subtle interplay between the nesting of temporal operators and path quantifiers that occurs within CTL* proofs. A program transformation is first employed that trades nondeterminism in the transition relation for nondeterminism explicit in variables predicting future outcomes when necessary. We then synthesize and quantify preconditions over the transformed program that represent program states that satisfy a CTL* formula. This article demonstrates the viability of our approach in practice, thus leading to a new class of fully-automated tools capable of proving crucial properties that no tool could previously prove. Additionally, we consider the linear-past extension to CTL* for infinite-state systems in which the past is linear and each moment in time has a unique past. We discuss the practice of this extension and how it is further supported through the use of history variables. We have implemented our approach and report our benchmarks carried out on case studies ranging from smaller programs to demonstrate the expressiveness of CTL* specifications, to larger code bases drawn from device drivers and various industrial examples.

13 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: It is shown that an information flow analysis with fixed labels can be both flow- and path-sensitive, and it allows sound control of information flow in the presence of mutable variables without resorting to run-time mechanisms.
Abstract: This paper investigates a flow- and path-sensitive static information flow analysis. Compared with security type systems with fixed labels, it has been shown that flow-sensitive type systems accept more secure programs. We show that an information flow analysis with fixed labels can be both flow- and path-sensitive. The novel analysis has two major components: 1) a general-purpose program transformation that removes false dataflow dependencies in a program that confuse a fixed-label type system, and 2) a fixed-label type system that allows security types to depend on path conditions. We formally prove that the proposed analysis enforces a rigorous security property: noninterference. Moreover, we show that the analysis is strictly more precise than a classic flow-sensitive type system, and it allows sound control of information flow in the presence of mutable variables without resorting to run-time mechanisms.
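The following sketch shows the kind of false dataflow dependency the transformation is meant to remove; the program, the labels, and the variable-splitting rewrite shown here are illustrative assumptions rather than the paper's exact algorithm.

int publicSink;                                  // sink labelled Low

int before(int secret /*High*/, int pub /*Low*/) {
    int t = secret;         // with fixed labels, t must be High here...
    t = pub;                // ...so this reuse of t for public data stays High
    publicSink = t;         // rejected, although only pub actually reaches the sink
    return 0;
}

// After splitting the reused temporary, each value has its own fixed label and
// the false dependency on secret disappears.
int after(int secret /*High*/, int pub /*Low*/) {
    int t1 = secret;        // t1: High, never flows to the sink
    int t2 = pub;           // t2: Low
    publicSink = t2;        // accepted by a fixed-label type system
    (void)t1;
    return 0;
}

int main() { return before(1, 2) + after(1, 2); }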

13 citations


Journal ArticleDOI
TL;DR: This paper presents an example of formal reasoning about the semantics of a Prolog program of practical importance (the SAT solver of Howe and King), and shows that the paradigm of semantics-preserving program transformations may not be sufficient.
Abstract: This paper presents an example of formal reasoning about the semantics of a Prolog program of practical importance (the SAT solver of Howe and King). The program is treated as a definite clause logic program with added control. The logic program is constructed by means of stepwise refinement, hand in hand with its correctness and completeness proofs. The proofs are declarative – they do not refer to any operational semantics. Each step of the logic program construction follows a systematic approach to constructing programs which are provably correct and complete. We also prove that correctness and completeness of the logic program is preserved in the final Prolog program. Additionally, we prove termination, occur-check freedom and non-floundering. Our example shows how dealing with “logic” and with “control” can be separated. Most of the proofs can be done at the “logic” level, abstracting from any operational semantics. The example employs approximate specifications; they are crucial in simplifying reasoning about logic programs. It also shows that the paradigm of semantics-preserving program transformations may not be sufficient. We suggest considering transformations which preserve correctness and completeness with respect to an approximate specification.

12 citations


Journal ArticleDOI
TL;DR: The effectiveness of abstract interpretation in detecting parts of program specifications that can be statically simplified to true or false, as well as in reducing the cost of the run-time checks required for the remaining parts of these specifications, is explored.

11 citations


Journal ArticleDOI
29 Aug 2017
TL;DR: The disintegration algorithm is extended to symbolically condition arrays in probabilistic programs, where repetition is treated symbolically and without unrolling loops, and the method works well for arbitrarily-sized arrays of independent random choices.
Abstract: Probabilistic programming systems make machine learning more modular by automating inference. Recent work by Shan and Ramsey makes inference more modular by automating conditioning. Their technique uses a symbolic program transformation that treats conditioning generally via the measure-theoretic notion of disintegration. This technique, however, is limited to conditioning a single scalar variable. As a step towards modular inference for realistic machine learning applications, we have extended the disintegration algorithm to symbolically condition arrays in probabilistic programs. The extended algorithm implements lifted disintegration, where repetition is treated symbolically and without unrolling loops. The technique uses a language of index variables for tracking expressions at various array levels. We find that the method works well for arbitrarily-sized arrays of independent random choices, with the conditioning step taking time linear in the number of indices needed to select an element.

Posted Content
TL;DR: A conservative extension of term rewriting is introduced that makes rewriting reversible, together with two transformations, injectivization and inversion, for making a rewrite system reversible using standard term rewriting; their usefulness is illustrated in the context of bidirectional program transformation.
Abstract: Essentially, in a reversible programming language, for each forward computation from state $S$ to state $S'$, there exists a constructive method to go backwards from state $S'$ to state $S$. Besides its theoretical interest, reversible computation is a fundamental concept which is relevant in many different areas like cellular automata, bidirectional program transformation, or quantum computing, to name a few. In this work, we focus on term rewriting, a computation model that underlies most rule-based programming languages. In general, term rewriting is not reversible, even for injective functions; namely, given a rewrite step $t_1 \rightarrow t_2$, we do not always have a decidable method to get $t_1$ from $t_2$. Here, we introduce a conservative extension of term rewriting that becomes reversible. Furthermore, we also define two transformations, injectivization and inversion, to make a rewrite system reversible using standard term rewriting. We illustrate the usefulness of our transformations in the context of bidirectional program transformation.
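A very small illustration of injectivization and inversion at the level of a single function (the paper works with term rewriting systems; the function below and its C++ rendering are hypothetical): addition forgets its arguments, so the injectivized version also returns the information needed to run the step backwards.

#include <cassert>
#include <utility>

// add(x, y) -> x + y is not injective, so a step cannot be undone as is.
// Injectivization keeps the second argument alongside the result; inversion
// then computes the backward step from that pair.
std::pair<int, int> add_inj(int x, int y) { return {x + y, y}; }
std::pair<int, int> add_inv(std::pair<int, int> r) { return {r.first - r.second, r.second}; }

int main() {
    auto forward = add_inj(40, 2);     // (42, 2)
    auto backward = add_inv(forward);  // recovers (40, 2)
    assert(backward.first == 40 && backward.second == 2);
}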

Posted Content
TL;DR: In this article, a flow- and path-sensitive static information flow analysis with fixed labels is proposed, which is strictly more precise than a classic flow-sensitive type system, and it allows sound control of information flow in the presence of mutable variables without resorting to run-time mechanisms.
Abstract: This paper investigates a flow- and path-sensitive static information flow analysis. Compared with security type systems with fixed labels, it has been shown that flow-sensitive type systems accept more secure programs. We show that an information flow analysis with fixed labels can be both flow- and path-sensitive. The novel analysis has two major components: 1) a general-purpose program transformation that removes false dataflow dependencies in a program that confuse a fixed-label type system, and 2) a fixed-label type system that allows security types to depend on path conditions. We formally prove that the proposed analysis enforces a rigorous security property: noninterference. Moreover, we show that the analysis is strictly more precise than a classic flow-sensitive type system, and it allows sound control of information flow in the presence of mutable variables without resorting to run-time mechanisms.

Journal ArticleDOI
TL;DR: It is shown that Constraint Handling Rules (CHR) are a suitable formalism for representing and applying constraint replacements during the transformation of CLP programs and a novel generalization strategy for constraints on integer arrays is proposed.
Abstract: The transformation of constraint logic programs (CLP programs) has been shown to be an effective methodology for verifying properties of imperative programs. By following this methodology, we encode the negation of a partial correctness property of an imperative program prog as a predicate incorrect defined by a CLP program P , and we show that prog is correct by transforming P into the empty program through the application of semantics preserving transformation rules. Some of these rules perform replacements of constraints that encode properties of the data structures manipulated by the program prog. In this paper we show that Constraint Handling Rules (CHR) are a suitable formalism for representing and applying constraint replacements during the transformation of CLP programs. In particular, we consider programs that manipulate integer arrays and we present a CHR encoding of a constraint replacement strategy based on the theory of arrays. We also propose a novel generalization strategy for constraints on integer arrays that combines the CHR constraint replacement strategy with various generalization operators for linear constraints, such as widening and convex hull. Generalization is controlled by additional constraints that relate the variable identifiers in the imperative program prog and the CLP representation of their values. The method presented in this paper has been implemented and we have demonstrated its effectiveness on a set of benchmark programs taken from the literature.

Journal ArticleDOI
04 Dec 2017
TL;DR: This paper introduces an approach to multi-tier programming where the tierless code is decoupled from the tier specification, and shows that slices, together with a recommender system, enable the developer to experiment with different placements of slices, until the distribution of the code satisfies the programmer's needs.
Abstract: Web programmers are often faced with several challenges in the development process of modern, rich internet applications. Technologies for the different tiers of the application have to be selected: a server-side language, a combination of JavaScript, HTML and CSS for the client, and a database technology. Meeting the expectations of contemporary web applications requires even more effort from the developer: many state of the art libraries must be mastered and glued together. This leads to an impedance mismatch problem between the different technologies and it is up to the programmer to align them manually. Multi-tier or tierless programming is a web programming paradigm that provides one language for the different tiers of the web application, allowing the programmer to focus on the actual program logic instead of the accidental complexity that comes from combining several technologies. While current tierless approaches therefore relieve the burden of having to combine different technologies into one application, the distribution of the code is explicitly tied into the program. Certain distribution decisions have an impact on crosscutting concerns such as information security or offline availability. Moreover, adapting the programs such that the application complies better with these concerns often leads to code tangling, rendering the program more difficult to understand and maintain. We introduce an approach to multi-tier programming where the tierless code is decoupled from the tier specification. The developer implements the web application in terms of slices and an external specification that assigns the slices to tiers. A recommender system completes the picture for those slices that do not have a fixed placement and proposes slice refinements as well. This recommender system tries to optimise the tier specification with respect to one or more crosscutting concerns. This is in contrast with current cutting edge solutions that hide distribution decisions from the programmer. In this paper we show that slices, together with a recommender system, enable the developer to experiment with different placements of slices, until the distribution of the code satisfies the programmer's needs. We present a search-based recommender system that maximises the offline availability of a web application and a concrete implementation of these concepts in the tier-splitting tool Stip.js.

Proceedings ArticleDOI
21 Aug 2017
TL;DR: In this article, the authors present the Accurate REcommendation System (ARES), which achieves higher accuracy than other tools because its algorithms account for code movements when creating patterns and recommendations; its recommendations reach an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives.
Abstract: During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since the application of these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy and developers cannot directly copy them into their projects without adjustments. We present the Accurate REcommendation System (ARES) that achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time ARES achieves precision and recall values that are on par with other tools.

Journal ArticleDOI
TL;DR: A new framework for the safe execution of untrusted code, called Programs from Proofs (PfP), is presented; it transforms the program into an efficiently checkable form, thus guaranteeing quick safety checks for software consumers.
Abstract: Today, software is traded worldwide on global markets, with apps being downloaded to smartphones within minutes or seconds. This poses, more than ever, the challenge of ensuring safety of software in the face of (1) unknown or untrusted software providers together with (2) resource-limited software consumers. The concept of Proof-Carrying Code (PCC), years ago suggested by Necula, provides one framework for securing the execution of untrusted code. PCC techniques attach safety proofs, constructed by software producers, to code. Based on the assumption that checking proofs is usually much simpler than constructing proofs, software consumers should thus be able to quickly check the safety of software. However, PCC techniques often suffer from the size of certificates (i.e., the attached proofs), making PCC techniques inefficient in practice. In this article, we introduce a new framework for the safe execution of untrusted code called Programs from Proofs (PfP). The basic assumption underlying the PfP technique is the fact that the structure of programs significantly influences the complexity of checking a specific safety property. Instead of attaching proofs to program code, the PfP technique transforms the program into an efficiently checkable form, thus guaranteeing quick safety checks for software consumers. For this transformation, the technique also uses a producer-side automatic proof of safety. More specifically, safety proving for the software producer proceeds via the construction of an abstract reachability graph (ARG) unfolding the control-flow automaton (CFA) up to the degree necessary for simple checking. To this end, we combine different sorts of software analysis: expensive analyses incrementally determining the degree of unfolding, and cheap analyses responsible for safety checking. Out of the abstract reachability graph we generate the new program. In its CFA structure, it is isomorphic to the graph and hence another, this time consumer-side, cheap analysis can quickly determine its safety. Like PCC, Programs from Proofs is a general framework instantiable with different sorts of (expensive and cheap) analysis. Here, we present the general framework and exemplify it by some concrete examples. We have implemented different instantiations on top of the configurable program analysis tool CPAchecker and report on experiments, in particular on comparisons with PCC techniques.

Proceedings ArticleDOI
12 Jun 2017
TL;DR: This article introduces an interprocedural transformation that improves numerical accuracy: it partly corrects the round-off errors of computations by automatically transforming programs in a source-to-source manner.
Abstract: Floating-point numbers are used to approximate the exact real numbers in a wide range of domains like numerical simulations, embedded software, etc. However, floating-point numbers are a finite approximation of real numbers. In practice, this approximation may introduce round-off errors and this can lead to catastrophic results. To cope with this issue, we have developed a tool which corrects partly these round-off errors and which consequently improves the numerical accuracy of computations by automatically transforming programs in a source-to-source manner. Our transformation relies on static analysis by abstract interpretation and operates on pieces of code with assignments, conditionals and loops. In former work, we have focused on the intraprocedural transformation of programs and, in this article, we introduce the interprocedural transformation to improve accuracy.
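As a small, self-contained example of the kind of source-to-source rewrite that can be applied to a procedure and then benefits every caller (the helper function, its name, and the chosen inputs are hypothetical; the rewrite itself is a classic accuracy improvement, not necessarily one of the paper's own rules):

#include <cstdio>

float diffSquares(float x, float y)          { return x * x - y * y; }       // original body
float diffSquaresRewritten(float x, float y) { return (x + y) * (x - y); }   // transformed body

int main() {
    float x = 30000.25f, y = 30000.0f;
    // The exact value is 15000.0625; with single precision the rewritten form
    // computes it exactly for these inputs, while the original form does not.
    std::printf("original  = %f\n", diffSquares(x, y));
    std::printf("rewritten = %f\n", diffSquaresRewritten(x, y));
}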

Journal ArticleDOI
TL;DR: A new operational semantics is presented that correctly handles slicing for nonterminating and nondeterministic programs, together with a modified denotational semantics that is proved equivalent to the operational semantics.
Abstract: Since the original development of program slicing in 1979 there have been many attempts to define a suitable semantics, which will precisely define the meaning of a slice. Particular issues include handling termination and nontermination, slicing nonterminating programs, and slicing nondeterministic programs. In this paper we review and critique the main attempts to construct a semantics for slicing and present a new operational semantics, which correctly handles slicing for nonterminating and nondeterministic programs. We also present a modified denotational semantics, which we prove to be equivalent to the operational semantics. This provides programmers with two different methods to prove the correctness of a slice or a slicing algorithm, and it means that the program transformation theory and the FermaT transformation system, developed over the last 25 years of research and proved so successful in analyzing terminating programs, can now be applied to nonterminating interactive programs.

Journal ArticleDOI
TL;DR: This paper proposes a formal model for specifying and understanding the strength of obfuscating transformations with respect to a given attack model and introduces a framework for transforming abstract domains, i.e., analyses, towards incompleteness.
Abstract: Obfuscation is the art of making code hard to reverse engineer and understand. In this paper, we propose a formal model for specifying and understanding the strength of obfuscating transformations with respect to a given attack model. The idea is to consider the attacker as an abstract interpreter willing to extract information about the program's semantics. In this scenario, we show that obfuscating code is making the analysis imprecise, namely making the corresponding abstract domain incomplete. It is known that completeness is a property of the abstract domain and the program to analyse. We introduce a framework for transforming abstract domains, i.e., analyses, towards incompleteness. The family of incomplete abstractions for a given program provides a characterisation of the potency of obfuscation employed in that program, i.e., its strength against the attack specified by those abstractions. We show this characterisation for known obfuscating transformations used to inhibit program slicing and automated disassembly.
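As a concrete (hypothetical) instance of the kind of transformation the abstract reasons about, the snippet below obfuscates a trivial function with an opaque predicate; an analyser that cannot prove the predicate always true must treat the dead branch as reachable, so its abstraction of the program becomes incomplete and, for instance, a slice of the return value must now include z.

#include <cstdio>

int original(int x) { return x + 1; }

int obfuscated(int x, unsigned z) {
    // z * (z + 1) is a product of consecutive integers, hence always even
    // (this remains true under unsigned wrap-around), so the else branch is dead.
    if ((z * (z + 1u)) % 2u == 0u)
        return x + 1;
    else
        return x + (int)z;      // never executed, yet it pollutes the analysis
}

int main() {
    std::printf("%d %d\n", original(41), obfuscated(41, 7u));   // prints 42 42
}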

Proceedings ArticleDOI
02 Jan 2017
TL;DR: This work applies the techniques of game semantics to the untyped λ-calculus, but takes a more operational viewpoint that uses less mathematical machinery than traditional presentations of game semantics.
Abstract: Any expression M in ULC (the untyped λ-calculus) can be compiled into a rather low-level language we call LLL, whose programs contain none of the traditional implementation devices for functional languages: environments, thunks, closures, etc. A compiled program is first-order functional and has a fixed set of working variables, whose number is independent of M. The generated LLL code in effect traverses the subexpressions of M. We apply the techniques of game semantics to the untyped λ-calculus, but take a more operational viewpoint that uses less mathematical machinery than traditional presentations of game semantics. Further, the untyped lambda calculus ULC is compiled into LLL by partially evaluating a traversal algorithm for ULC.

Journal ArticleDOI
TL;DR: It is shown that one kind of equational expression is sufficient and that unification is nothing more than an optimization of Boolean equality.
Abstract: Although functional as well as logic languages use equality to discriminate between logically different cases, the operational meaning of equality is different in such languages. Functional languages reduce equational expressions to their Boolean values, True or False, whereas logic languages use unification to check validity only and fail otherwise. Consequently, the language Curry, which amalgamates functional and logic programming features, offers two kinds of equational expressions, so that the programmer has to distinguish between these uses. We show that this distinction can be avoided by providing an analysis and transformation method that automatically selects the appropriate operation. Without this distinction in source programs, the language design can be simplified and the execution of programs can be optimized. As a consequence, we show that one kind of equational expression is sufficient and that unification is nothing more than an optimization of Boolean equality.

Book ChapterDOI
03 Oct 2017
TL;DR: A semantics-preserving program transformation is presented that drastically improves the precision of existing analyses when deciding if a pointer can alias Null, and allows even a flow-insensitive analysis to make use of branch conditions such as checking if a pointer is Null and gain precision.
Abstract: Precise analysis of pointer information plays an important role in many static analysis tools. The precision, however, must be balanced against the scalability of the analysis. This paper focusses on improving the precision of standard context and flow insensitive alias analysis algorithms at a low scalability cost. In particular, we present a semantics-preserving program transformation that drastically improves the precision of existing analyses when deciding if a pointer can alias Null. Our program transformation is based on Global Value Numbering, a scheme inspired from compiler optimization literature. It allows even a flow-insensitive analysis to make use of branch conditions such as checking if a pointer is Null and gain precision. We perform experiments on real-world code and show that the transformation improves precision (in terms of the number of dereferences proved safe) from 86.56% to 98.05%, while incurring a small overhead in the running time.
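The sketch below is a deliberately simplified caricature of how value numbering can let a simple analysis exploit a null check (the paper's transformation is based on Global Value Numbering over the whole program and differs in its details; the assumed checker here can use a null check only when the checked and dereferenced names coincide).

// Before: q is a copy of p, so an analysis that only matches a null check
// against the syntactically identical name reports *q as a possible null dereference.
void before(int *p) {
    int *q = p;
    if (p != nullptr) {
        *q = 1;        // safe at run time, but flagged by the simple checker
    }
}

// After (sketch): value numbering shows q and p denote the same value inside
// the branch, so the dereference is rewritten to use the checked name.
void after(int *p) {
    if (p != nullptr) {
        *p = 1;        // checked name and dereferenced name now coincide
    }
}

int main() {
    int v = 0;
    before(&v);
    after(&v);
    return 0;
}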

Proceedings ArticleDOI
01 Dec 2017
TL;DR: Concerto is a Parallelization, Orchestration and Distribution Framework, and is a component of the larger Program Transformation and Parallelization Solution, where the Distribution and Mapping process is entirely automated, requires no user directives and is based solely on Dependence and Flow analysis of the sequential program.
Abstract: The important step in Program Parallelization is identifying the pieces of the given program that can be run concurrently on separate processing elements. The parallel pieces, once identified, need to be hoisted and executed remotely, and the results combined. This is a complex process, usually referred to as Program Orchestration and Distribution, and the details are closely tied to the target architecture of the parallel machine. Program Distribution on a Shared Memory Parallel Computer is comparatively simple, and involves structuring the parallel pieces as separate threads, with synchronization provided as needed, and scheduling the threads on the various processors. In contrast, Program Orchestration on a Distributed Machine, such as a Cluster, is more involved and requires explicit message passing, with the help of Send and Receive primitives, to share variables between the parallel subprograms, which are running on separate machines. Concerto is a Parallelization, Orchestration and Distribution Framework, and is a component of our larger Program Transformation and Parallelization Solution. The parallel architectures targeted include both Shared Memory Multicomputers and Distributed Memory Multicomputers. However, the focus of this paper is mainly on the class of Distributed Memory Parallel Machines. Here we look at the issues involved in Program Distribution and provide a high-level design of Concerto, our solution to the problem, along with the Program Parallelization and Distribution Algorithm. The majority of existing Program Distribution solutions require user annotations to identify parallel pieces of code and data, which can be a cumbersome process from the programmer's perspective. In Concerto, however, the Distribution and Mapping process is entirely automated, requires no user directives, and is based solely on Dependence and Flow analysis of the sequential program.

Journal ArticleDOI
TL;DR: A set of restructurings to systematically normalize selective syntax in C++ is presented to convert variations in syntax of specific portions of code into a single form to simplify the construction of large, complex program transformation rules.
Abstract: A set of restructurings to systematically normalize selective syntax in C++ is presented. The objective is to convert variations in syntax of specific portions of code into a single form to simplify the construction of large, complex program transformation rules. Current approaches to constructing transformations require developers to account for a large number of syntactic cases, many of which are syntactically different but semantically equivalent. The work identifies classes of such syntactic variations and presents normalizing restructurings to simplify each variation to a single, consistent syntactic form. The normalizing restructurings for C++ are presented and applied to two open source systems for evaluation. The evaluation uses the systems' test cases to validate that the normalizing restructurings do not affect the systems' tested behavior. In addition, a set of example transformations that benefit from the prior application of normalizing restructurings is presented, along with a small survey to assess the effect on the readability of the resultant code.
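A hypothetical illustration of a normalizing restructuring (the variants and the canonical form chosen below are made up, not taken from the paper): several syntactic variants that a transformation rule would otherwise have to match separately are rewritten to a single form.

// Variants as found in source code: a combined declaration and three
// spellings of the loop increment.
void variants(int n) {
    int a, b = 0, c(1);
    for (int i = 0; i < n; i++)    { a = b; }
    for (int i = 0; i < n; ++i)    { a = c; }
    for (int i = 0; i < n; i += 1) { a = b + c; }
    (void)a;
}

// After normalization (one declaration per statement, one increment form),
// a transformation rule needs only one pattern for each construct.
void normalized(int n) {
    int a;
    int b = 0;
    int c = 1;
    for (int i = 0; i < n; i = i + 1) { a = b; }
    for (int i = 0; i < n; i = i + 1) { a = c; }
    for (int i = 0; i < n; i = i + 1) { a = b + c; }
    (void)a;
}

int main() { variants(3); normalized(3); }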

Proceedings ArticleDOI
25 Dec 2017
TL;DR: It is shown that the proposed language solves the performance problem in functor applications pointed out by Inoue et al., and that it provides a suitable basis for writing code generators for modules.
Abstract: Program generation has been successful in various domains which need high performance and high productivity. Yet, programming-language support for program generation needs further improvement. An important omission is the functionality of generating modules in a type-safe way. Inoue et al. addressed this issue in 2016, but investigated only a few examples. We propose a language as an extension of (a small subset of) MetaOCaml in which one can manipulate and generate code of a module, and implement it based on a simple translation to MetaOCaml. We show that our language solves the performance problem in functor applications pointed out by Inoue et al., and that it provides a suitable basis for writing code generators for modules.

Posted Content
TL;DR: In this article, a semantics-preserving program transformation is proposed to improve the precision of standard context and flow insensitive alias analysis algorithms at a low scalability cost, which is based on Global Value Numbering, a scheme inspired from compiler optimizations literature.
Abstract: Precise analysis of pointer information plays an important role in many static analysis techniques and tools today. The precision, however, must be balanced against the scalability of the analysis. This paper focusses on improving the precision of standard context and flow insensitive alias analysis algorithms at a low scalability cost. In particular, we present a semantics-preserving program transformation that drastically improves the precision of existing analyses when deciding if a pointer can alias NULL. Our program transformation is based on Global Value Numbering, a scheme inspired from compiler optimizations literature. It allows even a flow-insensitive analysis to make use of branch conditions such as checking if a pointer is NULL and gain precision. We perform experiments on real-world code to measure the overhead in performing the transformation and the improvement in the precision of the analysis. We show that the precision improves from 86.56% to 98.05%, while the overhead is insignificant.

Journal ArticleDOI
02 May 2017
TL;DR: The authors present simplify-defun, a tool that transforms the definition of a given function into a simplified definition of a new function, providing a proof checked by ACL2 that the old and new functions are equivalent.
Abstract: We present a tool, simplify-defun, that transforms the definition of a given function into a simplified definition of a new function, providing a proof checked by ACL2 that the old and new functions are equivalent. When appropriate it also generates termination and guard proofs for the new function. We explain how the tool is engineered so that these proofs will succeed. Examples illustrate its utility, in particular for program transformation in synthesis and verification.

Proceedings ArticleDOI
25 Dec 2017
TL;DR: This paper proposes a new loop fusion strategy, which can fuse any loops—even loops with data dependence—and shows that it is useful for program verification because it can simplify loop invariants, and extends the “guess-and-assume” technique to reversing loop execution, which is useful to verify a certain type of consecutive loops.
Abstract: Loop fusion—a program transformation to merge multiple consecutive loops into a single one—has been studied mainly for compiler optimization. In this paper, we propose a new loop fusion strategy, which can fuse any loops—even loops with data dependence—and show that it is useful for program verification because it can simplify loop invariants. The crux of our loop fusion is the following observation: if the state after the first loop were known, the two loop bodies could be computed at the same time without suffering from data dependence by renaming program variables. Our loop fusion produces a program that guesses the unknown state after the first loop nondeterministically, executes the fused loop where variables are renamed, compares the guessed state and the state actually computed by the fused loop, and, if they do not match, diverges. The last two steps of comparison and divergence are crucial to preserve partial correctness. We call our approach “guess-and-assume” because, in addition to the first step to guess, the last two steps can be expressed by the pseudo-instruction assume, used in program verification. We formalize our loop fusion for a simple imperative language and prove that it preserves partial correctness. We further extend the “guess-and-assume” technique to reversing loop execution, which is useful to verify a certain type of consecutive loops. Finally, we confirm by experiments that our transformation techniques are indeed effective for state-of-the-art model checkers to verify a few small programs that they could not.
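Written out on a tiny example in the style of verification benchmarks, the guess-and-assume fusion described above looks roughly as follows; nondet_int and assume are placeholder names for whatever nondeterministic-choice and assume intrinsics the target model checker provides, and the particular pair of loops is made up for illustration.

extern int nondet_int(void);     // nondeterministic value (verifier intrinsic, assumed name)
extern void assume(int cond);    // blocks/diverges when cond is false (assumed name)

void original(int n, int a[], int b[]) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];   // first loop computes s
    for (int j = 0; j < n; j++) b[j] = s;    // second loop depends on the final s
}

void fused(int n, int a[], int b[]) {
    int s = 0;
    int s_guess = nondet_int();              // guess the state after the first loop
    for (int i = 0; i < n; i++) {
        s += a[i];                           // body of the first loop
        b[i] = s_guess;                      // body of the second loop, with s renamed
    }
    assume(s == s_guess);                    // keep only executions whose guess was right;
                                             // wrong guesses diverge, preserving partial correctness
}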

Journal ArticleDOI
23 Aug 2017
TL;DR: The authors show that several safety properties of functional programs modeling a class of cache coherence protocols can be proved by a supercompiler, and compare the results with their earlier work on direct verification via supercompilation without intermediate interpretation.
Abstract: We explore an approach to verification of programs via program transformation applied to an interpreter of a programming language. A specialization technique known as Turchin's supercompilation is used to specialize some interpreters with respect to the program models. We show that several safety properties of functional programs modeling a class of cache coherence protocols can be proved by a supercompiler and compare the results with our earlier work on direct verification via supercompilation not using intermediate interpretation. Our approach was in part inspired by an earlier work by E. De Angelis et al. (2014-2015) where verification via program transformation and intermediate interpretation was studied in the context of specialization of constraint logic programs.