Showing papers on "Program transformation published in 2021"


Proceedings ArticleDOI
11 Jul 2021
TL;DR: Corder as mentioned in this paper is a self-supervised contrastive learning framework for source code models that uses a set of semantic-preserving transformation operators to generate code snippets that are syntactically diverse but semantically equivalent.
Abstract: We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data in code retrieval and code summarization tasks. The pre-trained model of Corder can be used in two ways: (1) it can produce vector representations of code which can be applied to code retrieval tasks that do not have labeled data; (2) it can be used in a fine-tuning process for tasks that might still require labeled data such as code summarization. The key innovation is that we train the source code model by asking it to recognize similar and dissimilar code snippets through a contrastive learning objective. To do so, we use a set of semantic-preserving transformation operators to generate code snippets that are syntactically diverse but semantically equivalent. Through extensive experiments, we have shown that the code models pretrained by Corder substantially outperform the other baselines for code-to-code retrieval, text-to-code retrieval, and code-to-text summarization tasks.
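
As a concrete illustration of the kind of semantic-preserving operator described above, the sketch below consistently renames parameters and local variables of a Python function, producing code that is syntactically different but behaviorally identical. This is only a minimal, assumed example; Corder's actual operator set and implementation are not shown here.

```python
import ast
import builtins

class RenameVariables(ast.NodeTransformer):
    """Consistently rename parameters and local variables: a semantics-preserving,
    syntax-changing rewrite."""
    def __init__(self):
        self.mapping = {}

    def _fresh(self, name):
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):                 # function parameters
        node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node):                # variable reads and writes
        if node.id not in dir(builtins):       # leave builtins such as `sum` alone
            node.id = self._fresh(node.id)
        return node

source = """
def total(prices, rate):
    subtotal = sum(prices)
    return subtotal + subtotal * rate
"""
tree = RenameVariables().visit(ast.parse(source))
print(ast.unparse(tree))   # renamed but behaviorally identical (Python 3.9+)
```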

55 citations


Journal ArticleDOI
TL;DR: The results show that even with small semantic-preserving changes to the programs, these neural program models often fail to generalize their performance, and suggest that neural program models based on data and control dependencies in programs generalize better than neural program models based only on abstract syntax trees (ASTs).
Abstract: Context: With the prevalence of publicly available source code repositories to train deep neural network models, neural program models can do well in source code analysis tasks, such as predicting method names in given programs, that cannot be easily done by traditional program analysis techniques. Although such neural program models have been tested on various existing datasets, the extent to which they generalize to unforeseen source code is largely unknown. Objective: Since it is very challenging to test neural program models on all unforeseen programs, in this paper, we propose to evaluate the generalizability of neural program models with respect to semantic-preserving transformations: a generalizable neural program model should perform equally well on programs that are of the same semantics but of different lexical appearances and syntactical structures. Method: We compare the results of various neural program models for the method name prediction task on programs before and after automated semantic-preserving transformations. We use three Java datasets of different sizes and three state-of-the-art neural network models for code, namely code2vec, code2seq, and GGNN, to build nine such neural program models for evaluation. Results: Our results show that even with small semantic-preserving changes to the programs, these neural program models often fail to generalize their performance. Our results also suggest that neural program models based on data and control dependencies in programs generalize better than neural program models based only on abstract syntax trees (ASTs). On the positive side, we observe that as the size of the training dataset grows and diversifies, the generalizability of correct predictions produced by the neural program models improves too. Conclusion: Our results on the generalizability of neural program models provide insights to measure their limitations and provide a stepping stone for their improvement.

43 citations


Proceedings ArticleDOI
27 Feb 2021
TL;DR: In this article, the authors show how to deliver the same runtime guarantees that Wu et al. provide in a memory-safe way; their LLVM-based implementation is also more efficient than its original inspiration, achieving shorter repair times and producing code that is smaller and faster.
Abstract: A program is said to be isochronous if its running time does not depend on classified information. The programming languages literature contains much work that transforms programs to ensure isochronicity. The current state-of-the-art approach is a code transformation technique due to Wu et al., published in 2018. That technique has an important virtue: it ensures that the transformed program runs exactly the same set of operations, regardless of inputs. However, in this paper we demonstrate that it also has a shortcoming: it might add out-of-bounds memory accesses into programs that were originally memory sound. From this observation, we show how to deliver the same runtime guarantees that Wu et al. provide, in a memory-safe way. In addition to being safer, our LLVM-based implementation is more efficient than its original inspiration, achieving shorter repair times and producing code that is smaller and faster.
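
A textbook example of the kind of rewrite the isochronicity literature relies on is replacing a secret-dependent branch with a branch-free selection. The sketch below is only that textbook rewrite; it is not Wu et al.'s transformation nor the paper's LLVM-based repair.

```python
def select_branching(secret_bit, a, b):
    # Running time and branch-predictor state depend on the secret.
    if secret_bit:
        return a
    return b

def select_isochronous(secret_bit, a, b):
    # Same result, computed with the same operations regardless of the secret.
    mask = -secret_bit               # 0 -> ...000, 1 -> ...111 (two's complement)
    return (a & mask) | (b & ~mask)

for s in (0, 1):
    assert select_branching(s, 0x1234, 0x5678) == select_isochronous(s, 0x1234, 0x5678)
```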

7 citations


Journal ArticleDOI
TL;DR: This paper presents the first rigorous analysis of the type safety of FORTRAN 77 and the novel program transformation and type checking algorithms required to convert FORTRAN 77 subroutines and functions into pure, side-effect free subroutines and functions in Fortran 90.
Abstract: Fortran is still widely used in scientific computing, and a very large corpus of legacy as well as new code is written in FORTRAN 77. In general, this code is not type safe, so incorrect programs can compile without errors. In this paper, we present a formal approach to ensure type safety of legacy Fortran code through automated program transformation. The objective of this work is to reduce programming errors by guaranteeing type safety. We present the first rigorous analysis of the type safety of FORTRAN 77 and the novel program transformation and type checking algorithms required to convert FORTRAN 77 subroutines and functions into pure, side-effect free subroutines and functions in Fortran 90. We have implemented these algorithms in a source-to-source compiler which type checks and automatically transforms the legacy code. We show that the resulting code is type safe and that the pure, side-effect free and referentially transparent subroutines can readily be offloaded to accelerators.

7 citations


Proceedings ArticleDOI
19 Apr 2021
TL;DR: In this paper, the authors express neural architecture operations as program transformations whose legality depends on a notion of representational capacity, and combine them with existing transformations into a unified optimization framework.
Abstract: Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks by operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. This unification allows us to express existing NAS operations as combinations of simpler transformations. Crucially, it allows us to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and were able to find optimizations across different DNNs that significantly reduce inference time - over 3× in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time.
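
The parameter-count arithmetic below makes the "grouping" and "bottlenecking" operations mentioned above concrete. The layer sizes are made up for illustration, and nothing here reflects the paper's TVM-based framework or its capacity criterion.

```python
def conv2d_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution with square k x k kernels."""
    return (c_in // groups) * c_out * k * k

c_in, c_out, k = 256, 256, 3
standard   = conv2d_params(c_in, c_out, k)               # 589,824 weights
grouped    = conv2d_params(c_in, c_out, k, groups=8)     # 73,728 (8x fewer)
bottleneck = (conv2d_params(c_in, 64, 1)                 # 1x1 reduce,
              + conv2d_params(64, 64, k)                 # 3x3 at lower width,
              + conv2d_params(64, c_out, 1))             # 1x1 expand: 69,632
print(standard, grouped, bottleneck)
```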

6 citations


Journal ArticleDOI
15 Oct 2021
TL;DR: APIFix as mentioned in this paper is a program synthesis approach that automates API usage adaptation via program transformation, reducing the over-fitting of transformation rules while synthesizing and applying them to client code.
Abstract: Use of third-party libraries is extremely common in application software. The libraries evolve to accommodate new features or mitigate security vulnerabilities, thereby breaking the Application Programming Interface (API) used by the software. Such breaking changes in the libraries may discourage client code from using the new library versions, thereby keeping the application vulnerable and not up-to-date. We propose a novel output-oriented program synthesis algorithm to automate API usage adaptations via program transformation. Our aim is not to rely only on the few example human adaptations of clients from the old library version to the new library version, since this can lead to over-fitted transformation rules. Instead, we also rely on example usages of the new updated library in clients, which provide valuable context for synthesizing and applying the transformation rules. Our tool APIFix provides an automated mechanism to transform application code using the old library versions to code using the new library versions - thereby achieving automated API usage adaptation to fix the effect of breaking changes. Our evaluation shows that the transformation rules inferred by APIFix achieve 98.7% precision and 91.5% recall. By comparing our approach to state-of-the-art program synthesis approaches, we show that our approach significantly reduces over-fitting while synthesizing transformation rules for API usage adaptations.
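
The sketch below applies a single hand-written API-adaptation rule with Python's ast module, only to illustrate what "applying a transformation rule" means in this setting. The library and function names (`oldlib`, `newlib`, `fetch`, `timeout`) are hypothetical, and nothing here reflects APIFix's rule synthesis.

```python
import ast

class AdaptFetchCall(ast.NodeTransformer):
    """Hypothetical rule: rewrite oldlib.fetch(url) -> newlib.fetch(url, timeout=30)."""
    def visit_Call(self, node):
        self.generic_visit(node)
        f = node.func
        if (isinstance(f, ast.Attribute) and f.attr == "fetch"
                and isinstance(f.value, ast.Name) and f.value.id == "oldlib"):
            f.value.id = "newlib"
            node.keywords.append(ast.keyword(arg="timeout",
                                             value=ast.Constant(value=30)))
        return node

client = "data = oldlib.fetch(url)"
tree = ast.fix_missing_locations(AdaptFetchCall().visit(ast.parse(client)))
print(ast.unparse(tree))   # data = newlib.fetch(url, timeout=30)
```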

5 citations


Posted Content
TL;DR: The Combinatory Homomorphic Automatic Differentiation (CHAD) as mentioned in this paper is a principled, pure, provably correct method for performing forward and reverse-mode automatic differentiation (AD) on programming languages with expressive features.
Abstract: We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct method for performing forward- and reverse-mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure preserving) extension to expressive languages of the well-known and unambiguous definitions of automatic differentiation for a first-order functional language. Correctness of the method follows by a (compositional) logical relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program. In their most elegant formulation, the transformations generate code with linear types. However, the transformations can be implemented in a standard functional language without sacrificing correctness. This implementation can be achieved by making use of abstract data types to represent the required linear types, e.g. through the use of a basic module system. In this paper, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in automatic differentiation to other dynamic program analyses that accumulate data in a commutative monoid.
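
For readers unfamiliar with forward-mode AD, the dual-number sketch below shows what a forward-mode derivative computes at runtime. CHAD itself is a typed, homomorphic source-code transformation, which this standard runtime trick does not reproduce.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # primal value
    tan: float   # tangent (derivative with respect to the chosen input)

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

def f(x, y):
    return x * x + x * y          # df/dx = 2x + y

x, y = Dual(3.0, 1.0), Dual(2.0, 0.0)   # seed: differentiate with respect to x
assert f(x, y).tan == 2 * 3.0 + 2.0     # 8.0
```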

5 citations


Book ChapterDOI
27 Mar 2021
TL;DR: In this paper, the authors present a generalization of automated cost analysis that can handle abstract programs and, hence, can analyze the impact on the cost of program transformations and certify by deductive verification that the inferred abstract cost bounds are correct and sufficiently precise.
Abstract: A program containing placeholders for unspecified statements or expressions is called an abstract (or schematic) program. Placeholder symbols occur naturally in program transformation rules, as used in refactoring, compilation, optimization, or parallelization. We present a generalization of automated cost analysis that can handle abstract programs and, hence, can analyze the impact on the cost of program transformations. This kind of relational property requires provably precise cost bounds which are not always produced by cost analysis. Therefore, we certify by deductive verification that the inferred abstract cost bounds are correct and sufficiently precise. It is the first approach to solve this problem. Both abstract cost analysis and certification are based on quantitative abstract execution (QAE), which in turn is a variation of abstract execution, a recently developed symbolic execution technique for abstract programs. To realize QAE, the new concept of a cost invariant is introduced. QAE is implemented and runs fully automatically on a benchmark set consisting of representative optimization rules.
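
To make the notion of a cost invariant tangible, the concrete loop below carries an explicit cost counter and dynamically checks an invariant that bounds it. This is only a toy, non-abstract analogue; QAE itself reasons symbolically about abstract programs containing placeholder statements.

```python
def sum_first(n):
    acc, i, cost = 0, 0, 0
    while i < n:
        # cost invariant at the loop head: cost == i and cost <= n
        assert cost == i and cost <= n
        acc += i
        i += 1
        cost += 1                 # each iteration costs one unit
    assert cost <= max(n, 0)      # the inferred upper bound on the loop's cost
    return acc

sum_first(10)
```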

3 citations


Book ChapterDOI
17 Sep 2021
TL;DR: In this paper, the authors review state-of-the-art deobfuscation attacks on Mixed Boolean-Arithmetic (MBA) expressions, which combine bitwise operations (e.g., AND, OR, and NOT) with arithmetic operations (e.g., ADD and IMUL), and argue that existing MBA obfuscation must be enhanced to withstand them.
Abstract: A Mixed Boolean-Arithmetic (MBA) expression mixes bitwise operations (e.g., AND, OR, and NOT) and arithmetic operations (e.g., ADD and IMUL). It enables a semantic-preserving program transformation that converts a simple expression into a difficult-to-understand but equivalent form. MBA expressions have been widely adopted as a highly effective and low-cost obfuscation scheme. However, state-of-the-art deobfuscation research poses substantial challenges to the MBA obfuscation technique. Attacking methods such as bit-blasting, pattern matching, program synthesis, deep learning, and mathematical transformation can successfully simplify specific categories of MBA expressions. Existing MBA obfuscation must be enhanced to overcome these emerging challenges.
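
A classic MBA rewrite (not taken from the paper) replaces plain addition with an equivalent mix of bitwise and arithmetic operators; the exhaustive check below confirms the identity on 8-bit operands.

```python
def obfuscated_add(x, y):
    return (x ^ y) + 2 * (x & y)   # equals x + y (carry-save addition identity)

assert all(obfuscated_add(x, y) == x + y
           for x in range(256) for y in range(256))
```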

3 citations


Posted ContentDOI
27 Apr 2021
TL;DR: The proposed solution combines a partitioning technique, program transformation, and TEEs to protect the execution of security-sensitive parts of applications, thereby reducing the Trusted Computing Base (TCB).
Abstract: Cloud computing allows clients to upload their sensitive data to the public cloud and perform sensitive computations in those untrusted areas, which can lead to violations of the confidentiality of clients' sensitive data. Utilizing Trusted Execution Environments (TEEs) to protect data confidentiality from other software is an effective solution. TEEs are supported by different platforms, such as Intel's Software Guard Extensions (SGX). SGX provides a TEE, called an enclave, which can be used to protect the integrity of the code and the confidentiality of data. Some efforts have proposed different solutions in order to isolate the execution of security-sensitive code from the rest of the application. Unlike our previous work, CFHider, a hardware-assisted method that aimed to protect only the confidentiality of the control flow of applications, in this study we develop a new approach for partitioning applications into security-sensitive code to be run in the trusted execution setting and cleartext code to be run in the public cloud setting. Our approach leverages program transformation and TEEs to hide the security-sensitive data of the code. We describe our proposed solution, which combines the partitioning technique, program transformation, and TEEs to protect the execution of security-sensitive data of applications. Some former works have shown that most applications can run in their entirety inside trusted areas such as SGX enclaves, which leads to a large Trusted Computing Base (TCB). Instead, we analyze three case studies, in which we partition real Java applications and employ the SGX enclave to protect the execution of sensitive statements, thereby reducing the TCB. We also show the advantages of the proposed solution and demonstrate how the confidentiality of security-sensitive data is protected.

3 citations


Proceedings ArticleDOI
06 Sep 2021
TL;DR: Secure multiparty computation (MPC) is a cryptographic technology that allows computation to be performed on private data without actually seeing the data; this paper brings MPC together with logic programming, allowing users to write privacy-preserving applications in a logic programming language.
Abstract: Logic Programming (LP) is a subcategory of declarative programming that is considered to be relatively simple for non-programmers. LP developers focus on describing the facts and rules of a logical derivation and do not need to think about the algorithms actually implementing the derivation. Secure multiparty computation (MPC) is a cryptographic technology that allows computation to be performed on private data without actually seeing the data. In this paper, we bring together the notions of MPC and LP, allowing users to write privacy-preserving applications in a logic programming language.

Journal ArticleDOI
18 Aug 2021
TL;DR: In this paper, the authors present a system for semi-automatically deriving both an efficient program transformation and its correctness proof from a list of rewrite rules and specifications of the auxiliary data structures it requires.
Abstract: An efficient optimizing compiler can perform many cascading rewrites in a single pass, using auxiliary data structures such as variable binding maps, delayed substitutions, and occurrence counts. Such optimizers often perform transformations according to relatively simple rewrite rules, but the subtle interactions between the data structures needed for efficiency make them tricky to write and trickier to prove correct. We present a system for semi-automatically deriving both an efficient program transformation and its correctness proof from a list of rewrite rules and specifications of the auxiliary data structures it requires. Dependent types ensure that the holes left behind by our system (for the user to fill in) are filled in correctly, allowing the user low-level control over the implementation without having to worry about getting it wrong. We implemented our system in Coq (though it could be implemented in other logics as well), and used it to write optimization passes that perform uncurrying, inlining, dead code elimination, and static evaluation of case expressions and record projections. The generated implementations are sometimes faster, and at most 40% slower, than hand-written counterparts on a small set of benchmarks; in some cases, they require significantly less code to write and prove correct.
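
One such rewrite is inlining let-bound variables that occur at most once, which needs an occurrence-count map as auxiliary data. The Python sketch below is an unverified toy version (it assumes all bound names are distinct); the paper derives passes like this, together with their correctness proofs, in Coq.

```python
# Expressions: ('const', n) | ('var', x) | ('add', e1, e2) | ('let', x, e1, e2)
def count_uses(e, counts):
    tag = e[0]
    if tag == 'var':
        counts[e[1]] = counts.get(e[1], 0) + 1
    elif tag == 'add':
        count_uses(e[1], counts); count_uses(e[2], counts)
    elif tag == 'let':
        count_uses(e[2], counts); count_uses(e[3], counts)
    return counts

def inline_once(e, env=None):
    """Inline let-bindings whose variable is used at most once (no shadowing)."""
    env = env or {}
    tag = e[0]
    if tag == 'var':
        return env.get(e[1], e)
    if tag == 'add':
        return ('add', inline_once(e[1], env), inline_once(e[2], env))
    if tag == 'let':
        x, rhs, body = e[1], inline_once(e[2], env), e[3]
        if count_uses(body, {}).get(x, 0) <= 1:   # auxiliary data: occurrence counts
            return inline_once(body, {**env, x: rhs})
        return ('let', x, rhs, inline_once(body, env))
    return e                                       # ('const', n)

prog = ('let', 'x', ('add', ('const', 1), ('const', 2)),
        ('add', ('var', 'x'), ('const', 3)))
print(inline_once(prog))
# ('add', ('add', ('const', 1), ('const', 2)), ('const', 3))
```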

Book ChapterDOI
17 Oct 2021
TL;DR: In this article, the problem of automatically proving resource bounds has been studied, where the focus has often been on developing precise amortized reasoning techniques to infer the most exact resource usage.
Abstract: We consider the problem of automatically proving resource bounds. That is, we study how to prove that an integer-valued resource variable is bounded by a given program expression. Automatic resource-bound analysis has recently received significant attention because of a number of important applications (e.g., detecting performance bugs, preventing algorithmic-complexity attacks, identifying side-channel vulnerabilities), where the focus has often been on developing precise amortized reasoning techniques to infer the most exact resource usage. While such innovations remain critical, we observe that fully precise amortization is not always necessary to prove a bound of interest. And in fact, by amortizing selectively, the needed supporting invariants can be simpler, making the invariant inference task more feasible and predictable. We present a framework for selectively-amortized analysis that mixes worst-case and amortized reasoning via a property decomposition and a program transformation. We show that proving bounds in any such decomposition yields a sound resource bound in the original program, and we give an algorithm for selecting a reasonable decomposition.
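
A standard example (not drawn from the paper) of why selective amortization helps: per-iteration worst-case reasoning on the inner loop below gives a quadratic bound, while amortizing only the inner loop's cost against pushes yields the linear bound asserted at the end.

```python
def monotonic_stack(items):
    stack, cost = [], 0
    for x in items:
        while stack and stack[-1] < x:   # inner loop: amortized O(1) per element
            stack.pop()
            cost += 1
        stack.append(x)
        cost += 1
    # Each element is pushed exactly once and popped at most once.
    assert cost <= 2 * len(items)
    return stack

monotonic_stack([3, 1, 4, 1, 5, 9, 2, 6])
```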

Book ChapterDOI
19 May 2021
TL;DR: In this article, the authors propose an experimental technique for automated replication of tuple spaces in distributed systems, where different threads represent the behaviour of the separate components, each owning its own local tuple repository.
Abstract: Coordination languages for tuple spaces can offer significant advantages in the specification and implementation of distributed systems, but often do require manual programming effort to ensure consistency. We propose an experimental technique for automated replication of tuple spaces in distributed systems. The system of interest is modelled as a concurrent Go program where different threads represent the behaviour of the separate components, each owning its own local tuple repository. We automatically transform the initial program by combining program transformation and static analysis, so that tuples are replicated depending on the components’ read-write access patterns. In this way, we turn the initial system into a replicated one where the replication of tuples is automatically achieved, while avoiding unnecessary replication overhead. Custom static analyses may be plugged in easily in our prototype implementation. We see this as a first step towards developing a fully-fledged framework to support designers to quickly evaluate many classes of replication-based systems under different consistency levels.

Posted Content
TL;DR: In this paper, the authors present an algorithm that automatically transforms an evaluator written in a dedicated minimal functional meta-language to administrative normal form, which facilitates program analysis, before performing selective translation to continuation-passing style, and selective defunctionalization.
Abstract: The functional correspondence is a manual derivation technique transforming higher-order evaluators into semantically equivalent abstract machines. The transformation consists of two well-known program transformations: translation to continuation-passing style, which uncovers the control flow of the evaluator, and Reynolds's defunctionalization, which generates a first-order transition function. Ever since the transformation was first described by Danvy et al., it has found numerous applications in connecting known evaluators and abstract machines, but also in discovering new abstract machines for a variety of $\lambda$-calculi as well as for logic-programming, imperative and object-oriented languages. We present an algorithm that automates the functional correspondence. The algorithm accepts an evaluator written in a dedicated minimal functional meta-language; it first transforms the evaluator to administrative normal form, which facilitates program analysis, before performing selective translation to continuation-passing style and selective defunctionalization. The two selective transformations are driven by a control-flow analysis that is computed by an abstract interpreter obtained using the abstracting abstract machines methodology, which makes it possible to transform only the desired parts of the evaluator. The article is accompanied by an implementation of the algorithm in the form of a command-line tool that allows for automatic transformation of an evaluator embedded in a Racket source file and gives fine-grained control over the resulting machine.
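
The two ingredients of the functional correspondence, applied by hand to a microscopic arithmetic evaluator, look as follows; the paper's contribution is automating this (selectively, via control-flow analysis) for a much richer meta-language, which this sketch does not attempt.

```python
# Expressions: ('lit', n) | ('add', e1, e2)
def eval_direct(e):
    return e[1] if e[0] == 'lit' else eval_direct(e[1]) + eval_direct(e[2])

# Step 1: continuation-passing style exposes the evaluator's control flow.
def eval_cps(e, k):
    if e[0] == 'lit':
        return k(e[1])
    return eval_cps(e[1], lambda v1: eval_cps(e[2], lambda v2: k(v1 + v2)))

# Step 2: defunctionalization turns the continuations into first-order frames,
# yielding an abstract machine with an explicit stack.
def eval_machine(e):
    stack = [('done',)]
    while True:
        if e[0] == 'add':
            stack.append(('add1', e[2]))      # remember the right operand
            e = e[1]
        else:                                 # ('lit', n): apply the stack to n
            v = e[1]
            while True:
                frame = stack.pop()
                if frame[0] == 'done':
                    return v
                if frame[0] == 'add1':        # now evaluate the saved right operand
                    stack.append(('add2', v))
                    e = frame[1]
                    break
                v = frame[1] + v              # ('add2', left): combine and continue

expr = ('add', ('lit', 1), ('add', ('lit', 2), ('lit', 3)))
assert eval_direct(expr) == eval_cps(expr, lambda v: v) == eval_machine(expr) == 6
```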

DOI
17 Oct 2021
TL;DR: In this paper, a delta-based verification approach is proposed, where each modification of a method in a code delta is verified in isolation, but which overcomes the strict limitations of behavioral subtyping and works for many practical programs.
Abstract: The quest for feature- and family-oriented deductive verification of software product lines resulted in several proposals. In this paper we look at delta-oriented modeling of product lines and combine two new ideas: first, we extend Hahnle & Schaefer’s delta-oriented version of Liskov’s substitution principle for behavioral subtyping to work also for overridden behavior in benign cases. For this to succeed, programs need to be in a certain normal form. The required normal form turns out to be achievable in many cases by a set of program transformations, whose correctness is ensured by the recent technique of abstract execution. This is a generalization of symbolic execution that permits reasoning about abstract code elements. It is needed, because code deltas contain partially unknown code contexts in terms of “original” calls. Second, we devise a modular verification procedure for deltas based on abstract execution, representing deltas as abstract programs calling into unknown contexts. The result is a “delta-based” verification approach, where each modification of a method in a code delta is verified in isolation, but which overcomes the strict limitations of behavioral subtyping and works for many practical programs. The latter claim is substantiated with case studies and benchmarks.

Journal ArticleDOI
TL;DR: In this article, an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory, is presented.
Abstract: We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher, to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.
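
The middle stage of such a derivation, a continuation-based backtracking matcher, is sketched below in Python; the paper's machine-verified equational development and its final table-driven state machine are not reproduced here.

```python
# Regexes: ('eps',) | ('chr', c) | ('seq', r1, r2) | ('alt', r1, r2) | ('star', r)
def accept(r, s, i, k):
    tag = r[0]
    if tag == 'eps':
        return k(i)
    if tag == 'chr':
        return i < len(s) and s[i] == r[1] and k(i + 1)
    if tag == 'seq':
        return accept(r[1], s, i, lambda j: accept(r[2], s, j, k))
    if tag == 'alt':
        return accept(r[1], s, i, k) or accept(r[2], s, i, k)
    if tag == 'star':                  # the j != i guard avoids empty-match loops
        return k(i) or accept(r[1], s, i,
                              lambda j: j != i and accept(r, s, j, k))
    raise ValueError(f"unknown regex node {tag!r}")

def matches(r, s):
    return bool(accept(r, s, 0, lambda i: i == len(s)))

regex = ('seq', ('star', ('alt', ('chr', 'a'), ('chr', 'b'))), ('chr', 'c'))  # (a|b)*c
assert matches(regex, "ababc") and not matches(regex, "abad")
```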

Posted Content
TL;DR: This work proposes an implementation of "Tail Modulo Cons" (TMC) for OCaml, a program transformation for a fragment of non-tail-recursive functions, that rewrites them in _destination-passing style_.
Abstract: OCaml function calls consume space on the system stack. Operating systems set default limits on stack space that are much lower than the available memory. If a program runs out of stack space, it gets the dreaded "Stack Overflow" exception -- it crashes. As a result, OCaml programmers have to be careful, when they write recursive functions, to remain in the so-called _tail-recursive_ fragment, using _tail_ calls that do not consume stack space. This discipline is a source of difficulties for both beginners and experts. Beginners have to be taught recursion, and then tail-recursion. Experts disagree on the "right" way to write `List.map`. The direct version is beautiful but not tail-recursive, so it crashes on larger inputs. The naive tail-recursive transformation is (slightly) slower than the direct version, and experts may want to avoid that cost. Some libraries propose horrible implementations, unrolling code by hand, to compensate for this performance loss. In general, tail-recursion requires the programmer to manually perform sophisticated program transformations. In this work we propose an implementation of "Tail Modulo Cons" (TMC) for OCaml. TMC is a program transformation for a fragment of non-tail-recursive functions that rewrites them in _destination-passing style_. The supported fragment is smaller than other approaches such as continuation-passing style, but the performance of the transformed code is on par with the direct, non-tail-recursive version. Many useful functions that traverse a recursive datastructure and rebuild another recursive structure are in the TMC fragment, in particular `List.map` (and `List.filter`, `List.append`, etc.). Finally, those functions can be written in a way that is beautiful, correct on all inputs, and efficient.
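
Destination-passing style, the form that the TMC transformation produces, is illustrated below on a hand-rolled cons list in Python; the real transformation is performed by the OCaml compiler on OCaml code and looks nothing like this runtime sketch.

```python
class Cons:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

def map_direct(f, xs):
    """The 'beautiful' version: one stack frame per list cell."""
    return None if xs is None else Cons(f(xs.head), map_direct(f, xs.tail))

def map_dps(f, xs):
    """Destination-passing style: allocate each cell up front, then fill the
    hole (the `tail` field) iteratively -- constant stack space."""
    if xs is None:
        return None
    result = dst = Cons(f(xs.head))
    xs = xs.tail
    while xs is not None:
        dst.tail = Cons(f(xs.head))
        dst = dst.tail
        xs = xs.tail
    return result

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out

xs = Cons(1, Cons(2, Cons(3)))
assert to_list(map_direct(lambda v: v * 10, xs)) == to_list(map_dps(lambda v: v * 10, xs)) == [10, 20, 30]
```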

DOI
13 Mar 2021
TL;DR: This paper explores the design choices that determine the flavor of an internal representation and proposes one, consisting of Instructions and Annotations, that effectively represents a given program internally without being tied to any specific higher-level language or hardware architecture.
Abstract: Adding Transformation and Parallelization capabilities to a compiler requires selecting a suitable language for representing the given program internally. The higher-level language used to develop the code is an obvious choice, but supporting the transformations at that level would require major rework to support other higher-level languages. The other choice is to use the assembly representation of the given program for implementing transformations, but this would require rework when supporting multiple targets. These considerations lead to the development of an internal representation that is not tied to any specific higher-level language or hardware architecture. However, creating a new internal representation for a compiler, which ultimately determines the quality and the capabilities of the compiler, offers challenges of its own. Here we explore the design choices that determine the flavor of a representation and propose one that includes Instructions and Annotations, which together effectively represent a given program internally. The instruction set has operators that most resemble a Reduced Instruction Set architecture format and uses three explicit memory operands, which are sufficient for translation purposes and also simplify Symbolic Analysis. In addition to instructions, we support Annotations, which carry additional information about the given program in the form of Keyword-Value pairs. Together, instructions and annotations contain all the information necessary to support the Analysis, Transformation and Parallel Conversion processes. ASIF, which stands for Asterix Intermediate Format, is at the time of writing comparable to the cutting-edge solutions offered by the competition, and in many instances, such as suitability for Program Analysis, superior.
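
A minimal encoding of "instructions plus keyword-value annotations" is sketched below, only to make the design concrete; the field names, operator set, and layout are guesses and do not reflect ASIF's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class Instruction:
    op: str            # RISC-like operator, e.g. "load", "add", "store"
    dst: str           # destination operand
    src1: str
    src2: str = ""

@dataclass
class Annotation:
    keyword: str       # e.g. "loop", "alias", "parallelizable"
    value: str

@dataclass
class Block:
    instructions: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

body = Block(
    instructions=[Instruction("load", "t1", "a[i]"),
                  Instruction("add", "t2", "t1", "b[i]"),
                  Instruction("store", "c[i]", "t2")],
    annotations=[Annotation("loop", "i = 0 .. n-1"),
                 Annotation("parallelizable", "true")],
)
print(body)
```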

Journal ArticleDOI
TL;DR: This paper presents a new software restoration methodology to transform legacy-parallel programs implemented using pthreads into structured farm and pipeline patterned equivalents, and demonstrates improvements in cyclomatic complexity and speedups on a number of representative benchmarks.
Abstract: Parallel patterns are a high-level programming paradigm that enables non-experts in parallelism to develop structured parallel programs that are maintainable, adaptive, and portable whilst achieving good performance on a variety of parallel systems. However, there still exists a large base of legacy-parallel code developed using ad-hoc methods and incorporating low-level parallel/concurrency libraries such as pthreads without any parallel patterns in the fundamental design. This code would benefit from being restructured and rewritten into pattern-based code. However, the process of rewriting the code is laborious and error-prone, due to typical concurrency and pthreading code being closely intertwined throughout the business logic of the program. In this paper, we present a new software restoration methodology, to transform legacy-parallel programs implemented using pthreads into structured farm and pipeline patterned equivalents. We demonstrate our restoration technique on a number of benchmarks, allowing the introduction of patterned farm and pipeline parallelism in the resulting code; we record improvements in cyclomatic complexity and speedups on a number of representative benchmarks.

Posted Content
TL;DR: In this article, the authors introduce new paths that over-approximate bitvector operations with linear conditions/constraints, increasing branching but allowing them to better exploit the well-developed integer reasoning and interpolation of verification tools.
Abstract: There is increasing interest in applying verification tools to programs that have bitvector operations. SMT solvers, which serve as a foundation for these tools, have thus increased support for bitvector reasoning through bit-blasting and linear arithmetic approximations. Still, verification tools are limited on termination and LTL verification of bitvector programs. In this work, we show that similar linear arithmetic approximation of bitvector operations can be done at the source level through transformations. Specifically, we introduce new paths that over-approximate bitvector operations with linear conditions/constraints, increasing branching but allowing us to better exploit the well-developed integer reasoning and interpolation of verification tools. We present two sets of rules, namely rewriting rules and weakening rules, that can be implemented as a bitwise-branching program transformation; the added branching paths help verification tools to widen verification tasks over bitvector programs. Our experiment shows this exploitation of integer reasoning and interpolation enables competitive termination verification of bitvector programs and leads to the first effective technique for LTL verification of bitvector programs.
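
Representative weakening facts of the kind such rules encode are checked exhaustively below on 8-bit values; the paper's exact rule set and its integration into verification tools are not reproduced here.

```python
def weakened_and_ok(x, y, r):
    # Linear over-approximation a verifier can assume in place of r = x & y
    return 0 <= r <= min(x, y)

def weakened_or_ok(x, y, r):
    # Linear over-approximation for r = x | y (for non-negative x, y)
    return max(x, y) <= r <= x + y

for x in range(256):
    for y in range(256):
        assert weakened_and_ok(x, y, x & y)
        assert weakened_or_ok(x, y, x | y)
```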

Posted Content
TL;DR: In this article, a general-purpose program transformation algorithm for applying PDG-based SEPs is presented, which identifies a small transplantable structural subtree for each PDG node, thereby adapting code changes from PDG based SEPs to other locations.
Abstract: Software development often involves systematic edits, similar but nonidentical changes to many code locations, that are error-prone and laborious for developers. Mining and learning such systematic edit patterns (SEPs) from past code changes enable us to detect and repair overlooked buggy code that requires systematic edits. A recent study presented a promising SEP mining technique that is based on program dependence graphs (PDGs), while traditional approaches leverage syntax-based representations. PDG-based SEPs are highly expressive and can capture more meaningful changes than syntax-based ones. The next challenge to tackle is to apply the same code changes as in PDG-based SEPs to other code locations; detection and repair of overlooked locations that require systematic edits. Existing program transformation techniques cannot well address this challenge because (1) they expect many structural code similarities that are not guaranteed in PDG-based SEPs or (2) they work on the basis of PDGs but are limited to specific domains (e.g., API migrations). We present in this paper a general-purpose program transformation algorithm for applying PDG-based SEPs. Our algorithm identifies a small transplantable structural subtree for each PDG node, thereby adapting code changes from PDG-based SEPs to other locations. We construct a program repair pipeline Sirius that incorporates the algorithm and automates the processes of mining SEPs, detecting overlooked code locations (bugs) that require systematic edits, and repairing them by applying SEPs. We evaluated the repair performance of Sirius with a corpus of open source software consisting of over 80 repositories. Sirius achieved a precision of 0.710, recall of 0.565, and F1-score of 0.630, while those of the state-of-the-art technique were 0.470, 0.141, and 0.216, respectively.
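
To give a flavor of the information a PDG records, the sketch below extracts def-use (data-dependence) edges from straight-line Python assignments; Sirius's PDG construction, SEP mining, and subtree transplantation are far richer and are not reproduced here.

```python
import ast

def data_dependences(source):
    """Return (def_line, use_line, variable) edges for simple assignments."""
    last_def, edges = {}, []
    for stmt in ast.parse(source).body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((last_def[node.id], stmt.lineno, node.id))
        if isinstance(stmt, ast.Assign):
            for tgt in stmt.targets:
                if isinstance(tgt, ast.Name):
                    last_def[tgt.id] = stmt.lineno
    return edges

code = "a = 1\nb = a + 2\nc = a * b\n"
print(data_dependences(code))   # [(1, 2, 'a'), (1, 3, 'a'), (2, 3, 'b')]
```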

Journal ArticleDOI
22 Mar 2021
TL;DR: This paper discusses the architecture of the system and presents the interactive subsystem that guides SAPFOR through program parallelization; this subsystem was used to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way.
Abstract: Automation of parallel programming is important at any stage of parallel program development. These stages include profiling of the original program, program transformation, which allows us to achieve higher performance after program parallelization, and, finally, construction and optimization of the parallel program. It is also important to choose a suitable parallel programming model to express the parallelism available in a program. On the one hand, the parallel programming model should be capable of mapping the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of assistant tools, and it should allow the user to explore the parallel program the assistant tools generate in a semi-automatic way. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to automation of parallel programming. Moreover, it allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model, which simplifies the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming which SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem, which is useful for guiding SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with the programs the SAPFOR system builds.