Semantic code search via equational reasoning
Varot Premtoon, James Koppel, Armando Solar-Lezama
- pp 1066-1082
TLDR
This work presents a new approach to semantic code search based on equational reasoning, and the Yogo tool implementing this approach, which can find equivalent code in multiple languages from a single query.
Abstract
We present a new approach to semantic code search based on equational reasoning, and the Yogo tool implementing this approach. Our approach works by considering not only the dataflow graph of a function, but also the dataflow graphs of all equivalent functions reachable via a set of rewrite rules. In doing so, it can recognize an operation even if it uses alternate APIs, is in a different but mathematically-equivalent form, is split apart with temporary variables, or is interleaved with other code. Furthermore, it can recognize when code is an instance of some higher-level concept such as iterating through a file. Because of this, from a single query, Yogo can find equivalent code in multiple languages. Our evaluation further shows the utility of Yogo beyond code search: encoding a buggy pattern as a Yogo query, we found a bug in Oracle’s Graal compiler which had been missed by a hand-written static analyzer designed for that exact kind of bug. Yogo is built on the Cubix multi-language infrastructure, and currently supports Java and Python.
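The idea of matching a query against every form reachable through rewrite rules can be illustrated with a toy sketch. This is not Yogo's implementation: the term encoding (nested tuples), the three rules, and the names `rewrites`, `step`, `equivalents`, `subterms`, and `matches` are all assumptions made for illustration, and the explicit enumeration below stands in for whatever shared representation a real tool would use.

```python
# Toy illustration only: terms are nested tuples ("op", arg, ...) or leaves.
# The rules and function names are hypothetical, not taken from Yogo.

def rewrites(term):
    """Yield terms produced by applying one rule at the root of `term`."""
    if not isinstance(term, tuple):
        return
    op, *args = term
    if len(args) != 2:
        return
    a, b = args
    if op in ("+", "*"):            # commutativity
        yield (op, b, a)
    if op == "*" and b == 2:        # x * 2  ->  x + x
        yield ("+", a, a)
    if op == "+" and a == b:        # x + x  ->  x * 2
        yield ("*", a, 2)
    if op == "<<" and b == 1:       # x << 1 ->  x * 2
        yield ("*", a, 2)

def step(term):
    """Yield every term reachable by one rewrite anywhere inside `term`."""
    yield from rewrites(term)
    if isinstance(term, tuple):
        op, *args = term
        for i, arg in enumerate(args):
            for new_arg in step(arg):
                yield (op, *args[:i], new_arg, *args[i + 1:])

def equivalents(term, limit=1000):
    """Collect forms reachable from `term`, stopping once `limit` is reached."""
    seen = {term}
    frontier = [term]
    while frontier and len(seen) < limit:
        next_frontier = []
        for t in frontier:
            for u in step(t):
                if u not in seen:
                    seen.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return seen

def subterms(term):
    """Yield `term` and all of its nested subterms."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def matches(query, code):
    """True if some equivalent form of `code` contains `query` as a subterm."""
    return any(query in set(subterms(t)) for t in equivalents(code))

query = ("*", "n", 2)                             # "double n"
print(matches(query, ("-", ("+", "n", "n"), "k")))  # doubling written as n + n
print(matches(query, ("<<", "n", 1)))               # doubling written as a shift
print(matches(query, ("+", "n", "m")))              # not a doubling
```

In this sketch a single query for `n * 2` is found in code that never spells it that way, which is the effect the abstract describes for alternate but mathematically equivalent forms. Enumerating every form grows quickly with the number of rules, so it is only practical at toy scale.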
Citations
Journal Article
egg: Fast and Extensible Equality Saturation
TL;DR: A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice, and is implemented in a new open-source library called egg.
Journal Article
egg: Fast and extensible equality saturation
TL;DR: egg is an e-graph library for equality saturation workloads such as rewrite-driven compilers and program synthesizers; an e-graph compactly represents a congruence relation over many expressions.
Posted Content
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Fu Shengyu, Shujie Liu
TL;DR: CodeXGLUE is a benchmark dataset to foster machine learning research for program understanding and generation; it includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison.
Proceedings Article
An extensive study on pre-trained models for program understanding and generation
TL;DR: The first study of the robustness of natural language-programming language pre-trained models under adversarial attacks is performed, and it is found that a simple random attack approach can easily fool the state-of-the-art pre-trained models and thus incur security issues.
Proceedings Article
WebRobot: web robotic process automation using interactive programming-by-demonstration
TL;DR: A formal foundation is developed that allows reasoning semantically about web RPA programs and formulating their synthesis problem in a principled manner, and a novel speculate-and-validate methodology is proposed in the context of rewrite-based program synthesis.
References
Journal Article
The program dependence graph and its use in optimization
TL;DR: An intermediate program representation, called the program dependence graph (PDG), that makes explicit both the data and control dependences for each operation in a program, allowing transformations to be triggered by one another and applied only to affected dependences.
Journal Article
Rete: a fast algorithm for the many pattern/many object pattern match problem
TL;DR: The Rete Match Algorithm is an efficient method for comparing a large collection of patterns to a large collection of objects; it finds all the objects that match each pattern.
Journal Article
Simplification by Cooperating Decision Procedures
Greg Nelson, Derek C. Oppen
TL;DR: The simplifier finds a normal form for any expression formed from individual variables and a fixed set of functions and predicates; if the expression is a theorem, it is simplified to the constant true, so the simplifier can be used as a decision procedure for the quantifier-free theory containing these functions and predicates.
Journal Article
Simplify: a theorem prover for program checking
TL;DR: The article describes two techniques, error context reporting and error localization, for helping the user to determine the reason that a false conjecture is false, and includes detailed performance figures on conjectures derived from realistic program-checking problems.
A Survey on Software Clone Detection Research
Chanchal K. Roy, James R. Cordy
TL;DR: The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.