Open Access, Proceedings Article

Semantic code search via equational reasoning

TLDR
This work presents a new approach to semantic code search based on equational reasoning, and the Yogo tool implementing this approach, which can find equivalent code in multiple languages from a single query.
Abstract
We present a new approach to semantic code search based on equational reasoning, and the Yogo tool implementing this approach. Our approach works by considering not only the dataflow graph of a function, but also the dataflow graphs of all equivalent functions reachable via a set of rewrite rules. In doing so, it can recognize an operation even if it uses alternate APIs, is in a different but mathematically-equivalent form, is split apart with temporary variables, or is interleaved with other code. Furthermore, it can recognize when code is an instance of some higher-level concept such as iterating through a file. Because of this, from a single query, Yogo can find equivalent code in multiple languages. Our evaluation further shows the utility of Yogo beyond code search: encoding a buggy pattern as a Yogo query, we found a bug in Oracle’s Graal compiler which had been missed by a hand-written static analyzer designed for that exact kind of bug. Yogo is built on the Cubix multi-language infrastructure, and currently supports Java and Python.
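The core idea in the abstract, matching a query against every form of a program reachable through rewrite rules, can be illustrated with a minimal sketch. This is not Yogo's implementation or API: the term encoding (nested tuples), the two rewrite rules, and the names `rewrites`, `equivalent_forms`, and `matches` are all assumptions made for illustration, and the naive breadth-first enumeration stands in for whatever data structure the tool actually uses.

```python
from collections import deque

# Hypothetical term encoding: ("op", arg1, arg2) for operations,
# plain strings or ints for leaves. Temporary variables are assumed
# to have been inlined already, as a dataflow graph would do.


def rewrites(term):
    """Yield every term obtained by applying one rule at one position."""
    if not isinstance(term, tuple):
        return
    op, *args = term
    # Illustrative rule 1: + and * are commutative.
    if op in ("+", "*") and len(args) == 2:
        yield (op, args[1], args[0])
    # Illustrative rule 2: x * 2 == x + x (in both directions).
    if op == "*" and len(args) == 2 and args[1] == 2:
        yield ("+", args[0], args[0])
    if op == "+" and len(args) == 2 and args[0] == args[1]:
        yield ("*", args[0], 2)
    # Apply the same rules inside each subterm.
    for i, arg in enumerate(args):
        for new_arg in rewrites(arg):
            yield (op, *args[:i], new_arg, *args[i + 1:])


def equivalent_forms(term, limit=1000):
    """Collect terms reachable from `term` via the rules, up to `limit`."""
    seen = {term}
    queue = deque([term])
    while queue and len(seen) < limit:
        current = queue.popleft()
        for nxt in rewrites(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def matches(query, code):
    """True if `query` equals some rewritten form of `code`."""
    return query in equivalent_forms(code)


# The code computes b + (a + a); the query asks for (a * 2) + b.
code = ("+", "b", ("+", "a", "a"))
query = ("+", ("*", "a", 2), "b")
print(matches(query, code))
```

Here the query is found even though the code uses a different but mathematically equivalent form, which is the kind of match the abstract describes. Explicit enumeration like this grows quickly with term size, so the sketch caps the search with `limit`.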

Citations
Journal Article

egg: Fast and Extensible Equality Saturation

TL;DR: A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice, and is implemented in a new open-source library called egg.
Posted Content

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

TL;DR: CodeXGLUE is a benchmark dataset to foster machine learning research for program understanding and generation, which includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison.
Proceedings Article

An extensive study on pre-trained models for program understanding and generation

TL;DR: The first study of the robustness of natural language-programming language pre-trained models under adversarial attacks is performed, and it is found that a simple random attack approach can easily fool the state-of-the-art pre-trained models and thus incur security issues.
Proceedings Article

WebRobot: web robotic process automation using interactive programming-by-demonstration

TL;DR: A formal foundation is developed that allows reasoning semantically about web RPA programs and formulating their synthesis problem in a principled manner, and a novel speculate-and-validate methodology in the context of rewrite-based program synthesis is proposed.
References
Journal Article

The program dependence graph and its use in optimization

TL;DR: An intermediate program representation, called the program dependence graph (PDG), that makes explicit both the data and control dependences for each operation in a program, allowing transformations to be triggered by one another and applied only to affected dependences.
Journal Article

Rete: a fast algorithm for the many pattern/many object pattern match problem

TL;DR: The Rete Match Algorithm is an efficient method for comparing a large collection of patterns to a large collection of objects that finds all the objects that match each pattern.
Journal Article

Simplification by Cooperating Decision Procedures

TL;DR: The simplifier finds a normal form for any expression formed from individual variables and a fixed set of functions and predicates; if the expression is a theorem, it is simplified to the constant true, so the simplifier can be used as a decision procedure for the quantifier-free theory containing these functions and predicates.
Journal Article

Simplify: a theorem prover for program checking

TL;DR: The article describes two techniques, error context reporting and error localization, for helping the user to determine the reason that a false conjecture is false, and includes detailed performance figures on conjectures derived from realistic program-checking problems.

A Survey on Software Clone Detection Research

TL;DR: The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their corresponding mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.