Semantic code search via equational reasoning

doi:10.1145/3385412.3386001

Open Access, Proceedings Article

- pp 1066-1082

TLDR

This work presents a new approach to semantic code search based on equational reasoning, and the Yogo tool implementing this approach, which can find equivalent code in multiple languages from a single query.

Abstract:

We present a new approach to semantic code search based on equational reasoning, and the Yogo tool implementing this approach. Our approach works by considering not only the dataflow graph of a function, but also the dataflow graphs of all equivalent functions reachable via a set of rewrite rules. In doing so, it can recognize an operation even if it uses alternate APIs, is in a different but mathematically-equivalent form, is split apart with temporary variables, or is interleaved with other code. Furthermore, it can recognize when code is an instance of some higher-level concept such as iterating through a file. Because of this, from a single query, Yogo can find equivalent code in multiple languages. Our evaluation further shows the utility of Yogo beyond code search: encoding a buggy pattern as a Yogo query, we found a bug in Oracle’s Graal compiler which had been missed by a hand-written static analyzer designed for that exact kind of bug. Yogo is built on the Cubix multi-language infrastructure, and currently supports Java and Python.
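The core idea in the abstract, matching a query against every form of a code fragment reachable through rewrite rules, can be illustrated with a minimal Python sketch. This is not Yogo's implementation: it enumerates equivalent expressions by brute-force breadth-first search rather than using a compact shared representation, and the expression encoding, the three rewrite rules, and the names `inline`, `rewrite_root`, `rewrite_anywhere`, `equivalent_forms`, and `matches` are all hypothetical, chosen only for illustration.

```python
from collections import deque

# Expressions are nested tuples ("op", left, right); leaves are variable
# names (str) or integer constants. This encoding is an assumption made
# for the sketch, not Yogo's representation.

def inline(expr, env):
    """Replace temporary variables with their definitions (dataflow view).

    Assumes the definitions in env are acyclic.
    """
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(inline(a, env) for a in args))
    return inline(env[expr], env) if expr in env else expr

def rewrite_root(expr):
    """Yield expressions obtained by applying one rule at the root."""
    if not (isinstance(expr, tuple) and len(expr) == 3):
        return
    op, a, b = expr
    if op in ("+", "*"):          # commutativity: a + b == b + a
        yield (op, b, a)
    if op == "<<" and b == 1:     # a << 1 == a * 2
        yield ("*", a, 2)
    if op == "*" and b == 2:      # a * 2 == a << 1
        yield ("<<", a, 1)
    if op == "+" and a == b:      # a + a == a * 2
        yield ("*", a, 2)

def rewrite_anywhere(expr):
    """Yield expressions obtained by one rule application at any position."""
    yield from rewrite_root(expr)
    if isinstance(expr, tuple):
        op, *args = expr
        for i, arg in enumerate(args):
            for new_arg in rewrite_anywhere(arg):
                yield (op, *args[:i], new_arg, *args[i + 1:])

def equivalent_forms(expr, limit=1000):
    """Collect forms reachable via the rules (bounded breadth-first search)."""
    seen = {expr}
    queue = deque([expr])
    while queue and len(seen) < limit:
        for nxt in rewrite_anywhere(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def matches(query, expr):
    """True if the query equals some form reachable from expr."""
    return query in equivalent_forms(expr)

# Query: "double x, then add one".
query = ("+", ("*", "x", 2), 1)

# Candidate code:  t = x << 1;  y = 1 + t
candidate = inline(("+", 1, "t"), {"t": ("<<", "x", 1)})

print(matches(query, candidate))                   # True
print(matches(query, ("+", ("*", "x", 3), 1)))     # False
```

The candidate differs from the query in three ways named in the abstract: it uses a different but mathematically equivalent operation (a shift instead of a multiplication), it is written in a different order, and it is split apart with a temporary variable. Explicit enumeration like this grows quickly with the number of rules, which is why a bounded `limit` is used here.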

Citations

Journal Article

egg: Fast and Extensible Equality Saturation

Max Willsey, +5 more

- 07 Apr 2020 -

arXiv: Programming Languages

TL;DR: A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice, and is implemented in a new open-source library called egg.

Journal Article

egg: Fast and extensible equality saturation

Max Willsey, +5 more

TL;DR: egg is an e-graph library for equality saturation workloads such as rewrite-driven compilers and program synthesizers; an e-graph compactly represents a congruence relation over many expressions.

Proceedings Article

An extensive study on pre-trained models for program understanding and generation

Zhengran Zeng, +5 more

TL;DR: The first study of the robustness of natural language-programming language pre-trained models under adversarial attacks is performed; it finds that a simple random attack can easily fool state-of-the-art pre-trained models and thus raise security issues.

Proceedings Article

WebRobot: web robotic process automation using interactive programming-by-demonstration

Rui Dong, +4 more

TL;DR: A formal foundation is developed that allows reasoning semantically about web RPA programs and formulating their synthesis problem in a principled manner, and a novel speculate-and-validate methodology in the context of rewrite-based program synthesis is proposed.


References

Journal Article

The program dependence graph and its use in optimization

Jeanne Ferrante, +2 more

- 01 Jul 1987 -

ACM Transactions on Programming Language...

TL;DR: An intermediate program representation, called the program dependence graph (PDG), is presented that makes explicit both the data and control dependences for each operation in a program, allowing transformations to be triggered by one another and applied only to affected dependences.

Journal Article

Rete: a fast algorithm for the many pattern/many object pattern match problem

Charles L. Forgy

- 01 Sep 1982 -

Artificial Intelligence

TL;DR: The Rete Match Algorithm is an efficient method for comparing a large collection of patterns to a large collection of objects; it finds all the objects that match each pattern.

Journal Article

Simplification by Cooperating Decision Procedures

Greg Nelson, +1 more

- 01 Oct 1979 -

ACM Transactions on Programming Language...

TL;DR: The simplifier finds a normal form for any expression formed from individual variables and a fixed set of functions and predicates; if the expression is a theorem, it is simplified to the constant true, so the simplifier can be used as a decision procedure for the quantifier-free theory containing these functions and predicates.

Journal Article

Simplify: a theorem prover for program checking

David L. Detlefs, +2 more

- 01 May 2005 -

Journal of the ACM

TL;DR: The article describes two techniques, error context reporting and error localization, for helping the user to determine the reason that a false conjecture is false, and includes detailed performance figures on conjectures derived from realistic program-checking problems.


A Survey on Software Clone Detection Research

Chanchal K. Roy, +1 more

TL;DR: The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.


Additional citation (title only)

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Related Papers (5)

Equality saturation: a new approach to optimization

Equality-based translation validator for LLVM

Automatically improving accuracy for floating point expressions

Techniques for program verification

Carpentry compiler