Open Access, Proceedings Article

BinMatch: A Semantics-Based Hybrid Approach on Binary Code Clone Analysis

TLDR
In this article, a semantics-based hybrid approach is proposed to detect binary code clone functions, where the semantic signatures are extracted during the execution of the template function and emulation of the target function.
Abstract
Binary code clone analysis is an important technique with a wide range of applications in software engineering (e.g., plagiarism detection, bug detection). The main challenge lies in semantics-equivalent code transformations (e.g., optimization, obfuscation), which can alter the representation of binary code dramatically. Another challenge is the trade-off between detection accuracy and coverage. Unfortunately, existing techniques still rely on semantics-less code features, which are susceptible to such transformations. Moreover, they adopt either a purely static or a purely dynamic approach to detect binary code clones, and so cannot achieve high accuracy and high coverage simultaneously. In this paper, we propose a semantics-based hybrid approach to detect binary clone functions. We execute a template binary function with its test cases, and emulate the execution of every target function for clone comparison using the runtime information migrated from that template function. Semantic signatures are extracted during the execution of the template function and the emulation of the target function. Lastly, a similarity score is calculated from the two signatures to measure their likeness. We implement the approach in a prototype system named BinMatch, which analyzes IA-32 binary code on the Linux platform. We evaluate BinMatch on eight real-world projects compiled with different compilation configurations and commonly used obfuscation methods, performing over 100 million pairwise function comparisons in total. The experimental results show that BinMatch is robust to semantics-equivalent code transformations. In addition, it not only covers all target functions for clone analysis, but also improves detection accuracy compared with state-of-the-art solutions.
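The final step of the abstract, turning two semantic signatures into a similarity score, can be illustrated with a minimal sketch. The abstract does not state how the score is computed, so everything here is an assumption made for illustration: signatures are modeled as ordered sequences of hashable features, the score is the longest-common-subsequence length divided by a Jaccard-style union size, and the names `lcs_length`, `similarity`, and the feature strings are hypothetical.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of a longest common subsequence of a and b.

    Only two DP rows are kept, so memory is linear in the shorter input.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer one, keep rows sized by the shorter
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(template_sig: Sequence[Hashable],
               target_sig: Sequence[Hashable]) -> float:
    """Assumed score in [0, 1]: common features over the size of the union."""
    common = lcs_length(template_sig, target_sig)
    union = len(template_sig) + len(target_sig) - common
    return common / union if union else 1.0


# Hypothetical signatures: the feature names are invented for this example.
template = ["read:0x10", "cmp:eq", "call:strlen", "ret:5"]
target = ["read:0x10", "call:strlen", "ret:5"]
print(similarity(template, target))  # 3 common features / 4 in the union = 0.75
```

An order-preserving measure such as LCS is one plausible choice because runtime features are recorded in execution order; a set-based comparison would be an equally valid assumption if order were irrelevant.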


Citations
Proceedings Article

Patch based vulnerability matching for binary programs

TL;DR: Binary X-Ray (BinXray), a patch-based vulnerability matching approach, is proposed to identify specific 1-day vulnerabilities in target programs accurately and effectively.
Posted Content

Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned.

TL;DR: This is the first systematic study on the basic features used in BCSA by leveraging interpretable feature engineering on a large-scale benchmark and shows that a simple interpretable model with a few basic features can achieve a comparable result to that of recent deep learning-based approaches.
Journal Article

Codee: A Tensor Embedding Scheme for Binary Code Search

TL;DR: This paper presents an unsupervised tensor embedding scheme, Codee, to carry out code search efficiently and accurately at the binary function level, and achieves higher average search accuracy, shorter feature vectors, and faster feature generation performance using four datasets.
Posted Content

Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks

TL;DR: A prototype tool, HOLMES, is developed based on a novel approach to semantic code clone detection and empirically evaluated on popular code clone benchmarks, showing that HOLMES performs considerably better than another state-of-the-art tool, TBCCD.
Journal Article

A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

TL;DR: The experimental results show that BinMatch is resilient to semantics-equivalent code transformations, and that it not only covers all target functions for similarity comparison, but also improves accuracy compared with state-of-the-art solutions.
References
Proceedings Article

Valgrind: a framework for heavyweight dynamic binary instrumentation

TL;DR: This paper describes Valgrind, a DBI framework designed for building heavyweight DBA tools, which can be used to build more interesting tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.
Journal Article

CCFinder: a multilinguistic token-based code clone detection system for large scale source code

TL;DR: A new clone detection technique is proposed, consisting of a transformation of the input source text followed by a token-by-token comparison. The technique found clones effectively, and the accompanying metrics effectively identified the characteristics of the systems.
Proceedings Article

Clone detection using abstract syntax trees

TL;DR: The paper presents simple and practical methods for detecting exact and near miss clones over arbitrary program fragments in program source code by using abstract syntax trees and suggests that clone detection could be useful in producing more structured code, and in reverse engineering to discover domain concepts and their implementations.
Journal Article

A linear space algorithm for computing maximal common subsequences

TL;DR: The problem of finding a longest common subsequence of two strings has previously been solved in quadratic time and space; this paper presents an algorithm that solves it in quadratic time and linear space.
Proceedings Article

DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

TL;DR: This paper presents an efficient algorithm for identifying similar subtrees and applies it to tree representations of source code. The algorithm is implemented as a clone detection tool called DECKARD and evaluated on large code bases written in C and Java, including the Linux kernel and JDK.