Journal Article

Approximate graph edit distance computation by means of bipartite graph matching

01 Jun 2009-Image and Vision Computing (Elsevier)-Vol. 27, Iss: 7, pp 950-959
TL;DR: A novel algorithm is introduced that computes graph edit distance approximately (suboptimally) but substantially faster, and it is empirically verified that the suboptimal distance remains sufficiently accurate for various pattern recognition applications.
About: This article was published in Image and Vision Computing on 2009-06-01. It has received 654 citations to date. The article focuses on the topics: Graph operations & Line graph.
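As an illustration of the idea summarized above, the following is a minimal sketch of a bipartite (assignment-based) approximation of graph edit distance. It is not the article's implementation: it considers node labels only and ignores edges, it uses unit deletion and insertion costs, and it solves the assignment by brute force over permutations instead of a polynomial-time assignment algorithm, so it is only usable on tiny graphs. The square (n+m) x (n+m) cost layout is a common construction for this kind of approximation and is an assumption here, as are all function names.

```python
from itertools import permutations

INF = float("inf")

def node_cost_matrix(labels1, labels2, sub_cost, del_cost=1.0, ins_cost=1.0):
    """Build a square (n+m) x (n+m) matrix of node edit costs (hypothetical helper).

    Rows 0..n-1 are nodes of graph 1, rows n..n+m-1 are insertion slots.
    Columns 0..m-1 are nodes of graph 2, columns m..m+n-1 are deletion slots.
    """
    n, m = len(labels1), len(labels2)
    size = n + m
    C = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                C[i][j] = sub_cost(labels1[i], labels2[j])   # substitution
            elif i < n:
                C[i][j] = del_cost if j - m == i else INF    # deletion of node i
            elif j < m:
                C[i][j] = ins_cost if i - n == j else INF    # insertion of node j
            # otherwise: dummy-to-dummy, cost stays 0
    return C

def approx_ged_nodes_only(labels1, labels2, sub_cost):
    """Minimum-cost assignment over the matrix; brute force stands in for a real solver."""
    C = node_cost_matrix(labels1, labels2, sub_cost)
    size = len(C)
    return min(
        sum(C[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )

def label_cost(a, b):
    # Assumed substitution cost: 0 for equal labels, 1 otherwise.
    return 0.0 if a == b else 1.0

distance = approx_ged_nodes_only(["C", "N", "O"], ["C", "O"], label_cost)
```

In this toy case the cheapest assignment keeps `C` and `O` and deletes `N`. A practical version would replace the permutation search with a cubic-time assignment solver and would account for edge edit costs.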
Citations
Proceedings Article
30 Oct 2017
TL;DR: This work proposes a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy.
Abstract: The problem of cross-platform binary code similarity detection aims at detecting whether two binary functions coming from different platforms are similar or not. It has many security applications, including plagiarism detection, malware detection, vulnerability search, etc. Existing approaches rely on approximate graph-matching algorithms, which are inevitably slow and sometimes inaccurate, and hard to adapt to a new task. To address these issues, in this work, we propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then the similarity detection can be done efficiently by measuring the distance between the embeddings for two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy. Further, Gemini can speed up prior art's embedding generation time by 3 to 4 orders of magnitude and reduce the required training time from more than 1 week down to 30 minutes to 10 hours. Our real world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state-of-the-art, i.e., Genius. Our research showcases a successful application of deep learning on computer security problems.

339 citations


Cites background from "Approximate graph edit distance com..."

  • ...Last but not least, the search accuracy of this approach is ultimately bounded by the quality of bipartite graph matching [35]....

    [...]

Journal Article
TL;DR: This paper examines the main advances registered in the last ten years in Pattern Recognition methodologies based on graph matching and related techniques, analyzing more than 180 papers.
Abstract: In this paper, we examine the main advances registered in the last ten years in Pattern Recognition methodologies based on graph matching and related techniques, analyzing more than 180 papers; the...

338 citations

Proceedings Article
24 Oct 2016
TL;DR: A new bug search scheme is proposed that addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy; a bug search engine, Genius, is implemented and compared with state-of-the-art bug search approaches.
Abstract: Because of rampant security breaches in IoT devices, searching vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate in IoT devices across different architectures. However, these CFG-based bug search approaches are far from being scalable to handle an enormous amount of IoT devices in the wild, due to their expensive graph matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that directly conduct searches based upon raw features (CFGs) from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with the CFG feature, high-level numeric feature vectors are more robust to code variation across different architectures, and can easily achieve realtime search by using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices which was collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images of 420,558,702 functions. By only looking at the top 50 candidates in the search result, we found 38 potentially vulnerable firmware images across 5 vendors, and confirmed 23 of them by our manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in two latest commercial firmware images from D-LINK. 103 of them are potentially vulnerable in these images, and 16 of them were confirmed.

325 citations

Proceedings Article
TL;DR: This work proposes a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, so that similarity detection can be done efficiently by measuring the distance between the embeddings of two functions.
Abstract: The problem of cross-platform binary code similarity detection aims at detecting whether two binary functions coming from different platforms are similar or not. It has many security applications, including plagiarism detection, malware detection, vulnerability search, etc. Existing approaches rely on approximate graph matching algorithms, which are inevitably slow and sometimes inaccurate, and hard to adapt to a new task. To address these issues, in this work, we propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then the similarity detection can be done efficiently by measuring the distance between the embeddings for two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy. Further, Gemini can speed up prior art's embedding generation time by 3 to 4 orders of magnitude and reduce the required training time from more than 1 week down to 30 minutes to 10 hours. Our real world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state-of-the-art, i.e., Genius. Our research showcases a successful application of deep learning on computer security problems.

258 citations

Journal Article
01 May 2012
TL;DR: A sufficiently efficient solution based on the K-M algorithm is offered that significantly outperforms the exhaustive search approach.
Abstract: Role assignment is a critical task in role-based collaboration. It has three steps, i.e., agent evaluation, group role assignment, and role transfer, where group role assignment is a time-consuming process. This paper clarifies the group role assignment problem (GRAP), describes a general assignment problem (GAP), converts a GRAP to a GAP, proposes an efficient algorithm based on the Kuhn-Munkres (K-M) algorithm, conducts numerical experiments, and analyzes the solutions' performances. The results show that the proposed algorithm significantly improves the algorithm based on exhaustive search. The major contributions of this paper include formally defining the GRAPs, giving a general efficient solution for them, and expanding the application scope of the K-M algorithm. This paper offers an efficient enough solution based on the K-M algorithm that outperforms significantly the exhaustive search approach.

236 citations


Cites methods from "Approximate graph edit distance com..."

  • ...Fortunately, the well-known K-M algorithm for the GAP has been designed with the complexity of O(m^3) [7], [13], [17] and has been applied widely in industries [20], and its Java code is also available [4]....

    [...]

  • ...The K-M algorithm, when properly implemented, can operate with the computational complexity of O(m^3) [13], [17], [20], [23]....

    [...]
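The excerpts above refer to the Kuhn-Munkres (K-M) algorithm and its cubic complexity. The following is a minimal sketch of one well-known cubic-time formulation of the assignment problem (the potentials-based variant), written in plain Python. It is not the cited paper's Java code, and the function name `hungarian` and the example matrix are illustrative assumptions. The sketch assumes the cost matrix has no more rows than columns.

```python
def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    cost: list of n rows, each with m numbers, n <= m (assumed).
    Returns (total_cost, assignment) where assignment[i] is the column of row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials (1-indexed)
    v = [0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)      # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)    # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]   # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new column becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment

total, assignment = hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
```

For the 3 x 3 example the six possible assignments can be checked by hand, which makes it a convenient sanity test before trying larger inputs.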

References
Journal Article
TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations


"Approximate graph edit distance com..." refers background in this paper

  • ...This results in a training set of size 1200, a validation set of size 500, and a test set of size 1000....

    [...]

Journal Article
TL;DR: This paper has always been one of my favorite children, combining as it does elements of the duality of linear programming and combinatorial tools from graph theory, and it may be of some interest to tell the story of its origin.
Abstract: This paper has always been one of my favorite “children,” combining as it does elements of the duality of linear programming and combinatorial tools from graph theory. It may be of some interest to tell the story of its origin.

11,096 citations

Journal Article
TL;DR: How heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching is described and an optimality property of a class of search strategies is demonstrated.
Abstract: Although the problem of determining the minimum cost path through a graph arises naturally in a number of interesting applications, there has been no underlying theory to guide the development of efficient search procedures. Moreover, there is no adequate conceptual framework within which the various ad hoc search strategies proposed to date can be compared. This paper describes how heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching and demonstrates an optimality property of a class of search strategies.

10,366 citations


"Approximate graph edit distance com..." refers background or methods or result in this paper

  • ...Formally, for a node p in the search tree, we use g(p) to denote the cost of the optimal path from the root node to the current node p, i.e. g(p) is set equal to the cost of the partial edit path accumulated so far, and we use h(p) for denoting g2 e labels are represented by different shades of…...

    [...]

  • ...To find the most suitable edit path out of Υ(g1, g2), one introduces a cost for each edit operation, measuring the strength of the corresponding operation....

    [...]

  • ...The current paper has been significantly extended with respect to the underlying methodology and the experimental evaluation....

    [...]
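The first excerpt above describes the accumulated cost g(p) used in a best-first tree search. The following is a minimal sketch of A*-style search on a small weighted graph, assuming that h(p) plays its usual role as a lower-bound estimate of the remaining cost (the truncated excerpt does not spell this out). The graph, the heuristic values, and the function name `a_star` are illustrative assumptions and are unrelated to the edit-path search tree of the article.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Return (cost, path) of a cheapest path, expanding nodes by f = g + h.

    neighbors(node) yields (next_node, step_cost) pairs.
    h(node) must never overestimate the true remaining cost.
    """
    frontier = [(h(start), 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                              # stale queue entry
        for nxt, step in neighbors(node):
            g_next = g + step                     # cost accumulated so far
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                heapq.heappush(
                    frontier, (g_next + h(nxt), g_next, nxt, path + [nxt])
                )
    return None

# Toy directed graph and an admissible heuristic (both made up for the example).
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
H = {"A": 3, "B": 2, "C": 1, "D": 0}

result = a_star("A", "D", lambda n: GRAPH[n], lambda n: H[n])
```

With the all-zero heuristic the same function behaves like uniform-cost search, which is a simple way to check that a heuristic changes only the order of expansion and not the returned cost.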