Approximate graph edit distance computation by means of bipartite graph matching

doi:10.1016/J.IMAVIS.2008.04.004

Journal Article•DOI•

Kaspar Riesen¹, Horst Bunke¹•Institutions (1)

University of Bern¹

01 Jun 2009-Image and Vision Computing (Elsevier)-Vol. 27, Iss: 7, pp 950-959

TL;DR: A novel algorithm is introduced that computes the edit distance approximately, or suboptimally, in a substantially faster way, and it is empirically verified that the suboptimal distance remains sufficiently accurate for various pattern recognition applications.

About: This article was published in Image and Vision Computing on 2009-06-01. It has received 654 citations to date. The article focuses on the topics: Graph operations & Line graph.
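The bipartite-matching idea named in the title can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: unit costs for node substitution (when labels differ), node deletion, and node insertion; unlabeled edges whose local contribution is approximated by node degrees; and a brute-force search over permutations that is only feasible for tiny graphs. The names `build_cost_matrix` and `min_assignment_cost` are hypothetical.

```python
from itertools import permutations

INF = float("inf")

def build_cost_matrix(labels1, degs1, labels2, degs2):
    """Square (n+m) x (n+m) matrix: substitutions, deletions, insertions."""
    n, m = len(labels1), len(labels2)
    size = n + m
    C = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                # Substitute node i by node j: label mismatch plus degree gap
                # (assumed stand-in for the cost of the adjacent edges).
                C[i][j] = (labels1[i] != labels2[j]) + abs(degs1[i] - degs2[j])
            elif i < n:
                # Delete node i: allowed only on the diagonal of this block.
                C[i][j] = 1 + degs1[i] if j - m == i else INF
            elif j < m:
                # Insert node j: allowed only on the diagonal of this block.
                C[i][j] = 1 + degs2[j] if i - n == j else INF
            # Remaining block (empty node to empty node) stays 0.0.
    return C

def min_assignment_cost(C):
    """Brute-force optimal assignment; exponential, for illustration only."""
    size = len(C)
    return min(
        sum(C[i][p[i]] for i in range(size))
        for p in permutations(range(size))
    )

# Path graph A-B-C versus single edge A-B.
C = build_cost_matrix(["A", "B", "C"], [1, 2, 1], ["A", "B"], [1, 1])
print(min_assignment_cost(C))  # 3.0
```

In this toy example the sketch returns 3.0, while the exact unit-cost edit distance is 2 (delete edge B-C, then delete node C): the removed edge is charged at both of its endpoints in this simplified matrix, which shows why the result is only an approximation. A practical version would replace the permutation search with a polynomial-time assignment solver.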

Citations

Proceedings Article•DOI•

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

Xiaojun Xu¹, Chang Liu², Qian Feng³, Heng Yin⁴, Le Song⁵, Dawn Song²•Institutions (5)

Shanghai Jiao Tong University¹, University of California, Berkeley², Samsung³, University of California, Riverside⁴, Georgia Institute of Technology⁵

30 Oct 2017

TL;DR: This work proposes a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy.

Abstract: The problem of cross-platform binary code similarity detection aims at detecting whether two binary functions coming from different platforms are similar or not. It has many security applications, including plagiarism detection, malware detection, vulnerability search, etc. Existing approaches rely on approximate graph-matching algorithms, which are inevitably slow and sometimes inaccurate, and hard to adapt to a new task. To address these issues, in this work, we propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then the similarity detection can be done efficiently by measuring the distance between the embeddings for two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy. Further, Gemini can speed up prior art's embedding generation time by 3 to 4 orders of magnitude and reduce the required training time from more than 1 week down to 30 minutes to 10 hours. Our real world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state-of-the-art, i.e., Genius. Our research showcases a successful application of deep learning on computer security problems.
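The comparison step described in this abstract, measuring the distance between two embedding vectors, can be sketched in a few lines. The abstract does not say which distance is used, so cosine similarity is an assumption here, and the function name `cosine_similarity` and the example vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of two binary functions.
emb_a = [0.2, 0.9, 0.1]
emb_b = [0.25, 0.8, 0.05]
print(cosine_similarity(emb_a, emb_b))
```

Once the embeddings are computed, each comparison costs time linear in the vector length, independent of the size of the underlying control flow graphs.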

339 citations

Cites background from "Approximate graph edit distance com..."

...Last but not least, the search accuracy of this approach is ultimately bounded by the quality of bipartite graph matching [35]....
[...]

Journal Article•DOI•

Graph matching and learning in pattern recognition in the last 10 years

Pasquale Foggia¹, Gennaro Percannella¹, Mario Vento¹•Institutions (1)

University of Salerno¹

01 Apr 2014-International Journal of Pattern Recognition and Artificial Intelligence

TL;DR: This paper examines the main advances registered in the last ten years in Pattern Recognition methodologies based on graph matching and related techniques, analyzing more than 180 papers.

Abstract: In this paper, we examine the main advances registered in the last ten years in Pattern Recognition methodologies based on graph matching and related techniques, analyzing more than 180 papers; the...

338 citations

Proceedings Article•DOI•

Scalable Graph-based Bug Search for Firmware Images

Qian Feng¹, Rundong Zhou¹, Chengcheng Xu¹, Yao Cheng¹, Brian Testa², Heng Yin³•Institutions (3)

Syracuse University¹, Air Force Research Laboratory², University of California, Riverside³

24 Oct 2016

TL;DR: A new bug search scheme is proposed that addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy; a bug search engine, Genius, is implemented and compared with state-of-the-art bug search approaches.

Abstract: Because of rampant security breaches in IoT devices, searching vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate in IoT devices across different architectures. However, these CFG-based bug search approaches are far from being scalable to handle an enormous amount of IoT devices in the wild, due to their expensive graph matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that directly conduct searches based upon raw features (CFGs) from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with the CFG feature, high-level numeric feature vectors are more robust to code variation across different architectures, and can easily achieve realtime search by using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices which was collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images of 420,558,702 functions. By only looking at the top 50 candidates in the search result, we found 38 potentially vulnerable firmware images across 5 vendors, and confirmed 23 of them by our manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in two latest commercial firmware images from D-LINK. 103 of them are potentially vulnerable in these images, and 16 of them were confirmed.

325 citations

Proceedings Article•DOI•

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

Xiaojun Xu¹, Chang Liu², Qian Feng³, Heng Yin⁴, Le Song⁵, Dawn Song²•Institutions (5)

Shanghai Jiao Tong University¹, University of California, Berkeley², Samsung³, University of California, Riverside⁴, Georgia Institute of Technology⁵

22 Aug 2017-arXiv: Cryptography and Security

TL;DR: The authors propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, so that similarity detection can be done efficiently by measuring the distance between the embeddings for two functions.

Abstract: The problem of cross-platform binary code similarity detection aims at detecting whether two binary functions coming from different platforms are similar or not. It has many security applications, including plagiarism detection, malware detection, vulnerability search, etc. Existing approaches rely on approximate graph matching algorithms, which are inevitably slow and sometimes inaccurate, and hard to adapt to a new task. To address these issues, in this work, we propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then the similarity detection can be done efficiently by measuring the distance between the embeddings for two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy. Further, Gemini can speed up prior art's embedding generation time by 3 to 4 orders of magnitude and reduce the required training time from more than 1 week down to 30 minutes to 10 hours. Our real world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state-of-the-art, i.e., Genius. Our research showcases a successful application of deep learning on computer security problems.

258 citations

Journal Article•DOI•

Group Role Assignment via a Kuhn–Munkres Algorithm-Based Solution

Haibin Zhu¹, MengChu Zhou², Rob Alkins•Institutions (2)

Nipissing University¹, Tongji University²

01 May 2012

TL;DR: A sufficiently efficient solution based on the K-M algorithm is offered that significantly outperforms the exhaustive search approach.

Abstract: Role assignment is a critical task in role-based collaboration. It has three steps, i.e., agent evaluation, group role assignment, and role transfer, where group role assignment is a time-consuming process. This paper clarifies the group role assignment problem (GRAP), describes a general assignment problem (GAP), converts a GRAP to a GAP, proposes an efficient algorithm based on the Kuhn-Munkres (K-M) algorithm, conducts numerical experiments, and analyzes the solutions' performances. The results show that the proposed algorithm significantly improves the algorithm based on exhaustive search. The major contributions of this paper include formally defining the GRAPs, giving a general efficient solution for them, and expanding the application scope of the K-M algorithm. This paper offers an efficient enough solution based on the K-M algorithm that outperforms significantly the exhaustive search approach.

236 citations

Cites methods from "Approximate graph edit distance com..."

...Fortunately, the well-known K-M algorithm for the GAP has been designed with the complexity of O(m^3) [7], [13], [17] and has been applied widely in industries [20], and its Java code is also available [4]....
[...]
...The K-M algorithm, when properly implemented, can operate with the computational complexity of O(m^3) [13], [17], [20], [23]....
[...]
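The cubic-time behaviour mentioned in these excerpts can be illustrated with a compact Kuhn–Munkres (Hungarian) sketch. This is a commonly used potentials-based formulation, not the code of any paper listed here; it assumes a square matrix with finite costs, and the function name `hungarian` is hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square matrix of finite costs.

    Returns (total_cost, assignment) where assignment[i] is the column
    chosen for row i. Runs in O(n^3) time.
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)      # p[j]: row currently matched to column j
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that one more column becomes reachable.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment

print(hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, [1, 0, 2])
```

Each of the n outer iterations adds one row to the matching, and each inner step scans all columns, which gives the cubic bound. Matrices containing infinite entries would need large finite penalties instead, since infinities break the potential updates.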

References

Journal Article•DOI•

The Protein Data Bank

Helen M. Berman¹, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne•Institutions (1)

Rutgers University¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.

Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations

"Approximate graph edit distance com..." refers background in this paper

...This results in a training set of size 1200, a validation set of size 500, and a test set of size 1000....
[...]

Journal Article•DOI•

The Hungarian method for the assignment problem

Harold W. Kuhn¹•Institutions (1)

Princeton University¹

01 Mar 1955-Naval Research Logistics Quarterly

TL;DR: This paper has always been one of my favorite children, combining as it does elements of the duality of linear programming and combinatorial tools from graph theory, and it may be of some interest to tell the story of its origin.

Abstract: This paper has always been one of my favorite “children,” combining as it does elements of the duality of linear programming and combinatorial tools from graph theory. It may be of some interest to tell the story of its origin.

11,096 citations

Journal Article•DOI•

A Formal Basis for the Heuristic Determination of Minimum Cost Paths

Peter E. Hart¹, Nils J. Nilsson¹, Bertram Raphael¹•Institutions (1)

SRI International¹

01 Jul 1968-IEEE Transactions on Systems Science and Cybernetics

TL;DR: How heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching is described and an optimality property of a class of search strategies is demonstrated.

Abstract: Although the problem of determining the minimum cost path through a graph arises naturally in a number of interesting applications, there has been no underlying theory to guide the development of efficient search procedures. Moreover, there is no adequate conceptual framework within which the various ad hoc search strategies proposed to date can be compared. This paper describes how heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching and demonstrates an optimality property of a class of search strategies.

10,366 citations

"Approximate graph edit distance com..." refers background or methods or result in this paper

...Formally, for a node p in the search tree, we use g(p) to denote the cost of the optimal path from the root node to the current node p, i.e. g(p) is set equal to the cost of the partial edit path accumulated so far, and we use h(p) for denoting g2 e labels are represented by different shades of…...
[...]
...To find the most suitable edit path out of Υ(g1, g2), one introduces a cost for each edit operation, measuring the strength of the corresponding operation....
[...]
...The current paper has been significantly extended with respect to the underlying methodology and the experimental evaluation....
[...]
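The roles of g(p) and a heuristic h(p) in the excerpts above can be sketched with a generic A* search in the sense of Hart, Nilsson, and Raphael. This is an illustration on a small weighted graph, not the edit-path search tree of the paper: g is the cost accumulated so far, h is an estimate of the remaining cost, and nodes are expanded in order of g + h. The function name `a_star`, the toy graph, and the heuristic values are all hypothetical.

```python
import heapq

def a_star(start, is_goal, successors, h):
    """Return the cost of a cheapest path from start to a goal, or None.

    successors(node) yields (next_node, step_cost) pairs; h(node) estimates
    the remaining cost and is assumed never to overestimate it.
    """
    counter = 0  # tie-breaker so the heap never compares nodes directly
    open_heap = [(h(start), 0.0, counter, start)]
    best_g = {start: 0.0}
    while open_heap:
        _, g, _, node = heapq.heappop(open_heap)
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper path to node was found later
        if is_goal(node):
            return g
        for nxt, step in successors(node):
            g_next = g + step
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                counter += 1
                heapq.heappush(open_heap, (g_next + h(nxt), g_next, counter, nxt))
    return None

# Toy graph and an admissible heuristic (both made up for this example).
graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)],
         "b": [("t", 1)], "t": []}
estimate = {"s": 3, "a": 3, "b": 1, "t": 0}

print(a_star("s", lambda n: n == "t", lambda n: graph[n], estimate.get))  # 4.0
```

With h set to zero everywhere the same code degenerates to uniform-cost search; a tighter admissible estimate lets the search skip more of the tree while still returning an optimal cost.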