Mining significant graph patterns by leap search

doi:10.1145/1376616.1376662

Proceedings Article

Xifeng Yan, +3 more

- pp 433-444

TLDR

The first comprehensive study of a general mining method that aims to find the most significant graph patterns directly. Graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in efficiency and accuracy, demonstrating the high promise of such patterns.

Abstract:

With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method that aims to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), exploits the correlation between structural similarity and significance similarity so that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method reveals that the branch-and-bound search widely adopted in the data mining literature is not the best choice, thus sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
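The claim that most objective functions are not antimonotonic can be made concrete with a small sketch. The example below is a simplification under stated assumptions: item sets stand in for graphs, item subsets stand in for subgraph patterns, and `score` is a hypothetical objective (the gap between positive and negative frequency), not necessarily one used in the paper. It shows that frequency can only fall when a pattern grows, while the score may rise, so a frequency threshold alone cannot safely prune the search.

```python
# Toy data: each set stands in for one graph, each item for a substructure.
# These records are invented for illustration only.
positives = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}]
negatives = [{"a"}, {"a", "c"}, {"a"}, {"b", "c"}]


def frequency(pattern, dataset):
    """Fraction of records that contain every item of the pattern."""
    return sum(pattern <= record for record in dataset) / len(dataset)


def score(pattern):
    """Hypothetical objective: gap between positive and negative frequency."""
    return abs(frequency(pattern, positives) - frequency(pattern, negatives))


sub_pattern = frozenset({"a"})         # smaller pattern
super_pattern = frozenset({"a", "b"})  # extension of sub_pattern

everything = positives + negatives
# Frequency is antimonotonic: extending a pattern never raises it.
print(frequency(sub_pattern, everything), frequency(super_pattern, everything))
# The score is not: here the extension scores higher than its sub-pattern.
print(score(sub_pattern), score(super_pattern))
```

In this toy data the extended pattern is less frequent overall but separates the two classes better, which is the situation that frequency-centric pruning handles poorly.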

Citations

Managing and Mining Graph Data

TUDataset: A collection of benchmark datasets for learning with graphs.

A survey of frequent subgraph mining algorithms

GraMi: frequent subgraph and pattern mining in a single large graph

Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors
