Proceedings Article

Mining significant graph patterns by leap search

TLDR
The first comprehensive study of a general mining method that aims to find the most significant patterns directly; graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in both efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract
With ever-increasing amounts of graph data from disparate sources, there is a strong need to mine significant graph patterns under user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method that aims to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), exploits the correlation between structural similarity and significance similarity so that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method reveals that the branch-and-bound search widely adopted in the data mining literature is not the best approach, sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders-of-magnitude speedup over the state-of-the-art method. Furthermore, graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in both efficiency and accuracy, demonstrating the high promise of such patterns.
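The non-antimonotonicity noted in the abstract can be illustrated with a toy sketch. It is a stand-in under stated assumptions, not the paper's method: patterns are item sets rather than subgraphs (subset containment plays the role of subgraph containment), the data are invented for illustration, and information gain is chosen here only as one example of an objective function. The names `support`, `info_gain`, `POSITIVES`, `NEGATIVES`, and `CHAIN` are hypothetical.

```python
from math import log2

# Invented toy data: each "graph" is modelled as a set of labels (assumption).
POSITIVES = [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("a")]
NEGATIVES = [frozenset("a"), frozenset("ac"), frozenset("b"), frozenset("c")]

# A chain of growing patterns: each one contains the previous one.
CHAIN = [frozenset("a"), frozenset("ab"), frozenset("abc")]


def entropy(pos, neg):
    """Binary class entropy (in bits) of a group with pos/neg members."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            frac = count / total
            h -= frac * log2(frac)
    return h


def support(pattern, graphs):
    """Number of graphs that contain the pattern (subset test as a stand-in)."""
    return sum(1 for g in graphs if pattern <= g)


def info_gain(pattern, positives, negatives):
    """Information gain of the binary feature 'graph contains pattern'."""
    n_pos, n_neg = len(positives), len(negatives)
    hit_pos = support(pattern, positives)
    hit_neg = support(pattern, negatives)
    miss_pos, miss_neg = n_pos - hit_pos, n_neg - hit_neg
    n = n_pos + n_neg
    conditional = (
        (hit_pos + hit_neg) / n * entropy(hit_pos, hit_neg)
        + (miss_pos + miss_neg) / n * entropy(miss_pos, miss_neg)
    )
    return entropy(n_pos, n_neg) - conditional


for pattern in CHAIN:
    label = "".join(sorted(pattern))
    print(label, support(pattern, POSITIVES + NEGATIVES),
          round(info_gain(pattern, POSITIVES, NEGATIVES), 4))
```

Along the chain, support never increases as the pattern grows, which is the antimonotonic property that frequency-centric miners rely on for pruning. The information-gain score, by contrast, rises from the first pattern to the second and then falls for the third, so a threshold on the score cannot be used to prune superpatterns in the same way.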


Citations
Book

Managing and Mining Graph Data

TL;DR: This is the first comprehensive survey book on the emerging topic of graph data processing and contains extensive surveys on important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy.
Posted Content

TUDataset: A collection of benchmark datasets for learning with graphs.

TL;DR: The TUDataset for graph classification and regression is introduced, which consists of over 120 datasets of varying sizes from a wide range of applications and provides Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools.
Journal Article

A survey of frequent subgraph mining algorithms

TL;DR: A survey of current research in the field of frequent subgraph mining is presented and solutions to address the main research issues are proposed.
Journal Article

GraMi: frequent subgraph and pattern mining in a single large graph

TL;DR: GraMi is presented, a novel framework for frequent subgraph mining in a single large graph that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches.
Proceedings Article

Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors

TL;DR: This paper presents an automatic technique for extracting optimally discriminative specifications that uniquely identify a class of programs and can be used by a behavior-based malware detector.
References
Journal Article

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Book

Biometry: The Principles and Practice of Statistics in Biological Research

TL;DR: In this paper, the authors present a model for the analysis of variance in a single-classification and two-way and multiway analysis of variance with the assumption of correlation.
Proceedings Article

Mining association rules between sets of items in large databases

TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Proceedings Article

gSpan: graph-based substructure pattern mining

TL;DR: A novel algorithm called gSpan (graph-based substructure pattern mining) is presented, which discovers frequent substructures without candidate generation by building a new lexicographic order among graphs and mapping each graph to a unique minimum DFS code as its canonical label.