Proceedings Article
Mining significant graph patterns by leap search
Xifeng Yan, Hong Cheng, Jiawei Han, Philip S. Yu
- pp 433-444
TL;DR: This paper gives the first comprehensive study of a general mining method that finds the most significant graph patterns directly. Graph classifiers built on the mined patterns outperform the latest graph kernel method in both efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract:
With ever-increasing amounts of graph data from disparate sources, there is a strong need to mine significant graph patterns under user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method that aims to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), exploits the correlation between structural similarity and significance similarity so that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in the graph pattern space. Our method reveals that the branch-and-bound search widely adopted in the data mining literature is in fact not the best choice, sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders-of-magnitude speedup over the state-of-the-art method. Furthermore, graph classifiers built on the mined patterns outperform the latest graph kernel method in both efficiency and accuracy, demonstrating the high promise of such patterns.
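The claim that most objective functions are not antimonotonic can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: graphs are simplified to sets of uniquely labeled edges (so subgraph containment reduces to a subset test), the score is a hypothetical discriminative measure (positive frequency minus negative frequency), and the names `freq`, `score`, `POS`, and `NEG` are invented. Frequency can only drop when a pattern grows, but the score can rise, so pruning by a frequency threshold may discard the most significant pattern.

```python
from typing import FrozenSet, List

# Assumption: a graph is modeled as a set of uniquely labeled edges, so
# "pattern is a subgraph of graph" reduces to a plain subset test.
Graph = FrozenSet[str]


def freq(pattern: Graph, graphs: List[Graph]) -> float:
    """Fraction of graphs that contain every edge of the pattern."""
    return sum(pattern <= g for g in graphs) / len(graphs)


def score(pattern: Graph, pos: List[Graph], neg: List[Graph]) -> float:
    """Hypothetical significance score: positive minus negative frequency."""
    return freq(pattern, pos) - freq(pattern, neg)


# Toy data set (invented): three positive and three negative graphs.
POS: List[Graph] = [
    frozenset({"a-b", "b-c"}),
    frozenset({"a-b", "b-c"}),
    frozenset({"a-b"}),
]
NEG: List[Graph] = [
    frozenset({"a-b"}),
    frozenset({"a-b"}),
    frozenset({"b-c"}),
]

small: Graph = frozenset({"a-b"})          # sub-pattern
large: Graph = frozenset({"a-b", "b-c"})   # super-pattern of `small`

if __name__ == "__main__":
    everything = POS + NEG
    print("frequency:", freq(small, everything), freq(large, everything))
    print("score:    ", score(small, POS, NEG), score(large, POS, NEG))
```

In this toy data the larger pattern is less frequent than its sub-pattern yet scores higher, which is the situation a frequency-centric miner with a high support threshold would miss.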
Citations
Book
Managing and Mining Graph Data
Charu C. Aggarwal, Haixun Wang
TL;DR: This is the first comprehensive survey book on the emerging topic of graph data processing. It contains extensive surveys of important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy.
Posted Content
TUDataset: A collection of benchmark datasets for learning with graphs.
Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, Marion Neumann
TL;DR: The TUDataset for graph classification and regression is introduced, which consists of over 120 datasets of varying sizes from a wide range of applications and provides Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools.
Journal Article
A survey of frequent subgraph mining algorithms
TL;DR: A survey of current research in the field of frequent subgraph mining is presented and solutions to address the main research issues are proposed.
Journal Article
GraMi: frequent subgraph and pattern mining in a single large graph
TL;DR: GraMi is a novel framework for frequent subgraph mining in a single large graph. It finds only the minimal set of instances needed to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches.
Proceedings Article
Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors
TL;DR: This paper presents an automatic technique for extracting optimally discriminative specifications that uniquely identify a class of programs and can be used by a behavior-based malware detector.
References
Journal Article
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Book
Biometry: The Principles and Practice of Statistics in Biological Research
Robert R. Sokal, F. James Rohlf
TL;DR: In this book, the authors present models for the analysis of variance, covering single-classification, two-way, and multiway analysis of variance, together with correlation.
Proceedings Article
Mining association rules between sets of items in large databases
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Proceedings Article
gSpan: graph-based substructure pattern mining
Xifeng Yan, Jiawei Han
TL;DR: gSpan (graph-based substructure pattern mining) is a novel algorithm that discovers frequent substructures without candidate generation. It builds a new lexicographic order among graphs and maps each graph to a unique minimum DFS code as its canonical label.