Journal Article

An efficient graph-mining method for complicated and noisy data with real-world applications

Yi Jia, +2 more
01 Aug 2011
Vol. 28, Iss. 2, pp. 423-447
TLDR
APGM (APproximate Graph Mining) is a novel graph database-mining method for mining useful patterns from noisy graph databases. It combines a general framework that models the noise distribution with a probability matrix and an efficient algorithm that identifies approximately matched frequent subgraphs.
Abstract
In this paper, we present a novel graph database-mining method called APGM (APproximate Graph Mining) to mine useful patterns from noisy graph databases. In our method, we designed a general framework for modeling the noise distribution using a probability matrix and devised an efficient algorithm to identify approximately matched frequent subgraphs. We have applied APGM to both a synthetic data set and real-world data sets on protein structure pattern identification and structure classification. Our experimental study demonstrates the efficiency and efficacy of the proposed method.
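The abstract does not spell out how the probability matrix is used, so the following is only a minimal sketch of one plausible reading. It assumes the topology of a pattern has already been mapped onto a target subgraph and scores only the vertex labels. The matrix values, the normalization by the diagonal entry, the threshold, and all names (`COMPAT`, `match_score`, `is_approximate_match`, `approximate_support`) are hypothetical and are not taken from the paper.

```python
from math import prod

# Hypothetical probability matrix: COMPAT[a][b] is the assumed probability
# that true label a is observed as label b. Values are invented for illustration.
COMPAT = {
    "A": {"A": 0.8, "B": 0.2, "C": 0.0},
    "B": {"A": 0.1, "B": 0.7, "C": 0.2},
    "C": {"A": 0.0, "B": 0.3, "C": 0.7},
}


def match_score(pattern_labels, target_labels, compat=COMPAT):
    """Score a fixed vertex mapping (pattern_labels[i] -> target_labels[i]).

    The score is the product of substitution probabilities, each divided by
    the probability that the pattern label stays unchanged (an assumed
    normalization, so an exact match scores 1.0).
    """
    if len(pattern_labels) != len(target_labels):
        raise ValueError("label sequences must have equal length")
    return prod(
        compat[p][t] / compat[p][p]
        for p, t in zip(pattern_labels, target_labels)
    )


def is_approximate_match(pattern_labels, target_labels, threshold, compat=COMPAT):
    """Accept the mapping if its score reaches the (assumed) threshold."""
    return match_score(pattern_labels, target_labels, compat) >= threshold


def approximate_support(pattern_labels, candidates, threshold, compat=COMPAT):
    """Count candidate label sequences that approximately match the pattern."""
    return sum(
        is_approximate_match(pattern_labels, labels, threshold, compat)
        for labels in candidates
    )
```

Under these assumptions, a pattern would count as frequent when `approximate_support` over the database reaches a minimum support; a full miner would also have to enumerate candidate subgraphs and vertex mappings, which this sketch leaves out.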


Citations
Journal Article

Frequent approximate subgraphs as features for graph-based image classification

TL;DR: A new algorithm for mining frequent connected subgraphs over undirected and labeled graph collections, VEAM (Vertex and Edge Approximate graph Miner), is presented, and a framework for graph-based image classification is introduced.
Journal Article

gMLC: a multi-label feature selection framework for graph classification

TL;DR: This paper studies the problem of multi-label feature selection for graph classification and proposes a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. It also derives an evaluation criterion to estimate the dependence between subgraph features and the multiple labels of graphs.
Journal Article

Mining indirect antagonistic communities from social interactions

TL;DR: This work develops a novel pattern mining approach to mine a set of pairs of communities that behave in opposite ways with one another, and focuses on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities.
Journal Article

A new proposal for graph-based image classification using frequent approximate subgraphs

TL;DR: This paper proposes a new framework for image classification, which uses frequent approximate subgraph patterns as features and proposes to compute automatically the substitution matrices needed in the process, instead of using expert knowledge.
Proceedings Article

Approximate graph mining with label costs

TL;DR: This work presents novel and scalable methods to efficiently solve the approximate isomorphism problem and shows that approximate mining yields interesting patterns in several real-world graphs ranging from IT and protein interaction networks to protein structures.
References
Journal Article

CATH – a hierarchic classification of protein domain structures

TL;DR: Analysis of the structural families generated by CATH reveals the prominent features of protein structure space and a database of well-characterised protein structure families will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.
Proceedings Article

gSpan: graph-based substructure pattern mining

TL;DR: A novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation by building a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label.
Journal Article

PISCES: a protein sequence culling server

TL;DR: PISCES is a public server for culling sets of protein sequences from the Protein Data Bank by sequence identity and structural quality criteria and provides better lists than servers that use BLAST, which is unable to identify many relationships below 40% sequence identity.
Journal Article

Frequent pattern mining: current status and future directions

TL;DR: It is believed that frequent pattern mining research has substantially broadened the scope of data analysis and will have deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.
Proceedings Article

Frequent subgraph discovery

TL;DR: The empirical results show that the algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.