An efficient graph-mining method for complicated and noisy data with real-world applications

doi:10.1007/S10115-010-0376-Y

Journal Article

Yi Jia, +2 more

- 01 Aug 2011 -

Knowledge and Information Systems

- Vol. 28, Iss: 2, pp 423-447

TLDR

The paper presents APGM (APproximate Graph Mining), a novel graph database-mining method that mines useful patterns from noisy graph databases. It combines a general framework that models the noise distribution with a probability matrix and an efficient algorithm that identifies approximately matched frequent subgraphs.

Abstract:

In this paper, we present a novel graph database-mining method called APGM (APproximate Graph Mining) to mine useful patterns from noisy graph databases. In our method, we designed a general framework for modeling the noise distribution using a probability matrix and devised an efficient algorithm to identify approximately matched frequent subgraphs. We have applied APGM to both a synthetic data set and real-world data sets on protein structure pattern identification and structure classification. Our experimental study demonstrates the efficiency and efficacy of the proposed method.
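To make the idea of approximate matching under a probability matrix concrete, here is a minimal Python sketch. It is not the APGM algorithm itself: the abstract does not give the scoring rule, so the normalized product score, the threshold `tau`, the toy matrix `COMPAT`, the graph encoding, and every function name below are assumptions made for illustration. The search is a brute-force enumeration, not the efficient algorithm the paper describes.

```python
from itertools import permutations

# Hypothetical probability matrix (an assumption, not from the paper):
# COMPAT[a][b] is the assumed probability that a node whose true label
# is a is observed with label b.
COMPAT = {
    "A": {"A": 0.8, "B": 0.2, "C": 0.0},
    "B": {"A": 0.1, "B": 0.9, "C": 0.0},
    "C": {"A": 0.0, "B": 0.0, "C": 1.0},
}


def make_graph(labels, edges):
    """Undirected labeled graph: node -> label, plus a set of 2-node edges."""
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}


def match_score(pattern, target, mapping, compat):
    """Assumed score: product over mapped nodes of P(a seen as b) / P(a seen as a)."""
    score = 1.0
    for p_node, t_node in mapping.items():
        a = pattern["labels"][p_node]
        b = target["labels"][t_node]
        score *= compat[a][b] / compat[a][a]
    return score


def approx_occurs(pattern, target, compat, tau):
    """True if some injective, edge-preserving mapping scores at least tau."""
    p_nodes = list(pattern["labels"])
    for image in permutations(target["labels"], len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        edges_ok = all(
            frozenset(mapping[n] for n in edge) in target["edges"]
            for edge in pattern["edges"]
        )
        if edges_ok and match_score(pattern, target, mapping, compat) >= tau:
            return True
    return False


def support(pattern, database, compat, tau):
    """Number of database graphs that contain an approximate occurrence."""
    return sum(approx_occurs(pattern, g, compat, tau) for g in database)


# Toy data: the pattern is a single A-B edge.
PATTERN = make_graph({0: "A", 1: "B"}, [(0, 1)])
DATABASE = [
    make_graph({0: "A", 1: "B"}, [(0, 1)]),  # exact occurrence
    make_graph({0: "B", 1: "B"}, [(0, 1)]),  # A possibly observed as B
    make_graph({0: "C", 1: "B"}, [(0, 1)]),  # A is never observed as C
]
```

With these toy values, the second graph reaches a score of 0.25, so it counts as an occurrence when `tau` is 0.2 but not when `tau` is 0.5. The third graph never matches, because the matrix assigns probability zero to every substitution it would need.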

Citations

Journal Article

Frequent approximate subgraphs as features for graph-based image classification

Niusvel Acosta-Mendoza, +2 more

- 01 Mar 2012 -

Knowledge-Based Systems

TL;DR: A new algorithm, VEAM (Vertex and Edge Approximate graph Miner), for mining frequent connected subgraphs over undirected and labeled graph collections is presented, and a framework for graph-based image classification is introduced.

Journal Article

gMLC: a multi-label feature selection framework for graph classification

Xiangnan Kong, +1 more

- 01 May 2012 -

Knowledge and Information Systems

TL;DR: This paper studies the problem of multi-label feature selection for graph classification and proposes a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. It also derives an evaluation criterion to estimate the dependence between subgraph features and the multiple labels of graphs.

Journal Article

Mining indirect antagonistic communities from social interactions

Kuan Zhang, +3 more

- 01 Jun 2013 -

Knowledge and Information Systems

TL;DR: This work develops a novel pattern mining approach to mine a set of pairs of communities that behave in opposite ways with one another, and focuses on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities.

Journal Article

A new proposal for graph-based image classification using frequent approximate subgraphs

Annette Morales-González, +4 more

- 01 Jan 2014 -

Pattern Recognition

TL;DR: This paper proposes a new framework for image classification that uses frequent approximate subgraph patterns as features, and proposes computing the substitution matrices needed in the process automatically instead of relying on expert knowledge.

Proceedings Article

Approximate graph mining with label costs

Pranay Anchuri, +4 more

TL;DR: This work presents novel and scalable methods to efficiently solve the approximate isomorphism problem and shows that approximate mining yields interesting patterns in several real-world graphs ranging from IT and protein interaction networks to protein structures.

References

Journal Article

CATH – a hierarchic classification of protein domain structures

Christine A. Orengo, +5 more

- 15 Aug 1997 -

Structure

TL;DR: Analysis of the structural families generated by CATH reveals the prominent features of protein structure space and a database of well-characterised protein structure families will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.

Proceedings Article

gSpan: graph-based substructure pattern mining

Xifeng Yan, +1 more

TL;DR: A novel algorithm called gSpan (graph-based substructure pattern mining) discovers frequent substructures without candidate generation by building a new lexicographic order among graphs and mapping each graph to a unique minimum DFS code as its canonical label.

Journal Article

PISCES: a protein sequence culling server

Guoli Wang, +1 more

- 12 Aug 2003 -

Bioinformatics

TL;DR: PISCES is a public server for culling sets of protein sequences from the Protein Data Bank by sequence identity and structural quality criteria and provides better lists than servers that use BLAST, which is unable to identify many relationships below 40% sequence identity.

Journal Article

Frequent pattern mining: current status and future directions

Jiawei Han, +3 more

- 01 Aug 2007 -

Data Mining and Knowledge Discovery

TL;DR: It is believed that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run; however, there are still some challenging research issues that need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.

Proceedings Article

Frequent subgraph discovery

Michihiro Kuramochi, +1 more

TL;DR: The empirical results show that the algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.

Related Papers (5)

gSpan: graph-based substructure pattern mining

A quickstart in frequent structure mining can make a difference

Frequent subgraph discovery

GREW - a scalable frequent subgraph discovery algorithm

Efficient mining of frequent subgraphs in the presence of isomorphism