A survey of frequent subgraph mining algorithms

doi:10.1017/S0269888912000331

Journal Article•DOI•

Chuntao Jiang¹, Frans Coenen¹, Michele Zito¹•Institutions (1)

University of Liverpool¹

01 Mar 2013-Knowledge Engineering Review (Cambridge University Press)-Vol. 28, Iss: 1, pp 75-105

TL;DR: A survey of current research in the field of frequent subgraph mining is presented and solutions to address the main research issues are proposed.

Abstract: Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.

Citations

Journal Article•DOI•

Knowledge graph refinement: A survey of approaches and evaluation methods

Heiko Paulheim¹•Institutions (1)

University of Mannheim¹

06 Dec 2016-Social Work

TL;DR: A survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed as well as the evaluation methodologies used.

Abstract: In the recent years, different Web knowledge graphs, both free and commercial, have been created. While Google coined the term "Knowledge Graph" in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods. The result are large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed as well as the evaluation methodologies used.

915 citations

Cites methods from "A survey of frequent subgraph minin..."

...In particular, for many of the methods applied in the works discussed above – such as outlier detection or association rule mining – graph-based variants have been proposed in the literature [2,43]....

Journal Article•DOI•

A survey of itemset mining

Philippe Fournier-Viger¹, Jerry Chun-Wei Lin¹, Bay Vo², Bay Vo³, Tin Truong Chi, Ji Zhang⁴, Hoai Bac Le⁵ - Show less +3 more•Institutions (5)

Harbin Institute of Technology Shenzhen Graduate School¹, Ho Chi Minh City University of Technology², Sejong University³, University of Southern Queensland⁴, Ho Chi Minh City University of Science⁵

01 Jul 2017-Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

TL;DR: An up-to-date survey of itemset mining problems is presented, and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining, is discussed.

Abstract: Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally as the task of discovering groups of attribute values frequently cooccurring in databases. Because of its numerous applications in domains such as bioinformatics, text mining, product recommendation, e-learning, and web click stream analysis, itemset mining has become a popular research area. This study provides an up-to-date survey that can serve both as an introduction and as a guide to recent advances and opportunities in the field. The problem of frequent itemset mining and its applications are described. Moreover, main approaches and strategies to solve itemset mining problems are presented, as well as their characteristics are provided. Limitations of traditional frequent itemset mining approaches are also highlighted, and extensions of the task of itemset mining are presented such as high-utility itemset mining, rare itemset mining, fuzzy itemset mining, and uncertain itemset mining. This study also discusses research opportunities and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining. Main open-source libraries of itemset mining implementations are also briefly presented. WIREs Data Mining Knowl Discov 2017, 7:e1207. doi: 10.1002/widm.1207
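
The frequent itemset mining task described in this abstract can be illustrated with a minimal level-wise (Apriori-style) sketch. This is not code from the cited survey: the function name `frequent_itemsets`, the toy transactions, and the use of an absolute support count as the threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Hypothetical helper: a plain level-wise search with no optimisations.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets and keep a
        # k-itemset only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one pass over the transactions per candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Toy market-basket data (invented for this example).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(baskets, min_support=2)
```

With a threshold of 2, the pair {bread, milk} is kept while {milk, butter} is discarded, so the three-item candidate is pruned before it is ever counted. Frequent subgraph mining follows the same generate-and-count pattern, but both steps become harder because candidates must be compared up to graph isomorphism.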

197 citations

Journal Article•DOI•

A Survey of Utility-Oriented Pattern Mining

Wensheng Gan¹, Jerry Chun-Wei Lin², Philippe Fournier-Viger¹, Han-Chieh Chao³, Vincent S. Tseng⁴, Philip S. Yu⁵ - Show less +2 more•Institutions (5)

Harbin Institute of Technology¹, Bergen University College², National Dong Hwa University³, National Chiao Tung University⁴, University of Illinois at Chicago⁵

01 Apr 2021-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A survey of state-of-the-art methods for utility-oriented pattern mining can be found in this article, where the authors introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts.

Abstract: The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, satisfaction, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM, or called utility mining). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented in detail, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.

149 citations

Journal Article•DOI•

A Survey of Utility-Oriented Pattern Mining

Wensheng Gan¹, Jerry Chun-Wei Lin², Philippe Fournier-Viger¹, Han-Chieh Chao³, Vincent S. Tseng⁴, Philip S. Yu⁵ - Show less +2 more•Institutions (5)

Harbin Institute of Technology¹, Bergen University College², National Dong Hwa University³, National Chiao Tung University⁴, University of Illinois at Chicago⁵

26 May 2018-arXiv: Databases

TL;DR: An in-depth understanding of UPM is introduced, including concepts, examples, and comparisons with related concepts, and a comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons.

140 citations

Additional excerpts

...episodes [11], and frequent subgraphs [51]....

Journal Article•DOI•

Attention Models in Graphs: A Survey

John Boaz Lee, Ryan A. Rossi¹, Sungchul Kim¹, Nesreen K. Ahmed², Eunyee Koh¹ - Show less +1 more•Institutions (2)

Adobe Systems¹, Intel²

11 Nov 2019-ACM Transactions on Knowledge Discovery From Data

TL;DR: This work conducts a comprehensive and focused survey of the literature on the emerging field of graph attention models and introduces three intuitive taxonomies to group existing work.

Abstract: Graph-structured data arise naturally in many different application domains. By representing data as graphs, we can capture entities (i.e., nodes) as well as their relationships (i.e., edges) with each other. Many useful insights can be derived from graph-structured data as demonstrated by an ever-growing body of work focused on graph mining. However, in the real-world, graphs can be both large—with many complex patterns—and noisy, which can pose a problem for effective graph mining. An effective way to deal with this issue is to incorporate “attention” into graph mining solutions. An attention mechanism allows a method to focus on task-relevant parts of the graph, helping it to make better decisions. In this work, we conduct a comprehensive and focused survey of the literature on the emerging field of graph attention models. We introduce three intuitive taxonomies to group existing work. These are based on problem setting (type of input and output), the type of attention mechanism used, and the task (e.g., graph classification, link prediction). We motivate our taxonomies through detailed examples and use each to survey competing approaches from a unique standpoint. Finally, we highlight several challenges in the area and discuss promising directions for future work.

139 citations

Cites background from "A survey of frequent subgraph minin..."

...We do not attempt to survey the vast field of general graph-based methods that do not explicitly apply attention, multiple work have already been done on this with each having a particular focus [Cai et al. 2018; Getoor and Diehl 2015; Jiang et al. 2013]....

References

Johnson: Computers and Intractability-A Guide to the Theory of NP-Completeness

Michael Randolph Garey

01 Jan 1979

42,654 citations

Book•

Computers and Intractability: A Guide to the Theory of NP-Completeness

Michael Randolph Garey, David S. Johnson

01 Jan 1979

TL;DR: The second edition of a quarterly column as discussed by the authors provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., San Francisco, 1979.

Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book•

Data Mining: Concepts and Techniques

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

"A survey of frequent subgraph minin..." refers methods in this paper

...These two categories are similar in spirit to counterparts found in ARM, namely the Apriori algorithm (Agrawal & Srikant, 1994) and Frequent Pattern (FP)-growth algorithm (Han et al., 2000), respectively....

Book•

Artificial Intelligence: A Modern Approach

Stuart Russell¹, Peter Norvig²•Institutions (2)

University of California, Berkeley¹, University of Southern California²

01 Jan 2020

TL;DR: In this article, the authors present a comprehensive introduction to the theory and practice of artificial intelligence for modern applications, including game playing, planning and acting, and reinforcement learning with neural networks.

Abstract: The long-anticipated revision of this #1 selling book offers the most comprehensive, state of the art introduction to the theory and practice of artificial intelligence for modern applications. Intelligent Agents. Solving Problems by Searching. Informed Search Methods. Game Playing. Agents that Reason Logically. First-order Logic. Building a Knowledge Base. Inference in First-Order Logic. Logical Reasoning Systems. Practical Planning. Planning and Acting. Uncertainty. Probabilistic Reasoning Systems. Making Simple Decisions. Making Complex Decisions. Learning from Observations. Learning with Neural Networks. Reinforcement Learning. Knowledge in Learning. Agents that Communicate. Practical Communication in English. Perception. Robotics. For computer professionals, linguists, and cognitive scientists interested in artificial intelligence.

16,983 citations

Journal Article•DOI•

Centrality in social networks conceptual clarification

Linton C. Freeman¹•Institutions (1)

Lehigh University¹

01 Jan 1978-Social Networks

TL;DR: In this article, three distinct intuitive notions of centrality are uncovered and existing measures are refined to embody these conceptions, and the implications of these measures for the experimental study of small groups are examined.

14,757 citations