A survey of itemset mining

doi:10.1002/WIDM.1207

Citations

Journal Article

A Survey of Parallel Sequential Pattern Mining

Wensheng Gan¹, Jerry Chun-Wei Lin², Philippe Fournier-Viger¹, Han-Chieh Chao³, Philip S. Yu

Harbin Institute of Technology¹, Bergen University College², National Dong Hwa University³

07 Jun 2019-ACM Transactions on Knowledge Discovery From Data

TL;DR: An in-depth survey of the current status of parallel SPM (PSPM) is provided, including a detailed categorization of traditional serial SPM approaches and state-of-the-art PSPM.

Abstract: With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. As a fundamental task of data mining, sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than other pattern mining tasks, i.e., frequent itemset mining and association rule mining, and also suffers from the above challenges when handling large-scale data. To solve these problems, mining sequential patterns in a parallel or distributed computing environment has emerged as an important issue with many applications. In this article, an in-depth survey of the current status of parallel SPM (PSPM) is provided, including a detailed categorization of traditional serial SPM approaches and state-of-the-art PSPM. We review the related work of PSPM in detail, including partition-based algorithms for PSPM, apriori-based PSPM, pattern-growth-based PSPM, and hybrid algorithms for PSPM, and provide a deep description (i.e., characteristics, advantages, disadvantages, and summarization) of these parallel approaches of PSPM. Some advanced topics for PSPM, including parallel quantitative/weighted/utility SPM, PSPM from uncertain data and stream data, and hardware acceleration for PSPM, are further reviewed in detail. Besides, we review and provide some well-known open-source software of PSPM. Finally, we summarize some challenges and opportunities of PSPM in the big data era.

188 citations

Cites background from "A survey of itemset mining"

...KDD has numerous real-life applications and is crucial to some of the most fundamental tasks such as frequent itemset and association rule mining [3], [4], [6], sequential pattern mining [5], [7], [8], clustering [9], [10], classification [11], outlier detection [12]....
[...]
...or association rule mining (ARM) has attracted a lot of attention [1], [3], [4], [6], [13], [14], [17]....
[...]
..., FIM, ARM and SPM) has been extensively studied and successfully applied in many fields [6], [8]....
[...]

Journal Article

A survey of incremental high-utility itemset mining

Wensheng Gan¹, Jerry Chun-Wei Lin¹, Philippe Fournier-Viger¹, Han-Chieh Chao², Tzung-Pei Hong³, Hamido Fujita⁴

Harbin Institute of Technology¹, National Dong Hwa University², National University of Kaohsiung³, Iwate Prefectural University⁴

01 Mar 2018-Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

TL;DR: This paper provides an up-to-date survey of the state-of-the-art iHUIM algorithms, including Apriori-based, tree-based, and utility-list-based approaches, and identifies several important issues and research challenges for iHUIM.

Abstract: Traditional association rule mining has been widely studied. But it is unsuitable for real-world applications where factors such as unit profits of items and purchase quantities must be considered. High-utility itemset mining (HUIM) is designed to find highly profitable patterns by considering both the purchase quantities and unit profits of items. However, most HUIM algorithms are designed to be applied to static databases. But in real-world applications such as market basket analysis and business decision-making, databases are often dynamically updated by inserting new data such as customer transactions. Several researchers have proposed algorithms to discover high-utility itemsets (HUIs) in dynamically updated databases. Unlike batch algorithms, which always process a database from scratch, incremental high-utility itemset mining (iHUIM) algorithms incrementally update and output HUIs, thus reducing the cost of discovering HUIs. This paper provides an up-to-date survey of the state-of-the-art iHUIM algorithms, including Apriori-based, tree-based, and utility-list-based approaches. To the best of our knowledge, this is the first survey on the mining task of incremental high-utility itemset mining. The paper also identifies several important issues and research challenges for iHUIM. WIREs Data Mining Knowl Discov 2018, 8:e1242. doi: 10.1002/widm.1242

149 citations

Cites background from "A survey of itemset mining"

...are not only important for iHUIM (Ahmed et al., 2009; Fournier-Viger et al., 2015; Lin et al., 2012), but can also serve as inspiration for other data mining tasks (Fournier-Viger et al., 2017), including incremental data mining (Hong et al., 2001) and dynamic data mining (Lin et al., 2009)....
[...]
...Two fundamental tasks for revealing interesting relationships between items in transactional databases are frequent itemset mining (FIM) and association rule mining (ARM) (Agrawal, Imielinski, & Swami, 1993; Chen, Han, & Yu, 1996; Fournier-Viger et al., 2017)....
[...]

Journal Article

A Survey of Utility-Oriented Pattern Mining

Wensheng Gan¹, Jerry Chun-Wei Lin², Philippe Fournier-Viger¹, Han-Chieh Chao³, Vincent S. Tseng⁴, Philip S. Yu⁵

Harbin Institute of Technology¹, Bergen University College², National Dong Hwa University³, National Chiao Tung University⁴, University of Illinois at Chicago⁵

26 May 2018-arXiv: Databases

TL;DR: An in-depth understanding of UPM is introduced, including concepts, examples, and comparisons with related concepts, and a comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons.

Abstract: The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, satisfaction, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM, or called utility mining). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented in detail, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.

140 citations

Journal Article

Frequent itemset mining: A 25 years review

José María Luna¹, Philippe Fournier-Viger², Sebastián Ventura¹,³

University of Córdoba (Spain)¹, Harbin Institute of Technology², King Abdulaziz University³

01 Nov 2019-Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

TL;DR: This work analyzes how frequent itemset mining has been addressed over the last decades, covering centralized systems as well as parallel (shared or nonshared memory) architectures, and notes that solutions can be divided into exhaustive search and nonexhaustive search models.

Abstract: Frequent itemset mining (FIM) is an essential task within data analysis since it is responsible for extracting frequently occurring events, patterns, or items in data. Insights from such pattern analysis offer important benefits in decision‐making processes. However, algorithmic solutions for mining such kind of patterns are not straightforward since the computational complexity exponentially increases with the number of items in data. This issue, together with the significant memory consumption that is present in the mining process, makes it necessary to propose extremely efficient solutions. Since the FIM problem was first described in the early 1990s, multiple solutions have been proposed by considering centralized systems as well as parallel (shared or nonshared memory) architectures. Solutions can also be divided into exhaustive search and nonexhaustive search models. Many of such approaches are extensions of other solutions and it is therefore necessary to analyze how this task has been considered during the last decades.

122 citations

Cites background from "A survey of itemset mining"

...While some reviews have been already proposed in literature (Chee, Jaafar, Aziz, Hasan, & Yeoh, 2018; Fournier-Viger et al., 2017), they are mainly focused on sequential exhaustive search approaches and on describing the algorithms for nonexpert users....
[...]

Journal Article

Fast and effective cluster-based information retrieval using frequent closed itemsets

Youcef Djenouri¹, Asma Belhadi, Philippe Fournier-Viger², Jerry Chun-Wei Lin³

University of Southern Denmark¹, Harbin Institute of Technology², Bergen University College³

01 Jul 2018-Information Sciences

TL;DR: A new cluster-based information retrieval approach named ICIR (Intelligent Cluster-based Information Retrieval) is proposed, which combines k-means clustering with frequent closed itemset mining to extract clusters of documents and find frequent terms in each cluster.

69 citations
