AFARTICA: A Frequent Item-Set Mining Method Using Artificial Cell Division Algorithm

doi:10.4018/JDM.2019070104

Journal Article•DOI•

Saubhik Paladhi¹, Sankhadeep Chatterjee², Takaaki Goto³, Soumya Sen²•Institutions (3)

Kalyani Government Engineering College¹, University of Calcutta², Toyo University³

01 Jul 2019-Journal of Database Management (IGI Global)-Vol. 30, Iss: 3, pp 71-93

About: This article was published in the Journal of Database Management on 2019-07-01. It has received 4 citations to date. The article focuses on the topic: Division algorithm.

Citations

Journal Article•DOI•

Multi-Objective Optimization for High-Dimensional Maximal Frequent Itemset Mining

Yalong Zhang, Wei Yu, Xuan Ma, Hisakazu Ogura, Dongfen Ye

26 Sep 2021-Applied Sciences

TL;DR: A multi-objective optimization algorithm was proposed to mine the frequent itemset of high-dimensional data and the practicability and validity of the proposed algorithm in big data were proven by experiments.

Abstract: The solution space of a frequent itemset generally presents exponential explosive growth because of the high-dimensional attributes of big data. However, the premise of the big data association rule analysis is to mine the frequent itemset in high-dimensional transaction sets. Traditional and classical algorithms such as the Apriori and FP-Growth algorithms, as well as their derivative algorithms, are unacceptable in practical big data analysis in an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm was proposed to mine the frequent itemset of high-dimensional data. First, all frequent 2-itemsets were generated by scanning transaction sets based on which new items were added in as the objects of population evolution. Algorithms aim to search for the maximal frequent itemset to gather more non-void subsets because non-void subsets of frequent itemsets are all properties of frequent itemsets. During the operation of algorithms, lethal gene fragments in individuals were recorded and eliminated so that individuals may resurge. Finally, the set of the Pareto optimal solution of the frequent itemset was gained. All non-void subsets of these solutions were frequent itemsets, and all supersets are non-frequent itemsets. Finally, the practicability and validity of the proposed algorithm in big data were proven by experiments.

4 citations
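
The abstract above relies on the fact that every non-empty subset of a frequent itemset is itself frequent, so the maximal frequent itemsets summarize all the others. The following is a minimal brute-force sketch of that relationship on toy data. It is not the multi-objective algorithm from the paper; the function names, the toy transactions, and the `min_support` value are all assumptions made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; only viable for tiny toy data."""
    items = sorted(set().union(*transactions))
    result = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                result.append(candidate)
    return result

def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"},
)]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
```

With this data the only maximal frequent itemset is {a, b, c}, and each of its seven non-empty subsets appears in `frequent`.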

Journal Article•DOI•

Frequent Itemset Mining Algorithm Based on Linear Table

Jun Lu, Wenhe Xu, Kailong Zhou, Zhicong Guo

24 Feb 2023-Journal of Database Management

TL;DR: In this article, a linear table based algorithm was proposed to store more shared information and reduce the number of scans to the original dataset, and operations such as pruning and grouping were also used to optimize the algorithm.

Abstract: Aiming at the speed of frequent itemset mining, a new frequent itemset mining algorithm based on a linear table is proposed. The linear table can store more shared information and reduce the number of scans to the original dataset. Furthermore, operations such as pruning and grouping are also used to optimize the algorithm. For different datasets, the algorithm shows different mining speeds. (1) In sparse datasets, the algorithm achieves an average 45% improvement in mining speed over the bit combination algorithm, and a 2-3 times improvement for the classic FP-growth algorithm. (2) In dense datasets, the average improvement over the classic FP-growth algorithm is 50-70%. For the bit combination algorithm, there are dozens of times of improvement. In fact, the algorithm that integrates bit combinations with bitwise AND operation can effectively avoid recursive operations and it is beneficial to the parallelization. Further analysis shows that the linear table is easy to split to facilitate the data batch mining processing.
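
The abstract mentions combining bit representations with a bitwise AND operation. One common way to use that idea, sketched here under assumptions, is a vertical layout in which each item owns an integer whose bit `tid` is set when transaction `tid` contains the item; the support of an itemset is then the population count of the AND of its items' integers. This does not reproduce the paper's linear table, and `build_bitmaps` and `bitmap_support` are hypothetical names.

```python
from functools import reduce

def build_bitmaps(transactions):
    """Map each item to an int whose bit `tid` marks transaction `tid`."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def bitmap_support(itemset, bitmaps):
    """Support of a non-empty itemset via bitwise AND, with no recursion."""
    masks = [bitmaps.get(item, 0) for item in itemset]
    combined = reduce(lambda x, y: x & y, masks)
    return bin(combined).count("1")  # population count of the shared bits

# Hypothetical toy database of four transactions (tids 0..3).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
bitmaps = build_bitmaps(transactions)
support_ac = bitmap_support({"a", "c"}, bitmaps)
support_bd = bitmap_support({"b", "d"}, bitmaps)
support_cd = bitmap_support({"c", "d"}, bitmaps)
```

Because each item's bitmap is independent of the others, the bitmaps can be split by transaction range and processed in batches, which is in the spirit of the parallelization remark in the abstract.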

Journal Article•DOI•

Fuzzy based optimized itemset mining in high dimensional transactional database using adaptable FCM

C. Saravanabhavan, S. Kirubakaran, R. Premkumar, V. Jemmy Joyce

21 Dec 2022-Journal of Intelligent and Fuzzy Systems

TL;DR: In this paper, a technique for extracting weighted temporal designs is proposed to address a weakness of HUIM techniques, which can find erroneous patterns because they do not consider the correlation among the retrieved patterns.

Abstract: One of the extremely deliberated data mining processes is HUIM (High Utility Itemset Mining). Its applications include text mining, e-learning, bioinformatics, product recommendation, online click stream analysis, and market basket analysis. HUIM likewise has many other potential applications. However, HUIM techniques could find erroneous patterns because they do not look at the correlation of the retrieved patterns. Numerous approaches for mining related HUIs have been presented as an outcome. The computational expense of these methods continues to be problematic, both in terms of time and memory utilization. A technique for extracting weighted temporal designs is therefore suggested to rectify the identified issue in HUIM. Preprocessing of time series-based information into fuzzy item sets is the first step of the suggested technique. These feed the Graph Based Ant Colony Optimization (GACO) and Fuzzy C Means (FCM) clustering methodologies used in the Improvised Adaptable FCM (IAFCM) method. The suggested IAFCM technique achieves two objectives: i) optimal item placement in clusters using GACO; and ii) IAFCM clustering and information decrease in FCM cluster. The proposed technique yields high-quality clusters by GACO. Weighted sequential pattern mining, which considers facts of patterns with the highest weight and low frequency in a repository that is updated over a period, is used to locate the sequential patterns in these clusters. The outcomes of this methodology make evident that the IAFCM with GACO improves execution time when compared to other conventional approaches. Additionally, it enhances information representation by enhancing accuracy while using a smaller amount of memory.

Journal Article•DOI•

Right-Hand Side Expanding Algorithm for Maximal Frequent Itemset Mining

Yalong Zhang, Wei Yu, Qiuqin Zhu, Xuan Ma, Hisakazu Ogura

05 Nov 2021-Applied Sciences

TL;DR: In this paper, a right-hand side expanding (RHSE) algorithm was proposed to find all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets.

Abstract: When it comes to association rule mining, all frequent itemsets are first found, and then the confidence level of association rules is calculated through the support degree of frequent itemsets. As all non-empty subsets in frequent itemsets are still frequent itemsets, all frequent itemsets can be acquired only by finding all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets. In this study, an algorithm, named right-hand side expanding (RHSE), which can accurately find all MFIs, was proposed. First, an Expanding Operation was designed, which, starting from any given frequent itemset, could add items using certain rules and form some supersets of given frequent itemsets. In addition, these supersets were all MFIs. Next, this operator was used to add items by taking all frequent 1-itemsets as the starting point alternately, and all MFIs were found in the end. Due to the special design of the Expanding Operation, each MFI could be found. Moreover, the path found was unique, which avoided the algorithm redundancy in temporal and spatial complexity. This algorithm, which has a high operating rate, is applicable to the big data of high-dimensional mass transactions as it is capable of avoiding the computing redundancy and finding all MFIs. In the end, a detailed experimental report on 10 open standard transaction sets was given in this study, including the big data calculation results of million-class transactions.
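
The first sentence of the abstract says that rule confidence is computed from the supports of frequent itemsets. The standard definition is confidence(X ⇒ Y) = support(X ∪ Y) / support(X). A minimal sketch follows; the function names and the toy transactions are assumptions, and the RHSE expanding operation itself is not reproduced here.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """confidence(X => Y) = support(X union Y) / support(X)."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent does not occur in any transaction")
    return support(antecedent | consequent, transactions) / base

# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"},
)]
conf_a_to_c = confidence(frozenset({"a"}), frozenset({"c"}), transactions)
conf_c_to_a = confidence(frozenset({"c"}), frozenset({"a"}), transactions)
```

Here {a} occurs in three transactions and {a, c} in two, so the rule a ⇒ c has confidence 2/3, while c ⇒ a has confidence 1.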

References

Journal Article•DOI•

Mining frequent patterns without candidate generation

Jiawei Han¹, Jian Pei¹, Yiwen Yin¹•Institutions (1)

Simon Fraser University¹

16 May 2000

TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.

Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.

6,118 citations
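
The FP-tree described in the abstract above is a prefix tree over the frequent items of each transaction. A simplified construction can be sketched as follows: one pass counts items, and a second pass inserts each transaction's frequent items in descending frequency order so that common prefixes share nodes. The class and function names are assumptions, ties are broken alphabetically as an arbitrary choice, and the header table, node links, and the FP-growth mining step are omitted.

```python
from collections import Counter

class FPNode:
    """One prefix-tree node: an item, a count, and child nodes keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Second scan: insert frequent items, most frequent first,
    # so that transactions with common prefixes share nodes.
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical toy database of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
root, frequent = build_fp_tree(transactions, min_support=2)
```

On this data the four transactions compress into a single path b (4) → a (3) → c (2), since the infrequent item d is dropped before insertion.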

Proceedings Article•DOI•

Efficiently mining long patterns from databases

Roberto J. Bayardo¹•Institutions (1)

IBM¹

01 Jun 1998

TL;DR: A pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern, compared with previous algorithms that scale exponentially with longest pattern length.

Abstract: We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more.

1,477 citations

MAFIA: A maximal frequent itemset algorithm for transactional databases

Douglas Burdick¹, Manuel Calimlim¹, Johannes Gehrke¹•Institutions (1)

Cornell University¹

01 Jan 2001

TL;DR: In this paper, a new algorithm for mining maximal frequent itemsets from a transactional database is presented, which integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms.

Abstract: We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms. Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of three to five.

747 citations

Journal Article•DOI•

A Tree Projection Algorithm for Generation of Frequent Item Sets

Ramesh C. Agarwal¹, Charu C. Aggarwal¹, V. V. V. Prasad¹•Institutions (1)

IBM¹

01 Mar 2001-Journal of Parallel and Distributed Computing

TL;DR: This paper provides an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature and has a well-structured data access pattern which provides data locality and reuse of data for multiple levels of the cache.

602 citations

Proceedings Article•DOI•

Generating non-redundant association rules

Mohammed J. Zaki¹•Institutions (1)

Rensselaer Polytechnic Institute¹

01 Aug 2000

TL;DR: A new framework for associations based on the concept of closed frequent itemsets is presented, in which the number of non-redundant rules produced is exponentially smaller than the rule set from the traditional approach.

Abstract: The traditional association rule mining framework produces many redundant rules. The extent of redundancy is a lot larger than previously suspected. We present a new framework for associations based on the concept of closed frequent itemsets. The number of non-redundant rules produced by the new approach is exponentially (in the length of the longest frequent itemset) smaller than the rule set from the traditional approach. Experiments using several “hard” as well as “easy” real and synthetic databases confirm the utility of our framework in terms of reduction in the number of rules presented to the user, and in terms of time.

542 citations
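
The framework in the abstract above is built on closed frequent itemsets. An itemset is closed when no proper superset has the same support, so the closed itemsets carry all support information with fewer entries. The brute-force sketch below illustrates only that definition on toy data; it is not the rule-generation method from the paper, and the function names and data are assumptions.

```python
from itertools import combinations

def frequent_supports(transactions, min_support):
    """Brute-force map from each frequent itemset to its support (toy data only)."""
    items = sorted(set().union(*transactions))
    supports = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                supports[itemset] = count
    return supports

def closed_itemsets(supports):
    """Keep itemsets with no proper superset of equal support.

    Checking frequent supersets is enough: a superset with the same
    support as a frequent itemset is necessarily frequent too.
    """
    return {
        s: c for s, c in supports.items()
        if not any(s < other and c == oc for other, oc in supports.items())
    }

# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "d"},
)]
supports = frequent_supports(transactions, min_support=2)
closed = closed_itemsets(supports)
```

Of the seven frequent itemsets in this example, only {b}, {a, b}, and {a, b, c} are closed; for instance {a} is not, because {a, b} has the same support of 3.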