Journal ArticleDOI

AFARTICA: A Frequent Item-Set Mining Method Using Artificial Cell Division Algorithm

About: This article was published in the Journal of Database Management on 2019-07-01. It has received 4 citations to date. The article focuses on the topic: Division algorithm.
Citations
Journal ArticleDOI
TL;DR: A multi-objective optimization algorithm was proposed to mine the frequent itemset of high-dimensional data and the practicability and validity of the proposed algorithm in big data were proven by experiments.
Abstract: The solution space of frequent itemsets generally grows exponentially with the high-dimensional attributes of big data, yet mining frequent itemsets in high-dimensional transaction sets is the prerequisite of big data association rule analysis. Traditional and classical algorithms such as Apriori and FP-Growth, as well as their derivatives, are impractical in such an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm was therefore proposed to mine the frequent itemsets of high-dimensional data. First, all frequent 2-itemsets were generated by scanning the transaction sets; new items were then added to these as the objects of population evolution. The algorithm searches for maximal frequent itemsets, since every non-void subset of a frequent itemset is itself a frequent itemset. During the operation of the algorithm, lethal gene fragments in individuals were recorded and eliminated so that individuals could recover and continue evolving. The set of Pareto optimal solutions of the frequent itemset problem was then obtained: all non-void subsets of these solutions are frequent itemsets, and all their supersets are non-frequent. Finally, the practicability and validity of the proposed algorithm on big data were demonstrated by experiments.
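The downward-closure property this abstract relies on can be shown with a minimal brute-force sketch (this is not the paper's multi-objective algorithm; the toy transactions and support threshold are assumptions for illustration): every non-void subset of a frequent itemset is itself frequent.

```python
from itertools import combinations

# Toy data (assumed for illustration), with an absolute support threshold.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
min_support = 3

def support(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(1 for t in transactions if itemset <= t)

items = set().union(*transactions)
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(sorted(items), k)
    if support(set(c)) >= min_support
]

# Downward closure: all non-void subsets of each frequent itemset are frequent.
for f in frequent:
    for k in range(1, len(f)):
        for sub in combinations(f, k):
            assert frozenset(sub) in frequent
```

This is why it suffices to find the maximal frequent itemsets: every frequent itemset can be recovered as a subset of some maximal one.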

4 citations

Journal ArticleDOI
TL;DR: In this article, a linear-table-based algorithm was proposed to store more shared information and reduce the number of scans of the original dataset; operations such as pruning and grouping were also used to optimize the algorithm.
Abstract: To speed up frequent itemset mining, a new frequent itemset mining algorithm based on a linear table is proposed. The linear table can store more shared information and reduce the number of scans of the original dataset. Furthermore, operations such as pruning and grouping are used to optimize the algorithm. The algorithm shows different mining speeds on different datasets. (1) On sparse datasets, it achieves an average 45% improvement in mining speed over the bit combination algorithm, and a 2-3 times improvement over the classic FP-growth algorithm. (2) On dense datasets, the average improvement over the classic FP-growth algorithm is 50-70%, and the improvement over the bit combination algorithm reaches dozens of times. In fact, an algorithm that integrates bit combinations with bitwise AND operations can effectively avoid recursive operations and is amenable to parallelization. Further analysis shows that the linear table is easy to split, which facilitates batch mining of the data.
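The bit-combination idea mentioned above can be sketched as follows (the encoding is an assumption for illustration, not the paper's linear-table structure): each item gets a bitmap with one bit per transaction, and the support of an itemset is the popcount of the bitwise AND of its items' bitmaps.

```python
from functools import reduce

# Toy data (assumed for illustration).
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

items = sorted(set().union(*transactions))
# One bitmap per item: bit i is set iff transaction i contains the item.
bitmap = {
    item: sum(1 << i for i, t in enumerate(transactions) if item in t)
    for item in items
}

def support(itemset):
    # AND the per-item bitmaps; the popcount of the result is the support.
    combined = reduce(lambda x, y: x & y, (bitmap[i] for i in itemset))
    return bin(combined).count("1")

print(support({"a", "b"}))  # transactions 0 and 2 contain both -> 2
```

Because support counting reduces to AND and popcount, no recursion over the dataset is needed, which is what makes this style of algorithm easy to parallelize.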
Journal ArticleDOI
TL;DR: In this paper, a technique for extracting weighted temporal patterns is proposed to rectify a known issue in HUIM: existing techniques can find erroneous patterns because they do not consider the correlation of the retrieved patterns.
Abstract: HUIM (High Utility Itemset Mining) is one of the most intensively studied data mining processes. Its applications include text mining, e-learning, bioinformatics, product recommendation, online click-stream analysis, and market basket analysis, among many other potential uses. However, HUIM techniques can find erroneous patterns because they do not consider the correlation of the retrieved patterns. Numerous approaches for mining correlated HUIs have been presented as a result, but the computational expense of these methods remains problematic in terms of both time and memory utilization. A technique for extracting weighted temporal patterns is therefore suggested to rectify this issue. Preprocessing of time-series-based information into fuzzy itemsets is the first step of the suggested technique. These feed the Graph-Based Ant Colony Optimization (GACO) and Fuzzy C-Means (FCM) clustering methodologies used in the Improvised Adaptable FCM (IAFCM) method. The suggested IAFCM technique achieves two objectives: (i) optimal item placement in clusters using GACO; and (ii) clustering and information reduction in the FCM clusters. GACO yields high-quality clusters. Weighted sequential pattern mining, which considers patterns with the highest weight and low frequency in a repository that is updated over time, is then used to locate the sequential patterns in these clusters. The outcomes of this methodology show that IAFCM with GACO improves execution time compared with other conventional approaches; additionally, it enhances information representation by improving accuracy while using less memory.
Journal ArticleDOI
TL;DR: In this paper, a right-hand side expanding (RHSE) algorithm was proposed to find all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets.
Abstract: In association rule mining, all frequent itemsets are first found, and the confidence of association rules is then calculated from the support of the frequent itemsets. As all non-empty subsets of frequent itemsets are themselves frequent itemsets, all frequent itemsets can be acquired by finding only the maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets. In this study, an algorithm named right-hand side expanding (RHSE), which can accurately find all MFIs, was proposed. First, an Expanding Operation was designed which, starting from any given frequent itemset, adds items using certain rules to form supersets of the given frequent itemset; these supersets are all MFIs. Next, this operator was applied starting from each frequent 1-itemset in turn, and all MFIs were found in the end. Due to the special design of the Expanding Operation, each MFI could be found, and the path to each was unique, which avoided redundant computation in both time and space. This algorithm, which has a high operating rate, is applicable to the big data of high-dimensional mass transactions, as it avoids computing redundancy while finding all MFIs. Finally, a detailed experimental report on 10 open standard transaction sets was given, including big data results on million-scale transaction sets.
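The MFI definition used above can be illustrated with a small sketch (this is the definition only, not the RHSE algorithm; the itemset collection is an assumption for illustration): a frequent itemset is maximal when no proper superset of it is frequent.

```python
# A toy collection of frequent itemsets (assumed for illustration).
frequent = [frozenset(s) for s in [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]]

# An itemset is maximal iff no other frequent itemset is a proper superset.
maximal = [
    f for f in frequent
    if not any(f < g for g in frequent)
]
print(maximal)  # [{"a","b"}, {"a","c"}] -- every 1-itemset has a frequent superset
```

All five frequent itemsets are recoverable as subsets of the two maximal ones, which is why finding MFIs alone suffices.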
References
Journal ArticleDOI
16 May 2000
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
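The FP-tree's compression idea, technique (1) above, can be sketched minimally (this is only the prefix-tree insertion step, not the full FP-growth algorithm; the sample transactions are assumptions for illustration): transactions sharing a prefix share tree nodes, so the database collapses into a much smaller structure.

```python
class Node:
    """One FP-tree node: an item, its count, and child nodes keyed by item."""
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def insert(root, transaction):
    # Walk down the tree, reusing existing nodes for shared prefixes
    # and incrementing their counts.
    node = root
    for item in transaction:  # items assumed pre-sorted by descending frequency
        node = node.children.setdefault(item, Node(item))
        node.count += 1

root = Node(None)
for t in [["f", "c", "a", "m"], ["f", "c", "a", "b"], ["f", "b"]]:
    insert(root, t)
# The shared prefix f -> c -> a is stored once; node "c" and node "a"
# each carry count 2 instead of appearing in two separate copies.
```

A full FP-growth implementation would additionally keep per-item header links through the tree to extract conditional pattern bases, which this sketch omits.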

6,118 citations

Proceedings ArticleDOI
Roberto J. Bayardo
01 Jun 1998
TL;DR: A pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern, compared with previous algorithms that scale exponentially with longest pattern length.
Abstract: We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more.

1,477 citations

01 Jan 2001
TL;DR: In this paper, a new algorithm for mining maximal frequent itemsets from a transactional database is presented, which integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms.
Abstract: We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms. Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of three to five.

747 citations

Journal ArticleDOI
TL;DR: This paper provides an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature and has a well-structured data access pattern which provides data locality and reuse of data for multiple levels of the cache.

602 citations

Proceedings ArticleDOI
01 Aug 2000
TL;DR: A new framework for associations based on the concept of closed frequent itemsets is presented; the number of non-redundant rules produced by the new approach is exponentially smaller than the rule set from the traditional approach.
Abstract: The traditional association rule mining framework produces many redundant rules. The extent of redundancy is a lot larger than previously suspected. We present a new framework for associations based on the concept of closed frequent itemsets. The number of non-redundant rules produced by the new approach is exponentially (in the length of the longest frequent itemset) smaller than the rule set from the traditional approach. Experiments using several “hard” as well as “easy” real and synthetic databases confirm the utility of our framework in terms of reduction in the number of rules presented to the user, and in terms of time.
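The closed-itemset concept underlying this framework can be shown with a brute-force sketch (the definition only, on assumed toy data, not the paper's mining algorithm): a frequent itemset is closed when no proper superset has the same support.

```python
from itertools import combinations

# Toy data (assumed for illustration).
transactions = [{"a", "b"}, {"a", "b"}, {"a"}]

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

items = sorted(set().union(*transactions))
all_itemsets = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]

# Closed: occurs at least once, and no proper superset has equal support.
closed = [
    s for s in all_itemsets
    if support(s) > 0
    and not any(s < t and support(t) == support(s) for t in all_itemsets)
]
# {"a"} (support 3) and {"a","b"} (support 2) are closed; {"b"} is not,
# because its superset {"a","b"} has the same support of 2.
```

Since every itemset's support equals the support of its closure, rules need only be generated from closed itemsets, which is the source of the exponential reduction claimed above.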

542 citations