AFARTICA: A Frequent Item-Set Mining Method Using Artificial Cell Division Algorithm
01 Jul 2019-Journal of Database Management (IGI Global)-Vol. 30, Iss: 3, pp 71-93
About: This article was published in the Journal of Database Management on 2019-07-01. It has received 4 citations to date. The article focuses on the topics: Division algorithm.
TL;DR: A multi-objective optimization algorithm was proposed to mine the frequent itemset of high-dimensional data and the practicability and validity of the proposed algorithm in big data were proven by experiments.
Abstract: The solution space of frequent itemsets generally grows exponentially because of the high-dimensional attributes of big data. However, big data association rule analysis is premised on mining the frequent itemsets in high-dimensional transaction sets. Classical algorithms such as Apriori and FP-Growth, as well as their derivatives, are impractical for big data analysis in such an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm was proposed to mine the frequent itemsets of high-dimensional data. First, all frequent 2-itemsets were generated by scanning the transaction sets; new items were then added to them, and the resulting itemsets served as the objects of population evolution. The algorithm aims to find maximal frequent itemsets so as to cover as many non-empty subsets as possible, because every non-empty subset of a frequent itemset is itself a frequent itemset. While the algorithm runs, lethal gene fragments in individuals were recorded and eliminated so that individuals may revive. Finally, the Pareto optimal solution set of frequent itemsets was obtained: all non-empty subsets of these solutions were frequent itemsets, and all of their supersets were non-frequent itemsets. The practicability and validity of the proposed algorithm on big data were demonstrated by experiments.
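The two building blocks named in the abstract, generating frequent 2-itemsets in one scan and the fact that every non-empty subset of a frequent itemset is frequent, can be illustrated with a minimal Python sketch. The function names, the toy transactions, and the use of an absolute support count are assumptions made for illustration; the evolutionary search of AFARTICA itself is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_2_itemsets(transactions, min_support):
    """Count every item pair in one scan; keep pairs seen at least min_support times."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {frozenset(p): c for p, c in counts.items() if c >= min_support}


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def non_empty_subsets(itemset):
    """All non-empty subsets of itemset, as frozensets."""
    items = sorted(itemset)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


# Hypothetical toy data, with an absolute support threshold of 2.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"]]
min_support = 2

pairs = frequent_2_itemsets(transactions, min_support)

# {a, b, c} is a maximal frequent itemset here: it is frequent,
# and adding the only remaining item, d, makes it infrequent.
maximal = frozenset({"a", "b", "c"})
subsets_all_frequent = all(
    support(s, transactions) >= min_support for s in non_empty_subsets(maximal)
)
```

In this toy example a single maximal itemset stands in for all seven of its non-empty subsets, which is why searching for maximal itemsets shrinks the output.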
TL;DR: In this article, a linear-table-based algorithm was proposed that stores more shared information and reduces the number of scans of the original dataset; operations such as pruning and grouping were also used to optimize the algorithm.
Abstract: To improve the speed of frequent itemset mining, a new frequent itemset mining algorithm based on a linear table is proposed. The linear table can store more shared information and reduce the number of scans of the original dataset. Furthermore, operations such as pruning and grouping are used to optimize the algorithm. The algorithm shows different mining speeds on different datasets. (1) On sparse datasets, it achieves an average 45% improvement in mining speed over the bit combination algorithm and a 2-3 times improvement over the classic FP-growth algorithm. (2) On dense datasets, the average improvement over the classic FP-growth algorithm is 50-70%, and the improvement over the bit combination algorithm is dozens of times. In fact, the algorithm that integrates bit combinations with the bitwise AND operation can effectively avoid recursive operations, which benefits parallelization. Further analysis shows that the linear table is easy to split, which facilitates batch processing of the data.
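The general idea of counting support with a bitwise AND can be sketched as follows. This is only an illustration of the bitmap technique, assuming one Python integer per item with bit `tid` set when transaction `tid` contains the item; it is not the linear-table structure of the paper, and all names are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an integer whose bit tid is 1 if transaction tid contains it."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def bitmap_support(itemset, bitmaps, n_transactions):
    """AND the item bitmaps together and count the surviving 1-bits."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items clear the mask
    return bin(mask).count("1")


# Hypothetical toy data.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"]]
bitmaps = build_bitmaps(transactions)

support_ab = bitmap_support({"a", "b"}, bitmaps, len(transactions))
support_abc = bitmap_support({"a", "b", "c"}, bitmaps, len(transactions))
```

Support counting becomes a loop of AND operations with no recursion, and the bitmaps can be split by transaction range and processed independently.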
TL;DR: In this paper, a technique for extracting weighted temporal designs is proposed to rectify an identified issue in HUIM: HUIM techniques can find erroneous patterns because they do not consider the correlation of the retrieved patterns.
Abstract: High Utility Itemset Mining (HUIM) is one of the most extensively studied data mining processes. Its many applications include text mining, e-learning, bioinformatics, product recommendation, online click stream analysis, and market basket analysis. However, HUIM techniques can find erroneous patterns because they do not consider the correlation of the retrieved patterns. Numerous approaches for mining related HUIs have been presented as a result, but the computational expense of these methods remains problematic in terms of both time and memory utilization. A technique for extracting weighted temporal designs is therefore suggested to rectify this issue in HUIM. The first step of the suggested technique is preprocessing time-series-based information into fuzzy item sets. These feed the Graph Based Ant Colony Optimization (GACO) and Fuzzy C Means (FCM) clustering methodologies used in the Improvised Adaptable FCM (IAFCM) method. The suggested IAFCM technique achieves two objectives: (i) optimal item placement in clusters using GACO; and (ii) IAFCM clustering and information reduction in the FCM clusters. The proposed technique yields high-quality clusters by means of GACO. Weighted sequential pattern mining, which considers patterns with the highest weight and low frequency in a repository that is updated over a period, is used to locate the sequential patterns in these clusters. The results show that IAFCM with GACO improves execution time compared with other conventional approaches. Additionally, it improves information representation by increasing accuracy while using less memory.
TL;DR: In this paper, a right-hand side expanding (RHSE) algorithm was proposed to find all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets.
Abstract: When it comes to association rule mining, all frequent itemsets are first found, and then the confidence level of association rules is calculated through the support degree of frequent itemsets. As all non-empty subsets in frequent itemsets are still frequent itemsets, all frequent itemsets can be acquired only by finding all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets. In this study, an algorithm, named right-hand side expanding (RHSE), which can accurately find all MFIs, was proposed. First, an Expanding Operation was designed, which, starting from any given frequent itemset, could add items using certain rules and form some supersets of given frequent itemsets. In addition, these supersets were all MFIs. Next, this operator was used to add items by taking all frequent 1-itemsets as the starting point alternately, and all MFIs were found in the end. Due to the special design of the Expanding Operation, each MFI could be found. Moreover, the path found was unique, which avoided the algorithm redundancy in temporal and spatial complexity. This algorithm, which has a high operating rate, is applicable to the big data of high-dimensional mass transactions as it is capable of avoiding the computing redundancy and finding all MFIs. In the end, a detailed experimental report on 10 open standard transaction sets was given in this study, including the big data calculation results of million-class transactions.
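The relationship described above, where the MFIs determine every frequent itemset and rule confidence is computed from supports, can be sketched in Python. This is a minimal illustration under assumed names and toy data; it does not implement the Expanding Operation of RHSE, and the MFI list is supplied by hand.

```python
from itertools import combinations


def all_frequent_from_maximal(mfis):
    """Every non-empty subset of a maximal frequent itemset is frequent."""
    frequent = set()
    for m in mfis:
        items = sorted(m)
        for r in range(1, len(items) + 1):
            frequent.update(frozenset(c) for c in combinations(items, r))
    return frequent


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    ante = support_count(antecedent, transactions)
    return both / ante if ante else 0.0


# Hypothetical toy data; with an absolute support threshold of 2,
# the only maximal frequent itemset is {a, b, c}.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"]]
mfis = [frozenset({"a", "b", "c"})]

frequent = all_frequent_from_maximal(mfis)
conf_ab_c = confidence({"a", "b"}, {"c"}, transactions)
```

Note that the MFIs give the frequent itemsets but not their supports, so computing confidence still needs support counts from the transactions.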
16 May 2000
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
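How a prefix tree compresses a transaction database can be seen in a small construction sketch. The class and function names below are assumptions, ties in item frequency are broken alphabetically for determinism, and only the tree-building step is shown; the FP-growth mining over conditional databases is omitted.

```python
from collections import Counter


class FPNode:
    """One prefix-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: count items and drop the infrequent ones.
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None)
    header = {item: [] for item in freq}  # item -> all tree nodes holding it

    # Second scan: insert each transaction with items in descending frequency,
    # so that transactions sharing frequent items share a prefix path.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def count_nodes(node):
    """Number of nodes below node (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical toy data, with an absolute support threshold of 2.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"]]
root, header = build_fp_tree(transactions, 2)
n_nodes = count_nodes(root)
```

Here the four transactions collapse into a single path b, a, c with counts 4, 3, 2, which is the kind of compression the abstract refers to.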
01 Jun 1998
TL;DR: A pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern, compared with previous algorithms that scale exponentially with longest pattern length.
Abstract: We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with longest pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more.
01 Jan 2001
TL;DR: In this paper, a new algorithm for mining maximal frequent itemsets from a transactional database is presented, which integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms.
Abstract: We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms. Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of three to five.
TL;DR: This paper provides an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature and has a well-structured data access pattern which provides data locality and reuse of data for multiple levels of the cache.
Abstract: In this paper we propose algorithms for generation of frequent item sets by successive construction of the nodes of a lexicographic tree of item sets. We discuss different strategies in generation and traversal of the lexicographic tree such as breadth-first search, depth-first search, or a combination of the two. These techniques provide different trade-offs in terms of the I/O, memory, and computational time requirements. We use the hierarchical structure of the lexicographic tree to successively project transactions at each node of the lexicographic tree and use matrix counting on this reduced set of transactions for finding frequent item sets. We tested our algorithm on both real and synthetic data. We provide an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature. The algorithm has a well-structured data access pattern which provides data locality and reuse of data for multiple levels of the cache. We also discuss methods for parallelization of the TreeProjection algorithm.
01 Aug 2000
TL;DR: A new framework for associations based on the concept of closed frequent itemsets is presented; the number of non-redundant rules produced by the new approach is exponentially smaller than the rule set from the traditional approach.
Abstract: The traditional association rule mining framework produces many redundant rules. The extent of redundancy is a lot larger than previously suspected. We present a new framework for associations based on the concept of closed frequent itemsets. The number of non-redundant rules produced by the new approach is exponentially (in the length of the longest frequent itemset) smaller than the rule set from the traditional approach. Experiments using several “hard” as well as “easy” real and synthetic databases confirm the utility of our framework in terms of reduction in the number of rules presented to the user, and in terms of time.
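The notion of a closed frequent itemset, a frequent itemset with no proper superset of equal support, can be illustrated by brute force. The sketch below enumerates all itemsets, which is exponential and only suitable for toy data; the names and data are assumptions, and it is not the mining algorithm of the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all frequent itemsets with their support counts."""
    items = sorted({i for t in transactions for i in t})
    tsets = [set(t) for t in transactions]
    result = {}
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            s = sum(1 for t in tsets if set(c) <= t)
            if s >= min_support:
                result[frozenset(c)] = s
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support.

    Checking only frequent supersets is enough: a superset with equal
    support meets the threshold too, so it is also in `frequent`.
    """
    return {x: s for x, s in frequent.items()
            if not any(x < y and s == sy for y, sy in frequent.items())}


# Hypothetical toy data, with an absolute support threshold of 2.
transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"]]
frequent = frequent_itemsets(transactions, 2)
closed = closed_itemsets(frequent)
```

On this toy data, seven frequent itemsets reduce to three closed ones, and the support of any frequent itemset can still be recovered as the largest support among the closed itemsets that contain it.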