scispace - formally typeset

Showing papers on "Apriori algorithm" published in 2011


Proceedings Article · DOI
29 Jul 2011
TL;DR: The results show that the strategy designed in this paper can achieve higher efficiency when mining frequent item sets in a cloud computing environment.
Abstract: Cloud computing provides cheap and efficient solutions for storing and analyzing massive data. Research on data mining strategies based on cloud computing is therefore important from both theoretical and practical points of view. This paper focuses on a strategy for mining association rules in a cloud computing environment. First, cloud computing, Hadoop, the MapReduce programming model, the Apriori algorithm, and parallel association rule mining algorithms are introduced. Then, a parallel association rule mining strategy adapted to the cloud computing environment is designed. It includes a data set division method, a data set allocation method, an improved Apriori algorithm, and the implementation procedure of the improved Apriori algorithm on MapReduce. Finally, a Hadoop platform is built and experiments are conducted to test the performance of the strategy and the improved algorithm. The results show that the strategy designed in this paper can achieve higher efficiency when mining frequent item sets in a cloud computing environment.

108 citations
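As a rough, single-process illustration of the MapReduce idea in the abstract above, the sketch below splits support counting for one Apriori level into a map step (each data split counts candidates locally) and a reduce step (local counts are summed and filtered). The function names and the list-of-partitions input are hypothetical; a real deployment would use Hadoop's own MapReduce APIs, and the paper's data set division and allocation methods are not reproduced here.

```python
from collections import Counter

def map_phase(partition, candidates):
    """Mapper: emit (candidate, local_count) pairs for one data split."""
    local = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is contained in this transaction
                local[cand] += 1
    return list(local.items())

def reduce_phase(mapped_outputs, min_count):
    """Reducer: sum local counts per candidate and keep the frequent ones."""
    totals = Counter()
    for pairs in mapped_outputs:
        for cand, count in pairs:
            totals[cand] += count
    return {cand: n for cand, n in totals.items() if n >= min_count}

def count_level(partitions, candidates, min_count):
    # In Hadoop each partition would go to a separate mapper; here they run in a loop.
    mapped = [map_phase(p, candidates) for p in partitions]
    return reduce_phase(mapped, min_count)

# Hypothetical data: two splits of a small transaction database.
parts = [[{"a", "b"}, {"a", "c"}], [{"a", "b", "c"}, {"b"}]]
pairs = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
print(count_level(parts, pairs, 2))
```

Because support counts are additive across disjoint splits, summing per-split counts gives exact global supports, which is what makes this step easy to parallelize.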


Proceedings Article · DOI
26 Sep 2011
TL;DR: GPApriori is a GPU-accelerated implementation of Frequent Itemset Mining (FIM) that demonstrates the potential of GPGPUs for speeding up data mining algorithms.
Abstract: In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation on an Nvidia Tesla T10 graphics processor and demonstrate up to 100X speedup compared with several state-of-the-art FIM algorithms on a CPU. To map the Apriori algorithm onto the SIMD execution model, we designed a "static bitset" memory structure to represent the input database. This data structure improves upon the traditional vertical data layout used in state-of-the-art Apriori implementations. In our implementation, we perform a parallelized version of the support counting step on the GPU. Experimental results show that GPApriori consistently outperforms CPU-based Apriori implementations. Our results demonstrate the potential of GPGPUs for speeding up data mining algorithms.

70 citations
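The abstract does not detail the layout of GPApriori's "static bitset", so the sketch below shows only the general bitset idea for support counting, on a CPU: each item maps to a bit vector with one bit per transaction (Python's arbitrary-precision integers serve as the bit vectors), and the support of an item set is the number of set bits in the AND of its items' vectors. On a GPU, the AND and population count over many candidates would run in parallel; nothing here is GPU code, and the function names are illustrative.

```python
def build_bitsets(transactions):
    """Vertical layout: item -> int whose bit `tid` is set if transaction `tid` has the item."""
    bitsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets

def support(itemset, bitsets):
    """Support count = number of set bits in the AND of the items' bitsets."""
    if not itemset:
        raise ValueError("itemset must be non-empty")
    mask = -1  # all bits set; ANDing with a non-negative int yields that int
    for item in itemset:
        mask &= bitsets.get(item, 0)
    return bin(mask).count("1")

db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bits = build_bitsets(db)
print(support({"a", "b"}, bits))  # 2
```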


Journal Article · DOI
TL;DR: A new procedure and an improved model for mining association rules of customer values are proposed; the resulting rules are suggested for use in a customized marketing function of a CRM system to raise customers' value grades.
Abstract: This paper proposes a new procedure and an improved model to mine association rules of customer values. The research area is the online shopping market in Taiwan. Ward's method is used to partition the online shopping market into three markets. Customer values are derived from an improved RFMDR model (based on the RFM/RFMD models). A supervised Apriori algorithm is applied to the customer values to create association rules. These effective rules are suggested for use in a customized marketing function of a CRM system to raise customers' value grades.

69 citations
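The RFMDR model is only named in the abstract above, so its extra dimensions are not reproduced here. As background, the sketch below computes the classic RFM values it builds on: recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend) per customer. The input format and function name are assumptions; such raw values are usually binned into scores or grades before being fed to a rule miner.

```python
from datetime import date

def rfm(purchases, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    purchases: iterable of (customer_id, purchase_date, amount) tuples.
    """
    stats = {}
    for customer, day, amount in purchases:
        last, freq, money = stats.get(customer, (day, 0, 0.0))
        stats[customer] = (max(last, day), freq + 1, money + amount)
    return {c: ((today - last).days, freq, money)
            for c, (last, freq, money) in stats.items()}

log = [("c1", date(2011, 1, 1), 10.0),
       ("c1", date(2011, 1, 20), 5.0),
       ("c2", date(2011, 1, 10), 30.0)]
print(rfm(log, date(2011, 1, 31)))
# {'c1': (11, 2, 15.0), 'c2': (21, 1, 30.0)}
```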


Book Chapter · DOI
29 Aug 2011
TL;DR: RP-Tree is proposed, a method for mining a subset of rare association rules using a tree structure, and an information gain component that helps to identify the more interesting association rules.
Abstract: Most association rule mining techniques concentrate on finding frequent rules. However, rare association rules are in some cases more interesting than frequent association rules, since rare rules represent unexpected or unknown associations. All current algorithms for rare association rule mining use an Apriori level-wise approach, which has computationally expensive candidate generation and pruning steps. We propose RP-Tree, a method for mining a subset of rare association rules using a tree structure, and an information gain component that helps to identify the more interesting association rules. Empirical evaluation using a range of real-world datasets shows that RP-Tree itemset and rule generation is more time efficient than modified versions of FP-Growth and ARIMA, and discovers 92-100% of all the interesting rare association rules.

68 citations
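The abstract above does not define "rare". A common convention in rare-itemset mining, assumed here, uses two thresholds: an item is rare if its support count is at least a minimum rare support but below the minimum frequent support. The sketch shows only this preprocessing step (finding rare items and keeping the transactions that contain at least one of them); the RP-Tree construction and the information-gain component are not shown, and the function names are hypothetical.

```python
from collections import Counter

def rare_items(transactions, min_rare, min_freq):
    """Items whose support count c satisfies min_rare <= c < min_freq."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if min_rare <= c < min_freq}

def rare_item_transactions(transactions, rare):
    """Keep only the transactions that contain at least one rare item."""
    return [t for t in transactions if rare & set(t)]

db = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"a", "d"}]
rare = rare_items(db, min_rare=1, min_freq=2)
print(sorted(rare))                                # ['c', 'd']
print(len(rare_item_transactions(db, rare)))       # 2
```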


Journal Article · DOI
TL;DR: This paper puts forward an improved Apriori algorithm for mining association rules that addresses the bottleneck problems of the traditional Apriori algorithm.

60 citations


Journal Article · DOI
TL;DR: This paper looks at the use of missing-value and clustering algorithms in a data mining approach to help predict crime patterns and speed up the process of solving crimes.
Abstract: … about national security has increased after the 26/11 Mumbai attack. In this paper we look at the use of missing-value and clustering algorithms in a data mining approach to help predict crime patterns and speed up the process of solving crimes. We concentrate on the MV algorithm and the Apriori algorithm, with some enhancements, to aid in filling missing values and identifying crime patterns. We applied these techniques to real crime data. We also use a semi-supervised learning technique for knowledge discovery from the crime records and to help increase predictive accuracy.

45 citations


28 Apr 2011
TL;DR: A performance comparison between the Apriori and FP-Growth algorithms for generating association rules is presented; both are implemented in RapidMiner, and the results obtained from the data processing are analyzed in SPSS.
Abstract: In this article we present a performance comparison between the Apriori and FP-Growth algorithms in generating association rules. The two algorithms are implemented in RapidMiner, and the results obtained from the data processing are analyzed in SPSS. The database used in developing the processes contains a series of transactions belonging to an online shop.

43 citations


Proceedings Article · DOI
26 Apr 2011
TL;DR: A new, improved FP-tree with a table and a new algorithm for mining association rules are proposed; the algorithm mines all possible frequent itemsets without generating conditional FP-trees.
Abstract: Discovering association rules among a large number of itemsets is considered an important aspect of data mining. The ever-increasing demand for finding patterns in large data has driven research on association rule mining, and researchers have developed many algorithms and techniques for determining association rules. The main problem is the generation of candidate sets. Among the existing techniques, the frequent pattern growth (FP-growth) method is the most efficient and scalable approach: it mines frequent itemsets without candidate set generation. The main obstacle of FP-growth is that it generates a massive number of conditional FP-trees. In this research paper, we propose a new and improved FP-tree with a table and a new algorithm for mining association rules. This algorithm mines all possible frequent itemsets without generating conditional FP-trees. It also provides the frequency of frequent items, which is used to estimate the desired association rules.

36 citations
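For context on the FP-tree that the paper above modifies, the sketch below builds a standard FP-tree: it counts items in one pass, drops infrequent ones, sorts each transaction's items by descending support, and inserts them into a shared prefix tree whose header table lists all nodes of each item. It does not implement the paper's table-augmented tree or its mining step; the class and function names are illustrative.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item, a count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Return (root, header, frequent) where header maps item -> list of its nodes."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = {i: [] for i in frequent}
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by item value).
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, frequent

db = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"b", "d"}]
root, header, freq = build_fp_tree(db, min_count=2)
print(root.children["b"].count)            # 4
print(sum(n.count for n in header["c"]))   # 2
```

Summing the counts along an item's header list recovers its support, which is what FP-growth relies on when it later builds conditional trees.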


Book Chapter · DOI
24 May 2011
TL;DR: This work presents an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements, and extends its approach to a total of twelve specific similarity measures and a generalized form.
Abstract: While in standard frequent item set mining one tries to find item sets the support of which exceeds a user-specified threshold (minimum support) in a database of transactions, we strive to find item sets for which the similarity of their covers (that is, the sets of transactions containing them) exceeds a user-specified threshold. Starting from the generalized Jaccard index, we extend our approach to a total of twelve specific similarity measures and a generalized form. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets, we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high-quality, more useful item sets.

33 citations
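Assuming the generalized Jaccard index in the abstract above is applied to the item covers (for each item, the set of indices of transactions containing it), it is the size of the intersection of those covers divided by the size of their union. A minimal sketch of that single measure follows; the other eleven measures and the Eclat-style search are not shown, and the function names are illustrative.

```python
def item_covers(transactions):
    """Map each item to its cover: the set of indices of transactions containing it."""
    covers = {}
    for tid, t in enumerate(transactions):
        for item in t:
            covers.setdefault(item, set()).add(tid)
    return covers

def jaccard(itemset, covers):
    """|intersection of the item covers| / |union of the item covers|."""
    sets = [covers.get(item, set()) for item in itemset]
    union = set().union(*sets)
    if not union:
        return 0.0
    return len(set.intersection(*sets)) / len(union)

db = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
covers = item_covers(db)
print(jaccard({"a", "b"}, covers))  # 0.6666666666666666
```

Unlike plain support, this score stays low when one item occurs far more often than the others, even if they co-occur frequently.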


Proceedings Article · DOI
21 Mar 2011
TL;DR: It is demonstrated that on specific data sets, which occur particularly often in the area of gene expression analysis, the implementations of the cumulative approach significantly outperform enumeration approaches to frequent item set mining.
Abstract: Most known frequent item set mining algorithms work by enumerating candidate item sets and pruning infrequent candidates. An alternative method, which works by intersecting transactions, is much less researched. To the best of our knowledge, there are only two basic algorithms: a cumulative scheme, which is based on a repository with which new transactions are intersected, and the Carpenter algorithm, which enumerates and intersects candidate transaction sets. These approaches yield the set of so-called closed frequent item sets, since any such item set can be represented as the intersection of some subset of the given transactions. In this paper we describe a considerably improved implementation scheme of the cumulative approach, which relies on a prefix tree representation of the already found intersections. In addition, we present an improved way of implementing the Carpenter algorithm. We demonstrate that on specific data sets, which occur particularly often in the area of gene expression analysis, our implementations significantly outperform enumeration approaches to frequent item set mining.

31 citations
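A minimal sketch of the basic cumulative intersection scheme described above follows. It omits the paper's prefix-tree repository, so it is much slower. It keeps a dictionary of closed item sets and their supports. Each new transaction is added to the repository and intersected with every stored set. An intersection's support is the largest support of a stored set that produces it, plus one. This illustrates the general idea under those assumptions and is not the authors' implementation.

```python
def closed_itemsets(transactions, min_support):
    """Closed item sets with support >= min_support, via repeated intersection."""
    repo = {}  # closed item set -> support among the transactions seen so far
    for t in transactions:
        t = frozenset(t)
        if not t:
            continue
        new = dict(repo)
        for s, supp in repo.items():
            inter = s & t
            if inter:
                # inter occurs in every transaction containing s, plus in t itself
                new[inter] = max(new.get(inter, 0), supp + 1)
        new[t] = max(new.get(t, 0), 1)
        repo = new
    return {s: n for s, n in repo.items() if n >= min_support}

db = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
result = closed_itemsets(db, min_support=2)
print(sorted((sorted(s), n) for s, n in result.items()))
# [(['a', 'b'], 2), (['b'], 3), (['b', 'c'], 2)]
```

The repository grows with the number of distinct intersections rather than with the number of candidate item sets. That is why this family of methods suits data with few transactions and many items, such as the gene expression data mentioned above.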


Proceedings Article · DOI
06 Sep 2011
TL;DR: The improved Apriori algorithm generates far fewer item sets, significantly shortens the running time, and thereby improves the performance of the algorithm.
Abstract: The paper analyzes the basic ideas and shortcomings of the Apriori algorithm and studies the current major strategies for improving it. Because the algorithm's performance and efficiency suffer from generating many candidate sets and scanning the transaction database repeatedly, the paper studies pruning optimization and transaction reduction strategies and, on this basis, puts forward an improved Apriori algorithm based on pruning optimization and transaction reduction. A performance comparison in simulation experiments shows that the improved algorithm generates far fewer item sets, significantly shortens the running time, and improves performance.
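The abstract above does not spell out its reduction rules. One widely used form, assumed here, relies on a simple fact: a transaction that contains no frequent k-itemset, or has at most k items, cannot contain any frequent (k+1)-itemset. Such a transaction can therefore be skipped in later database scans. A minimal sketch, with hypothetical names:

```python
def reduce_transactions(transactions, frequent_k, k):
    """Keep only transactions that could still contain a frequent (k+1)-itemset.

    frequent_k: set of frozensets, the frequent k-itemsets found at level k.
    """
    kept = []
    for t in transactions:
        items = frozenset(t)
        if len(items) <= k:
            continue  # too short to hold any (k+1)-itemset
        if any(f <= items for f in frequent_k):
            kept.append(t)  # contains at least one frequent k-itemset
    return kept

db = [{"a", "b", "c"}, {"a", "d"}, {"c"}]
l2 = {frozenset("ab"), frozenset("bc")}
print(len(reduce_transactions(db, l2, k=2)))  # 1
```

A stricter test would require at least k+1 frequent k-itemsets in the transaction, since every (k+1)-itemset has k+1 subsets of size k that must all be frequent.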

Journal Article · DOI
TL;DR: A new algorithm based on a Quantum Swarm Evolutionary approach is proposed to extract the best rules within a reasonable execution time, though without always guaranteeing optimal solutions; it gives better results than genetic algorithms.
Abstract: Association rule mining aims to extract the correlations or causal structures existing among a set of frequent items or attributes in a database. These associations are represented by means of rules. Association rule mining methods provide a robust but non-linear approach to finding associations. The search for association rules is an NP-complete problem. The complexity arises mainly from the huge number of database transactions and items. In this article we propose a new algorithm that extracts the best rules within a reasonable execution time, though without always guaranteeing optimal solutions. The new algorithm is based on a Quantum Swarm Evolutionary approach and gives better results than genetic algorithms.

Journal Article · DOI
TL;DR: A transaction database distribution scheme is introduced that divides the frequent item set mining task in a top-down fashion; it is used in the design of two new parallel frequent item set mining algorithms that replicate the items corresponding to the separator.
Abstract: We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails an amount of data replication, which may be reduced by setting appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator and NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with a small-to-medium number of processors for synthetic and real-world databases.
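The key observation above can be sketched in a few lines. Every frequent item set is a clique in the item graph, because all of its pairs are frequent. So once the separator vertices are removed, no frequent item set can span two of the remaining components. Each component, together with the separator, therefore defines a subdatabase that can be mined on its own, with the separator items replicated. In this sketch the separator is supplied by hand (the paper obtains it by graph partitioning). The projection approach and function names are assumptions, and NoClique's load balancing is not shown.

```python
from itertools import combinations

def item_graph(transactions, min_count):
    """Vertices: frequent items. Edges: frequent 2-itemsets (as sorted pairs)."""
    counts, pair_counts = {}, {}
    for t in transactions:
        items = sorted(set(t))
        for i in items:
            counts[i] = counts.get(i, 0) + 1
        for p in combinations(items, 2):
            pair_counts[p] = pair_counts.get(p, 0) + 1
    vertices = {i for i, c in counts.items() if c >= min_count}
    edges = {p for p, c in pair_counts.items() if c >= min_count}
    return vertices, edges

def components_without(vertices, edges, separator):
    """Connected components of the item graph after deleting the separator."""
    rest = vertices - separator
    adj = {v: set() for v in rest}
    for a, b in edges:
        if a in rest and b in rest:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(rest):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_database(transactions, vertices, edges, separator):
    """One projected subdatabase per component, each also holding the separator items."""
    parts = []
    for comp in components_without(vertices, edges, separator):
        keep = comp | separator
        projected = [set(t) & keep for t in transactions]
        parts.append([t for t in projected if t])
    return parts

db = [{"a", "b"}, {"a", "b"}, {"b", "c"}, {"b", "c"}, {"c", "d"}, {"c", "d"}]
v, e = item_graph(db, min_count=2)
parts = split_database(db, v, e, separator={"c"})
print([sorted(set().union(*p)) for p in parts])  # [['a', 'b', 'c'], ['c', 'd']]
```

Item sets made only of separator items (here `{c}`) are found in every part, which is the redundant work that NoClique and NoClique2 handle in different ways.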

Proceedings Article · DOI
26 Jul 2011
TL;DR: This paper points out the bottleneck of the classical Apriori algorithm and presents an improved association rule mining algorithm that reduces the number of times candidate sets are scanned and uses a hash tree to store candidate itemsets.
Abstract: This paper points out the bottleneck of the classical Apriori algorithm and presents an improved association rule mining algorithm. The new algorithm is based on reducing the number of times candidate sets are scanned and on using a hash tree to store candidate itemsets. According to the running results of the algorithm, the mining time decreases and the efficiency of the algorithm improves.

Proceedings Article · DOI
29 Jul 2011
TL;DR: A new optimization algorithm called APRIORI-IMPROVE is proposed to address the shortcomings of Apriori; it uses a hash structure to generate L2, and an efficient horizontal data representation and an optimized storage strategy to save time and space.
Abstract: This study proposes a new optimization algorithm called APRIORI-IMPROVE to address the shortcomings of Apriori. APRIORI-IMPROVE optimizes 2-itemset generation, transaction compression, and other steps. It uses a hash structure to generate L2, and an efficient horizontal data representation and an optimized storage strategy to save time and space. The performance study shows that APRIORI-IMPROVE is much faster than Apriori.
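The abstract above says only that a hash structure is used to generate L2. One simple reading, assumed here, is to hash every 2-item combination of each transaction into a counter during a single pass, so the candidate set C2 never has to be built by joining L1 with itself. Python's `Counter` (a `dict` subclass) is the hash table in this sketch; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_count):
    """Count all 2-itemsets in one pass with a hash table and keep the frequent ones."""
    pair_counts = Counter()
    for t in transactions:
        # sorted() gives each pair one canonical (smaller, larger) key
        for pair in combinations(sorted(set(t)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

print(frequent_pairs([{"a", "b", "c"}, {"a", "b"}, {"b", "c"}], 2))
# {('a', 'b'): 2, ('b', 'c'): 2}
```

The trade-off is memory: the table holds every distinct pair seen, which grows quadratically with transaction length, so in practice the transactions are often first filtered down to frequent single items.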

Journal Article · DOI
01 Jan 2011
TL;DR: RNIA is extended by introducing a stability factor that allows rules to be evaluated more flexibly, and by developing a question-answering functionality that lets decision makers analyze data gathered in NISs when no pre-extracted rules address the specified conditions.
Abstract: Rough Non-deterministic Information Analysis (RNIA) is a rough set-based data analysis framework for Non-deterministic Information Systems (NISs). The RNIA-related algorithms and software tools for rule generation developed so far capture the characteristics of NISs well and can be successfully applied to decision making based on non-deterministic data. In this paper, we extend RNIA by introducing a stability factor that allows rules to be evaluated more flexibly, and by developing a question-answering functionality that lets decision makers analyze data gathered in NISs when no pre-extracted rules address the specified conditions.

Journal Article · DOI
TL;DR: An adaptation of association rules for label ranking is proposed and illustrated with the APRIORI algorithm; it essentially consists of using variations of the support and confidence measures based on ranking similarity functions.
Abstract: Recently, a number of learning algorithms have been adapted for label ranking, including instance-based and tree-based methods. In this paper, we propose an adaptation of association rules for label ranking. The adaptation, which is illustrated in this work with APRIORI Algorithm, essentially consists of using variations of the support and confidence measures based on ranking similarity functions that are suitable for label ranking. We also adapt the method to make a prediction from the possibly conflicting consequents of the rules that apply to an example. Despite having made our adaptation from a very simple variant of association rules for classification, the results clearly show that the method is making valid predictions. Additionally, they show that it competes well with state-of-the-art label ranking algorithms.

01 Jan 2011
TL;DR: This paper proposes an improved algorithm that uses Ant Colony Optimization (ACO) to optimize the results generated by the Apriori algorithm.
Abstract: Association rule mining is an important topic in the data mining field. Consider a large database of customer transactions, where each transaction consists of the items purchased by a customer in one visit. The Apriori algorithm generates all significant association rules between items in the database. Building on association rule mining and the Apriori algorithm, this paper proposes an improved algorithm based on the Ant Colony Optimization algorithm, which is used to optimize, and thereby improve, the results produced by the Apriori algorithm. Ant Colony Optimization (ACO) is a metaheuristic inspired by the foraging behavior of ant colonies. ACO was introduced by Dorigo and has evolved significantly in the last few years.

Journal Article · DOI
TL;DR: A Neural Network Associative Classification system is used in this paper to improve classification accuracy, and its performance is compared with the previous Classification Based Association approach on four datasets from the UCI machine learning repository.
Abstract: Classification and association rule mining are two basic tasks of Data Mining. Classification rule mining is used to discover a small set of rules in the database to form an accurate classifier. Association rule mining has been used to reveal all interesting relationships in a potentially large database. The Apriori approach, which was used to generate association rules from frequent patterns, turns out to generate a huge, time-intensive query known as an iceberg query. Various studies have been done on Apriori-like approaches to improve the performance of frequent pattern mining tasks, but the results were not as good as expected because of the many scans over the dataset. This project aims to propose a flexible way of mining frequent patterns by extending the idea of Associative Classification methods. For better performance, the Neural Network Association Classification system is proposed here as one of the approaches for building accurate and efficient classifiers. In this paper, the Neural Network Association Classification system is used in order to improve accuracy. The structure of the network reflects the knowledge uncovered in the previous discovery phase. The trained network is then used to classify unseen data. The performance of the Neural Network Associative Classification system is compared with the previous Classification Based Association on four datasets from the UCI machine learning repository.

01 Jan 2011
TL;DR: This paper looks at the use of missing-value and clustering algorithms for crime data using data mining, and uses a semi-supervised learning technique for knowledge discovery from the crime records and to help increase predictive accuracy.
Abstract: Crime is a behavior disorder that is an integrated result of social, economic, and environmental factors. Crimes are a social nuisance and cost our society dearly in several ways. Any research that can help in solving crimes faster will pay for itself. In this paper we look at the use of missing-value and clustering algorithms for crime data using data mining. We look at the MV algorithm and the Apriori algorithm, with some enhancements, to aid in filling missing values and identifying crime patterns. We applied these techniques to real crime data from a city police department. We also use a semi-supervised learning technique for knowledge discovery from the crime records and to help increase predictive accuracy.

Proceedings Article · DOI
01 Sep 2011
TL;DR: The mining results reveal correlations in the data (association rules), with their support and confidence, which can be analyzed to give the owners of Minimarket X additional input for further decisions.
Abstract: Market-Basket Analysis is a process that analyzes buyers' habits to find relationships between the different items in their market baskets. Discovering these relationships can help merchants develop sales strategies by considering the items that customers frequently purchase together. In this research, data mining with the market basket analysis method is implemented to analyze customers' buying habits. Testing is conducted in Minimarket X. The Apriori algorithm is used to search for frequent itemsets, that is, items that often appear in the database and pairs of items that appear together in one transaction. Pairs of items that exceed the minimum support are selected as frequent itemsets. Frequent itemsets that exceed the minimum support generate association rules after decoding. Each frequent itemset can generate association rules whose confidence is then computed, using hybrid-dimension association rules. The test results show that the application can report which products customers frequently buy at the same time, according to the Hybrid-dimension Association Rules criteria. Results from the mining process show correlations in the data (association rules), including their support and confidence, that can be analyzed. This information gives the owners of Minimarket X additional considerations for further decisions.
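The support and confidence figures mentioned above follow the standard definitions: support(X ⇒ Y) is the fraction of transactions containing X ∪ Y, and confidence is support(X ∪ Y) / support(X). A minimal sketch with made-up basket data follows; the hybrid-dimension rule format used in the study is not modeled, and the function name is hypothetical.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs."""
    lhs = set(lhs)
    both = lhs | set(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(rule_metrics(baskets, {"bread"}, {"milk"}))  # (0.5, 0.6666666666666666)
```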

01 Jan 2011
TL;DR: This paper looks at the use of missing-value and clustering algorithms for crime data using data mining, and uses a semi-supervised learning technique for knowledge discovery from the crime records to help increase the predictive accuracy of the MV and Apriori algorithms.
Abstract: Crime is behavior that deviates from accepted norms and causes people losses and harm. Crimes are a social nuisance and cost our society dearly in several ways. In this paper we look at the use of missing-value and clustering algorithms for crime data using data mining. We look at the MV algorithm and the Apriori algorithm, with some enhancements, to aid in filling missing values and identifying crime patterns. We applied these techniques to real crime data. Crime prevention is a significant issue that people have been dealing with for centuries. We also use a semi-supervised learning technique for knowledge discovery from the crime records and to help increase predictive accuracy.
Index Terms: Crime-patterns, clustering, data mining, law-enforcement, Apriori.

01 Jan 2011
TL;DR: This paper illustrates the disadvantages of the Apriori algorithm and how the use of attributes can improve its efficiency.
Abstract: In data mining, a number of algorithms have been proposed, each with a different objective, and a lot of research has been done on these various data mining fields and algorithms. Extracting valuable data from large datasets is an emerging problem. The Apriori algorithm is an algorithm for extracting association rules from a dataset. However, Apriori is not efficient, because it is time-consuming on large datasets. Over time, a number of changes have been proposed to Apriori to enhance its performance in terms of time and the number of database passes. This paper illustrates the disadvantages of the Apriori algorithm and how the use of attributes can improve its efficiency.
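To make the cost discussed above concrete, here is a minimal level-wise Apriori sketch. Each level joins frequent (k-1)-itemsets into k-candidates, prunes candidates that have an infrequent (k-1)-subset, and then makes a full pass over the database to count supports. That per-level pass is the source of the repeated database scans the abstract refers to. The simple pairwise join used here is an illustrative choice, not an optimized candidate generator.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unite pairs of frequent (k-1)-itemsets that produce a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent (anti-monotone property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # One full database pass per level to count candidate supports.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: n for s, n in counts.items() if n >= min_count}
        result.update(level)
        k += 1
    return result

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(db, min_count=3)
print(len(freq), frozenset("abc") in freq)  # 6 False
```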

Book Chapter · DOI
25 Jun 2011
TL;DR: In this article, a software tool for rule generation in incomplete information databases is developed, focusing on three kinds of information incompleteness: non-deterministic information, missing values, and intervals.
Abstract: This paper advances rule generation in Lipski's incomplete information databases, and develops a software tool for rule generation. We focus on three kinds of information incompleteness. The first is non-deterministic information, the second is missing values, and the third is intervals. For intervals, we introduce the concept of a resolution. The three kinds of information incompleteness are handled uniformly by the NIS-Apriori algorithm. An overview of a prototype system in Prolog is presented.

01 Jan 2011
TL;DR: This paper elaborates upon the use of association rule mining in extracting patterns that occur frequently within a dataset and showcases the implementation of the Apriori algorithm in mining association rules from a dataset containing sales transactions of a retail store.
Abstract: Computers and software play an integral part in the working of businesses and organisations. An immense amount of data is generated with the use of software. These large datasets need to be analysed for useful information that would benefit organisations, businesses and individuals by supporting decision making and providing valuable knowledge. Data mining is an approach that aids in fulfilling this requirement. Data mining is the process of applying mathematical, statistical and machine learning techniques on large quantities of data (such as a data warehouse) with the intention of uncovering hidden patterns, often previously unknown. Data mining involves three general approaches to extracting useful information from large data sets, namely, classification, clustering and association rule mining. This paper elaborates upon the use of association rule mining in extracting patterns that occur frequently within a dataset and showcases the implementation of the Apriori algorithm in mining association rules from a dataset containing sales transactions of a retail store.

Proceedings Article · DOI
01 Dec 2011
TL;DR: A method to hide fuzzy association rules is proposed, in which the fuzzified data is mined using a modified Apriori algorithm to extract rules and identify sensitive rules.
Abstract: Data mining is the process of extracting hidden patterns from data. With data growing at a tremendous rate, data mining is essential for extracting useful information. Association rule mining is a method of finding correlation relationships among a large set of data items. A rule is characterized as sensitive if its disclosure risk is above a certain confidence value. Sensitive rules should not be disclosed to the public, as they can be used to infer sensitive data and give business competitors an advantage. Techniques for hiding association rules are limited to binary items, but real-world data consists of quantitative values. In this paper, a method to hide fuzzy association rules is proposed, in which the fuzzified data is mined using a modified Apriori algorithm to extract rules and identify sensitive rules. The sensitive rules are hidden by decreasing the support value of the Right Hand Side (RHS) of the rule. A framework for automated generation of membership functions is also proposed. Experimental results of the proposed approach demonstrate efficient information hiding with minimal side effects.
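A simplified sketch of the hiding step above follows, on binary (not fuzzified) data. While the rule X ⇒ Y still has confidence at or above the threshold, it removes one RHS item from a transaction that supports X ∪ Y. This lowers support(X ∪ Y) and hence the confidence. Working on crisp items instead of fuzzy values, the order in which transactions are edited, and the choice of which RHS item to drop are all assumptions. Membership-function generation is not shown, and the function name is hypothetical.

```python
def hide_rule(transactions, lhs, rhs, max_conf):
    """Edit copies of the transactions until confidence(lhs => rhs) drops below max_conf.

    rhs must be non-empty; the input transactions are not modified.
    """
    lhs, rhs = set(lhs), set(rhs)
    both = lhs | rhs
    db = [set(t) for t in transactions]  # work on copies

    def confidence():
        n_lhs = sum(1 for t in db if lhs <= t)
        n_both = sum(1 for t in db if both <= t)
        return n_both / n_lhs if n_lhs else 0.0

    for t in db:
        if confidence() < max_conf:
            break
        if both <= t:
            t.discard(min(rhs))  # drop one RHS item (arbitrary but deterministic pick)
    return db

data = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}]
out = hide_rule(data, {"a"}, {"b"}, max_conf=0.5)
print(sum("b" in t for t in out))  # 1
```

Lowering only the support of X ∪ Y, rather than the support of X, leaves the LHS items untouched, which tends to limit side effects on other rules that share them.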

Journal Article · DOI
TL;DR: An algorithm is presented for generating a sample from the database that can replace the entire database for generating association rules, aiming to balance accuracy and speed.
Abstract: Classical data mining algorithms require expensive passes over the entire database to generate frequent items and hence association rules. As databases grow, it becomes very difficult to handle large amounts of data for computation. One solution to this problem is to generate a sample from the database that acts as a representative of the entire database for finding association rules, such that the distance of the sample from the complete database is minimal. Choosing a correct sample that represents the data is not an easy task. Many algorithms have been proposed in the past; some are computationally fast, while others give better accuracy. In this paper, we present an algorithm for generating a sample from the database that can replace the entire database for generating association rules and aims to keep a balance between accuracy and speed. The proposed algorithm takes into account the average number of small, medium and large 1-itemsets in the database and the average weight of the transactions to define a threshold condition for the transactions. The set of transactions that satisfies the threshold condition is chosen as the representative of the entire database. The effectiveness of the proposed algorithm has been tested over several runs on databases generated by the IBM synthetic data generator. A detailed comparative performance evaluation of the proposed technique against existing sampling techniques, in terms of accuracy and speed, has also been carried out.

Proceedings Article · DOI
17 Sep 2011
TL;DR: This paper introduces the concept of data mining and one of its important branches, association rules; it describes the basic concepts of association rules and the basic model for mining them, and introduces the classical association rule algorithm.
Abstract: This paper introduces the concept of data mining and one of its important branches, association rules. It describes the basic concepts of association rules and the basic model of mining association rules, introduces the classical association rule algorithm, and then discusses association rule mining by category from several angles, such as breadth-first search, depth-first search, partitioning, sampling, and incremental updating. Finally, the paper discusses the prospects of association rule mining.

Journal Article
TL;DR: The performance study shows that the FP-growth method is efficient and scalable and is about an order of magnitude faster than the Apriori algorithm.
Abstract: Association rule mining is one of the most popular data mining methods. However, mining association rules often results in a very large number of rules, leaving the analyst with the task of going through all of them to discover the interesting ones. In this paper, we present a performance comparison of the Apriori and FP-growth algorithms. Performance is analyzed based on execution time for different numbers of instances and confidence values on a supermarket data set. These algorithms are presented together with some experimental data. Our performance study shows that the FP-growth method is efficient and scalable and is about an order of magnitude faster than the Apriori algorithm.

Journal Article · DOI
TL;DR: The experimental results demonstrate the effectiveness of the proposed approach in generating high utility association rules that can be lucratively applied for business development.
Abstract: Association rule mining has been an area of active research in the field of knowledge discovery, and numerous algorithms have been developed to this end. Of late, data mining researchers have improved the quality of association rule mining for business development by incorporating influential factors such as value (utility) and quantity of items sold (weight) into the mining of association patterns. In this paper, we propose an efficient approach based on weight factor and utility for effective mining of significant association rules. Initially, the proposed approach uses the traditional Apriori algorithm to generate a set of association rules from a database. The proposed approach exploits the anti-monotone property of the Apriori algorithm, which states that for a k-itemset to be frequent, all of its (k-1)-subsets must also be frequent. Subsequently, the mined association rules are subjected to weightage (W-gain) and utility (U-gain) constraints, and for every mined association rule a combined utility weighted score (UW-Score) is computed. Ultimately, we determine a subset of valuable association rules based on the computed UW-Score. The experimental results demonstrate the effectiveness of the proposed approach in generating high utility association rules that can be lucratively applied for business development. Key words: Association rule mining (ARM), frequent itemset, utility, weightage, apriori, utility gain (U-gain), weighted gain (W-gain), utility factor (U-factor), utility weighted score (UW-score).