Showing papers on "Apriori algorithm" published in 2011

Proceedings Article•DOI•

The Strategy of Mining Association Rule Based on Cloud Computing

Lingjuan Li¹, Min Zhang¹•Institutions (1)

29 Jul 2011

TL;DR: The results show that the strategy designed in this paper can achieve higher efficiency when mining frequent item sets in a cloud computing environment.

Abstract: Cloud computing provides cheap and efficient solutions for storing and analyzing massive data. It is important to study data mining strategies based on cloud computing from both theoretical and practical points of view. This paper focuses on a strategy for mining association rules in a cloud computing environment. First, cloud computing, Hadoop, the MapReduce programming model, the Apriori algorithm, and parallel association rule mining algorithms are introduced. Then, a parallel association rule mining strategy adapted to the cloud computing environment is designed. It includes a data set division method, a data set allocation method, an improved Apriori algorithm, and the implementation procedure of the improved Apriori algorithm on MapReduce. Finally, a Hadoop platform is built and experiments are run to test the performance of the strategy and of the improved algorithm. The results show that the strategy designed in this paper can achieve higher efficiency when mining frequent item sets in a cloud computing environment.

108 citations
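
The abstract above describes running Apriori's counting step on MapReduce. As a rough illustration only, one counting pass can be simulated in a single Python process as a map phase over data splits followed by a reduce phase. This is a naive sketch, not the paper's improved algorithm or its data division method: it enumerates every k-subset instead of a pruned candidate list, and the names `map_split`, `reduce_counts`, and `count_pass` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_split(transactions, k):
    """Map phase: emit (k-itemset, 1) for every k-subset of each transaction."""
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            yield itemset, 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each itemset."""
    totals = defaultdict(int)
    for itemset, n in pairs:
        totals[itemset] += n
    return dict(totals)

def count_pass(splits, k, min_count):
    """One pass: map every split independently, then reduce and filter."""
    emitted = (pair for split in splits for pair in map_split(split, k))
    totals = reduce_counts(emitted)
    return {s: c for s, c in totals.items() if c >= min_count}

# Two splits stand in for data blocks assigned to different workers.
splits = [
    [["bread", "milk"], ["bread", "beer", "milk"]],
    [["beer", "milk"], ["bread", "milk", "eggs"]],
]
frequent_pairs = count_pass(splits, k=2, min_count=2)
print(frequent_pairs)
```

Because each split is mapped independently, the map calls could in principle run on separate nodes; only the reduce step needs to see all emitted pairs for a given itemset.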

Proceedings Article•DOI•

GPApriori: GPU-Accelerated Frequent Itemset Mining

Fan Zhang¹, Yan Zhang¹, Jason D. Bakos¹•Institutions (1)

University of South Carolina¹

26 Sep 2011

TL;DR: GPApriori is a GPU-accelerated implementation of Frequent Itemset Mining (FIM) that demonstrates the potential of GPGPUs for speeding up data mining algorithms.

Abstract: In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation with an Nvidia Tesla T10 graphics processor and demonstrate up to 100X speedup as compared with several state-of-the-art FIM algorithms on a CPU. In order to map the Apriori algorithm onto the SIMD execution model, we have designed a "static bitset" memory structure to represent the input database. This data structure improves upon the traditional approach of the vertical data layout in state-of-the-art Apriori implementations. In our implementation, we perform a parallelized version of the support counting step on the GPU. Experimental results show that GPApriori consistently outperforms CPU-based Apriori implementations. Our results demonstrate the potential for GPGPUs in speeding up data mining algorithms.

70 citations
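
To make the bitset idea in the abstract above concrete, here is a small CPU-side sketch of support counting over a vertical bitset layout. It is an illustration under assumptions, not the paper's GPU "static bitset" structure: Python integers stand in for fixed-width bit arrays, and `build_bitsets` and `support` are hypothetical names.

```python
def build_bitsets(transactions):
    """Vertical layout: one integer per item; bit i is set if transaction i contains the item."""
    bitsets = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitsets[item] = bitsets.get(item, 0) | (1 << i)
    return bitsets

def support(itemset, bitsets):
    """Support of a non-empty itemset: popcount of the AND of its items' bitsets."""
    items = list(itemset)
    acc = bitsets.get(items[0], 0)
    for item in items[1:]:
        acc &= bitsets.get(item, 0)
    return bin(acc).count("1")

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitsets = build_bitsets(transactions)
print(support(("a", "b"), bitsets))       # transactions 0 and 2 contain both items
print(support(("a", "b", "c"), bitsets))  # only transaction 2 contains all three
```

The AND and popcount operations are the same for every candidate, which is the kind of uniform work that maps naturally onto SIMD hardware.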

Journal Article•DOI•

To mine association rules of customer values via a data mining procedure with improved model: An empirical case study

Wen-Yu Chiang¹•Institutions (1)

Aletheia University¹

01 Mar 2011-Expert Systems With Applications

TL;DR: A new procedure and an improved model for mining association rules of customer values are proposed; the resulting rules are suggested for use in the customized marketing function of a CRM system to raise customers' value grades.

Abstract: This paper proposes a new procedure and an improved model to mine association rules of customer values. The research area is the online shopping market in Taiwan. Ward's method is used to partition the online shopping market into three markets. Customer values are derived from an improved RFMDR model (based on the RFM/RFMD model). A supervised Apriori algorithm is then applied to the customer values to create association rules. The resulting rules are suggested for use in the customized marketing function of a CRM system to raise customers' value grades.

69 citations

Book Chapter•DOI•

RP-Tree: rare pattern tree mining

Sidney Tsang¹, Yun Sing Koh¹, Gillian Dobbie¹•Institutions (1)

University of Auckland¹

29 Aug 2011

TL;DR: RP-Tree is proposed, a method for mining a subset of rare association rules using a tree structure, and an information gain component that helps to identify the more interesting association rules.

Abstract: Most association rule mining techniques concentrate on finding frequent rules. However, rare association rules are in some cases more interesting than frequent association rules since rare rules represent unexpected or unknown associations. All current algorithms for rare association rule mining use an Apriori level-wise approach which has computationally expensive candidate generation and pruning steps. We propose RP-Tree, a method for mining a subset of rare association rules using a tree structure, and an information gain component that helps to identify the more interesting association rules. Empirical evaluation using a range of real world datasets shows that RP-Tree itemset and rule generation is more time efficient than modified versions of FP-Growth and ARIMA, and discovers 92-100% of all the interesting rare association rules.

68 citations

Journal Article•DOI•

An Improved Apriori Algorithm Based On the Boolean Matrix and Hadoop

Honglie Yu¹, Jun Wen¹, Hongmei Wang¹, Li Jun¹•Institutions (1)

University of Electronic Science and Technology of China¹

01 Jan 2011-Procedia Engineering

TL;DR: This paper puts forward an improved Apriori algorithm for mining association rules that addresses the bottlenecks of the traditional Apriori algorithm.

60 citations

Journal Article•DOI•

An Enhanced Algorithm to Predict a Future Crime using Data Mining

A. Malathi, S. Santhosh Baboo

31 May 2011-International Journal of Computer Applications

TL;DR: This paper looks at the use of missing value and clustering algorithms in a data mining approach to help predict crime patterns and speed up the process of solving crimes.

Abstract: about national security has increased after the 26/11 Mumbai attack. In this paper we look at the use of missing value and clustering algorithms in a data mining approach to help predict crime patterns and speed up the process of solving crimes. We concentrate on the MV algorithm and the Apriori algorithm, with some enhancements, to aid in filling missing values and identifying crime patterns. We applied these techniques to real crime data. We also use a semi-supervised learning technique for knowledge discovery from the crime records and to help increase the predictive accuracy.

45 citations

Performance comparison of apriori and FP-growth algorithms in generating association rules

Daniel Hunyadi¹•Institutions (1)

Lucian Blaga University of Sibiu¹

28 Apr 2011

TL;DR: A performance comparison of the Apriori and FP-Growth algorithms in generating association rules is presented; both are implemented in RapidMiner and the results obtained from the data processing are analyzed in SPSS.

Abstract: In this article we present a performance comparison between the Apriori and FP-Growth algorithms in generating association rules. The two algorithms are implemented in RapidMiner and the results obtained from the data processing are analyzed in SPSS. The database used in developing the processes contains a series of transactions from an online shop.

43 citations

Proceedings Article•DOI•

An Improved Frequent Pattern Tree Based Association Rule Mining Technique

A. B. M. Rezbaul Islam¹, Tae-Sun Chung¹•Institutions (1)

Ajou University¹

26 Apr 2011

TL;DR: A new and improved FP tree with a table and a new algorithm for mining association rules are proposed; the algorithm mines all possible frequent item sets without generating conditional FP trees.

Abstract: Discovery of association rules among a large number of item sets is considered an important aspect of data mining. The ever-increasing demand for finding patterns in large data drives association rule mining. Researchers have developed many algorithms and techniques for determining association rules. The main problem is the generation of candidate sets. Among the existing techniques, the frequent pattern growth (FP-growth) method is the most efficient and scalable approach. It mines frequent item sets without candidate set generation. The main obstacle of FP-growth is that it generates a massive number of conditional FP trees. In this research paper, we propose a new and improved FP tree with a table and a new algorithm for mining association rules. This algorithm mines all possible frequent item sets without generating conditional FP trees. It also provides the frequency of frequent items, which is used to estimate the desired association rules.

36 citations

Book Chapter•DOI•

Item set mining based on cover similarity

Marc Segond, Christian Borgelt

24 May 2011

TL;DR: This work presents an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements, and extends its approach to a total of twelve specific similarity measures and a generalized form.

Abstract: While in standard frequent item set mining one tries to find item sets the support of which exceeds a user-specified threshold (minimum support) in a database of transactions, we strive to find item sets for which the similarity of their covers (that is, the sets of transactions containing them) exceeds a user-specified threshold. Starting from the generalized Jaccard index we extend our approach to a total of twelve specific similarity measures and a generalized form. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high quality and more useful item sets.

33 citations
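
The cover-similarity idea in the abstract above can be illustrated with a short sketch. It assumes the generalized Jaccard index of an item set is the size of the intersection of its items' covers divided by the size of their union; the abstract does not spell out the formula, so treat that as an assumption. The helper names `covers` and `jaccard_cover_similarity` are hypothetical, and this is not the paper's Eclat-style mining algorithm.

```python
def covers(transactions):
    """Map each item to its cover: the set of ids of the transactions containing it."""
    cov = {}
    for tid, t in enumerate(transactions):
        for item in t:
            cov.setdefault(item, set()).add(tid)
    return cov

def jaccard_cover_similarity(itemset, cov):
    """Assumed generalized Jaccard index: |intersection of covers| / |union of covers|."""
    item_covers = [cov.get(i, set()) for i in itemset]
    inter = set.intersection(*item_covers)
    union = set.union(*item_covers)
    return len(inter) / len(union) if union else 0.0

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
cov = covers(transactions)
print(jaccard_cover_similarity(("a", "b"), cov))       # 2 shared of 3 covering transactions
print(jaccard_cover_similarity(("a", "b", "c"), cov))  # 1 shared of 4 covering transactions
```

An item set would then be reported when this value exceeds the user-specified similarity threshold, in place of the usual minimum-support test.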

Proceedings Article•DOI•

Finding closed frequent item sets by intersecting transactions

Christian Borgelt, Xiaoyuan Yang¹, Rubén Nogales-Cadenas², Pedro Carmona-Sáez², Alberto Pascual-Montano³•Institutions (3)

Telefónica¹, Complutense University of Madrid², Spanish National Research Council³

21 Mar 2011

TL;DR: It is demonstrated that on specific data sets, which occur particularly often in the area of gene expression analysis, the implementations of the cumulative approach significantly outperform enumeration approaches to frequent item set mining.

Abstract: Most known frequent item set mining algorithms work by enumerating candidate item sets and pruning infrequent candidates. An alternative method, which works by intersecting transactions, is much less researched. To the best of our knowledge, there are only two basic algorithms: a cumulative scheme, which is based on a repository with which new transactions are intersected, and the Carpenter algorithm, which enumerates and intersects candidate transaction sets. These approaches yield the set of so-called closed frequent item sets, since any such item set can be represented as the intersection of some subset of the given transactions. In this paper we describe a considerably improved implementation scheme of the cumulative approach, which relies on a prefix tree representation of the already found intersections. In addition, we present an improved way of implementing the Carpenter algorithm. We demonstrate that on specific data sets, which occur particularly often in the area of gene expression analysis, our implementations significantly outperform enumeration approaches to frequent item set mining.

31 citations
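
The cumulative scheme mentioned in the abstract above keeps a repository of closed item sets and intersects each new transaction with it. A minimal sketch of that idea follows, using a plain dictionary as the repository. It is not the prefix-tree implementation the paper describes, and the function name `closed_item_sets` is hypothetical.

```python
def closed_item_sets(transactions, min_support=1):
    """Find closed item sets by intersecting each transaction with a repository."""
    repo = {}  # closed item set -> support among the transactions seen so far
    for t in transactions:
        t = frozenset(t)
        if not t:
            continue
        # The transaction itself is a candidate; 0 means "not seen before".
        updates = {t: 0}
        for s, supp in repo.items():
            common = s & t
            if common:
                # Prior support of `common` is the largest support of any
                # repository set whose intersection with t equals it.
                updates[common] = max(updates.get(common, 0), supp)
        for common, supp in updates.items():
            repo[common] = supp + 1  # the new transaction also contains it
    return {s: c for s, c in repo.items() if c >= min_support}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
closed = closed_item_sets(transactions, min_support=2)
print(closed)
```

Each transaction is compared against the whole repository, so this version is quadratic in the worst case; it only shows why intersections of transactions yield exactly the closed item sets.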

Proceedings Article•DOI•

An improved Apriori algorithm based on pruning optimization and transaction reduction

Zhuang Chen¹, Shibang Cai¹, Qiulin Song¹, Chonglai Zhu¹•Institutions (1)

Chongqing University of Technology¹

06 Sep 2011

TL;DR: With the improved Apriori algorithm, the number of frequent item sets is much smaller and the running time is significantly shorter, so overall performance improves.

Abstract: The paper analyzes the basic ideas and shortcomings of the Apriori algorithm and studies its current major improvement strategies. To address the low performance and efficiency caused by generating many candidate sets and repeatedly scanning the transaction database, it studies pruning optimization and transaction reduction strategies and, on this basis, puts forward an improved Apriori algorithm based on pruning optimization and transaction reduction. According to the performance comparison in the simulation experiment, with the improved algorithm the number of frequent item sets is much smaller and the running time is significantly shorter, so overall performance improves.
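
The two strategies named in the abstract above can be sketched in a textbook form. The code below is a generic illustration, not the paper's specific optimization: candidates are pruned when any (k-1)-subset is infrequent, and a transaction is dropped once it is too short or contains no frequent k-itemset, since it can no longer support a larger one. The function name `apriori_reduced` is hypothetical.

```python
from itertools import combinations

def apriori_reduced(transactions, min_count):
    """Apriori with subset-based pruning and transaction reduction."""
    db = [frozenset(t) for t in transactions]
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        # Transaction reduction: keep only transactions that could still
        # contain a frequent (k+1)-itemset.
        db = [t for t in db if len(t) > k and any(s <= t for s in level)]
        k += 1
        prev = set(level)
        # Join step, then prune candidates with an infrequent (k-1)-subset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
    return frequent

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["e"]]
result = apriori_reduced(transactions, min_count=2)
print(result)
```

In this example the database shrinks from five transactions to four after the first level and to one after the second, so later scans touch less data.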

Journal Article•DOI•

A Quantum Swarm Evolutionary Algorithm for mining association rules in large databases

Mourad Ykhlef¹•Institutions (1)

King Saud University¹

01 Jan 2011-Journal of King Saud University - Computer and Information Sciences archive

TL;DR: A new algorithm is proposed that extracts the best rules in a reasonable execution time, though without always guaranteeing optimal solutions; it is based on the Quantum Swarm Evolutionary approach and gives better results than genetic algorithms.

Abstract: Association rule mining aims to extract the correlation or causal structure existing between a set of frequent items or attributes in a database. These associations are represented by means of rules. Association rule mining methods provide a robust but non-linear approach to finding associations. The search for association rules is an NP-complete problem. The complexity mainly arises from exploiting the huge number of database transactions and items. In this article we propose a new algorithm that extracts the best rules in a reasonable execution time, though without always guaranteeing optimal solutions. The newly derived algorithm is based on the Quantum Swarm Evolutionary approach; it gives better results than genetic algorithms.

Journal Article•DOI•

Parallel Frequent Item Set Mining with Selective Item Replication

Eray Özkural¹, Bora Uçar², Cevdet Aykanat¹•Institutions (2)

Bilkent University¹, École normale supérieure de Lyon²

01 Oct 2011-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion is introduced and used in the design of two new parallel frequent item set mining algorithms, both of which replicate the items that correspond to the separator.

Abstract: We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails an amount of data replication, which may be reduced by setting appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator and NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with small-to-medium number of processors for synthetic and real-world databases.

Proceedings Article•DOI•

The research of improved association rules mining Apriori algorithm

Huiying Wang¹, Xiangwei Liu²•Institutions (2)

Beijing Institute of Foreign Trade¹, Tianjin University of Finance and Economics²

26 Jul 2011

TL;DR: This paper points out the bottleneck of the classical Apriori algorithm and presents an improved association rule mining algorithm based on reducing the number of scans of candidate sets and using a hash tree to store candidate itemsets.

Abstract: This paper points out the bottleneck of the classical Apriori algorithm and presents an improved association rule mining algorithm. The new algorithm is based on reducing the number of scans of candidate sets and using a hash tree to store candidate itemsets. According to the results of running the algorithm, the mining time decreases and the efficiency of the algorithm improves.

Proceedings Article•DOI•

An improved apriori algorithm

Rui Chang, Zhiyi Liu

29 Jul 2011

TL;DR: A new optimization algorithm called APRIORI-IMPROVE is proposed to address the shortcomings of Apriori; it uses a hash structure to generate L2 and an efficient horizontal data representation with an optimized storage strategy to save time and space.

Abstract: This study proposes a new optimization algorithm called APRIORI-IMPROVE to address the shortcomings of Apriori. The APRIORI-IMPROVE algorithm optimizes 2-itemset generation, transaction compression, and so on. APRIORI-IMPROVE uses a hash structure to generate L2, and uses an efficient horizontal data representation and an optimized storage strategy to save time and space. The performance study shows that APRIORI-IMPROVE is much faster than Apriori.
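
As a rough illustration of generating L2 (the frequent 2-itemsets) with a hash structure, the sketch below counts pairs directly in a hash table after dropping infrequent items from each transaction. This is one common way to do it and is an assumption here; the abstract does not detail how APRIORI-IMPROVE builds its hash structure, and the name `frequent_pairs` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_count):
    """Count 2-itemsets in a hash table, skipping explicit candidate generation."""
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}
    pair_counts = Counter()
    for t in transactions:
        # Transaction compression: infrequent items cannot be part of a frequent pair.
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c for p, c in pair_counts.items() if c >= min_count}

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"]]
l2 = frequent_pairs(transactions, min_count=2)
print(l2)
```

Counting pairs this way avoids building the full candidate set C2 and testing each candidate against every transaction.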

Journal Article•DOI•

Stable rule extraction and decision making in rough non-deterministic information analysis

Hiroshi Sakai¹, Hitomi Okuma², Michinori Nakata³, Dominik Ślȩzak⁴•Institutions (4)

Kyushu Institute of Technology¹, Oita University², Josai International University³, University of Warsaw⁴

01 Jan 2011

TL;DR: RNIA is extended by introducing a stability factor that makes it possible to evaluate rules in a more flexible way, and by developing a question-answering functionality that lets decision makers analyze data gathered in NISs when there are no pre-extracted rules that address the specified conditions.

Abstract: Rough Non-deterministic Information Analysis (RNIA) is a rough set-based data analysis framework for Non-deterministic Information Systems (NISs). RNIA-related algorithms and software tools developed so far for rule generation characterize NISs well and can be successfully applied to decision making based on non-deterministic data. In this paper, we extend RNIA by introducing a stability factor that makes it possible to evaluate rules in a more flexible way, and by developing a question-answering functionality that lets decision makers analyze data gathered in NISs when there are no pre-extracted rules that address the specified conditions.

Journal Article•DOI•

Mining association rules for label ranking

Cláudio Rebelo de Sá, Carlos Soares, Alípio Mário Jorge, Paulo J. Azevedo, Joaquim Pinto da Costa

01 Jan 2011-Lecture Notes in Computer Science

TL;DR: In this paper, an adaptation of association rules for label ranking is proposed; the adaptation, illustrated with the APRIORI algorithm, essentially consists of using variations of the support and confidence measures based on ranking similarity functions.

Abstract: Recently, a number of learning algorithms have been adapted for label ranking, including instance-based and tree-based methods. In this paper, we propose an adaptation of association rules for label ranking. The adaptation, which is illustrated in this work with APRIORI Algorithm, essentially consists of using variations of the support and confidence measures based on ranking similarity functions that are suitable for label ranking. We also adapt the method to make a prediction from the possibly conflicting consequents of the rules that apply to an example. Despite having made our adaptation from a very simple variant of association rules for classification, the results clearly show that the method is making valid predictions. Additionally, they show that it competes well with state-of-the-art label ranking algorithms.

Optimization of Association Rule Mining Apriori Algorithm Using ACO

Badri Patel, Vijay K Chaudhari, Rajneesh K Karan, Y. K. Rana

01 Jan 2011

TL;DR: This paper proposes an improved algorithm that uses Ant Colony Optimization to optimize the results generated by the Apriori algorithm.

Abstract: Association rule mining is an important topic in the data mining field. Consider a large database of customer transactions, where each transaction consists of the items purchased by a customer in one visit. The Apriori algorithm generates all significant association rules between items in the database. On the basis of association rule mining and the Apriori algorithm, this paper proposes an improved algorithm based on the Ant Colony Optimization algorithm, which is used to optimize the results generated by the Apriori algorithm. The algorithm improves on the results produced by the Apriori algorithm. Ant Colony Optimization (ACO) is a metaheuristic inspired by the foraging behavior of ant colonies. ACO was introduced by Dorigo and has evolved significantly in the last few years.

Journal Article•DOI•

A Classification Technique using Associative Classification

Prachitee B. Shekhawat, Sheetal S. Dhande

30 Apr 2011-International Journal of Computer Applications

TL;DR: The Neural Network Associative Classification system is used in this paper in order to improve its accuracy and is compared with the previous Classification Based Association on four datasets from UCI machine learning repository.

Abstract: Classification and association rule mining are two basic tasks of Data Mining. Classification rule mining is used to discover a small set of rules in the database to form an accurate classifier. Association rule mining has been used to reveal all interesting relationships in a potentially large database. An Apriori approach, which was used to generate the association rules from frequent patterns, turns out to generate a huge time-intensive query called an iceberg query. Various studies have been carried out under the Apriori-like approach to improve the performance of frequent pattern mining tasks, but the results fell short of expectations because of the many scans of the dataset. This project aims to propose a flexible way of mining frequent patterns by extending the idea of the Associative Classification methods. For better performance, the Neural Network Association Classification system is proposed here as one of the approaches for building accurate and efficient classifiers. In this paper, the Neural Network Association Classification system is used in order to improve its accuracy. The structure of the network reflects the knowledge uncovered in the previous discovery phase. The trained network is then used to classify unseen data. The performance of the Neural Network Associative Classification system is compared with the previous Classification Based Association on four datasets from the UCI machine learning repository.

Algorithmic Crime Prediction Model Based on the Analysis of Crime Clusters

A. Malathi, S. Santhosh Baboo

01 Jan 2011

TL;DR: This paper looks at the use of missing value and clustering algorithms for crime data using data mining, and uses a semi-supervised learning technique for knowledge discovery from the crime records and to help increase the predictive accuracy.

Abstract: Crime is a behavior disorder that is an integrated result of social, economic and environmental factors. Crimes are a social nuisance and cost our society dearly in several ways. Any research that can help in solving crimes faster will pay for itself. In this paper we look at the use of missing value and clustering algorithms for crime data using data mining. We look at the MV algorithm and the Apriori algorithm, with some enhancements, to aid in filling missing values and identifying crime patterns. We applied these techniques to real crime data from a city police department. We also use a semi-supervised learning technique for knowledge discovery from the crime records and to help increase the predictive accuracy.

Proceedings Article•DOI•

Data mining market basket analysis using hybrid-dimension association rules, case study in Minimarket X

Djoni Haryadi Setiabudi¹, Gregorius Satia Budhi¹, I Wayan Jatu Purnama¹, Agustinus Noertjahyana¹•Institutions (1)

Petra Christian University¹

01 Sep 2011

TL;DR: Results from the mining process show a correlation between the data (association rules), including the support and confidence, that can be analyzed; this gives the owners of Minimarket X additional input for making further decisions.

Abstract: Market-Basket Analysis is a process that analyzes the habits of buyers to find the relationships between different items in their market basket. The discovery of these relationships can help the merchant develop a sales strategy by considering the items frequently purchased together by customers. In this research, data mining with the market basket analysis method is implemented to analyze the buying habits of customers. The testing is conducted in Minimarket X. The search for frequent itemsets is performed by the Apriori algorithm to get the items that often appear in the database and the pairs of items in one transaction. Pairs of items that exceed the minimum support are included in the selected frequent itemsets. Frequent itemsets that exceed the minimum support generate association rules after decoding. Each frequent itemset can generate association rules with their confidence, using hybrid-dimension association rules. The test results show that the application can report which products are frequently bought at the same time by customers according to the hybrid-dimension association rules criteria. Results from the mining process show a correlation between the data (association rules), including the support and confidence, that can be analyzed. This information gives the owners of Minimarket X additional input for making further decisions.
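
The step from a frequent itemset to rules with support and confidence, described in the abstract above, can be sketched as follows. This shows plain single-dimension rules only; the hybrid-dimension aspect of the paper is not modeled, the basket data is invented for illustration, and the names `support_count` and `rules_from_itemset` are hypothetical.

```python
from itertools import combinations

def support_count(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (lhs, rhs, support, confidence) for each rule lhs -> rhs from one itemset."""
    itemset = frozenset(itemset)
    db = [frozenset(t) for t in transactions]
    whole = support_count(itemset, db)
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            lhs_count = support_count(lhs, db)
            if lhs_count == 0:
                continue
            # confidence(lhs -> rhs) = count(lhs and rhs) / count(lhs)
            confidence = whole / lhs_count
            if confidence >= min_confidence:
                rhs = itemset - lhs
                rules.append((tuple(sorted(lhs)), tuple(sorted(rhs)),
                              whole / len(db), confidence))
    return rules

baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
rules = rules_from_itemset({"bread", "milk"}, baskets, min_confidence=0.6)
for lhs, rhs, supp, conf in rules:
    print(lhs, "->", rhs, f"support={supp:.2f} confidence={conf:.2f}")
```

Here both directions of the bread and milk rule pass the threshold: the pair occurs in 2 of 4 baskets, and each single item occurs in 3.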

Evolving Data Mining Algorithms on the Prevailing Crime Trend - An Intelligent Crime Prediction Model

A. Malathi, S. Santhosh Baboo

01 Jan 2011

TL;DR: This paper looks at the use of missing value and clustering algorithms for crime data using data mining, applying the MV algorithm and the Apriori algorithm, and uses a semi-supervised learning technique for knowledge discovery from the crime records and to help increase the predictive accuracy.

Abstract: Crime is behavior that deviates from social norms and causes people loss and harm. Crimes are a social nuisance and cost our society dearly in several ways. In this paper we look at the use of missing value and clustering algorithms for crime data using data mining. We look at the MV algorithm and the Apriori algorithm, with some enhancements, to aid in filling missing values and identifying crime patterns. We applied these techniques to real crime data. Crime prevention is a significant issue that people have been dealing with for centuries. We also use a semi-supervised learning technique in this paper for knowledge discovery from the crime records and to help increase the predictive accuracy. Index Terms: Crime-patterns, clustering, data mining, law-enforcement, Apriori.

Mining Efficient Association Rules Through Apriori Algorithm Using Attributes

Mamta Dhanda, Sonali Guglani, Gaurav Gupta

01 Jan 2011

TL;DR: This paper illustrates the disadvantages of the Apriori algorithm and the use of attributes to improve its efficiency.

Abstract: In data mining, a number of algorithms have been proposed, each with a different objective. A lot of research has been done on the various data mining fields and algorithms. Extracting valuable data from large datasets is an emerging problem. The Apriori algorithm extracts association rules from a dataset. It is not efficient, because it is time-consuming on large datasets. Over time, a number of changes to Apriori have been proposed to improve its performance in terms of time and number of database passes. This paper illustrates the disadvantages of the Apriori algorithm and the use of attributes to improve its efficiency.

Book Chapter•DOI•

A prototype system for rule generation in Lipski's incomplete information databases

Hiroshi Sakai¹, Michinori Nakata², Dominik Ślęzak³•Institutions (3)

Kyushu Institute of Technology¹, Josai International University², University of Warsaw³

25 Jun 2011

TL;DR: In this article, a software tool for rule generation in incomplete information databases is developed, focusing on three kinds of information incompleteness: non-deterministic information, missing values, and intervals.

Abstract: This paper advances rule generation in Lipski's incomplete information databases, and develops a software tool for rule generation. We focus on three kinds of information incompleteness. The first is non-deterministic information, the second is missing values, and the third is intervals. For intervals, we introduce the concept of a resolution. Three kinds of information incompleteness are uniformly handled by NIS-Apriori algorithm. An overview of a prototype system in Prolog is presented.

Using Association Rule Mining for Extracting Product Sales Patterns in Retail Store Transactions

Pramod Prasad, G. H. Raisoni, Latesh Malik

01 Jan 2011

TL;DR: This paper elaborates upon the use of association rule mining in extracting patterns that occur frequently within a dataset and showcases the implementation of the Apriori algorithm in mining association rules from a dataset containing sales transactions of a retail store.

Abstract: Computers and software play an integral part in the working of businesses and organisations. An immense amount of data is generated with the use of software. These large datasets need to be analysed for useful information that would benefit organisations, businesses and individuals by supporting decision making and providing valuable knowledge. Data mining is an approach that aids in fulfilling this requirement. Data mining is the process of applying mathematical, statistical and machine learning techniques on large quantities of data (such as a data warehouse) with the intention of uncovering hidden patterns, often previously unknown. Data mining involves three general approaches to extracting useful information from large data sets, namely, classification, clustering and association rule mining. This paper elaborates upon the use of association rule mining in extracting patterns that occur frequently within a dataset and showcases the implementation of the Apriori algorithm in mining association rules from a dataset containing sales transactions of a retail store.

Proceedings Article•DOI•

A new method for preserving privacy in quantitative association rules using DSR approach with automated generation of membership function

K. Sathiyapriya¹, G. Sudha Sadasivam¹, N. Celin¹•Institutions (1)

PSG College of Technology¹

01 Dec 2011

TL;DR: A method to hide fuzzy association rules is proposed, in which the fuzzified data is mined using a modified Apriori algorithm in order to extract rules and identify sensitive rules.

Abstract: Data mining is the process of extracting hidden patterns from data. With the explosion of data at a tremendous rate, data mining is essential to extract useful information. Association rule mining is a method of finding correlation relationships among a large set of data items. A rule is characterized as sensitive if its disclosure risk is above a certain confidence value. Sensitive rules should not be disclosed to the public, as they can be used to infer sensitive data and provide an advantage to business competitors. Techniques for hiding association rules are limited to binary items, but real-world data consists of quantitative values. In this paper, a method to hide fuzzy association rules is proposed, in which the fuzzified data is mined using a modified Apriori algorithm in order to extract rules and identify sensitive rules. The sensitive rules are hidden by decreasing the support value of the Right Hand Side (RHS) of the rule. A framework for automated generation of membership functions is also proposed. Experimental results of the proposed approach demonstrate efficient information hiding with minimum side effects.

Journal Article•DOI•

A new approach for generating efficient sample from market basket data

B. Chandra¹, Shalini Bhaskar¹•Institutions (1)

Indian Institutes of Technology¹

01 Mar 2011-Expert Systems With Applications

TL;DR: An algorithm for generating a sample from the database that can replace the entire database for generating association rules and is aimed at keeping a balance between accuracy and speed is presented.

Abstract: Classical data mining algorithms require expensive passes over the entire database to generate frequent items and hence to generate association rules. As databases grow, it is becoming very difficult to handle such large amounts of data for computation. One solution to this problem is to generate a sample from the database that acts as a representative of the entire database for finding association rules, chosen so that the distance of the sample from the complete database is minimal. Choosing a correct sample that can represent the data is not an easy task. Many algorithms have been proposed in the past. Some of them are computationally fast while others give better accuracy. In this paper, we present an algorithm for generating a sample from the database that can replace the entire database for generating association rules and is aimed at keeping a balance between accuracy and speed. The proposed algorithm takes into account the average number of small, medium and large 1-itemsets in the database and the average weight of the transactions to define a threshold condition for the transactions. The set of transactions that satisfy the threshold condition is chosen as the representative for the entire database. The effectiveness of the proposed algorithm has been tested over several runs on databases generated by the IBM synthetic data generator. A vivid comparative performance evaluation of the proposed technique against existing sampling techniques, comparing accuracy and speed, has also been carried out.

Proceedings Article•DOI•

Research of Commonly Used Association Rules Mining Algorithm in Data Mining

Ruowu Zhong¹, Huiping Wang¹•Institutions (1)

Shaoguan University¹

17 Sep 2011

TL;DR: This paper introduces data mining and one of its important branches, association rules; describes the basic concept of association rules and the basic model for mining them; and introduces the classical association rule algorithm.

Abstract: This paper introduces the concept of data mining and one of its important branches, association rules. It describes the basic concept of association rules and the basic model for mining them, introduces the classical association rule algorithm, and then discusses association rule mining by category from several angles: width, depth, partition, sampling, and incremental updating. Finally, the paper considers the prospects for association rule mining.

Journal Article•

Evaluating the performance of association rule mining algorithms

K. Vanitha, R. Santhi

07 Jul 2011-Journal of Global Research in Computer Sciences

TL;DR: The performance study shows that the FP-growth method is efficient and scalable and is about an order of magnitude faster than the Apriori algorithm.

Abstract: Association rule mining is one of the most popular data mining methods. However, mining association rules often results in a very large number of found rules, leaving the analyst with the task of going through all the rules to discover the interesting ones. In this paper, we present a performance comparison of the Apriori and FP-growth algorithms. The performance is analyzed based on execution time for different numbers of instances and confidence levels on a supermarket data set. These algorithms are presented together with some experimental data. Our performance study shows that the FP-growth method is efficient and scalable and is about an order of magnitude faster than the Apriori algorithm.

Journal Article•DOI•

Mining utility-oriented association rules: An efficient approach based on profit and quantity

Parvinder S. Sandhu, Dalvinder S. Dhaliwal

18 Jan 2011-International Journal of Physical Sciences

TL;DR: The experimental results demonstrate the effectiveness of the proposed approach in generating high utility association rules that can be lucratively applied for business development.

Abstract: Association rule mining has been an area of active research in the field of knowledge discovery and numerous algorithms have been developed to this end. Of late, data mining researchers have improved upon the quality of association rule mining for business development by incorporating influential factors like value (utility), quantity of items sold (weight) and more, for the mining of association patterns. In this paper, we propose an efficient approach based on weight factor and utility for effectual mining of significant association rules. Initially, the proposed approach makes use of the traditional Apriori algorithm to generate a set of association rules from a database. The proposed approach exploits the anti-monotone property of the Apriori algorithm, which states that for a k-itemset to be frequent, all of its (k-1)-subsets also have to be frequent. Subsequently, the set of association rules mined are subjected to weightage (W-gain) and utility (U-gain) constraints, and for every association rule mined, a combined utility weighted score (UW-Score) is computed. Ultimately, we determine a subset of valuable association rules based on the UW-Score computed. The experimental results demonstrate the effectiveness of the proposed approach in generating high utility association rules that can be lucratively applied for business development. Key words: Association rule mining (ARM), frequent itemset, utility, weightage, apriori, utility gain (U-gain), weighted gain (W-gain), utility factor (U-factor), utility weighted score (UW-score).
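
The anti-monotone property mentioned in the abstract can be illustrated with a short level-wise sketch of classical Apriori. This is a generic textbook-style version, not the authors' implementation: the function name `apriori`, the absolute-count `min_support` parameter, and the toy `baskets` are assumptions made here, and the W-gain, U-gain and UW-Score steps are not covered because the abstract does not define them.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (anti-monotone property): a candidate survives only if
        # every one of its (k-1)-subsets is itself frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the database.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy baskets (illustrative only).
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

if __name__ == "__main__":
    for itemset, count in sorted(apriori(baskets, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The prune step is where the property pays off: candidates with an infrequent subset are discarded before the database is scanned, so fewer itemsets need to be counted at each level.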
