Proceedings Article

A Survey on Association Rule Mining

21 Feb 2015-pp 212-216
TL;DR: This paper surveys association rule mining, one of the important aspects of data mining alongside clustering, classification, outlier detection, and regression.
Abstract: The task of extracting useful and interesting knowledge from large data is called data mining. It has many aspects, such as clustering, classification, association mining, outlier detection, and regression. Among them, association rule mining is one of the important aspects of data mining. The best-known example of association rule mining is market-basket analysis. Applications of association rule mining include stock analysis, web log mining, medical diagnosis, customer market analysis, and bioinformatics. In the past, researchers developed many algorithms for Boolean and fuzzy association rule mining, such as Apriori, FP-tree, and Fuzzy FP-tree. These are discussed in detail in later sections of the paper.
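
The Apriori approach named in the abstract can be illustrated with a minimal sketch of level-wise frequent itemset mining on market-basket data. The function name `apriori`, the sample `baskets`, and the threshold of 3 are illustrative assumptions, not taken from the paper, and the sketch favors readability over efficiency.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset that occurs in
    at least min_support transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Test step: count each surviving candidate against the data.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "butter", "milk"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With these baskets, every single item and every pair meets the threshold, while the three-item set appears in only two baskets and is discarded.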
Citations
Journal Article
TL;DR: This research reviews implicit aspect/feature extraction techniques from different perspectives, comparing the techniques available for implicit term extraction and briefly summarizing each.
Abstract: Sentiment analysis is a branch of text classification, defined as the process of extracting sentiment terms (i.e. feature/aspect, or opinion) and determining their semantic orientation. At the aspect level, aspect extraction is the core task of sentiment analysis, and aspects can be either implicit or explicit. The growth of sentiment analysis has resulted in the emergence of various techniques for both explicit and implicit aspect extraction. However, the majority of research attempts have targeted explicit aspect extraction, which indicates a lack of research on implicit aspect extraction. This research provides a review of implicit aspect/feature extraction techniques from different perspectives. The first perspective is a comparative analysis of the techniques available for implicit term extraction, with a brief summary of each technique. The second perspective is classifying and comparing the performance, datasets, language used, and shortcomings of the available techniques. In this study, over 50 articles were reviewed; however, only 45 articles on implicit aspect extraction, spanning 2005 to 2016, were analyzed and discussed. The majority of researchers on implicit aspect extraction rely heavily on unsupervised methods, which account for about 64% of the 45 articles, followed by supervised methods at about 27% and semi-supervised methods at 9%. In addition, 25 articles conducted their research solely on product reviews, and 5 articles used product reviews jointly with other types of data, which makes product review datasets the most frequently used data type. Furthermore, research on implicit aspect extraction has focused on English and Chinese more than on other languages. Finally, this review also provides recommendations for future research directions and open problems.

89 citations

Journal Article
TL;DR: The proposed game theoretic approach for an IoT-based employee performance evaluation in industry effectively and efficiently automates the employee evaluation system and decision-making process in the industry.
Abstract: In the present scenario, performance evaluation of employees in industries is done manually, in which there are ample chances of biases. It is observed that manual employee evaluation systems can be efficiently eliminated by using ubiquitous sensing capabilities of Internet of things (IoT) devices to monitor industrial employees. However, none of the authors have used IoT data for automating performance evaluation systems of employees. Hence, this paper proposes a game theoretic approach for an IoT-based employee performance evaluation in industry. The system infers useful results about the performance of employees by mining data collected by the sensory nodes using the MapReduce model. The information hence obtained is then used to draw automated decisions for employees using game theory. The system is analyzed both experimentally and mathematically. The experimental evaluation compares the proposed system with other techniques of data mining and decision making. The results depict that the proposed system evaluates the performance of employees efficiently and shows a performance improvement over other techniques. The mathematical evaluation shows that correct evaluation of employees by the system effectively motivates employees in favor of the industry. Thus, the proposed system effectively and efficiently automates the employee evaluation system and decision-making process in the industry.

29 citations


Cites background from "A Survey on Association Rule Mining..."

  • ...In 2015, the authors in [18] presented a survey on algorithms proposed by various authors for association rule mining....

Journal Article
TL;DR: A novel technique is developed to process security event logs of a computer that has been assessed and configured by a security professional, extract key domain knowledge indicative of their expert decision making, and automatically apply learnt knowledge to previously unseen systems by non-experts.
Abstract: Vulnerability assessment and security configuration activities are heavily reliant on expert knowledge. This requirement often results in many systems being left insecure due to a lack of analysis expertise and access to specialist resources. It has long been known that a system’s event logs provide historical information depicting potential security breaches, as well as recording configuration activities. However, identifying and utilising knowledge within the event logs is challenging for the non-expert. In this paper, a novel technique is developed to process security event logs of a computer that has been assessed and configured by a security professional, extract key domain knowledge indicative of their expert decision making, and automatically apply learnt knowledge to previously unseen systems by non-experts. The technique converts event log entries into an object-based model and dynamically extracts associative rules. The rules are further improved in terms of quality using a temporal metric to autonomously establish temporal-association rules and acquire a domain model of expert configuration tasks. The acquired domain model and problem instance generated from a previously unseen system can then be used to produce a plan-of-action, which can be exploited by non-professionals to improve their system’s security. Empirical analysis is subsequently performed on 20 event logs, where identified plan traces are discussed in terms of accuracy and performance.

25 citations

Journal Article
TL;DR: Results of retail commodities datasets indicate that the proposed dynamic-model-based time series data-mining method takes into consideration the time factor, and can uncover interesting sales patterns by which to improve cross-marketing quality.

22 citations

Journal Article
TL;DR: This paper considers fuzzy association rules in ubiquitous data streams and proposes a new method, FFP_USTREAM (Fuzzy Frequent Pattern Ubiquitous Streams), which achieves better results than the existing methods.
Abstract: This paper considers fuzzy association rules in ubiquitous data streams and proposes a new method, FFP_USTREAM (Fuzzy Frequent Pattern Ubiquitous Streams). The ubiquitous real-time data system combines fuzzy concepts with automated data streams, using a sliding-window method, to mine fuzzy association rules. The proposed strategy uses a fuzzification matrix in which input patterns are related to their levels of membership in various classes. Assignment to a specific class depends on the estimated level of pattern membership. The technique is applied to ten benchmark classification data sets from the UCI Machine Learning Repository. To evaluate the proposed strategy, its performance is compared with two strong supervised classification algorithms: a sigmoidal Recurrent Neural Network (RNN) and an Adaptive Neuro-Fuzzy Inference System (ANFIS). The efficiency and complexity of the system are examined. Real data sets are used to test the proposed system. The proposed fuzzy method is compared with existing regression and classification methods and achieves better results.

17 citations

References
Proceedings Article
01 Jun 1993
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

15,645 citations

Journal Article
16 May 2000
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.

6,118 citations
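
The compression idea behind the FP-tree described in the abstract above can be sketched as a two-scan prefix-tree construction. This is a simplified illustration, not the authors' implementation: the names `FPNode`, `build_fp_tree`, and `count_nodes` are hypothetical, the header table and node links that FP-growth uses for mining are omitted, and ties in item frequency are broken alphabetically as an arbitrary choice.

```python
from collections import Counter


class FPNode:
    """One prefix-tree node: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count items and keep only the frequent ones.
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction with its frequent items sorted by
    # descending frequency, so common prefixes share the same nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, freq


def count_nodes(node):
    """Number of item nodes below (and excluding) the given node."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Made-up transactions, for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["a", "b", "c", "d"],
    ["b", "c"],
]
root, freq = build_fp_tree(transactions, min_support=2)
```

In this example the infrequent item `d` is dropped, and the twelve remaining item occurrences are stored in six tree nodes because transactions that share a prefix share a path.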

Journal Article
TL;DR: Data mining is the search for new, valuable, and nontrivial information in large volumes of data, a cooperative effort of humans and computers. Data-mining activities fall into one of two categories: predictive data mining, which produces the model of the system described by the given data set, or descriptive data mining, which produces new, nontrivial information based on the available data set.
Abstract: Understand the need for analyses of large, complex, information-rich data sets. Identify the goals and primary tasks of the data-mining process. Describe the roots of data-mining technology. Recognize the iterative character of a data-mining process and specify its basic steps. Explain the influence of data quality on a data-mining process. Establish the relation between data warehousing and data mining. Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers. In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining activities into one of two categories: Predictive data mining, which produces the model of the system described by the given data set, or Descriptive data mining, which produces new, nontrivial information based on the available data set.

4,646 citations


"A Survey on Association Rule Mining..." refers background in this paper

  • ...Furthermore, it should be able to discover patterns at various granularities [1]....

  • ...Therefore, the association rules of one type describe a particular local pattern which can be easily interpreted and communicated [1]....

01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Bratko, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations

Proceedings Article
01 Jun 1996
TL;DR: This work deals with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary and introduces measures of partial completeness which quantify the information lost due to partitioning.
Abstract: We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset.

1,697 citations
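
The idea in the abstract above of fine-partitioning a quantitative attribute and then combining adjacent partitions can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: equal-width base intervals and a greedy left-to-right merge until each interval holds at least `min_count` records are choices made here, and `base_bins`, `merge_adjacent`, and the sample ages are hypothetical.

```python
from collections import Counter


def base_bins(values, lo, hi, n_bins):
    """Fine-partition [lo, hi) into n_bins equal-width intervals and
    return the record count per interval plus the interval width."""
    width = (hi - lo) / n_bins
    ids = [min(int((v - lo) / width), n_bins - 1) for v in values]
    tally = Counter(ids)
    return [tally.get(i, 0) for i in range(n_bins)], width


def merge_adjacent(counts, min_count):
    """Greedily merge adjacent intervals, left to right, until each
    merged interval holds at least min_count records.
    Returns a list of (first_bin, last_bin, record_count)."""
    merged = []
    start, total = 0, 0
    for i, c in enumerate(counts):
        total += c
        if total >= min_count:
            merged.append((start, i, total))
            start, total = i + 1, 0
    if start < len(counts):
        # Leftover tail: fold it into the last merged interval.
        if merged:
            s, _, t = merged.pop()
            merged.append((s, len(counts) - 1, t + total))
        else:
            merged.append((0, len(counts) - 1, total))
    return merged


# Made-up ages, for illustration only.
ages = [21, 23, 25, 34, 37, 41, 45, 52, 55, 58, 59, 63]
counts, width = base_bins(ages, lo=20, hi=70, n_bins=10)
merged = merge_adjacent(counts, min_count=3)

# Convert bin indices back to half-open age ranges [low, high).
intervals = [(20 + s * width, 20 + (e + 1) * width, n) for s, e, n in merged]
```

Each resulting range, such as ages 20 to 30, can then be treated as a Boolean item in a transaction so that standard association rule mining applies to it.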


"A Survey on Association Rule Mining..." refers background in this paper

  • ...in [2] given that the problems of minimum support (MinSup) and minimum confidence (MinConf) occurs during mapping in Boolean association rules....

  • ...It finds frequent patterns, associations, correlations or informal structures among sets of items or objects in transactional databases and other information repositories [2]....
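
The minimum support (MinSup) and minimum confidence (MinConf) thresholds mentioned in the excerpts above can be illustrated with a short sketch that scores one candidate rule. The helper names `count`, `rule_metrics`, and `is_strong`, the sample baskets, and the threshold values are assumptions made for illustration.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= frozenset(t))


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent.
    support    = fraction of transactions containing both sides
    confidence = fraction of antecedent transactions that also
                 contain the consequent"""
    both = count(set(antecedent) | set(consequent), transactions)
    base = count(antecedent, transactions)
    support = both / len(transactions)
    confidence = both / base if base else 0.0
    return support, confidence


def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """A rule is kept only if it meets both MinSup and MinConf."""
    sup, conf = rule_metrics(antecedent, consequent, transactions)
    return sup >= min_sup and conf >= min_conf


# Made-up market baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
sup, conf = rule_metrics({"bread"}, {"butter"}, baskets)
keep = is_strong({"bread"}, {"butter"}, baskets, min_sup=0.5, min_conf=0.7)
```

Here bread and butter occur together in three of the five baskets, and three of the four baskets containing bread also contain butter, so the rule passes both thresholds.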