Topic

Apriori algorithm

About: Apriori algorithm is a research topic. Over its lifetime, 4,105 publications have appeared within this topic, receiving 85,965 citations.


Papers
Proceedings Article
14 Aug 1997
TL;DR: This paper proposes an incremental updating technique based on negative borders, for the maintenance of association rules when new transaction data is added to or deleted from a transaction database.
Abstract: Efficient discovery of association rules in large databases is a well-studied problem, and several approaches have been proposed. However, it is nontrivial to keep the association rules current when the database is updated, since such updates could invalidate existing rules or introduce new ones. In this paper, we propose an incremental updating technique based on negative borders for the maintenance of association rules when new transaction data is added to or deleted from a transaction database. An important feature of our algorithm is that it requires a full scan (exactly one) of the whole database only if the database update causes the negative border of the set of large itemsets to expand.
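To make the term concrete, here is a minimal sketch of how a negative border can be computed. It assumes the usual definition (itemsets that are not large themselves but whose proper subsets are all large) and a downward-closed input set. The function name `negative_border` and its arguments are illustrative only, and the sketch does not reproduce the paper's incremental update procedure.

```python
from itertools import combinations


def negative_border(frequent, items):
    """Return the itemsets that are not in `frequent` but whose
    immediate (k-1)-subsets are all in `frequent`.

    Assumes `frequent` is downward closed (every subset of a frequent
    itemset is also frequent), so checking immediate subsets suffices.
    """
    frequent = {frozenset(s) for s in frequent}
    border = set()

    # An infrequent single item is in the border: its only proper
    # subset is the empty set, which is trivially frequent.
    for item in items:
        single = frozenset([item])
        if single not in frequent:
            border.add(single)

    # Extend each frequent itemset by one item and keep the extensions
    # that are infrequent while all their immediate subsets are frequent.
    for itemset in frequent:
        for item in items:
            if item in itemset:
                continue
            candidate = itemset | {item}
            if candidate in frequent:
                continue
            size = len(candidate) - 1
            if all(frozenset(sub) in frequent
                   for sub in combinations(candidate, size)):
                border.add(candidate)
    return border
```

Under this definition, an update that makes some border itemset large is the case in which the border can expand, which is the condition the abstract ties to a full database scan.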

249 citations

Book Chapter
20 May 2008
TL;DR: A tree-based mining algorithm is proposed to efficiently find frequent patterns from uncertain data, where each item in the transactions is associated with an existential probability.
Abstract: Many frequent pattern mining algorithms find patterns from traditional transaction databases, in which the content of each transaction (namely, its items) is definitely known and precise. However, there are many real-life situations in which the content of transactions is uncertain. To deal with these situations, we propose a tree-based mining algorithm to efficiently find frequent patterns from uncertain data, where each item in the transactions is associated with an existential probability. Experimental results show the efficiency of our proposed algorithm.
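As a rough illustration of what mining with existential probabilities involves, the sketch below computes the expected support of an itemset. It assumes the measure commonly used for uncertain data: item probabilities within a transaction are treated as independent, and expected support is the sum over transactions of the product of those probabilities. The abstract does not state which measure the paper uses, the function name `expected_support` is hypothetical, and the tree structure the paper proposes is not shown.

```python
from math import prod


def expected_support(itemset, transactions):
    """Expected support of `itemset` over uncertain transactions.

    Each transaction is a dict mapping item -> existential probability.
    Assumption: items in a transaction are independent, so the chance
    that the whole itemset is present is the product of its item
    probabilities.
    """
    total = 0.0
    for transaction in transactions:
        # A transaction contributes only if it lists every item.
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total
```

An itemset would then be reported as frequent when its expected support reaches a user-chosen threshold, in the same way a plain support count is used on precise data.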

228 citations

Proceedings Article
20 Feb 2012
TL;DR: DPC dynamically combines candidates of various lengths and outperforms both the straightforward algorithm SPC and the fixed-passes combined-counting algorithm FPC; all three algorithms scale up linearly with respect to dataset size and cluster size.
Abstract: Many parallelization techniques have been proposed to enhance the performance of Apriori-like frequent itemset mining algorithms. Characterized by its map and reduce functions, MapReduce has emerged and excels at mining datasets of terabyte scale or larger in either homogeneous or heterogeneous clusters. Minimizing the scheduling overhead of each map-reduce phase and maximizing the utilization of nodes in each phase are keys to successful MapReduce implementations. In this paper, we propose three algorithms, named SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework. DPC dynamically combines candidates of various lengths and outperforms both the straightforward algorithm SPC and the fixed-passes combined-counting algorithm FPC. Extensive experimental results also show that all three algorithms scale up linearly with respect to dataset size and cluster size.
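The general shape of one Apriori counting pass expressed as map and reduce functions can be sketched as follows. This is a single-process simulation in plain Python, not the paper's SPC, FPC, or DPC code and not a real MapReduce job. The names `map_phase`, `reduce_phase`, and `count_pass` are hypothetical, and the input is assumed to be already divided into splits of transactions.

```python
from collections import defaultdict


def map_phase(split, candidates):
    """Mapper: emit (candidate, 1) for each candidate itemset
    contained in a transaction of this input split."""
    for transaction in split:
        for candidate in candidates:
            if candidate <= transaction:  # subset test on frozensets
                yield candidate, 1


def reduce_phase(pairs, min_count):
    """Reducer: sum the counts per candidate and keep those that
    reach the minimum support count."""
    counts = defaultdict(int)
    for candidate, one in pairs:
        counts[candidate] += one
    return {c: n for c, n in counts.items() if n >= min_count}


def count_pass(splits, candidates, min_count):
    """Simulate one map-reduce phase over all splits."""
    pairs = (pair
             for split in splits
             for pair in map_phase(split, candidates))
    return reduce_phase(pairs, min_count)
```

In this framing, each candidate length would normally cost one such phase. Counting candidates of several lengths in the same phase, as the abstract describes for FPC and DPC, would reduce the number of phases and therefore the per-phase scheduling overhead.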

225 citations

Proceedings Article
01 May 2000
TL;DR: A method of estimating a tight upper bound on the statistical metric associated with any superset of an itemset is presented, together with a novel use of these upper bounds to prune unproductive supersets while traversing itemset lattices.
Abstract: We study how to efficiently compute significant association rules according to common statistical measures such as a chi-squared value or correlation coefficient. For this purpose, one might consider using the Apriori algorithm, but the algorithm needs major conversion, because none of these statistical metrics is anti-monotone, and the use of higher support for reducing the search space cannot guarantee solutions in its search space. We here present a method of estimating a tight upper bound on the statistical metric associated with any superset of an itemset, as well as the novel use of the resulting upper bounds to prune unproductive supersets while traversing itemset lattices. Experimental tests demonstrate the efficiency of this method.

216 citations

Proceedings Article
28 Jun 2009
TL;DR: This paper presents IPLoM (Iterative Partitioning Log Mining), a novel algorithm for the mining of clusters from event logs that outperforms the other algorithms by a statistically significant margin and achieves an average F-Measure performance of 78% when the closest other algorithm achieves an F-Measure performance of 10%.
Abstract: The importance of event logs as a source of information in systems and network management cannot be overemphasized. With the ever-increasing size and complexity of today's event logs, the task of analyzing event logs has become cumbersome to carry out manually. For this reason, recent research has focused on the automatic analysis of these log files. In this paper we present IPLoM (Iterative Partitioning Log Mining), a novel algorithm for the mining of clusters from event logs. Through a 3-step hierarchical partitioning process, IPLoM partitions log data into its respective clusters. In its 4th and final stage, IPLoM produces cluster descriptions or line formats for each of the clusters produced. Unlike other similar algorithms, IPLoM is not based on the Apriori algorithm, and it is able to find clusters in data whether or not its instances appear frequently. Evaluations show that IPLoM outperforms the other algorithms by a statistically significant margin, and it is also able to achieve an average F-Measure performance of 78% when the closest other algorithm achieves an F-Measure performance of 10%.

212 citations


Network Information
Related Topics (5)
Fuzzy logic
151.2K papers, 2.3M citations
84% related
Cluster analysis
146.5K papers, 2.9M citations
83% related
Support vector machine
73.6K papers, 1.7M citations
83% related
Software
130.5K papers, 2M citations
80% related
Feature extraction
111.8K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    92
2022    291
2021    180
2020    216
2019    209
2018    223