scispace - formally typeset
Topic

Apriori algorithm

About: The Apriori algorithm is a research topic. Over its lifetime, 4,105 publications have been published within this topic, receiving 85,965 citations.


Papers
Journal Article
TL;DR: This work develops a family of algorithms for solving association-rule mining, employing a combination of random sampling and hashing techniques, analyzes the algorithms, and compares their performance in experiments on real and synthetic data.
Abstract: Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known Apriori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar Web documents, clustering, and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for solving this problem, employing a combination of random sampling and hashing techniques. We provide analysis of the algorithms developed and conduct experiments on real and synthetic data to obtain a comparative performance analysis.

370 citations
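The abstract does not say which hashing technique is used. One well-known way to find highly similar item pairs without support pruning is min-hashing, which estimates the Jaccard similarity of the sets of transactions containing two items. The Python sketch below assumes that technique and simulates each hash function with a random permutation of row indices. The function names (`minhash_signatures`, `similar_pairs`, and so on) and the toy data are hypothetical, and this is not presented as the paper's exact algorithm.

```python
import random
from itertools import combinations


def minhash_signatures(item_rows, n_rows, k, seed=0):
    """Return k min-hash values per item.

    item_rows maps each item to the non-empty set of row (transaction) indices
    containing it. Each hash function is simulated by a random permutation:
    perm[r] is the rank that permutation assigns to row r.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        perm = list(range(n_rows))
        rng.shuffle(perm)
        perms.append(perm)
    return {item: [min(perm[r] for r in rows) for perm in perms]
            for item, rows in item_rows.items()}


def estimated_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two items share the same minimum."""
    agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return agree / len(sig_a)


def exact_jaccard(rows_a, rows_b):
    return len(rows_a & rows_b) / len(rows_a | rows_b)


def similar_pairs(item_rows, n_rows, threshold, k=200, seed=0):
    """Pairs whose estimated similarity reaches threshold, then verified exactly.

    No support threshold is applied, so rare but highly correlated items survive.
    """
    sigs = minhash_signatures(item_rows, n_rows, k, seed)
    result = []
    for a, b in combinations(sorted(item_rows), 2):
        if (estimated_jaccard(sigs[a], sigs[b]) >= threshold
                and exact_jaccard(item_rows[a], item_rows[b]) >= threshold):
            result.append((a, b))
    return result


# Toy data: x and y occur together in 3 of 1,000 rows; z is frequent but unrelated.
item_rows = {"x": {0, 1, 2}, "y": {0, 1, 2}, "z": set(range(3, 500))}
print(similar_pairs(item_rows, 1000, 0.8))  # [('x', 'y')]
```

Here x and y each appear in only 0.3% of rows, well below a typical support threshold, yet the pair is reported because similarity rather than frequency is tested. Exact verification removes false positives, but a pair whose estimate falls below the threshold by chance can still be missed.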

Journal Article
TL;DR: It is shown that the support of frequent non-key patterns can be inferred from frequent key patterns without accessing the database, and PASCAL is among the most efficient algorithms for mining frequent patterns.
Abstract: In this paper, we propose the algorithm PASCAL, which introduces a novel optimization of the well-known algorithm Apriori. This optimization is based on a new strategy called pattern counting inference that relies on the concept of key patterns. We show that the support of frequent non-key patterns can be inferred from frequent key patterns without accessing the database. Experiments comparing PASCAL to the three algorithms Apriori, Close and Max-Miner show that PASCAL is among the most efficient algorithms for mining frequent patterns.

335 citations
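The abstract names pattern counting inference but not the inference rule. The Python sketch below assumes the rule used with key patterns: a pattern is a key pattern when its support is strictly smaller than the support of each of its immediate subsets, and if any immediate subset of a candidate is non-key, the candidate is also non-key and its support equals the minimum support of its immediate subsets, so it needs no database scan. The function names are hypothetical, and this is a simplified illustration rather than the PASCAL implementation.

```python
from itertools import combinations


def apriori_gen(frequent):
    """Join frequent k-itemsets differing in one item, then prune by the Apriori property."""
    freq_set = set(frequent)
    candidates = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) == len(a) + 1 and all(union - {i} in freq_set for i in union):
            candidates.add(union)
    return sorted(candidates, key=sorted)


def pascal_sketch(transactions, min_count):
    """Levelwise mining that infers the support of non-key patterns.

    Returns (supports, counted): supports maps each frequent itemset (plus the
    empty set) to its support count; counted is how many candidates required a
    database scan.
    """
    db = [frozenset(t) for t in transactions]
    supports = {frozenset(): len(db)}
    is_key = {frozenset(): True}  # the empty pattern is a key pattern
    counted = 0

    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    while level:
        frequent = []
        for cand in level:
            subsets = [cand - {i} for i in cand]  # immediate subsets, all frequent
            pred_supp = min(supports[s] for s in subsets)
            if all(is_key[s] for s in subsets):
                supp = sum(1 for t in db if cand <= t)  # scan needed
                counted += 1
                key = supp < pred_supp
            else:
                supp, key = pred_supp, False  # inferred without scanning
            if supp >= min_count:
                supports[cand] = supp
                is_key[cand] = key
                frequent.append(cand)
        level = apriori_gen(frequent)
    return supports, counted


tx = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c", "d"]]
supports, counted = pascal_sketch(tx, 2)
print(counted)  # 5
```

Item b occurs in every transaction, so it is non-key; the supports of {a, b}, {b, c} and {a, b, c} are therefore inferred, and only a, b, c, d and {a, c} are counted against the data.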

Book Chapter
01 Jan 2002
TL;DR: An implementation of the well-known Apriori algorithm for the induction of association rules based on a prefix tree, in which the organization of the tree nodes, the encoding of the items, and the organization of the transactions can be chosen to minimize the time needed to find the frequent itemsets and the memory needed to store the counters.
Abstract: We describe an implementation of the well-known Apriori algorithm for the induction of association rules [Agrawal et al. (1993), Agrawal et al. (1996)] that is based on the concept of a prefix tree. While the idea to use this type of data structure is not new, there are several ways to organize the nodes of such a tree, to encode the items, and to organize the transactions, which may be used to minimize the time needed to find the frequent itemsets as well as to reduce the amount of memory needed to store the counters. Consequently, our emphasis is less on concepts than on implementation issues, which, however, can make a considerable difference in applications.

305 citations
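To make the prefix-tree idea concrete, the Python sketch below stores candidate k-itemsets in a trie whose nodes hold counters, encodes items as integers, and counts all candidates contained in a transaction with one recursive walk. The encoding in ascending order of item frequency, the dictionary-based child lookup, and all names are assumptions chosen for brevity; the chapter discusses several alternative node organizations and encodings that this sketch does not reproduce.

```python
from collections import Counter


class Node:
    """Prefix-tree node: children keyed by integer item code, plus a counter."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def encode(transactions):
    """Map items to integer codes in ascending order of frequency (assumed choice)."""
    freq = Counter(i for t in transactions for i in set(t))
    order = sorted(freq, key=lambda i: (freq[i], i))
    code = {item: c for c, item in enumerate(order)}
    return code, [sorted(code[i] for i in set(t)) for t in transactions]


def build_tree(candidates):
    """Insert candidates (sorted tuples of codes) so shared prefixes share nodes."""
    root = Node()
    for cand in candidates:
        node = root
        for item in cand:
            node = node.children.setdefault(item, Node())
    return root


def count_transaction(node, items, start, depth_left):
    """Add 1 to every candidate leaf whose items all occur in items[start:]."""
    if depth_left == 0:
        node.count += 1
        return
    # Skip positions that leave too few items to complete a candidate.
    for i in range(start, len(items) - depth_left + 1):
        child = node.children.get(items[i])
        if child is not None:
            count_transaction(child, items, i + 1, depth_left - 1)


def count_candidates(candidates, encoded, k):
    """Count the support of k-item candidates in one pass over the transactions."""
    root = build_tree(candidates)
    for t in encoded:
        count_transaction(root, t, 0, k)
    counts = {}
    for cand in candidates:
        node = root
        for item in cand:
            node = node.children[item]
        counts[cand] = node.count
    return counts


tx = [["bread", "milk"], ["bread", "diaper", "beer"],
      ["milk", "diaper", "beer"], ["bread", "milk", "diaper"]]
code, encoded = encode(tx)
pair = tuple(sorted((code["bread"], code["milk"])))
print(count_candidates([pair], encoded, 2)[pair])  # 2
```

Because transactions and candidates are both sorted by item code, each k-subset of a transaction is matched at most once, and candidates sharing a prefix share the nodes (and lookups) for that prefix.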

Journal Article
TL;DR: This paper proposes a novel principle and its algorithm that derive the characteristic patterns which frequently appear in graph-structured data; the algorithm can derive all frequent induced subgraphs from both directed and undirected graph-structured data having loops, with labeled or unlabeled nodes and links.
Abstract: Basket Analysis, which is a standard method for data mining, derives frequent itemsets from a database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g., chemical compounds and Web browsing history. There are a few approaches that can discover characteristic patterns from graph-structured data in the field of machine learning. However, almost all of them are not suitable for applications that require a complete search for all frequent subgraph patterns in the data. In this paper, we propose a novel principle and its algorithm that derive the characteristic patterns which frequently appear in graph-structured data. Our algorithm can derive all frequent induced subgraphs from both directed and undirected graph-structured data having loops (including self-loops) with labeled or unlabeled nodes and links. Its performance is evaluated through applications to Web browsing pattern analysis and chemical carcinogenesis analysis.

298 citations
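The key notion in this abstract is the support of an induced subgraph: a pattern counts for a graph transaction only if some injective, label-preserving vertex mapping makes pattern vertices adjacent exactly when their images are adjacent. The brute-force Python sketch below checks only that notion; it is restricted to undirected graphs with vertex labels (no edge labels, no self-loops), is exponential in pattern size, and does not reproduce the paper's own algorithm. All names and the toy graphs are hypothetical.

```python
from itertools import combinations, permutations


def is_induced_subgraph(pattern, graph):
    """True if pattern occurs in graph as a vertex-labeled induced subgraph.

    A graph is (labels, edges): labels maps vertex -> label and edges is a set
    of frozenset({u, v}) pairs (undirected, no self-loops, no edge labels).
    """
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_vertices = list(p_labels)
    for image in permutations(g_labels, len(p_vertices)):
        m = dict(zip(p_vertices, image))  # injective vertex mapping
        if any(p_labels[v] != g_labels[m[v]] for v in p_vertices):
            continue
        # Induced: a pattern pair is adjacent exactly when its image pair is adjacent.
        if all((frozenset((u, v)) in p_edges) == (frozenset((m[u], m[v])) in g_edges)
               for u, v in combinations(p_vertices, 2)):
            return True
    return False


def support(pattern, graphs):
    """Number of graph transactions containing the pattern as an induced subgraph."""
    return sum(1 for g in graphs if is_induced_subgraph(pattern, g))


def make_graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}


triangle = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3), (1, 3)])
path = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
c_o_edge = make_graph({"x": "C", "y": "O"}, [("x", "y")])
c_c_o_path = make_graph({"a": "C", "b": "C", "c": "O"}, [("a", "b"), ("b", "c")])

print(support(c_o_edge, [triangle, path]))    # 2
print(support(c_c_o_path, [triangle, path]))  # 1
```

The C-C-O path is not an induced subgraph of the triangle because the triangle also has the C-O edge that the pattern lacks; under an ordinary (non-induced) subgraph definition, both graphs would count.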

Book Chapter
23 Mar 1998
TL;DR: This work presents a new algorithm that combines bottom-up and top-down search and produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore immediately specifies all frequent itemsets.
Abstract: Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down directions. The main search direction is still bottom-up but a restricted search is conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. As a very important characteristic of the algorithm, it is not necessary to explicitly examine every frequent itemset. Therefore it performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.

296 citations
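The abstract contrasts its approach with typical bottom-up, breadth-first algorithms. The Python sketch below shows only that baseline and the output concept: a levelwise, Apriori-style search, followed by extraction of the maximal frequent itemsets and a check that their subsets reproduce every frequent itemset. It does not implement the paper's top-down search or its maximum frequent candidate set, and the function names are hypothetical.

```python
from itertools import combinations


def apriori_levelwise(transactions, min_count):
    """Bottom-up breadth-first search: level k candidates come from frequent (k-1)-itemsets."""
    db = [frozenset(t) for t in transactions]
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    frequent = {}
    while level:
        counts = {c: sum(1 for t in db if c <= t) for c in level}
        current = [c for c in level if counts[c] >= min_count]
        frequent.update((c, counts[c]) for c in current)
        current_set = set(current)
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Keep a (k+1)-candidate only if all of its k-subsets are frequent.
            if len(union) == len(a) + 1 and all(union - {i} in current_set for i in union):
                candidates.add(union)
        level = list(candidates)
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset (the maximum frequent set)."""
    return [s for s in frequent if not any(s < t for t in frequent)]


def all_frequent_from_maximal(maximal):
    """Every non-empty subset of a maximal frequent itemset is frequent."""
    result = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            result.update(frozenset(c) for c in combinations(m, k))
    return result


tx = [["a", "b"], ["a", "b"], ["c"], ["c"]]
frequent = apriori_levelwise(tx, 2)
maximal = maximal_itemsets(frequent)
print(sorted(sorted(m) for m in maximal))  # [['a', 'b'], ['c']]
```

The maximal itemsets determine which itemsets are frequent, but not their support counts; recovering supports would still require counting or additional stored information.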


Network Information
Related Topics (5)
Fuzzy logic: 151.2K papers, 2.3M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Software: 130.5K papers, 2M citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 78% related
Performance Metrics
Number of papers in the topic in previous years

Year  Papers
2023  92
2022  291
2021  180
2020  216
2019  209
2018  223