Journal Article
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
TL;DR: A novel frequent-pattern tree (FP-tree) structure is proposed, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and an efficient FP-tree-based mining method, FP-growth, is developed for mining the complete set of frequent patterns by pattern fragment growth.
Abstract:
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns.
In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, the FP-tree, which avoids costly, repeated database scans; (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets; and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.
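The three techniques in the abstract can be illustrated with a minimal Python sketch. It assumes transactions are given as (items, count) pairs, that items are mutually comparable (used only to break ties in the ordering), and that min_support is an absolute count. The names FPNode, build_fp_tree and fp_growth are hypothetical, and the node-link chains of a header table are simplified to a plain list of nodes per item. It shows the idea (compress the database into a prefix tree, then recurse on conditional pattern bases), not the authors' implementation.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the list of
    tree nodes carrying it; frequent maps each frequent item to its support.
    """
    # First scan: count the support of every item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    # Second scan: insert each transaction's frequent items in descending
    # support order, so transactions with a common prefix share one path.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Yield (itemset, support) for every frequent itemset, without
    generating candidate sets."""
    header, frequent = build_fp_tree(weighted_transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        pattern = suffix + (item,)
        yield frozenset(pattern), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database recursively.
        yield from fp_growth(conditional, min_support, pattern)
```

For example, with db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], calling dict(fp_growth([(t, 1) for t in db], 3)) reports each single item with support 4 and each pair with support 3; {a, b, c} occurs in only two transactions and is never reported. Because each item's conditional database holds only the items that precede it in the tree order, every frequent itemset is produced exactly once.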
Citations
Journal Article
Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases
TL;DR: This paper proposes three novel tree structures to efficiently perform incremental and interactive HUP mining that can capture the incremental data without any restructuring operation, and shows that these tree structures are very efficient and scalable.
Proceedings Article
Mining high utility itemsets without candidate generation
Mengchi Liu, Junfeng Qu
TL;DR: This paper proposes an algorithm, called HUI-Miner (High Utility Itemset Miner), which can efficiently mine high utility itemsets from the utility-lists constructed from the mined database, and compares it with state-of-the-art algorithms on various databases.
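The utility-list idea can be sketched in Python as follows, assuming each transaction maps items to purchase quantities and a separate table gives each item's unit profit. Each list entry holds a transaction id, the itemset's utility in that transaction (iutil), and the utility of the items that come after it in a fixed ascending transaction-weighted-utility (TWU) order (rutil). The sum of iutil and rutil bounds the utility of every extension with later items, which is what allows pruning. This is a simplified reconstruction for illustration only; hui_miner, _search and _join_lists are hypothetical names, and the published algorithm may include refinements not shown here.

```python
def hui_miner(transactions, profit, min_util):
    """transactions: list of {item: quantity}; profit: {item: unit profit}.
    Returns {frozenset(itemset): utility} for itemsets with utility >= min_util."""
    # Transaction-weighted utility (TWU): sum of transaction utilities per item.
    twu = {}
    for t in transactions:
        tu = sum(q * profit[i] for i, q in t.items())
        for i in t:
            twu[i] = twu.get(i, 0) + tu
    # Keep items whose TWU reaches min_util, ordered by ascending TWU.
    order = sorted((i for i in twu if twu[i] >= min_util), key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(order)}

    # One utility list per item: entries (tid, iutil, rutil).
    lists = {i: [] for i in order}
    for tid, t in enumerate(transactions):
        items = sorted((i for i in t if i in rank), key=rank.get)
        utils = [t[i] * profit[i] for i in items]
        remaining = sum(utils)
        for i, u in zip(items, utils):
            remaining -= u  # utility of the items after i in this transaction
            lists[i].append((tid, u, remaining))

    results = {}
    _search((), None, [(i, lists[i]) for i in order], min_util, results)
    return results


def _search(prefix, prefix_list, extensions, min_util, results):
    for idx, (x, ulx) in enumerate(extensions):
        itemset = prefix + (x,)
        iutil = sum(e[1] for e in ulx)
        if iutil >= min_util:
            results[frozenset(itemset)] = iutil
        # Prune: iutil + rutil bounds the utility of any extension of itemset.
        if iutil + sum(e[2] for e in ulx) >= min_util:
            new_ext = [(y, _join_lists(prefix_list, ulx, uly))
                       for y, uly in extensions[idx + 1:]]
            _search(itemset, ulx, new_ext, min_util, results)


def _join_lists(ulp, ulx, uly):
    """Build the utility list of P+x+y from those of P, Px and Py."""
    py = {tid: (iu, ru) for tid, iu, ru in uly}
    pp = {} if ulp is None else {tid: iu for tid, iu, _ in ulp}
    joined = []
    for tid, iux, _ in ulx:
        if tid in py:
            iuy, ruy = py[tid]
            # Subtract the prefix utility, which is counted in both Px and Py.
            joined.append((tid, iux + iuy - pp.get(tid, 0), ruy))
    return joined
```

No transaction is rescanned after the lists are built: every extension's utility comes from joining lists, which is the point of the utility-list design.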
Journal Article
Fast algorithms for frequent itemset mining using FP-trees
Gösta Grahne, J. Zhu
TL;DR: A novel FP-array technique is presented that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms and works especially well for sparse data sets.
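The FP-array can be pictured as a lower-triangular array of pair counts for the frequent items, filled in the same database pass that builds the FP-tree. When mining an item's conditional database, the frequent items can then be read from the array instead of being found by traversing the tree. The sketch below shows only this counting idea, under the assumption that items are ordered by descending support as in an FP-tree; pair_count_array and conditional_frequent_items are hypothetical names, not the paper's code.

```python
from itertools import combinations


def pair_count_array(transactions, min_support):
    """Count co-occurrences of every pair of frequent items in one scan.

    Returns (order, array) where, for i > j, array[i][j] is the support of
    {order[i], order[j]}.
    """
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    # Descending-support order, as used for FP-tree paths.
    order = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Lower-triangular array: row i has i entries (columns 0..i-1).
    array = [[0] * i for i in range(len(order))]
    for t in transactions:
        ranks = sorted(rank[i] for i in set(t) if i in rank)
        for lo, hi in combinations(ranks, 2):
            array[hi][lo] += 1
    return order, array


def conditional_frequent_items(order, array, item, min_support):
    """Frequent items in `item`'s conditional database (items that precede it
    in the order), read from the array instead of the tree."""
    r = order.index(item)
    return [order[j] for j in range(r) if array[r][j] >= min_support]
```

The array costs memory quadratic in the number of frequent items, which is one plausible reason the technique pays off most on sparse data, where traversals would otherwise dominate.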
Journal Article
SPMF: a Java open-source pattern mining library
Philippe Fournier-Viger, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Cheng-Wei Wu, Vincent S. Tseng
TL;DR: SPMF is an open-source data mining library offering implementations of more than 55 data mining algorithms, specialized for discovering patterns in transaction and sequence databases such as frequent itemsets, association rules and sequential patterns.
Proceedings Article
Efficient set joins on similarity predicates
Sunita Sarawagi, Alok S. Kirpal
TL;DR: This paper presents an efficient, scalable and general algorithm for performing set joins on predicates involving various similarity measures, such as intersection size, Jaccard coefficient, cosine similarity, and edit distance, that generalize to several weighted and unweighted measures of partial word overlap between sets.
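A simple baseline for a set join on a Jaccard-similarity predicate probes an inverted index and counts, for each new set, how many tokens it shares with earlier sets; the overlap then gives the Jaccard coefficient directly. This is a generic sketch, not the algorithm from the paper. It assumes a threshold greater than 0, since pairs with no shared token are never examined, and jaccard_set_join is a hypothetical name.

```python
from collections import defaultdict


def jaccard_set_join(records, threshold):
    """Return sorted pairs (i, j), i < j, whose Jaccard similarity is >= threshold.

    Assumes threshold > 0: pairs sharing no token are never examined.
    """
    sets = [frozenset(r) for r in records]
    index = defaultdict(list)  # token -> ids of records already indexed
    results = []
    for j, s in enumerate(sets):
        # Probe: count shared tokens with every earlier record via the index.
        overlap = defaultdict(int)
        for token in s:
            for i in index[token]:
                overlap[i] += 1
        for i, inter in overlap.items():
            union = len(sets[i]) + len(s) - inter
            if inter / union >= threshold:
                results.append((i, j))
        # Index the current record so later records can probe it.
        for token in s:
            index[token].append(j)
    return sorted(results)
```

With records ["a", "b", "c"], ["a", "b"], ["x", "y"] and ["a", "b", "c", "d"] and a threshold of 0.6, the first record pairs with the second (2/3) and the fourth (3/4), while the second and fourth score only 2/4 and are rejected.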
References
Proceedings Article
Mining association rules between sets of items in large databases
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
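In the usual support/confidence formulation, a rule X => Y is derived from a frequent itemset X ∪ Y, and its confidence is support(X ∪ Y) / support(X). The minimal sketch below enumerates such rules by brute force from a dictionary of frequent itemsets; it assumes that dictionary contains every subset of each listed itemset (true for a complete set of frequent itemsets, since support is anti-monotone). It does not reproduce the buffer management, estimation or pruning techniques of the paper, and association_rules is a hypothetical name.

```python
from itertools import combinations


def association_rules(frequent, min_confidence):
    """frequent: {frozenset(itemset): support count}, closed under subsets.

    Returns (antecedent, consequent, confidence) triples for all rules whose
    confidence is at least min_confidence.
    """
    rules = []
    for itemset, support in frequent.items():
        # Every non-empty proper subset is a possible antecedent.
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

Because frequent itemsets already carry their support counts, rule generation needs no further database scan.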
Proceedings Article
Fast algorithms for mining association rules
TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
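The abstract compares FP-growth with Apriori, the level-wise candidate generation-and-test algorithm introduced in this paper. The sketch below shows the join, prune and count steps, assuming min_support is an absolute count; apriori is an illustrative name. For brevity the join step unions pairs of frequent (k-1)-itemsets, which after pruning yields the same candidates as Apriori's prefix-based join, only less efficiently. Each level requires another full pass over the database, which is the cost FP-growth is designed to avoid.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every itemset that
    appears in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: one database scan counts the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            # Prune step: keep the candidate only if all (k-1)-subsets are frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test step: another full database scan counts every candidate.
        counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

On the five transactions [a, b, c], [a, b], [a, c], [b, c], [a, b, c] with min_support 3, {a, b, c} survives pruning as a candidate but is counted only twice, so the search stops after the pairs.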
Proceedings Article
Fast Algorithms for Mining Association Rules in Large Databases
Journal Article
Mining frequent patterns without candidate generation
Jiawei Han, Jian Pei, Yiwen Yin
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Proceedings Article
Mining sequential patterns
TL;DR: Three algorithms are presented to solve the problem of mining sequential patterns over databases of customer transactions, and an empirical evaluation of their performance on synthetic data shows that two of them have comparable performance.
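In this setting a customer sequence is an ordered list of transactions (itemsets), and a pattern is supported by a customer if each of its elements is a subset of a distinct transaction, in the same order. The containment test at the heart of support counting can be sketched as below. Greedy earliest matching is sufficient, because matching each element as early as possible leaves the most transactions for the remaining elements. The names contains and support are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern (a list of itemsets) is contained in sequence: each
    pattern element is a subset of a distinct, later sequence element."""
    pos = 0
    for element in pattern:
        element = set(element)
        # Advance to the earliest remaining transaction that covers element.
        while pos < len(sequence) and not element <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must match a strictly later transaction
    return True


def support(customer_sequences, pattern):
    """Number of customer sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in customer_sequences)
```

Support is counted once per customer, not once per occurrence, so a customer who buys the pattern repeatedly still contributes 1.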