Journal Article
Tree model guided candidate generation for mining frequent subtrees from XML documents
TL;DR: This article describes a unique embedding list representation of the tree structure, which enables efficient implementation of Tree Model Guided (TMG) candidate generation, and shows through a mathematical model and experiments that TMG has better complexity than the commonly used join approach.
Abstract:
Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces a few challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity than the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques.
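To make the abstract's ideas concrete, here is a minimal Python sketch, assuming trees are given as nested `(label, children)` tuples. Each tree is stored in preorder with, for every node, its depth, its scope (the position of its last descendant) and its embedding list (the preorder positions of all its descendants); this is one reading of the embedding list representation, not necessarily the authors' exact layout. Candidates are generated in the spirit of TMG: an occurrence is grown only by nodes to the right of its rightmost node inside the root's scope, so each occurrence is enumerated exactly once, and a new node is accepted only if its depth distance to its pattern parent is at most `max_level` (`max_level=1` yields induced subtrees; no limit yields embedded ones). The names `build_tree`, `encode`, `extend` and `mine`, the `-1` backtrack string encoding and the transaction-based support count are illustrative assumptions; this is not the MB3-Miner or iMB3-Miner implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tree:
    labels: list[str]    # node labels in preorder
    depth: list[int]     # depth of each node (root = 0)
    scope: list[int]     # preorder position of each node's last descendant
    el: list[list[int]]  # embedding list: preorder positions of each node's descendants


def build_tree(nested) -> Tree:
    """Flatten a nested (label, [children]) tuple into a preorder Tree."""
    labels, depth, scope = [], [], []

    def visit(node, d):
        label, children = node
        pos = len(labels)
        labels.append(label)
        depth.append(d)
        scope.append(pos)
        for child in children:
            visit(child, d + 1)
        scope[pos] = len(labels) - 1  # last node visited inside this subtree

    visit(nested, 0)
    el = [list(range(i + 1, scope[i] + 1)) for i in range(len(labels))]
    return Tree(labels, depth, scope, el)


def encode(tree: Tree, occ: tuple) -> str:
    """Preorder label string of the pattern formed by occurrence `occ`
    (sorted preorder positions); '-1' marks a backtrack to the parent."""
    parts, stack = [], []
    for pos in occ:
        # Pop nodes that are not ancestors of pos in the data tree.
        while stack and tree.scope[stack[-1]] < pos:
            stack.pop()
            parts.append("-1")
        parts.append(tree.labels[pos])
        stack.append(pos)
    return " ".join(parts)


def extend(tree: Tree, occ: tuple, max_level: float):
    """Grow `occ` by one node to the right of its rightmost node, taken from
    the embedding list of the occurrence root (TMG-style, non-redundant)."""
    root, last = occ[0], occ[-1]
    for j in tree.el[root]:
        if j <= last:
            continue
        # Ancestors of j inside occ lie on one path; the largest position is
        # the deepest, and it becomes j's parent in the pattern.
        parent = max(p for p in occ if p < j <= tree.scope[p])
        if tree.depth[j] - tree.depth[parent] <= max_level:
            yield occ + (j,)


def mine(db, min_sup, max_level=float("inf"), max_size=3):
    """Level-wise mining; support = number of trees containing the pattern."""
    occs = defaultdict(list)  # pattern string -> [(tree id, occurrence)]
    for tid, t in enumerate(db):
        for i in range(len(t.labels)):
            occs[encode(t, (i,))].append((tid, (i,)))
    result = {}
    for size in range(1, max_size + 1):
        support = {p: len({tid for tid, _ in o}) for p, o in occs.items()}
        frequent = {p: o for p, o in occs.items() if support[p] >= min_sup}
        result.update({p: support[p] for p in frequent})
        if size == max_size:
            break
        nxt = defaultdict(list)
        for o in frequent.values():
            for tid, occ in o:
                for new in extend(db[tid], occ, max_level):
                    nxt[encode(db[tid], new)].append((tid, new))
        occs = nxt
    return result
```

For example, on the single chain tree `a(b(c))`, `mine([t], min_sup=1, max_level=1, max_size=2)` reports `a b` and `b c` but not `a c`, whereas the unconstrained call also reports the embedded pattern `a c`. Only occurrences of frequent patterns are extended, which is safe here because removing the rightmost node of an occurrence always leaves a valid occurrence of a pattern with at least the same support.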
Citations
Proceedings Article
UNI3 - efficient algorithm for mining unordered induced subtrees using TMG candidate generation
TL;DR: The proposed UNI3 algorithm considers both transaction based and occurrence match support, and is motivated by the fact that in many applications of frequent subtree mining the order among siblings is not considered important.
Journal Article
OInduced: An Efficient Algorithm for Mining Induced Patterns From Rooted Ordered Trees
TL;DR: This paper presents OInduced, which is a novel and efficient algorithm for finding frequent ordered induced tree patterns that uses a breadth-first candidate generation method and improves it by means of an indexing scheme.
Proceedings Article
Frequent subtree mining on the automata processor: challenges and opportunities
TL;DR: This paper proposes a hybrid method that combines AP-FTM with a CPU exact-matching approach, achieving up to 262X speedup over PatternMatcher on a challenging database; results on a synthetic database show that the AP advantage grows further with larger datasets.
Posted Content
Mining Rooted Ordered Trees under Subtree Homeomorphism
TL;DR: This paper proposes a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern, and develops an effective pattern mining algorithm, called TPMiner.
Proceedings Article
Efficiently Mining Unordered Trees
TL;DR: This paper introduces three new tree encodings and presents an efficient algorithm for finding frequent patterns from rooted unordered trees, under the assumption that children of every node in database trees are identically labeled; the proposed UITree algorithm significantly outperforms the most efficient existing works on mining unordered trees.