Tree model guided candidate generation for mining frequent subtrees from XML documents

doi:10.1145/1376815.1376818

Henry Tan and 4 co-authors

Published 24 Jul 2008 in ACM Transactions on Knowledge Discovery ..., Vol. 2, Iss. 2, pp 9

TLDR

The paper introduces a unique embedding list representation of the tree structure that enables an efficient implementation of Tree Model Guided (TMG) candidate generation. It shows through a mathematical model and experiments that TMG has lower complexity than the commonly used join approach.

Abstract:

Because XML is flexible in both structure and semantics, mining association rules from XML faces several challenges, such as a more complex hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted, labeled, ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables an efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has lower complexity than the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees; iMB3-Miner mines induced and/or embedded subtrees by using a maximum level of embedding constraint. Our experiments on both synthetic and real datasets, comparing against two well-known algorithms for mining induced and embedded subtrees, demonstrate the effectiveness and efficiency of the proposed techniques.
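The abstract names the ingredients (a tree model, embedding lists of occurrences, candidates enumerated from the structure of the data, and a maximum level of embedding) without defining them. The Python below is a minimal sketch of that idea under our own assumptions, not the authors' MB3-Miner or iMB3-Miner code. Trees are stored by pre-order position with each node's depth and "scope" (position of its last descendant). An occurrence is a tuple of increasing pre-order positions, and each pattern keeps a list of its occurrences. A candidate is grown only from an occurrence that exists in the data, by adding one later node inside the occurrence root's scope, rather than by joining two frequent patterns. Setting `max_level=1` yields induced subtrees. The names `PreorderTree`, `encode`, `extend`, and `mine`, the `/` backtrack encoding, and per-tree (transaction) support counting are all hypothetical choices made for illustration. Labels are assumed to contain no spaces or `/`.

```python
from collections import defaultdict


class PreorderTree:
    """Rooted, labeled, ordered tree stored by pre-order position (sketch)."""

    def __init__(self, labels, parents):
        # labels[i], parents[i]: label and parent position of the node at
        # pre-order position i; parents[0] == -1 marks the root.
        self.labels = labels
        n = len(labels)
        self.depth = [0] * n
        for i in range(1, n):
            self.depth[i] = self.depth[parents[i]] + 1
        # scope[i]: pre-order position of the last descendant of node i.
        self.scope = list(range(n))
        for i in range(n - 1, 0, -1):
            p = parents[i]
            self.scope[p] = max(self.scope[p], self.scope[i])

    def is_ancestor(self, a, d):
        # In pre-order, a's descendants occupy positions a+1 .. scope[a].
        return a < d <= self.scope[a]


def encode(tree, occ):
    """Encode an occurrence's pattern as labels in pre-order, with '/'
    for each backtrack to an ancestor (hypothetical canonical form)."""
    out, path = [], []
    for pos in occ:
        while path and not tree.is_ancestor(path[-1], pos):
            path.pop()
            out.append("/")
        out.append(tree.labels[pos])
        path.append(pos)
    return " ".join(out)


def extend(tree, occ, max_level=None):
    """Grow one occurrence by a single data-tree node: any node after the
    last occurrence node in pre-order that still lies in the root's scope."""
    root, last = occ[0], occ[-1]
    for j in range(last + 1, tree.scope[root] + 1):
        # Pattern parent of j: the deepest occurrence node that is its ancestor.
        parent = max(a for a in occ if tree.is_ancestor(a, j))
        # Level of embedding = depth gap between j and its pattern parent;
        # max_level=1 restricts the search to induced subtrees.
        if max_level is not None and tree.depth[j] - tree.depth[parent] > max_level:
            continue
        yield occ + (j,)


def mine(trees, min_support, max_size, max_level=None):
    """Level-wise sketch: returns {pattern: number of trees containing it}."""
    # Occurrence lists for 1-node patterns: pattern -> [(tree id, occ), ...]
    occ_lists = defaultdict(list)
    for tid, t in enumerate(trees):
        for pos in range(len(t.labels)):
            occ_lists[encode(t, (pos,))].append((tid, (pos,)))
    frequent = {}
    for size in range(1, max_size + 1):
        # Drop infrequent patterns; only surviving occurrences are extended.
        occ_lists = {p: occs for p, occs in occ_lists.items()
                     if len({tid for tid, _ in occs}) >= min_support}
        for p, occs in occ_lists.items():
            frequent[p] = len({tid for tid, _ in occs})
        if size == max_size or not occ_lists:
            break
        nxt = defaultdict(list)
        for occs in occ_lists.values():
            for tid, occ in occs:
                for new in extend(trees[tid], occ, max_level):
                    nxt[encode(trees[tid], new)].append((tid, new))
        occ_lists = nxt
    return frequent


# Tree A is a(b, c); tree B is a(x(b), c), where b sits two levels below a.
A = PreorderTree(["a", "b", "c"], [-1, 0, 0])
B = PreorderTree(["a", "x", "b", "c"], [-1, 0, 1, 0])
print(mine([A, B], min_support=2, max_size=3))               # embedded
print(mine([A, B], min_support=2, max_size=3, max_level=1))  # induced
```

Because each occurrence is a strictly increasing tuple grown only at its end, every occurrence is produced exactly once, which mirrors the nonredundancy the abstract claims for TMG. In the example, the pattern `a b` is embedded in both trees but induced only in A, so the induced run keeps `a c` and drops `a b` and `a b / c`.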

Citations

UNI3 - efficient algorithm for mining unordered induced subtrees using TMG candidate generation

OInduced: An Efficient Algorithm for Mining Induced Patterns From Rooted Ordered Trees

Frequent subtree mining on the automata processor: challenges and opportunities

Mining Rooted Ordered Trees under Subtree Homeomorphism

Efficiently Mining Unordered Trees

Related Papers

Efficiently mining frequent trees in a forest: algorithms and applications

Efficiently Mining Frequent Embedded Unordered Trees

Discovering frequent substructures in large unordered trees

Frequent Subtree Mining - An Overview

TreeFinder: a first step towards XML data mining