Showing papers by "Gao Cong" published in 2002


Proceedings Article
09 Dec 2002
TL;DR: This paper proposes an efficient technique to utilize previous mining results to improve the efficiency of current mining when constraints are changed and introduces the concept of tree boundary to summarize useful information available from previous mining.
Abstract: Mining of frequent itemsets is a fundamental data mining task. Past research has proposed many efficient algorithms for this purpose. Recent work has also highlighted the importance of using constraints to focus the mining process on only the relevant itemsets. In practice, data mining is often an interactive and iterative process. The user typically changes constraints and runs the mining algorithm many times before being satisfied with the final results. This interactive process is very time consuming. Existing mining algorithms are unable to take advantage of this iterative process by using previous mining results to speed up the current mining process. This results in an enormous waste of time and computation. In this paper, we propose an efficient technique to utilize previous mining results to improve the efficiency of current mining when constraints are changed. We first introduce the concept of tree boundary to summarize useful information available from previous mining. We then show that the tree boundary provides an effective and efficient framework for the new mining. The proposed technique has been implemented in the context of two existing frequent itemset mining algorithms, FP-tree and tree projection. Experimental results on both synthetic and real-life datasets show that the proposed approach achieves a dramatic saving of computation.

36 citations
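
The reuse idea in the abstract above can be illustrated with its simplest case: when the user tightens a minimum-support constraint, the new answer is a subset of the old one and can be obtained without rescanning the data. The sketch below is only an illustration under that assumption. It is not the paper's tree boundary technique, which the abstract does not describe in detail, and the names `mine_frequent` and `tighten` are hypothetical.

```python
from itertools import combinations

def mine_frequent(transactions, min_count):
    """Naive level-wise miner (Apriori-style, not FP-tree or tree projection).

    Returns {itemset: support count} for every itemset whose support
    is at least min_count.
    """
    items = sorted({i for t in transactions for i in t})
    result = {}
    level = [frozenset([i]) for i in items]
    while level:
        # One pass over the data per level to count candidate supports.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join frequent k-itemsets that differ in one item to get (k+1)-candidates.
        next_level = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == len(a) + 1:
                next_level.add(union)
        level = list(next_level)
    return result

def tighten(previous, new_min_count):
    """Reuse a previous result when the support threshold is raised.

    Assumption: new_min_count is >= the threshold used to compute `previous`,
    so every itemset in the new answer is already present with its count.
    """
    return {s: n for s, n in previous.items() if n >= new_min_count}

transactions = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("abc"),
]
first_run = mine_frequent(transactions, 2)   # initial constraint
second_run = tighten(first_run, 3)           # user raises the threshold
```

Relaxing a constraint is the harder direction, because itemsets that were pruned earlier may become frequent; this filter does not cover that case.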


Proceedings Article
01 Jan 2002
TL;DR: This paper proposes a more general and powerful wildcard mechanism, which allows for more complex and interesting substructures than existing techniques, adopts a vertical format for the storage of semi-structured objects, and adapts a frequent set mining algorithm for the purpose.
Abstract: Frequent substructure discovery from a collection of semi-structured objects can serve for storage, browsing, querying, indexing and classification of semi-structured documents. This paper examines the problem of discovering frequent substructures from a collection of hierarchical semi-structured objects of the same type. The use of wildcards is an important aspect of substructure discovery from semi-structured data due to the irregularity and lack of fixed structure of such data. This paper proposes a more general and powerful wildcard mechanism, which allows us to find more complex and interesting substructures than existing techniques. Furthermore, the complexity of the structural information of semi-structured data and the use of wildcards make the existing frequent set mining algorithms inapplicable for substructure discovery. In this work, we adopt a vertical format for the storage of semi-structured objects, and adapt a frequent set mining algorithm for our purpose. The application of our approach to real-life data shows that it is very effective.

35 citations
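
The vertical storage format mentioned in the abstract above can be sketched for the plain itemset case: each item maps to the set of object ids that contain it, and the support of a combination is the size of the intersection of those id sets. This is a minimal Eclat-style illustration, not the paper's algorithm. Flattening each object into a set of label paths is an assumption made here for brevity, wildcards are not modelled, and the names `to_vertical` and `eclat` are hypothetical.

```python
from collections import defaultdict

def to_vertical(objects):
    """Map each item to the set of ids of the objects that contain it."""
    tidsets = defaultdict(set)
    for oid, items in enumerate(objects):
        for item in items:
            tidsets[item].add(oid)
    return dict(tidsets)

def eclat(prefix, candidates, min_count, out):
    """Depth-first search over (item, id-set) pairs in a fixed item order."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_count:
            continue
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Extend only with later items; support comes from id-set intersection,
        # so the original objects are never rescanned.
        extensions = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_count, out)
    return out

# Hypothetical objects, each flattened to the label paths it contains.
objects = [
    {"person/name", "person/email"},
    {"person/name", "person/phone"},
    {"person/name", "person/email", "person/phone"},
]
vertical = to_vertical(objects)
ordered = sorted(vertical.items(), key=lambda kv: kv[0])
frequent = eclat((), ordered, 2, {})
```

With a minimum count of 2, `frequent` holds the three single paths plus the pairs that occur together in at least two objects.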