XML Document Clustering Based on Spectral Analysis Method

doi:10.4028/WWW.SCIENTIFIC.NET/AMR.219-220.304

Xin Ye Li

01 May 2011

Advanced Materials Research

pp. 304-307

Abstract:

While the K-Means algorithm tends to converge to a locally optimal solution, spectral clustering can obtain satisfactory results by embedding the data points into a new space in which the clusters are tighter. Because traditional spectral clustering uses a Gaussian kernel function to compute the similarity between two points, choosing the scale parameter σ usually depends on domain knowledge. This paper applies spectral clustering to XML documents. To take both the elements and the structure of XML documents into account, it proposes representing each XML document by its path features; to avoid selecting the scale parameter σ, it also proposes using the Jaccard coefficient to compute the similarity between two XML documents. Experiments show that computing the similarity with the Jaccard coefficient is effective and that the clustering results are correct.
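The abstract does not give the paper's exact path-extraction rules or its spectral algorithm, so the following is only a minimal Python sketch under stated assumptions. Each document is represented by the set of its root-to-node tag paths (a hypothetical `path_features` helper that ignores attributes and text). Similarity is the Jaccard coefficient |A ∩ B| / |A ∪ B|, which, unlike the Gaussian kernel exp(-‖x - y‖² / (2σ²)), has no scale parameter. The number of clusters is assumed to be k = 2, and for the clustering step the sketch swaps in a sign split of the second eigenvector of the normalized affinity D^{-1/2} W D^{-1/2} (found by power iteration) instead of running k-means on the top k eigenvectors as in the Ng, Jordan and Weiss algorithm. The function names and toy documents are illustrative, not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET


def path_features(xml_text):
    """Return the set of root-to-node tag paths, e.g. {'book', 'book/title'}.

    Hypothetical representation: attributes and text content are ignored.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Size of the intersection over size of the union; no sigma needed."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def spectral_bipartition(docs, iters=200):
    """Split XML documents into two clusters (assumes k = 2)."""
    feats = [path_features(d) for d in docs]
    n = len(feats)
    # Affinity matrix W from Jaccard similarity, with a zero diagonal.
    w = [[0.0 if i == j else jaccard(feats[i], feats[j]) for j in range(n)]
         for i in range(n)]
    # Degrees; a tiny floor avoids dividing by zero for isolated documents.
    deg = [max(sum(row), 1e-12) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Normalized affinity N = D^{-1/2} W D^{-1/2}; its eigenvalues lie in [-1, 1].
    norm_aff = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)]
                for i in range(n)]
    # N's leading eigenvector (eigenvalue 1) is proportional to sqrt(degree).
    top = [math.sqrt(d) for d in deg]
    top_len = math.sqrt(sum(t * t for t in top))
    top = [t / top_len for t in top]

    def deflate(x):
        # Remove the component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(x, top))
        return [a - dot * b for a, b in zip(x, top)]

    # Power iteration on N + I (all eigenvalues >= 0), kept orthogonal to the
    # leading eigenvector, converges toward the second eigenvector of N.
    x = deflate([float(i + 1) for i in range(n)])
    for _ in range(iters):
        y = [sum(norm_aff[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        y = deflate(y)
        length = math.sqrt(sum(v * v for v in y))
        if length == 0.0:
            break
        x = [v / length for v in y]
    # The sign of each entry of that eigenvector gives the cluster label.
    return [0 if v >= 0 else 1 for v in x]


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<book><title/><publisher/></book>",
    "<movie><title/><director/></movie>",
    "<movie><title/><director/><cast/></movie>",
    "<movie><title/><year/></movie>",
]
labels = spectral_bipartition(docs)
print(labels)
```

With these toy documents the three `book` documents and the three `movie` documents fall into different clusters; which group receives label 0 depends on the sign the power iteration settles on. For k > 2 one would instead compute the top k eigenvectors (for example with a linear-algebra library) and cluster their normalized rows.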
