Journal Article

XML Document Clustering Based on Spectral Analysis Method

TLDR
This paper uses a spectral method to cluster XML documents and represents each document by its path features; to avoid selecting the scale parameter σ, it uses the Jaccard coefficient to compute the similarity between two XML documents.
Abstract
While the K-Means algorithm usually converges to a locally optimal solution, spectral clustering can obtain satisfactory clustering results by embedding the data points into a new space in which clusters are tighter. Because the traditional spectral clustering method uses a Gaussian kernel function to compute the similarity between two points, selecting the scale parameter σ usually requires domain knowledge. This paper uses a spectral method to cluster XML documents. To account for both the elements and the structure of XML documents, it proposes representing each XML document by its path features; to avoid selecting the scale parameter σ, it also proposes using the Jaccard coefficient to compute the similarity between two XML documents. Experiments show that using the Jaccard coefficient to compute the similarity is effective and that the resulting clustering is correct.
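
The similarity step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a "path feature" is the set of root-to-node tag paths of a document, and the helper names `path_features`, `jaccard`, and `affinity_matrix` are hypothetical. The resulting matrix could stand in for the Gaussian-kernel affinity matrix in a spectral clustering routine, which is why no scale parameter σ appears.

```python
import xml.etree.ElementTree as ET


def path_features(xml_text):
    """Return the set of root-to-node tag paths of one XML document.

    Assumption: every node contributes one path, and repeated paths
    are counted once (a set, not a multiset).
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def affinity_matrix(docs):
    """Pairwise Jaccard similarities between the path sets of all documents."""
    feats = [path_features(d) for d in docs]
    return [[jaccard(fi, fj) for fj in feats] for fi in feats]


# Toy documents (invented for illustration, not from the paper's experiments).
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<movie><title/><director/></movie>",
]
A = affinity_matrix(docs)
for row in A:
    print(row)
```

In this toy input the two `book` documents share four of five distinct paths, while the `movie` document shares none with either, so the matrix already separates the two groups before any spectral embedding is applied.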

Citations
Proceedings Article

A graph digital signal processing method for semantic analysis

TL;DR: This paper approaches the problem of devising a computationally tractable procedure for representing the natural language understanding (NLU) by using distributional models of meaning through a method from graph-based digital signal processing (DSP), which only recently grabbed the attention of researchers from the field of natural language processing (NLP) related to big data analysis.
References
Proceedings Article

On Spectral Clustering: Analysis and an algorithm

TL;DR: A simple spectral clustering algorithm that can be implemented using a few lines of Matlab is presented, and tools from matrix perturbation theory are used to analyze the algorithm, and give conditions under which it can be expected to do well.
Proceedings Article

TreeFinder: a first step towards XML data mining

TL;DR: This paper considers the problem of searching frequent trees from a collection of tree-structured data modeling XML data, and shows that TreeFinder reaches completeness or falls short for a range of experimental settings.
Proceedings Article

Preparations for semantics-based XML mining

TL;DR: This paper proposes a new methodology for preparing XML documents for quantitative determination of similarity between XML documents by taking into account XML semantics, which provides an important basis for a variety of applications of XML document mining and processing.
Proceedings Article

XML clustering by principal component analysis

TL;DR: This work proposes a new approach to clustering XML data, which works for documents defined by different DTDs, and enables one to identify vectors with co-occurrent features, thereby enhancing the accuracy of the clustering.
Proceedings Article

XML Document Clustering Using Common XPath

TL;DR: A novel XML structural representation called common XPath is introduced, which encodes the frequently occurring elements with the hierarchical information, and a path-based XML document clustering algorithm called PBClustering is devised which groups the documents according to their CXPs, i.e. their frequent structures.