Journal Article
XML Document Clustering Based on Spectral Analysis Method
TL;DR: This paper uses a spectral method to cluster XML documents and proposes representing each XML document by path features. To avoid selecting the scale parameter σ, it also proposes using the Jaccard coefficient to compute the similarity between two XML documents.
Abstract:
While the K-Means algorithm usually converges to a locally optimal solution, spectral clustering can obtain satisfying clustering results by embedding the data points into a new space in which clusters are tighter. Because traditional spectral clustering uses a Gaussian kernel function to compute the similarity between two points, choosing the scale parameter σ usually requires domain knowledge. This paper uses a spectral method to cluster XML documents. To take both the elements and the structure of XML documents into account, it proposes representing each XML document by path features; to avoid selecting the scale parameter σ, it also proposes using the Jaccard coefficient to compute the similarity between two XML documents. Experiments show that computing similarity with the Jaccard coefficient is effective and that the clustering results are correct.
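A minimal Python sketch of the document representation and similarity described above, assuming that every root-to-node tag path counts as one path feature (the paper's exact path definition, and any feature weighting, are not given here). The names `path_features`, `jaccard`, and `similarity_matrix` are hypothetical. Because the Jaccard coefficient |A ∩ B| / |A ∪ B| is computed directly from feature sets, no scale parameter σ is involved.

```python
import xml.etree.ElementTree as ET


def path_features(xml_text):
    """Hypothetical helper: the set of root-to-node tag paths, e.g. 'book/author'.

    Assumption: every element contributes the path from the root down to it.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:  # iterate over child elements only
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (taken as 1.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(docs):
    """Pairwise Jaccard similarities between the path-feature sets of the documents."""
    feats = [path_features(d) for d in docs]
    return [[jaccard(fi, fj) for fj in feats] for fi in feats]


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><year/></book>",
        "<movie><title/><director/></movie>",
    ]
    for row in similarity_matrix(docs):
        print([round(s, 2) for s in row])
```

The first two documents share three of their four distinct paths (similarity 0.75). The `title` element of the third document produces the path `movie/title`, which does not match `book/title`, so that document shares no features with the others: path features capture structure as well as element names.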
Citations
Proceedings Article
A graph digital signal processing method for semantic analysis
TL;DR: This paper approaches the problem of devising a computationally tractable procedure for representing natural language understanding (NLU) with distributional models of meaning, using a method from graph-based digital signal processing (DSP) that has only recently attracted the attention of natural language processing (NLP) researchers working on big data analysis.
References
Proceedings Article
On Spectral Clustering: Analysis and an algorithm
TL;DR: A simple spectral clustering algorithm that can be implemented using a few lines of Matlab is presented, and tools from matrix perturbation theory are used to analyze the algorithm, and give conditions under which it can be expected to do well.
Proceedings Article
TreeFinder: a first step towards XML data mining
TL;DR: This paper considers the problem of searching frequent trees from a collection of tree-structured data modeling XML data, and shows that TreeFinder reaches completeness or falls short for a range of experimental settings.
Proceedings Article
Preparations for semantics-based XML mining
TL;DR: This paper proposes a new methodology for preparing XML documents for quantitative determination of similarity between XML documents by taking into account XML semantics, which provides an important basis for a variety of applications of XML document mining and processing.
Proceedings Article
XML clustering by principal component analysis
TL;DR: This work proposes a new approach to clustering XML data, which works for documents defined by different DTDs, and enables one to identify vectors with co-occurrent features, thereby enhancing the accuracy of the clustering.
Proceedings Article
XML Document Clustering Using Common XPath
TL;DR: A novel XML structural representation called common XPath is introduced, which encodes the frequently occurring elements with the hierarchical information, and a path-based XML document clustering algorithm called PBClustering is devised which groups the documents according to their CXPs, i.e. their frequent structures.
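The referenced "On Spectral Clustering: Analysis and an algorithm" presents a short spectral clustering algorithm. Below is a minimal NumPy sketch of its steps, assuming a precomputed symmetric affinity matrix (for example, pairwise Jaccard coefficients between XML documents) is supplied in place of a Gaussian-kernel affinity, so no σ has to be chosen. The names `njw_spectral_clustering` and `kmeans` are hypothetical, and the deterministic farthest-point k-means initialisation is an assumption made for reproducibility.

```python
import numpy as np


def kmeans(Y, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance from each row to its nearest existing center.
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every row to its nearest center.
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(affinity, k):
    """Cluster n items into k groups from an n x n symmetric affinity matrix."""
    A = np.array(affinity, dtype=float)           # copy, so the input is not modified
    np.fill_diagonal(A, 0.0)                      # the algorithm sets A_ii = 0
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # L = D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    X = vecs[:, -k:]                              # eigenvectors of the k largest eigenvalues
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Y = X / np.where(norms > 0, norms, 1.0)       # renormalise each row to unit length
    return kmeans(Y, k)


if __name__ == "__main__":
    S = [
        [1.0, 0.9, 0.8, 0.0, 0.0, 0.0],
        [0.9, 1.0, 0.85, 0.0, 0.0, 0.0],
        [0.8, 0.85, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.7, 0.9],
        [0.0, 0.0, 0.0, 0.7, 1.0, 0.75],
        [0.0, 0.0, 0.0, 0.9, 0.75, 1.0],
    ]
    print(njw_spectral_clustering(S, 2))
```

Zeroing the diagonal matters when the input is a Jaccard similarity matrix, whose diagonal entries are all 1.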