
Showing papers by "Ruiming Tang" published in 2011


Book Chapter
14 Sep 2011
TL;DR: This paper proposes and evaluates entropy-based metrics for XML structured-ness that measure the structural uniformity of paths and subtrees, respectively, and empirically studies the correlation of these metrics on real and synthetic data sets.
Abstract: XML is semi-structured. It can be used to annotate unstructured data, to represent structured data, and almost anything in between. Yet it is unclear how to formally characterize, let alone quantify, the structured-ness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study the correlation of these metrics on real and synthetic data sets.

4 citations


Book Chapter
29 Aug 2011
TL;DR: This work devises and discusses an algorithm for the efficient computation of the similarity between an XML document and a probabilistic XML document, and empirically and comparatively evaluates the performance of the algorithm and its variants.
Abstract: Probabilistic XML is a hierarchical data model capturing uncertainty of both value and structure. The ability to compute the similarity between an XML document and a probabilistic XML document is a building block of many applications involving querying, comparison, alignment and classification, for instance. The new challenge in efficiently computing such similarity is the multiplicity of the possible worlds represented by a probabilistic XML document. We devise and discuss an algorithm for the efficient computation of the similarity between an XML document and a probabilistic XML document. We empirically and comparatively evaluate the performance of the algorithm and its variants.

2 citations