XBeGene: Scalable XML Documents Generator by Example Based on Real Data

doi:10.1007/978-3-642-28807-4_63

Book Chapter

- pp 449-460

TLDR

Presents a novel XML By example Generator (XBeGene) for producing synthetic XML data that closely reflect the user’s requirements. Experiments demonstrate high correlation levels between the specified user requirements and the characteristics of the generated XML data.

Abstract:

XML datasets of various sizes and properties are needed to evaluate the correctness and efficiency of XML-based algorithms and applications. While several downloadable datasets can be found online, these are predefined by system experts and might not be suitable for evaluating every algorithm. Tools for generating synthetic XML documents offer an alternative solution, promoting flexibility and adaptability in generating synthetic document collections. Nonetheless, the usefulness of existing XML generators remains rather limited due to the restricted levels of expressiveness allowed to users. In this paper, we develop a novel XML By example Generator (XBeGene) for producing synthetic XML data which closely reflect the user’s requirements. Inspired by the query-by-example paradigm in information retrieval, our generator system i) allows the user to provide her own sample XML documents as input, ii) analyzes the structure, occurrence frequencies, and content distributions for each XML element in the user input documents, and iii) produces synthetic XML documents which closely match the user’s input data in both structural and content features. The size of each synthetic document, as well as that of the entire document collection, is also specified by the user. Clustering experiments demonstrate high correlation levels between the specified user requirements and the characteristics of the generated XML data, while timing results confirm our approach’s scalability to large-scale document collections.
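
The three steps named in the abstract (accept sample documents, analyze per-element structure and content, generate matching documents) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a much simpler model in which each element tag is profiled by the child-tag sequences and text values observed for it, and new documents are drawn from those empirical frequencies. The names `learn_profile`, `generate`, and `max_depth` are hypothetical, and the user-specified size controls described in the abstract are not modeled.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def learn_profile(sample_docs):
    """Count, per element tag, the observed child-tag sequences and text values."""
    roots = Counter()                 # root tag -> occurrences
    children = defaultdict(Counter)   # tag -> Counter of child-tag tuples
    texts = defaultdict(Counter)      # tag -> Counter of stripped text values
    for doc in sample_docs:
        root = ET.fromstring(doc)
        roots[root.tag] += 1
        for elem in root.iter():
            children[elem.tag][tuple(child.tag for child in elem)] += 1
            text = (elem.text or "").strip()
            if text:
                texts[elem.tag][text] += 1
    return roots, children, texts


def _draw(counter, rng):
    """Sample one key with probability proportional to its observed count."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys], k=1)[0]


def generate(profile, rng, max_depth=10):
    """Build one synthetic document by sampling from the learned profile."""
    roots, children, texts = profile

    def build(tag, depth):
        elem = ET.Element(tag)
        # Assumption: recursion is cut off at max_depth to guarantee termination.
        child_tags = _draw(children[tag], rng) if depth < max_depth else ()
        for child_tag in child_tags:
            elem.append(build(child_tag, depth + 1))
        # Only leaf elements receive text; mixed content is ignored in this sketch.
        if not child_tags and texts[tag]:
            elem.text = _draw(texts[tag], rng)
        return elem

    return build(_draw(roots, rng), 0)


samples = [
    "<library><book><title>A</title><year>2001</year></book></library>",
    "<library><book><title>B</title></book>"
    "<book><title>C</title><year>1999</year></book></library>",
]
profile = learn_profile(samples)
synthetic = generate(profile, random.Random(0))
print(ET.tostring(synthetic, encoding="unicode"))
```

Sampling whole child-tag sequences, rather than each child independently, keeps sibling order and co-occurrence identical to some observed instance. The cost is that the sketch can never produce a sibling combination absent from the samples.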
