Book Chapter
XBeGene: Scalable XML Documents Generator by Example Based on Real Data
Manami Harazaki, Joe Tekli, Shohei Yokoyama, Naoki Fukuta, Richard Chbeir, Hiroshi Ishikawa
pp. 449-460
TL;DR: A novel XML By example Generator (XBeGene) is presented for producing synthetic XML data that closely reflect the user’s requirements. Experiments demonstrate high correlation levels between the specified user requirements and the characteristics of the generated XML data.
Abstract:
XML datasets of various sizes and properties are needed to evaluate the correctness and efficiency of XML-based algorithms and applications. While several downloadable datasets can be found online, these are predefined by system experts and might not be suitable for evaluating every algorithm. Tools for generating synthetic XML documents offer an alternative solution, promoting flexibility and adaptability in generating synthetic document collections. Nonetheless, the usefulness of existing XML generators remains rather limited due to the restricted levels of expressiveness allowed to users. In this paper, we develop a novel XML By example Generator (XBeGene) for producing synthetic XML data that closely reflect the user’s requirements. Inspired by the query-by-example paradigm in information retrieval, our generator system (i) allows the user to provide her own sample XML documents as input, (ii) analyzes the structure, occurrence frequencies, and content distributions for each XML element in the user input documents, and (iii) produces synthetic XML documents that closely match the user’s input data in both structural and content features. The size of each synthetic document, as well as that of the entire document collection, is also specified by the user. Clustering experiments demonstrate high correlation levels between the specified user requirements and the characteristics of the generated XML data, while timing results confirm our approach’s scalability to large-scale document collections.
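The analyze-then-generate pipeline described in the abstract can be illustrated with a small sketch. This is a minimal Python example built on our own assumptions, not XBeGene's actual algorithm. The functions `analyze`, `generate` and `generate_collection` and the parameters `n_docs` and `max_elements` are hypothetical names. Each element is identified by its root-to-node tag path. Each child's occurrence count is sampled independently from the counts observed under that path. Text content is drawn from the values observed at that path. Document size is capped by an element budget.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def analyze(samples):
    """Learn per-path statistics from sample XML strings (hypothetical helper).

    Assumption: a node is identified by its root-to-node tag path, e.g. "book/author".
    """
    roots = Counter()                # root tag -> number of sample documents
    child_order = defaultdict(list)  # path -> child tags in first-seen order
    instances = defaultdict(list)    # path -> one Counter of child tags per element instance
    texts = defaultdict(list)        # path -> observed (non-empty) text values

    def walk(elem, path):
        instances[path].append(Counter(child.tag for child in elem))
        text = (elem.text or "").strip()
        if text:
            texts[path].append(text)
        for child in elem:
            if child.tag not in child_order[path]:
                child_order[path].append(child.tag)
            walk(child, path + "/" + child.tag)

    for doc in samples:
        root = ET.fromstring(doc)
        roots[root.tag] += 1
        walk(root, root.tag)

    # Occurrence-frequency distribution per (path, child tag); a missing child counts as 0.
    freq = {
        path: {tag: [c[tag] for c in counts] for tag in child_order.get(path, [])}
        for path, counts in instances.items()
    }
    return roots, dict(child_order), freq, dict(texts)


def generate(model, rng, max_elements):
    """Build one synthetic element tree with at most max_elements elements (max_elements >= 1)."""
    roots, child_order, freq, texts = model
    remaining = max_elements

    def build(tag, path):
        nonlocal remaining
        remaining -= 1
        elem = ET.Element(tag)
        if path in texts:
            elem.text = rng.choice(texts[path])  # sample content from observed values
        for child_tag in child_order.get(path, []):
            # Assumption: each child's count is sampled independently of its siblings.
            for _ in range(rng.choice(freq[path][child_tag])):
                if remaining <= 0:
                    return elem  # size budget exhausted
                elem.append(build(child_tag, path + "/" + child_tag))
        return elem

    root_tag = rng.choice(list(roots.elements()))  # weighted by how often each root occurred
    return build(root_tag, root_tag)


def generate_collection(samples, n_docs, max_elements, seed=0):
    """Generate n_docs synthetic XML strings that mimic the samples."""
    rng = random.Random(seed)
    model = analyze(samples)
    return [
        ET.tostring(generate(model, rng, max_elements), encoding="unicode")
        for _ in range(n_docs)
    ]


if __name__ == "__main__":
    samples = [
        "<book><title>XML Basics</title><author>Ann</author><year>2010</year></book>",
        "<book><title>Data on the Web</title><author>Bob</author><author>Eve</author></book>",
    ]
    for doc in generate_collection(samples, n_docs=3, max_elements=10):
        print(doc)
```

Sampling each child count independently keeps the sketch short, but it ignores correlations between siblings. Drawing text only from observed values also reproduces seen content rather than fitting a distribution. A fuller generator would likely model both more carefully.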
Citations
Book Chapter
XQuery Testing from XML Schema Based Random Test Cases
TL;DR: Describes the elements of an XQuery testing tool that makes it possible to automatically test XQuery programs. The tool is implemented as an oracle able to report whether the XQuery program passes the test (that is, whether all the test cases satisfy the property), as well as the number of test cases used for testing.
Journal Article
Automatic property‐based testing and path validation of XQuery programs
TL;DR: An XQuery property‐based testing tool is presented that makes it possible to automatically test XQuery programs; a web tool has also been developed for testing and validating XQuery programs.