
Showing papers on "Simple API for XML" published in 2007


Journal ArticleDOI
TL;DR: This survey reviews, classifies, and compares two classes of major XML query processing techniques, the relational approach and the native approach; integrating the two could result in higher query processing performance and significantly reduced system reengineering costs.
Abstract: Extensible markup language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.

191 citations
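A minimal sketch of the region-encoding idea that underlies many of the surveyed structural-join techniques: each node gets a (start, end, level) label from a single document scan, and the ancestor-descendant test reduces to interval containment. The Java below is a naive illustration with invented names and data, not an algorithm from the survey; practical joins exploit the sorted start positions to avoid the nested loop.

```java
import java.util.ArrayList;
import java.util.List;

public class StructuralJoinSketch {
    // (start, end, level) region label assigned by one preorder scan of the document.
    record Label(String tag, int start, int end, int level) {
        boolean isAncestorOf(Label d) {   // ancestor-descendant is interval containment
            return start < d.start() && d.end() < end;
        }
    }

    // Naive ancestor-descendant join; real structural joins use the sorted
    // start positions (e.g. with stacks) to avoid this O(n*m) nested loop.
    static List<Label[]> adJoin(List<Label> ancestors, List<Label> descendants) {
        List<Label[]> out = new ArrayList<>();
        for (Label a : ancestors)
            for (Label d : descendants)
                if (a.isAncestorOf(d)) out.add(new Label[] { a, d });
        return out;
    }

    public static void main(String[] args) {
        // <book><title/><author><name/></author></book>, positions from one scan
        List<Label> books = List.of(new Label("book", 1, 8, 0));
        List<Label> names = List.of(new Label("name", 5, 6, 2));
        adJoin(books, names).forEach(p ->
                System.out.println(p[0].tag() + " contains " + p[1].tag()));
    }
}
```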


Proceedings ArticleDOI
Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, Mohammed J. Zaki
12 Aug 2007
TL;DR: This paper proposes an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures, and proposes new ways of using multiple kinds of sub-structural information in XML documents to evaluate the quality of intermediate cluster solutions.
Abstract: XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encode structural information about data records. However, this structural characteristic of data sets also makes them a challenging target for a variety of data mining problems. One such problem is that of clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation. As a result, it becomes more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures. We propose new ways of using multiple kinds of sub-structural information in XML documents to evaluate the quality of intermediate cluster solutions, and to guide the algorithms to a final solution which reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.

127 citations
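To make "sub-structural information" concrete, a hedged sketch: represent each document by its set of root-to-leaf tag paths and compare documents with Jaccard similarity. This only illustrates the kind of structural signal such clustering algorithms consume; the paper's algorithm selects and weighs substructures far more carefully, and all names here are invented.

```java
import java.util.HashSet;
import java.util.Set;

public class StructuralSimilarity {
    // Jaccard similarity of two path sets: |intersection| / |union|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // Each document reduced to its root-to-leaf tag paths (toy data).
        Set<String> doc1 = Set.of("book/title", "book/author/name");
        Set<String> doc2 = Set.of("book/title", "book/year");
        System.out.println(jaccard(doc1, doc2)); // 0.333...: one shared path of three
    }
}
```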


Patent
17 Oct 2007
TL;DR: In this article, a device control system is presented that includes at least one device operable by the system, at least one processor, and two or more communication components, each including an XML parser for parsing the XML document and extracting the message data.
Abstract: A device control system including at least one device operable by the system, at least one processor, software executing on the at least one processor for receiving message data and determining a corresponding XML document type, software executing on the at least one processor for generating a XML document based on the XML document type, the XML document including the message data, software executing on the processor for packetizing the XML document, and two or more communication components, each communication component including an XML parser for parsing the XML document and extracting the message data.

113 citations


Patent
01 Nov 2007
TL;DR: In this article, an application program interface (API) is provided for requesting, storing, and accessing data within a health integration network; it facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization and the ability to receive requests in an extensible language format, such as XML, and to return resulting data in XML format.
Abstract: An application program interface (API) is provided for requesting, storing, and otherwise accessing data within a health integration network. The API facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization, as well as the ability to receive requests in an extensible language format, such as XML, and returns resulting data in XML format. The data can also have transformation, style and/or schema information associated with it which can be returned in the resulting XML and/or applied to the data beforehand by the API. The API can be utilized in many environment architectures including XML over HTTP and a software development kit (SDK).

93 citations


Proceedings ArticleDOI
11 Jun 2007
TL;DR: This paper has developed an application-oriented and domain-specific benchmark called "Transaction Processing over XML" (TPoX), which exercises all aspects of XML databases, including storage, indexing, logging, transaction processing, and concurrency control.
Abstract: XML database functionality has been emerging in "XML-only" databases as well as in the major relational database products. Yet, there is no industry standard XML database benchmark to evaluate alternative implementations. The research community has proposed several benchmarks which are all useful in their respective scope, such as evaluating XQuery processors. However, they do not aim to evaluate a database system in its entirety and do not represent all relevant characteristics of a real-world XML application. Often they only define read-only single-user tests on a single XML document. We have developed an application-oriented and domain-specific benchmark called "Transaction Processing over XML" (TPoX). It exercises all aspects of XML databases, including storage, indexing, logging, transaction processing, and concurrency control. Based on our analysis of real XML applications, TPoX simulates a financial multi-user workload with XML data conforming to the FIXML standard. In this paper we describe TPoX and present early performance results. We also make its implementation publicly available.

83 citations


Journal ArticleDOI
TL;DR: This work presents a schema clustering process that organises heterogeneous XML schemas into various groups, considering not only the linguistic meaning and context of the elements but also their hierarchical structural similarity.
Abstract: With the growing popularity of XML as the data representation language, collections of XML data have exploded in number. Methods are required to manage them and to discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into various groups. The methodology considers not only the linguistic meaning and context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

66 citations


Proceedings ArticleDOI
15 Apr 2007
TL;DR: A definition for cube adapted for XML data warehouse is proposed, including a suitably generalized specification mechanism, and the results of an extensive performance evaluation experiment gauging the behavior of alternative algorithms for cube computation are presented.
Abstract: With increasing amounts of data being exchanged and even generated or stored in XML, a natural question is how to perform OLAP on XML data, which can be structurally heterogeneous (e.g., parse trees) and/or marked-up text documents. A core operator for OLAP is the data cube. While the relational cube can be extended in a straightforward way to XML, we argue such an extension would not address the specific issues posed by XML. While in a relational warehouse, facts are flat records and dimensions may have hierarchies, in an XML warehouse, both facts and dimensions may be hierarchical. Second, XML is flexible: (a) an element may have missing or repeated subelements; (b) different instances of the same element type may have different structure. We identify the challenges introduced by these features of XML for cube definition and computation. We propose a definition for cube adapted for XML data warehouse, including a suitably generalized specification mechanism. We define a cube lattice over the aggregates so defined. We then identify properties of this cube lattice that can be leveraged to allow optimized computation of the cube. Finally, we present the results of an extensive performance evaluation experiment gauging the behavior of alternative algorithms for cube computation.

63 citations


Book ChapterDOI
03 Sep 2007
TL;DR: This paper proposes a novel fragmentation technique founded on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms able to meaningfully summarize XML documents, which allows it to predict bounding intervals of structural properties of output (XML) fragments for efficient query processing of distributed XML data.
Abstract: Fragmentation techniques for XML data are gaining momentum within both distributed and centralized XML query engines and pose novel and unrecognized challenges to the community. Albeit not novel, and clearly inspired by the classical divide et impera principle, fragmentation for XML trees has proved successful in boosting querying performance and in cutting down memory requirements. However, the fragmentation considered so far has been driven by semantics, i.e. built around query predicates. In this paper, we propose a novel fragmentation technique founded on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms able to meaningfully summarize XML documents. This allows us to predict bounding intervals of structural properties of output (XML) fragments for efficient query processing of distributed XML data. An experimental evaluation of our study confirms the effectiveness of our fragmentation methodology on some representative XML data sets.

54 citations
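A sketch of the size constraint alone, assuming a DOM input: subtrees that fit a node-count bound become fragments, and oversized nodes recurse into their children. The paper's technique additionally bounds tree-width and tree-depth and is guided by structure histograms; this miniature omits all of that, and its names are invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SizeBoundedFragmenter {
    // Number of nodes in the subtree rooted at n.
    static int size(Node n) {
        int c = 1;
        for (Node ch = n.getFirstChild(); ch != null; ch = ch.getNextSibling()) c += size(ch);
        return c;
    }

    // A subtree small enough to fit the bound becomes one fragment; otherwise
    // its root keeps only a "shell" fragment and we recurse into the children.
    static void fragment(Node n, int maxNodes, List<Node> roots) {
        roots.add(n);
        if (size(n) <= maxNodes) return;
        for (Node ch = n.getFirstChild(); ch != null; ch = ch.getNextSibling())
            if (ch.getNodeType() == Node.ELEMENT_NODE) fragment(ch, maxNodes, roots);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b><c/><d/></b><e/></a>";
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        List<Node> roots = new ArrayList<>();
        fragment(root, 3, roots);
        roots.forEach(r -> System.out.println(r.getNodeName())); // a, b, e
    }
}
```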


Patent
23 Apr 2007
TL;DR: In this article, a Wrapper class is provided for the XML Document class and the Element class, which can be used to access external components as required by a user application.
Abstract: Systems and methods for loading XML documents on demand are described. The system provides a Wrapper class for the XML Document class and the Element class. A user application then utilizes the Wrapper class in the same way that the Element class and Document class would be used to access any element in the XML Document. The Wrapper class loads external components as required. The external component retrieval is completely transparent to the user application and the user application is able to access the entire XML document as if it were completely loaded into a DOM object in memory. Accordingly, each element is accessible in a random manner. In one configuration, the XML document components or external components are stored in a database in a BLOB field as a Digital Document. The system uses external components to efficiently use resources as compared to systems using Xlink and external entities.

48 citations
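The on-demand idea can be sketched in a few lines: a facade object holds only a reference to the external component and parses it on first access, so callers see a fully materialized element without the document ever being loaded eagerly. The class and method names below are hypothetical, not those of the patented system.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LazyElement {
    private final String externalUri;  // e.g. a URL or a key into a BLOB store
    private Element loaded;            // stays null until first access

    public LazyElement(String externalUri) { this.externalUri = externalUri; }

    // Callers use this as if the whole document were already in memory; the
    // external component is fetched and parsed only on the first call.
    public synchronized Element element() throws Exception {
        if (loaded == null) {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(externalUri);
            loaded = d.getDocumentElement();
        }
        return loaded;
    }
}
```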


Proceedings ArticleDOI
25 Jun 2007
TL;DR: A stealing-based dynamic load-balancing mechanism, called ThreadCrew, is presented, by which multiple threads are able to process disjoint parts of an XML document in parallel with balanced load distribution, and a novel mechanism to trace the stealing actions is provided.
Abstract: As a language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has been regarded as the performance bottleneck in most systems and applications. Meanwhile, the multicore processor, which emerged as a solution to the clock-speed limitations of modern CPUs, has become increasingly prevalent. Leveraging the parallelism provided by multicore resources to speed up software execution is becoming the trend in software development. In this paper, we present a parallel processing model for XML documents. The model is not designed for just one specific XML processing task; instead, it is a general model by which we are able to explore various kinds of parallel XML document processing. The kernel of the model is a stealing-based dynamic load-balancing mechanism, called ThreadCrew, by which multiple threads are able to process disjoint parts of the XML document in parallel with balanced load distribution. The model also provides a novel mechanism to trace the stealing actions, so the equivalent sequential result can be obtained by gluing the multiple parallel-running results together. To show the feasibility and effectiveness of our approach, we present our C# implementation of parallel XML serialization. Our empirical study shows that our parallel XML serialization algorithm can improve XML serialization performance significantly on a multicore machine.

48 citations
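The paper's implementation is in C#; as a loose Java analogue, the sketch below uses ForkJoinPool (the JDK's own work-stealing scheduler, reached here through parallelStream) to process disjoint fragments in parallel and glue the results back in document order, the property that ThreadCrew's stealing trace is designed to preserve. The per-fragment "serialization" is a placeholder.

```java
import java.util.List;

public class ParallelSerializeSketch {
    // Stand-in for real per-fragment serialization work.
    static String serialize(String fragment) {
        return "<part>" + fragment + "</part>";
    }

    public static void main(String[] args) {
        List<String> fragments = List.of("a", "b", "c", "d");
        // parallelStream schedules the map on the common ForkJoinPool, whose
        // workers steal tasks from each other; the ordered reduction glues the
        // per-fragment results back together in their original order.
        String body = fragments.parallelStream()
                .map(ParallelSerializeSketch::serialize)
                .reduce("", String::concat);
        System.out.println("<doc>" + body + "</doc>");
    }
}
```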


Proceedings ArticleDOI
15 Apr 2007
TL;DR: A new synopsis for XML documents is introduced which can be effectively used to estimate the selectivity of complex path queries, based on a lossy compression of the document tree that underlies the XML document, and can be computed in one pass from the document.
Abstract: Estimating the selectivity of queries is a crucial problem in database systems. Virtually all database systems rely on the use of selectivity estimates to choose amongst the many possible execution plans for a particular query. In terms of XML databases, the problem of selectivity estimation of queries presents new challenges: many evaluation operators are possible, such as simple navigation, structural joins, or twig joins, and many different indexes are possible. A new synopsis for XML documents is introduced which can be effectively used to estimate the selectivity of complex path queries. The synopsis is based on a lossy compression of the document tree that underlies the XML document, and can be computed in one pass from the document. It has several advantages over existing approaches: (1) it allows one to estimate the selectivity of queries containing all XPath axes, including the order-sensitive ones, (2) the estimator returns a range within which the actual selectivity is guaranteed to lie, with the size of this range implicitly providing a confidence measure of the estimate, and (3) the synopsis can be incrementally updated to reflect changes in the XML database.
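The paper's synopsis supports all XPath axes and returns guaranteed selectivity ranges; as a far simpler baseline that makes the problem concrete, the sketch below builds a path synopsis in one SAX pass by counting occurrences of each root-to-node tag path, which answers plain child-axis path queries exactly. The structure is an assumption for illustration, not the paper's.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathSynopsis extends DefaultHandler {
    private final Deque<String> stack = new ArrayDeque<>();
    final Map<String, Integer> counts = new HashMap<>();

    @Override public void startElement(String uri, String local, String qName, Attributes a) {
        stack.addLast(qName);                              // extend the current path
        counts.merge(String.join("/", stack), 1, Integer::sum);
    }
    @Override public void endElement(String uri, String local, String qName) {
        stack.removeLast();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<lib><book><title/></book><book><title/><title/></book></lib>";
        PathSynopsis s = new PathSynopsis();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), s);
        System.out.println(s.counts.get("lib/book/title")); // 3: exact count for /lib/book/title
    }
}
```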

Proceedings ArticleDOI
10 Dec 2007
TL;DR: In this paper, a parallel preparsing scan is used to build an outline of the XML document, which is then used to guide the parallel full parse, and a meta-DFA mechanism is proposed to parallelize the preparser itself.
Abstract: By leveraging the growing prevalence of multicore CPUs, parallel XML parsing (PXP) can significantly improve the performance of XML, enhancing its suitability for scientific data which is often dominated by floating-point numbers. One approach is to divide the XML document into equal-sized chunks, and parse each chunk in parallel. XML parsing is inherently sequential, however, because the state of an XML parser when reading a given character depends potentially on all preceding characters. In previous work, we addressed this by using a fast preparsing scan to build an outline of the document which we called the skeleton. The skeleton is then used to guide the parallel full parse. The preparse is a sequential phase that limits scalability, however, and so in this paper, we show how the preparse itself can be parallelized using a mechanism we call a meta-DFA. For each state q of the original preparser the meta-DFA incorporates a complete copy of the preparser state machine as a sub-DFA which starts in state q. The meta-DFA thus runs multiple instances of the preparser simultaneously when parsing a chunk, with each possible preparser state at the beginning of a chunk represented by an instance. By pursuing all possibilities simultaneously, the meta-DFA allows each chunk to be preparsed independently in parallel. The parallel full parse following the preparse is performed using libxml2, and outputs DOM trees that are fully compatible with existing applications that use libxml2. Our implementation scales well on a 30 CPU Sun E6500 machine.
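The meta-DFA construction can be miniaturized with a two-state toy preparser (inside a tag vs. text): running the DFA from every possible start state turns a chunk into a state mapping, and because mappings compose associatively, chunks can be preparsed independently and combined afterwards. The real preparser has more states and also records the skeleton; this Java sketch shows only the parallelization principle.

```java
import java.util.List;

public class MetaDfaSketch {
    static final int TEXT = 0, TAG = 1, STATES = 2;

    // Toy preparser transition: track whether we are inside a tag.
    static int step(int state, char c) {
        if (state == TEXT) return c == '<' ? TAG : TEXT;
        return c == '>' ? TEXT : TAG;
    }

    // State mapping of a chunk: mapping[s] = state after reading the chunk
    // starting from state s. This is the "one sub-DFA per start state" idea.
    static int[] mapChunk(String chunk) {
        int[] mapping = new int[STATES];
        for (int s = 0; s < STATES; s++) {
            int cur = s;
            for (int i = 0; i < chunk.length(); i++) cur = step(cur, chunk.charAt(i));
            mapping[s] = cur;
        }
        return mapping;
    }

    // Function composition of mappings; associative, so reduce() may regroup.
    static int[] compose(int[] first, int[] second) {
        int[] r = new int[STATES];
        for (int s = 0; s < STATES; s++) r[s] = second[first[s]];
        return r;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("<a><b", ">text</b", "></a>");
        int[] total = chunks.parallelStream()   // each chunk preparsed independently
                .map(MetaDfaSketch::mapChunk)
                .reduce(new int[] {TEXT, TAG},  // identity mapping
                        MetaDfaSketch::compose);
        System.out.println("final state from TEXT: " + total[TEXT]); // 0 = TEXT
    }
}
```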

Proceedings ArticleDOI
08 May 2007
TL;DR: This paper presents a new storage scheme for XML data that supports all navigational operations in near constant time, and features a small memory footprint that increases cache locality, whilst still supporting standard APIs and necessary database operations, such as queries and updates, efficiently.
Abstract: As XML database sizes grow, the amount of space used for storing the data and auxiliary data structures becomes a major factor in query and update performance. This paper presents a new storage scheme for XML data that supports all navigational operations in near constant time. In addition to supporting efficient queries, the space requirement of the proposed scheme is within a constant factor of the information theoretic minimum, while insertions and deletions can be performed in near constant time as well. As a result, the proposed structure features a small memory footprint that increases cache locality, whilst still supporting standard APIs, such as DOM, and necessary database operations, such as queries and updates, efficiently. Analysis and experiments show that the proposed structure is space and time efficient.

Journal ArticleDOI
01 Jan 2007
TL;DR: This research develops a space-efficient DOM parser, called SEDOM, based on a new compression approach and a set of manipulation algorithms, which enable many DOM operations to be performed while the data are in the compressed format, and allow individual parts of a document to be compressed, decompressed and manipulated.
Abstract: In many XML applications, parsing is a key operation. When the processing involves modifying data, random access, and/or access in an order different from the one in which elements are stored, a DOM parser has to be used. A major problem with using a DOM parser is memory consumption. The size of a DOM tree created from an XML document may be as large as 10 times the size of the original document. Maintaining the tree of a big document requires a large amount of memory, which may cause costly swapping. In the worst case, a DOM parser cannot handle a document at all because of its size. In this research, we develop a space-efficient DOM parser, called SEDOM. It is based on a new compression approach and a set of manipulation algorithms, which enable many DOM operations to be performed while the data are in the compressed format, and allow individual parts of a document to be compressed, decompressed and manipulated. It can be used to efficiently manipulate very large XML documents. In this paper, we describe SEDOM and compare its performance with three existing DOM parsers and an XML compressor.

Patent
05 Oct 2007
TL;DR: In this paper, the authors present a method and system for performing operations, such as addition, subtraction, multiplication, and division, on data using XML streams.
Abstract: The present invention provides a method and system for performing operations on data using XML streams. An XML schema defines a limited set of operations that may be performed on data. These operations include addition, subtraction, multiplication and division. The operations are placed in an XML stream that conforms to the XML schema. The XML stream may perform one or more of the defined operations on the data. The limited set of operations allows data to be validated and processed without excessive overhead.
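A toy evaluator for the kind of operation stream the patent describes, restricted to the four permitted operations. The element and attribute names are invented for illustration; the actual vocabulary would be fixed by the XML schema the patent defines.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlOpEvaluator {
    // Evaluate one operation element; anything outside the allowed set is
    // rejected, mirroring the schema-enforced restriction described above.
    static double eval(Element op) {
        double a = Double.parseDouble(op.getAttribute("left"));
        double b = Double.parseDouble(op.getAttribute("right"));
        return switch (op.getTagName()) {
            case "add"      -> a + b;
            case "subtract" -> a - b;
            case "multiply" -> a * b;
            case "divide"   -> a / b;
            default -> throw new IllegalArgumentException("operation not permitted");
        };
    }

    public static void main(String[] args) throws Exception {
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<multiply left='6' right='7'/>")));
        System.out.println(eval(d.getDocumentElement())); // 42.0
    }
}
```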

Proceedings Article
23 Sep 2007
TL;DR: A novel early pruning approach called Bounding-based XML Filtering or BoXFilter is proposed, based on a new tree-like indexing structure that organizes the queries based on their similarity and provides lower and upper bound estimations needed to prune queries not related to the incoming documents.
Abstract: Publish-subscribe applications are an important class of content-based dissemination systems where the message transmission is defined by the message content, rather than its destination IP address. With the increasing use of XML as the standard format in many Internet-based applications, XML-aware pub-sub applications become necessary. In such systems, the messages (generated by publishers) are encoded as XML documents, and the profiles (defined by subscribers) as XML query statements. As the number of documents and query requests grows, the performance and scalability of the matching phase (i.e. matching of queries to incoming documents) become vital. Current solutions have limited or no flexibility to prune out queries in advance. In this paper, we overcome this limitation by proposing a novel early pruning approach called Bounding-based XML Filtering, or BoXFilter. The BoXFilter is based on a new tree-like indexing structure that organizes the queries based on their similarity and provides the lower and upper bound estimations needed to prune queries not related to the incoming documents. Our experimental evaluation shows that the early profile pruning approach offers drastic performance improvements over the current state-of-the-art in XML filtering.

Journal ArticleDOI
TL;DR: Experiments on large XML documents show that instance patterns allow a significant reduction in storage space, while preserving almost entirely the completeness of the query result, thus overcoming the document size limitation of most current XQuery engines.
Abstract: XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which can both provide succinct information and be directly queried. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets. Instance patterns may be used for (possibly partially) answering queries, either when fast and approximate answers are required, or when the actual dataset is not available, for example, it is currently unreachable. Experiments on large XML documents show that instance patterns allow a significant reduction in storage space, while preserving almost entirely the completeness of the query result. Furthermore, they provide fast query answers and show good scalability on the size of the dataset, thus overcoming the document size limitation of most current XQuery engines.

Patent
27 Mar 2007
TL;DR: In this article, a two-thread SAX parser with a main thread and a parsing thread is implemented, where the main thread controls the parsing thread by sending target content to be searched for and wake-up signals to the parser, and receives the content found by the parser for further processing.
Abstract: A method for processing XML documents using a SAX parser, implemented in a two-thread architecture having a main thread and a parsing thread. The parsing procedure is located in a parsing thread, which implements callback functions of a SAX parser and creates and executes the SAX parser. The main thread controls the parsing thread by sending target content to be searched for and wakeup signals to the parsing thread, and receives the content found by the parsing thread for further processing. In the parsing thread, each time a callback function is invoked by the SAX parser, it is determined whether the target content has been found. If it has, the parsing thread sends the found content to the main thread with a wakeup signal, and enters a sleep mode, whereby further parsing is halted until a wakeup signal with additional target content is received from the main thread.
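A minimal Java rendering of the described two-thread pattern, with a SynchronousQueue standing in for the explicit sleep/wake-up signalling: the SAX callback blocks on put() after each find, so parsing halts until the main thread takes the result, which is also what releases (wakes) the parsing thread. The target-matching logic is deliberately simplified.

```java
import java.io.StringReader;
import java.util.concurrent.SynchronousQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TwoThreadSax {
    public static void main(String[] args) throws Exception {
        SynchronousQueue<String> found = new SynchronousQueue<>();
        String xml = "<m><item>a</item><item>b</item></m>";

        // Parsing thread: creates and executes the SAX parser and its callbacks.
        Thread parsing = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                        new InputSource(new StringReader(xml)),
                        new DefaultHandler() {
                            @Override public void startElement(String u, String l,
                                    String qName, Attributes a) {
                                if (qName.equals("item")) {
                                    try { found.put(qName); }  // blocks: parsing halts here
                                    catch (InterruptedException e) { throw new RuntimeException(e); }
                                }
                            }
                        });
            } catch (Exception e) { throw new RuntimeException(e); }
        });
        parsing.start();

        // Main thread: each take() receives found content and wakes the parser.
        System.out.println("got: " + found.take());
        System.out.println("got: " + found.take());
        parsing.join();
    }
}
```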

Journal Article
TL;DR: It is shown how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.
Abstract: A two-level data transformation consists of a type-level transformation of a data format coupled with value-level transformations of data instances corresponding to that format. We have implemented a system for performing two-level transformations on XML schemas and their corresponding documents, and on SQL schemas and the databases that they describe. The core of the system consists of a combinator library for composing type-changing rewrite rules that preserve structural information and referential constraints. We discuss the implementation of the system's core library, and of its SQL and XML front-ends in the functional language Haskell. We show how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases.

Patent
24 Apr 2007
TL;DR: In this paper, a modification is applied directly to the persistently stored binary form without constructing an object tree or materializing the XML document into a corresponding textual form, taking into account the nature of the binary form in which the document is encoded, including identifying where in the binary representation the corresponding actual changes need to be made.
Abstract: An XML document can be represented in a compact binary form that maintains all of the features of XML data in a useable form. In response to a request for a modification (e.g., insert, delete or update a node) to an XML document that is stored in the compact binary form, a certain representation of the requested modification is computed for application directly to the binary form of the document. Thus, the requested modification is applied directly to the persistently stored binary form without constructing an object tree or materializing the XML document into a corresponding textual form. Taking into account the nature of the binary form in which the document is encoded, the bytes that actually require change are identified, including identifying where in the binary representation the corresponding actual changes need to be made.

Proceedings ArticleDOI
21 May 2007
TL;DR: An algorithm is implemented for relocating partitioned XML data based on the CPU load of query processing, and experiments show a performance advantage in this approach for executing distributed query processing of large XML data.
Abstract: We propose an efficient distributed query processing method for large XML data by partitioning and distributing XML data to multiple computation nodes. There are several steps involved in this method; however, we focused particularly on XML data partitioning and dynamic relocation of partitioned XML data in our research. Since the efficiency of query processing depends on both XML data size and its structure, these factors should be considered when XML data is partitioned. Each partitioned XML data is distributed to computation nodes so that the CPU load can be balanced. In addition, it is important to take account of the query workload among each of the computation nodes because it is closely related to the query processing cost in distributed environments. In case of load skew among computation nodes, partitioned XML data should be relocated to balance the CPU load. Thus, we implemented an algorithm for relocating partitioned XML data based on the CPU load of query processing. From our experiments, we found that there is a performance advantage in our approach for executing distributed query processing of large XML data.

Book ChapterDOI
23 Sep 2007
TL;DR: In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms, which turn out to be simple and efficient.
Abstract: Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms [1, 2, 19]. This approach turns out to be simple and efficient. However, the structural containment relationships native to XML data are not directly captured by value indices.

Proceedings ArticleDOI
07 May 2007
TL;DR: An extensive comparative study and benchmarking of the popular XML parsers on the market today is presented, and a non-validating SAX-based XML parser, xParse, is proposed; performance results demonstrate the viability of the approach.
Abstract: Due to its flexibility and efficiency in the transmission of data, XML has become the emerging standard for data transfer and data exchange across the Internet. An XML document must always be checked for well-formedness before data transfer and exchange can take place. Choosing the right parser for an organization's systems is crucial, since an improper parser will lead to degraded performance and decreased productivity. In this paper, we conduct an extensive comparative study and benchmarking of the popular XML parsers found in the market today. In addition, we propose a non-validating SAX-based XML parser, xParse. We implemented our technique and present the performance results, which prove the viability of our approach.
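xParse itself is not publicly available, but the well-formedness check being benchmarked is easy to express with any non-validating SAX parser: a complete parse with a no-op handler either succeeds or fails at the first violation. The sketch below uses the JDK's bundled parser as a stand-in.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // A document is well-formed iff a full SAX parse completes without error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // true
        System.out.println(isWellFormed("<a><b></a>"));  // false: mismatched tags
    }
}
```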

Proceedings ArticleDOI
11 Jun 2007
TL;DR: ReXSA proposes candidate database schemas given an information model of the enterprise data; it has the advantage of considering qualitative properties of the information model, such as reuse, evolution, and performance profiles, when deciding how to persist the data.
Abstract: In response to the widespread use of the XML format for document representation and message exchange, major database vendors support XML in terms of persistence, querying and indexing. Specifically, the recently released IBM DB2 9 (for Linux, Unix and Windows) is a hybrid data server with optimized management of both XML and relational data. With the new option of storing and querying XML in a relational DBMS, data architects face the decision of what portion of their data to persist as XML and what portion as relational data. This problem has not been addressed yet and represents a serious need in the industry. Hence, this paper describes ReXSA, a schema advisor tool that is being prototyped for IBM DB2 9. ReXSA proposes candidate database schemas given an information model of the enterprise data. It has the advantage of considering qualitative properties of the information model, such as reuse, evolution and performance profiles, for deciding how to persist the data. Finally, we show the viability and practicality of ReXSA by applying it to custom and real use cases.

Proceedings ArticleDOI
20 May 2007
TL;DR: The tool TAXI is presented, which implements the XML-based partition testing approach for the automated generation of XML instances conforming to a given XML schema and provides a set of weighted test strategies to guide the systematic derivation of instances.
Abstract: We present the tool TAXI, which implements the XML-based partition testing approach for the automated generation of XML instances conforming to a given XML schema. In addition, it provides a set of weighted test strategies to guide the systematic derivation of instances. TAXI can be used for black-box testing of applications that accept XML instances as input and for benchmarking of database management systems.

Book ChapterDOI
18 Jun 2007
TL;DR: Information retrieval and database-related techniques are jointly applied to effectively tolerate XML data diversity in the evaluation of flexible queries, and a query execution technique is outlined as a first step toward efficiently addressing structural pattern queries together with predicate support over XML element content.
Abstract: XML has become a key technology for interoperability, providing a common data model to applications. However, diverse data modeling choices may lead to heterogeneous XML structure and content. In this paper, information retrieval and database-related techniques are jointly applied to effectively tolerate XML data diversity in the evaluation of flexible queries. Approximate structure and content matching is supported via a straightforward extension to standard XPath syntax. Also, we outline a query execution technique representing a first step toward efficiently addressing structural pattern queries together with predicate support over XML element content.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: The popularity of XML as a data exchange standard has led to the emergence of powerful XML query languages like XQuery and studies on XML query optimization, but even for data integration, there is a compelling need for performing group-by style aggregate operations.
Abstract: The popularity of XML as a data exchange standard has led to the emergence of powerful XML query languages like XQuery and studies on XML query optimization. Of late, there is considerable interest in analytical processing of XML data. As pointed out by Borkar and Carey, even for data integration, there is a compelling need for performing various group-by style aggregate operations. A core operator needed for analytics is the group-by operator, which is widely used in relational as well as OLAP database applications. XQuery requires group-by operations to be simulated using nesting.

Patent
31 May 2007
TL;DR: In this article, a method and apparatus for creating a Document Object Model of an XML document of predetermined type is presented, comprising a first process for receiving and opening a compressed input file containing the XML document, and a second process for opening and parsing the contents of a relationships file to create a map of name-value pairs and detecting a value for identifying the predetermined type from among a plurality of types of XML documents.
Abstract: A method and apparatus are set forth for creating a Document Object Model of an XML document of predetermined type, comprising a first process for receiving and opening a compressed input file containing the XML document; a second process for opening and parsing the contents of a relationships file to create a map of name-value pairs and detecting a value for identifying the predetermined type from among a plurality of types of XML documents; and a further process for parsing data in the XML document according to the predetermined type, and building the Document Object Model.
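A sketch of the described pipeline for a ZIP-packaged XML document, assuming an OPC-style layout such as .docx (the file name and part names below are assumptions): open the archive, parse the relationships part, detect the document type from its Type attribute, then parse the main part into a DOM.

```java
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PackagedXmlLoader {
    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile("document.docx")) {   // hypothetical input file
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Second process: parse the relationships file into name/value pairs.
            Document rels = db.parse(zip.getInputStream(zip.getEntry("_rels/.rels")));
            NodeList rs = rels.getElementsByTagName("Relationship");
            String target = null;
            for (int i = 0; i < rs.getLength(); i++) {
                Element r = (Element) rs.item(i);
                // The Type attribute identifies the document kind; Target names its part.
                if (r.getAttribute("Type").endsWith("/officeDocument"))
                    target = r.getAttribute("Target");
            }
            if (target == null) throw new IllegalStateException("unrecognized package type");

            // Further process: parse the main part for the detected type, building the DOM.
            Document dom = db.parse(zip.getInputStream(zip.getEntry(target)));
            System.out.println("root element: " + dom.getDocumentElement().getNodeName());
        }
    }
}
```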

Proceedings ArticleDOI
21 Nov 2007
TL;DR: This paper proposes a novel clustering method for XML documents that uses the weight of frequent structures in XML documents, treating an XML document as a transaction and the structures extracted from it as the items of that transaction.
Abstract: Previous clustering methods for XML documents group documents with similar structures by measuring the structural similarity and distance between XML documents. In this paper, however, we propose a novel clustering method for XML documents that uses the weight of frequent structures in XML documents, treating an XML document as a transaction and the structures extracted from XML documents as the items of a transaction. Our experimental results show the high speed and cluster cohesion of our clustering method.

Book ChapterDOI
29 Sep 2007
TL;DR: The results of conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, and fast compression, decompression, and query processing.
Abstract: This paper describes a new XML compression scheme that offers both high compression ratios and short query response time. Its core is a fully reversible transform featuring substitution of every word in an XML document using a semi-dynamic dictionary, effective encoding of dictionary indices, as well as numbers, dates and times found in the document, and grouping data within the same structural context in individual containers. The results of conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, and fast compression, decompression, and query processing.
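The transform's first ingredient, word-to-index substitution against a growing dictionary, in its simplest form. The real scheme is semi-dynamic, encodes indices compactly, special-cases numbers, dates and times, and groups values by structural context in containers; none of that is attempted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictionarySubstitution {
    public static void main(String[] args) {
        String[] words = "<note> <to> Tove </to> <to> Jani </to> </note>".split(" ");
        Map<String, Integer> dict = new LinkedHashMap<>();
        List<Integer> encoded = new ArrayList<>();
        for (String w : words)
            // A previously unseen word is assigned the next free index.
            encoded.add(dict.computeIfAbsent(w, k -> dict.size()));
        System.out.println(dict);     // each distinct word stored once
        System.out.println(encoded);  // repeated words repeat only their index
    }
}
```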