Journal Article

DGReLab+: improving XML path query processing by avoiding buffering irrelevant results

01 Jan 2017, Vol. 115, pp. 804-811
TL;DR: A path query processing technique, DGReLab+, which evaluates queries without buffering irrelevant results, is proposed; evaluations showed that DGReLab+ outperformed two other techniques, TwigStack and QTwig.
Abstract: The rapid growth in the use of XML has led researchers to study optimization techniques for XML data management. One persistent challenge is the effective processing of user queries. Although many techniques have been proposed, they still suffer from large overhead because they buffer irrelevant results before producing the final output. In this paper, we propose a path query processing technique, DGReLab+, which evaluates queries without buffering irrelevant results. The evaluations showed that DGReLab+ outperformed two other techniques, TwigStack and QTwig.
Citations
Journal Article
TL;DR: In this article, a centralized pruning technique is adopted into the proposed distributed XML query processing technique to process XML queries robustly and improve the overall performance of a distributed query processor.
Abstract: eXtensible Markup Language (XML) is used widely to transfer data among a wide variety of systems. Due to an increase in query workloads and the management of larger datasets, centralized processing is no longer feasible for XML query processing. To address this issue, we propose a technique that improves XML query processing through query workload distribution. Effective distributed XML query processing can be affected by several criteria, such as indexing, fragmentation, distribution strategy, as well as the query handling in the distributed servers. However, we believe that an efficient labeling mechanism and an inexpensive centralized query processor or a pruning method at dedicated servers contribute greatly to the overall performance of a distributed query processor. In this paper, we present an effective centralized pruning technique that is adopted into our proposed distributed XML query processing technique to process XML queries robustly. Experimental evaluations showed that the proposed distributed query processor outperformed the centralized query processor.

5 citations

Journal Article
TL;DR: A novel update-friendly labeling scheme called branch map is introduced, which records the correspondence between parent and child nodes instead of assigning a label to each node, and the space required for the index is reduced considerably.
Abstract: One of the difficulties faced when using XML as the data storage structure is query inefficiency. Therefore, various indexing methods have been proposed. When designing indexing methods, the first step is to choose the labeling method. Some labeling methods can work well; however, if they cannot effectively support update operations, their use is subject to considerable limitations. Most of the update-friendly labeling methods proposed in the literature assign a unique label to each node in XML and provide an expandable mechanism for future insertion. However, they encounter some difficulties, such as increasing the index space, more difficulty in evaluating the relationships between nodes, and increasing the complexity of labels. In this paper, we introduce a novel update-friendly labeling scheme called branch map, which records the correspondence between parent and child nodes instead of assigning a label to each node. The space required for the index is reduced considerably. More importantly, the branch map can maintain the profile as if it was encoded initially, even after being frequently updated. This paper also proposes a compact indexing scheme called UCIS-X. Experimental results indicate that UCIS-X performs well in terms of index size, query, and update efficiency.

4 citations


Cites background from "DGReLab+: improving XML path query ..."

  • ...DGReLab [38], a path query processing technique, was proposed to avoid buffering irrelevant results before producing the final results....


Dissertation
05 Mar 2018
TL;DR: This thesis presents a design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the property of prime numbers to identify Parent-Child edges in twig pattern queries (TPQs) during query evaluation.
Abstract: The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered a specific task for XML databases as well as a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the property of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within the existing labelling schemes. The major contributions of this thesis are a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during the output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over the previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on various common subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.
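The abstract above does not say how CPL builds its labels, so the following is only a generic illustration of how prime numbers can expose parent-child edges, not the thesis's design. It assumes a product-style labelling in which each node gets its own prime and a label equal to its parent's label times that prime; the function names and the dictionary-based tree are hypothetical.

```python
from itertools import count

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def label_tree(tree, root):
    """tree maps node -> list of children (hypothetical input format).
    Returns node -> (own_prime, label), with label = parent's label * own_prime."""
    gen = primes()
    labels = {}
    stack = [(root, 1)]  # the root's "parent label" is 1
    while stack:
        node, parent_label = stack.pop()
        p = next(gen)
        labels[node] = (p, parent_label * p)
        for child in tree.get(node, []):
            stack.append((child, parent_label * p))
    return labels

def is_ancestor(labels, a, d):
    # a's label divides d's label exactly when a lies on the root-to-d path
    return a != d and labels[d][1] % labels[a][1] == 0

def is_parent(labels, a, d):
    # dividing out d's own prime leaves the label of d's parent
    own_prime, label = labels[d]
    return label // own_prime == labels[a][1]

tree = {"bib": ["book"], "book": ["title", "author"]}
labels = label_tree(tree, "bib")
```

With such labels, a parent-child test reduces to one integer division and one comparison, with no need to walk the document. Labels grow quickly with depth, which is one reason practical schemes differ from this sketch.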

3 citations


Cites methods from "DGReLab+: improving XML path query ..."

  • ...Generally, all XML query processing algorithms which perform structural join operations to match a given query against an XML document rely on either sub-tree labelling schemes or prefix-based labelling schemes [40, 147, 146, 5, 236, 19, 140, 138, 184, 205]....


Proceedings Article
23 Nov 2019
TL;DR: The results show that the proposed technique, D-DGReLab+, outperformed the centralized techniques TwigStack and QTwig when evaluated with the same set of queries on two different datasets.
Abstract: eXtensible Markup Language (XML) has been used to transfer data among a wide variety of systems. The growing use of XML data and increasing query workloads make centralized storage of XML impractical. Therefore, a distributed query evaluation strategy is well suited to accessing these types of collections without having to ship large volumes of irrelevant data across the network. A centralized planning and distributed execution strategy can be used to process XML queries in a distributed manner. In this paper, a technique that processes queries in a distributed environment is presented, in which a pruning technique runs on the local distributed servers and the central server federates the results they send. A series of evaluations was conducted to compare the performance of centralized and distributed techniques using the same set of queries on two different datasets. The results show that the proposed technique, D-DGReLab+, outperformed the centralized techniques TwigStack and QTwig.

Cites methods from "DGReLab+: improving XML path query ..."

  • ...Our previous centralized query processing technique, DGReLab [19] will be adopted as the pruning techniques into D-DGReLab....


References
Proceedings Article
03 Jun 2002
TL;DR: This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system, and proposes three order encoding methods that can be used to represent XML order in the relational data model, and also proposes algorithms for translating ordered XPath expressions into SQL using these encoding methods.
Abstract: XML is quickly becoming the de facto standard for data exchange over the Internet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by devising ways to "shred" XML documents into relations, and translate XML queries into SQL queries over these relations. However, a key issue with such an approach, which has largely been ignored in the research literature, is how (and whether) the ordered XML data model can be efficiently supported by the unordered relational data model. This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system. This is accomplished by encoding order as a data value. We propose three order encoding methods that can be used to represent XML order in the relational data model, and also propose algorithms for translating ordered XPath expressions into SQL using these encoding methods. Finally, we report the results of an experimental study that investigates the performance of the proposed order encoding methods on a workload of ordered XML queries and updates.
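The abstract does not name its three encodings, so the sketch below picks one illustrative way of "encoding order as a data value": a Dewey-style path label stored with each node. The tree format and function names are hypothetical, and the sketch stays in Python rather than SQL; in a relational store the label would be a column that queries sort and compare on.

```python
def dewey_labels(tree, root):
    """tree maps node -> ordered list of children (hypothetical input format).
    Returns node -> tuple such as (1, 2, 1): the sibling position at each level."""
    labels = {root: (1,)}
    stack = [root]
    while stack:
        node = stack.pop()
        for position, child in enumerate(tree.get(node, []), start=1):
            labels[child] = labels[node] + (position,)
            stack.append(child)
    return labels

def document_order(labels):
    # Tuples compare lexicographically, which matches document (preorder) order.
    return sorted(labels, key=labels.get)

def following_siblings(labels, node):
    # Same parent prefix, same depth, larger final component.
    key = labels[node]
    hits = [(k, n) for n, k in labels.items()
            if len(k) == len(key) and k[:-1] == key[:-1] and k[-1] > key[-1]]
    return [n for _, n in sorted(hits)]

tree = {"book": ["title", "author1", "author2"], "author1": ["name"]}
labels = dewey_labels(tree, "book")
```

Because order lives in the label itself, an ordered axis such as following-sibling becomes a plain value comparison. The cost appears on updates: inserting a node between siblings forces relabelling of the later siblings and their subtrees.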

2,402 citations

Proceedings Article
25 Aug 1997
TL;DR: The theoretical foundations of DataGuides are presented along with an algorithm for their creation and an overview of incremental maintenance, and performance results based on the implementation of DataGuides in the Lore DBMS for semistructured data are provided.
Abstract: In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
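As a rough illustration of a structural summary, the sketch below builds a summary in which every label path of the source occurs exactly once. It assumes tree-shaped input given as nested `(label, children)` tuples, which is a simplification: the general graph case needs more machinery than is shown here, and the function names are hypothetical.

```python
def build_dataguide(node):
    """node is (label, [child nodes]). Returns a nested dict in which each
    distinct label path of the source appears exactly once."""
    def merge(guide, n):
        label, children = n
        sub_guide = guide.setdefault(label, {})  # reuse the entry if the path exists
        for child in children:
            merge(sub_guide, child)
    guide = {}
    merge(guide, node)
    return guide

def label_paths(guide, prefix=()):
    """Enumerate every label path recorded in the summary."""
    for label, sub_guide in guide.items():
        path = prefix + (label,)
        yield path
        yield from label_paths(sub_guide, path)

db = ("restaurants", [
    ("restaurant", [("name", []), ("phone", [])]),
    ("restaurant", [("name", []), ("owner", [])]),
])
guide = build_dataguide(db)
```

The two `restaurant` objects collapse into one summary node with `name`, `phone`, and `owner` below it. A query processor can consult such a summary to reject a path like `restaurants/restaurant/address` without touching the data.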

1,341 citations

Proceedings Article
03 Jun 2002
TL;DR: This paper proposes a novel holistic twig join algorithm, TwigStack, that uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern.
Abstract: XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable. In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.
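To make the "binary structural relationship" step concrete, here is a minimal sketch of a stack-based ancestor-descendant join. It is not TwigStack itself; it shows only the kind of binary join that the abstract describes as prior work. It assumes each element carries a `(start, end)` region label, with an ancestor's interval enclosing its descendants', and that both input lists are sorted by `start`. The labelling and the function name are assumptions for illustration.

```python
def structural_join(ancestors, descendants):
    """Return every (a, d) pair where region a encloses region d.
    Both inputs are lists of (start, end) tuples sorted by start."""
    pairs = []
    stack = []  # nested chain of ancestor candidates still open
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:
                stack.pop()  # closed before a opens, so it cannot enclose a
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack encloses d.
        pairs.extend((a, d) for a in stack if a[1] > d[1])
    return pairs

a_nodes = [(1, 10), (4, 9), (11, 14)]
b_nodes = [(2, 3), (5, 6), (12, 13), (15, 16)]
matches = structural_join(a_nodes, b_nodes)
```

Each input list is scanned once, and the stack never holds more than one root-to-leaf chain of open ancestors. Chaining several such binary joins is what can inflate intermediate results, the limitation the holistic approach is meant to avoid.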

1,014 citations

01 Jan 1999
TL;DR: This paper describes the experiences migrating the Lore database management system for semistructured data to work with XML, and presents a modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language.
Abstract: Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled as some form of labeled, directed graph. The recent emergence of eXtensible Markup Language (XML) as a new standard for data representation and exchange on the World-Wide Web has drawn significant attention. Researchers have casually observed a striking similarity between semistructured data models and XML. While similarities do abound, some key differences dictate changes to any existing data model, query language, or DBMS for semistructured data in order to fully support XML. This paper describes our experiences migrating the Lore database management system for semistructured data to work with XML. We present our modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language. Based on this model, we describe changes to Lorel, Lore's query language. We also briefly discuss changes to Lore's dynamic structural summaries (DataGuides) and the relationship of DataGuides to XML's Document Type Definitions (DTDs).

309 citations

Proceedings Article
01 Sep 2006
TL;DR: To the authors' knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations, and the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.
Abstract: Tree pattern matching is one of the most fundamental tasks for XML query processing. Holistic twig query processing techniques [4, 16] have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. However, useless path matches cannot be completely avoided, especially when there is a parent-child relationship in the twig query. Furthermore, existing approaches do not consider the fact that in practice, in order to process XPath or XQuery statements, a more powerful form of twig queries, namely, Generalized-Tree-Pattern (GTP) [8] queries, is required. Most existing works on processing GTP queries generally call for costly post-processing for eliminating redundant data and/or grouping of the matching results. In this paper, we first propose a novel hierarchical stack encoding scheme to compactly represent the twig results. We introduce Twig2Stack, a bottom-up algorithm for processing twig queries based on this encoding scheme. Then we show how to efficiently enumerate the query results from the encodings for a given GTP query. To our knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations. Extensive performance studies on various data sets and queries show that the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.

227 citations