
Showing papers on "Simple API for XML" published in 2013


Journal ArticleDOI
01 Mar 2013
TL;DR: A new hierarchical approach is proposed that, if necessary, considers multiple forms of structural components to isolate structurally homogeneous clusters of XML documents, and outperforms established competitors in effectiveness and scalability.
Abstract: Clustering XML documents by structure is the task of grouping them by common structural components. Hitherto, this has been accomplished by looking at the occurrence of one preestablished type of structural component in the structures of the XML documents. However, the a priori chosen structural components may not be the most appropriate for effective clustering. Moreover, the resulting clusters are likely to exhibit a certain degree of inner structural inhomogeneity, because of uncaught differences in the structures of the XML documents due to further, neglected forms of structural components. To overcome these limitations, a new hierarchical approach is proposed that, if necessary, considers multiple forms of structural components to isolate structurally homogeneous clusters of XML documents. At each level of the resulting hierarchy, clusters are divided by considering some type of structural component (unaddressed at the preceding levels) that still differentiates the structures of the XML documents. Each cluster in the hierarchy is summarized through a novel technique that provides a clear and differentiated understanding of its structural properties. A comparative evaluation over both real and synthetic XML data proves that the devised approach outperforms established competitors in effectiveness and scalability. Cluster summarization is also shown to be very representative.

31 citations


Journal ArticleDOI
TL;DR: A novel unit structure called G-node is defined for streaming XML data in the wireless environment; it exploits the benefits of structure indexing and attribute summarization, which integrate relevant XML elements into a group.
Abstract: In this paper, we propose an energy- and latency-efficient XML dissemination scheme for mobile computing. We define a novel unit structure called G-node for streaming XML data in the wireless environment. It exploits the benefits of structure indexing and attribute summarization, which can integrate relevant XML elements into a group, and it provides a way to selectively access their attribute values and text content. We also propose a lightweight and effective encoding scheme, called Lineage Encoding, to support the evaluation of predicates and twig pattern queries over the stream. The Lineage Encoding scheme represents the parent-child relationships among XML elements as a sequence of bit-strings, called Lineage Code(V, H), and provides basic operators and functions for effective twig pattern query processing at mobile clients. Extensive experiments using real and synthetic data sets demonstrate that our scheme outperforms conventional wireless XML broadcasting methods for simple path queries as well as complex twig pattern queries with predicate conditions.

23 citations


Journal ArticleDOI
TL;DR: This paper proposes a two-method approach to building Document Warehouse conceptual schemas: the first method unifies XML document structures, elaborating a global and generic view for a set of XML documents belonging to the same domain; the second designs multidimensional galaxy schemas.
Abstract: Data Warehouses and OLAP (On Line Analytical Processing) technologies are dedicated to analyzing structured data issued from organizations' OLTP (On Line Transaction Processing) systems. Furthermore, in order to enhance their decision support systems, these organizations need to explore XML (eXtensible Markup Language) documents as an additional and important source of unstructured data. In this context, this paper addresses the warehousing of document-centric XML documents. More specifically, we propose a two-method approach to build Document Warehouse conceptual schemas. The first method is for the unification of XML document structures; it aims to elaborate a global and generic view for a set of XML documents belonging to the same domain. The second method is for designing multidimensional galaxy schemas for Document Warehouses.

20 citations


Proceedings ArticleDOI
08 Apr 2013
TL;DR: Efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for listing the answers to a query along with their computed or approximated probability are explored.
Abstract: Probabilistic XML is a probabilistic model for uncertain tree-structured data, with applications to data integration, information extraction, or uncertain version control. We explore in this work efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for listing the answers to a query along with their computed or approximated probability. The approach relies on, first, producing the lineage of the query by evaluating it over the probabilistic XML document and, second, looking for an optimal strategy to compute the probability of the lineage formula. This latter part relies on a query-optimizer-like approach: exploring different evaluation plans for different parts of the formula and estimating the cost of each plan, using a cost model for the various evaluation algorithms. We demonstrate the efficiency of this approach on datasets used in previous research on probabilistic XML querying, as well as on synthetic data. We also compare the performance of our query engine with EvalDP [1], Trio [2], and MayBMS/SPROUT [3].

17 citations


Proceedings ArticleDOI
10 Sep 2013
TL;DR: It is shown that standard version control operations can be implemented directly as operations on the probabilistic XML model; efficiency with respect to deterministic version control systems is demonstrated on real-world datasets.
Abstract: In order to ease content enrichment, exchange, and sharing, web-scale collaborative platforms such as Wikipedia or Google Docs enable unbounded interactions between a large number of contributors, without prior knowledge of their level of expertise and reliability. Version control is then essential for keeping track of the evolution of the shared content and its provenance. In such environments, uncertainty is ubiquitous due to the unreliability of the sources, the incompleteness and imprecision of the contributions, the possibility of malicious editing and vandalism acts, etc. To handle this uncertainty, we use a probabilistic XML model as a basic component of our version control framework. Each version of a shared document is represented by an XML tree, and the whole document, together with its different versions, is modeled as a probabilistic XML document. Uncertainty is evaluated using the probabilistic model and the reliability measure associated with each source, each contributor, or each editing event, resulting in an uncertainty measure on each version and each part of the document. We show that standard version control operations can be implemented directly as operations on the probabilistic XML model; efficiency with respect to deterministic version control systems is demonstrated on real-world datasets.

16 citations


Book ChapterDOI
11 Nov 2013
TL;DR: This paper proposes an Object-Relationship (OR) graph, which fully captures the semantics of objects, relationships and attributes, to represent XML documents, and develops algorithms based on the OR graph to return more comprehensive answers.
Abstract: Existing XML keyword search approaches can be categorized into tree-based search and graph-based search. Both are structure-based search, because they mainly rely on exploring the structural features of the document. Such structure-based approaches cannot fully exploit the hidden semantics in XML documents, which causes serious problems in processing some classes of keyword queries. In this paper, we thoroughly point out mismatches between the answers returned by structure-based search and the expectations of common users. Through detailed analysis of these mismatches, we show the importance of semantics in XML keyword search and propose a semantics-based approach to process XML keyword queries. In particular, we propose an Object-Relationship (OR) graph, which fully captures the semantics of objects, relationships and attributes, to represent an XML document, and we develop algorithms based on the OR graph to return more comprehensive answers. Experimental results show that our proposed semantics-based approach resolves the problems of structure-based search and significantly improves both effectiveness and efficiency.

14 citations


Proceedings ArticleDOI
22 Jun 2013
TL;DR: A fundamental study of a theory on reasoning about XML keys in the presence of XML schemas, which incorporates the above-mentioned properties to assess and refine the quality of derived keys, and an experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided.
Abstract: A great deal of research into the learning of schemas from XML data has been conducted in recent years to enable the automatic discovery of XML schemas from XML documents when no schema or only a low-quality one is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle here is the unavailability of a theory on reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present article embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties concerning XML keys in the presence of an XSD, like, for instance, testing for consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained related to cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the properties mentioned before to assess and refine the quality of derived keys. An experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided.

13 citations


Patent
28 May 2013
TL;DR: In this article, the authors present a method for dynamic DOM aware code editing, comprising storing, in a DOM model, a plurality of Document Object Model (DOM) elements in one or more HyperText Markup Language (HTML) files for a project.
Abstract: A computer implemented method and apparatus for dynamic Document Object Model (DOM) aware code editing. The method comprising storing, in a DOM model, a plurality of Document Object Model (DOM) elements in one or more HyperText Markup Language (HTML) files for a project; and storing, in the DOM model at least one modification to the DOM that results from execution of one or more JavaScript code files for the project, wherein during JavaScript code editing, the at least one modification to the DOM identifies an interaction between the JavaScript code and the DOM elements.

11 citations


Proceedings ArticleDOI
04 Nov 2013
TL;DR: A new technique for partitioning XML documents is presented, in which conventional clustering techniques operating on flattened representations of individual aspects of the XML documents are used to partition the available XML corpus.
Abstract: The combination of multiple clusterings for partitioning XML documents is proposed as a promising method, aimed at decomposing the inherently difficult problem of catching structural and content relationships within an XML corpus into a number of simpler subproblems. To verify the validity of this intuition, a new technique for partitioning XML documents is presented, in which conventional clustering techniques operating on flattened representations of individual aspects of the XML documents (which also include some rare patterns) are used to partition the available XML corpus. The effectiveness of the devised technique is revealed by a comparative empirical evaluation on benchmark XML corpora.

10 citations


Journal Article
TL;DR: This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files.
Abstract: Over time, the XML markup language has acquired considerable importance in application development, standards definition, and the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing large documents in an appropriate manner. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files.

10 citations
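To make the contrast such a study measures concrete, here is a minimal sketch of the two parsing styles typically compared, written in Python rather than the Java APIs the paper benchmarks: an event-driven SAX pass versus a tree-building DOM parse, using the standard library's xml.sax and xml.dom.minidom. The tiny document and handler are illustrative assumptions, not the paper's benchmark workload.

```python
import xml.sax
import xml.dom.minidom

XML = "<books><book id='1'>SAX</book><book id='2'>DOM</book></books>"

# Event-based (SAX-style): the parser pushes events to a handler;
# memory use stays roughly constant regardless of document size.
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        self.count = 0
    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1

handler = BookCounter()
xml.sax.parseString(XML.encode(), handler)

# Tree-based (DOM-style): the whole document is materialized in memory,
# allowing random access and modification at the cost of footprint.
doc = xml.dom.minidom.parseString(XML)
titles = [n.firstChild.data for n in doc.getElementsByTagName("book")]

print(handler.count)   # number of <book> elements seen by the SAX handler
print(titles)          # texts collected from the DOM tree
```

The trade-off this illustrates is the one such performance studies quantify: SAX-style parsing touches each token once and discards it, while DOM-style parsing pays for building the full tree up front.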


Journal ArticleDOI
TL;DR: This work presents an algorithm for generating an inference-proof view by weakening the actual XML document, i.e., eliminating confidential information and other information that could be used to infer confidential information.
Abstract: This work aims at treating the inference problem in XML documents that are assumed to represent potentially incomplete information. The inference problem consists in providing a control mechanism for enforcing inference-usability confinement of XML documents. More formally, an inference-proof view of an XML document is required to be both indistinguishable from the actual XML document to the clients under their inference capabilities, and to neither contain nor imply any confidential information. We present an algorithm for generating an inference-proof view by weakening the actual XML document, i.e., eliminating confidential information and other information that could be used to infer confidential information. In order to avoid inferences based on the schema of the XML documents, the DTD of the actual XML document is modified according to the weakening operations as well, such that the modified DTD conforms with the generated inference-proof view.

Proceedings ArticleDOI
26 Sep 2013
TL;DR: Experimental results are given for the performance of the filtering system for thousands of queries on streaming XML documents, demonstrating that the proposed system performs better compared to earlier state-of-the-art YFilter system.
Abstract: XML stream filtering applications have gained popularity in recent years. These applications require a filtering system that queries a continuous stream of XML documents and delivers matched content accordingly. A PFilter algorithm has recently been proposed by the authors; it has been found to be effective on a large number of streaming XML documents and is used for extracting information of interest to users in information systems. The present paper proposes an XML stream filtering system architecture based on the PFilter algorithm [10]. The algorithm converts XPath query expressions into sequences of nodes. The system provides efficient and fast search in the streaming XML document. Experimental results are given for the performance of the filtering system for thousands of queries on streaming XML documents. Performance versus query depth and the probability of occurrence of XPath operators demonstrates that the proposed system performs better than the earlier state-of-the-art YFilter system. The proposed system can be used in applications requiring data dissemination based on user interest.
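The conversion of XPath expressions into node sequences that the abstract mentions can be sketched as follows. This is a hypothetical illustration, not the actual PFilter code; the function name and the (axis, node-test) tuple representation are assumptions, and only the child (/) and descendant (//) axes are handled.

```python
def xpath_to_nodes(expr):
    """Split a simple XPath expression into its sequence of node tests.
    Hypothetical sketch: handles only the / and // axes, and assumes
    the expression contains no literal '~' character."""
    nodes = []
    # Mark descendant steps so they survive the split on '/'.
    for step in expr.replace("//", "/~").split("/"):
        if not step:
            continue
        if step.startswith("~"):
            nodes.append(("descendant", step[1:]))
        else:
            nodes.append(("child", step))
    return nodes

print(xpath_to_nodes("/library//book/title"))
```

Once a query is in this form, a filtering engine can match incoming start-element events against the sequence one step at a time instead of re-interpreting the XPath string for every document.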

Proceedings ArticleDOI
22 Jun 2013
TL;DR: Experimental results show that XDAS, when compared to the Dewey and Range labeling schemes, reduces the label size, the disk space required to store labels, and the matching time required to identify relationships between nodes.
Abstract: The eXtensible Markup Language has rapidly become a very powerful standard for data exchange. Labeling schemes have been introduced to optimize data retrieval and query processing on XML database documents. This is done by providing labels that hold information about XML tree nodes. In this paper we introduce a novel labeling scheme, XDAS, whose labeling technique is inspired by the IP addressing and subnetting technique used in computer networks. That technique is used when dividing a network into several sub-networks: each sub-network is assigned a subnet mask that helps in identifying the parent network. Accordingly, our labeling scheme treats an XML document as a network with sub-networks and assigns labels to XML tree nodes using the masking technique. Experimental results show that XDAS, when compared to the Dewey and Range labeling schemes, reduces the label size, the disk space required to store labels, and the matching time required to identify relationships between nodes.
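For reference, the Dewey labeling scheme that XDAS is compared against can be sketched in a few lines: each node's label extends its parent's label with the node's 1-based sibling position, so an ancestor-descendant check reduces to a string prefix test. The tuple-based tree representation here is an assumption for illustration.

```python
def dewey_label(tree, prefix="1"):
    """Assign Dewey labels to a tree given as (tag, [children]).
    Each child's label is the parent's label plus '.' plus its
    1-based position among its siblings."""
    tag, children = tree
    labels = {prefix: tag}
    for i, child in enumerate(children, start=1):
        labels.update(dewey_label(child, prefix + "." + str(i)))
    return labels

def is_ancestor(a, b):
    # In Dewey labeling, ancestry is a simple prefix test on labels.
    return b.startswith(a + ".")

doc = ("library", [("book", [("title", []), ("author", [])]),
                   ("book", [("title", [])])])
labels = dewey_label(doc)
print(labels)
print(is_ancestor("1.1", "1.1.2"))   # author sits under the first book
```

The prefix test is what makes Dewey labels attractive for query processing, and the label length growing with depth is the cost that schemes like XDAS aim to reduce.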

Journal ArticleDOI
TL;DR: It is shown that when allowing access to external memory, there is a deterministic streaming algorithm that solves the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms with memory space O(log² N), a constant number of auxiliary read/write streams, and O(log N) total passes over the XML document and auxiliary streams.
Abstract: We study the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound: there are XML documents and DTDs for which p-pass streaming algorithms require Ω(N/p) space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this problem with memory space O(log² N), a constant number of auxiliary read/write streams, and O(log N) total passes over the XML document and auxiliary streams. An important intermediate step of this algorithm is the computation of the First-Child-Next-Sibling (FCNS) encoding of the initial XML document in a streaming fashion. We study this problem independently, and we also provide memory-efficient streaming algorithms for decoding an XML document given in its FCNS encoding. Furthermore, validating XML documents encoding binary trees against any DTD in the usual streaming model without external memory can be done with sublinear memory: there is a one-pass algorithm using O(√N log N) space, and a bidirectional two-pass algorithm using O(log² N) space, which perform this task.
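The First-Child-Next-Sibling encoding mentioned in the abstract maps an unranked XML tree to a binary tree: each node keeps one pointer to its first child and one to its next sibling. A minimal in-memory sketch follows; the paper computes this encoding in a streaming fashion, which is not attempted here, and the tuple representation is an assumption.

```python
def fcns(node):
    """FCNS-encode an unranked tree given as (tag, [children]) into a
    binary tree (tag, first_child, next_sibling). Leaves and last
    siblings get None pointers."""
    tag, children = node

    def encode(siblings):
        # Encode a sibling list: head becomes a binary node whose left
        # pointer is its own first child and whose right pointer is the
        # encoding of the remaining siblings.
        if not siblings:
            return None
        head, *rest = siblings
        h_tag, h_children = head
        return (h_tag, encode(h_children), encode(rest))

    # The root has no siblings, so its next-sibling pointer is None.
    return (tag, encode(children), None)

doc = ("a", [("b", []), ("c", [("d", [])])])
print(fcns(doc))
```

Because every node has at most two pointers after encoding, DTD validation over the FCNS form can reuse machinery for binary trees, which is exactly the bridge the paper's sublinear-space results rely on.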

Proceedings ArticleDOI
TL;DR: This work-in-progress paper describes a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples.
Abstract: False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand the limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.

Proceedings ArticleDOI
01 Sep 2013
TL;DR: A theoretical tokenization model for XML parsing on resource constrained mobile devices that considerably relieves the processing bottlenecks encountered in conventional XML parsers.
Abstract: This paper presents a theoretical tokenization model for XML parsing on resource-constrained mobile devices. The model is based on the identification of sequentially repeating patterns within the structure of an XML document. As soon as it identifies a repeating structure, it relieves the parser from the computationally intensive conventional tokenization process and focuses on extracting text-node-based values for further processing by the calling application. Our experiments demonstrate that the proposed tokenization model considerably relieves the processing bottlenecks encountered in conventional XML parsers.

Proceedings ArticleDOI
Liu Haowen1, Li Wei1, Long Yin1
01 Jan 2013
TL;DR: An XML-based pseudo-code online editing and conversion system is put forward; the pseudo-code is described and saved as an XML document, which is analyzed and parsed by DOM4J, so as to realize the reuse of pseudo-code and source code.
Abstract: Researchers urgently need a network tool that can edit pseudo-code and convert it into source code online. This paper puts forward an XML-based pseudo-code online editing and conversion system. The pseudo-code is described and saved as an XML document, which is analyzed and parsed by DOM4J, so as to realize the reuse of pseudo-code and source code.
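The parse step described above can be illustrated with Python's standard ElementTree in place of DOM4J (which is a Java library): the pseudo-code lives in an XML document, and the tree is walked to recover the steps. The element names (pseudocode, step) and the lang attribute are invented for this example and are not from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical pseudo-code document in the spirit of the system
# described; the markup vocabulary is an assumption for illustration.
SRC = """<pseudocode lang="python">
  <step>read n</step>
  <step>print n * 2</step>
</pseudocode>"""

root = ET.fromstring(SRC)                    # parse XML into a tree
steps = [s.text for s in root.findall("step")]  # recover the steps in order

print(root.get("lang"), steps)
```

Storing the pseudo-code as structured XML rather than flat text is what makes the reuse the paper targets possible: the same tree can be rendered back as pseudo-code or translated step-by-step into source code.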

01 Jan 2013
TL;DR: A performance study of XML data parsing, evaluating parsers using time as a parameter, shows that the data-structure-based parser is more efficient than the SAX and DOM parsers.
Abstract: XML has become a de facto standard for data representation and exchange, so XML data processing is becoming more and more important for server workloads such as web servers and database servers. One of the most time-consuming parts is XML document parsing. Parsing is a core operation performed before an XML document can be navigated, queried, or manipulated. Recently, high-performance XML parsing has become a topic of considerable interest. In this paper, we present a performance study of XML data parsing by evaluating parsers using time as a parameter. The proposed design uses four different data structures: linked list, stack, array, and queue. All these data structures are linear in nature. We evaluate the data parsing behaviour and study architectural characteristics. The proposed design analyses the performance of XML parsing techniques using various data structures. Based on the observed analysis and graphical results, it shows that the data-structure-based parser is more efficient than the SAX and DOM parsers.
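A sketch of the stack-based flavor of parsing the paper evaluates: low-level expat event callbacks drive a plain list used as the element stack. This is an illustration in Python, not the paper's design; the maximum-depth metric is just an illustrative payload, and no timing comparison is attempted.

```python
import xml.parsers.expat

def parse_with_stack(xml_text):
    """Single pass over an XML document using expat events and a list
    as the element stack; returns the maximum nesting depth reached."""
    stack, max_depth = [], 0

    def start(name, attrs):
        nonlocal max_depth
        stack.append(name)                 # push on start tag
        max_depth = max(max_depth, len(stack))

    def end(name):
        stack.pop()                        # pop on matching end tag

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.Parse(xml_text, True)               # True marks the final chunk
    return max_depth

print(parse_with_stack("<a><b><c/></b><b/></a>"))
```

Keeping only a stack of open elements is the key property such linear-data-structure parsers share: unlike DOM, nothing outside the current root-to-node path stays in memory.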

Journal ArticleDOI
TL;DR: This paper proposes a new encoding scheme called PEDewey for probabilistic XML and designs two algorithms for finding answers of top-k probabilities for twig queries, called ProTJFast and PTopKTwig, based on the element streams ordered by the path probability values.
Abstract: The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.

Journal Article
TL;DR: This paper presents how to define RDF elements for a well-formed and valid XML document.
Abstract: In recent years the Semantic Web has been one of the technologies for making the WWW machine-understandable. The Resource Description Framework (RDF) proposed by the W3C is used for describing metadata about (Web) resources. The RDF data model can be represented as entity-relationship or class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. RDF is a language for representing web information in a minimally constraining, extensible, but meaningful way. XML is the basis for representing RDF models on the Semantic Web. In business applications, data can be represented in XML, which describes the structure of the data. But to increase efficiency with respect to data searching and retrieval, the XML data can be represented in RDF format. All the elements and attributes of an XML document can be defined in RDF format. In this paper we present how to define RDF elements for a well-formed and valid XML document. Keywords: Semantic Web, XML, DTD, RDF
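A rough sketch of mapping a well-formed XML document to subject-predicate-object triples, the representation the abstract describes. The predicate names (hasChild, value) and the subject identifiers are invented for illustration and do not follow any standard XML-to-RDF mapping.

```python
import xml.etree.ElementTree as ET

def xml_to_triples(xml_text, base="doc"):
    """Emit (subject, predicate, object) triples from an XML document.
    Attributes become literal-valued predicates, element text becomes a
    'value' triple, and nesting becomes a 'hasChild' triple. All
    predicate and subject names are illustrative assumptions."""
    root = ET.fromstring(xml_text)
    triples = []

    def walk(elem, subject):
        for key, val in elem.attrib.items():
            triples.append((subject, key, val))
        if elem.text and elem.text.strip():
            triples.append((subject, "value", elem.text.strip()))
        for i, child in enumerate(elem):
            child_id = "%s/%s[%d]" % (subject, child.tag, i)
            triples.append((subject, "hasChild", child_id))
            walk(child, child_id)

    walk(root, base + "/" + root.tag)
    return triples

print(xml_to_triples('<book id="1"><title>RDF</title></book>'))
```

The point of such a mapping is that once the document is a set of triples, it can be queried by pattern matching on subjects and predicates rather than by navigating the XML tree structure.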

Journal ArticleDOI
TL;DR: A system to manage XML documents that can be queried with the query language XQuery is proposed, along with a better translating and executing technique that is more efficient.
Abstract: XML has become the standard for data interchange on the Internet, and nowadays a large amount of data is represented in XML format. However, most critical business data are still stored in relational database management systems. It is difficult to query XML databases because of their textual format. This research tackles that problem: we propose a system to manage XML documents that can be queried with the query language XQuery. XML documents are stored in relational format, and the XQuery expressions are translated into appropriate SQL queries. The results of the SQL queries are then transformed back into XML documents. Compared with the LegoDB system, our system reduces the processing time to search for a relation configuration and proposes a better translating and executing technique, which is more efficient.

Journal ArticleDOI
01 Sep 2013
TL;DR: A query-oriented XML summarization system QXMLSum is proposed, which extracts sentences and combines them as a summary based on three kinds of features: user’s queries, the content of XML documents/elements, and the structure of XML Documents/Elements.
Abstract: Extensible Markup Language (XML) is a simple, flexible text format derived from SGML, originally designed to support large-scale electronic publishing. Nowadays XML plays a fundamental role in the exchange of a wide variety of data on the Web. As XML allows designers to create their own customized tags and enables the definition, transmission, validation, and interpretation of data between applications, devices and organizations, many works in soft computing employ XML to take control of and responsibility for the information, such as the fuzzy markup language, and accordingly there is a great deal of XML-based data and documents. However, most mobile and interactive ubiquitous multimedia devices have restricted hardware such as CPU, memory, and display screen, so it is essential to compress an XML document/element collection into a brief summary before it is delivered to the user according to his/her information need. Query-oriented XML text summarization aims to provide users a brief and readable substitute for the original retrieved documents/elements according to the user's query, which can effectively relieve users' reading burden. We propose a query-oriented XML summarization system, QXMLSum, which extracts sentences and combines them into a summary based on three kinds of features: the user's queries, the content of XML documents/elements, and the structure of XML documents/elements. Experiments on the IEEE-CS datasets used in the Initiative for the Evaluation of XML Retrieval show that the query-oriented XML summary generated by QXMLSum is competitive.

Journal ArticleDOI
TL;DR: This work presents mechanisms for storage and query optimization of XML data in relational databases, including an optimizing mechanism based on the XPath model, named Compressed XML Query Tree, which improves the efficiency of XML data queries by reducing superabundant join operations along the ancestor-descendant axis.
Abstract: XML is the de facto standard for data representation and exchange on the web. Along with the growth of XML data, traditional relational databases now support XML data processing across the board. Consistent storage and efficient querying of XML data is the chief problem in XML-enabled relational databases. This work presents mechanisms for storage and query optimization of XML data in relational databases. XML data are treated as a data type in the relational database, and XML tables are used to store native XML data in a fixed schema. A structural summary index is built and maintained in the relational database, and an optimizing mechanism based on the XPath model, named Compressed XML Query Tree, is presented in order to improve the efficiency of XML data queries by reducing superabundant join operations along the ancestor-descendant axis. All strategies are appropriate for classical XML query algorithms. Algorithms for XML queries are run in experiments on real XML datasets in a relational database under query workloads, to report the performance of our mechanism and show its efficiency compared with other mechanisms.

Patent
01 Apr 2013
TL;DR: A method for data chunk partitioning in XML parsing and a method for XML parsing are disclosed, including partitioning an XML file into multiple XML data segments and allocating the multiple XML data segments to multiple threads for parallel processing.
Abstract: A method for data chunk partitioning in XML parsing and a method for XML parsing are disclosed in the invention. The method for data chunk partitioning in XML parsing includes: partitioning an XML file into multiple XML data segments, and allocating the multiple XML data segments to multiple threads for parallel processing; determining a candidate boundary start symbol in an XML data segment; determining the boundary symbol type of the candidate boundary start symbol, and recording the boundary symbol type and the position of the candidate boundary start symbol; determining a valid boundary start symbol; and partitioning the XML file into multiple data chunks by taking the valid boundary start symbol as a boundary. With this method, the integrity of the XML elements in each data chunk can be ensured, thus effectively improving the efficiency of XML data parsing.

Proceedings ArticleDOI
26 Aug 2013
TL;DR: This paper proposes element-skipping parsing where the parsing of such XML elements that do not contribute to the final query result is skipped with the coordination of XML parser and query processor.
Abstract: This paper proposes a scheme for energy-efficient XML stream processing through element-skipping parsing. Although parsing is one of the most computationally heavy parts of XML stream processing, existing techniques do not focus on the efficiency of XML parsing. To address this problem, our scheme proposes element-skipping parsing, in which the parsing of XML elements that do not contribute to the final query result is skipped through coordination between the XML parser and the query processor. We show that the scheme is effective in reducing both the execution time and the energy consumption of a stream-based XML query processor.
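The paper's parser avoids tokenizing irrelevant elements outright; Python's `xml.sax` still tokenizes everything, but the query-driven coordination can be approximated by a handler that discards every event outside the target elements. The `item` query and flat-document assumption below are illustrative only:

```python
import xml.sax

class SkippingHandler(xml.sax.ContentHandler):
    """Keep only the text of target elements; all other events are ignored.
    Assumes flat (non-nested) target elements for simplicity."""

    def __init__(self, target):
        super().__init__()
        self.target = target      # element name the query needs
        self.depth_in_target = 0  # >0 while inside a relevant subtree
        self.results = []
        self.buf = []

    def startElement(self, name, attrs):
        if name == self.target or self.depth_in_target:
            self.depth_in_target += 1
            self.buf = []

    def characters(self, content):
        if self.depth_in_target:  # text outside the query path is dropped
            self.buf.append(content)

    def endElement(self, name):
        if self.depth_in_target:
            self.depth_in_target -= 1
            if self.depth_in_target == 0:
                self.results.append("".join(self.buf).strip())

doc = "<feed><junk>ignored</junk><item>a</item><junk>x</junk><item>b</item></feed>"
h = SkippingHandler("item")
xml.sax.parseString(doc.encode(), h)
print(h.results)  # -> ['a', 'b']
```

The energy argument in the paper goes further: with parser-level cooperation, the `<junk>` subtrees would not even be tokenized, rather than being tokenized and then ignored as here.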

Journal ArticleDOI
TL;DR: It is concluded that entity segments are the suitable ranges within which XML constraints can faithfully capture the original information relevancies, and that the proposal presented in this paper is effective not only in avoiding XML data redundancy but also in preserving XML data consistency.

Book ChapterDOI
Weidong Yang1, Hao Zhu1
01 Jan 2013
TL;DR: This chapter presents a system called SKeyword which provides a common keyword search interface for heterogeneous XML data sources, and employs OWL ontology to represent the global model of various data sources.
Abstract: Massive heterogeneous XML data sources have emerged on the Internet. These data sources are generally autonomous and provide search interfaces based on XML query languages such as XPath or XQuery, so users must learn complex syntaxes and know the schemas. Keyword search is a user-friendly information discovery technique that helps users obtain useful information conveniently without knowing the schemas, and it is very helpful for searching heterogeneous XML data. In this chapter, the authors present a system called SKeyword, which provides a common keyword search interface for heterogeneous XML data sources and employs an OWL ontology to represent the global model of the various data sources. Section 1 introduces the context of keyword search over heterogeneous XML data sources. Section 2 gives the preliminary knowledge and defines the semantics of keyword search results over the ontology. Section 3 describes the system architecture. Section 4 presents the approaches to ontology integration and index building used by SKeyword. Section 5 presents the algorithm for generating search results and discusses how to rewrite a keyword search over the global conceptual model into XQuery statements for the local XML sources. Section 6 discusses how to organize and rank the results. Section 7 shows the experiments. Section 8 covers related work, and Section 9 concludes the chapter.
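SKeyword itself rewrites keyword queries into XQuery over an ontology; as a much smaller illustration of keyword search on XML without schema knowledge, the sketch below finds the deepest subtrees whose text contains every keyword (an SLCA-style result). The sample document and helper names are assumptions, not part of the chapter:

```python
import xml.etree.ElementTree as ET

doc = """<univ>
  <dept><name>CS</name><prof>Smith</prof></dept>
  <dept><name>Math</name><prof>Jones</prof></dept>
</univ>"""

def subtree_text(e):
    return " ".join(e.itertext())

def slca(root, keywords):
    """Return elements whose subtree contains all keywords while no
    single child subtree does (smallest lowest common ancestors)."""
    hits = []
    def covers(e):
        return all(k in subtree_text(e) for k in keywords)
    def visit(e):
        if not covers(e):
            return
        child_hits = [c for c in e if covers(c)]
        if child_hits:
            for c in child_hits:
                visit(c)  # a tighter answer exists below this node
        else:
            hits.append(e)
    visit(root)
    return hits

root = ET.fromstring(doc)
result = slca(root, ["CS", "Smith"])
print([e.tag for e in result])  # -> ['dept']
```

The query needs no schema: the right `dept` element is located purely from keyword co-occurrence, which is the convenience the abstract argues for.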

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A new unit structure for XML stream called SecNode is defined which supports data confidentiality of XML data over the wireless broadcast channel and two indexes for the SecNode structure called Min (NCS) and Min (NIS) are defined to efficiently process XML queries over the encrypted XML stream.
Abstract: Recently, the use of XML for data broadcasting in mobile wireless networks has gained much attention. One of the most essential requirements for such networks is data confidentiality. In order to secure XML data broadcast in a mobile wireless network, mobile clients must obey a set of access authorizations specified on the original XML document. In such environments, mobile clients can only access the authorized parts of the encrypted XML stream according to their access authorizations. Several indexing methods have been proposed to provide selective access to XML data over an XML stream; however, these methods cannot be used for encrypted XML data. In this paper, we define a new unit structure for XML streams called SecNode, which supports data confidentiality of XML data over the wireless broadcast channel. We also define two indexes for the SecNode structure, called Min(NCS) and Min(NIS), to efficiently process XML queries over the encrypted XML stream. The experimental results demonstrate that using the two indexes Min(NCS) and Min(NIS) in the SecNode structure reduces the power consumption of mobile clients when processing XML queries over the encrypted XML stream.

Book ChapterDOI
26 Aug 2013
TL;DR: The experimental results show that the proposed method, which transforms an XML graph into a tree model so that existing XML tree search methods can be exploited, can outperform traditional XML graph search methods by orders of magnitude in efficiency while generating a similar set of results.
Abstract: Keyword search, as opposed to traditional structured query, has become more and more popular for querying XML data in recent years. XML documents usually contain ID nodes and IDREF nodes to represent reference relationships among the data. An XML document with ID/IDREF is modeled as a graph by existing works, where keyword query results are computed by graph traversal. By comparison, if ID/IDREF is not considered, an XML document can be modeled as a tree, and keyword search on an XML tree can be much more efficient using tree-based labeling techniques. A natural question is whether we need to abandon the efficient XML tree search methods and invent new, but less efficient, search methods for XML graphs. To address this problem, we propose a novel method to transform an XML graph into a tree model so that existing XML tree search methods can be exploited. The experimental results show that our solution can outperform traditional XML graph search methods by orders of magnitude in efficiency while generating a similar set of results.
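One simple way to see the graph-to-tree idea (not necessarily the paper's exact transformation) is to inline a copy of each IDREF target under the referencing node, so the result is a pure tree that tree-labeling search methods can index. The `id`/`idref` attribute names and the one-level copy (which also cuts reference cycles) are assumptions of this sketch:

```python
import copy
import xml.etree.ElementTree as ET

doc = """<db>
  <author id="a1"><name>Knuth</name></author>
  <book><title>TAOCP</title><writer idref="a1"/></book>
</db>"""

def graph_to_tree(root):
    """Replace each IDREF edge by an inlined copy of the referenced subtree."""
    ids = {e.get("id"): e for e in root.iter() if e.get("id")}
    for e in list(root.iter()):        # snapshot: clones added below are not revisited,
        ref = e.get("idref")           # so reference cycles are cut after one level
        if ref in ids:
            clone = copy.deepcopy(ids[ref])
            clone.attrib.pop("id", None)
            e.append(clone)
            del e.attrib["idref"]
    return root

tree = graph_to_tree(ET.fromstring(doc))
# A plain tree path query now reaches data that used to sit behind an IDREF edge.
name = tree.find("./book/writer/author/name").text
print(name)  # -> Knuth
```

After the transformation, an ordinary tree query answers what previously required graph traversal, at the cost of some duplication of referenced subtrees.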

Proceedings ArticleDOI
S. Sankari1, S. Bose1
01 Dec 2013
TL;DR: The approach proposes the Enhanced Dewey Code (EDC), an encoding scheme for XML documents based on the notion of enhanced level-order numbers, which supports the integrity and confidentiality requirements of an XML document and facilitates efficient identification and distribution of selected content from an XML document.
Abstract: The paper proposes an approach for efficient content dissemination that assures content integrity and confidentiality by exploiting the structural properties of the Extensible Markup Language (XML) document object model (DOM). Our approach proposes the Enhanced Dewey Code (EDC), an encoding scheme for XML documents based on the notion of enhanced level-order numbers, which supports the integrity and confidentiality requirements of an XML document and facilitates efficient identification and distribution of selected content from an XML document. Using this notion, we develop a policy-based routing scheme for XML content dissemination which assures that content is delivered to users according to the access control policies, preventing information leaks. Our XML content dissemination approach represents an efficient and secure mechanism for XML documents in applications such as publish-subscribe systems, and it provides different levels of confidentiality and integrity requirements in trusted and untrusted networks, which are common across enterprise networks and the Web.
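EDC extends Dewey codes with enhanced level-order information; as background, plain Dewey labeling of a DOM tree can be sketched as follows, with ancestor/descendant tests reduced to label-prefix checks — the structural property that selective dissemination schemes exploit. The tuple labels here are a generic Dewey scheme, not EDC itself:

```python
import xml.etree.ElementTree as ET

doc = "<a><b><d/></b><c/></a>"

def dewey_labels(root):
    """Assign each element a Dewey label: its parent's label plus its
    1-based position among its siblings."""
    labels = {}
    def walk(e, label):
        labels[e] = label
        for i, child in enumerate(e, start=1):
            walk(child, label + (i,))
    walk(root, (1,))
    return labels

def is_ancestor(la, lb):
    """la is an ancestor of lb iff la is a proper prefix of lb."""
    return len(la) < len(lb) and lb[:len(la)] == la

root = ET.fromstring(doc)
L = {e.tag: lbl for e, lbl in dewey_labels(root).items()}
print(L["d"])                       # -> (1, 1, 1)
print(is_ancestor(L["a"], L["d"]))  # -> True
print(is_ancestor(L["b"], L["c"]))  # -> False
```

Because structural relationships are decidable from the labels alone, a router can decide whether a client is authorized for a content fragment without materializing the whole document.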