
Showing papers on "Simple API for XML" published in 2010


Journal ArticleDOI
TL;DR: jmzML, a Java API for the Proteomics Standards Initiative mzML data standard, can handle arbitrarily large files in minimal memory, allowing easy and efficient processing of mzML files using the Java programming language.
Abstract: We here present jmzML, a Java API for the Proteomics Standards Initiative mzML data standard. Based on the Java Architecture for XML Binding (JAXB) and an XPath-based, random-access XML indexer, jmzML can handle arbitrarily large files in minimal memory, allowing easy and efficient processing of mzML files using the Java programming language. jmzML also automatically resolves internal XML references on-the-fly. The library (which includes a viewer) can be downloaded from http://jmzml.googlecode.com.
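As a language-neutral illustration of the core idea (processing an arbitrarily large XML file in minimal memory by reacting to parse events instead of building a tree), the following Python sketch uses the standard `xml.sax` module; it is not jmzML itself, and the `<spectrum>` element name is just a stand-in for an mzML record:

```python
import xml.sax

class SpectrumCounter(xml.sax.ContentHandler):
    """Counts <spectrum> elements without ever building a tree in memory."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # The parser calls us once per start tag; memory use stays constant.
        if name == "spectrum":
            self.count += 1

def count_spectra(xml_text):
    handler = SpectrumCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count

doc = "<mzML><run><spectrum/><spectrum/><spectrum/></run></mzML>"
print(count_spectra(doc))
```

The same handler would work unchanged on a multi-gigabyte file fed through `xml.sax.parse`, which is the property the jmzML authors are after.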

51 citations


Journal ArticleDOI
TL;DR: The proposed model is verified in a couple of scenarios for distributed manufacturing planning that involve feature mapping from a CAD file, process selection for several part designs integrated with scheduling, and simulation of the FMS model using alternative routings.
Abstract: An efficient model for communications between CAD, CAPP, and CAM applications in a distributed manufacturing planning environment has been seen as a key ingredient for CIM. Integration of the design model with process and scheduling information in real time is necessary in order to increase product quality, reduce cost, and shorten the product manufacturing cycle. This paper describes an approach to integrate key product realization activities using a neutral data representation. The representation is based on established standards for product data exchange and serves as a prototype implementation of these standards. The product and process models are based on an object-oriented representation of geometry, features, and resulting manufacturing processes. Relationships between objects are explicitly represented in the model (for example, feature precedence relations, process sequences, etc.). The product model uses an XML-based representation of the product data required for process planning, and the process model likewise uses an XML representation of the data required for scheduling and FMS control. The procedures for writing and parsing XML representations have been developed in an object-oriented fashion, such that each object from the object-oriented model is responsible for storing its own data in XML format. A similar approach is adopted for reading and parsing the XML model. Parsing is performed by a stack of XML handlers, each corresponding to a particular object in the hierarchical XML model. This allows for a very flexible representation, in which only a portion of the model (for example, only feature data, or only the part of a process plan for a single machine) may be stored and successfully parsed into another application. This is very useful for distributed applications, in which data are passed in the form of XML streams to allow real-time on-line communication.
The feasibility of the proposed model is verified in a couple of scenarios for distributed manufacturing planning that involve feature mapping from a CAD file, process selection for several part designs integrated with scheduling, and simulation of the FMS model using alternative routings.
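The "stack of XML handlers" idea can be sketched in a few lines of Python with `xml.sax`: a single handler maintains a stack of open elements, and each element on the stack collects its own data. The element names (`feature`, `machine`) are hypothetical placeholders, not taken from the paper:

```python
import xml.sax

class StackHandler(xml.sax.ContentHandler):
    """Keeps a stack of open elements; each element accumulates its own
    text, mirroring the per-object handler stack described in the paper."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.objects = {}  # element name -> collected text

    def startElement(self, name, attrs):
        self.stack.append(name)

    def characters(self, content):
        if self.stack:
            top = self.stack[-1]
            self.objects[top] = self.objects.get(top, "") + content

    def endElement(self, name):
        self.stack.pop()

def parse_fragment(xml_text):
    handler = StackHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.objects

plan = "<process><feature>hole</feature><machine>M1</machine></process>"
print(parse_fragment(plan))
```

Because each element is handled independently of its siblings, a fragment containing only `<feature>` data parses just as well as the full model, which is the flexibility the authors highlight.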

34 citations


Proceedings ArticleDOI
21 Feb 2010
TL;DR: The design of the first complete field programmable gate array (FPGA) accelerator capable of XML well-formed checking, schema validation, and tree construction at a throughput of 1 cycle per byte (CPB) is detailed.
Abstract: Extensible Markup Language (XML) is playing an increasingly important role in web services and database systems. However, the task of XML parsing is often the bottleneck, and as a result, the target of acceleration using custom hardware or multicore CPUs. In this paper, we detail the design of the first complete field programmable gate array (FPGA) accelerator capable of XML well-formed checking, schema validation, and tree construction at a throughput of 1 cycle per byte (CPB). This is a significant advancement from 40 CPB, the best previously reported commercial result. We demonstrate our design on a Xilinx Virtex-5 board, which successfully saturates a 1 Gbps Ethernet link.

30 citations


Proceedings ArticleDOI
22 Mar 2010
TL;DR: This paper proposes a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query, based on a statistics-based approach that relies both on the query specifics and the data statistics.
Abstract: The wide availability of commodity multi-core systems presents an opportunity to address the latency issues that have plagued XML query processing. However, simply executing multiple XML queries over multiple cores merely addresses the throughput issue: intra-query parallelization is needed to exploit multiple processing cores for better latency. Toward this effort, this paper investigates the parallelization of individual XPath queries over shared-address-space multi-core processors. Much previous work on parallelizing XPath in a distributed setting failed to exploit the shared-memory parallelism of multi-core systems. We propose a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query. This decision is based on a statistics-based approach that relies both on the query specifics and the data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallelization algorithm uses XML statistics to estimate the relative efficiencies of these different alternatives and find an optimal parallel XPath processing plan. Our experiments using well-known XML documents validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.

27 citations


Journal ArticleDOI
TL;DR: This work presents the design philosophy, implementation, and various applications of an XML-based genetic programming (GP) framework, which contributes to the achievement of fast prototyping of GP by using the standard built-in API of DOM parsers for manipulating the genetic programs.
Abstract: We present the design philosophy, implementation, and various applications of an XML-based genetic programming (GP) framework (XGP). The key feature of XGP is the distinct representation of genetic programs as DOM parsing trees featuring corresponding flat XML text. XGP contributes to the achievement of: (i) fast prototyping of GP by using the standard built-in API of DOM parsers for manipulating the genetic programs; (ii) human readability and modifiability of the genetic representations; (iii) generic support for representing the grammar of a strongly typed GP using W3C-standardized XML Schema; and (iv) inherent inter-machine migratability of the text-based genetic representation (i.e., the XML text) in distributed implementations of GP.

23 citations


Patent
29 Nov 2010
TL;DR: In this article, the authors propose an architecture that extends conventional computer programming languages that compile into an instance of an extensible markup language (XML) document object model (DOM) to provide support for XML literals in the underlying programming language.
Abstract: An architecture that extends conventional computer programming languages that compile into an instance of an extensible markup language (XML) document object model (DOM) to provide support for XML literals in the underlying programming language. This architecture facilitates a convenient shortcut by replacing the complex explicit construction required by conventional systems to create an instance of a DOM with a concise XML literal, which conventional compilers can translate into the appropriate code. The architecture allows these XML literals to be embedded with expressions, statement blocks, or namespaces to further enrich their power and versatility. In accordance therewith, context information describing the position and data types that an XML DOM can accept can be provided to the programmer via, for example, an integrated development environment. Additionally, the architecture supports escaping XML identifiers, a reification mechanism, and a conversion mechanism to convert between collections and singletons.
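The contrast the patent draws can be approximated in Python (which has no XML literals; the parsed string below stands in for what a literal-supporting compiler would accept directly). Both paths below produce the same DOM-like tree:

```python
import xml.etree.ElementTree as ET

# Explicit construction: the verbose style the patent aims to replace.
order = ET.Element("order")
item = ET.SubElement(order, "item")
item.set("sku", "42")
item.text = "widget"

# Literal-style construction: embed the markup directly and parse it,
# approximating what a compiler with XML-literal support would generate.
order2 = ET.fromstring('<order><item sku="42">widget</item></order>')

print(ET.tostring(order) == ET.tostring(order2))
```

In a language with true XML literals (e.g. VB.NET), the second form is checked and translated at compile time rather than parsed at run time, which is the substance of the claimed architecture.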

21 citations


Proceedings ArticleDOI
21 Sep 2010
TL;DR: An implementation of a three-way XML merge algorithm that is faster, uses less memory and is more precise than existing tools is presented and a graphical interface for visualizing and resolving conflicts is provided.
Abstract: XML has become the standard document representation for many popular tools in various domains. When multiple authors collaborate to produce a document, they must be able to work in parallel and periodically merge their efforts into a single work. While there exist a small number of three-way XML merging tools, their performance could be improved in several areas and they lack any form of user interface for resolving conflicts. In this paper, we present an implementation of a three-way XML merge algorithm that is faster, uses less memory, and is more precise than existing tools. It uses a specialized versioning tree data structure that supports node identity and change detection. The algorithm applies the traditional three-way merge found in GNU diff3 to the children of changed nodes. The editing operations it supports are addition, deletion, update, and move. A graphical interface for visualizing and resolving conflicts is also provided. An evaluation experiment was conducted comparing the proposed algorithm with three other tools on randomly generated XML data.

20 citations


Patent
20 Jan 2010
TL;DR: In this paper, a method and apparatus is provided for efficiently searching and navigating XML data stored in a relational database; the approach includes identifying a reference address within an XML tree index entry and storing the address in an xmltable index.
Abstract: A method and apparatus is provided for efficiently searching and navigating XML data stored in a relational database. When storing a collection of XML documents, certain scalar elements may be shredded and stored in a relational table, whereas unstructured data may be stored as a CLOB or BLOB column. The approach includes identifying a reference address within an XML tree index entry and storing the address in an xmltable index. The tree index entry allows for navigation in all axes. A path-based expression may be evaluated in the context of the reference address of the LOB. The result of the evaluation identifies another XML tree index entry containing a LOB locator used to retrieve the content from the document. The tree index, node index, and secondary function indexes are used together to enhance the performance of querying the XML data.

14 citations


Patent
01 Jun 2010
TL;DR: In this paper, the column values that are to be stored for shredded XML documents are separately analyzed for an XML document to determine whether to store a particular column in column-major format or row-major format, and what compression technique to use.
Abstract: A database server exploits the power of compression and a form of storing relational data referred to as column-major format to store XML documents in shredded form. The column values that are to be stored for shredded XML documents are separately analyzed for an XML document to determine whether to store a particular column in column-major format or row-major format, and what compression technique to use, if any.

13 citations


Posted Content
Mustafa Atay, Yezhou Sun, Dapeng Liu, Shiyong Lu, Farshad Fotouhi
TL;DR: In this article, an efficient linear algorithm for mapping XML data to relational data is proposed, which can be easily adapted to other inlining algorithms and is based on our previous proposed inlining algorithm.
Abstract: XML has emerged as the standard for representing and exchanging data on the World Wide Web. It is critical to have efficient mechanisms to store and query XML data to exploit the full power of this new technology. Several researchers have proposed to use relational databases to store and query XML data. While several algorithms of schema mapping and query mapping have been proposed, the problem of mapping XML data to relational data, i.e., mapping an XML INSERT statement to a sequence of SQL INSERT statements, has not been addressed thoroughly in the literature. In this paper, we propose an efficient linear algorithm for mapping XML data to relational data. This algorithm is based on our previous proposed inlining algorithm for mapping DTDs to relational schemas and can be easily adapted to other inlining algorithms.
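A minimal sketch of data shredding — mapping an XML fragment to a sequence of SQL INSERTs — is shown below in Python with `sqlite3`. This uses a generic edge-table layout (one row per element with a parent pointer), not the authors' inlining algorithm, whose tables are derived from the DTD:

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred_to_sql(xml_text, conn):
    """Map an XML fragment to SQL INSERTs: one row per element,
    carrying its tag, text, and parent id (a generic edge-table shred)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node "
        "(id INTEGER, parent INTEGER, tag TEXT, text TEXT)")
    counter = [0]
    def walk(elem, parent_id):
        counter[0] += 1
        node_id = counter[0]
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            walk(child, node_id)
    walk(ET.fromstring(xml_text), None)
    conn.commit()

conn = sqlite3.connect(":memory:")
shred_to_sql("<book><title>XML</title><year>2010</year></book>", conn)
print(conn.execute("SELECT tag, text FROM node ORDER BY id").fetchall())
```

The traversal is a single linear pass over the document, matching the linear-time property the paper claims for its data-mapping algorithm.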

13 citations


01 Jan 2010
TL;DR: A method for the semiautomatic transition from the design models of a Web application to a running implementation using the XML publishing framework Cocoon which provides a very flexible way to generate documents comprising XSLT and XSP processors.
Abstract: In this paper we present a method for the semiautomatic transition from the design models of a Web application to a running implementation. The design phase consists of constructing a set of UML models such as the conceptual model, the navigation model, and the presentation model. We use the UML extension mechanisms, i.e. stereotypes, tagged values, and OCL constraints, thereby defining a UML profile for the Web application domain. We show how these design models can automatically be mapped to XML documents with a structure conforming to their respective XML Schema definitions. We further demonstrate techniques by which XML documents for the conceptual model are automatically mapped to conceptual DOM (Document Object Model) objects. DOM objects corresponding to interactional objects are automatically derived from conceptual DOM objects and/or other interactional DOM objects. The XSLT mechanism serves to transform the logical presentation objects representing the user interface into physical presentation objects, e.g. HTML or WAP pages. Finally we present a production system architecture for Web applications using the XML publishing framework Cocoon, which provides a very flexible way to generate documents by combining XSLT and XSP (eXtensible Server Pages) processors.

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This work believes that the key contribution of this system is an improved schema-based clustering storage strategy efficient for both XML querying and updating, and powered by a novel memory management technique.
Abstract: We present a native XML database management system, Sedna, which is implemented from scratch as a full-featured database management system for storing large amounts of XML data. We believe that the key contribution of this system is an improved schema-based clustering storage strategy efficient for both XML querying and updating, and powered by a novel memory management technique. We position our approach with respect to state-of-the-art methods.

Book ChapterDOI
01 Apr 2010
TL;DR: This work proposes object-level matching semantics called Interested Single Object (ISO) and Interested Related Object (IRO) to capture a single object and multiple objects as users' search targets, respectively, and designs a novel relevance-oriented ranking framework for the matching results.
Abstract: Keyword search is widely recognized as a convenient way to retrieve information from XML data. In order to precisely meet users' search concerns, we study how to effectively return the targets that users intend to search for. We model an XML document as a set of interconnected object-trees, where each object contains a subtree representing a concept in the real world. Based on this model, we propose object-level matching semantics called Interested Single Object (ISO) and Interested Related Object (IRO) to capture a single object and multiple objects as users' search targets, respectively, and design a novel relevance-oriented ranking framework for the matching results. We propose efficient algorithms to compute and rank the query results in one phase. Finally, comprehensive experiments show the efficiency and effectiveness of our approach, and an online demo of our system on DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg.

Journal ArticleDOI
TL;DR: A new method of XML document clustering by a global criterion function, considering the weight of common structures, is proposed, which extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm.

Proceedings Article
01 Jan 2010
TL;DR: CluX uses a grammar for sharing similar substructures within the XML tree structure and a cluster-based heuristic for greedily selecting the best compression options in the grammar, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
Abstract: XML has become the de facto standard for data exchange in enterprise information systems. But whenever XML data is stored or processed, e.g. in the form of a DOM tree representation, the XML markup causes a huge blow-up of the memory consumption compared to the data, i.e., text and attribute values, contained in the XML document. In this paper, we present CluX, an XML compression approach based on clustering XML sub-trees. CluX uses a grammar for sharing similar substructures within the XML tree structure and a cluster-based heuristic for greedily selecting the best compression options in the grammar. Thereby, CluX allows for storing and exchanging XML data in a space-efficient and still queryable way. We evaluate different strategies for XML structure sharing, and we show that CluX often compresses better than XMill, Gzip, and Bzip2, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
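The structure-sharing idea behind grammar-based XML compression can be illustrated with a toy Python sketch: canonicalize each subtree bottom-up, give each distinct shape one grammar rule, and count the serialized bytes that repeated shapes would save. This is a drastically simplified stand-in for CluX, not its algorithm:

```python
import xml.etree.ElementTree as ET

def share_subtrees(root):
    """Map each structurally distinct subtree to one rule id and tally
    how many serialized bytes repeated subtrees would save when shared."""
    rules = {}   # canonical form -> rule id
    savings = 0
    def canon(elem):
        nonlocal savings
        # A subtree's canonical form: tag, sorted attributes, child rules.
        form = (elem.tag, tuple(sorted(elem.attrib.items())),
                tuple(canon(c) for c in elem))
        if form in rules:
            savings += len(ET.tostring(elem))  # repeat: share, don't store
        else:
            rules[form] = len(rules)
        return rules[form]
    canon(root)
    return len(rules), savings

doc = ET.fromstring("<a><b><c/></b><b><c/></b><b><c/></b></a>")
print(share_subtrees(doc))
```

Three identical `<b><c/></b>` subtrees collapse into a single rule; CluX's contribution is choosing such sharing opportunities with cluster-based heuristics over *similar* (not just identical) substructures.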

01 Jan 2010
TL;DR: XWT permits answering XPath queries more efficiently than using the uncompressed version of the documents, and is also competitive with inverted indexes over the XML document (if both structures use the same space).
Abstract: This paper presents a structure we call the XML Wavelet Tree (XWT) to represent any XML document in a compressed and self-indexed form. Any query or procedure that could be performed over the original document can therefore be performed more efficiently over the XWT representation, because it is shorter and has some indexing properties. In fact, XWT permits answering XPath queries more efficiently than using the uncompressed version of the documents. XWT is also competitive with inverted indexes over the XML document (if both structures use the same space).

Proceedings ArticleDOI
01 Mar 2010
TL;DR: A weighted similarity measurement approach for detecting the similarity between homogeneous XML documents is suggested, and a new clustering model is proposed that is implemented using open-source Java technology and validated experimentally.
Abstract: XML (eXtensible Markup Language) has been adopted by a number of software vendors today; it has become the standard for data interchange over the web and is also platform- and application-independent. An XML document consists of a number of attributes such as document data, structure, and style sheet. Clustering is a method of creating groups of similar objects. In this paper a weighted similarity measurement approach for detecting the similarity between homogeneous XML documents is suggested. Using this similarity measurement, a new clustering technique is also proposed. Methods for calculating the similarity of a document's structure and styling have been given by a number of researchers, mostly based on tree edit distances. For calculating the distance between documents' contents there are a number of text-similarity techniques such as cosine, Jaccard, and tf-idf. In this paper both kinds of similarity techniques are combined to propose a new distance measurement technique for calculating the distance between a pair of homogeneous XML documents. The proposed clustering model is implemented using open-source Java technology and is validated experimentally. Given a collection of XML documents, distances between documents are calculated and stored in Java collections, and these distances are then used to cluster the XML documents.
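The general shape of such a combined measure — a weighted blend of a structural similarity and a content similarity — can be sketched as follows. This Python version uses tag-frequency cosine for structure and term cosine for content, which is simpler than the tree-edit-distance measures the paper builds on; the weight `w_struct` and the example documents are illustrative choices:

```python
import math
import re
import xml.etree.ElementTree as ET

def vec_cosine(a, b):
    """Cosine similarity of two sparse frequency vectors (dicts)."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def tag_vector(root):
    v = {}
    for e in root.iter():
        v[e.tag] = v.get(e.tag, 0) + 1
    return v

def text_vector(root):
    v = {}
    text = " ".join(e.text or "" for e in root.iter()).lower()
    for w in re.findall(r"\w+", text):
        v[w] = v.get(w, 0) + 1
    return v

def weighted_similarity(x1, x2, w_struct=0.5):
    """Weighted blend of structural and content similarity."""
    d1, d2 = ET.fromstring(x1), ET.fromstring(x2)
    s = vec_cosine(tag_vector(d1), tag_vector(d2))
    c = vec_cosine(text_vector(d1), text_vector(d2))
    return w_struct * s + (1 - w_struct) * c

a = "<doc><t>xml data</t></doc>"
print(round(weighted_similarity(a, a), 3))
```

A distance for clustering is then simply `1 - weighted_similarity(x1, x2)`, which is the quantity the paper stores and feeds to its clustering step.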

Proceedings ArticleDOI
21 May 2010
TL;DR: A processing model for storing and building XML documents in data transfer between XML and a relational database, in which an XML document is parsed and its elements are stored in a single database table, thus shifting the workload of DOM building into memory via an algorithm called Tree-Branch inter-growth.
Abstract: The processing of XML documents has been regarded as the performance bottleneck in most systems and applications. A number of techniques have been developed to improve the performance of XML processing, ranging from schema-specific models to streaming-based models to hardware acceleration. These methods only address parsing and scheduling the XML document in memory [1]. Although a few works have discussed the efficiency of data read-write between XML and relational databases, they constructed the DOM and read the relational database synchronously, neglecting the difference in pace between DOM (a general in-memory format for XML documents) building and relational database reading [2], which reduces the performance of the entire system. In this paper, we present a processing model for storing and building XML documents in data transfer between XML and a relational database. In this model, an XML document is parsed and its elements are stored in a single database table; it is not necessary to read the nodes according to their hierarchical structure, thus shifting the workload of DOM building into memory via an algorithm called Tree-Branch inter-growth. To show the feasibility and effectiveness of our approach, we present our C# implementation of XML processing in this paper. Our empirical study shows that our algorithm can improve XML document processing performance significantly.

Proceedings ArticleDOI
15 Dec 2010
TL;DR: This work focuses on the vertical fragmentation design of XML documents, and two fragmentation models are proposed: query-based fragmentation and structure- and size-based fragmentation.
Abstract: As XML documents are distributed across the web, they can be considered a distributed repository of XML documents and are subject to distribution design. However, there is no adequate work on XML document distribution design. To address the shortcomings in XML document fragmentation design, in this work we focus on the vertical fragmentation design of XML documents. Two fragmentation models are proposed: query-based fragmentation and structure- and size-based fragmentation. For the query-based fragmentation model, vertical fragmentation techniques are proposed using the bond energy algorithm and a graph-based algorithm. We have implemented both algorithms and evaluated their performance. The performance of our fragmentation algorithms is compared with centralized and fully replicated XML documents, and better results are obtained. The structure- and size-based fragmentation model and its implementation algorithms are also evaluated, and encouraging results are achieved.

Proceedings ArticleDOI
05 Jul 2010
TL;DR: This paper proposes a novel modulo-based labeling scheme that uses modular arithmetic and number theory to label the XML tree, and shows that it outperforms other XML labeling schemes by having a smaller label size regardless of the fan-out or the depth of the tree.
Abstract: XML is becoming the de facto standard for exchanging and querying documents over the Web. Many XML query languages such as XQuery and XPath use label paths to traverse the irregularly structured XML data. Several labeling schemes have been proposed to identify the structural relationships in the tree, as well as to support incremental updates at low cost. In this paper, we conduct a comprehensive survey of labeling schemes for XML trees, and classify these schemes according to their labeling mechanism. We also propose a novel modulo-based labeling scheme that uses modular arithmetic and number theory to label the XML tree. Our algorithm labels nodes in the tree in a way similar to an encryption-decryption function using modular multiplication and a prime modulus. We show that our algorithm outperforms other XML labeling schemes by having a smaller space size for the node label regardless of the fan-out or the depth of the tree, and completely eliminates the need to re-label the whole XML tree in case of future insertions.
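For readers unfamiliar with XML labeling, here is the classic baseline such schemes compete against: Dewey-style prefix labels, under which the ancestor-descendant relationship is a simple string-prefix test. This Python sketch shows the baseline only, not the paper's modulo-based scheme:

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Assign Dewey prefix labels ('1', '1.1', '1.1.1', ...) so that
    ancestry between two nodes reduces to a label-prefix test."""
    labels = {}
    def walk(elem, label):
        labels[label] = elem.tag
        for i, child in enumerate(elem, start=1):
            walk(child, f"{label}.{i}")
    walk(root, "1")
    return labels

def is_ancestor(a, b):
    """True if the node labeled a is a proper ancestor of the node labeled b."""
    return b.startswith(a + ".")

doc = ET.fromstring("<lib><shelf><book/></shelf><desk/></lib>")
labels = dewey_labels(doc)
print(labels)
print(is_ancestor("1.1", "1.1.1"))
```

Dewey labels grow with depth and require re-labeling siblings on insertion; smaller labels and update-free insertion are exactly the properties the proposed modulo-based scheme targets.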

Proceedings ArticleDOI
TL;DR: A novel XML-based document format for web publishing, called CEBX, is proposed, which has optimized document content organization, physical structure and protection scheme to support web publishing.
Abstract: Although many XML-based document formats are available for printing or publishing on the Internet, none of them is well designed to support both high-quality printing and web publishing. Therefore, we propose a novel XML-based document format for web publishing, called CEBX, in this paper. The proposed format is a fixed-layout document format supporting high-quality printing, with document content organization, physical structure, and a protection scheme optimized for web publishing. There are four noteworthy features of CEBX documents: (1) CEBX preserves the original fixed layout through graphic units, for printing quality. (2) The content in a CEBX document can be reflowed to fit the display device, based on content blocks and additional fluid information. (3) The XML Document Archiving model (XDA), the packaging model used in CEBX, supports document linearization and incremental editing well. (4) By introducing a segment-based content protection scheme into CEBX, part of a document can be previewed directly while the remaining part is effectively protected, so that readers need only purchase the partial content of a book that they are interested in. This is very helpful for document distribution and supports flexible business models such as try-before-buy, on-demand reading, superdistribution, etc.

Journal IssueDOI
TL;DR: This paper presents a project focused on designing a general-purpose query language in support of mining XML data, and reports the results of a first set of experiments showing that a good trade-off between expressiveness and efficiency in XML DM is not a chimera.
Abstract: With the spread of XML sources, mining XML data can be an important objective in the near future. This paper presents a project focused on designing a general-purpose query language in support of mining XML data. In our framework, raw data, mining models, and domain knowledge are represented by way of XML documents and stored inside native XML databases. Data mining (DM) tasks are expressed in an extension of XQuery. Special attention is given to the frequent pattern discovery problem, and a way of exploiting domain-dependent optimizations and efficient data structures as deeply as possible in the extraction process is presented. We report the results of a first set of experiments, showing that a good trade-off between expressiveness and efficiency in XML DM is not a chimera. Copyright © 2009 John Wiley & Sons, Ltd.

Book ChapterDOI
13 Sep 2010
TL;DR: An effort to evaluate basic XML data management trade-offs for current commercial systems is reported on, including a simple micro-benchmark that methodically evaluates the impact of query characteristics on the comparison of shredded and native XML.
Abstract: As we approach the ten-year anniversary of the first working draft of the XQuery language, one finds XML storage and query support in a number of commercial database systems. For many XML use cases, database vendors now recommend storing and indexing XML natively and using XQuery or SQL/XML to query and update XML directly. If the complexity of the XML data allows, shredding and reconstructing XML to/from relational tables is still an alternative as well, and might in fact outperform native XML processing. In this paper we report on an effort to evaluate these basic XML data management trade-offs for current commercial systems. We describe EXRT (Experimental XML Readiness Test), a simple micro-benchmark that methodically evaluates the impact of query characteristics on the comparison of shredded and native XML. We describe our experiences and preliminary results from EXRT'ing pressure on the XML data management facilities offered by two relational databases and one XML database system.

Proceedings ArticleDOI
22 Mar 2010
TL;DR: This paper introduces an XML keyword search method that provides high precision, recall, and ranking quality for data-centric XML, even when long text fields are present, and presents algorithms to compute NTPCs efficiently.
Abstract: Users who are unfamiliar with database query languages can search XML data sets using keyword queries. Current approaches for supporting such queries are either for text-centric XML, where the structure is very simple and long text fields predominate; or data-centric, where the structure is very rich. However, long text fields are becoming more common in data-centric XML, and existing approaches deliver relatively poor precision, recall, and ranking for such data sets. In this paper, we introduce an XML keyword search method that provides high precision, recall, and ranking quality for data-centric XML, even when long text fields are present. Our approach is based on a new group of structural relationships called normalized term presence correlation (NTPC). In a one-time setup phase, we compute the NTPCs for a representative DB instance, then use this information to rank candidate answers for all subsequent queries, based on each answer's structure. Our experiments with 65 user-supplied queries over two real-world XML data sets show that NTPC-based ranking is always as effective as the best previously available XML keyword search method for data-centric data sets, and provides better precision, recall, and ranking than previous approaches when long text fields are present. As the straightforward approach for computing NTPCs is too slow, we also present algorithms to compute NTPCs efficiently.

Journal ArticleDOI
01 Sep 2010
TL;DR: This work has implemented an XML schema transformation toolkit within IBM Master Data Management Server (MDM) that includes an extendible schema matching algorithm that was designed with evolving XML schemas in mind and takes advantage of hierarchical structure of XML.
Abstract: Database systems often use XML Schema to describe the format of valid XML documents. Usually, this format is determined when the system is designed. Sometimes, in an already functioning system, a need arises to change the XML schemas. In such a situation, the system has to transform the old XML documents so that they conform to the new format and so that as little information as possible is lost in the process. This process is called schema evolution. We have implemented an XML schema transformation toolkit within IBM Master Data Management Server (MDM). MDM uses XML documents to describe products that an enterprise may be offering to its clients. In this work we focus on evolving schemas rather than on integrating separate or heterogeneous data sources. Our solution includes an extendible schema matching algorithm that was designed with evolving XML schemas in mind and takes advantage of the hierarchical structure of XML. It also includes a data transformation and migration method appropriate for environments where migration is performed in an abstraction layer above the DBMS. Finally, we describe a novel way of extending an XSLT editor with an XSLT visualization feature to allow the user's input and evaluation of the transformation.

Patent
30 Jul 2010
TL;DR: In this paper, a method for selecting user-desirable content from web pages is presented, which includes receiving a web page, representing it as a Document Object Model (DOM) tree, computing visual and coordinate information for each DOM node, determining the desirable DOM path, determining the desirable DOM node from that path, and selecting the single DOM node with the highest final score.
Abstract: A method for selecting user-desirable content from web pages includes receiving a web page, representing the web page as a Document Object Model (DOM) tree, computing visual and coordinate information for each DOM node within the tree, determining the desirable DOM path, determining the desirable DOM node from that path, and selecting the single DOM node with the highest final score. The DOM node with the highest final score is selected as the user-desirable content of the web page.
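The patent's scoring relies on rendered visual and coordinate information, which is only available inside a browser. Outside one, the select-the-best-node pattern can still be sketched with a stand-in score; here subtree text length substitutes for the visual score, and the sample page is invented for illustration.

```python
# Hedged sketch: walk a DOM-like tree, score each node (text length as a
# stand-in for the patent's visual/coordinate score), and select the single
# node with the highest final score as the "main content".
import xml.etree.ElementTree as ET

def select_main_content(xml_text):
    root = ET.fromstring(xml_text)
    best_elem, best_score = None, -1.0
    for elem in root.iter():
        # Text directly owned by this element (its text plus children's tails).
        own_text = (elem.text or "") + "".join(c.tail or "" for c in elem)
        score = len(own_text.strip())
        if score > best_score:
            best_score, best_elem = score, elem
    return best_elem.tag, best_score

doc = ("<page><nav>Home</nav>"
       "<article>This is the long main article body of the page.</article></page>")
tag, score = select_main_content(doc)
# tag is "article": the node with the highest stand-in score.
```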

Proceedings ArticleDOI
22 Nov 2010
TL;DR: Experimental results are reported for an approach that uses the BM25E model for retrieval over a large-scale XML collection with Score Sharing, which assigns parent scores by sharing scores from leaf nodes to their parents in a top-down scheme, improving efficiency and response time.
Abstract: In this paper, we report experimental results of our approach, which uses the BM25E model for retrieval over a large-scale XML collection to improve the effectiveness of XML retrieval. This model is commonly used in the information retrieval community. We propose a new Score Sharing algorithm that assigns parent scores by sharing scores from leaf nodes to their parents in a top-down scheme. Response-time efficiency is high: the Score Sharing algorithm processes 10,000 leaf nodes in around 0.135 ms per topic after obtaining the result list from Zettair, and Zettair itself processes a topic in less than 1 second on average, so total processing time is up to 1 second per topic. Our experiments show that BM25E with Score Sharing improves iP[0.10] by 24.40% and MAiP by 31.89% over the original BM25E. In addition, our algorithm can handle both the element level and the document level by setting a single parameter.
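The core Score Sharing idea — leaves carry retrieval scores and each ancestor receives a damped share of its descendants' scores, so both elements and whole documents can be ranked — can be sketched recursively. The leaf scores, tree shape, and decay factor 0.5 below are invented for illustration; in the paper the leaf scores come from BM25E via Zettair.

```python
# Hedged sketch of Score Sharing: each parent's score is a damped sum of the
# scores shared up from its descendants. Leaf scores are supplied directly
# here instead of coming from a BM25E run.
import xml.etree.ElementTree as ET

def share_scores(elem, leaf_scores, decay=0.5):
    children = list(elem)
    if not children:
        return leaf_scores.get(elem.tag, 0.0)
    score = decay * sum(share_scores(c, leaf_scores, decay) for c in children)
    elem.set("score", str(score))  # record the element-level score
    return score

doc = ET.fromstring("<article><sec><p/><p/></sec><sec><p/></sec></article>")
total = share_scores(doc, {"p": 2.0})
# Each <p> leaf scores 2.0; secs get 2.0 and 1.0; the article gets 1.5.
```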

Proceedings ArticleDOI
01 Dec 2010
TL;DR: BFilter is proposed, which evaluates user queries by backward matching from branch points, delaying further matching until the branch points in the XML document and the user query match; it outperforms the well-known YFilter on complex queries.
Abstract: In publish/subscribe systems, XML message filtering performed at the application layer is an important operation for XML message multicast. As a specific case of content-based multicast in the application layer, XML message multicast depends on the data filtering and matching processes and on the forwarding and routing schemes. As more XML data travels through such systems, XML message filtering and matching become more and more desirable. BFilter, proposed in this paper, performs XML message filtering and matching by leveraging branch points in both the XML document and the user query. It evaluates user queries by backward matching from branch points, delaying further matching until the branch points in the XML document and the user query match. In this way, XML message filtering can be performed more efficiently, as the probability of mismatching is reduced. A number of experiments have been conducted, and the results demonstrate that BFilter has better performance than the well-known YFilter for complex queries.
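The intuition behind branch-point-first matching can be sketched on a twig query such as //book[author][price], which branches at "book": locating the branch element first and only then verifying its branches cheaply rejects non-matching documents. The real BFilter is an automaton over streamed XPath subscriptions; this sketch, with an invented document, only illustrates the ordering idea.

```python
# Hedged sketch: filter a document against a twig query by finding the
# branch-point element first, then checking that all required branches
# (here restricted to child tags) are present under it.
import xml.etree.ElementTree as ET

def matches_twig(xml_text, branch_tag, required_children):
    root = ET.fromstring(xml_text)
    for elem in root.iter(branch_tag):          # branch point first
        child_tags = {c.tag for c in elem}
        if all(t in child_tags for t in required_children):
            return True                          # document passes the filter
    return False

doc = ("<bib><book><author>A</author><price>10</price></book>"
       "<book><author>B</author></book></bib>")
# //book[author][price] matches the first <book>; //book[title] matches none.
```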

Patent
Bian Li1, Yuan Li1, Chang H. Liu1, Xiaoyi Wang1, Yunting Wang1, Shuo Wu1, Kang Xu1 
27 Sep 2010
TL;DR: XPath evaluation in an XML data repository includes parsing an input XPath query using a simple path file to generate an execution tree for the query; the simple path file is an XML file generated from the hierarchical architecture of the XML files in the repository, with node names derived from the tag information of the corresponding nodes in those files.
Abstract: XPath evaluation in an XML data repository includes parsing an input XPath query using a simple path file to generate an execution tree for the XPath query, where the simple path file is an XML file generated based on the hierarchical architecture of the plurality of XML files in the data repository, and the names of the nodes in the generated XML file are derived from the tag information of the corresponding nodes in those XML files. Executing the execution tree against the data repository generates the final evaluation result.
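A simple path file is essentially a structural summary of the repository: the set of distinct root-to-element tag paths across all documents. The patent stores this summary as an XML file; in the sketch below (documents invented for illustration) a plain set of path strings plays the same role, letting a query path be checked against the repository's structure before any real evaluation.

```python
# Hedged sketch: build a structural summary (all distinct root-to-element
# paths) over a repository of XML documents, as a stand-in for the patent's
# simple path file.
import xml.etree.ElementTree as ET

def path_summary(docs):
    paths = set()
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag)]
        while stack:
            elem, path = stack.pop()
            paths.add(path)
            for child in elem:
                stack.append((child, path + "/" + child.tag))
    return paths

docs = ["<a><b><c/></b></a>", "<a><d/></a>"]
summary = path_summary(docs)
# "/a/b/c" occurs in the repository; "/a/x" does not, so a query using it
# could be rejected without touching any document.
```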

Proceedings ArticleDOI
09 Jul 2010
TL;DR: A parallel solution for XML query applications is presented, combining parallel XML parsing with parallel XML querying; it exploits multi-core environments by parallelizing the key execution stages of the query process.
Abstract: Since various XML query applications have recently come to the fore, performance optimization has become a research hotspot. With the growing availability of multi-core hardware, parallelization is an important optimization measure. This paper presents a parallel solution for XML query applications that combines parallel XML parsing with parallel XML querying. Parsing is based on arbitrary partitioning of the XML data and parallel sub-tree construction, followed by a final merging step. After parsing, region encodings of the XML data are obtained and used to construct a relation matrix, since the XPath evaluation in the query procedure is based on this matrix. The matrix construction procedure and the query primitives are parallelized to boost performance. As a whole, our solution exploits multi-core environments by parallelizing the key execution stages of the query process. These key stages are verified by experiment, and the overall effect of the solution is presented.