
Showing papers on "XML" published in 2011


01 Mar 2011
TL;DR: This document defines XMPP's core protocol methods: setup and teardown of XML streams, channel encryption, authentication, error handling, and communication primitives for messaging, network availability ("presence"), and request-response interactions.
Abstract: The Extensible Messaging and Presence Protocol (XMPP) is an application profile of the Extensible Markup Language (XML) that enables the near-real-time exchange of structured yet extensible data between any two or more network entities. This document defines XMPP's core protocol methods: setup and teardown of XML streams, channel encryption, authentication, error handling, and communication primitives for messaging, network availability ("presence"), and request-response interactions. This document obsoletes RFC 3920. [STANDARDS-TRACK]
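
As a concrete illustration of the stanzas exchanged over such XML streams, here is a minimal sketch that builds an XMPP message stanza with Python's standard library. The to/from/type attributes and body child follow RFC 6120/6121; the addresses are placeholders, not taken from the document above, and stream setup, TLS and SASL are omitted.

    # Minimal sketch: build an XMPP <message/> stanza (RFC 6120/6121 shape).
    # The JIDs are placeholders; stream setup, channel encryption and
    # authentication are omitted.
    import xml.etree.ElementTree as ET

    msg = ET.Element("message", {
        "to": "juliet@example.com",
        "from": "romeo@example.net",
        "type": "chat",
    })
    ET.SubElement(msg, "body").text = "Wherefore art thou?"

    # Serialized, this is one unit of the XML stream described above.
    print(ET.tostring(msg, encoding="unicode"))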

795 citations


01 Jun 2011
TL;DR: The Network Configuration Protocol (NETCONF) defined in this document provides mechanisms to install, manipulate, and delete the configuration of network devices through an Extensible Markup Language (XML)-based data encoding.
Abstract: The Network Configuration Protocol (NETCONF) defined in this document provides mechanisms to install, manipulate, and delete the configuration of network devices. It uses an Extensible Markup Language (XML)-based data encoding for the configuration data as well as the protocol messages. The NETCONF protocol operations are realized as remote procedure calls (RPCs). This document obsoletes RFC 4741. [STANDARDS-TRACK]
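
To make the XML-based RPC encoding concrete, here is a minimal sketch that builds a NETCONF get-config request in the base:1.0 namespace defined by RFC 6241. Transport (normally SSH) and message framing are omitted, and the message-id value is arbitrary.

    # Minimal sketch: a NETCONF <get-config> RPC (RFC 6241 base:1.0 namespace).
    import xml.etree.ElementTree as ET

    NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
    ET.register_namespace("", NS)

    rpc = ET.Element(f"{{{NS}}}rpc", {"message-id": "101"})
    get_config = ET.SubElement(rpc, f"{{{NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NS}}}source")
    ET.SubElement(source, f"{{{NS}}}running")  # ask for the running configuration

    print(ET.tostring(rpc, encoding="unicode"))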

486 citations


Patent
27 Jun 2011
TL;DR: In this article, a system and method for publishing logging and status information from the servers is provided, where a list of available servers accessible for monitoring by persons, devices, and applications via a remote monitor device can be provided.
Abstract: To assist in monitoring the intelligent messaging network, a system and method for publishing logging and status information from the servers is provided. A list of available servers accessible for monitoring by persons, devices, and applications via a remote monitor device can be provided. The remote monitor device may forward selected servers from the list of available servers in which they are interested. Also, particular information about the selected servers can be requested. Access to certain servers and information may be restricted to those with authorization. Authorization can be verified by the use of digital certificates. The requested information can then be gathered and provided to authorized persons or devices. Typically, the information includes logging and status information from the servers. The information can be provided as an XML page and viewed using, for example, a standard web browser. Further, if the information is provided to the remote monitor device as an XML page, a standard XML parser may be used to extract particular information from the XML page.

252 citations


Book
22 Aug 2011
TL;DR: In this paper, the authors present a generalization of Courteous Logic Programs (CLP) to include prioritized conflict handling, thus enabling modularity in specifying and revising rule-sets.
Abstract: We address why, and especially how, to represent business rules in e-commerce contracts. By contracts, we mean descriptions of goods and services offered or sought, including ancillary agreements detailing terms of a deal. We observe that rules are useful in contracts to represent conditional relationships, e.g., in terms & conditions, service provisions, and surrounding business processes, and we illustrate this point with several examples. We analyze requirements (desiderata) for representing such rules in contracts. The requirements include: declarative semantics so as to enable shared understanding and interoperability; prioritized conflict handling so as to enable modular updating/revision; ease of parsing; integration into WWW-world software engineering; direct executability; and computational tractability. We give a representational approach that consists of two novel aspects. First, we give a new fundamental knowledge representation formalism: a generalized version of Courteous Logic Programs (CLP), which expressively extends declarative ordinary logic programs (OLP) to include prioritized conflict handling, thus enabling modularity in specifying and revising rule-sets. Our approach to implementing CLP is a courteous compiler that transforms any CLP into a semantically equivalent OLP with moderate, tractable computational overhead. Second, we give a new XML encoding of CLP, called Business Rules Markup Language (BRML), suitable for interchange between heterogeneous commercial rule languages. BRML can also express a broad subset of ANSI-draft Knowledge Interchange Format (KIF) which overlaps with CLP. Our new approach, unlike previous approaches, provides not only declarative semantics but also prioritized conflict handling, ease of parsing, and integration into WWW-world software engineering. We argue that this new approach meets the overall requirements to a greater extent than any of the previous approaches, including KIF, the leading previous declarative approach. We have implemented both aspects of our approach; a free alpha prototype called CommonRules was released on the Web in July of 1999, at http://alphaworks.ibm.com. An extended version of this paper will be available as a forthcoming IBM Research Report (at http://www.research.ibm.com).

160 citations


Proceedings ArticleDOI
Aldo Castellani, Mattia Gheda, Nicola Bui, Michele Rossi, Michele Zorzi
05 Jun 2011
TL;DR: This paper presents the TinyOS design and implementation of two components that will play a fundamental role in the communication stack of REST-based devices, and focuses on the Constrained Application Protocol (CoAP), which allows REST-based communications among applications residing in distributed and networked embedded systems.
Abstract: According to the Internet of Things (IoT) vision, everyday objects such as domestic appliances, actuators and embedded systems of any kind in the near future will be connected with each other and with the Internet. These will form a distributed network with sensing capabilities that will allow unprecedented market opportunities, spurring new services, including energy monitoring and control of homes, buildings, industrial processes and so forth. In this paper, we concentrate on the actual implementation of the communication technology, adopting the Representational State Transfer (REST) approach. REST relies only on HTTP methods such as GET and POST. Embedded communication devices are addressed using Universal Resource Indicators (URI) and data is exchanged through standard XML. We present our TinyOS design and implementation of two components that will play a fundamental role in the communication stack of REST-based devices. First, we focus on the Constrained Application Protocol (CoAP), which allows REST-based communications among applications residing in distributed and networked embedded systems. Second, we present our lightweight implementation of an EXI library: an efficient binary compressor for XML data files. Experimental results are provided in terms of compression, decoding (EXI) and access time (CoAP) performance.
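
The paper's CoAP and EXI components target TinyOS; purely as an illustration of the REST-style request/response pattern CoAP enables, here is a desktop-side sketch using the third-party Python aiocoap library (an assumption, not something the paper uses). The device address and resource path are placeholders.

    # Illustration only: a CoAP GET issued from a client, via the aiocoap library.
    # The IPv6 address and resource path are placeholders.
    import asyncio
    from aiocoap import Context, Message, GET

    async def main():
        ctx = await Context.create_client_context()
        request = Message(code=GET, uri="coap://[2001:db8::1]/sensors/temperature")
        response = await ctx.request(request).response
        print(response.code, response.payload)

    asyncio.run(main())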

150 citations


Patent
06 Dec 2011
TL;DR: In this article, a prevention-based network auditing system includes a plurality of heterogeneous information sources gathering information about the network, and converts the information gathered by the information sources into a normalized data format such as, for example, into XML (Extensible Markup Language).
Abstract: A prevention-based network auditing system includes a plurality of heterogeneous information sources gathering information about the network. An audit server invokes the heterogeneous information sources via a uniform communications interface to gather information about the network, and converts the information gathered by the information sources into a normalized data format such as, for example, into XML (Extensible Markup Language). The converted information is then stored in an audit repository for security and regulatory policy assessment, network vulnerability analysis, report generation, and security improvement recommendations.

134 citations


01 Jan 2011
TL;DR: Ontobee as mentioned in this paper is a web system aimed to serve as a linked data server and browser specifically targeted for ontology terms, which can be used to promote ontology sharing, interoperability, data integration, and Semantic Web applications.
Abstract: The Linking Open Data (LOD) community has been working to extend the Web with a data commons by publishing various open datasets as RDF on the Web. To support this effort, we developed Ontobee (http://www.ontobee.org/), a web system aimed to serve as a linked data server and browser specifically targeted for ontology terms. Ontobee combines two basic features for one specific ontology term: (1) a user-friendly HTML web interface displaying the details and hierarchy of the ontology term; and (2) an RDF/XML source for the ontology term corresponding to the HTML web page, which can be accessed by Semantic Web applications. Ontobee provides an efficient and publicly available method to promote ontology sharing, interoperability, data integration, and Semantic Web applications.
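
A hedged sketch of how a Semantic Web client might retrieve the RDF/XML rendering of a term served this way, using the Python requests library: the term IRI below is just an example OBO PURL, and the assumption that the server honours an Accept header for content negotiation is illustrative, not a statement about Ontobee's actual configuration.

    # Hedged sketch: fetch an ontology term as RDF/XML via HTTP content negotiation.
    # The term IRI and the content-negotiation behaviour are assumptions.
    import requests

    term_iri = "http://purl.obolibrary.org/obo/OBI_0000070"  # placeholder term
    resp = requests.get(term_iri,
                        headers={"Accept": "application/rdf+xml"},
                        allow_redirects=True, timeout=30)
    resp.raise_for_status()
    print(resp.headers.get("Content-Type"))
    print(resp.text[:500])  # start of the RDF/XML description of the term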

134 citations


Proceedings ArticleDOI
04 Jul 2011
TL;DR: This work empirically studies whether VTracker, an algorithm for XML differencing, can precisely recognize changes in WSDL documents by applying it to the task of comparing 18 versions of the Amazon EC2 web service, and analyzes the changes that occurred between subsequent versions of various web services.
Abstract: The service-oriented architecture paradigm prescribes the development of systems through the composition of services, i.e., network-accessible components, specified by (and invoked through) their WSDL interface descriptions. Systems thus developed need to be aware of changes in, and evolve with, their constituent services. Therefore, accurate recognition of changes in the WSDL specification of a service is an essential functionality in the context of the software lifecycle of service-oriented systems. In this work, we present the results of an empirical study on WSDL evolution analysis. In the first part, we empirically study whether VTracker, our algorithm for XML differencing, can precisely recognize changes in WSDL documents by applying it to the task of comparing 18 versions of the Amazon EC2 web service. Second, we analyze the changes that occurred between the subsequent versions of various web-services and discuss their potential effects on the maintainability of service systems relying on them.

106 citations


Patent
15 Nov 2011
TL;DR: In this paper, the use of unstructured and untagged text message protocols to form a text message body that can be used to transmit and receive semi-structured, or structured text message bodies, which optionally may also use various, widely used Markup Languages.
Abstract: The present invention relates to the use of unstructured and untagged text message protocols to form a text message body that can be used to transmit and receive semi-structured, or structured text message bodies, which optionally may also use various, widely used Markup Languages. The semi-structure, or structure used within the text message body can be a format, such as, but not limited to, partitioning and/or comma delimited values, etc. The tagging for use with the text message body can be a protocol, such as, but not limited to, Extensible Markup Language (XML).

104 citations


Patent
11 Jul 2011
TL;DR: In this paper, the authors present a P2P application programming interface (API) that allows an application to create, import, export, manage, enumerate, and delete group identity information.
Abstract: Peer-to-peer (P2P) application programming interfaces (APIs) that allow an application to create, import, export, manage, enumerate, and delete P2P identities are presented. Further, the management of group identity information is provided. APIs abstract away from low level credential and cryptographic functions required to create and manage P2P identities. This management includes retrieval and setting of a friendly name, generation of a cryptographic public/private key pair, retrieval of security information in the form of an XML fragment, and creation of a new name based on an existing identity.

95 citations


Proceedings ArticleDOI
18 Apr 2011
TL;DR: This paper describes how to add the translation method into the system structure of web applications and shows how this method can improve the system structure and performance of those applications.
Abstract: This paper analyzes the form of two data serializing approaches used in web applications, XML and JSON. Although both are widely used, highly efficient data transmission between these two formats is still a problem in application development. The features of these two data objects were analyzed, and it was pointed out how to translate correctly between them. A recursive algorithm for translating between these two types of data serializing forms was given, based on the multi-tree data structure of XML and JSON objects. The efficiency of this algorithm was demonstrated by translation experiments. For use in web applications, this paper describes how to add the translation method into the system structure of an application and shows how this method can improve the system structure and the performance of web applications.
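
To show the kind of recursion the abstract refers to, here is a minimal sketch of the XML-to-JSON direction: walk the element tree and build nested dictionaries, turning repeated tags into lists. This is only an illustration under simplifying assumptions (attributes and mixed content are ignored), not the paper's algorithm.

    # Minimal sketch of XML -> JSON: recurse over the element tree.
    import json
    import xml.etree.ElementTree as ET

    def element_to_obj(elem):
        children = list(elem)
        if not children:                      # leaf element: keep its text
            return elem.text
        obj = {}
        for child in children:                # repeated tags become JSON arrays
            value = element_to_obj(child)
            if child.tag in obj:
                if not isinstance(obj[child.tag], list):
                    obj[child.tag] = [obj[child.tag]]
                obj[child.tag].append(value)
            else:
                obj[child.tag] = value
        return obj

    xml_doc = "<order><id>42</id><item>pen</item><item>ink</item></order>"
    root = ET.fromstring(xml_doc)
    print(json.dumps({root.tag: element_to_obj(root)}, indent=2))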

Journal ArticleDOI
31 Jul 2011
TL;DR: This article is a description of how the Julia system was extended, based on abstract interpretation, to run formally correct analyses of Android programs, finding bugs and flaws both in the Google samples and in the open-source programs.
Abstract: Context: Android is a programming language based on Java and an operating system for embedded and mobile devices, whose upper layers are written in the Android language itself. As a language, it features an extended event-based library and dynamic inflation of graphical views from declarative XML layout files. A static analyzer for Android programs must consider such features, for correctness and precision. Objective: Our goal is to extend the Julia static analyzer, based on abstract interpretation, to perform formally correct analyses of Android programs. This article is an in-depth description of such an extension, of the difficulties that we faced and of the results that we obtained. Method: We have extended the class analysis of the Julia analyzer, which lies at the heart of many other analyses, by considering some key Android-specific features such as the potential existence of many entry points to a program and the inflation of graphical views from XML through reflection. We also have significantly improved the precision of the nullness analysis on Android programs. Results: We have analyzed with Julia most of the Android sample applications by Google and a few larger open-source programs. We have applied tens of static analyses, including classcast, dead code, nullness and termination analysis. Julia has found, automatically, bugs, flaws and inefficiencies both in the Google samples and in the open-source applications. Conclusion: Julia is the first sound static analyzer for Android programs, based on a formal basis such as abstract interpretation. Our results show that it can analyze real third-party Android applications, without any user annotation of the code, yielding formally correct results in at most 7 min and on standard hardware. Hence it is ready for a first industrial use.

Proceedings ArticleDOI
25 Sep 2011
TL;DR: The srcML toolkit for lightweight transformation and fact-extraction of source code is described and application use-cases are shown and demonstrated to be practical and scalable.
Abstract: The srcML toolkit for lightweight transformation and fact-extraction of source code is described. srcML is an XML format for C/C++/Java source code. The open source toolkit that includes the source-to-srcML and srcML-to-source translators for round-trip reverse engineering is freely available. The direct use of XPath and XSLT is supported, an archive format for large projects is included, and a rich set of input and output formats through a command-line interface is available. Applying transformations and formulating queries using srcML is very convenient. Application use-cases of transformations and fact-extraction are shown and demonstrated to be practical and scalable.
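
Because srcML is plain XML, fact extraction reduces to XPath over the marked-up source. A hedged sketch with the third-party lxml library: the namespace URI and element names below are assumptions about the srcML markup (a name child inside each function unit), not taken from the paper.

    # Hedged sketch: list function names from a srcML document with XPath.
    # The namespace URI and element names are assumptions about srcML markup.
    from lxml import etree

    ns = {"src": "http://www.srcML.org/srcML/src"}   # assumed srcML namespace
    doc = etree.parse("example.cpp.xml")              # output of the srcml tool

    for fn_name in doc.xpath("//src:function/src:name", namespaces=ns):
        print(fn_name.text)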

Patent
18 Apr 2011
TL;DR: A portable computer-readable media device and method of use enable automatic configuration of a computing device, such as a conventional network device or a thin client device, for operation in a network as discussed by the authors.
Abstract: A portable computer-readable media device and method of use enable automatic configuration of a computing device, such as a conventional network device or a thin client device, for operation in a network. Configuration information, including network settings and security information, is incorporated in an XML file written to the portable media device while it is installed in a first device. This configuration is then automatically transferred to a second device by installing the portable media device in the second device. The second device then writes device information, incorporated in an XML file, to the portable media device, to be uploaded to the first device.
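
A purely hypothetical sketch of the kind of XML configuration file the patent describes being written to the portable device; every element name, value and path here is invented for illustration.

    # Hypothetical sketch: write network settings and security information to an
    # XML file on removable media. All names, values and the drive path are invented.
    import xml.etree.ElementTree as ET

    cfg = ET.Element("deviceConfiguration")
    net = ET.SubElement(cfg, "networkSettings")
    ET.SubElement(net, "ssid").text = "OfficeNet"
    ET.SubElement(net, "dhcp").text = "true"
    sec = ET.SubElement(cfg, "securityInformation")
    ET.SubElement(sec, "wpaKey").text = "REPLACE-ME"

    ET.ElementTree(cfg).write("E:/config.xml",
                              xml_declaration=True, encoding="utf-8")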

Book ChapterDOI
31 Jul 2011
TL;DR: An account of the whole Matita system, its peculiarities and its main applications; Matita is a fully fledged ITP specifically designed as a lightweight but competitive system, particularly suited for the assessment of innovative ideas.
Abstract: Matita is an interactive theorem prover being developed by the Helm team at the University of Bologna. Its stable version 0.5.x may be downloaded at http://matita.cs.unibo.it. The tool originated in the European project MoWGLI as a set of XML-based tools aimed to provide a mathematician-friendly web-interface to repositories of formal mathematical knowledge, supporting advanced content-based functionalities for querying, searching and browsing the library. It has since then evolved into a fully fledged ITP, specifically designed as a light-weight, but competitive system, particularly suited for the assessment of innovative ideas, both at foundational and logical level. In this paper, we give an account of the whole system, its peculiarities and its main applications.

Book ChapterDOI
03 Nov 2011
TL;DR: This paper provides an extension of RuleML called LegalRuleML for fostering the characteristics of legal knowledge and permitting its full usage in legal reasoning and in the business rule domain.
Abstract: Legal texts are the foundational resource in which to discover rules and norms that feed into different concrete (often XML-based) Web applications. Legislative documents provide general norms and specific procedural rules for eGovernment and eCommerce environments, while contracts specify the conditions of services and business rules (e.g. service level agreements for cloud computing), and judgments provide information about the legal argumentation and interpretation of norms in concrete case law. Such legal knowledge is an important source that should be detected, properly modeled and expressively represented in order to capture all the domain particularities. This paper provides an extension of RuleML called LegalRuleML for fostering the characteristics of legal knowledge and permitting its full usage in legal reasoning and in the business rule domain. LegalRuleML encourages the effective exchange and sharing of such semantic information between legal documents, business rules, and software applications.

Book ChapterDOI
29 May 2011
TL;DR: This work proposes Linked Data Services (LIDS), a general, formalised approach for integrating data-providing services with Linked Data, a popular mechanism for data publishing which facilitates data integration and allows for decentralised publishing.
Abstract: A sizable amount of data on the Web is currently available via Web APIs that expose data in formats such as JSON or XML. Combining data from different APIs and data sources requires glue code which is typically not shared and hence not reused. We propose Linked Data Services (LIDS), a general, formalised approach for integrating data-providing services with Linked Data, a popular mechanism for data publishing which facilitates data integration and allows for decentralised publishing. We present conventions for service access interfaces that conform to Linked Data principles, and an abstract lightweight service description formalism. We develop algorithms that use LIDS descriptions to automatically create links between services and existing data sets. To evaluate our approach, we realise LIDS wrappers and LIDS descriptions for existing services and measure performance and effectiveness of an automatic interlinking algorithm over multiple billions of triples.

Journal ArticleDOI
01 May 2011
TL;DR: A synthesis tool is presented that generates C code and an associated SystemC model from a CAL dataflow program; the generated code is validated against the original CAL description simulated using the Open Dataflow environment.
Abstract: The MPEG Reconfigurable Video Coding (RVC) framework is a new standard under development by MPEG that aims at providing a unified high-level specification of current and future MPEG video coding technologies using dataflow models. In this framework, a decoder is built as a configuration of video coding modules taken from the standard MPEG toolbox library or proprietary libraries. The elements of the library are specified by a textual description that expresses the I/O behavior of each module and by a reference software written using a subset of the CAL Actor Language named RVC-CAL. A decoder configuration is written in an XML dialect by connecting a set of CAL modules. Code generators are fundamental supports that enable the direct transformation of a high level specification to efficient hardware and software implementations. This paper presents a synthesis tool that from a CAL dataflow program generates C code and an associated SystemC model. The generated code is validated against the original CAL description simulated using the Open Dataflow environment. Experimental results of the translation of two descriptions of an MPEG-4 Simple Profile decoder with different granularities are shown and discussed.

Book
11 Jan 2011
TL;DR: Written by the developers of Camel, this book distills their experience and practical insights so that readers can tackle integration tasks like a pro, and shows how to work with the integration patterns.
Abstract: Apache Camel is a Java framework that lets you implement the standard enterprise integration patterns in a few lines of code. With a concise but sophisticated DSL you snap integration logic into your app, Lego-style, using Java, XML, or Scala. Camel supports over 80 common transports such as HTTP, REST, JMS, and Web Services. Camel in Action is a Camel tutorial full of small examples showing how to work with the integration patterns. It starts with core concepts like sending, receiving, routing, and transforming data. It then shows you the entire lifecycle and goes in depth on how to test, deal with errors, scale, deploy, and even monitor your app - details you can find only in the Camel code itself. Written by the developers of Camel, this book distills their experience and practical insights so that you can tackle integration tasks like a pro. What's inside: valuable examples in Java and XML; explanations of complex patterns; error handling, testing, deploying, managing, and running Camel; accessible to beginners, useful to experts.

Journal ArticleDOI
TL;DR: The experiments conducted on real-world classification problems demonstrate that the voting-ELM classifiers presented in this paper can achieve better performance than ELM algorithms with respect to precision, recall and F-measure.

Journal ArticleDOI
TL;DR: An adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types is described.
Abstract: Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.
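
A rough sketch of the HDF5-plus-XML pairing the abstract describes, using the third-party h5py library. This is not the actual SDCube schema; the file names, dataset path and XML element names are invented for illustration.

    # Rough sketch: numeric data in HDF5, experimental metadata in companion XML.
    # NOT the SDCube schema; all names below are invented.
    import numpy as np
    import h5py
    import xml.etree.ElementTree as ET

    with h5py.File("experiment.h5", "w") as f:
        f.create_dataset("plate1/well_A01/intensities",
                         data=np.random.rand(1000, 4))   # e.g. per-cell intensities

    meta = ET.Element("dataset", {"hdf5": "experiment.h5"})
    dim = ET.SubElement(meta, "dimension", {"name": "channel", "size": "4"})
    ET.SubElement(dim, "label").text = "pERK"
    ET.ElementTree(meta).write("experiment.xml",
                               xml_declaration=True, encoding="utf-8")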

Patent
23 Mar 2011
TL;DR: In this article, a web crawler system has an automatic website crawler and a virtual browser that provides script related information to the crawler, and the virtual browser transforms an HTML document included in a web page of the website into an XML document and builds a document object model containing document objects in a tree structure based on the XML document.
Abstract: A web crawler system has an automatic website crawler and a virtual browser that provides script related information to the website crawler. The virtual browser transforms an HTML document included in a web page of the website into an XML document, and builds a document object model containing document objects in a tree structure based on the XML document. The virtual browser extracts from the DOM scripts that are potentially executable, and executes the extracted scripts using a browser object model provided for the virtual browser containing objects and methods and properties that are used for script execution so as to capture script related information generated by execution of the scripts.

Journal ArticleDOI
TL;DR: A theoretical framework about “matching cross” is established, which demonstrates the intrinsic reason behind the proofs of optimality of holistic algorithms, and a set of novel algorithms is proposed to efficiently process three categories of extended XML tree patterns.
Abstract: As business and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions such as negation function, order-based axis, and wildcards. In this paper, we research a large set of XML tree pattern, called extended XML tree pattern, which may include P-C, A-D relationships, negation functions, wildcards, and order restriction. We establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms.
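
To make the pattern classes concrete, the sketch below evaluates parent-child (/), ancestor-descendant (//) and wildcard (*) steps with the limited XPath support in Python's ElementTree. It only illustrates the query axes discussed above; it is not the holistic twig-matching algorithms proposed in the paper.

    # Examples of the axes covered by XML tree patterns, via ElementTree XPath.
    import xml.etree.ElementTree as ET

    doc = ET.fromstring(
        "<bib><book><title>XML</title><author><name>Ann</name></author></book>"
        "<article><title>Twig joins</title></article></bib>"
    )

    print([t.text for t in doc.findall("./book/title")])    # parent-child (P-C)
    print([t.text for t in doc.findall(".//title")])        # ancestor-descendant (A-D)
    print([t.text for t in doc.findall("./*/title")])       # wildcard step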

Book ChapterDOI
14 Jul 2011
TL;DR: The BINCOA framework is presented, whose goal is to ease the development of binary code analysers by providing an open formal model for low-level programs, an XML format for easy exchange of models and some basic tool support.
Abstract: This paper presents the BINCOA framework, whose goal is to ease the development of binary code analysers by providing an open formal model for low-level programs (typically: executable files), an XML format for easy exchange of models and some basic tool support. The BINCOA framework already comes with three different analysers, including simulation, test generation and Control-Flow Graph reconstruction.

Book ChapterDOI
TL;DR: The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software and was extended to include specific parameters of imaging mass spectrometry.
Abstract: Imaging mass spectrometry is the method of scanning a sample of interest and generating an "image" of the intensity distribution of a specific analyte. The data sets consist of a large number of mass spectra which are usually acquired with identical settings. Existing data formats are not sufficient to describe an MS imaging experiment completely. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. For this purpose, the MS imaging data is divided into two separate files. The mass spectral data is stored in a binary file to ensure efficient storage. All metadata (e.g., instrumental parameters, sample details) are stored in an XML file which is based on the standard data format mzML developed by HUPO-PSI. The original mzML controlled vocabulary was extended to include specific parameters of imaging mass spectrometry (such as x/y position and spatial resolution). The two files (XML and binary) are connected by offset values in the XML file and are unambiguously linked by a universally unique identifier. The resulting datasets are comparable in size to the raw data and the separate metadata file allows flexible handling of large datasets. Several imaging MS software tools already support imzML. This allows choosing from a (growing) number of processing tools. One is no longer limited to proprietary software, but is able to use the processing software which is best suited for a specific question or application. On the other hand, measurements from different instruments can be compared within one software application using identical settings for data processing. All necessary information for evaluating and implementing imzML can be found at http://www.imzML.org.

Journal ArticleDOI
TL;DR: Novel algorithms for extracting templates from a large number of web documents which are generated from heterogeneous templates are presented and a novel goodness measure with its fast approximation for clustering and comprehensive analysis of the algorithm are provided.
Abstract: The World Wide Web is the most useful source of information. In order to achieve high productivity of publishing, the webpages in many websites are automatically populated by using common templates with contents. The templates provide readers easy access to the contents guided by consistent structures. However, for machines, the templates are considered harmful since they degrade the accuracy and performance of web applications due to irrelevant terms in templates. Thus, template detection techniques have received a lot of attention recently to improve the performance of search engines, clustering, and classification of web documents. In this paper, we present novel algorithms for extracting templates from a large number of web documents which are generated from heterogeneous templates. We cluster the web documents based on the similarity of underlying template structures in the documents so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure with its fast approximation for clustering and provide comprehensive analysis of our algorithm. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to the state of the art for template detection algorithms.

Journal ArticleDOI
TL;DR: Two XML-based formats, SeqXML and OrthoXML, are designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information.
Abstract: There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
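
Since the abstract mentions read/write support in Biopython, here is a short sketch of round-tripping a SeqXML file with Bio.SeqIO, assuming a Biopython release that includes the "seqxml" format; the file names are placeholders.

    # Sketch: read and re-write SeqXML records with Biopython's Bio.SeqIO.
    # Assumes a Biopython version with the "seqxml" format; file names are placeholders.
    from Bio import SeqIO

    records = list(SeqIO.parse("proteome.seqxml", "seqxml"))
    print(len(records), records[0].id, records[0].description)

    SeqIO.write(records, "copy.seqxml", "seqxml")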

Proceedings ArticleDOI
11 Apr 2011
TL;DR: This paper defines the new problem of top-k keyword search over probabilistic XML data, which is to retrieve the k SLCA results with the k highest probabilities of existence, and proposes two efficient algorithms for solving it.
Abstract: Despite the proliferation of work on XML keyword query, it remains open to support keyword query over probabilistic XML data. Compared with traditional keyword search, it is far more expensive to answer a keyword query over probabilistic XML data due to the consideration of possible world semantics. In this paper, we first define the new problem of top-k keyword search over probabilistic XML data, which is to retrieve the k SLCA results with the k highest probabilities of existence. We then propose two efficient algorithms. The first algorithm, PrStack, can find the k SLCA results with the k highest probabilities by scanning the relevant keyword nodes only once. To further improve the efficiency, we propose a second algorithm, EagerTopK, based on a set of pruning properties which can quickly prune unsatisfied SLCA candidates. Finally, we implement the two algorithms and compare their performance with analysis of extensive experimental results.

Journal ArticleDOI
01 Jul 2011
TL;DR: The ISO/IEEE 11073 DIM is used to derive an HL7 v3 Refined Message Information Model (RMIM) of the medical device domain from the HL7 v3 Reference Information Model (RIM), which makes it possible to trace the medical device data back to a standard common denominator, that is, the HL7 v3 RIM from which all the other medical domains under HL7 v3 are derived.
Abstract: Medical devices are essential to the practice of modern healthcare services. Their benefits will increase if clinical software applications can seamlessly acquire the medical device data. The need to represent medical device observations in a format that can be consumed by clinical applications has already been recognized by the industry. Yet, the solutions proposed involve bilateral mappings from the ISO/IEEE 11073 Domain Information Model (DIM) to specific message or document standards. Considering that there are many different types of clinical applications such as the electronic health record and the personal health record systems, the clinical workflows, and the clinical decision support systems each conforming to different standard interfaces, detailing a mapping mechanism for every one of them introduces significant work and, thus, limits the potential health benefits of medical devices. In this paper, to facilitate the interoperability of clinical applications and the medical device data, we use the ISO/IEEE 11073 DIM to derive an HL7 v3 Refined Message Information Model (RMIM) of the medical device domain from the HL7 v3 Reference Information Model (RIM). This makes it possible to trace the medical device data back to a standard common denominator, that is, the HL7 v3 RIM from which all the other medical domains under HL7 v3 are derived. Hence, once the medical device data are obtained in the RMIM format, they can easily be transformed into HL7-based standard interfaces through XML transformations because these interfaces all have their building blocks from the same RIM. To demonstrate this, we provide the mappings from the developed RMIM to some of the widely used HL7 v3-based standard interfaces.
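
A hedged sketch of the kind of XML transformation the last step relies on, using lxml's XSLT support: the element names in both the device message and the output are invented for illustration, and real ISO/IEEE 11073 and HL7 v3 vocabularies are far richer.

    # Hedged sketch: map a device observation to an HL7-style element via XSLT.
    # All element names are invented; only the mechanism (XSLT) is the point.
    from lxml import etree

    xslt = etree.XSLT(etree.XML("""\
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/observation">
        <observationEvent>
          <value><xsl:value-of select="value"/></value>
          <unit><xsl:value-of select="unit"/></unit>
        </observationEvent>
      </xsl:template>
    </xsl:stylesheet>"""))

    device_msg = etree.XML(
        "<observation><value>72</value><unit>bpm</unit></observation>")
    print(str(xslt(device_msg)))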

Journal ArticleDOI
TL;DR: It is shown that unrestricted databases cannot be watermarked while preserving trivial parametric queries, and query languages and classes of structures that allow guaranteed watermarking capacity are exhibited, namely local query languages on structures with bounded-degree Gaifman graph, and monadic second-order queries on trees or treelike structures.
Abstract: Watermarking allows robust and unobtrusive insertion of information in a digital document. During the last few years, techniques have been proposed for watermarking relational databases or XML documents, where information insertion must preserve a specific measure on data (for example the mean and variance of numerical attributes). In this article we investigate the problem of watermarking databases or XML while preserving a set of parametric queries in a specified language, up to an acceptable distortion. We first show that unrestricted databases cannot be watermarked while preserving trivial parametric queries. We then exhibit query languages and classes of structures that allow guaranteed watermarking capacity, namely 1) local query languages on structures with bounded-degree Gaifman graph, and 2) monadic second-order queries on trees or treelike structures. We relate these results to an important topic in computational learning theory, the VC-dimension. We finally consider incremental aspects of query-preserving watermarking.