
Showing papers on "XML" published in 2011


01 Mar 2011
TL;DR: This document defines XMPP's core protocol methods: setup and teardown of XML streams, channel encryption, authentication, error handling, and communication primitives for messaging, network availability ("presence"), and request-response interactions.
Abstract: The Extensible Messaging and Presence Protocol (XMPP) is an application profile of the Extensible Markup Language (XML) that enables the near-real-time exchange of structured yet extensible data between any two or more network entities. This document defines XMPP's core protocol methods: setup and teardown of XML streams, channel encryption, authentication, error handling, and communication primitives for messaging, network availability ("presence"), and request-response interactions. This document obsoletes RFC 3920. [STANDARDS-TRACK]
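
As a concrete illustration of the stanzas exchanged over such XML streams, here is a minimal sketch that builds an XMPP message stanza with Python's standard library. The to/from/type attributes and body child follow RFC 6120/6121; the addresses are placeholders, not taken from the document above, and stream setup, TLS and SASL are omitted.

    # Minimal sketch: build an XMPP <message/> stanza (RFC 6120/6121 shape).
    # The JIDs are placeholders; stream setup, channel encryption and
    # authentication are omitted.
    import xml.etree.ElementTree as ET

    msg = ET.Element("message", {
        "to": "juliet@example.com",
        "from": "romeo@example.net",
        "type": "chat",
    })
    ET.SubElement(msg, "body").text = "Wherefore art thou?"

    # Serialized, this is one unit of the XML stream described above.
    print(ET.tostring(msg, encoding="unicode"))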

795 citations


01 Jun 2011
TL;DR: The Network Configuration Protocol (NETCONF) defined in this document provides mechanisms to install, manipulate, and delete the configuration of network devices through an Extensible Markup Language (XML)-based data encoding.
Abstract: The Network Configuration Protocol (NETCONF) defined in this document provides mechanisms to install, manipulate, and delete the configuration of network devices. It uses an Extensible Markup Language (XML)-based data encoding for the configuration data as well as the protocol messages. The NETCONF protocol operations are realized as remote procedure calls (RPCs). This document obsoletes RFC 4741. [STANDARDS-TRACK]
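
To make the XML-based RPC encoding concrete, here is a minimal sketch that builds a NETCONF get-config request in the base:1.0 namespace defined by RFC 6241. Transport (normally SSH) and message framing are omitted, and the message-id value is arbitrary.

    # Minimal sketch: a NETCONF <get-config> RPC (RFC 6241 base:1.0 namespace).
    import xml.etree.ElementTree as ET

    NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
    ET.register_namespace("", NS)

    rpc = ET.Element(f"{{{NS}}}rpc", {"message-id": "101"})
    get_config = ET.SubElement(rpc, f"{{{NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NS}}}source")
    ET.SubElement(source, f"{{{NS}}}running")  # ask for the running configuration

    print(ET.tostring(rpc, encoding="unicode"))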

486 citations


Patent
27 Jun 2011
TL;DR: In this article, a system and method for publishing logging and status information from the servers is provided, where a list of available servers accessible for monitoring by persons, devices, and applications via a remote monitor device can be provided.
Abstract: To assist in monitoring the intelligent messaging network, a system and method for publishing logging and status information from the servers is provided. A list of available servers accessible for monitoring by persons, devices, and applications via a remote monitor device can be provided. The remote monitor device may forward selected servers from the list of available servers in which they are interested. Also, particular information about the selected servers can be requested. Access to certain servers and information may be restricted to those with authorization. Authorization can be verified by the use of digital certificates. The requested information can then be gathered and provided to authorized persons or devices. Typically, the information includes logging and status information from the servers. The information can be provided as an XML page and viewed using, for example, a standard web browser. Further, if the information is provided to the remote monitor device as an XML page, a standard XML parser may be used to extract particular information from the XML page.

252 citations


Book
22 Aug 2011
TL;DR: In this paper, the authors present a generalization of Courteous Logic Programs (CLP) to include prioritized conflict handling, thus enabling modularity in specifying and revising rule-sets.
Abstract: We address why, and especially how, to represent business rules in e-commerce contracts. By contracts, we mean descriptions of goods and services offered or sought, including ancillary agreements detailing terms of a deal. We observe that rules are useful in contracts to represent conditional relationships, e.g., in terms & conditions, service provisions, and surrounding business processes, and we illustrate this point with several examples. We analyze requirements (desiderata) for representing such rules in contracts. The requirements include: declarative semantics so as to enable shared understanding and interoperability; prioritized conflict handling so as to enable modular updating/revision; ease of parsing; integration into WWW-world software engineering; direct executability; and computational tractability. We give a representational approach that consists of two novel aspects. First, we give a new fundamental knowledge representation formalism: a generalized version of Courteous Logic Programs (CLP), which expressively extends declarative ordinary logic programs (OLP) to include prioritized conflict handling, thus enabling modularity in specifying and revising rule-sets. Our approach to implementing CLP is a courteous compiler that transforms any CLP into a semantically equivalent OLP with moderate, tractable computational overhead. Second, we give a new XML encoding of CLP, called Business Rules Markup Language (BRML), suitable for interchange between heterogeneous commercial rule languages. BRML can also express a broad subset of ANSI-draft Knowledge Interchange Format (KIF) which overlaps with CLP. Our new approach, unlike previous approaches, provides not only declarative semantics but also prioritized conflict handling, ease of parsing, and integration into WWW-world software engineering. We argue that this new approach meets the overall requirements to a greater extent than any of the previous approaches, including KIF, the leading previous declarative approach. We have implemented both aspects of our approach; a free alpha prototype called CommonRules was released on the Web in July of 1999, at http://alphaworks.ibm.com. An extended version of this paper will be available as a forthcoming IBM Research Report (at http://www.research.ibm.com).

160 citations


Proceedings ArticleDOI
Aldo Castellani, Mattia Gheda, Nicola Bui, Michele Rossi, Michele Zorzi
05 Jun 2011
TL;DR: This paper presents the TinyOS design and implementation of two components that will play a fundamental role in the communication stack of REST-based devices, and focuses on the Constrained Application Protocol (CoAP), which allows REST-based communications among applications residing in distributed and networked embedded systems.
Abstract: According to the Internet of Things (IoT) vision, everyday objects such as domestic appliances, actuators and embedded systems of any kind in the near future will be connected with each other and with the Internet. These will form a distributed network with sensing capabilities that will allow unprecedented market opportunities, spurring new services, including energy monitoring and control of homes, buildings, industrial processes and so forth. In this paper, we concentrate on the actual implementation of the communication technology, adopting the Representational State Transfer (REST) approach. REST relies only on HTTP methods such as GET and POST. Embedded communication devices are addressed using Universal Resource Indicators (URI) and data is exchanged through standard XML. We present our TinyOS design and implementation of two components that will play a fundamental role in the communication stack of REST-based devices. First, we focus on the Constrained Application Protocol (CoAP), which allows REST-based communications among applications residing in distributed and networked embedded systems. Second, we present our lightweight implementation of an EXI library: an efficient binary compressor for XML data files. Experimental results are provided in terms of compression, decoding (EXI) and access time (CoAP) performance.
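
The paper's CoAP and EXI components target TinyOS; purely as an illustration of the REST-style request/response pattern CoAP enables, here is a desktop-side sketch using the third-party Python aiocoap library (an assumption, not something the paper uses). The device address and resource path are placeholders.

    # Illustration only: a CoAP GET issued from a client, via the aiocoap library.
    # The IPv6 address and resource path are placeholders.
    import asyncio
    from aiocoap import Context, Message, GET

    async def main():
        ctx = await Context.create_client_context()
        request = Message(code=GET, uri="coap://[2001:db8::1]/sensors/temperature")
        response = await ctx.request(request).response
        print(response.code, response.payload)

    asyncio.run(main())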

150 citations


Patent
06 Dec 2011
TL;DR: In this article, a prevention-based network auditing system includes a plurality of heterogeneous information sources gathering information about the network, and converts the information gathered by the information sources into a normalized data format such as, for example, into XML (Extensible Markup Language).
Abstract: A prevention-based network auditing system includes a plurality of heterogeneous information sources gathering information about the network. An audit server invokes the heterogeneous information sources via a uniform communications interface to gather information about the network, and converts the information gathered by the information sources into a normalized data format such as, for example, into XML (Extensible Markup Language). The converted information is then stored in an audit repository for security and regulatory policy assessment, network vulnerability analysis, report generation, and security improvement recommendations.

134 citations


01 Jan 2011
TL;DR: Ontobee as mentioned in this paper is a web system aimed to serve as a linked data server and browser specifically targeted for ontology terms, which can be used to promote ontology sharing, interoperability, data integration, and Semantic Web applications.
Abstract: The Linking Open Data (LOD) community has been working to extend the Web with a data commons by publishing various open datasets as RDF on the Web. To support this effort, we developed Ontobee (http://www.ontobee.org/), a web system aimed to serve as a linked data server and browser specifically targeted for ontology terms. Ontobee combines two basic features for one specific ontology term: (1) a user-friendly HTML web interface displaying the details and hierarchy of the ontology term; and (2) an RDF/XML source for the ontology term corresponding to the HTML web page, which can be accessed by Semantic Web applications. Ontobee provides an efficient and publicly available method to promote ontology sharing, interoperability, data integration, and Semantic Web applications.
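
A hedged sketch of how a Semantic Web client might retrieve the RDF/XML rendering of a term served this way, using the Python requests library: the term IRI below is just an example OBO PURL, and the assumption that the server honours an Accept header for content negotiation is illustrative, not a statement about Ontobee's actual configuration.

    # Hedged sketch: fetch an ontology term as RDF/XML via HTTP content negotiation.
    # The term IRI and the content-negotiation behaviour are assumptions.
    import requests

    term_iri = "http://purl.obolibrary.org/obo/OBI_0000070"  # placeholder term
    resp = requests.get(term_iri,
                        headers={"Accept": "application/rdf+xml"},
                        allow_redirects=True, timeout=30)
    resp.raise_for_status()
    print(resp.headers.get("Content-Type"))
    print(resp.text[:500])  # start of the RDF/XML description of the term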

134 citations


Proceedings ArticleDOI
04 Jul 2011
TL;DR: This work empirically studies whether VTracker, an algorithm for XML differencing, can precisely recognize changes in WSDL documents by applying it to the task of comparing 18 versions of the Amazon EC2 web service, and analyzes the changes that occurred between subsequent versions of various web services.
Abstract: The service-oriented architecture paradigm prescribes the development of systems through the composition of services, i.e., network-accessible components, specified by (and invoked through) their WSDL interface descriptions. Systems thus developed need to be aware of changes in, and evolve with, their constituent services. Therefore, accurate recognition of changes in the WSDL specification of a service is an essential functionality in the context of the software lifecycle of service-oriented systems. In this work, we present the results of an empirical study on WSDL evolution analysis. In the first part, we empirically study whether VTracker, our algorithm for XML differencing, can precisely recognize changes in WSDL documents by applying it to the task of comparing 18 versions of the Amazon EC2 web service. Second, we analyze the changes that occurred between the subsequent versions of various web-services and discuss their potential effects on the maintainability of service systems relying on them.

106 citations


Patent
15 Nov 2011
TL;DR: In this paper, the use of unstructured and untagged text message protocols to form a text message body that can be used to transmit and receive semi-structured, or structured text message bodies, which optionally may also use various, widely used Markup Languages.
Abstract: The present invention relates to the use of unstructured and untagged text message protocols to form a text message body that can be used to transmit and receive semi-structured, or structured text message bodies, which optionally may also use various, widely used Markup Languages. The semi-structure, or structure used within the text message body can be a format, such as, but not limited to, partitioning and/or comma delimited values, etc. The tagging for use with the text message body can be a protocol, such as, but not limited to, Extensible Markup Language (XML).

104 citations


Patent
11 Jul 2011
TL;DR: In this paper, the authors present a P2P application programming interface (API) that allows an application to create, import, export, manage, enumerate, and delete group identity information.
Abstract: Peer-to-peer (P2P) application programming interfaces (APIs) that allow an application to create, import, export, manage, enumerate, and delete P2P identities are presented. Further, the management of group identity information is provided. APIs abstract away from low level credential and cryptographic functions required to create and manage P2P identities. This management includes retrieval and setting of a friendly name, generation of a cryptographic public/private key pair, retrieval of security information in the form of an XML fragment, and creation of a new name based on an existing identity.

95 citations


Proceedings ArticleDOI
18 Apr 2011
TL;DR: This paper describes how to add the translation method into the system structure of web applications and shows how this method can improve the system structure and performance of those applications.
Abstract: This paper analyzes the form of two data serializing approaches used in web applications, XML and JSON. Although both are widely used, highly efficient data transmission between these two formats is still a problem in application development. The features of these two data objects were analyzed, and it was pointed out how to translate correctly between them. A recursive algorithm for translating between these two types of data serializing forms was given, based on the multi-tree data structure of XML and JSON objects. The efficiency of this algorithm was demonstrated by translation experiments. For use in web applications, this paper describes how to add the translation method into the system structure of an application and shows how this method can improve the system structure and the performance of web applications.
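
To show the kind of recursion the abstract refers to, here is a minimal sketch of the XML-to-JSON direction: walk the element tree and build nested dictionaries, turning repeated tags into lists. This is only an illustration under simplifying assumptions (attributes and mixed content are ignored), not the paper's algorithm.

    # Minimal sketch of XML -> JSON: recurse over the element tree.
    import json
    import xml.etree.ElementTree as ET

    def element_to_obj(elem):
        children = list(elem)
        if not children:                      # leaf element: keep its text
            return elem.text
        obj = {}
        for child in children:                # repeated tags become JSON arrays
            value = element_to_obj(child)
            if child.tag in obj:
                if not isinstance(obj[child.tag], list):
                    obj[child.tag] = [obj[child.tag]]
                obj[child.tag].append(value)
            else:
                obj[child.tag] = value
        return obj

    xml_doc = "<order><id>42</id><item>pen</item><item>ink</item></order>"
    root = ET.fromstring(xml_doc)
    print(json.dumps({root.tag: element_to_obj(root)}, indent=2))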

Journal ArticleDOI
31 Jul 2011
TL;DR: This article is a description of how the Julia system was extended, based on abstract interpretation, to run formally correct analyses of Android programs, finding bugs and flaws both in the Google samples and in the open-source programs.
Abstract: Context: Android is a programming language based on Java and an operating system for embedded and mobile devices, whose upper layers are written in the Android language itself. As a language, it features an extended event-based library and dynamic inflation of graphical views from declarative XML layout files. A static analyzer for Android programs must consider such features, for correctness and precision. Objective: Our goal is to extend the Julia static analyzer, based on abstract interpretation, to perform formally correct analyses of Android programs. This article is an in-depth description of such an extension, of the difficulties that we faced and of the results that we obtained. Method: We have extended the class analysis of the Julia analyzer, which lies at the heart of many other analyses, by considering some key Android-specific features such as the potential existence of many entry points to a program and the inflation of graphical views from XML through reflection. We also have significantly improved the precision of the nullness analysis on Android programs. Results: We have analyzed with Julia most of the Android sample applications by Google and a few larger open-source programs. We have applied tens of static analyses, including classcast, dead code, nullness and termination analysis. Julia has found, automatically, bugs, flaws and inefficiencies both in the Google samples and in the open-source applications. Conclusion: Julia is the first sound static analyzer for Android programs, based on a formal basis such as abstract interpretation. Our results show that it can analyze real third-party Android applications, without any user annotation of the code, yielding formally correct results in at most 7 min and on standard hardware. Hence it is ready for a first industrial use.

Proceedings ArticleDOI
25 Sep 2011
TL;DR: The srcML toolkit for lightweight transformation and fact-extraction of source code is described and application use-cases are shown and demonstrated to be practical and scalable.
Abstract: The srcML toolkit for lightweight transformation and fact-extraction of source code is described. srcML is an XML format for C/C++/Java source code. The open source toolkit that includes the source-to-srcML and srcML-to-source translators for round-trip reverse engineering is freely available. The direct use of XPath and XSLT is supported, an archive format for large projects is included, and a rich set of input and output formats through a command-line interface is available. Applying transformations and formulating queries using srcML is very convenient. Application use-cases of transformations and fact-extraction are shown and demonstrated to be practical and scalable.
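
Because srcML is plain XML, fact extraction reduces to XPath over the marked-up source. A hedged sketch with the third-party lxml library: the namespace URI and element names below are assumptions about the srcML markup (a name child inside each function unit), not taken from the paper.

    # Hedged sketch: list function names from a srcML document with XPath.
    # The namespace URI and element names are assumptions about srcML markup.
    from lxml import etree

    ns = {"src": "http://www.srcML.org/srcML/src"}   # assumed srcML namespace
    doc = etree.parse("example.cpp.xml")              # output of the srcml tool

    for fn_name in doc.xpath("//src:function/src:name", namespaces=ns):
        print(fn_name.text)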

Patent
18 Apr 2011
TL;DR: A portable computer-readable media device and method of use enable automatic configuration of a computing device, such as a conventional network device or a thin client device, for operation in a network as discussed by the authors.
Abstract: A portable computer-readable media device and method of use enable automatic configuration of a computing device, such as a conventional network device or a thin client device, for operation in a network. Configuration information, including network settings and security information, is incorporated in an XML file written to the portable media device while it is installed in a first device. This configuration is then automatically transferred to a second device by installing the portable media device in the second device. The second device then writes device information, incorporated in an XML file, to the portable media device, to be uploaded to the first device.
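
A purely hypothetical sketch of the kind of XML configuration file the patent describes being written to the portable device; every element name, value and path here is invented for illustration.

    # Hypothetical sketch: write network settings and security information to an
    # XML file on removable media. All names, values and the drive path are invented.
    import xml.etree.ElementTree as ET

    cfg = ET.Element("deviceConfiguration")
    net = ET.SubElement(cfg, "networkSettings")
    ET.SubElement(net, "ssid").text = "OfficeNet"
    ET.SubElement(net, "dhcp").text = "true"
    sec = ET.SubElement(cfg, "securityInformation")
    ET.SubElement(sec, "wpaKey").text = "REPLACE-ME"

    ET.ElementTree(cfg).write("E:/config.xml",
                              xml_declaration=True, encoding="utf-8")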

Book ChapterDOI
31 Jul 2011
TL;DR: An account of the whole Matita system, its peculiarities and its main applications; Matita is a fully fledged ITP specifically designed as a lightweight but competitive system, particularly suited for the assessment of innovative ideas.
Abstract: Matita is an interactive theorem prover being developed by the Helm team at the University of Bologna. Its stable version 0.5.x may be downloaded at http://matita.cs.unibo.it. The tool originated in the European project MoWGLI as a set of XML-based tools aimed to provide a mathematician-friendly web-interface to repositories of formal mathematical knowledge, supporting advanced content-based functionalities for querying, searching and browsing the library. It has since then evolved into a fully fledged ITP, specifically designed as a light-weight, but competitive system, particularly suited for the assessment of innovative ideas, both at foundational and logical level. In this paper, we give an account of the whole system, its peculiarities and its main applications.

Book ChapterDOI
03 Nov 2011
TL;DR: This paper provides an extension of RuleML called LegalRuleML for fostering the characteristics of legal knowledge and permitting its full usage in legal reasoning and in the business rule domain.
Abstract: Legal texts are the foundational resource in which to discover rules and norms that feed into different concrete (often XML-based) Web applications. Legislative documents provide general norms and specific procedural rules for eGovernment and eCommerce environments, while contracts specify the conditions of services and business rules (e.g. service level agreements for cloud computing), and judgments provide information about the legal argumentation and interpretation of norms in concrete case law. Such legal knowledge is an important source that should be detected, properly modeled and expressively represented in order to capture all the domain particularities. This paper provides an extension of RuleML called LegalRuleML for fostering the characteristics of legal knowledge and permitting its full usage in legal reasoning and in the business rule domain. LegalRuleML encourages the effective exchange and sharing of such semantic information between legal documents, business rules, and software applications.

Book ChapterDOI
29 May 2011
TL;DR: This work proposes Linked Data Services (LIDS), a general, formalised approach for integrating data-providing services with Linked Data, a popular mechanism for data publishing which facilitates data integration and allows for decentralised publishing.
Abstract: A sizable amount of data on the Web is currently available via Web APIs that expose data in formats such as JSON or XML. Combining data from different APIs and data sources requires glue code which is typically not shared and hence not reused. We propose Linked Data Services (LIDS), a general, formalised approach for integrating data-providing services with Linked Data, a popular mechanism for data publishing which facilitates data integration and allows for decentralised publishing. We present conventions for service access interfaces that conform to Linked Data principles, and an abstract lightweight service description formalism. We develop algorithms that use LIDS descriptions to automatically create links between services and existing data sets. To evaluate our approach, we realise LIDS wrappers and LIDS descriptions for existing services and measure performance and effectiveness of an automatic interlinking algorithm over multiple billions of triples.

Journal ArticleDOI
01 May 2011
TL;DR: A synthesis tool is presented that generates C code and an associated SystemC model from a CAL dataflow program; the generated code is validated against the original CAL description simulated using the Open Dataflow environment.
Abstract: The MPEG Reconfigurable Video Coding (RVC) framework is a new standard under development by MPEG that aims at providing a unified high-level specification of current and future MPEG video coding technologies using dataflow models. In this framework, a decoder is built as a configuration of video coding modules taken from the standard MPEG toolbox library or proprietary libraries. The elements of the library are specified by a textual description that expresses the I/O behavior of each module and by a reference software written using a subset of the CAL Actor Language named RVC-CAL. A decoder configuration is written in an XML dialect by connecting a set of CAL modules. Code generators are fundamental supports that enable the direct transformation of a high level specification to efficient hardware and software implementations. This paper presents a synthesis tool that from a CAL dataflow program generates C code and an associated SystemC model. The generated code is validated against the original CAL description simulated using the Open Dataflow environment. Experimental results of the translation of two descriptions of an MPEG-4 Simple Profile decoder with different granularities are shown and discussed.

Book
11 Jan 2011
TL;DR: Written by the developers of Camel, this book distills their experience and practical insights so that readers can tackle integration tasks like a pro, and shows how to work with the integration patterns.
Abstract: Apache Camel is a Java framework that lets you implement the standard enterprise integration patterns in a few lines of code. With a concise but sophisticated DSL you snap integration logic into your app, Lego-style, using Java, XML, or Scala. Camel supports over 80 common transports such as HTTP, REST, JMS, and Web Services. Camel in Action is a Camel tutorial full of small examples showing how to work with the integration patterns. It starts with core concepts like sending, receiving, routing, and transforming data. It then shows you the entire lifecycle and goes in depth on how to test, deal with errors, scale, deploy, and even monitor your app - details you can find only in the Camel code itself. Written by the developers of Camel, this book distills their experience and practical insights so that you can tackle integration tasks like a pro. What's inside: valuable examples in Java and XML; explanations of complex patterns; error handling, testing, deploying, managing, and running Camel; accessible to beginners, useful to experts.

Journal ArticleDOI
TL;DR: The experiments conducted on real-world classification problems demonstrate that the voting-ELM classifiers presented in this paper can achieve better performance than ELM algorithms with respect to precision, recall and F-measure.

Journal ArticleDOI
TL;DR: An adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types is described.
Abstract: Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.
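
A rough sketch of the HDF5-plus-XML pairing the abstract describes, using the third-party h5py library. This is not the actual SDCube schema; the file names, dataset path and XML element names are invented for illustration.

    # Rough sketch: numeric data in HDF5, experimental metadata in companion XML.
    # NOT the SDCube schema; all names below are invented.
    import numpy as np
    import h5py
    import xml.etree.ElementTree as ET

    with h5py.File("experiment.h5", "w") as f:
        f.create_dataset("plate1/well_A01/intensities",
                         data=np.random.rand(1000, 4))   # e.g. per-cell intensities

    meta = ET.Element("dataset", {"hdf5": "experiment.h5"})
    dim = ET.SubElement(meta, "dimension", {"name": "channel", "size": "4"})
    ET.SubElement(dim, "label").text = "pERK"
    ET.ElementTree(meta).write("experiment.xml",
                               xml_declaration=True, encoding="utf-8")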

Patent
23 Mar 2011
TL;DR: In this article, a web crawler system has an automatic website crawler and a virtual browser that provides script related information to the crawler, and the virtual browser transforms an HTML document included in a web page of the website into an XML document and builds a document object model containing document objects in a tree structure based on the XML document.
Abstract: A web crawler system has an automatic website crawler and a virtual browser that provides script related information to the website crawler. The virtual browser transforms an HTML document included in a web page of the website into an XML document, and builds a document object model containing document objects in a tree structure based on the XML document. The virtual browser extracts from the DOM scripts that are potentially executable, and executes the extracted scripts using a browser object model provided for the virtual browser containing objects and methods and properties that are used for script execution so as to capture script related information generated by execution of the scripts.

Journal ArticleDOI
TL;DR: A theoretical framework about “matching cross” is established, which demonstrates the intrinsic reason behind the proofs of optimality of holistic algorithms, and a set of novel algorithms is proposed to efficiently process three categories of extended XML tree patterns.
Abstract: As business and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions such as negation function, order-based axis, and wildcards. In this paper, we research a large set of XML tree pattern, called extended XML tree pattern, which may include P-C, A-D relationships, negation functions, wildcards, and order restriction. We establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms.
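
To make the pattern classes concrete, the sketch below evaluates parent-child (/), ancestor-descendant (//) and wildcard (*) steps with the limited XPath support in Python's ElementTree. It only illustrates the query axes discussed above; it is not the holistic twig-matching algorithms proposed in the paper.

    # Examples of the axes covered by XML tree patterns, via ElementTree XPath.
    import xml.etree.ElementTree as ET

    doc = ET.fromstring(
        "<bib><book><title>XML</title><author><name>Ann</name></author></book>"
        "<article><title>Twig joins</title></article></bib>"
    )

    print([t.text for t in doc.findall("./book/title")])    # parent-child (P-C)
    print([t.text for t in doc.findall(".//title")])        # ancestor-descendant (A-D)
    print([t.text for t in doc.findall("./*/title")])       # wildcard step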

Book ChapterDOI
14 Jul 2011
TL;DR: The BINCOA framework is presented, whose goal is to ease the development of binary code analysers by providing an open formal model for low-level programs, an XML format for easy exchange of models and some basic tool support.
Abstract: This paper presents the BINCOA framework, whose goal is to ease the development of binary code analysers by providing an open formal model for low-level programs (typically: executable files), an XML format for easy exchange of models and some basic tool support. The BINCOA framework already comes with three different analysers, including simulation, test generation and Control-Flow Graph reconstruction.

Book ChapterDOI
TL;DR: The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software and was extended to include specific parameters of imaging mass spectrometry.
Abstract: Imaging mass spectrometry is the method of scanning a sample of interest and generating an "image" of the intensity distribution of a specific analyte. The data sets consist of a large number of mass spectra which are usually acquired with identical settings. Existing data formats are not sufficient to describe an MS imaging experiment completely. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. For this purpose, the MS imaging data is divided into two separate files. The mass spectral data is stored in a binary file to ensure efficient storage. All metadata (e.g., instrumental parameters, sample details) are stored in an XML file which is based on the standard data format mzML developed by HUPO-PSI. The original mzML controlled vocabulary was extended to include specific parameters of imaging mass spectrometry (such as x/y position and spatial resolution). The two files (XML and binary) are connected by offset values in the XML file and are unambiguously linked by a universally unique identifier. The resulting datasets are comparable in size to the raw data and the separate metadata file allows flexible handling of large datasets. Several imaging MS software tools already support imzML. This allows choosing from a (growing) number of processing tools. One is no longer limited to proprietary software, but is able to use the processing software which is best suited for a specific question or application. On the other hand, measurements from different instruments can be compared within one software application using identical settings for data processing. All necessary information for evaluating and implementing imzML can be found at http://www.imzML.org.

Journal ArticleDOI
TL;DR: Novel algorithms for extracting templates from a large number of web documents which are generated from heterogeneous templates are presented and a novel goodness measure with its fast approximation for clustering and comprehensive analysis of the algorithm are provided.
Abstract: The World Wide Web is the most useful source of information. In order to achieve high productivity of publishing, the webpages in many websites are automatically populated by using common templates with contents. The templates provide readers easy access to the contents guided by consistent structures. However, for machines, the templates are considered harmful since they degrade the accuracy and performance of web applications due to irrelevant terms in templates. Thus, template detection techniques have received a lot of attention recently to improve the performance of search engines, clustering, and classification of web documents. In this paper, we present novel algorithms for extracting templates from a large number of web documents which are generated from heterogeneous templates. We cluster the web documents based on the similarity of underlying template structures in the documents so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure with its fast approximation for clustering and provide comprehensive analysis of our algorithm. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to the state of the art for template detection algorithms.

Journal ArticleDOI
TL;DR: Two XML-based formats, SeqXML and OrthoXML, are designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information.
Abstract: There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
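
Since the abstract mentions read/write support in Biopython, here is a short sketch of round-tripping a SeqXML file with Bio.SeqIO, assuming a Biopython release that includes the "seqxml" format; the file names are placeholders.

    # Sketch: read and re-write SeqXML records with Biopython's Bio.SeqIO.
    # Assumes a Biopython version with the "seqxml" format; file names are placeholders.
    from Bio import SeqIO

    records = list(SeqIO.parse("proteome.seqxml", "seqxml"))
    print(len(records), records[0].id, records[0].description)

    SeqIO.write(records, "copy.seqxml", "seqxml")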

Proceedings ArticleDOI
11 Apr 2011
TL;DR: This paper defines the new problem of top-k keyword search over probabilistic XML data, which is to retrieve the k SLCA results with the k highest probabilities of existence, and proposes two efficient algorithms for solving it.
Abstract: Despite the proliferation of work on XML keyword query, it remains open to support keyword query over probabilistic XML data. Compared with traditional keyword search, it is far more expensive to answer a keyword query over probabilistic XML data due to the consideration of possible world semantics. In this paper, we first define the new problem of top-k keyword search over probabilistic XML data, which is to retrieve the k SLCA results with the k highest probabilities of existence. We then propose two efficient algorithms. The first algorithm, PrStack, can find the k SLCA results with the k highest probabilities by scanning the relevant keyword nodes only once. To further improve the efficiency, we propose a second algorithm, EagerTopK, based on a set of pruning properties which can quickly prune unsatisfied SLCA candidates. Finally, we implement the two algorithms and compare their performance with analysis of extensive experimental results.

Journal ArticleDOI
01 Jul 2011
TL;DR: The ISO/IEEE 11073 DIM is used to derive an HL7 v3 Refined Message Information Model (RMIM) of the medical device domain from the HL7 v3 Reference Information Model (RIM), which makes it possible to trace the medical device data back to a standard common denominator, that is, the HL7 v3 RIM from which all the other medical domains under HL7 v3 are derived.
Abstract: Medical devices are essential to the practice of modern healthcare services. Their benefits will increase if clinical software applications can seamlessly acquire the medical device data. The need to represent medical device observations in a format that can be consumed by clinical applications has already been recognized by the industry. Yet, the solutions proposed involve bilateral mappings from the ISO/IEEE 11073 Domain Information Model (DIM) to specific message or document standards. Considering that there are many different types of clinical applications such as the electronic health record and the personal health record systems, the clinical workflows, and the clinical decision support systems each conforming to different standard interfaces, detailing a mapping mechanism for every one of them introduces significant work and, thus, limits the potential health benefits of medical devices. In this paper, to facilitate the interoperability of clinical applications and the medical device data, we use the ISO/IEEE 11073 DIM to derive an HL7 v3 Refined Message Information Model (RMIM) of the medical device domain from the HL7 v3 Reference Information Model (RIM). This makes it possible to trace the medical device data back to a standard common denominator, that is, the HL7 v3 RIM from which all the other medical domains under HL7 v3 are derived. Hence, once the medical device data are obtained in the RMIM format, they can easily be transformed into HL7-based standard interfaces through XML transformations because these interfaces all have their building blocks from the same RIM. To demonstrate this, we provide the mappings from the developed RMIM to some of the widely used HL7 v3-based standard interfaces.
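
A hedged sketch of the kind of XML transformation the last step relies on, using lxml's XSLT support: the element names in both the device message and the output are invented for illustration, and real ISO/IEEE 11073 and HL7 v3 vocabularies are far richer.

    # Hedged sketch: map a device observation to an HL7-style element via XSLT.
    # All element names are invented; only the mechanism (XSLT) is the point.
    from lxml import etree

    xslt = etree.XSLT(etree.XML("""\
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/observation">
        <observationEvent>
          <value><xsl:value-of select="value"/></value>
          <unit><xsl:value-of select="unit"/></unit>
        </observationEvent>
      </xsl:template>
    </xsl:stylesheet>"""))

    device_msg = etree.XML(
        "<observation><value>72</value><unit>bpm</unit></observation>")
    print(str(xslt(device_msg)))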

Journal ArticleDOI
TL;DR: It is shown that unrestricted databases cannot be watermarked while preserving trivial parametric queries, and query languages and classes of structures that allow guaranteed watermarking capacity are exhibited, namely local query languages on structures with bounded-degree Gaifman graph, and monadic second-order queries on trees or treelike structures.
Abstract: Watermarking allows robust and unobtrusive insertion of information in a digital document. During the last few years, techniques have been proposed for watermarking relational databases or XML documents, where information insertion must preserve a specific measure on data (for example the mean and variance of numerical attributes). In this article we investigate the problem of watermarking databases or XML while preserving a set of parametric queries in a specified language, up to an acceptable distortion. We first show that unrestricted databases cannot be watermarked while preserving trivial parametric queries. We then exhibit query languages and classes of structures that allow guaranteed watermarking capacity, namely 1) local query languages on structures with bounded-degree Gaifman graph, and 2) monadic second-order queries on trees or treelike structures. We relate these results to an important topic in computational learning theory, the VC-dimension. We finally consider incremental aspects of query-preserving watermarking.