Showing papers on "XML published in 2004"

PDF

Open Access

Proceedings Article•DOI•

Jena: implementing the semantic web recommendations

[...]

Jeremy J. Carroll¹, Ian Dickinson¹, Chris Dollin¹, Dave Reynolds¹, Andy Seaborne¹, Kevin Wilkinson¹ - Show less +2 more•Institutions (1)

Hewlett-Packard¹

19 May 2004

TL;DR: The new Semantic Web recommendations for RDF, RDFS and OWL have, at their heart, the RDF graph, and Jena2, a second-generation RDF toolkit, is similarly centered on the R DF graph.

...read moreread less

Abstract: The new Semantic Web recommendations for RDF, RDFS and OWL have, at their heart, the RDF graph. Jena2, a second-generation RDF toolkit, is similarly centered on the RDF graph. RDFS and OWL reasoning are seen as graph-to-graph transforms, producing graphs of virtual triples. Rich APIs are provided. The Model API includes support for other aspects of the RDF recommendations, such as containers and reification. The Ontology API includes support for RDFS and OWL, including advanced OWL Full support. Jena includes the de facto reference RDF/XML parser, and provides RDF/XML output using the full range of the rich RDF/XML grammar. N3 I/O is supported. RDF graphs can be stored in-memory or in databases. Jena's query language, RDQL, and the Web API are both offered for the next round of standardization.

...read moreread less

1,125 citations

Journal Article•DOI•

IntAct: an open source molecular interaction database

[...]

Henning Hermjakob¹, Luisa Montecchi-Palazzi², Chris Lewington, Sugath Mudali, Samuel Kerrien, Sandra Orchard, Martin Vingron³, Bernd Roechert⁴, Peter Roepstorff⁵, Alfonso Valencia⁶, Hanah Margalit⁷, John Armstrong⁸, Amos Marc Bairoch⁴, Gianni Cesareni, David James Sherman⁹, Rolf Apweiler - Show less +12 more•Institutions (9)

European Bioinformatics Institute¹, University of Rome Tor Vergata², Max Planck Society³, Swiss Institute of Bioinformatics⁴, University of Southern Denmark⁵, Spanish National Research Council⁶, Hebrew University of Jerusalem⁷, GlaxoSmithKline⁸, University of Bordeaux⁹

01 Jan 2004-Nucleic Acids Research

TL;DR: IntAct provides an open source database and toolkit for the storage, presentation and analysis of protein interactions, and allows exploring interaction networks in the context of the GO annotations of the interacting proteins.

...read moreread less

Abstract: IntAct provides an open source database and toolkit for the storage, presentation and analysis of protein interactions. The web interface provides both textual and graphical representations of protein interactions, and allows exploring interaction networks in the context of the GO annotations of the interacting proteins. A web service allows direct computational access to retrieve interaction networks in XML format. IntAct currently contains approximately 2200 binary and complex interactions imported from the literature and curated in collaboration with the Swiss-Prot team, making intensive use of controlled vocabularies to ensure data consistency. All IntAct software, data and controlled vocabularies are available at http://www.ebi.ac.uk/intact.

...read moreread less

965 citations

Journal Article•DOI•

A common open representation of mass spectrometry data and its application to proteomics research

[...]

Patrick G. A. Pedrioli¹, Jimmy K. Eng¹, Robert Hubley¹, Mathijs Vogelzang¹, Eric W. Deutsch¹, Brian Raught¹, Brian S. Pratt, Erik Nilsson, Ruth Hogue Angeletti², Rolf Apweiler, Kei Cheung³, Catherine E. Costello⁴, Henning Hermjakob, Sequin Huang⁴, Randall K. Julian, Eugene A. Kapp⁵, Mark E. McComb⁴, Stephen G. Oliver⁶, Gilbert S. Omenn⁷, Norman W. Paton⁶, Richard J. Simpson⁵, Richard D. Smith⁸, Chris F. Taylor, Weimin Zhu, Ruedi Aebersold¹ - Show less +21 more•Institutions (8)

Institute for Systems Biology¹, Yeshiva University², Yale University³, Boston University⁴, Royal Melbourne Hospital⁵, University of Manchester⁶, University of Michigan⁷, Pacific Northwest National Laboratory⁸

01 Nov 2004-Nature Biotechnology

TL;DR: The 'mzXML' format is introduced, an open, generic XML (extensible markup language) representation of MS data that will facilitate data management, interpretation and dissemination in proteomics research.

...read moreread less

Abstract: A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.

...read moreread less

788 citations

Proceedings Article•DOI•

Analysis of interacting BPEL web services

[...]

Xiang Fu¹, Tevfik Bultan¹, Jianwen Su¹•Institutions (1)

University of California, Santa Barbara¹

17 May 2004

TL;DR: It is shown that a large class of composite web services with unbounded input queues can be completely verified using a finite state model checker such as SPIN, and a set of sufficient conditions that guarantee synchronizability and that can be checked statically are given.

...read moreread less

Abstract: This paper presents a set of tools and techniques for analyzing interactions of composite web services which are specified in BPEL and communicate through asynchronous XML messages. We model the interactions of composite web services as conversations, the global sequence of messages exchanged by the web services. As opposed to earlier work, our tool-set handles rich data manipulation via XPath expressions. This allows us to verify designs at a more detailed level and check properties about message content. We present a framework where BPEL specifications of web services are translated to an intermediate representation, followed by the translation of the intermediate representation to a verification language. As an intermediate representation we use guarded automata augmented with unbounded queues for incoming messages, where the guards are expressed as XPath expressions. As the target verification language we use Promela, input language of the model checker SPIN. Since SPIN model checker is a finite-state verification tool we can only achieve partial verification by fixing the sizes of the input queues in the translation. We propose the concept of synchronizability to address this problem. We show that if a composite web service is synchronizable, then its conversation set remains same when asynchronous communication is replaced with synchronous communication. We give a set of sufficient conditions that guarantee synchronizability and that can be checked statically. Based on our synchronizability results, we show that a large class of composite web services with unbounded input queues can be completely verified using a finite state model checker such as SPIN.

...read moreread less

713 citations

Book Chapter•DOI•

Universal Description, Discovery, and Integration

[...]

Anura Gurugé

01 Jan 2004

TL;DR: The goal of a UDDI directory is to ensure that enterprises and individuals can quickly, easily, and dynamically locate and make use of services—particularly Web services—that are of interest to them.

...read moreread less

Abstract: Universal Description, Discovery, and Integration (UDDI) is a standardized process to publish and discover information of Web services (as well as other services) programmatically or via a graphical user interface that would typically be Web based. The aim of UDDI is to provide a standard, uniform service, which is readily accessible by applications via a programmatic interface or by people via a graphical user interface (GUI). A UDDI directory—referred to as a registry—is meant to be platform independent and can be readily accessible via a Web browser-based GUI or by applications via published application programming interfaces (APIs). The goal of a UDDI directory is to ensure that enterprises and individuals can quickly, easily, and dynamically locate and make use of services—particularly Web services—that are of interest to them. As with all things related to or inspired by Web services, the UDDI is highly Extensible Markup Language (XML)-centric. The core information model used by the UDDI—irrespective of the kind of service being described—is based on an XML schema.

...read moreread less

474 citations

Patent•

XML-based template language for devices and services

[...]

William Michael Zintel, Amar S. Gandhi¹, Ye Gu¹, Shyamalan Pather¹, Jeffrey C. Schlimmer¹, Christopher M. Rude¹, Daniel R. Weisman¹, Donald R. Ryan¹, Paul J. Leach¹, Ting Cai¹, Holly Knight¹, Peter S. Ford¹ - Show less +8 more•Institutions (1)

Microsoft¹

04 Nov 2004

TL;DR: In this article, a universal plug and play (UPnP) device makes itself known through a set of processes, such as discovery, description, control, eventing, and presentation.

...read moreread less

Abstract: A universal plug and play (UPnP) device makes itself known through a set of processes—discovery, description, control, eventing, and presentation. Following discovery of a UPnP device, an entity can learn more about the device and its capabilities by retrieving the device's description. The description includes vendor-specific manufacturer information like the model name and number, serial number, manufacturer name, URLs to vendor-specific Web sites, etc. The description also includes a list of any embedded devices or services, as well as URLs for control, eventing, and presentation. The description is written by a vendor, and is usually based on a device template produced by a UPnP forum working committee. The template is derived from a template language that is used to define elements to describe the device and any services supported by the device. The template language is written using an XML-based syntax that organizes and structures the elements.

...read moreread less

463 citations

Proceedings Article•DOI•

ORDPATHs: insert-friendly XML node labels

[...]

Patrick O'Neil¹, Elizabeth O'Neil¹, Shankar Pal², Istvan Cseri², Gideon Schaller², Nigel Westbury² - Show less +2 more•Institutions (2)

University of Massachusetts Boston¹, Microsoft²

13 Jun 2004

TL;DR: A hierarchical labeling scheme called ORDPATH that is implemented in the upcoming version of Microsoft® SQL Server™ and supports insertion of new nodes at arbitrary positions in the XML tree, their ORDPath values "careted in" between OrDPATHs of sibling nodes, without relabeling any old nodes.

...read moreread less

Abstract: We introduce a hierarchical labeling scheme called ORDPATH that is implemented in the upcoming version of Microsoft® SQL Server™. ORDPATH labels nodes of an XML tree without requiring a schema (the most general case---a schema simplifies the problem). An example of an ORDPATH value display format is "1.5.3.9.1". A compressed binary representation of ORDPATH provides document order by simple byte-by-byte comparison and ancestry relationship equally simply. In addition, the ORDPATH scheme supports insertion of new nodes at arbitrary positions in the XML tree, their ORDPATH values "careted in" between ORDPATHs of sibling nodes, without relabeling any old nodes.

...read moreread less

436 citations

Proceedings Article•

Annotating Multi-media/Multi-modal Resources with ELAN

[...]

Hennie Brugman¹, Albert Russel¹•Institutions (1)

Max Planck Society¹

01 May 2004

TL;DR: The actual state of development of the manual annotation tool ELAN is shown and usage requirements from three different groups of users are presented and one annotation model and a number of generic design principles guided the choices made during the development process of ELAN.

...read moreread less

Abstract: This paper shows the actual state of development of the manual annotation tool ELAN. It presents usage requirements from three different groups of users and how one annotation model and a number of generic design principles guided the choices made during the development process of ELAN. Introduction At the Max-Planck-Institute for Psycholinguistics (MPI) software development on annotation tools for the manual annotation of multimedia data has been going on since the early 90’s. Over this decade there have been large changes in enabling technology and insights in the nature of linguistic annotation. Media frameworks for the handling of digital audio and especially digital video files have matured, as has media streaming technology. XML has come to existence and has become highly relevant in a short time. Rendering and input of Unicode characters is now commonplace. Simultaneously, users made experiences with the first generation of video annotation tools and became aware of and got used to these new technologies. From this a new set of requirements arose. Finally, annotation tool builders are better aware of each other’s approaches, annotation models and annotation document formats. Clearly convergence is going on, leading to easier exchange of data between annotation tools. An important role in this process was played by the paper by (Bird & Liberman, 2001) that introduced Annotation Graphs. We are closely watching and trying to participate in standards initiatives, as for example ISO TC37/SC4. The first video annotation tool developed at the MPI was MediaTagger, a QuickTime based application that runs only on pre-OS X Macintoshes. It started as a first attempt to exploit the QuickTime Movie data structure, and especially it’s text tracks, as an informal model for linguistic annotation. Since then several new formal models where made, each one building on the experiences of the previous ones and considering new user requirements. The formal modeling languages that were used are Entity-Relationship diagrams and UML. A detailed presentation and evaluation of these models can be found in (Brugman & Wittenburg, 2001). The next chapters will discuss the requirements of several different groups of users and describe the latest state of ELAN functionality. We will then present our model for annotation in some detail and show how we can cover the needs of very different user groups with one relatively simple model. In the discussions plans for future development will presented. 1 http://www.mpi.nl User requirements ELAN is developed with a number of different user groups in mind. These users are situated both within the MPI and, in an increasing number of cases, outside the MPI. Often they are participating in externally funded projects (DoBeS, ECHO). We will discuss the main requirements per group, although there is of course a substantial overlap between each group’s needs.

...read moreread less

428 citations

Book Chapter•DOI•

USIXML: a language supporting multi-path development of user interfaces

[...]

Quentin Limbourg¹, Jean Vanderdonckt¹, Benjamin Michotte¹, Laurent Bouillon¹, Víctor López-Jaquero¹ - Show less +1 more•Institutions (1)

Université catholique de Louvain¹

11 Jul 2004

TL;DR: Model-to-model transformation as discussed by the authors can be supported in multiple configurations, based on composition of three basic transformation types: abstraction, reification, and translation, which is the cornerstone of Model-Driven Architecture.

...read moreread less

Abstract: USer Interface eXtensible Markup Language (USIXML) consists in a User Interface Description Language (UIDL) allowing designers to apply a multi-path development of user interfaces. In this development paradigm, a user interface can be specified and produced at and from different, and possibly multiple, levels of abstraction while maintaining the mappings between these levels if required. Thus, the development process can be initiated from any level of abstraction and proceed towards obtaining one or many final user interfaces for various contexts of use at other levels of abstraction. In this way, the model-to-model transformation, which is the cornerstone of Model-Driven Architecture (MDA), can be supported in multiple configurations, based on composition of three basic transformation types: abstraction, reification, and translation.

...read moreread less

424 citations

Book Chapter•DOI•

Schema-free XQuery

[...]

Yunyao Li¹, Cong Yu¹, H. V. Jagadish¹•Institutions (1)

University of Michigan¹

31 Aug 2004

TL;DR: This work introduces the notion of Meaningful Lowest Common Ancestor Structure (MLCAS) for finding related nodes within an XML document and adds new functionality to XQuery to enable users to take full advantage of XQuery in querying XML data precisely and efficiently without requiring (perfect) knowledge of the document structure.

...read moreread less

Abstract: The widespread adoption of XML holds out the promise that document structure can be exploited to specify precise database queries. However, the user may have only a limited knowledge of the XML structure, and hence may be unable to produce a correct XQuery, especially in the context of a heterogeneous information collection. The default is to use keyword-based search and we are all too familiar with how difficult it is to obtain precise answers by these means. We seek to address these problems by introducing the notion of Meaningful Lowest Common Ancestor Structure (MLCAS) for finding related nodes within an XML document. By automatically computing MLCAS and expanding ambiguous tag names, we add new functionality to XQuery and enable users to take full advantage of XQuery in querying XML data precisely and efficiently without requiring (perfect) knowledge of the document structure. Such a Schema-Free XQuery is potentially of value not just to casual users with partial knowledge of schema, but also to experts working in a data integration or data evolution context. In such a context, a schema-free query, once written, can be applied universally to multiple data sources that supply similar content under different schemas, and applied "forever" as these schemas evolve. Our experimental evaluation found that it was possible to express a wide variety of queries in a schema-free manner and have them return correct results over a broad diversity of schemas. Furthermore, the evaluation of a schema-free query is not expensive using a novel stack-based algorithm we develop for computing MLCAS: from 1 to 4 times the execution time of an equivalent schema-aware query.

...read moreread less

381 citations

Proceedings Article•DOI•

A proposal for an owl rules language

[...]

Ian Horrocks¹, Peter F. Patel-Schneider²•Institutions (2)

University of Manchester¹, Bell Labs²

17 May 2004

TL;DR: The expressive power of ORL is discussed, showing that the ontology consistency problem is undecidable, and how reasoning support for ORL might be provided are discussed.

...read moreread less

Abstract: Although the OWLWeb Ontology Language adds considerable expressive power to the Semantic Web it does have expressive limitations, particularly with respect to what can be said about properties. Wepresent ORL (OWL Rules Language), a Horn clause rules extension to OWL that overcomes many of these limitations. ORL extends OWL in a syntactically and semantically coherent manner: the basic syntax for ORL rules is an extension of the abstract syntax for OWL DL and OWLLite; ORL rules are given formal meaning via an extension of the OWLDL model-theoretic semantics; ORL rules are given an XML syntax basedon the OWL XML presentation syntax; and a mapping from ORL rules to RDF graphs is given based on the OWL RDF/XML exchange syntax. Wediscuss the expressive power of ORL, showing that the ontology consistency problem is undecidable, provide several examples of ORLusage, and discuss how reasoning support for ORL might be provided.

...read moreread less

Journal Article•DOI•

Containment and equivalence for a fragment of XPath

[...]

Gerome Miklau¹, Dan Suciu¹•Institutions (1)

University of Washington¹

01 Jan 2004-Journal of the ACM

TL;DR: This article identifies one parameterized class of queries for which containment can be decided efficiently, and shows that even with some bounded parameters, containment remains coNP-complete.

...read moreread less

Abstract: XPath is a language for navigating an XML document and selecting a set of element nodes. XPath expressions are used to query XML data, describe key constraints, express transformations, and reference elements in remote documents. This article studies the containment and equivalence problems for a fragment of the XPath query language, with applications in all these contexts.In particular, we study a class of XPath queries that contain branching, label wildcards and can express descendant relationships between nodes. Prior work has shown that languages that combine any two of these three features have efficient containment algorithms. However, we show that for the combination of features, containment is coNP-complete. We provide a sound and complete algorithm for containment that runs in exponential time, and study parameterized PTIME special cases. While we identify one parameterized class of queries for which containment can be decided efficiently, we also show that even with some bounded parameters, containment remains coNP-complete. In response to these negative results, we describe a sound algorithm that is efficient for all queries, but may return false negatives in some cases.

...read moreread less

Journal Article•DOI•

The Piazza peer data management system

[...]

Alon Halevy¹, Zachary G. Ives², Jayant Madhavan¹, Peter Mork¹, Dan Suciu¹, Igor Tatarinov¹ - Show less +2 more•Institutions (2)

University of Washington¹, University of Pennsylvania²

01 Jul 2004-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper describes-several aspects of the Piazza PDMS, including the schema mediation formalism, query answering and optimization algorithms, and the relevance of PDMSs to the semantic Web.

...read moreread less

Abstract: Intuitively, data management and data integration tools are well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: They typically require a comprehensive schema design before they can be used to store or share information and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by nondatabase-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: We propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemes. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper describes-several aspects of the Piazza PDMS, including the schema mediation formalism, query answering and optimization algorithms, and the relevance of PDMSs to the semantic Web.

...read moreread less

Proceedings Article•DOI•

Secure XML querying with security views

[...]

Wenfei Fan¹, Chee-Yong Chan², Minos Garofalakis³•Institutions (3)

University of Edinburgh¹, National University of Singapore², Bell Labs³

13 Jun 2004

TL;DR: This work is the first to study a flexible, DTD-based access-control model for XML and its implications on the XML query-execution engine and is among the first efforts for query rewriting and optimization in the presence of general DTDs for a rich a class of XPath queries.

...read moreread less

Abstract: The prevalent use of XML highlights the need for a generic, flexible access-control mechanism for XML documents that supports efficient and secure query access, without revealing sensitive information unauthorized users. This paper introduces a novel paradigm for specifying XML security constraints and investigates the enforcement of such constraints during XML query evaluation. Our approach is based on the novel concept of security views, which provide for each user group (a) an XML view consisting of all and only the information that the users are authorized to access, and (b) a view DTD that the XML view conforms to. Security views effectively protect sensitive data from access and potential inferences by unauthorized user, and provide authorized users with necessary schema information to facilitate effective query formulation and optimization. We propose an efficient algorithm for deriving security view definitions from security policies (defined on the original document DTD) for different user groups. We also develop novel algorithms for XPath query rewriting and optimization such that queries over security views can be efficiently answered without materializing the views. Our algorithms transform a query over a security view to an equivalent query over the original document, and effectively prune query nodes by exploiting the structural properties of the document DTD in conjunction with approximate XPath containment tests. Our work is the first to study a flexible, DTD-based access-control model for XML and its implications on the XML query-execution engine. Furthermore, it is among the first efforts for query rewriting and optimization in the presence of general DTDs for a rich a class of XPath queries. An empirical study based on real-life DTDs verifies the effectiveness of our approach.

...read moreread less

Proceedings Article•DOI•

A prime number labeling scheme for dynamic ordered XML trees

[...]

X. Wu¹, Mong Li Lee¹, Wynne Hsu¹•Institutions (1)

National University of Singapore¹

30 Mar 2004

TL;DR: This work proposes a new labeling scheme that take advantage of the unique property of prime numbers to meet the need for efficient support to order-sensitive queries and updates of XML queries.

...read moreread less

Abstract: Efficient evaluation of XML queries requires the determination of whether a relationship exists between two elements. A number of labeling schemes have been designed to label the element nodes such that the relationships between nodes can be easily determined by comparing their labels. With the increased popularity of XML on the Web, finding a labeling scheme that is able to support order-sensitive queries in the presence of dynamic updates becomes urgent. We propose a new labeling scheme that take advantage of the unique property of prime numbers to meet this need. The global order of the nodes can be captured by generating simultaneous congruence values from the prime number node labels. Theoretical analysis of the label size requirements for the various labeling schemes is given. Experiment results indicate that the prime number labeling scheme is compact compared to existing dynamic labeling schemes, and provides efficient support to order-sensitive queries and updates.

...read moreread less

Proceedings Article•DOI•

Concrete syntax for objects: domain-specific language embedding and assimilation without restrictions

[...]

Martin Bravenboer¹, Eelco Visser¹•Institutions (1)

Utrecht University¹

01 Oct 2004

TL;DR: MetaBorg is described, a method for providing concrete syntax for domain abstractions to application programmers by embedding domain-specific languages in a general purpose host language and assimilating the embedded domain code into the surrounding host code.

...read moreread less

Abstract: Application programmer's interfaces give access to domain knowledge encapsulated in class libraries without providing the appropriate notation for expressing domain composition. Since object-oriented languages are designed for extensibility and reuse, the language constructs are often sufficient for expressing domain abstractions at the semantic level. However, they do not provide the right abstractions at the syntactic level. In this paper we describe MetaBorg, a method for providing concrete syntax for domain abstractions to application programmers. The method consists of embedding domain-specific languages in a general purpose host language and assimilating the embedded domain code into the surrounding host code. Instead of extending the implementation of the host language, the assimilation phase implements domain abstractions in terms of existing APIs leaving the host language undisturbed. Indeed, MetaBorg can be considered a method for promoting APIs to the language level. The method is supported by proven and available technology, i.e. the syntax definition formalism SDF and the program transformation language and toolset Stratego/XT. We illustrate the method with applications in three domains: code generation, XML generation, and user-interface construction.

...read moreread less

Journal Article•DOI•

Rule-based generation of requirements traceability relations

[...]

George Spanoudakis¹, Andrea Zisman¹, Elena Pérez-Miñana², Paul Krause³•Institutions (3)

Northampton Community College¹, Philips², University of Surrey³

01 Jul 2004-Journal of Systems and Software

TL;DR: This paper presents a rule-based approach to support the automatic generation of traceability relations between documents which specify requirement statements and use cases, and analysis object models for software systems.

...read moreread less

Proceedings Article•DOI•

Efficient query reformulation in peer data management systems

[...]

Igor Tatarinov¹, Alon Halevy¹•Institutions (1)

University of Washington¹

13 Jun 2004

TL;DR: This work develops techniques for pruning paths in the reformulation process and for minimizing the reformulated queries as they are created, and shows that pre-computing semantic paths in a PDMS can greatly improve the efficiency of the reformulations process.

...read moreread less

Abstract: Peer data management systems (PDMS) offer a flexible architecture for decentralized data sharing. In a PDMS, every peer is associated with a schema that represents the peer's domain of interest, and semantic relationships between peers are provided locally between pairs (or small sets) of peers. By traversing semantic paths of mappings, a query over one peer can obtain relevant data from any reachable peer in the network. Semantic paths are traversed by reformulating queries at a peer into queries on its neighbors.Naively following semantic paths is highly inefficient in practice. We describe several techniques for optimizing the reformulation process in a PDMS and validate their effectiveness using real-life data sets. In particular, we develop techniques for pruning paths in the reformulation process and for minimizing the reformulated queries as they are created. In addition, we consider the effect of the strategy we use to search through the space of reformulations. Finally, we show that pre-computing semantic paths in a PDMS can greatly improve the efficiency of the reformulation process. Together, all of these techniques form a basis for scalable query reformulation in PDMS.To enable our optimizations, we developed practical algorithms, of independent interest, for checking containment and minimization of XML queries, and for composing XML mappings.

...read moreread less

Proceedings Article•DOI•

FleXPath: flexible structure and full-text querying for XML

[...]

Sihem Amer-Yahia¹, Laks V. S. Lakshmanan², Shashank Pandit³•Institutions (3)

AT&T Labs¹, University of British Columbia², Indian Institute of Technology Bombay³

13 Jun 2004

TL;DR: This paper provides an elegant definition of relaxation on structure and defines primitive operators to span the space of relaxations for ranking schemes and proposes natural ranking schemes that adhere to these principles.

...read moreread less

Abstract: Querying XML data is a well-explored topic with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying XML documents is full-text search on textual content. In this paper, we study fundamental challenges that arise when we try to integrate these two querying paradigms.While keyword search is based on approximate matching, XPath has exact match semantics. We address this mismatch by considering queries on structure as a "template", and looking for answers that best match this template and the full-text search. To achieve this, we provide an elegant definition of relaxation on structure and define primitive operators to span the space of relaxations. Query answering is now based on ranking potential answers on structural and full-text search conditions. We set out certain desirable principles for ranking schemes and propose natural ranking schemes that adhere to these principles. We develop efficient algorithms for answering top-K queries and discuss results from a comprehensive set of experiments that demonstrate the utility and scalability of the proposed framework and algorithms.

...read moreread less

Patent•

Intent based processing

[...]

Rafael S. Lisitsa¹, Dale A. Sather¹, Costin Hagiu¹•Institutions (1)

Microsoft¹

22 Dec 2004

TL;DR: In this article, a system and method for determining a user's intent is presented, where constituents and a topology are derived from the user's expression of intent, which can be stated broadly or stated in specific detail.

...read moreread less

Abstract: Presented is a system and method for determining a user's intent. Specifically, constituents and a topology are derived from the user's expression of intent, which can be stated broadly or stated in specific detail. The intent is expressed verbally, written, or in an XML format. The constituents and topology are resolved into a configuration based upon contexts. The contexts, which include a resource context, a user context, and an application context, includes information about the user's preferences, location, restrictions, device and network availability, and content availability. The configuration is then implemented.

...read moreread less

Proceedings Article•DOI•

PRIX: indexing and querying XML using prufer sequences

[...]

Praveen Rao¹, Bongki Moon¹•Institutions (1)

University of Arizona¹

30 Mar 2004

TL;DR: This work proposes a new way of indexing XML documents and processing twig patterns in an XML database that allows holistic processing of a twig pattern without breaking the twig into root-to-leaf paths and processing these paths individually.

...read moreread less

Abstract: We propose a new way of indexing XML documents and processing twig patterns in an XML database. Every XML document in the database can be transformed into a sequence of labels by Prufer's method that constructs a one-to-one correspondence between trees and sequences. During query processing, a twig pattern is also transformed into its Prufer sequence. By performing subsequence matching on the set of sequences in the database, and performing a series of refinement phases that we have developed, we can find all the occurrences of a twig pattern in the database. Our approach allows holistic processing of a twig pattern without breaking the twig into root-to-leaf paths and processing these paths individually. Furthermore, we show that all correct answers are found without any false dismissals or false alarms. Experimental results demonstrate the performance benefits of our proposed techniques.

...read moreread less

Book•

Eclipse modeling framework : a developer's guide

[...]

Frank Budinsky

01 Jan 2004

TL;DR: The EMF Framework is introduced, a model for unifying Java, XML, and UML, and a guide to implementing and editing EMF Models and Projects.

...read moreread less

Abstract: Foreword by Sridhar Iyengar. Foreword by Dr. Lee R. Nackman. Preface. References. I. EMF OVERVIEW. 1. Eclipse. The Projects. The Eclipse Platform. More Information. 2. Introducing EMF. Unifying Java, XML, and UML. Modeling vs Programming. Defining the Model. Generating Code. The EMF Framework. EMF and Modeling Standards. 3. Model Editing with EMF.Edit. Displaying and Editing EMF Models. Item Providers. Command Framework. Generating EMF.Edit Code. 4. Using EMF-A Simple Overview. Example Model: The Primer Purchase Order. Creating EMF Models and Projects. Generating Code. Running the Application. Continuing Development. II. DEFINING EMF MODELS. 5. Ecore Modeling Concepts. Core Model Uses. The Ecore Kernel. Structural Features. Behavioral Features. Classifiers. Packages and Factories. Annotations. Modeled Data Types. 6. Java Source Code. Java Specification for Packages. Java Specification for Classes. Java Specification for Enumerations. Java Specification for Data Types. Java Specification for Maps. 7. XML Schema. Schema Definition of Packages. Schema Definition of Classes. Schema Definition of Attributes. Schema Definition of References. Schema Simple Types. 8. UML. UML Packages. UML Specification for Classifiers. UML Specification for Attributes. UML Specification for References. UML Specification for Operations. III. USING THE EMF GENERATOR. 9. EMF Generator Patterns. Modeled Classes. Attributes. References. Operations. Class Inheritance. Reflective Methods. Factories and Packages. Switch Classes and Adapter Factories. Customizing Generated Classes. 10. EMF.Edit Generator Patterns. Item Providers. Item Provider Adapter Factories. Editor. Action Bar Contributor. Wizard. Plug-Ins. 11. Running the Generators. EMF Code Generation. The Generator GUI. The Command-Line Generator Tools. The Template Format. 12. Example-Implementing a Model and Editor. Getting Started. Generating the Model. Implementing Volatile Features. Implementing Data Types. Running the ExtendedPO2 Editor. Restricting Reference Targets. Splitting the Model into Multiple Packages. Editing Multiple Resources Concurrently. IV. PROGRAMMING WITH EMF. 13. EMF Client Programming. Packages and Factories. The EMF Persistence API. EMF Resource Implementations. Adapters. Working with EMF Objects. Dynamic EMF. 14. EMF.Edit Programming. Overriding Commands. Customizing Views. V. EMF API. 15. The org.eclipse.emf.common Plug-In. The org.eclipse.emf.common Package. The org.eclipse.emf.common.command Package. The org.eclipse.emf.common.notify Package. The org.eclipse.emf.common.util Package. 16. The org.eclipse.emf.common.ui Plug-In. The org.eclipse.emf.common.ui Package. The org.eclipse.emf.common.ui.celleditor Package. The org.eclipse.emf.common.ui.viewer Package. 17. The org.eclipse.emf.ecore Plug-In. The org.eclipse.emf.ecore Package. The org.eclipse.emf.ecore.plugin Package. The org.eclipse.emf.ecore.resource Package. The org.eclipse.emf.ecore.util Package. 18. The org.eclipse.emf.ecore.xmi Plug-In. The org.eclipse.emf.ecore.xmi Package. VI. EMF.EDIT API. 19. The org.eclipse.emf.edit Plug-In. The org.eclipse.emf.edit Package. The org.eclipse.emf.edit.command Package. The org.eclipse.emf.edit.domain Package. The org.eclipse.emf.edit.provider Package. The org.eclipse.emf.edit.provider.resource Package. The org.eclipse.emf.edit.tree Package. The org.eclipse.emf.edit.tree.provider Package. The org.eclipse.emf.edit.tree.util Package. 20. The org.eclipse.emf.edit.ui Plug-In. The org.eclipse.emf.edit.ui Package. The org.eclipse.emf.edit.ui.action Package. The org.eclipse.emf.edit.ui.celleditor Package. The org.eclipse.emf.edit.ui.dnd Package. The org.eclipse.emf.edit.ui.provider Package. Appendix A: UML Notation. Classes and Interfaces. Enumerations and Data Types. Class Relationships. Appendix B: Summary of Example Models. SimplePO. PrimerPO. ExtendedPO1. ExtendedPO2. ExtendedPO3. Index.

...read moreread less

Proceedings Article•DOI•

Constraint-based XML query rewriting for data integration

[...]

Cong Yu¹, Lucian Popa²•Institutions (2)

University of Michigan¹, IBM²

13 Jun 2004

TL;DR: The semantics of query answering in such an integration scenario is defined, and two novel algorithms are designed, basic query rewrite and query resolution, to implement the semantics.

...read moreread less

Abstract: We study the problem of answering queries through a target schema, given a set of mappings between one or more source schemas and this target schema, and given that the data is at the sources. The schemas can be any combination of relational or XML schemas, and can be independently designed. In addition to the source-to-target mappings, we consider as part of the mapping scenario a set of target constraints specifying additional properties on the target schema. This becomes particularly important when integrating data from multiple data sources with overlapping data and when such constraints can express data merging rules at the target. We define the semantics of query answering in such an integration scenario, and design two novel algorithms, basic query rewrite and query resolution, to implement the semantics. The basic query rewrite algorithm reformulates target queries in terms of the source schemas, based on the mappings. The query resolution algorithm generates additional rewritings that merge related information from multiple sources and assemble a coherent view of the data, by incorporating target constraints. The algorithms are implemented and then evaluated using a comprehensive set of experiments based on both synthetic and real-life data integration scenarios.

...read moreread less

Patent•

XML remote procedure call (XML-RPC)

[...]

Phillip Merrick, Stewart O. Allen, Joseph T. Lapp

03 Dec 2004

TL;DR: In this paper, the Remote Procedure Call (RPC) is implemented using XML-based message encoding wherein elements in the message corresponding to arguments of the RPC are associated with element type indicators selected from a defined set.

...read moreread less

Abstract: Remote Procedure Call (RPC) is implemented using XML-based message encoding wherein elements in the message corresponding to arguments of the RPC are associated with element type indicators selected from a defined set. The type indicators may allow the message itself to identify structural aspects of the message, particularly useful in the context of array elements, but useful for other types of elements as well.

...read moreread less

Journal Article•DOI•

Processing XML streams with deterministic automata and stream indexes

[...]

Todd J. Green¹, Ashish Kumar Gupta², Gerome Miklau², Makoto Onizuka³, Dan Suciu² - Show less +1 more•Institutions (3)

University of Pennsylvania¹, University of Washington², Nippon Telegraph and Telephone³

12 Dec 2004-ACM Transactions on Database Systems

TL;DR: The DFA can be used effectively for evaluating a large number of XPath expressions on a stream of XML packets and a series of theoretical results and experimental evaluations show that the lazy DFA has a small number of states, for all practical purposes.

...read moreread less

Abstract: We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently of the number of XPath expressions. The major issue is the size of the DFA, which, in theory, can be exponential in the number of XPath expressions. We provide a series of theoretical results and experimental evaluations that show that the lazy DFA has a small number of states, for all practical purposes. These results are of general interest in XPath processing, beyond stream processing. The second technique is the Streaming IndeX (SIX), which consists of adding a small amount of binary data to each XML packet that allows the query processor to achieve significant speedups. As an application of these techniques we describe the XML Toolkit (XMLTK), a collection of command-line tools providing highly scalable XML data processing.

...read moreread less

Journal Article•DOI•

An efficient and scalable algorithm for clustering XML documents by structure

[...]

Wang Lian¹, David W. Cheung¹, Nikos Mamoulis¹, Siu-Ming Yiu¹•Institutions (1)

University of Hong Kong¹

01 Jan 2004-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes a hierarchical algorithm (S-GRACE) for clustering XML documents based on structural information in the data, and proposes a computationally efficient distance metric defined between documents and sets of documents using the notion of structure graph (s-graph).

...read moreread less

Abstract: With the standardization of XML as an information exchange language over the Internet, a huge amount of information is formatted in XML documents. In order to analyze this information efficiently, decomposing the XML documents and storing them in relational tables is a popular practice. However, query processing becomes expensive since, in many cases, an excessive number of joins is required to recover information from the fragmented data. If a collection consists of documents with different structures (for example, they come from different DTDs), mining clusters in the documents could alleviate the fragmentation problem. We propose a hierarchical algorithm (S-GRACE) for clustering XML documents based on structural information in the data. The notion of structure graph (s-graph) is proposed, supporting a computationally efficient distance metric defined between documents and sets of documents. This simple metric yields our new clustering algorithm which is efficient and effective, compared to other approaches based on tree-edit distance. Experiments on real data show that our algorithm can discover clusters not easily identified by manual inspection.

...read moreread less

Proceedings Article•DOI•

Precise service level agreements

[...]

James Skene¹, D. Davide Lamanna¹, Wolfgang Emmerich¹•Institutions (1)

University College London¹

23 May 2004

TL;DR: This work uses the semantics of SLAng to define a notion of SLA compatibility, and an extension to UML that enables the modelling of service situations as a precursor to analysis, implementation and provisioning activities.

...read moreread less

Abstract: SLAng is an XML language for defining service level agreements, the part of a contract between the client and provider of an Internet service that describes the quality attributes that the service is required to possess. We define the semantics of SLAng precisely by modelling the syntax of the language in UML, then relating the language model to a model that describes the structure and behaviour of services. The presence of SLAng elements imposes behavioural constraints on service elements, and the precise definition of these constraints using OCL constitutes the semantic description of the language. We use the semantics to define a notion of SLA compatibility, and an extension to UML that enables the modelling of service situations as a precursor to analysis, implementation and provisioning activities.

...read moreread less

Web Services Eventing (WS-Eventing)

[...]

Luis Felipe Cabrera

01 Jan 2004

TL;DR: This specification describes a protocol that allows Web services to subscribe to or accept subscriptions for event notification messages to provide secure, reliable, and/or transacted message delivery and to express Web service and client policy.

...read moreread less

Abstract: This specification describes a protocol that allows Web services to subscribe to or accept subscriptions for event notification messages. Composable Architecture By using the XML, SOAP [SOAP 1.1, SOAP 1.2], and WSDL [WSDL 1.1] extensibility models, the Web service specifications (WS-*) are designed to be composed with each other to provide a rich set of tools to provide security in the Web services environment. This specification specifically relies on other Web service specifications to provide secure, reliable, and/or transacted message delivery and to express Web service and client policy.

...read moreread less

Book Chapter•DOI•

Towards an internet-scale XML dissemination service

[...]

Yanlei Diao¹, Shariq Rizvi¹, Michael J. Franklin¹•Institutions (1)

University of California, Berkeley¹

31 Aug 2004

TL;DR: This paper identifies the salient technical challenges in supporting XML filtering and transformation in this environment and proposes techniques for solving them and presents the architectural design of ONYX, a system based on an overlay network.

...read moreread less

Abstract: Publish/subscribe systems have demonstrated the ability to scale to large numbers of users and high data rates when providing content-based data dissemination services on the Internet. However, their services are limited by the data semantics and query expressiveness that they support. On the other hand, the recent work on selective dissemination of XML data has made significant progress in moving from XML filtering to the richer functionality of transformation for result customization, but in general has ignored the challenges of deploying such XML-based services on an Internet-scale. In this paper, we address these challenges in the context of incorporating the rich functionality of XML data dissemination in a highly scalable system. We present the architectural design of ONYX, a system based on an overlay network. We identify the salient technical challenges in supporting XML filtering and transformation in this environment and propose techniques for solving them.

...read moreread less

Patent•

Extraction of facts from text

[...]

Mark Wasson, James Wiltshire¹, Donald Loritz¹, Steve Xu¹, Shian-Jung Chen¹, Valentina Templar¹, Eleni Koutsomitopoulou¹ - Show less +3 more•Institutions (1)

LexisNexis¹

26 Oct 2004

TL;DR: The fact extraction tool set (FEX) as mentioned in this paperEX is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.

...read moreread less

Abstract: A fact extraction tool set ('FEX') finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined 'Annotation Configuration' controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.

...read moreread less

Collapse