
Showing papers on "XML" published in 2008


Journal ArticleDOI
TL;DR: This survey describes and classifies top-k processing techniques in relational databases, including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions, and shows the implications of each dimension on the design of the underlying techniques.
Abstract: Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search, and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in the XML domain, and show their connections to relational approaches.
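To make the flavor of such techniques concrete, here is a minimal Python sketch of threshold-style early termination over ranked lists with a monotonic sum as the aggregation function. It is a generic, textbook-style illustration under assumed inputs (per-source lists sorted by descending score), not code from the survey.

```python
import heapq

def top_k(sorted_lists, k):
    """Threshold-style top-k over per-source lists sorted by descending score.

    sorted_lists: list of [(item_id, score), ...], each sorted by score descending.
    The aggregate is a monotonic sum of per-source scores (missing score = 0.0).
    """
    index = [dict(lst) for lst in sorted_lists]   # random-access lookups
    seen, top = set(), []                         # top: min-heap of (aggregate, item)
    depth = 0
    while True:
        threshold, exhausted = 0.0, True
        for lst in sorted_lists:
            if depth >= len(lst):
                continue
            exhausted = False
            item, score = lst[depth]
            threshold += score                    # best possible aggregate of any unseen item
            if item not in seen:
                seen.add(item)
                agg = sum(ix.get(item, 0.0) for ix in index)
                heapq.heappush(top, (agg, item))
                if len(top) > k:
                    heapq.heappop(top)
        depth += 1
        # Stop once no unseen item can beat the current k-th best aggregate.
        if exhausted or (len(top) == k and top[0][0] >= threshold):
            return sorted(top, reverse=True)

ranked_by_price = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
ranked_by_rating = [("b", 0.7), ("a", 0.6), ("c", 0.5)]
print(top_k([ranked_by_price, ranked_by_rating], k=2))  # 'a' and 'b', both with aggregate 1.5
```

The loop stops as soon as no unseen item can beat the current k-th aggregate, which is the property that lets this family of algorithms avoid scanning the ranked lists to the end.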

893 citations


Book
07 Mar 2008
TL;DR: This chapter discusses modeling with a general-purpose language and with a domain-specific language, and defines the DSM solution as a continuous process in the real world.
Abstract: Foreword. Preface . PART 1: BACKGROUND AND MOTIVATION. 1. Introduction. 1.1 Seeking the better level of abstraction. 1.2 Code-driven and model-driven development. 1.3 An example: modeling with a general-purpose language and with a domain-specific language. 1.4 What is DSM? 1.5 When to use DSM? 1.6 Summary. 2. Business value. 2.1 Productivity. 2.2 Quality. 2.3 Leverage expertise. 2.4 The economics of DSM. 2.5 Summary. PART 2: FUNDAMENTALS. 3. DSM defined. 3.1 DSM characteristics. 3.2 Implications of DSM for users. 3.3 Difference to other modeling approaches. 3.4 Tooling for DSM. 3.5 Summary. 4. Architecture of DSM. 4.1 Introduction. 4.2 Language. 4.3 Models. 4.4 Code generator. 4.5 Domain framework and target environment. 4.6 DSM organization and process. 4.7 Summary. PART 3: DSM EXAMPLES. 5. IP telephony and call processing. 5.1 Introduction and objectives. 5.2 Development process. 5.3 Language for modeling call processing services. 5.4 Modeling IP telephony service. 5.5 Generator for XML. 5.6 Framework support. 5.7 Main results. 5.8 Summary. 6. Insurance products. 6.1 Introduction and objectives. 6.2 Development process. 6.3 Language for modeling insurances. 6.4 Modeling insurance products. 6.5 Generator for Java. 6.6 Framework support. 6.7 Main results. 6.8 Summary. 7. Home Automation. 7.1 Introduction and objectives. 7.2 Development process. 7.3 Home automation modeling language. 7.4 Home automation modeling language in use. 7.5 Generator. 7.6 Main results. 7.7 Summary. 8. Mobile phone applications using Python framework. 8.1 Introduction and objectives. 8.2 Development process. 8.3 Language for application modeling. 8.4 Modeling phone applications. 8.5 Generator for Python. 8.6 Framework support. 8.7 Main results. 8.8 Extending the solution to native S60 C++. 8.9 Summary. 9. Digital Wristwatch. 9.1 Introduction and Objectives. 9.2 Development Process. 9.3 Modeling Language. 9.4 Models. 9.5 Code Generation for Watch Models. 9.6 The Domain Framework. 9.7 Main Results. 9.8 Summary. PART 4: CREATING DSM SOLUTIONS. 10 DSM language definition. 10.1 Introduction and objectives. 10.2 Identifying and defining modeling concepts. 10.3 Formalizing languages with metamodeling. 10.4 Defining language rules. 10.5 Integrating multiple languages. 10.6 Notation for the language. 10.7 Testing the languages. 10.8 Maintaining the languages. 10.9 Summary. 11. Generator definition. 11.1 "Here's one I made earlier". 11.2 Types of generator facilities. 11.3 Generator output patterns. 11.4 Generator structure. 11.5 Process. 11.6 Summary. 12. Domain Framework. 12.1 Removing duplication from generated code. 12.2 Hiding platform details. 12.3 Providing an interface for the generator. 12.4 Summary. 13. DSM definition process. 13.1 Choosing among possible candidate domains. 13.2 Organizing for DSM. 13.3 Proof of concept. 13.4 Defining the DSM solution. 13.5 Pilot project. 13.6 DSM deployment. 13.7 DSM as a continuous process in the real world. 13.8 Summary. 14. Tools for DSM. 14.1 Different approaches to building tool support. 14.2 A Brief History of Tools. 14.3 What is needed in a DSM environment. 14.4 Current tools. 14.5 Summary. 15. DSM in use. 15.1 Model reuse. 15.2 Model sharing and splitting. 15.3 Model versioning. 15.4 Summary. 16. Conclusion. 16.1 No sweat shops--But no Fritz Lang's Metropolis either. 16.2 The onward march of DSM. Appendix A: Metamodeling Language. References. Index.

825 citations


Patent
29 Sep 2008

340 citations


Journal ArticleDOI
TL;DR: This article describes investigations into the creation of sound and complete relevance assessments for the evaluation of content-oriented XML retrieval, as carried out at INEX, the evaluation campaign for XML retrieval.
Abstract: In information retrieval research, comparing retrieval approaches requires test collections consisting of documents, user requests and relevance assessments. Obtaining relevance assessments that are as sound and complete as possible is crucial for the comparison of retrieval approaches. In XML retrieval, the problem of obtaining sound and complete relevance assessments is further complicated by the structural relationships between retrieval results. A major difference between XML retrieval and flat document retrieval is that the relevance of elements (the retrievable units) is not independent of that of related elements. This has major consequences for the gathering of relevance assessments. This article describes investigations into the creation of sound and complete relevance assessments for the evaluation of content-oriented XML retrieval as carried out at INEX, the evaluation campaign for XML retrieval. The campaign, now in its seventh year, has had three substantially different approaches to gather assessments and has finally settled on a highlighting method for marking relevant passages within documents—even though the objective is to collect assessments at element level. The different methods of gathering assessments at INEX are discussed and contrasted. The highlighting method is shown to be the most reliable of the methods.

256 citations


Patent
James Michael Ferris1
26 Nov 2008
TL;DR: In this paper, the authors describe a system and methods for embedding a cloud-based resource request in a specification language wrapper, such as an XML object, which can be transmitted to a marketplace to seek the response of available clouds which can support the application or appliance according to the specifications contained in the specification language wrappers.
Abstract: Embodiments relate to systems and methods for embedding a cloud-based resource request in a specification language wrapper. In embodiments, a set of applications and/or a set of appliances can be registered to be instantiated in a cloud-based network. Each application or appliance can have an associated set of specified resources with which the user wishes to instantiate those objects. For example, a user may specify a maximum latency for input/output of the application or appliance, a geographic location of the supporting cloud resources, a processor throughput, or other resource specification to instantiate the desired object. According to embodiments, the set of requested resources can be embedded in a specification language wrapper, such as an XML object. The specification language wrapper can be transmitted to a marketplace to seek the response of available clouds which can support the application or appliance according to the specifications contained in the specification language wrapper.
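As a rough sketch of the idea, the following Python snippet wraps a resource request in a small XML envelope; the element and attribute names (resourceRequest, maxLatency, geoLocation, processorThroughput) are invented for illustration and are not taken from the patent.

```python
import xml.etree.ElementTree as ET

def wrap_resource_request(app_name, max_latency_ms, region, min_cpu_ghz):
    """Embed a cloud resource request in a simple XML wrapper (names are illustrative)."""
    req = ET.Element("resourceRequest", attrib={"application": app_name})
    ET.SubElement(req, "maxLatency", unit="ms").text = str(max_latency_ms)
    ET.SubElement(req, "geoLocation").text = region
    ET.SubElement(req, "processorThroughput", unit="GHz").text = str(min_cpu_ghz)
    return ET.tostring(req, encoding="unicode")

print(wrap_resource_request("photo-archive", 50, "eu-west", 2.4))
# <resourceRequest application="photo-archive"><maxLatency unit="ms">50</maxLatency>...
```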

203 citations


Journal ArticleDOI
TL;DR: The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections and introduces the problem of dealing with semi-structured data in a DW.
Abstract: This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query and retrieve web data, and their application to DWs. The paper reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semi-structured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources and the XML extensions of On-Line Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities that the combination of the DW and Web fields offers, as well as to identify open research lines.

160 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: This paper describes and motivates the AXML model and language, overviews the research results obtained in the course of the project, and shows how all the pieces come together in the implementation.
Abstract: This paper provides an overview of the Active XML project developed at INRIA over the past five years. Active XML (AXML, for short), is a declarative framework that harnesses Web services for distributed data management, and is put to work in a peer-to-peer architecture. The model is based on AXML documents, which are XML documents that may contain embedded calls to Web services, and on AXML services, which are Web services capable of exchanging AXML documents. An AXML peer is a repository of AXML documents that acts both as a client by invoking the embedded service calls, and as a server by providing AXML services, which are generally defined as queries or updates over the persistent AXML documents. The approach gracefully combines stored information with data defined in an intensional manner as well as dynamic information. This simple, rather classical idea leads to a number of technically challenging problems, both theoretical and practical. In this paper, we describe and motivate the AXML model and language, overview the research results obtained in the course of the project, and show how all the pieces come together in our implementation.
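The following Python sketch illustrates the core intuition of documents with embedded service calls: elements in a reserved namespace stand for calls, and materializing the document replaces them with the data the calls return. The namespace, element names, and the in-process "services" are invented stand-ins, not the actual AXML vocabulary or peer implementation.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for Web services; a real AXML peer would invoke remote services.
SERVICES = {"getTemperature": lambda city: f"{city}: 21 C"}

DOC = """
<newspaper xmlns:sc="urn:example:axml">
  <weather>
    <sc:call service="getTemperature" arg="Paris"/>
  </weather>
</newspaper>
"""

def materialize(xml_text):
    """Replace embedded service-call elements with the data the calls return."""
    ns = "{urn:example:axml}"
    root = ET.fromstring(xml_text)
    # Collect (parent, call) pairs first so the tree is not mutated while iterating.
    calls = [(p, c) for p in root.iter() for c in list(p) if c.tag == ns + "call"]
    for parent, call in calls:
        result = SERVICES[call.get("service")](call.get("arg"))
        parent.remove(call)
        ET.SubElement(parent, "data").text = result
    return ET.tostring(root, encoding="unicode")

print(materialize(DOC))  # the sc:call element is replaced by a <data> element
```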

156 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: This work investigates an axiomatic framework that includes two intuitive and non-trivial properties that an XML keyword search technique should ideally satisfy, monotonicity and consistency with respect to data and query, and proposes a novel semantics for identifying relevant matches that satisfies both properties.
Abstract: Keyword search is a user-friendly mechanism for retrieving XML data in web and scientific applications. An intuitively compelling but vaguely defined goal is to identify matches to query keywords that are relevant to the user. However, it is hard to directly evaluate the relevance of query results due to the inherent ambiguity of search semantics. In this work, we investigate an axiomatic framework that includes two intuitive and non-trivial properties that an XML keyword search technique should ideally satisfy: monotonicity and consistency, with respect to data and query. This is the first work that reasons about keyword search strategies from a formal perspective. Then we propose a novel semantics for identifying relevant matches, which, to the best of our knowledge, is the only existing algorithm that satisfies both properties. An efficient algorithm is designed for realizing this semantics. Extensive experimental studies have verified the intuition of the properties and shown the effectiveness of the proposed algorithm.
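The paper's own match semantics is not reproduced here, but as a point of reference the sketch below computes a widely used baseline for XML keyword search, the smallest lowest common ancestor (SLCA): elements whose subtree contains every query keyword while no child's subtree does.

```python
import xml.etree.ElementTree as ET

DOC = """
<bib>
  <book><title>XML keyword search</title><author>Liu</author></book>
  <book><title>Relational databases</title><author>Chen</author></book>
</bib>
"""

def slca(xml_text, keywords):
    """Return elements whose subtree contains every keyword, but whose children do not."""
    root = ET.fromstring(xml_text)
    keywords = {k.lower() for k in keywords}
    hits = {}   # element -> set of keywords found in its subtree

    def visit(elem):
        found = {k for k in keywords if k in (elem.text or "").lower()}
        for child in elem:
            found |= visit(child)
        hits[elem] = found
        return found

    visit(root)
    return [e for e in hits
            if hits[e] == keywords and not any(hits[c] == keywords for c in e)]

for match in slca(DOC, ["xml", "liu"]):
    print(match.tag)    # -> book (the first one), not the enclosing bib element
```

The point of the baseline is that the answer is the tightest element covering all keywords; the paper's axioms (monotonicity and consistency) are ways of judging whether such a semantics behaves sensibly as data and queries change.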

152 citations


Patent
04 Jun 2008
TL;DR: In this article, an event server running an event driven application implementing an event processing network can be specified by XML that is an extension of SPRING framework XML, and the event processing network can include at least one processor to implement a rule on at least one input stream.
Abstract: An event server running an event driven application implementing an event processing network. The event processing network can include at least one processor to implement a rule on at least one input stream. The event driven application can be specified by XML that is an extension of SPRING framework XML.

147 citations


Patent
22 Oct 2008

136 citations


Journal ArticleDOI
01 Mar 2008
TL;DR: The biomedical informatics research network (BIRN) has developed a federated and distributed infrastructure for the storage, retrieval, analysis, and documentation of biomedical imaging data.
Abstract: The aggregation of imaging, clinical, and behavioral data from multiple independent institutions and researchers presents both a great opportunity for biomedical research and a formidable challenge. Many research groups have well-established data collection and analysis procedures, as well as data and metadata format requirements that are particular to that group. Moreover, the types of data and metadata collected are quite diverse, including image, physiological, and behavioral data, as well as descriptions of experimental design, and preprocessing and analysis methods. Each of these types of data utilizes a variety of software tools for collection, storage, and processing. Furthermore, sites are reluctant to release control over the distribution and access to the data and the tools. To address these needs, the biomedical informatics research network (BIRN) has developed a federated and distributed infrastructure for the storage, retrieval, analysis, and documentation of biomedical imaging data. The infrastructure consists of distributed data collections hosted on dedicated storage and computational resources located at each participating site, a federated data management system and data integration environment, an extensible markup language (XML) schema for data exchange, and analysis pipelines, designed to leverage both the distributed data management environment and the available grid computing resources.

Journal ArticleDOI
TL;DR: A mapping from Workflow Nets (WF-nets) to BPEL is provided, which builds on the rich theory of Petri nets and can also be used to map other languages onto BPEL.
Abstract: The Business Process Execution Language for Web Services (BPEL) has emerged as the de facto standard for implementing processes. Although intended as a language for connecting web services, its application is not limited to cross-organizational processes. It is expected that in the near future a wide variety of process-aware information systems will be realized using BPEL. While being a powerful language, BPEL is difficult to use. Its XML representation is very verbose and only readable for the trained eye. It offers many constructs and typically things can be implemented in many ways, e.g., using links and the flow construct or using sequences and switches. As a result only experienced users are able to select the right construct. Several vendors offer a graphical interface that generates BPEL code. However, the graphical representations are a direct reflection of the BPEL code and not easy to use by end-users. Therefore, we provide a mapping from Workflow Nets (WF-nets) to BPEL. This mapping builds on the rich theory of Petri nets and can also be used to map other languages (e.g., UML, EPC, BPMN, etc.) onto BPEL. In addition to this we have implemented the algorithm in a tool called WorkflowNet2BPEL4WS.

Journal ArticleDOI
TL;DR: This paper explores how organizational and technological factors explain the adoption of e-business functions in 4570 European companies and the migration from EDI-based to XML-based e- business frameworks in 329 European companies.

BookDOI
TL;DR: This volume collects the papers of the INEX 2007 workshop, covering the Ad Hoc, Book Search, XML-Mining, Entity Ranking, Interactive, Link-the-Wiki, and Multimedia tracks.
Abstract: Ad Hoc Track.- Overview of the INEX 2007 Ad Hoc Track.- INEX 2007 Evaluation Measures.- XML Retrieval by Improving Structural Relevance Measures Obtained from Summary Models.- TopX @ INEX 2007.- The Garnata Information Retrieval System at INEX'07.- Dynamic Element Retrieval in the Wikipedia Collection.- The Simplest XML Retrieval Baseline That Could Possibly Work.- Using Language Models and Topic Models for XML Retrieval.- UJM at INEX 2007: Document Model Integrating XML Tags.- Phrase Detection in the Wikipedia.- Indian Statistical Institute at INEX 2007 Adhoc Track: VSM Approach.- A Fast Retrieval Algorithm for Large-Scale XML Data.- LIG at INEX 2007 Ad Hoc Track: Using Collectionlinks as Context.- Book Search Track.- Overview of the INEX 2007 Book Search Track (BookSearch'07).- Logistic Regression and EVIs for XML Books and the Heterogeneous Track.- CMIC at INEX 2007: Book Search Track.- XML-Mining Track.- Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach.- Probabilistic Methods for Structured Document Classification at INEX'07.- Efficient Clustering of Structured Documents Using Graph Self-Organizing Maps.- Document Clustering Using Incremental and Pairwise Approaches.- XML Document Classification Using Extended VSM.- Entity Ranking Track.- Overview of the INEX 2007 Entity Ranking Track.- L3S at INEX 2007: Query Expansion for Entity Ranking Using a Highly Accurate Ontology.- Entity Ranking Based on Category Expansion.- Entity Ranking from Annotated Text Collections Using Multitype Topic Models.- An n-Gram and Initial Description Based Approach for Entity Ranking Track.- Structured Document Retrieval, Multimedia Retrieval, and Entity Ranking Using PF/Tijah.- Using Wikipedia Categories and Links in Entity Ranking.- Integrating Document Features for Entity Ranking.- Interactive Track.- A Comparison of Interactive and Ad-Hoc Relevance Assessments.- Task Effects on Interactive Search: The Query Factor.- Link-the-Wiki Track.- Overview of INEX 2007 Link the Wiki Track.- Using and Detecting Links in Wikipedia.- GPX: Ad-Hoc Queries and Automated Link Discovery in the Wikipedia.- University of Waterloo at INEX2007: Adhoc and Link-the-Wiki Tracks.- Wikipedia Ad Hoc Passage Retrieval and Wikipedia Document Linking.- Multimedia Track.- The INEX 2007 Multimedia Track.

Book
01 Jan 2008
TL;DR: This book constitutes the refereed proceedings of the 7th International XML Database Symposium, XSym 2010, held in Singapore, in September 2010, and is organized in topical sections on XML query processing, XML update and applications, and XML modeling.
Abstract: This book constitutes the refereed proceedings of the 7th International XML Database Symposium, XSym 2010, held in Singapore, in September 2010. The 11 papers were carefully reviewed and selected from 20 submissions. The papers are organized in topical sections on XML query processing, XML update and applications, and XML modeling.

Proceedings ArticleDOI
09 Jun 2008
TL;DR: This paper identifies that a good XML result snippet should be a self-contained meaningful information unit of a small size that effectively summarizes this query result and differentiates it from others, according to which users can quickly assess the relevance of the query result.
Abstract: Snippets are used by almost every text search engine to complement ranking schemes in order to effectively handle user searches, which are inherently ambiguous and whose relevance semantics are difficult to assess. Despite the fact that XML is a standard representation format of web data, research on generating result snippets for XML search remains untouched. In this paper we present a system, eXtract, which addresses this important yet open problem. We identify that a good XML result snippet should be a self-contained meaningful information unit of a small size that effectively summarizes this query result and differentiates it from others, according to which users can quickly assess the relevance of the query result. We have designed and implemented a novel algorithm to satisfy these requirements and verified its efficiency and effectiveness through experiments.
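As a very rough sketch of the pruning idea (ignoring the size bound and result differentiation that the actual eXtract system enforces), the following Python code keeps only the branches of a query result that contain a keyword match.

```python
import xml.etree.ElementTree as ET

RESULT = """
<book>
  <title>XML retrieval in practice</title>
  <publisher><name>ACM Press</name><year>2008</year></publisher>
  <chapter><title>Snippets</title><body>Generating result snippets for XML search</body></chapter>
</book>
"""

def snippet(elem, keywords):
    """Copy only the branches of a result element that contain a keyword match."""
    copy = ET.Element(elem.tag, elem.attrib)
    text = (elem.text or "").strip()
    if any(k.lower() in text.lower() for k in keywords):
        copy.text = text
    for child in elem:
        pruned = snippet(child, keywords)
        if pruned.text or len(pruned):      # keep only subtrees with matches
            copy.append(pruned)
    return copy

root = ET.fromstring(RESULT)
print(ET.tostring(snippet(root, ["snippets"]), encoding="unicode"))
# -> <book><chapter><title>Snippets</title><body>Generating result snippets ...</body></chapter></book>
```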

Journal ArticleDOI
TL;DR: A technique is presented that represents the tree structure of an XML document in an efficient way by compressing it, while the functionality of basic tree operations, like traversal along edges, is preserved under the compressed representation.
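One generic way to compress tree structure, which may differ from the scheme in the paper, is to share identical subtrees so the tree becomes a DAG; the Python sketch below does this for the element structure only, ignoring text content.

```python
import xml.etree.ElementTree as ET

DOC = "<a><b><c/><c/></b><b><c/><c/></b></a>"

def compress(elem, pool):
    """Map each distinct element-structure subtree to one shared id (text is ignored)."""
    key = (elem.tag, tuple(compress(child, pool) for child in elem))
    return pool.setdefault(key, len(pool))   # identical subtrees get the same id

root = ET.fromstring(DOC)
pool = {}
compress(root, pool)
print(sum(1 for _ in root.iter()), "tree nodes ->", len(pool), "shared DAG nodes")
# -> 7 tree nodes -> 3 shared DAG nodes (one for <c/>, one for <b><c/><c/></b>, one for the root)
```

Edge traversal remains possible on the compressed form because each shared node still records the ids of its children.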

Book ChapterDOI
01 May 2008
TL;DR: This track overview introduces the track setup, and discusses the implications of the new relevance notion for entity ranking in comparison to ad hoc retrieval.
Abstract: Many realistic user tasks involve the retrieval of specific entities instead of just any type of documents. Examples of information needs include 'Countries where one can pay with the euro' or 'Impressionist art museums in The Netherlands'. The Initiative for Evaluation of XML Retrieval (INEX) started the XML Entity Ranking track (INEX-XER) to create a test collection for entity retrieval in Wikipedia. Entities are assumed to correspond to Wikipedia entries. The goal of the track is to evaluate how well systems can rank entities in response to a query; the set of entities to be ranked is assumed to be loosely defined either by a generic category (entity ranking) or by some example entities (list completion). This track overview introduces the track setup, and discusses the implications of the new relevance notion for entity ranking in comparison to ad hoc retrieval.

Journal ArticleDOI
01 Jan 2008
TL;DR: The main contributions of this paper unfold into four main points: fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, efficient and effective top-k query processing for semistructured data, support for integrating thesauri and ontologies with statistically quantified relationships among concepts, and a comprehensive description of the TopX system.
Abstract: Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.

Journal ArticleDOI
TL;DR: BP-QL is a novel query language for querying business processes, based on an intuitive model of business processes that abstracts the emerging BPEL (business process execution language) standard.

Proceedings ArticleDOI
07 Apr 2008
TL;DR: Muse is described, a mapping design wizard that uses data examples to assist designers in understanding and refining a schema mapping towards the desired specification.
Abstract: A fundamental problem in information integration is that of designing the relationships, called schema mappings, between two schemas. The specification of a semantically correct schema mapping is typically a complex task. Automated tools can suggest potential mappings, but few tools are available for helping a designer understand mappings and design alternative mappings. We describe Muse, a mapping design wizard that uses data examples to assist designers in understanding and refining a schema mapping towards the desired specification. We present novel algorithms behind Muse and show how Muse systematically guides the designer on two important components of a mapping design: the specification of the desired grouping semantics for sets of data and the choice among alternative interpretations for semantically ambiguous mappings. In every component, Muse infers the desired semantics based on the designer's actions on a short sequence of small examples. Whenever possible, Muse draws examples from a familiar database, thus facilitating the design process even further. We report our experience with Muse on some publicly available schemas.

Book
24 Sep 2008
TL;DR: The Ultimate Guide for Designing and Governing Web Service Contracts For Web services to succeed as part of SOA, they require balanced, effective technical contracts that enable services to be evolved and repeatedly reused for years to come.
Abstract: The Ultimate Guide for Designing and Governing Web Service Contracts. For Web services to succeed as part of SOA, they require balanced, effective technical contracts that enable services to be evolved and repeatedly reused for years to come. Now, a team of industry experts presents the first end-to-end guide to designing and governing Web service contracts. Writing for developers, architects, governance specialists, and other IT professionals, the authors cover the following areas:
Understanding Web Service Contract Technologies. Initial chapters and ongoing supplementary content help even the most inexperienced professional get up to speed on how all of the different technologies and design considerations relate to the creation of Web service contracts. For example, a visual anatomy of a Web service contract documented from logical and physical perspectives is provided, along with a chapter dedicated to describing namespaces in plain English. The book is further equipped with numerous case study examples and many illustrations.
Fundamental and Advanced WSDL. Tutorial coverage of WSDL 1.1 and 2.0 and detailed descriptions of their differences is followed by numerous advanced WSDL topics and design techniques, including extreme loose coupling, modularization options, use of extensibility elements, asynchrony, message dispatch, service instance identification, non-SOAP HTTP binding, and WS-BPEL extensions. Also explained is how WSDL definitions are shaped by key SOA design patterns.
Fundamental and Advanced XML Schema. XML Schema basics are covered within the context of Web services and SOA, after which advanced XML Schema chapters delve into a variety of specialized message design considerations and techniques, including the use of wildcards, reusability of schemas and schema fragments, type inheritance and composition, CRUD-style message design, and combining industry and custom schemas.
Fundamental and Advanced WS-Policy. Topics such as Policy Expression Structure, Composite Policies, Operator Composition Rules, and Policy Attachment establish a foundation upon which more advanced topics, such as policy reusability and centralization and nested, parameterized, and ignorable assertions, are covered, along with an exploration of creating concurrent policy-enabled contracts and designing custom policy assertions and vocabularies.
Fundamental Message Design with SOAP. A broad range of message design-related topics are covered, including SOAP message structures, SOAP nodes and roles, SOAP faults, designing custom SOAP headers and working with industry-standard SOAP headers.
Advanced Message Design with WS-Addressing. The art of message design is taken to a new level with in-depth descriptions of WS-Addressing endpoint references (EPRs) and MAP headers and an exploration of how they are applied via SOA design patterns. Also covered are WSDL binding considerations, related MEP rules, WS-Addressing policy assertions, and detailed coverage of how WS-Addressing relates to SOAP Action values.
Advanced Message Design with MTOM and SwA. Developing SOAP messages capable of transporting large documents or binary content is explored with a documentation of the MTOM packaging and serialization framework (including MTOM-related policy assertions), together with the SOAP with Attachments (SwA) standard and the related WS-I Attachments Profile.
Versioning Techniques and Strategies. Fundamental versioning theory starts off a series of chapters that dive into a variety of versioning techniques based on proven SOA design patterns, including backward and forward compatibility, version identification strategies, service termination, policy versioning, validation by projection, concurrency control, partial understanding, and versioning with and without wildcards.
Web Service Contracts and SOA. The constant focus of this book is on the design and versioning of Web service contracts in support of SOA and service-orientation. Relevant SOA design principles and design patterns are periodically discussed to demonstrate how specific Web service technologies can be applied and further optimized. Furthermore, several of the advanced chapters provide expert techniques for designing Web service contracts while taking SOA governance considerations into account.
About the Web Sites. www.soabooks.com supplements this book with a variety of resources, including a diagram symbol legend, glossary, supplementary articles, and source code available for download. www.soaspecs.com provides further support by establishing a descriptive portal to XML and Web services specifications referenced in all of Erl's Service-Oriented Architecture books.
Foreword. Preface. Chapter 1: Introduction. Chapter 2: Case Study Background.
Part I: Fundamental Service Contract Design. Chapter 3: SOA Fundamentals and Web Service Contracts. Chapter 4: Anatomy of a Web Service Contract. Chapter 5: A Plain English Guide to Namespaces. Chapter 6: Fundamental XML Schema: Types and Message Structure Basics. Chapter 7: Fundamental WSDL Part I: Abstract Description Design. Chapter 8: Fundamental WSDL Part II: Concrete Description Design. Chapter 9: Fundamental WSDL 2.0: New Features and Design Options. Chapter 10: Fundamental WS-Policy: Expression, Assertion, and Attachment. Chapter 11: Fundamental Message Design: SOAP Envelope Structure and Header Block Processing.
Part II: Advanced Service Contract Design. Chapter 12: Advanced XML Schema Part I: Message Flexibility and Type Inheritance and Composition. Chapter 13: Advanced XML Schema Part II: Reusability, Derived Types, and Relational Design. Chapter 14: Advanced WSDL Part I: Modularization, Extensibility, MEPs, and Asynchrony. Chapter 15: Advanced WSDL Part II: Message Dispatch, Service Instance Identification, and Non-SOAP HTTP Binding. Chapter 16: Advanced WS-Policy Part I: Policy Centralization and Nested, Parameterized, and Ignorable Assertions. Chapter 17: Advanced WS-Policy Part II: Custom Policy Assertion Design, Runtime Representation, and Compatibility. Chapter 18: Advanced Message Design Part I: WS-Addressing Vocabularies. Chapter 19: Advanced Message Design Part II: WS-Addressing Rules and Design Techniques.
Part III: Service Contract Versioning. Chapter 20: Versioning Fundamentals. Chapter 21: Versioning WSDL Definitions. Chapter 22: Versioning Message Schemas. Chapter 23: Advanced Versioning.
Part IV: Appendices. Appendix A: Case Study Conclusion. Appendix B: A Comparison of Web Services and REST Services. Appendix C: How Technology Standards are Developed. Appendix D: Alphabetical Pseudo Schema Reference. Appendix E: SOA Design Patterns Related to This Book.
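Since much of the book revolves around XML Schema as part of a service contract, a tiny validation example may help ground the idea; this sketch uses the third-party lxml library and a made-up order schema, and is not drawn from the book.

```python
# Requires the third-party lxml package (pip install lxml).
from lxml import etree

XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int"/>
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

good = etree.fromstring(b"<order><id>42</id><quantity>3</quantity></order>")
bad = etree.fromstring(b"<order><id>42</id><quantity>0</quantity></order>")

print(schema.validate(good))   # True
print(schema.validate(bad))    # False: 0 is not a positiveInteger
```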

Journal ArticleDOI
01 Aug 2008
TL;DR: This paper proposes a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time, and introduces a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes.
Abstract: In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in an XML document and for recovering the state of the document as of any given time. We study the temporal constraints imposed by the data model, and present algorithms for validating a temporal XML document against these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal XML query language that extends XPath 2.0. In the second part of the paper, we present our approach for summarizing and indexing temporal XML documents. In particular we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we can dramatically increase query performance. To achieve this, we introduce a new class of summaries, denoted TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data structures. We give a query processing strategy based on TempIndex and a type of ancestor-descendant encoding, denoted temporal interval encoding. We present a persistent implementation of TempIndex, and a comparison against a system based on a non-temporal path index, and one based on DOM. Finally, we sketch a language for updates, and show that the cost of updating the index is compatible with real-world requirements.
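A simple way to picture the temporal model, though not the paper's exact representation, is to attach validity intervals to elements and recover a snapshot of the document as of a given time; the Python sketch below uses invented start/end attributes with an exclusive end.

```python
import xml.etree.ElementTree as ET

# Validity intervals as attributes; 'end' is exclusive, 9999 stands for "still current".
DOC = """
<employee start="2000" end="9999">
  <salary start="2000" end="2005">40000</salary>
  <salary start="2005" end="9999">55000</salary>
</employee>
"""

def snapshot(elem, t):
    """Return a copy of the element containing only nodes valid at time t."""
    start, end = int(elem.get("start", 0)), int(elem.get("end", 9999))
    if not (start <= t < end):
        return None
    copy = ET.Element(elem.tag)
    copy.text = elem.text
    for child in elem:
        kept = snapshot(child, t)
        if kept is not None:
            copy.append(kept)
    return copy

root = ET.fromstring(DOC)
print(ET.tostring(snapshot(root, 2003), encoding="unicode"))
# -> an employee element containing only the salary (40000) that was valid in 2003
```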

Journal ArticleDOI
01 Nov 2008
TL;DR: A proposal for the implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from object-oriented to SQL or from SQL to XML schema descriptions, is discussed.
Abstract: We discuss a proposal for the implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from object-oriented to SQL or from SQL to XML schema descriptions. The operator can be used to generate database wrappers (e.g., object-oriented or XML to relational), default user interfaces (e.g., relational to forms), or default database schemas from other representations. The approach translates schemas from one model to another, within a predefined, but large and extensible, set of models: given a source schema S expressed in a source model, and a target model TM, it generates a schema S′ expressed in TM that is "equivalent" to S. A wide family of models is handled by using a metamodel in which models can be succinctly and precisely described. The approach expresses the translation as Datalog rules and exposes the source and target of the translation in a generic relational dictionary. This makes the translation transparent, easy to customize and model-independent. The proposal includes automatic generation of translations as composition of basic steps.

Proceedings ArticleDOI
18 Aug 2008
TL;DR: This paper presents recent advances in an established treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotations, a toolkit for all kinds of automated data processing with support for cluster computing, and a work-in-progress database-driven search engine with a graphical user interface built into the tree editor.
Abstract: This paper presents recent advances in an established treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotations, a toolkit for all kinds of automated data processing with support for cluster computing, and a work-in-progress database-driven search engine with a graphical user interface built into the tree editor.


Journal ArticleDOI
TL;DR: The first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS), provides automatically generated annotations for PubMed/Medline abstracts and is intended to be used by biomedical researchers and database annotators, and in biomedical language processing.
Abstract: We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; http://bcms.bioinfo.cnio.es/). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations.

Patent
12 Nov 2008
TL;DR: A computer system and method for integrating legacy insurance policy underwriting are described, integrating an older legacy policy generating system with on-line rating systems where users access the system through browsers.
Abstract: The invention relates generally to a computer system and method for integrating insurance policy underwriting. In one aspect, it integrates an older legacy insurance policy generating system to on-line rating systems where users access the system through browsers. The computer system to perform the process of dynamically rating includes generating an input XML file of risk information that is sent to a web service and calculated in a calculation engine. The processed data is retrieved by the web service and transmitted as an XML file to a user interface that parses the rating information and displays the data.
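A minimal Python sketch of this flow might look as follows; the endpoint URL, field names, and the premium element in the reply are all hypothetical and not prescribed by the patent.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_risk_xml(driver_age, vehicle_year, zip_code):
    """Build an input XML file of risk information (field names are illustrative)."""
    risk = ET.Element("riskInformation")
    ET.SubElement(risk, "driverAge").text = str(driver_age)
    ET.SubElement(risk, "vehicleYear").text = str(vehicle_year)
    ET.SubElement(risk, "zipCode").text = zip_code
    return ET.tostring(risk)

def request_rating(payload, url="https://rating.example.com/rate"):  # hypothetical endpoint
    """POST the risk XML to the rating web service and parse the premium from the reply."""
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/xml"})
    with urllib.request.urlopen(req) as resp:
        reply = ET.fromstring(resp.read())
    return reply.findtext("premium")

payload = build_risk_xml(34, 2006, "10001")
# print(request_rating(payload))   # would contact the (hypothetical) rating service
```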

Proceedings ArticleDOI
10 May 2008
TL;DR: This paper develops an algorithm to construct XRGs and a novel family of data flow testing criteria to test WS-BPEL applications and proposes a data structure called XPath rewriting graph (XRG), which not only models how an XPath is conceptually rewritten but also tracks individual rewritings progressively.
Abstract: WS-BPEL applications are a kind of service-oriented application. They use XPath extensively to integrate loosely-coupled workflow steps. However, XPath may extract wrong data from the XML messages received, resulting in erroneous results in the integrated process. Surprisingly, although XPath plays a key role in workflow integration, inadequate research has been conducted to address the important issues in software testing. This paper tackles the problem. It also demonstrates a novel transformation strategy to construct artifacts. We use the mathematical definitions of XPath constructs as rewriting rules, and propose a data structure called XPath rewriting graph (XRG), which not only models how an XPath is conceptually rewritten but also tracks individual rewritings progressively. We treat the mathematical variables in the applied rewriting rules as if they were program variables, and use them to analyze how information may be rewritten in an XPath conceptually. We thus develop an algorithm to construct XRGs and a novel family of data flow testing criteria to test WS-BPEL applications. Experiment results show that our testing approach is promising.
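To see the kind of fault the proposed testing criteria target, consider how a subtly wrong XPath silently extracts the wrong data from a received message; the sketch below (using the third-party lxml library for full XPath 1.0 support) is illustrative only and does not reproduce the paper's XRG construction.

```python
# Requires the third-party lxml package for full XPath 1.0 support.
from lxml import etree

MESSAGE = etree.fromstring(
    b"<order>"
    b"<item><price currency='EUR'>99</price></item>"
    b"<item><price currency='USD'>10</price></item>"
    b"</order>")

# Intended extraction: the USD price among the order's items.
right = MESSAGE.xpath("//item/price[@currency='USD']/text()")
# Faulty variant: the predicate is missing, so whichever price comes first is taken.
wrong = MESSAGE.xpath("//item/price/text()")[:1]

print(right)  # ['10']
print(wrong)  # ['99'] -- wrong data flows silently into the rest of the process
```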