
Showing papers on "XML" published in 2005


Proceedings ArticleDOI
03 Jan 2005
TL;DR: This paper introduces a novel approach for declaring information-object-related access restrictions, based on a valid XML encoding, and shows how the access restrictions can be declared using XACML and XPath.
Abstract: Web Services, the new building blocks of today's Internet, provide the power to access distributed and heterogeneous information objects, which is the basis for more advanced uses such as electronic commerce. But access to these information objects is not always unrestricted: the owner of the information objects may control access for various reasons. This paper introduces a novel approach for declaring information-object-related access restrictions, based on a valid XML encoding. The paper shows how the access restrictions can be declared using XACML and XPath. Because the specified policies are 'fine grained', multiple policies can be applicable at once. If these policies declare positive and negative permissions for the same subject, policy inconsistencies exist. The paper also focuses on identifying the grounds of policy inconsistencies and how to resolve them.

731 citations
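
To make the setting concrete, here is a minimal Python sketch of XPath-targeted permit/deny policies and the inconsistency check the abstract describes. The policy-triple format is invented for illustration (real XACML policies are themselves XML documents); only the conflict-detection idea follows the paper.

```python
# Toy sketch: 'fine grained' permit/deny policies whose targets are
# XPath expressions over the protected document, plus a check for the
# inconsistency the paper studies (the same subject both permitted and
# denied on overlapping nodes). Policy format invented, not real XACML.
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<patients>"
    "<record id='1'><name>Ann</name><diagnosis>flu</diagnosis></record>"
    "<record id='2'><name>Bob</name><diagnosis>cold</diagnosis></record>"
    "</patients>"
)

POLICIES = [                      # (subject, xpath target, effect)
    ("nurse", ".//record", "Permit"),
    ("nurse", ".//diagnosis", "Deny"),
]

def selected(xpath):
    """Node set an XPath target covers, including descendants."""
    return {id(d) for n in DOC.findall(xpath) for d in n.iter()}

def conflicts(policies):
    """Subjects holding Permit and Deny over overlapping node sets."""
    found = []
    for i, (s1, x1, e1) in enumerate(policies):
        for s2, x2, e2 in policies[i + 1:]:
            if s1 == s2 and e1 != e2 and selected(x1) & selected(x2):
                found.append((s1, x1, e1, x2, e2))
    return found

print(conflicts(POLICIES))
# -> [('nurse', './/record', 'Permit', './/diagnosis', 'Deny')]
```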


01 Dec 2005
TL;DR: This document specifies Atom, an XML-based format for syndicating Web content and metadata.
Abstract: This document specifies Atom, an XML-based Web content and metadata syndication format. [STANDARDS-TRACK]

326 citations
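
As a hedged illustration of the format being specified, the following Python sketch builds and re-reads a minimal Atom document using only the standard library. The namespace and the id/title/updated elements come from the Atom specification (RFC 4287); the entry content is invented.

```python
# Build a minimal Atom 1.0 feed and read it back.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"      # namespace from RFC 4287
ET.register_namespace("", ATOM)

feed = ET.Element(f"{{{ATOM}}}feed")
ET.SubElement(feed, f"{{{ATOM}}}title").text = "Example Feed"
ET.SubElement(feed, f"{{{ATOM}}}id").text = "urn:example:feed"
ET.SubElement(feed, f"{{{ATOM}}}updated").text = "2005-12-01T00:00:00Z"
entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
ET.SubElement(entry, f"{{{ATOM}}}title").text = "First post"
ET.SubElement(entry, f"{{{ATOM}}}id").text = "urn:example:entry-1"
ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2005-12-01T00:00:00Z"

xml_bytes = ET.tostring(feed, xml_declaration=True, encoding="utf-8")
print(xml_bytes.decode())

# Reading it back: list entry titles.
parsed = ET.fromstring(xml_bytes)
for t in parsed.findall(f"{{{ATOM}}}entry/{{{ATOM}}}title"):
    print("entry:", t.text)
```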


Proceedings Article
30 Aug 2005
TL;DR: This paper designs a novel holistic twig join algorithm, called TJFast, which needs to access only the labels of the leaf query nodes to answer a twig query, and reports experimental results showing that the algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results, and query performance.
Abstract: Finding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process a twig query based on the region encoding labeling scheme. While region encoding supports efficient determination of the structural relationship between two elements, we observe that the information within a single label is very limited. In this paper, we propose a new labeling scheme, called extended Dewey. This is a powerful labeling scheme, since from the label of an element alone, we can derive all the element names along the path from the root to the element. Based on extended Dewey, we design a novel holistic twig join algorithm, called TJFast. Unlike all previous algorithms based on region encoding, TJFast only needs to access the labels of the leaf query nodes to answer a twig query. Through this, not only do we reduce disk access, but we also support the efficient evaluation of queries with wildcards in branching nodes, which is very difficult for algorithms based on region encoding. Finally, we report experimental results showing that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.

309 citations
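
The key property claimed for extended Dewey, deriving the full name path from a label alone, can be illustrated with a toy decoder. In the paper the possible child tags of each element type come from a schema-derived finite state transducer; here they are hard-coded invented lists, so this is only a sketch of the modulo idea:

```python
# Toy decoder for an extended-Dewey-style label: each label component,
# taken modulo the number of possible child tags of the current
# element, identifies the child's tag, so the root-to-element name
# path is recoverable from the label alone. Tag lists are invented.
CHILD_TAGS = {
    "dblp":    ["article", "inproceedings"],
    "article": ["author", "title", "year"],
}

def name_path(label, root_tag="dblp"):
    """Derive element names along the path from the root, given only
    the label: the property TJFast exploits."""
    path, tag = [root_tag], root_tag
    for component in label:
        options = CHILD_TAGS[tag]
        tag = options[component % len(options)]
        path.append(tag)
    return path

# The label (2, 4) decodes component by component:
#   2 % 2 == 0 -> "article";  4 % 3 == 1 -> "title"
print(name_path((2, 4)))   # ['dblp', 'article', 'title']
```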


Journal ArticleDOI
TL;DR: A look at how developers are going back to the future by building Web applications using Ajax (Asynchronous JavaScript and XML), a set of technologies mostly developed in the 1990s.
Abstract: This article looks at how developers are going back to the future by building Web applications using Ajax (Asynchronous JavaScript and XML), a set of technologies mostly developed in the 1990s. A key advantage of Ajax applications is that they look and act more like desktop applications. Proponents argue that Ajax applications perform better than traditional Web programs. For example, an Ajax application can add or retrieve new data for the page it is working with, and the page updates immediately without reloading.

300 citations


Proceedings ArticleDOI
Laura M. Haas, Mauricio A. Hernández, Howard Ho, Lucian Popa, Mary Roth
14 Jun 2005
TL;DR: The architecture and algorithms behind Clio are revisited, and some implementation issues, optimizations needed for scalability, and general lessons learned on the road toward creating an industrial-strength tool are discussed.
Abstract: Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into the technology behind some of IBM's mapping tools. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road toward creating an industrial-strength tool.

298 citations
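
A hedged sketch of the general idea, declarative correspondences applied mechanically to data, is below. The mapping format is invented; Clio itself compiles mappings into XQuery, XSLT, SQL, or SQL/XML rather than executing them directly:

```python
# Schematic of a declarative schema mapping applied to data: source
# columns map to target XML paths. The mapping vocabulary is invented
# for illustration only.
import xml.etree.ElementTree as ET

ROWS = [{"pid": 1, "pname": "Ann"}, {"pid": 2, "pname": "Bob"}]

# source column -> target element (relative to one <person> element)
MAPPING = {"pid": "id", "pname": "name"}

def apply_mapping(rows, mapping, collection="people", element="person"):
    root = ET.Element(collection)
    for row in rows:
        target = ET.SubElement(root, element)
        for src_col, tgt_elem in mapping.items():
            ET.SubElement(target, tgt_elem).text = str(row[src_col])
    return root

print(ET.tostring(apply_mapping(ROWS, MAPPING)).decode())
# <people><person><id>1</id><name>Ann</name></person>...</people>
```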


Journal Article
Wang Jun
TL;DR: This paper introduces the OAI Protocol for Metadata Harvesting (OAI-PMH), explains its main technical ideas, and discusses how to implement the protocol.

296 citations
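
For readers unfamiliar with the protocol, the following sketch issues an OAI-PMH ListRecords request and prints record identifiers. The verb and metadataPrefix parameters and the OAI namespace are part of the protocol; the repository URL is a placeholder.

```python
# Issue an OAI-PMH ListRecords request and read record headers.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE = "https://example.org/oai"   # hypothetical repository endpoint

def list_records(base_url, prefix="oai_dc"):
    query = urllib.parse.urlencode({"verb": "ListRecords",
                                    "metadataPrefix": prefix})
    with urllib.request.urlopen(f"{base_url}?{query}") as resp:
        tree = ET.parse(resp)
    for header in tree.iter(f"{OAI}header"):
        yield header.findtext(f"{OAI}identifier")

if __name__ == "__main__":
    for identifier in list_records(BASE):
        print(identifier)
```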


Journal ArticleDOI
TL;DR: The OME Data Model, expressed in Extensible Markup Language (XML) and realized in a traditional database, is both extensible and self-describing, allowing it to meet emerging imaging and analysis needs.
Abstract: The Open Microscopy Environment (OME) defines a data model and a software implementation to serve as an informatics framework for imaging in biological microscopy experiments, including representation of acquisition parameters, annotations and image analysis results. OME is designed to support high-content cell-based screening as well as traditional image analysis applications. The OME Data Model, expressed in Extensible Markup Language (XML) and realized in a traditional database, is both extensible and self-describing, allowing it to meet emerging imaging and analysis needs.

289 citations


Patent
10 Jun 2005
TL;DR: An intelligent document recognition-based document management system as discussed by the authors includes modules for image capture, image enhancement, image identification, optical character recognition (OCR), data extraction, and quality assurance.
Abstract: An intelligent document recognition-based document management system (Fig. 2) includes modules for image capture (32), image enhancement (32), image identification (34), optical character recognition (36), data extraction (37) and quality assurance (42). The system captures data from electronic documents as diverse as facsimile images, scanned images and images from document management systems. It processes these images and presents the data in, for example, a standard XML format. The document management system processes both structured document images (40) (ones which have a standard format) and unstructured document images (38) (ones which do not have a standard format). The system can extract images directly from a facsimile machine, a scanner or a document management system for processing.

233 citations


Book ChapterDOI
28 Aug 2005
TL;DR: A technique is presented that represents the tree structure of an XML document efficiently by “compressing” it, which makes it possible to execute queries directly without prior decompression.
Abstract: Implementations that load XML documents and give access to them via, e.g., the DOM suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that allows the tree structure of an XML document to be represented efficiently. The representation exploits the high regularity in XML documents by “compressing” their tree structure, that is, by detecting and removing repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This makes it possible to execute queries (and in particular, bulk operations) directly, without prior decompression. For certain tasks, like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.

225 citations
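
The core idea, sharing repeated tree patterns so the structure becomes a DAG, can be sketched in a few lines. This toy version hashes (tag, children) pairs and ignores text content; the paper's representation is considerably richer:

```python
# Minimal DAG compression of an XML tree: identical subtrees (same
# tag, same children) are stored once. Navigation still works on the
# compressed form because edges point to shared nodes.
import xml.etree.ElementTree as ET

def build_dag(elem, table):
    """Return a node id; identical (tag, children) subtrees share one."""
    key = (elem.tag, tuple(build_dag(c, table) for c in elem))
    if key not in table:
        table[key] = len(table)
    return table[key]

doc = ET.fromstring(
    "<library>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "</library>"
)
table = {}
root_id = build_dag(doc, table)
tree_size = sum(1 for _ in doc.iter())
print(f"tree nodes: {tree_size}, dag nodes: {len(table)}")
# tree nodes: 10, dag nodes: 4 -- the repeated <book> pattern is shared
```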


01 Jan 2005
TL;DR: This paper presents a mapping between the data model elements of XML and OWL, gives an account of its implementation within a ready-to-use XSLT framework, and evaluates it for common use cases.
Abstract: By now, XML has reached wide acceptance as a data exchange format in E-Business. Efficient collaboration between different participants in E-Business is thus only possible when business partners agree on a common syntax and have a common understanding of the basic concepts in the domain. XML covers the syntactic level but lacks support for efficient sharing of conceptualizations. The Web Ontology Language (OWL [Bec04]), in turn, supports the representation of domain knowledge using classes, properties and instances for use in a distributed environment such as the World Wide Web. In this paper we present a mapping between the data model elements of XML and OWL. We give an account of its implementation within a ready-to-use XSLT framework, as well as its evaluation for common use cases.

208 citations
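
A schematic of one direction of such a mapping, with element names becoming classes, elements individuals, and attributes datatype properties, is sketched below. The paper's actual rules are realized as an XSLT framework; the Turtle-like output here is ad hoc:

```python
# Schematic XML -> OWL mapping: element names become classes, elements
# become individuals, attributes become datatype property assertions.
import itertools
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<order id='42'><item sku='A7' qty='2'/><item sku='B9' qty='1'/></order>"
)
_ids = itertools.count(1)

def xml_to_owl(elem):
    ind = f"ex:{elem.tag}_{next(_ids)}"        # individual for the element
    print(f"ex:{elem.tag} a owl:Class .")
    print(f"{ind} a ex:{elem.tag} .")
    for attr, value in elem.attrib.items():    # attributes -> properties
        print(f'{ind} ex:{attr} "{value}" .')
    for child in elem:
        print(f"{ind} ex:hasChild {xml_to_owl(child)} .")
    return ind

xml_to_owl(doc)
```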


Patent
03 May 2005
TL;DR: A chart view as discussed by the authors is a component of a data viewer used to retrieve, manipulate, and view documents in the Reusable Data Markup Language (RDML) format, which facilitates the browsing and manipulation of numbers, as opposed to text as in HTML.
Abstract: Methods and systems provide a “chart view” for a markup language referred to as Reusable Data Markup Language (“RDML”). Generally, a chart view comprises the components necessary for automatically manipulating and displaying a graphical display of numerical data contained in RDML markup documents. RDML is a markup language, such as the Hypertext Markup Language (“HTML”) or the Extensible Markup Language (“XML”). Generally, RDML facilitates the browsing and manipulation of numbers, as opposed to text as in HTML, and does so by requiring attributes describing the meaning of the numbers to be attached to the numbers. Upon receiving RDML markup documents, the chart view transforms, formats, manipulates and displays data stored in the markup documents using the attributes describing the meaning of the data. The chart view uses the attributes of the numbers to, for example, facilitate the simultaneous display of different series of numbers of different types on a single chart and automatically display appropriate axis labels, axis titles, chart titles, number precision, etc. A chart view may be a component of a data viewer used to retrieve, manipulate, and view documents in the RDML format.
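
The patent's central idea, numbers carrying attributes that describe their meaning so that a viewer can label a chart automatically, can be sketched as follows. The element and attribute names are invented and are not actual RDML:

```python
# Numbers carry metadata describing their meaning, so a "chart view"
# can derive titles and axis labels without asking the user.
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<series title="Quarterly revenue" unit="USD millions" type="currency">
  <value period="2005-Q1">12.4</value>
  <value period="2005-Q2">13.1</value>
</series>
""")

print("chart title:", DOC.get("title"))
print("y-axis label:", DOC.get("unit"))
for v in DOC.findall("value"):
    print(f"  {v.get('period')}: {float(v.text):.1f} {DOC.get('unit')}")
```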

Proceedings ArticleDOI
14 Jun 2005
TL;DR: This paper develops a method to perform holistic twig pattern matching on XML documents partitioned using various streaming schemes; the method can process a large class of twig patterns consisting of both ancestor-descendant and parent-child relationships while avoiding redundant intermediate results.
Abstract: Searching for all occurrences of a twig pattern in an XML document is an important operation in XML query processing. Recently a holistic method, TwigStack [2], has been proposed. The method avoids generating large intermediate results which do not contribute to the final answer and is CPU and I/O optimal when twig patterns only have ancestor-descendant relationships. Another important direction of XML query processing is to build structural indexes [3][8][13][15] over XML documents to avoid unnecessary scanning of source documents. We regard XML structural indexing as a technique to partition XML documents and call it a streaming scheme in this paper. In this paper we develop a method to perform holistic twig pattern matching on XML documents partitioned using various streaming schemes. Our method avoids unnecessary scanning of irrelevant portions of XML documents. More importantly, depending on the different streaming schemes used, it can process a large class of twig patterns consisting of both ancestor-descendant and parent-child relationships and avoid generating redundant intermediate results. Our experiments demonstrate the applicability and the performance advantages of our approach.

Journal ArticleDOI
TL;DR: The technical contribution of the infrastructure is augmented by several research contributions: the first decomposition of an architecture description language into modules, insights about how to develop new language modules and a process for integrating them, and insights about the roles of different kinds of tools in a modular ADL-based infrastructure.
Abstract: Research over the past decade has revealed that modeling software architecture at the level of components and connectors is useful in a growing variety of contexts. This has led to the development of a plethora of notations for representing software architectures, each focusing on different aspects of the systems being modeled. In general, these notations have been developed without regard to reuse or extension. This makes the effort in adapting an existing notation to a new purpose commensurate with developing a new notation from scratch. To address this problem, we have developed an approach that allows for the rapid construction of new architecture description languages (ADLs). Our approach is unique because it encapsulates ADL features in modules that are composed to form ADLs. We achieve this by leveraging the extension mechanisms provided by XML and XML schemas. We have defined a set of generic, reusable ADL modules called xADL 2.0, useful as an ADL by itself, but also extensible to support new applications and domains. To support this extensibility, we have developed a set of reflective syntax-based tools that adapt to language changes automatically, as well as several semantically-aware tools that provide support for advanced features of xADL 2.0. We demonstrate the effectiveness, scalability, and flexibility of our approach through a diverse set of experiences. First, our approach has been applied in industrial contexts, modeling software architectures for aircraft software and spacecraft systems. Second, we show how xADL 2.0 can be extended to support the modeling features found in two different representations for modeling product-line architectures. Finally, we show how our infrastructure has been used to support its own development. The technical contribution of our infrastructure is augmented by several research contributions: the first decomposition of an architecture description language into modules, insights about how to develop new language modules and a process for integrating them, and insights about the roles of different kinds of tools in a modular ADL-based infrastructure.

Journal ArticleDOI
TL;DR: In this paper, the authors extend the syntax and semantics of RDF to cover named graphs, which enables RDF statements that describe graphs, useful in many Semantic Web application areas.

Patent
18 Nov 2005
TL;DR: A method and system for transforming an electronic document by learning transformation rules during training, using visual user feedback, and applying the learned rules to the original electronic document, to a second electronic document having a similar structure, or to all future instances of the original document.
Abstract: The present invention relates to a method and system for transforming an electronic document by learning transformation rules during training from the original electronic document using visual user feedback, and applying the learned transformation rules to the original electronic document, to a second electronic document having a similar structure as the original document, or to all future instances of the original electronic document. Accordingly, the transformed document is customized to the user's preference learned during training. Preferably, the transformed document is created in a queriable form. For example, the original electronic document can be defined in any type of mark-up language or electronic document generation language, such as Hypertext Markup Language (HTML), Extensible Markup Language (XML), Portable Document Format (PDF) or Microsoft® Word, and the like, and the transformed document is defined in a queriable language such as XML views and the like. For example, a virtual page can be a customization of an instance of a Web page which can be used to transform all future instances of the original Web page. Alternatively, the virtual page is formed from a customization of an original electronic document, such as a chapter in a book, which is applied to a second electronic document having a similar structure, such as all chapters in the book.

01 Jan 2005
TL;DR: A meta model for event logs is proposed that gives the requirements for the data that should be available, both informally and formally, and backs this up with an XML format called MXML and a tooling framework capable of reading MXML files.
Abstract: Modern process-aware information systems store detailed information about processes as they are being executed. This kind of information can be used for very different purposes. The term process mining refers to the techniques and tools to extract knowledge (e.g., in the form of models) from this. Several key players in this area have developed sophisticated process mining tools, such as Aris PPM and the HP Business Cockpit, that are capable of using the information available to generate meaningful insights. What most of these commercial process mining tools have in common is that installation and maintenance of the systems requires enormous effort and deep knowledge of the underlying information system. Moreover, information systems log events in different ways. Therefore, the interface between process-aware information systems and process mining tools is far from trivial, and it is vital to correctly map and interpret the event logs recorded by the underlying information systems. We therefore propose a meta model for event logs. We give the requirements for the data that should be available, both informally and formally. Furthermore, we back our meta model up with an XML format called MXML and a tooling framework that is capable of reading MXML files. Although the approach presented in this paper is very pragmatic, it can be seen as a first step towards an ontological analysis of process mining data.
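
A small example of reading such a log: the element layout below follows MXML as commonly described (process instances holding audit trail entries), while the log content itself is invented.

```python
# Read a simplified MXML-style event log.
import xml.etree.ElementTree as ET

LOG = """
<WorkflowLog>
  <Process id="order-handling">
    <ProcessInstance id="case-1">
      <AuditTrailEntry>
        <WorkflowModelElement>receive order</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2005-01-03T10:00:00</Timestamp>
        <Originator>alice</Originator>
      </AuditTrailEntry>
      <AuditTrailEntry>
        <WorkflowModelElement>ship order</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2005-01-03T15:30:00</Timestamp>
        <Originator>bob</Originator>
      </AuditTrailEntry>
    </ProcessInstance>
  </Process>
</WorkflowLog>
"""

root = ET.fromstring(LOG)
for case in root.iter("ProcessInstance"):
    print("case", case.get("id"))
    for entry in case.iter("AuditTrailEntry"):
        print("  ", entry.findtext("Timestamp"),
              entry.findtext("WorkflowModelElement"),
              entry.findtext("EventType"),
              "by", entry.findtext("Originator"))
```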

Patent
13 Apr 2005
TL;DR: Techniques for detecting, managing, and presenting syndication XML (feeds) are disclosed, in which a web browser automatically determines that a web site is publishing feeds and notifies the user, who can then access the feed easily.
Abstract: Techniques for detecting, managing, and presenting syndication XML (feeds) are disclosed. In one embodiment, a web browser automatically determines that a web site is publishing feeds and notifies the user, who can then access the feed easily. In another embodiment, a browser determines that a web page or feed is advertising relationship XML, and displays information about the people identified in the relationship XML. In yet another embodiment, a browser determines that a file contains a feed and enables the user to view it in a user-friendly way. In yet another embodiment, feed state information is stored in a repository that is accessible by applications that are used to view the feed. In yet another embodiment, if a feed's state changes, an application notifies the repository, and the state is updated. In yet another embodiment, a feed is parsed and stored in a structured way.
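
The first embodiment, a browser detecting that a page advertises feeds, is conventionally done via link autodiscovery tags; a minimal sketch (with an invented page):

```python
# Feed autodiscovery: scan a page's <link> tags for advertised
# Atom/RSS feeds. The rel/type convention is the standard one.
from html.parser import HTMLParser

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}

class FeedFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in FEED_TYPES):
            self.feeds.append(a.get("href"))

page = """<html><head>
<link rel="alternate" type="application/atom+xml" href="/feed.atom">
</head><body>...</body></html>"""

finder = FeedFinder()
finder.feed(page)
print("feeds advertised by this page:", finder.feeds)
```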

Journal ArticleDOI
TL;DR: A practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions, which can exploit the property that type expressions being compared often share portions of their representations.
Abstract: We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be EXPTIME-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.
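
As a hedged illustration of the algorithmic style, checking inclusion by a memoized top-down traversal of the expressions themselves rather than by determinizing automata, here is the analogous procedure for ordinary string regular expressions using Brzozowski derivatives. The paper's algorithm works on regular expression types over XML trees, which this simplification deliberately sidesteps.

```python
# Language inclusion L(r) <= L(s) by traversing the expressions
# top-down via derivatives, memoizing visited pairs.
EMPTY, EPS = ("empty",), ("eps",)

def sym(c): return ("sym", c)

def cat(r, s):
    if EMPTY in (r, s): return EMPTY
    if r == EPS: return s
    if s == EPS: return r
    return ("cat", r, s)

def alt(r, s):
    if r == EMPTY: return s
    if s == EMPTY or r == s: return r
    return ("alt", *sorted((r, s)))   # canonical order: a|b == b|a

def star(r):
    if r in (EMPTY, EPS): return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    op = r[0]
    if op in ("eps", "star"): return True
    if op in ("empty", "sym"): return False
    if op == "cat": return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])          # alt

def deriv(r, c):
    """Brzozowski derivative: words of L(r) starting with c, minus c."""
    op = r[0]
    if op in ("eps", "empty"): return EMPTY
    if op == "sym": return EPS if r[1] == c else EMPTY
    if op == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if op == "star": return cat(deriv(r[1], c), r)
    d = cat(deriv(r[1], c), r[2])                    # cat
    return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def included(r, s, alphabet, seen=None):
    """Check L(r) <= L(s) without determinizing automata."""
    seen = set() if seen is None else seen
    if (r, s) in seen: return True
    seen.add((r, s))
    if nullable(r) and not nullable(s): return False
    return all(included(deriv(r, c), deriv(s, c), alphabet, seen)
               for c in alphabet)

a, b = sym("a"), sym("b")
print(included(star(a), star(alt(a, b)), "ab"))   # True:  a* <= (a|b)*
print(included(star(alt(a, b)), star(a), "ab"))   # False
```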

Journal ArticleDOI
TL;DR: An agent-based framework for providing proactive services in domotic environments is presented, together with an agent architecture that adopts interoperability techniques for an adaptive domotic framework.
Abstract: The evolution of the microprocessor industry, combined with falling cost and increasing efficiency, gives rise to new scenarios for ubiquitous computing in which humans seamlessly trigger activities and tasks using unusual (often imperceptible) interfaces, according to physical space and context. Many problems must be faced: adaptivity, hybrid control strategies, system (hardware) integration, and ubiquitous networking access. In this paper, a solution that attempts to provide a flexible and dependable answer to these complicated problems is illustrated. First, an extensible markup language (XML)-derived technology is proposed to define the fuzzy markup language (FML), a markup language suited to defining the detailed structure of fuzzy control independently of its legacy representation. FML is essentially composed of three layers: 1) XML, in order to create a new markup language for fuzzy logic control; 2) a document type definition, in order to define the legal building blocks; and 3) extensible stylesheet language transformations, in order to convert a fuzzy controller description into a specific programming language. Then an agent-based framework designed for providing proactive services in domotic environments is presented. The agent architecture, exploiting mobile computation, is able to maximize fuzzy control deployment for the native FML representation by performing an efficient distribution of pieces of the global control flow over the different computers. Agents are also used to capture user habits, to identify requests, and to apply the artefact-mediated activity through an adaptive fuzzy control strategy. The architecture adopts interoperability techniques that, combined with sophisticated control facilities, make for an effective adaptive domotic framework.
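
A toy flavor of the layered idea, fuzzy control structure carried as XML and turned into executable logic, is sketched below. The element names are invented for illustration and do not follow the actual FML schema.

```python
# An XML-encoded fuzzy rule plus a toy membership function; the rule
# structure is invented, not real FML.
import xml.etree.ElementTree as ET

RULE = ET.fromstring("""
<fuzzyRule id="comfort-1">
  <antecedent variable="temperature" term="cold"/>
  <consequent variable="heating" term="high"/>
</fuzzyRule>
""")

# toy membership functions, keyed by (variable, term)
MEMBERSHIP = {
    ("temperature", "cold"): lambda t: max(0.0, min(1.0, (18 - t) / 8)),
}

def fire(rule, inputs):
    """Degree to which the rule's antecedent holds for crisp inputs."""
    a = rule.find("antecedent")
    mu = MEMBERSHIP[(a.get("variable"), a.get("term"))]
    return mu(inputs[a.get("variable")])

print(fire(RULE, {"temperature": 12.0}))   # 0.75 -> 'heating high' at 0.75
```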

Journal ArticleDOI
01 Apr 2005
TL;DR: The TurboXPath path processor is proposed, which accepts a language equivalent to a subset of the for-let-where constructs of XQuery over a single document, and can be extended to provide full XQuery support or used to augment federated database engines for efficient handling of queries over XML data streams produced by external sources.
Abstract: Efficient querying of XML streams will be one of the fundamental features of next-generation information systems. In this paper we propose the TurboXPath path processor, which accepts a language equivalent to a subset of the for-let-where constructs of XQuery over a single document. TurboXPath can be extended to provide full XQuery support or used to augment federated database engines for efficient handling of queries over XML data streams produced by external sources. Internally, TurboXPath uses a tree-shaped path expression with multiple outputs to drive the execution. The result of a query execution is a sequence of tuples of XML fragments matching the output nodes. Based on a streamed execution model, TurboXPath scales up to large documents and has limited memory consumption for increased concurrency. Experimental evaluation of a prototype demonstrates performance gains compared to other state-of-the-art path processors.
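
In the same spirit, though far simpler than TurboXPath's tree-shaped expressions with multiple outputs, here is a streamed evaluator for one linear child path with bounded memory:

```python
# Streamed path evaluation: consume parse events, never materializing
# the full tree. The path language is just '/'-separated child steps.
import io
import xml.etree.ElementTree as ET

def stream_match(source, path):
    """Yield the text of elements reached by the given child path."""
    steps = path.split("/")
    stack = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == steps:
                yield elem.text
            stack.pop()
            elem.clear()      # drop processed content: bounded memory

doc = io.BytesIO(
    b"<bib><book><title>XML in 2005</title></book>"
    b"<book><title>Streams</title></book></bib>"
)
print(list(stream_match(doc, "bib/book/title")))
# ['XML in 2005', 'Streams']
```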

Proceedings ArticleDOI
14 Jun 2005
TL;DR: It is shown that NaLIX, while far from being able to pass the Turing test, is perfectly usable in practice, and able to handle even quite complex queries in a variety of application domains.
Abstract: Database query languages can be intimidating to the non-expert, leading to the immense recent popularity of keyword-based search in spite of its significant limitations. The holy grail has been the development of a natural language query interface. We present NaLIX, a generic interactive natural language query interface to an XML database. Our system can accept an arbitrary English language sentence as query input, which can include aggregation, nesting, and value joins, among other things. This query is translated, potentially after reformulation, into an XQuery expression that can be evaluated against an XML database. The translation is done by mapping the grammatical proximity of natural language parsed tokens to the proximity of corresponding elements in the result XML. In this demonstration, we show that NaLIX, while far from being able to pass the Turing test, is perfectly usable in practice, and able to handle even quite complex queries in a variety of application domains. In addition, we also demonstrate how carefully designed features in NaLIX facilitate the interactive query process and improve the usability of the interface.

Proceedings ArticleDOI
12 Oct 2005
TL;DR: This work proposes Program Trace Query Language (PTQL), a language based on relational queries over program traces, in which programmers can write expressive, declarative queries about program behavior, and describes the compiler, Partiqle, which instruments the program to execute the query on-line.
Abstract: Instrumenting programs with code to monitor runtime behavior is a common technique for profiling and debugging. In practice, instrumentation is either inserted manually by programmers, or automatically by specialized tools that monitor particular properties. We propose Program Trace Query Language (PTQL), a language based on relational queries over program traces, in which programmers can write expressive, declarative queries about program behavior. We also describe our compiler, Partiqle. Given a PTQL query and a Java program, Partiqle instruments the program to execute the query on-line. We apply several PTQL queries to a set of benchmark programs, including the Apache Tomcat Web server. Our queries reveal significant performance bugs in the jack SpecJVM98 benchmark, in Tomcat, and in the IBM Java class library, as well as some correct though uncomfortably subtle code in the Xerces XML parser. We present performance measurements demonstrating that our prototype system has usable performance.
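
PTQL itself targets Java and compiles queries into on-line instrumentation; as a conceptual Python analogue only, the sketch below records trace events and answers a relational-style query over them after the fact.

```python
# Record runtime events as rows, then run a relational-style query
# over the trace. A conceptual analogue of PTQL, not PTQL itself.
import sys
from collections import Counter

TRACE = []   # rows: (event, function, line)

def tracer(frame, event, arg):
    if event in ("call", "return"):
        TRACE.append((event, frame.f_code.co_name, frame.f_lineno))
    return tracer

def workload():
    def helper(n):
        return n * 2
    return sum(helper(i) for i in range(3))

sys.settrace(tracer)
workload()
sys.settrace(None)

# "SELECT function, COUNT(*) FROM trace WHERE event='call' GROUP BY function"
calls = Counter(fn for ev, fn, _ in TRACE if ev == "call")
print(calls)
```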

Journal ArticleDOI
12 Jun 2005
TL;DR: From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources.
Abstract: PADS is a declarative data description language that allows data analysts to describe both the physical layout of ad hoc data sources and semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation while flexible enough to describe most of the ASCII, binary, and Cobol formats that we have seen in practice. The generated parsing library provides for robust, application-specific error handling.
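
The flavor of the approach, a declarative description from which a parser is derived mechanically, can be suggested in miniature. The description format below is invented; PADS descriptions are far richer and the compiler generates C libraries and tools:

```python
# A declarative description of an ad hoc record layout, from which a
# parser is derived mechanically.
FIELDS = [            # name, converter -- one line of a log file
    ("ip", str),
    ("status", int),
    ("bytes", int),
]

def make_parser(fields, sep=" "):
    def parse(line):
        parts = line.rstrip("\n").split(sep)
        if len(parts) != len(fields):
            raise ValueError(f"expected {len(fields)} fields: {line!r}")
        return {name: conv(p) for (name, conv), p in zip(fields, parts)}
    return parse

parse = make_parser(FIELDS)
print(parse("10.0.0.1 200 5120"))
# -> {'ip': '10.0.0.1', 'status': 200, 'bytes': 5120}
```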

Journal ArticleDOI
TL;DR: The correspondences between the PDB dictionary and the XML schema metadata are described as well as the XML representations of PDB dictionaries and data files.
Abstract: Summary: The Protein Data Bank (PDB) has recently released versions of the PDB Exchange dictionary and the PDB archival data files in XML format, collectively named PDBML. The automated generation of these XML files is driven by the data dictionary infrastructure in use at the PDB. The correspondences between the PDB dictionary and the XML schema metadata are described, as well as the XML representations of PDB dictionaries and data files. Availability: The current software-translated XML schema file is located at http://deposit.pdb.org/pdbML/pdbx-v1.000.xsd, and on the PDB mmCIF resource page at http://deposit.pdb.org/mmcif/. PDBML files are stored on the PDB beta ftp site at ftp://beta.rcsb.org/pub/pdb/uniformity/data/XML. Contact: jwest@rcsb.rutgers.edu

Patent
02 Feb 2005
TL;DR: In this paper, a power management architecture for an electrical power distribution system, or portion thereof, is disclosed, which includes multiple electronic devices distributed throughout the power distribution systems to manage the flow and consumption of power from the system using real-time communications.
Abstract: A power management architecture for an electrical power distribution system, or portion thereof, is disclosed The architecture includes multiple electronic devices distributed throughout the power distribution system to manage the flow and consumption of power from the system using real time communications Power management application software and/or hardware components operate on the electronic devices and the back-end servers and inter-operate via the network to implement a power management application The architecture provides a scalable and cost effective framework of hardware and software upon which such power management applications can operate to manage the distribution and consumption of electrical power by one or more utilities/suppliers and/or customers which provide and utilize the power distribution system Autonomous communication on the network between IED's, back-end servers and other entities coupled with secure networks, themselves interconnected, via firewalls, by one or more unsecure networks, is facilitated by the use of an XML firewall using SOAP SOAP allows a device to communicate without knowledge of how the sender's system operates or data formats are organized
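
For context, the SOAP messaging the patent relies on wraps payloads in a standard XML envelope. The envelope structure and namespace below are standard SOAP 1.1; the meter-reading payload is invented:

```python
# Build a minimal SOAP 1.1 envelope with the standard library.
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP)

envelope = ET.Element(f"{{{SOAP}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
reading = ET.SubElement(body, "MeterReading")   # hypothetical payload
ET.SubElement(reading, "deviceId").text = "IED-17"
ET.SubElement(reading, "kWh").text = "1234.5"

print(ET.tostring(envelope, encoding="unicode"))
```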

Journal ArticleDOI
TL;DR: This paper investigates the challenges of providing effective mechanisms for enforcing enterprise policy across distributed domains, ensuring secure content-based access to enterprise resources at all user levels, and allowing the specification of temporal and nontemporal context conditions to support fine-grained dynamic access control.
Abstract: Modern day enterprises exhibit a growing trend toward adoption of enterprise computing services for efficient resource utilization, scalability, and flexibility. These environments are characterized by heterogeneous, distributed computing systems exchanging enormous volumes of time-critical data with varying levels of access control in a dynamic business environment. The enterprises are thus faced with significant challenges as they endeavor to achieve their primary goals, and simultaneously ensure enterprise-wide secure interoperation among the various collaborating entities. Key among these challenges are providing effective mechanisms for enforcing enterprise policy across distributed domains, ensuring secure content-based access to enterprise resources at all user levels, and allowing the specification of temporal and nontemporal context conditions to support fine-grained dynamic access control. In this paper, we investigate these challenges, and present X-GTRBAC, an XML-based GTRBAC policy specification language and its implementation for enforcing enterprise-wide access control. Our specification language is based on the GTRBAC model that incorporates the content- and context-aware dynamic access control requirements of an enterprise. An X-GTRBAC system has been implemented as a Java application. We discuss the salient features of the specification language, and present the software architecture of our system. A comprehensive example is included to discuss and motivate the applicability of the X-GTRBAC framework to a generic enterprise environment. An application level interface for implementing the policy in the X-GTRBAC system is also provided to consolidate the ideas presented in the paper.

Proceedings Article
30 Aug 2005
TL;DR: This work proposes novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations and proposes efficient data structures in order to speed up ranked query processing.
Abstract: XML repositories are usually queried both on structure and content. Due to the structural heterogeneity of XML, queries are often interpreted approximately and their answers are returned ranked by scores. Computing answer scores in XML is an active area of research that oscillates between pure content scoring, such as the well-known tf*idf, and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations. Twig scoring accounts for the most structure and content and is thus used as our reference method. Path scoring is an approximation that loosens correlations between query nodes, hence reducing the amount of time required to manipulate scores during top-k query processing. We propose efficient data structures in order to speed up ranked query processing. We run extensive experiments that validate our scoring methods and that show that path scoring provides very high precision while improving score computation time.
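
A schematic (and deliberately simplified) illustration of combining content and structure scores, with exact structural matches weighted above relaxed ones, is given below; the weights and formula are invented, not the paper's:

```python
# tf*idf-style content score combined with a structure factor: exact
# path matches outrank relaxed (same leaf tag) matches.
import math

# (path, text) pairs standing in for indexed XML elements
ELEMENTS = [
    ("/book/title", "xml ranking"),
    ("/book/chapter/title", "xml basics"),
    ("/article/title", "databases"),
]

def score(query_path, term):
    n = len(ELEMENTS)
    df = sum(1 for _, text in ELEMENTS if term in text.split())
    idf = math.log((n + 1) / (df + 1))
    results = []
    for path, text in ELEMENTS:
        tf = text.split().count(term)
        if tf == 0:
            continue
        if path == query_path:
            structure = 1.0          # exact structural match
        elif path.endswith(query_path.rsplit("/", 1)[-1]):
            structure = 0.5          # relaxed match: same leaf tag
        else:
            continue
        results.append((path, tf * idf * structure))
    return sorted(results, key=lambda r: -r[1])

print(score("/book/title", "xml"))
```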

Proceedings ArticleDOI
14 Jun 2005
TL;DR: The overall architecture and design aspects of a hybrid relational and XML database system called System RX are described, which is the first truly hybrid system that comingles XML and relational data, giving them equal footing.
Abstract: This paper describes the overall architecture and design aspects of a hybrid relational and XML database system called System RX. We believe that such a system is fundamental in the evolution of enterprise data management solutions: XML and relational data will co-exist and complement each other in enterprise solutions. Furthermore, a successful XML repository requires much of the same infrastructure that already exists in a relational database management system. Finally, XML query languages have considerable conceptual and functional overlap with relational dataflow engines. System RX is the first truly hybrid system that comingles XML and relational data, giving them equal footing. The new support for XML includes native support for storage and indexing as well as query compilation and evaluation support for the latest industry-standard query languages, SQL/XML and XQuery. By building a hybrid system, we leverage more than 20 years of data management research to advance XML technology to the same standards expected from mature relational systems.

Journal ArticleDOI
TL;DR: For multiple data systems to cooperate with each other, they must understand each other’s schemas; without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel.
Abstract: When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies—or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas. Without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel.

Proceedings Article
30 Aug 2005
TL;DR: TopX as discussed by the authors is a top-k query engine for XML documents with a focus on inexpensive sequential access to index lists and only a few judiciously scheduled random accesses.
Abstract: This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying the existing top-k algorithms to XML data lie in 1) the need to consider scores for XML elements while aggregating them at the document level, 2) the combination of vague content conditions with XML path conditions, 3) the need to relax query conditions if too few results satisfy all conditions, and 4) the selectivity estimation for both content and structure conditions and their impact on evaluation strategies. TopX addresses these issues by precomputing score and path information in an appropriately designed index structure, by largely avoiding or postponing the evaluation of expensive path conditions so as to preserve the sequential access pattern on index lists, and by selectively scheduling random accesses when they are cost-beneficial. In addition, TopX can compute approximate top-k results using probabilistic score estimators, thus speeding up queries with a small and controllable loss in retrieval precision.
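
The skeleton TopX builds on, threshold-style top-k processing over score-sorted index lists, can be sketched compactly. The lists are invented, and this eager version performs random accesses immediately, whereas TopX's point is to schedule only a few of them and to add XML path conditions and probabilistic pruning:

```python
# Threshold algorithm: consume score-sorted index lists, complete each
# newly seen document's score by random accesses, stop once the k-th
# best score reaches the threshold for unseen documents.
import heapq

LISTS = {   # one score-sorted index list per query condition
    "xml":     [("d1", 0.9), ("d3", 0.8), ("d2", 0.1)],
    "ranking": [("d3", 0.9), ("d2", 0.7), ("d1", 0.2)],
}

def full_score(doc, lists):
    """Random accesses: complete a document's aggregate score."""
    return sum(dict(lst).get(doc, 0.0) for lst in lists.values())

def top_k(lists, k):
    scores, pos = {}, {term: 0 for term in lists}
    while True:
        advanced = False
        for term, lst in lists.items():            # sorted accesses
            if pos[term] < len(lst):
                doc, _ = lst[pos[term]]
                pos[term] += 1
                advanced = True
                if doc not in scores:
                    scores[doc] = full_score(doc, lists)
        # best possible score of any document not yet seen
        threshold = sum(lst[pos[term]][1] if pos[term] < len(lst) else 0.0
                        for term, lst in lists.items())
        best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if not advanced or (len(best) == k and best[-1][1] >= threshold):
            return best

print(top_k(LISTS, k=2))   # [('d3', 1.7), ('d1', 1.1)]
```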