Showing papers on "XML published in 2007"

PDF

Open Access

The Intrusion Detection Message Exchange Format (IDMEF)

[...]

Hervé Debar, David A. Curry, Benjamin S. Feinstein

01 Mar 2007

TL;DR: A data model to represent information exported by intrusion detection systems and the rationale for using this model is explained and an implementation of the data model in the Extensible Markup Language (XML) is presented.

...read moreread less

Abstract: The purpose of the Intrusion Detection Message Exchange Format (IDMEF) is to define data formats and exchange procedures for sharing information of interest to intrusion detection and response systems and to the management systems that may need to interact with them. This document describes a data model to represent information exported by intrusion detection systems and explains the rationale for using this model. An implementation of the data model in the Extensible Markup Language (XML) is presented, an XML Document Type Definition is developed, and examples are provided. This memo defines an Experimental Protocol for the Internet community.

...read moreread less

378 citations

Proceedings Article•DOI•

Identifying meaningful return information for XML keyword search

[...]

Ziyang Liu¹, Yi Chen¹•Institutions (1)

Arizona State University¹

11 Jun 2007

TL;DR: This work presents an XML keyword search engine, XSeek, to infer the semantics of the search and identify return nodes effectively and demonstrates the effectiveness of this engine.

...read moreread less

Abstract: Keyword search enables web users to easily access XML data without the need to learn a structured query language and to study possibly complex data schemas. Existing work has addressed the problem of selecting qualified data nodes that match keywords and connecting them in a meaningful way, in the spirit of inferring a where clause in XQuery. However, how to infer the return clause for keyword search is an open problem. To address this challenge, we present an XML keyword search engine, XSeek, to infer the semantics of the search and identify return nodes effectively. XSeek recognizes possible entities and attributes inherently represented in the data. It also distinguishes between search predicates and return specifications in the keywords. Then based on the analysis of both XML data structures and keyword match patterns, XSeek generates return nodes. Extensive experimental studies show the effectiveness of XSeek.

...read moreread less

278 citations

Journal Article•DOI•

ORegAnno: an open-access community-driven resource for regulatory annotation.

[...]

Obi L. Griffith, Stephen B. Montgomery¹, Bridget Bernier², Bryan Chu², Katayoon Kasaian², Stein Aerts³, Shaun Mahony⁴, Monica C. Sleumer², Mikhail Bilenky², Maximilian Haeussler⁵, Malachi Griffith², Steven M. Gallo⁶, Belinda Giardine⁷, Bart Hooghe⁸, Peter Van Loo³, Enrique Blanco, Amy Ticoll², Stuart Lithwick², Elodie Portales-Casamar², Ian J. Donaldson⁹, Gordon Robertson², Claes Wadelius¹⁰, Pieter De Bleser⁸, D. Vlieghe⁸, Marc S. Halfon⁶, Wyeth W. Wasserman², Ross C. Hardison, Casey M. Bergman⁹, Steven J.M. Jones² - Show less +25 more•Institutions (10)

Wellcome Trust Sanger Institute¹, University of British Columbia², Katholieke Universiteit Leuven³, Penn State Cancer Institute⁴, Centre national de la recherche scientifique⁵, University at Buffalo⁶, Pennsylvania State University⁷, Ghent University⁸, University of Manchester⁹, Uppsala University¹⁰

15 Nov 2007-Nucleic Acids Research

TL;DR: The current release of ORegAnno comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species.

...read moreread less

Abstract: ORegAnno is an open-source, open-access database and literature curation system for communitybased annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the ‘publication queue’ allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54351 identified by text-mining methods. Users can enter or ‘check out’ papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org.

...read moreread less

262 citations

Patent•

Querying markup language data sources using a relational query processor

[...]

Martin Breining¹, Vanja Josifovski¹, Peter Schwarz¹•Institutions (1)

IBM¹

27 Jun 2007

TL;DR: An XML wrapper as discussed by the authors queries an XML document in an on-the-fly manner so that only parent nodes in the document that satisfy the query are extracted and then unnested The parent nodes and associated descendent nodes are located using XPath expressions contained as options in data definition language (DDL) statements.

...read moreread less

Abstract: An XML wrapper queries an XML document in an on-the-fly manner so that only parent nodes in the document that satisfy the query are extracted and then unnested The parent nodes and associated descendent nodes are located using XPath expressions contained as options in data definition language (DDL) statements The parent nodes satisfying the query and associated descendent nodes are extracted and stored outside of a database according to a relational schema The wrapper facilitates applications that use convention SQL queries and views to operate on that information stored according to the relational schema The wrapper also responds to query optimizer requests for costs associated with queries against external data sources associated with the wrapper

...read moreread less

244 citations

Journal Article•

Provenance in Scientific Workflow Systems

[...]

Susan B. Davidson¹, Sarah Cohen Boulakia¹, Anat Eyal, Bertram Ludäscher, Timothy M. McPhillips, Shawn Bowers, Manish Kumar Anand, Juliana Freire - Show less +4 more•Institutions (1)

Microsoft¹

01 Mar 2007-IEEE Data(base) Engineering Bulletin

TL;DR: This paper explores the implementation of SemEQUAL using OrdPath, a positional representation for nodes of a hierarchy that is used successfully for supporting XML documents in relational systems, and proposes the use of OrdPath to represent position within the Wordnet hierarchy, leveraging its ability to compute transitive closures efficiently.

...read moreread less

Abstract: The volume of information in natural languages in electronic format is increasing exponentially. The demographics of users of information management systems are becoming increasingly multilingual. Together these trends create a requirement for information management systems to support processing of information in multiple natural languages seamlessly. Database systems, the backbones of information management, should support this requirement effectively and efficiently. Earlier research in this area had proposed multilingual operators [7, 8] for relational database systems, and discussed their implementation using existing database features. In this paper, we specifically focus on the SemEQUAL operator [8], implementing a multilingual semantic matching predicate using WordNet [12]. We explore the implementation of SemEQUAL using OrdPath [10], a positional representation for nodes of a hierarchy that is used successfully for supporting XML documents in relational systems. We propose the use of OrdPath to represent position within the Wordnet hierarchy, leveraging its ability to compute transitive closures efficiently. We show theoretically that an implementation using OrdPath will outperform those implementations proposed previously. Our initial experimental results confirm this analysis, and show that the OrdPath implementation performs significantly better. Further, since our technique is not specifically rooted to linguistic hierarchies, the same approach may benefit other applications that utilize alternative hierarchical ontologies.

...read moreread less

204 citations

Proceedings Article•DOI•

Effective keyword search for valuable lcas over xml documents

[...]

Guoliang Li¹, Jianhua Feng¹, Jianyong Wang¹, Lizhu Zhou¹•Institutions (1)

Tsinghua University¹

06 Nov 2007

TL;DR: This paper introduces the notion of Valuable Lowest Common Ancestor (VLCA) to accurately and effectively answer keyword queries over XML documents and proposes the concept of Compact VLCA (CVLCA), to efficiently compute CVLCAs.

...read moreread less

Abstract: In this paper, we study the problem of effective keyword search over XML documents. We begin by introducing the notion of Valuable Lowest Common Ancestor (VLCA) to accurately and effectively answer keyword queries over XML documents. We then propose the concept of Compact VLCA (CVLCA) and compute the meaningful compact connected trees rooted as CVLCAs as the answers of keyword queries. To efficiently compute CVLCAs, we devise an effective optimization strategy for speeding up the computation, and exploit the key properties of CVLCA in the design of the stack-based algorithm for answering keyword queries. We have conducted an extensive experimental study and the experimental results show that our proposed approach achieves both high efficiency and effectiveness when compared with existing proposals.

...read moreread less

200 citations

Journal Article•DOI•

Efficiently Querying Large XML Data Repositories: A Survey

[...]

Gang Gou¹, Rada Chirkova¹•Institutions (1)

North Carolina State University¹

01 Oct 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This survey considers two classes of major XML query processing techniques: the relational approach and the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.

...read moreread less

Abstract: Extensible markup language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey, we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, whereas in the native approach, specialized storage and query processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query processing performance and also significantly reduce system reengineering costs.

...read moreread less

191 citations

Proceedings Article•DOI•

Multiway SLCA-based keyword search in XML data

[...]

Chong Sun¹, Chee-Yong Chan¹, Amit K. Goenka¹•Institutions (1)

National University of Singapore¹

08 May 2007

TL;DR: This paper first analyzes properties of the LCA computation and proposes improved algorithms to solve the traditional keyword search problem (with only AND semantics), and extends this approach to handle general keyword search involving combinations of AND and OR boolean operators.

...read moreread less

Abstract: Keyword search for smallest lowest common ancestors (SLCAs)in XML data has recently been proposed as a meaningful way to identify interesting data nodes inXML data where their subtrees contain an input set of keywords. In this paper, we generalize this useful search paradigm to support keyword search beyond the traditional AND semantics to include both AND and OR boolean operators as well. We first analyze properties of the LCA computation and propose improved algorithms to solve the traditional keyword search problem (with only AND semantics). We then extend our approach to handle general keyword search involving combinations of AND and OR boolean operators. The effectiveness of our new algorithms is demonstrated with a comprehensive experimental performance study.

...read moreread less

190 citations

Patent•

Certificate-based search

[...]

James Moore

17 May 2007

TL;DR: The systems and methods described in this paper provide authentication of content sources and metadata sources so that downstream users of syndicated content can rely on these attributes when searching, citing, and/or redistributing content.

...read moreread less

Abstract: The systems and methods disclosed herein provide for authentication of content sources and/or metadata sources so that downstream users of syndicated content can rely on these attributes when searching, citing, and/or redistributing content. To further improve the granularity and reusability of content, globally unique identifiers may be assigned to fragments of each document. This may be particularly useful for indexing documents that contain XML grammar with functional aspects, where atomic functional components can be individually indexed and referenced independent from a document in which they are contained.

...read moreread less

172 citations

Proceedings Article•DOI•

A framework for rapid integration of presentation components

[...]

Jin Yu¹, Boualem Benatallah¹, Régis Saint-Paul¹, Fabio Casati², Florian Daniel, Maristella Matera - Show less +2 more•Institutions (2)

University of New South Wales¹, University of Trento²

08 May 2007

TL;DR: This paper proposes a framework for the integration of stand-alone modules or applications, where integration occurs at the presentation layer, and provides an abstract component model to specify characteristics and behaviors of presentation components and an event-based composition model to specifying the composition logic.

...read moreread less

Abstract: The development of user interfaces (UIs) is one of the most time-consuming aspects in software development. In this context, the lack of proper reuse mechanisms for UIs is increasingly becoming manifest, especially as software development is more and more moving toward composite applications. In this paper we propose a framework for the integration of stand-alone modules or applications, where integration occurs at the presentation layer. Hence, the final goal is to reduce the effort required for UI development by maximizing reus. The design of the framework is inspired by lessons learned from application integration, appropriately modified to account for the specificity of the UI integration problem. We provide an abstract component model to specify characteristics and behaviors of presentation components and propose an event-based composition model to specify the composition logic. Components and composition are described by means of a simple XML-based language, which is interpreted by a runtime middleware for the execution of the resulting composite application. A proof-of-concept prototype allows us to show that the proposed component model can also easily be applied to existing presentation components, built with different languages and/or component technologies.

...read moreread less

163 citations

Journal Article•DOI•

Automata for XML---A survey

[...]

Thomas Schwentick

01 May 2007-Journal of Computer and System Sciences

TL;DR: An overview of fundamental properties of the different kinds of automata used in XML processing are given to relate them to the four key aspects of XML processing: schemas, navigation, querying and transformation.

...read moreread less

Book•

Modeling & Simulation-Based Data Engineering: Introducing Pragmatics into Ontologies for Net-Centric Information Exchange

[...]

Bernard P. Zeigler, Phillip E. Hammonds

07 Aug 2007

TL;DR: This book provides the first practical approach to data engineering and modeling, which supports interoperabililty with consumers of the data in a service- oriented architectures (SOAs) and introduces linguistic levels of interoperability for effective information exchange.

...read moreread less

Abstract: Data Engineering has become a necessary and critical activity for business, engineering, and scientific organizations as the move to service oriented architecture and web services moves into full swing. Notably, the US Department of Defense is mandating that all of its agencies and contractors assume a defining presence on the Net-centric Global Information Grid. This book provides the first practical approach to data engineering and modeling, which supports interoperabililty with consumers of the data in a service- oriented architectures (SOAs). Although XML (eXtensible Modeling Language) is the lingua franca for such interoperability, it is not sufficient on its own. The approach in this book addresses critical objectives such as creating a single representation for multiple applications, designing models capable of supporting dynamic processes, and harmonizing legacy data models for web-based co-existence. The approach is based on the System Entity Structure (SES) which is a well-defined structure, methodology, and practical tool with all of the functionality of UML (Unified Modeling Language) and few of the drawbacks. The SES originated in the formal representation of hierarchical simulation models. So it provides an axiomatic formalism that enables automating the development of XML dtds and schemas, composition and decomposition of large data models, and analysis of commonality among structures.Zeigler and Hammond include a range of features to benefit their readers. Natural language, graphical and XML forms of SES specification are employed to allow mapping of legacy meta-data. Real world examples and case studies provide insight into data engineering and test evaluation in various application domains. Comparative information is provided on concepts of ontologies, modeling and simulation, introductory linguistic background, and support options enable programmers to work with advanced tools in the area. The website of the Arizona Center for Integrative Modeling and Simulation, co-founded by Zeigler in 2001, provides links to downloadable software to accompany the book. * The only practical guide to integrating XML and web services in data engineering* Introduces linguistic levels of interoperability for effective information exchange* Covers the interoperability standards mandated by national and international agencies * Complements Zeigler's classic THEORY OF MODELING AND SIMULATION

...read moreread less

Patent•

Enabling dynamic voiceXML in an X+V page of a multimodal application

[...]

Charles W. Cross¹, Hilary A. Pike¹, Lisa A. Seacat¹, Marc White¹•Institutions (1)

Nuance Communications¹

14 Mar 2007

TL;DR: In this paper, the authors propose an approach to enable dynamic VoiceXML in an X+V page of a multimodal application implemented with the multimodAL application operating in a multimoderal browser on a multi-modal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes.

...read moreread less

Abstract: Enabling dynamic VoiceXML in an X+V page of a multimodal application implemented with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a VoiceXML interpreter, including representing by the multimodal browser an XML element of a VoiceXML dialog of the X+V page as an ECMAScript object, the XML element comprising XML content; storing by the multimodal browser the XML content of the XML element in an attribute of the ECMAScript object; and accessing the XML content of the XML element in the attribute of the ECMAScript object from an ECMAScript script in the X+V page.

...read moreread less

Proceedings Article•DOI•

Xproj: a framework for projected structural clustering of xml documents

[...]

Charu C. Aggarwal¹, Na Ta², Jianyong Wang², Jianhua Feng², Mohammed J. Zaki - Show less +1 more•Institutions (2)

IBM¹, Tsinghua University²

12 Aug 2007

TL;DR: This paper proposes an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures and proposes new ways of using multiple sub-structuralinformation in XML documents to evaluate the quality of intermediate cluster solutions.

...read moreread less

Abstract: XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encode structural information about data records. However, this structural characteristic of data sets also makes it a challenging problem for a variety of data mining problems. One such problem is that of clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation. As a result, it becomes more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures. We propose new ways of using multiple sub-structuralinformation in XML documents to evaluate the quality of intermediate cluster solutions, and guide the algorithms to a final solution which reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.

...read moreread less

Patent•

Location aware content using presence information data formation with location object (pidf-lo)

[...]

Donald Le Roy Mitchell

03 Jan 2007

TL;DR: In this paper, the authors propose an extension of the presence information data format-location object (PIDF-LO) as defined by the Internet Engineering Task Force (IETF) to accommodate an association of geospatial location to XML content on the Internet.

...read moreread less

Abstract: The format of the Presence Information Data Format-Location Object (PIDF-LO) as defined by the Internet Engineering Task Force (IETF) is extended or modified to accommodate, within the standard PIDF-LO format, an association of geospacial location to XML content on the Internet. A geospacial location is associated with Extensible Markup Language (XML) content on the Internet. The XML content is identified by a universal resource locator (URL), and associated with geospatial location information (either a specific location, zone, or direction). The URL is inserted into a section of a Presence Information Data Format-Location Object (PIDF-LO) compliant document as defined by the Internet Engineering Task Force (IETF). In this way, geospacial location information is associated with Internet based XML content using a standard PIDF-LO format.

...read moreread less

Proceedings Article•

Inferring XML schema definitions from XML data

[...]

Geert Jan Bex¹, Frank Neven¹, Stijn Vansummeren¹•Institutions (1)

University of Hasselt¹

23 Sep 2007

TL;DR: A theoretically complete algorithm is provided that always infers the correct XSD when a sufficiently large corpus of XML documents is available and a variant of this algorithm is presented that works well on real-world data sets.

...read moreread less

Abstract: Although the presence of a schema enables many optimizations for operations on XML documents, recent studies have shown that many XML documents in practice either do not refer to a schema, or refer to a syntactically incorrect one. It is therefore of utmost importance to provide tools and techniques that can automatically generate schemas from sets of sample documents. While previous work in this area has mostly focused on the inference of Document Type Definitions (DTDs for short), we will consider the inference of XML Schema Definitions (XSDs for short) --- the increasingly popular schema formalism that is turning DTDs obsolete. In contrast to DTDs where the content model of an element depends only on the element's name, the content model in an XSD can also depend on the context in which the element is used. Hence, while the inference of DTDs basically reduces to the inference of regular expressions from sets of sample strings, the inference of XSDs also entails identifying from a corpus of sample documents the contexts in which elements bear different content models. Since a seminal result by Gold implies that no inference algorithm can learn the complete class of XSDs from positive examples only, we focus on a class of XSDs that captures most XSDs occurring in practice. For this class, we provide a theoretically complete algorithm that always infers the correct XSD when a sufficiently large corpus of XML documents is available. In addition, we present a variant of this algorithm that works well on real-world (and therefore incomplete) data sets.

...read moreread less

Journal Article•DOI•

Fuzzy XML data modeling with the UML and relational data models

[...]

Zongmin Ma¹, Li Yan¹•Institutions (1)

Northeastern University (China)¹

01 Dec 2007

TL;DR: This paper focuses on fuzzy XML data modeling, which is mainly involved in the representation model of the fuzzy XML, its conceptual design, and its storage in databases.

...read moreread less

Abstract: Information imprecision and uncertainty exist in many real-world applications and for this reason fuzzy data modeling has been extensively investigated in various data models. Currently, huge amounts of electronic data are available on the Internet, and XML has been the de facto standard of information representation and exchange over the Web. This paper focuses on fuzzy XML data modeling, which is mainly involved in the representation model of the fuzzy XML, its conceptual design, and its storage in databases. Based on ''possibility distribution theory'', we developed this fuzzy XML data model. We developed this fuzzy UML data model to design the fuzzy XML model conceptually. We investigated the formal conversions from the fuzzy UML model to the fuzzy XML model and the formal mapping from the fuzzy XML model to the fuzzy relational databases.

...read moreread less

Patent•

Managing compound XML documents in a repository

[...]

Ravi Murthy¹•Institutions (1)

Business International Corporation¹

29 May 2007

TL;DR: In this article, a declarative mechanism is used to manage large documents within a repository, where large documents are sectioned into subdocuments that are linked together by a parent document.

...read moreread less

Abstract: A declarative mechanism is used to manage large documents within a repository. The large documents are sectioned into subdocuments that are linked together by a parent document. The combination of the parent document and subdocument is referred to as a compound document. There are multiple options for configuring rules to break up a source document into a compound document and naming the subdocuments. The compound documents may be queried using statements that treat the compound document as a single XML document, or the parent document of a subdocument may be queried and treated independently. Access control and versioning can be applied at the finer granularity of the subdocument.

...read moreread less

YAWN: A Semantically Annotated Wikipedia XML Corpus.

[...]

Ralf Schenkel¹, Fabian M. Suchanek¹, Gjergji Kasneci¹•Institutions (1)

Max Planck Society¹

01 Jan 2007

TL;DR: YAWN as discussed by the authors is a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags, which can be used for high-precision queries.

...read moreread less

Abstract: The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters We give examples how such annotations can be exploited for high-precision queries

...read moreread less

Journal Article•DOI•

Node labeling schemes for dynamic XML documents reconsidered

[...]

Theo Härder¹, Michael Peter Haustein¹, Christian Mathis¹, Markus Wagner¹•Institutions (1)

Kaiserslautern University of Technology¹

01 Jan 2007

TL;DR: This paper evaluates existing range-based and prefix-based labeling schemes, before proposing its own scheme based on DeweyIDs, which is experimentally explored as a general and immutable node labeling mechanism, stress its synergetic potential for query processing and locking, and show how it can be implemented efficiently.

...read moreread less

Abstract: We explore suitable node labeling schemes used in collaborative XML DBMSs (XDBMSs, for short) supporting typical XML document processing interfaces. Such schemes have to provide holistic support for essential XDBMS processing steps for declarative as well as navigational query processing and, with the same importance, lock management. In this paper, we evaluate existing range-based and prefix-based labeling schemes, before we propose our own scheme based on DeweyIDs. We experimentally explore its suitability as a general and immutable node labeling mechanism, stress its synergetic potential for query processing and locking, and show how it can be implemented efficiently. Various compression and optimization measures deliver surprising space reductions, frequently reduce the size of storage representation-compared to an already space-efficient encoding scheme-to less than 20-30% in the average and, thus, conclude their practical relevance.

...read moreread less

A comparative study of the IFC and gbXML informational infrastructures for data exchange in computational design support environments

[...]

Bing Dong¹, Khee Poh Lam¹, Y. C. Huang¹, G. M. Dobbs²•Institutions (2)

Carnegie Mellon University¹, United Technologies²

01 Dec 2007

TL;DR: This paper presents a detailed investigation and comparative study of the differences between IFC and gbXML in terms of their data representations, data structures and applications and selected examples of the respective schema are selected for development.

...read moreread less

Abstract: Significant progress has been made in the area of common data exchange in the building industry with the development of information technology. Currently, the Industry Foundation Class (IFC) and Green Building XML (gbXML) are two prevalent informational infrastructures in the architecture, engineering and construction (AEC) industry. IFC and gbXML are both used for common data exchange between AEC applications such as CAD and building simulation tools. This paper presents a detailed investigation and comparative study of the differences between IFC and gbXML in terms of their data representations, data structures and applications. It aims to explicitly illustrate the complex data representation through selected examples of the respective schema. Two specific demonstrative cases will include building element specification (i.e.,enclosure geometry) and building sensors (control and operation). Findings will be reported on the following aspects: (1) the strength and challenges of the diametrically opposing approaches between IFC and gbXML; (2) hierarchical structure of the schema in support of extensibility, data extraction, ease of implementation etc.; (3) formal adoption and application. Based on the results of this study, the gbXML schema is selected for development to demonstrate the features of gbXML. A proposed XML schema for lighting simulation will be presented. It aims to provide a seamless data integration platform between a CAD model (i.e., REVIT) and lighting simulation software (i.e., Radiance) in this study to support concurrent design of high performance buildings.

...read moreread less

Patent•

System and method for correlating and diagnosing system component performance data

[...]

Cody Menard, Raghavendra K. Murthy

04 Apr 2007

TL;DR: In this article, a system and method for correlating, predicting and diagnosing system component performance data includes capturing knowledge about system behavior, deploying the captured knowledge as baseline system behavior files, evaluating system performance data against the baseline system behaviour files, and notifying a user when an analysis result is generated.

...read moreread less

Abstract: The system and method for correlating, predicting and diagnosing system component performance data includes capturing knowledge about system behavior, deploying the captured knowledge as baseline system behavior files, evaluating system performance data against the baseline system behavior files, performing predictive and diagnostic analysis when received system performance data exceeds thresholds in the baseline system behavior files, and notifying a user when an analysis result is generated. The method of capturing knowledge about system behavior includes defining problems to be solved, creating datasets that correspond to defined problems, constructing problem scenarios, associating data patterns modules with the problem scenarios, and generating XML definition files that characterize system behavior in terms of the scenarios, modules, and datasets. The system has the capability to activate corrective scripts in the target system and to reconfigure the target system. Detailed information on various example embodiments of the inventions are provided in the Detailed Description below, and the inventions are defined by the appended claims.

...read moreread less

Journal Article•DOI•

Report on the XML mining track at INEX 2005 and INEX 2006: categorization and clustering of XML documents

[...]

Ludovic Denoyer¹, Patrick Gallinari¹•Institutions (1)

University of Paris¹

01 Jun 2007

TL;DR: This article is a report concerning the two years of the XML Mining track at INEX (2005 and 2006) and focuses here on the classification and clustering of XML documents.

...read moreread less

Abstract: This article is a report concerning the two years of the XML Mining track at INEX (2005 and 2006). We focus here on the classification and clustering of XML documents. We detail these two tasks and the corpus used for this challenge and then present a summary of the different methods proposed by the participants. We last compare the results obtained during the two years of the track.

...read moreread less

Proceedings Article•DOI•

Efficient static analysis of XML paths and types

[...]

Pierre Genevès¹, Nabil Layaïda², Alan Schmitt²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, French Institute for Research in Computer Science and Automation²

10 Jun 2007

TL;DR: An algorithm to solve XPath decision problems under regular tree type constraints and its use to statically type-check XPath queries is presented and the decidability of a logic with converse for finite ordered trees is proved.

...read moreread less

Abstract: We present an algorithm to solve XPath decision problems under regular tree type constraints and show its use to statically type-check XPath queries. To this end, we prove the decidability of a logic with converse for finite ordered trees whose time complexity is a simple exponential of the size of a formula. The logic corresponds to the alternation free modal μ-calculus without greatest fixpoint, restricted to finite trees, and where formulas are cycle-free.Our proof method is based on two auxiliary results. First, XML regular tree types and XPath expressions have a linear translation to cycle-free formulas. Second, the least and greatest fixpoints are equivalent for finite trees, hence the logic is closed under negation.Building on these results, we describe a practical, effective system for solving the satisfiability of a formula. The system has been experimented with some decision problems such as XPath emptiness, containment, overlap, and coverage, with or without type constraints. The benefit of the approach is that our system can be effectively used in static analyzers for programming languages manipulating both XPath expressions and XML type annotations (as input and output types).

...read moreread less

Journal Article•DOI•

An intelligent process planning system for prismatic parts using STEP features.

[...]

Saleh M. Amaitik¹, S. Engin Kilic²•Institutions (2)

Atılım University¹, Middle East Technical University²

09 Jan 2007-The International Journal of Advanced Manufacturing Technology

TL;DR: A new feature-based intelligent CAPP system for avoiding complex feature recognition and knowledge acquisition problems is suggested.

...read moreread less

Abstract: This paper presents an intelligent process planning system using STEP features (ST-FeatCAPP) for prismatic parts. The system maps a STEP AP224 XML data file, without using a complex feature recognition process, and produces the corresponding machining operations to generate the process plan and corresponding STEP-NC in XML format. It carries out several stages of process planning such as operations selection, tool selection, machining parameters determination, machine tools selection and setup planning. A hybrid approach of most recent techniques (neural networks, fuzzy logic and rule-based) of artificial intelligence is used as the inference engine of the developed system. An object-oriented approach is used in the definition and implementation of the system. An example part is tested and the corresponding process plan is presented to demonstrate and verify the proposed CAPP system. The paper thus suggests a new feature-based intelligent CAPP system for avoiding complex feature recognition and knowledge acquisition problems.

...read moreread less

Journal Article•DOI•

User Interaction with the Matita Proof Assistant

[...]

Andrea Asperti¹, Claudio Sacerdoti Coen¹, Enrico Tassi¹, Stefano Zacchiroli¹•Institutions (1)

University of Bologna¹

01 Aug 2007-Journal of Automated Reasoning

TL;DR: This paper focuses on some of the distinctive features of the user interaction with Matita, characterized mostly by the organization of the library as a searchable knowledge base, the emphasis on a high-quality notational rendering, and the complex interplay between syntax, presentation, and semantics.

...read moreread less

Abstract: Matita is a new, document-centric, tactic-based interactive theorem prover This paper focuses on some of the distinctive features of the user interaction with Matita, characterized mostly by the organization of the library as a searchable knowledge base, the emphasis on a high-quality notational rendering, and the complex interplay between syntax, presentation, and semantics

...read moreread less

Patent•

Health integration platform API

[...]

Sean Nolan¹, Jeffrey Dick Jones¹, Johnson T. Apacible¹, Vijay Varadan¹•Institutions (1)

Microsoft¹

01 Nov 2007

TL;DR: In this article, an application program interface (API) is provided for requesting, storing, and accessing data within a health integration network, which facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization, as well as the ability to receive requests in an extensible language format, such as XML, and returns resulting data in XML format.

...read moreread less

Abstract: An application program interface (API) is provided for requesting, storing, and otherwise accessing data within a health integration network. The API facilitates secure and seamless access to the centrally-stored data by offering authentication/authorization, as well as the ability to receive requests in an extensible language format, such as XML, and returns resulting data in XML format. The data can also have transformation, style and/or schema information associated with it which can be returned in the resulting XML and/or applied to the data beforehand by the API. The API can be utilized in many environment architectures including XML over HTTP and a software development kit (SDK).

...read moreread less

Patent•

System and method to create a collaborative web-based multimedia layered platform

[...]

Henry Hon, Christopher Anthony Peri, Timothy Hon, Frankie Waitim Wong

24 Jan 2007

TL;DR: In this paper, a system and method to allow multiple users to collaborate on tasks and interact in a shared space session within a network in real-time; using a media application to manage media-layers.

...read moreread less

Abstract: The present invention relates to a system and method to allow multiple users to collaborate on tasks and interact in a shared space session within a network in real-time; using a media application to manage media-layers. Each media-layer serves as a container for multimedia programs or plug-ins. The invention allows which media-layer to display via organization metaphors and filtering criteria. When multiple users are logged into the same shared space, each user can invoke and observe modifications to media-layers with the browser based or client based application. All events are synchronized among all users in that shared space, where the system is a communication conduit. The media-layers in the shared space maintains spatial and temporal correlation by a media application stage manager tool and described as a collection file descriptor such as an XML file. The ability to invoke events that affect media-layers can be supported in a synched or non synched mode on demand.

...read moreread less

YAWN: A Semantically Annotated Wikipedia XML Corpus

[...]

Ralf Schenkel¹, Fabian M. Suchanek¹, Gjergji Kasneci¹•Institutions (1)

Max Planck Society¹

01 Jan 2007

TL;DR: YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags, is presented.

...read moreread less

Abstract: The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples how such annotations can be exploited for high-precision queries.

...read moreread less

Journal Article•DOI•

Efficient schema-based XML-to-Relational data mapping

[...]

Mustafa Atay¹, Artem Chebotko¹, Dapeng Liu¹, Shiyong Lu¹, Farshad Fotouhi¹ - Show less +1 more•Institutions (1)

Wayne State University¹

01 May 2007-Information Systems

TL;DR: A lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, and two linear data mapping algorithms based on DOM and SAX, respectively, to map ordered XML data to relational data are proposed.

...read moreread less

Collapse