
Showing papers on "Information integration published in 2004"


Journal ArticleDOI
TL;DR: It is shown that, depending on the type of information, different combination and integration strategies are used and that prior knowledge is often required for interpreting the sensory signals.

1,628 citations


Book
25 Oct 2004
TL;DR: This book introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search, as well as new research areas that rely on evolving text-mining techniques.
Abstract: The growth of the web can be seen as an expanding public digital library collection. Online digital information extends far beyond the web and its publicly available information. Huge amounts of information are private and are of interest to local communities, such as the records of customers of a business. This information is overwhelmingly text and has its record-keeping purpose, but an automated analysis might be desirable to find patterns in the stored records. Analogous to this data mining is text mining, which also finds patterns and trends in information samples but which does so with far less structured--though with greater immediate utility for users--ingredients. This book focuses on the concepts and methods needed to expand horizons beyond structured, numeric data to automated mining of text samples. It introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search. New research areas are explored, such as information extraction and document summarization, that rely on evolving text-mining techniques.

596 citations


Book ChapterDOI
01 Jan 2004
TL;DR: In this paper, the impact of e-business on supply chain integration can be described along the dimensions of information integration, synchronized planning, coordinated workflow, and new business models, and significant value can be created by e-Business enabled supply-chain integration.
Abstract: e-Business has emerged as a key enabler to drive supply chain integration. Businesses can use the Internet to gain global visibility across their extended network of trading partners and help them respond quickly to changing customer demand captured over the Internet. The impact of e-business on supply chain integration can be described along the dimensions of information integration, synchronized planning, coordinated workflow, and new business models. As a result, many of the core supply chain principles and concepts can now be put into practice much more effectively using e-business. Significant value can be created by e-business enabled supply chain integration.

374 citations


Journal Article
TL;DR: Proceedings of the Cooperative Information Systems (CoopIS) 2004 International Conference, with sessions on business process optimization, workflow/process/Web services, database management and transactions, schema integration and agents, events, P2P/collaboration, applications, and trust/security/contracts, together with the Ontologies, DataBases, and Applications of Semantics (ODBASE) 2004 International Conference.
Abstract: Cooperative Information Systems (CoopIS) 2004 International Conference- CoopIS 2004 International Conference (International Conference on Cooperative Information Systems) PC Co-chairs' Message- Keynote- Business Process Optimization- Workflow/Process/Web Services, I- Discovering Workflow Transactional Behavior from Event-Based Log- A Flexible Mediation Process for Large Distributed Information Systems- Exception Handling Through a Workflow- Workflow/Process/Web Services, II- A Flexible and Composite Schema Matching Algorithm- Analysis, Transformation, and Improvements of ebXML Choreographies Based on Workflow Patterns- The Notion of Business Process Revisited- Workflow/Process/Web Services, III- Disjoint and Overlapping Process Changes: Challenges, Solutions, Applications- Untangling Unstructured Cyclic Flows - A Solution Based on Continuations- Making Workflow Models Sound Using Petri Net Controller Synthesis- Database Management/Transaction- Concurrent Undo Operations in Collaborative Environments Using Operational Transformation- Refresco: Improving Query Performance Through Freshness Control in a Database Cluster- Automated Supervision of Data Production - Managing the Creation of Statistical Reports on Periodic Data- Schema Integration/Agents- Deriving Sub-schema Similarities from Semantically Heterogeneous XML Sources- Supporting Similarity Operations Based on Approximate String Matching on the Web- Managing Semantic Compensation in a Multi-agent System- Modelling with Ubiquitous Agents a Web-Based Information System Accessed Through Mobile Devices- Events- A Meta-service for Event Notification- Classification and Analysis of Distributed Event Filtering Algorithms- P2P/Collaboration- A Collaborative Model for Agricultural Supply Chains- FairNet - How to Counter Free Riding in Peer-to-Peer Data Structures- Supporting Collaborative Layouting in Word Processing- A Reliable Content-Based Routing Protocol over Structured Peer-to-Peer Networks- Applications, I- Covering Your Back: Intelligent Virtual Agents in Humanitarian Missions Providing Mutual Support- Dynamic Modelling of Demand Driven Value Networks- An E-marketplace for Auctions and Negotiations in the Constructions Sector- Applications, II- Managing Changes to Engineering Products Through the Co-ordination of Human and Technical Activities- Towards Automatic Deployment in eHome Systems: Description Language and Tool Support- A Prototype of a Context-Based Architecture for Intelligent Home Environments- Trust/Security/Contracts- Trust-Aware Collaborative Filtering for Recommender Systems- Service Graphs for Building Trust- Detecting Violators of Multi-party Contracts- Potpourri- Leadership Maintenance in Group-Based Location Management Scheme- TLS: A Tree-Based DHT Lookup Service for Highly Dynamic Networks- Minimizing the Network Distance in Distributed Web Crawling- Ontologies, DataBases, and Applications of Semantics (ODBASE) 2004 International Conference- ODBASE 2004 International Conference (Ontologies, DataBases, and Applications of Semantics) PC Co-chairs' Message- Keynote- Helping People (and Machines) Understanding Each Other: The Role of Formal Ontology- Knowledge Extraction- Automatic Initiation of an Ontology- Knowledge Extraction from Classification Schemas- Semantic Web in Practice- Generation and Management of a Medical Ontology in a Semantic Web Retrieval System- Semantic Web Based Content Enrichment and Knowledge Reuse in E-science- The Role of Foundational Ontologies in Manufacturing Domain Applications- 
Intellectual Property Rights Management Using a Semantic Web Information System- Ontologies and IR- Intelligent Retrieval of Digital Resources by Exploiting Their Semantic Context- The Chrysostom Knowledge Base: An Ontology of Historical Interactions- Text Simplification for Information-Seeking Applications- Information Integration- Integration of Integrity Constraints in Federated Schemata Based on Tight Constraining- Modal Query Language for Databases with Partial Orders- Composing Mappings Between Schemas Using a Reference Ontology- Assisting Ontology Integration with Existing Thesauri

284 citations


Journal Article
TL;DR: In this paper, an approach for semantically enhanced collaborative filtering is introduced, in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions.
Abstract: Item-based Collaborative Filtering (CF) algorithms have been designed to deal with the scalability problems associated with traditional user-based CF approaches without sacrificing recommendation or prediction accuracy. Item-based algorithms avoid the bottleneck in computing user-user correlations by first considering the relationships among items and performing similarity computations in a reduced space. Because the computation of item similarities is independent of the methods used for generating predictions, multiple knowledge sources, including structured semantic information about items, can be brought to bear in determining similarities among items. The integration of semantic similarities for items with rating- or usage-based similarities allows the system to make inferences based on the underlying reasons for which a user may or may not be interested in a particular item. Furthermore, in cases where little or no rating (or usage) information is available (such as in the case of newly added items, or in very sparse data sets), the system can still use the semantic similarities to provide reasonable recommendations for users. In this paper, we introduce an approach for semantically enhanced collaborative filtering in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions. Our experimental results demonstrate that the integrated approach yields significant advantages both in terms of improving accuracy, as well as in dealing with very sparse data sets or new items.
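
As a rough illustration of the combined similarity idea described in this abstract, the sketch below linearly blends an ontology-derived semantic similarity with a rating-based similarity and uses the result for item-based prediction. The weighting parameter alpha, the cosine measures, and all names are illustrative assumptions, not the paper's exact formulation.

    # Hedged sketch: combine semantic and rating-based item similarities for
    # item-based collaborative filtering. The linear weight `alpha` and the
    # cosine measures are assumptions, not the paper's exact formulation.
    import numpy as np

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    def combined_item_similarity(sem_i, sem_j, ratings_i, ratings_j, alpha=0.5):
        sem_sim = cosine(sem_i, sem_j)             # from ontology-derived item features
        rating_sim = cosine(ratings_i, ratings_j)  # from the user-item rating matrix
        return alpha * sem_sim + (1 - alpha) * rating_sim

    def predict_rating(user_ratings, item_sims, k=20):
        # user_ratings: {item_id: rating}; item_sims: {item_id: similarity to target item}
        neighbours = sorted(user_ratings, key=lambda i: item_sims.get(i, 0.0), reverse=True)[:k]
        num = sum(item_sims.get(i, 0.0) * user_ratings[i] for i in neighbours)
        den = sum(abs(item_sims.get(i, 0.0)) for i in neighbours)
        return num / den if den else 0.0

    # For a new item with no ratings, the rating-based term is zero and the
    # semantic term still yields a usable similarity (the sparse-data case above).
    print(combined_item_similarity(np.array([1., 0., 1.]), np.array([1., 1., 0.]),
                                   np.array([4., 5., 0.]), np.array([4., 4., 1.])))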

224 citations


Journal ArticleDOI
01 Sep 2004
TL;DR: The pros and cons of the current approaches and systems are identified, and what an integration system for biologists ought to be is discussed.
Abstract: This paper surveys the area of biological and genomic sources integration, which has recently become a major focus of the data integration research field. The challenges that an integration system for biological sources must face are due to several factors such as the variety and amount of data available, the representational heterogeneity of the data in the different sources, and the autonomy and differing capabilities of the sources. This survey describes the main integration approaches that have been adopted. They include warehouse integration, mediator-based integration, and navigational integration. Then we look at the four major existing integration systems that have been developed for the biological domain: SRS, BioKleisli, TAMBIS, and DiscoveryLink. After analyzing these systems and mentioning a few others, we identify the pros and cons of the current approaches and systems and discuss what an integration system for biologists ought to be.

178 citations


Book ChapterDOI
08 Nov 2004
TL;DR: This paper presents an approach which constructs an instance graph for each individual process instance, based on information in the entire data set, and represents the results in terms of Event-driven Process Chains (EPCs).
Abstract: Deploying process-driven information systems is a time-consuming and error-prone task. Process mining attempts to improve this by automatically generating a process model from event-based data. Existing techniques try to generate a complete process model from the data acquired. However, unless this model is the ultimate goal of mining, such a model is not always required. Instead, a good visualization of each individual process instance can be enough. From these individual instances, an overall model can then be generated if required. In this paper, we present an approach which constructs an instance graph for each individual process instance, based on information in the entire data set. The results are represented in terms of Event-driven Process Chains (EPCs). This representation is used to connect our process mining to a widely used commercial tool for the visualization and analysis of instance EPCs.
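
To make the instance-graph idea concrete, here is a minimal sketch that derives log-wide directly-follows counts and then builds a small graph for a single case. The simple causality heuristic and the dictionary-based log format are illustrative assumptions, and the rendering of the result as an EPC is omitted.

    # Hedged sketch: build a simple instance graph per case from an event log,
    # using directly-follows counts from the entire log as a stand-in for the
    # paper's technique. Converting the graph to an EPC is not shown.
    from collections import defaultdict

    def directly_follows_counts(log):
        # log: {case_id: [activity, activity, ...]} in execution order
        counts = defaultdict(int)
        for trace in log.values():
            for a, b in zip(trace, trace[1:]):
                counts[(a, b)] += 1
        return counts

    def instance_graph(trace, df_counts):
        # Nodes are (position, activity); connect each event to the latest earlier
        # event whose activity is observed, anywhere in the log, to directly precede it.
        nodes = list(enumerate(trace))
        edges = []
        for j, b in nodes[1:]:
            preds = [i for i, a in nodes[:j] if df_counts.get((a, b), 0) > 0]
            if preds:
                edges.append((preds[-1], j))
        return nodes, edges

    log = {"case1": ["A", "B", "C", "D"], "case2": ["A", "C", "B", "D"]}
    print(instance_graph(log["case1"], directly_follows_counts(log)))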

167 citations


Book
20 Oct 2004
TL;DR: Addressing problems like missing conceptual models, unclear system boundaries, and heterogeneous representations, the authors design a framework for ontology-based information sharing in weakly structured environments like the Semantic Web.
Abstract: The large-scale and almost ubiquitous availability of information has become as much of a curse as it is a blessing. The more information is available, the harder it is to locate any particular piece of it. And even when it has been successfully found, it is even harder still to usefully combine it with other information we may already possess. This problem occurs at many different levels, ranging from the overcrowded disks of our own PCs to the mass of unstructured information on the World Wide Web. It is commonly understood that this problem of information sharing can only be solved by giving computers better access to the semantics of the information. While it has been recognized that ontologies play a crucial role in solving the open problems, most approaches rely on the existence of well-established data structures. To overcome these shortcomings, Stuckenschmidt and van Harmelen describe ontology-based approaches for resolving semantic heterogeneity in weakly structured environments, in particular the World Wide Web. Addressing problems like missing conceptual models, unclear system boundaries, and heterogeneous representations, they design a framework for ontology-based information sharing in weakly structured environments like the Semantic Web. For researchers and students in areas related to the Semantic Web, the authors provide not only a comprehensive overview of the state of the art, but also present in detail recent research in areas like ontology design for information integration, metadata generation and management, and representation and management of distributed ontologies. For professionals in areas such as e-commerce (e.g., the exchange of product knowledge) and knowledge management (e.g., in large and distributed organizations), the book provides decision support on the use of novel technologies, information about potential problems, and guidelines for the successful application of existing technologies.

165 citations


Proceedings ArticleDOI
22 Aug 2004
TL;DR: This paper develops the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection, and introduces a new correlation measure, the H-measure, distinct from those proposed in previous work.
Abstract: To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., [author] corresponds to [first name, last name] in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., [first name, last name]) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection. Unlike previous correlation mining algorithms, which mainly focus on finding strong positive correlations, our algorithm considers both positive and negative correlations, especially the subtlety of negative correlations, due to their special importance in schema matching. This leads to the introduction of a new correlation measure, the H-measure, distinct from those proposed in previous work. We evaluate our approach extensively and the results show good accuracy for discovering complex matchings.
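
The core co-occurrence intuition in this abstract can be shown in a few lines of code: grouping attributes tend to be co-present across query interfaces (positive correlation), while synonyms rarely co-occur (negative correlation). The lift-style measure and the thresholds below are illustrative stand-ins; the paper's H-measure is not reproduced here.

    # Hedged sketch: mine positive/negative attribute correlations across Web
    # query interfaces. The measure is a simple lift-style ratio, not the
    # paper's H-measure.
    from itertools import combinations

    def correlation(interfaces, a, b):
        n = len(interfaces)
        f_a = sum(a in i for i in interfaces)
        f_b = sum(b in i for i in interfaces)
        f_ab = sum(a in i and b in i for i in interfaces)
        expected = f_a * f_b / n if n else 0.0
        return (f_ab / expected) if expected else 0.0  # >1 positive, <1 negative

    interfaces = [
        {"title", "author"},
        {"title", "first name", "last name"},
        {"title", "author", "isbn"},
        {"first name", "last name", "isbn"},
    ]
    for a, b in combinations(sorted(set().union(*interfaces)), 2):
        c = correlation(interfaces, a, b)
        if c > 1.5:
            print("grouping candidates:", a, "+", b)   # e.g. first name + last name
        elif c < 0.5:
            print("synonym candidates:", a, "~", b)    # e.g. author ~ first name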

160 citations


Journal ArticleDOI
01 Sep 2004
TL;DR: This article describes how to support mediators in their source selection and query planning process and proposes three new merge operators, which formalize the integration of multiple source responses.
Abstract: For many information domains there are numerous World Wide Web data sources. The sources vary both in their extension and their intension: They represent different real-world entities with possible overlap and provide different attributes of these entities. Mediator-based information systems allow integrated access to such sources by providing a common schema against which the user can pose queries. Given a query, the mediator must determine which participating sources to access and how to integrate the incoming results. This article describes how to support mediators in their source selection and query planning process. We propose three new merge operators, which formalize the integration of multiple source responses. A completeness model describes the usefulness of a source to answer a query. The completeness measure incorporates both extensional value (called coverage) and intensional value (called density) of a source. We show how to determine the completeness of single sources and of combinations of sources under the new merge operators. Finally, we show how to use the measure for source selection and query planning.
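
A small numeric sketch of the coverage/density idea follows. Combining the two scores multiplicatively into a single completeness value, and the toy source format, are assumptions made for illustration; the article's merge operators are not modeled here.

    # Hedged sketch: completeness of a source as coverage (extensional) combined
    # with density (intensional). The multiplicative combination is an assumption.
    def coverage(source_entities, world_size):
        # Fraction of the relevant real-world entities the source represents.
        return len(source_entities) / world_size

    def density(source, query_attributes):
        # Average fraction of queried attributes that are non-null per entity.
        if not source:
            return 0.0
        filled = sum(
            sum(row.get(a) is not None for a in query_attributes) / len(query_attributes)
            for row in source.values()
        )
        return filled / len(source)

    def completeness(source, query_attributes, world_size):
        return coverage(source.keys(), world_size) * density(source, query_attributes)

    src = {"e1": {"name": "Ada", "phone": None},
           "e2": {"name": "Bob", "phone": "555-1234"}}
    print(completeness(src, ["name", "phone"], world_size=10))  # 0.2 * 0.75 = 0.15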

148 citations


Book ChapterDOI
25 Mar 2004
TL;DR: In this article, a generic framework for transforming heterogeneous data within scientific workflows is defined; it relies on a formalized ontology that serves as a simple, unstructured global schema.
Abstract: Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate semantic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, appropriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
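
The registration-mapping idea can be illustrated with a toy example: the structural fields of two services are registered against semantic types from a shared ontology, and a field-to-field transformation is derived wherever the semantic types coincide. All field names and ontology terms below are hypothetical.

    # Hedged sketch: relate structural fields to semantic types and derive a
    # transformation between two services' structures. Names are hypothetical.
    OUTPUT_REGISTRATION = {          # producing service: structural field -> ontology term
        "site/tempC": "AirTemperature",
        "site/latDeg": "Latitude",
    }
    INPUT_REGISTRATION = {           # consuming service: structural field -> ontology term
        "reading/temperature": "AirTemperature",
        "reading/latitude": "Latitude",
    }

    def derive_transformation(out_reg, in_reg):
        # Pair fields whose semantic types coincide, regardless of structure.
        by_semantics = {sem: field for field, sem in out_reg.items()}
        return {in_field: by_semantics[sem]
                for in_field, sem in in_reg.items() if sem in by_semantics}

    print(derive_transformation(OUTPUT_REGISTRATION, INPUT_REGISTRATION))
    # {'reading/temperature': 'site/tempC', 'reading/latitude': 'site/latDeg'}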

Journal ArticleDOI
TL;DR: The enterprise information integration framework defines four levels of the enterprise system to identify the obstacles and the information integration types encountered at each level, and is used to analyse currently used and promising technologies for enterprise information integration.
Abstract: Organizations face the challenging task of integrating their distributed organizational units, information systems, and business processes for improved operation and attainment of organizational goals. There is the difficulty of dealing with heterogeneous applications that use different formats (syntax) and apply different meanings (semantics) to the data. There is the difficulty of coordinating the workflow so that the disparate organizational units act as a harmonious whole. The broad scope of the enterprise integration problem precludes approaches that tackle the entire problem but rather requires approaches that address a limited but useful integration type. The various information integration types and how they are related to each other are poorly defined. This article presents an enterprise information integration framework that aims to coalesce parallel approaches towards integration so that the information integration problem can be better understood as a whole. The enterprise information integration...

Proceedings Article
22 Aug 2004
TL;DR: In this article, a peer can answer queries by reasoning from its local (propositional) theory but can also ask queries to some other peers with which it is semantically related by sharing part of its vocabulary.
Abstract: In a peer-to-peer system, there is no centralized control or hierarchical organization: each peer is equivalent in functionality and cooperates with other peers in order to solve a collective task. Such systems have evolved from simple keyword-based peer-to-peer file sharing systems like Napster and Gnutella to schema-based peer data management systems like Edutella [3] or Piazza [2], which handle semantic data description and support complex queries for data retrieval. In this paper, we are interested in peer-to-peer inference systems in which each peer can answer queries by reasoning from its local (propositional) theory but can also ask queries to some other peers with which it is semantically related by sharing part of its vocabulary. This framework encompasses several applications like peer-to-peer information integration systems or intelligent agents, in which each peer has its own knowledge (about its data or its expertise domain) and some partial knowledge about some other peers. In this setting, when it is solicited to perform a reasoning task and if it cannot solve completely that task locally, a peer must be able to distribute appropriate reasoning subtasks among its acquainted peers. The contribution of this paper is the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We have exhibited a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Our algorithm splits clauses if they involve vocabularies of several peers. Each piece of a split clause is transmitted to the corresponding theory to find its consequences. The consequences that are found for each piece of a split clause must be recomposed to get the consequences of that clause.
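
The clause-splitting step mentioned at the end of the abstract is easy to sketch: each clause is partitioned according to the vocabularies of the acquainted peers, and each piece is sent to the corresponding peer for consequence finding. The literal encoding and peer names below are illustrative, and the recomposition of consequences is not shown.

    # Hedged sketch: split a propositional clause over peer vocabularies.
    def split_clause(clause, peer_vocabularies):
        # clause: set of literals such as "a" or "-c"; vocabularies: peer -> set of atoms
        pieces = {}
        for peer, vocab in peer_vocabularies.items():
            piece = {lit for lit in clause if lit.lstrip("-") in vocab}
            if piece:
                pieces[peer] = piece
        return pieces

    vocab = {"P1": {"a", "b"}, "P2": {"b", "c"}, "P3": {"d"}}
    print(split_clause({"a", "-c", "d"}, vocab))
    # e.g. {'P1': {'a'}, 'P2': {'-c'}, 'P3': {'d'}}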

Journal Article
TL;DR: In this article, the concept of semantic overlay clusters (SOC) is introduced for super-peer networks enabling a controlled distribution of peers to clusters, based on predefined policies defined by human experts.
Abstract: When joining information provider peers to a peer-to-peer network, an arbitrary distribution is sub-optimal. In fact, clustering peers by their characteristics enhances search and integration significantly. Currently, super-peer networks, such as the Edutella network, provide no sophisticated means for such a semantic clustering of peers. We introduce the concept of semantic overlay clusters (SOC) for super-peer networks enabling a controlled distribution of peers to clusters. In contrast to the recently announced semantic overlay network approach designed for flat, pure peer-to-peer topologies and for limited metadata sets, such as simple filenames, we allow a clustering of complex heterogeneous schemes known from relational databases and use advantages of super-peer networks, such as efficient search and broadcast of messages. Our approach is based on predefined policies defined by human experts. Based on such policies, a fully decentralized broadcast-and-match approach distributes the peers automatically to super-peers. Thus we are able to automate the integration of information sources in super-peer networks and reduce flooding of the network with messages.
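
As a rough illustration of policy-driven clustering, the sketch below matches a joining peer's schema attributes against per-super-peer policies and assigns the peer to the best fully satisfied cluster. The policy format (required attributes per cluster) is an assumption, far simpler than the heterogeneous relational schemas discussed above.

    # Hedged sketch: assign an information-provider peer to a super-peer cluster
    # via predefined matching policies (format assumed for illustration).
    POLICIES = {
        "SP-education": {"title", "subject", "audience"},
        "SP-commerce":  {"product", "price"},
    }

    def assign_peer(peer_schema, policies):
        # Broadcast-and-match: choose the super-peer whose required attributes
        # are all covered by the peer's schema, preferring the largest overlap.
        best, best_overlap = None, 0
        for super_peer, required in policies.items():
            overlap = len(required & peer_schema)
            if overlap == len(required) and overlap > best_overlap:
                best, best_overlap = super_peer, overlap
        return best

    print(assign_peer({"title", "subject", "audience", "language"}, POLICIES))  # SP-education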

Book ChapterDOI
TL;DR: A methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention is described and its implementation in the Armadillo system is described.
Abstract: In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon. Retrieved information is then used to partially annotate documents. Annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotation to annotate more documents that will be used to train more complex IE engines and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future work.
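
The bootstrapping cycle described here (seed, annotate, learn, re-annotate) can be caricatured in a few lines. The regex-based "learner" below is a deliberately simplified stand-in for the IE engines used in Armadillo, and the documents, seed lexicon, and proximity window are hypothetical.

    # Hedged sketch: seed-and-bootstrap information extraction. A real system
    # would train progressively more complex IE engines instead of the
    # proximity heuristic used here.
    import re

    def annotate(documents, lexicon):
        return [(doc, sorted({t for t in lexicon if t in doc})) for doc in documents]

    def bootstrap(documents, seed_lexicon, rounds=2):
        lexicon = set(seed_lexicon)
        for _ in range(rounds):
            for doc, mentions in annotate(documents, lexicon):
                for m in mentions:
                    # "Learn": accept capitalized two-word candidates appearing
                    # near an already-annotated mention.
                    for cand in re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", doc):
                        if cand != m and abs(doc.find(cand) - doc.find(m)) < 40:
                            lexicon.add(cand)
        return lexicon

    docs = ["Alice Smith and Bob Jones co-authored the paper.",
            "The paper by Bob Jones cites work by Carol White."]
    print(bootstrap(docs, {"Bob Jones"}))  # grows to include Alice Smith and Carol White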

Journal Article
TL;DR: This paper aims at contrasting the different characteristics of both approaches of information integration and process integration, and concludes with recommendations according to the intended organisational scope of integration.
Abstract: IT managers in administration must decide how to contribute to cross-organisational integration and what strategy and means to choose for achieving interoperability. Comparing the frameworks and guidelines provided by central European and U.S. governmental units, we find information integration and process integration as prominent concepts to guide interoperability efforts, but they seem to point to different directions. This paper aims at contrasting the different characteristics of both approaches and concludes with recommendations according to the intended organisational scope of integration. To be successful in these efforts it is important to understand that (a) interoperability requires a guiding vision of integration, (b) each type of integration points to a different set of interrelated ideas, assumptions and technical means, and (c) integration implies a strategic commitment to explicit forms of cross-organisational cooperation and their implementation.

Book ChapterDOI
08 Nov 2004
TL;DR: In this paper, the authors present a framework for the design of the Data Warehouse back-stage (and the respective ETL processes) based on the key observation that this task fundamentally involves dealing with the specificities of information at very low levels of granularity including transformation rules at the attribute level.
Abstract: In Data Warehouse (DW) scenarios, ETL (Extraction, Transformation, Loading) processes are responsible for the extraction of data from heterogeneous operational data sources, their transformation (conversion, cleaning, normalization, etc.) and their loading into the DW. In this paper, we present a framework for the design of the DW back-stage (and the respective ETL processes) based on the key observation that this task fundamentally involves dealing with the specificities of information at very low levels of granularity including transformation rules at the attribute level. Specifically, we present a disciplined framework for the modeling of the relationships between sources and targets in different levels of granularity (including coarse mappings at the database and table levels to detailed inter-attribute mappings at the attribute level). In order to accomplish this goal, we extend UML (Unified Modeling Language) to model attributes as first-class citizens. In our attempt to provide complementary views of the design artifacts in different levels of detail, our framework is based on a principled approach in the usage of UML packages, to allow zooming in and out the design of a scenario.
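
To make the multi-granularity mapping idea tangible, the sketch below records source-to-target correspondences at the database, table, and attribute levels, with an attribute-level transformation rule attached to each fine-grained mapping. The nested-dictionary representation is illustrative and is not the paper's UML-based notation.

    # Hedged sketch: source-to-target ETL mappings at three levels of
    # granularity, with attribute-level transformation rules.
    MAPPINGS = {
        ("src_db", "dw_db"): {                       # coarse: database level
            ("orders", "fact_sales"): {              # table level
                "amount": ("sales_amount", "round(amount * exchange_rate, 2)"),
                "ts":     ("sale_date",    "date(ts)"),
            },
        },
    }

    def describe(mappings):
        for (src_db, tgt_db), tables in mappings.items():
            for (src_t, tgt_t), attrs in tables.items():
                for src_a, (tgt_a, rule) in attrs.items():
                    print(f"{src_db}.{src_t}.{src_a} -> {tgt_db}.{tgt_t}.{tgt_a} via {rule}")

    describe(MAPPINGS)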

Book ChapterDOI
25 Mar 2004
TL;DR: It is argued that formal principles governing best practices in classification and definition have for too long been neglected in the construction of biomedical ontologies, in ways which have important negative consequences for data integration and ontology alignment, and that the use of such principles in ontology construction can serve as a valuable tool in error-detection and in supporting reliable manual curation.
Abstract: Formal principles governing best practices in classification and definition have for too long been neglected in the construction of biomedical ontologies, in ways which have important negative consequences for data integration and ontology alignment. We argue that the use of such principles in ontology construction can serve as a valuable tool in error-detection and also in supporting reliable manual curation. We argue also that such principles are a prerequisite for the successful application of advanced data integration techniques such as ontology-based multi-database querying, automated ontology alignment and ontology-based text-mining. These theses are illustrated by means of a case study of the Gene Ontology, a project of increasing importance within the field of biomedical data integration.

Book ChapterDOI
30 Aug 2004
TL;DR: In this article, the authors compare information integration and process integration as prominent concepts to guide interoperability efforts, but they seem to point to different directions, and conclude with recommendations according to the intended organisational scope of integration.
Abstract: IT managers in administration must decide how to contribute to cross-organisational integration and what strategy and means to choose for achieving interoperability. Comparing the frameworks and guidelines provided by central European and U.S. governmental units, we find information integration and process integration as prominent concepts to guide interoperability efforts, but they seem to point to different directions. This paper aims at contrasting the different characteristics of both approaches and concludes with recommendations according to the intended organisational scope of integration. To be successful in these efforts it is important to understand that (a) interoperability requires a guiding vision of integration, (b) each type of integration points to a different set of interrelated ideas, assumptions and technical means, and (c) integration implies a strategic commitment to explicit forms of cross-organisational cooperation and their implementation.

Proceedings ArticleDOI
24 May 2004
TL;DR: This research seeks to enhance both the conceptual and practical models of III by building new understanding of the interaction among the social and technical processes in interorganizational information integration.
Abstract: Integrating and sharing information in multi-organizational government settings involves complex interactions within social and technological contexts. These integration processes often involve new work processes and significant organizational change. They are also embedded in larger political and institutional environments, which shape their goals and circumscribe their choices. The purpose of this research is to develop and test dynamic models of information integration in these settings.

Book ChapterDOI
TL;DR: This paper describes how to assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them.
Abstract: e-Science experiments are those performed using computer-based resources such as database searches, simulations or other applications. Like their laboratory-based counterparts, the data associated with an e-Science experiment are of reduced value if other scientists are not able to identify the origin, or provenance, of those data. Provenance is the term given to metadata about experiment processes, the derivation paths of data, and the sources and quality of experimental components, which includes the scientists themselves, related literature, etc. Consequently, provenance metadata are valuable resources for e-Scientists to repeat experiments, track versions of data and experiment runs, verify experiment results, and as a source of experimental insight. One specific kind of in silico experiment is a workflow. In this paper we describe how we can assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them. By associating well-formalized semantics with workflow logs we take a step towards integration of process provenance information and improved knowledge discovery.
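
A minimal sketch of the hyperlink-generation idea: provenance records that share a semantic annotation are linked, so a user can navigate between related experimental components. The log structure and annotation terms are hypothetical.

    # Hedged sketch: derive navigation links between provenance records that
    # share semantic annotations.
    from collections import defaultdict
    from itertools import combinations

    provenance_log = [
        {"id": "run1/blast", "annotations": {"SequenceAlignment", "ProteinSequence"}},
        {"id": "run1/fetch", "annotations": {"ProteinSequence", "UniProtRecord"}},
        {"id": "run2/blast", "annotations": {"SequenceAlignment"}},
    ]

    def build_links(log):
        links = defaultdict(set)
        for a, b in combinations(log, 2):
            if a["annotations"] & b["annotations"]:
                links[a["id"]].add(b["id"])
                links[b["id"]].add(a["id"])
        return dict(links)

    print(build_links(provenance_log))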

Journal ArticleDOI
TL;DR: This work proposes two design patterns that provide a flexible basis for the integration of different tool data at the meta-model level and describes rule-based mechanisms providing generic solutions for managing overlapping and redundant data.
Abstract: Today’s development processes employ a variety of notations and tools, e.g., the Unified Modeling Language UML, the Standard Description Language SDL, requirements databases, design tools, code generators, model checkers, etc. For better process support, the employed tools may be organized within a tool suite or integration platform, e.g., Rational Rose or Eclipse. While these tool-integration platforms usually provide GUI adaption mechanisms and functional adaption via application programming interfaces, they frequently do not provide appropriate means for data integration at the meta-model level. Thus, overlapping and redundant data from different “integrated” tools may easily become inconsistent and unusable. We propose two design patterns that provide a flexible basis for the integration of different tool data at the meta-model level. To achieve consistency between meta-models, we describe rule-based mechanisms providing generic solutions for managing overlapping and redundant data. The proposed mechanisms are widely used within the Fujaba Tool Suite. We report about our implementation and application experiences.

Book
06 Sep 2004
TL;DR: The Buster approach for terminological, spatial, and temporal representation and reasoning, its use for semantic translation, and implementation issues and a system demonstration are described.
Abstract: Contents: Related Work; The Buster Approach for Terminological, Spatial, and Temporal Representation and Reasoning; General Approach of Buster; Terminological Representation and Reasoning, Semantic Translation; Spatial Representation and Reasoning; Temporal Representation and Reasoning; Implementation, Conclusion, and Future Work; Implementation Issues and System Demonstration; Conclusion and Future Work; References.

Book ChapterDOI
08 Nov 2004
TL;DR: This paper presents a set of techniques to provide a lossless mapping of an OWL ontology to a relational schema and the corresponding instances to data and presents preliminary experiments that compare the efficiency of the mapping techniques in terms of query performance.
Abstract: The semantic web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. Ontologies, a cornerstone of the semantic web, have gained wide popularity as a model of information in a given domain that can be used for many purposes, including enterprise integration, database design, information retrieval and information interchange on the World Wide Web. Much of the current focus on ontologies has been on the development of languages such as DAML+OIL and OWL that enable the creation of ontologies and provide extensive semantics for Web data, and on answering intensional queries, that is, queries about the structure of an ontology. However, it is almost certain that many of the semantic web queries will be extensional and, to flourish, the semantic web will need to accommodate the huge amounts of existing data that is described by the ontologies and the applications that operate on them. Given the established record of relational databases to store and query large amounts of data, in this paper we present a set of techniques to provide a lossless mapping of an OWL ontology to a relational schema and the corresponding instances to data. We present preliminary experiments that compare the efficiency of the mapping techniques in terms of query performance.
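
One straightforward way to realize such a mapping is sketched below: each ontology class becomes a table, datatype properties become columns, and object properties become join tables. This simplified scheme is offered only as an illustration; the paper evaluates its own set of mapping techniques, which are not reproduced here.

    # Hedged sketch: a naive OWL-to-relational mapping (class -> table,
    # datatype property -> column, object property -> join table).
    def ontology_to_ddl(classes, object_properties, datatype_properties):
        ddl = []
        for cls in classes:
            cols = ["id VARCHAR PRIMARY KEY"]
            cols += [f"{prop} VARCHAR" for prop, domain in datatype_properties if domain == cls]
            ddl.append(f"CREATE TABLE {cls} ({', '.join(cols)});")
        for prop, domain, rng in object_properties:
            ddl.append(
                f"CREATE TABLE {prop} (subject VARCHAR REFERENCES {domain}(id), "
                f"object VARCHAR REFERENCES {rng}(id));"
            )
        return "\n".join(ddl)

    print(ontology_to_ddl(
        classes=["Author", "Book"],
        object_properties=[("wrote", "Author", "Book")],
        datatype_properties=[("name", "Author"), ("title", "Book")],
    ))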

Proceedings ArticleDOI
09 Aug 2004
TL;DR: This paper explores an aggregate set of metrics for fusion evaluation and demonstrates with information need metrics for dynamic situation analysis.
Abstract: To design information fusion systems, it is important to develop metrics as part of a test and evaluation strategy. In many cases, fusion systems are designed to (1) meet a specific set of user information needs (IN), (2) continuously validate information pedigree and updates, and (3) maintain this performance under changing conditions. A fusion system’s performance is evaluated in many ways. However, developing a consistent set of metrics is important for standardization. For example, many track and identification metrics have been proposed for fusion analysis. To evaluate a complete fusion system performance, level 4 sensor management and level 5 user refinement metrics need to be developed simultaneously to determine whether or not the fusion system is meeting information needs. To describe fusion performance, the fusion community needs to agree on a minimum set of metrics for user assessment and algorithm comparison. We suggest that such a minimum set should include feasible metrics of accuracy, confidence, throughput, timeliness, and cost. These metrics can be computed as confidence (probability), accuracy (error), timeliness (delay), throughput (amount) and cost (dollars). In this paper, we explore an aggregate set of metrics for fusion evaluation and demonstrate with information need metrics for dynamic situation analysis.
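
The suggested minimum metric set can be captured in a small structure, shown below with a weighted aggregate score. The weights and the normalization bounds are assumptions for illustration only; the paper argues for community agreement on such a set rather than prescribing one formula.

    # Hedged sketch: the minimum fusion-metric set (confidence, accuracy,
    # timeliness, throughput, cost) with an assumed weighted aggregate.
    from dataclasses import dataclass

    @dataclass
    class FusionMetrics:
        confidence: float   # probability in [0, 1]
        accuracy: float     # error (lower is better)
        timeliness: float   # delay in seconds (lower is better)
        throughput: float   # reports processed per second
        cost: float         # dollars (lower is better)

        def aggregate(self, weights=(0.3, 0.2, 0.2, 0.2, 0.1),
                      max_error=10.0, max_delay=5.0, max_rate=100.0, max_cost=1e6):
            # Normalize each metric to [0, 1] with 1 = better, then combine.
            scores = (
                self.confidence,
                max(0.0, 1 - self.accuracy / max_error),
                max(0.0, 1 - self.timeliness / max_delay),
                min(1.0, self.throughput / max_rate),
                max(0.0, 1 - self.cost / max_cost),
            )
            return sum(w * s for w, s in zip(weights, scores))

    print(FusionMetrics(0.9, 2.0, 1.0, 60.0, 250000.0).aggregate())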

Journal ArticleDOI
TL;DR: An overview of important integration issues that should be considered when designing a bioinformatics integration system is provided, agent technology is introduced, and it is argued why it provides an appropriate solution for designing bioinformatics integration systems.

Book ChapterDOI
07 Jun 2004
TL;DR: The AutoMed repository and some associated tools, which provide the first implementation of the both-as-view (BAV) approach to data integration, are described, along with how several practical problems in data integration between heterogeneous data sources have been solved.
Abstract: This paper describes the AutoMed repository and some associated tools, which provide the first implementation of the both-as-view (BAV) approach to data integration. Apart from being a highly expressive data integration approach, BAV additionally provides a method to support a wide range of data modelling languages, and describes transformations between those data modelling languages. This paper documents how BAV has been implemented in the AutoMed repository, and how several practical problems in data integration between heterogeneous data sources have been solved. We illustrate the implementation with examples in the relational, ER, and semi-structured data models.
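
The both-as-view idea can be sketched schematically: a pathway of reversible transformation steps, each adding or deleting a schema construct together with a query defining it over the rest of the schema, so that each schema is expressible as a view over the other. This is a generic illustration and does not use the actual AutoMed API; the construct and query strings are hypothetical.

    # Hedged sketch of a BAV-style transformation pathway (not the AutoMed API).
    from dataclasses import dataclass

    @dataclass
    class Step:
        kind: str        # "add" or "delete"
        construct: str   # schema construct being introduced or removed
        query: str       # definition over the rest of the schema

        def reverse(self):
            return Step("delete" if self.kind == "add" else "add", self.construct, self.query)

    pathway = [
        Step("add", "person(name, dept)", "staff(name, dept) UNION student(name, dept)"),
        Step("delete", "staff(name, dept)", "person(name, dept) WHERE role = 'staff'"),
    ]

    # Reversing the pathway maps the target schema back to the source.
    for step in (s.reverse() for s in reversed(pathway)):
        print(step.kind, step.construct, "<=", step.query)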


Journal ArticleDOI
TL;DR: The impact of data mining is examined by reviewing existing applications, including personalized environments, electronic commerce, and search engines, to identify the limitations of current work and raise several directions for future research.
Abstract: The information explosion is a serious challenge for current information institutions. On the other hand, data mining, which is the search for valuable information in large volumes of data, is one of the solutions to face this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, how data mining can enhance their functions is discussed. The reader of this paper is expected to get an overview of the state of the art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

01 Jan 2004
TL;DR: The RAND Corporation has been participating in the Information Superiority Metrics Working Group, whose purpose is to describe key concepts and related metrics that are necessary to explore part of the proposed network-centric warfare value chain.
Abstract: The military is formulating new visions, strategies, and concepts that capitalize on emerging information-age technologies to provide its warfighters with significantly improved capabilities to meet the national security challenges of the 21st century. These programs are described in such documents as the Quadrennial Defense Review, Joint Vision 2020, a variety of publications describing network-centric warfare (NCW), and other documents describing military transformation. Joint Vision 2020 provides an important starting point for describing a future warfighting concept that has since evolved into NCW. A key tenet of Joint Vision 2020 is that information superiority will enable decision dominance, new Joint operational concepts, and a decisive advantage over future adversaries. To create and leverage information superiority, it is foreseen that, under some circumstances, a mix of command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR) capabilities would interoperate with weapon systems and forces on an end-to-end basis through a network-centric information environment to achieve significant improvements in awareness, shared awareness, and synchronization. The military is embarked on a series of analyses and experiments to improve its understanding of the potential of these NCW concepts. The Assistant Secretary of Defense for Networks and Information Integration (ASD NII), through the Command and Control Research Program, asked RAND to help develop methods and tools that could improve the assessment of C4ISR capabilities and processes to the achievement of NCW concepts, including awareness, shared awareness, and synchronization. In response to this request, the RAND Corporation has been participating in the Information Superiority Metrics Working Group, under the auspices of ASD NII. The group's purpose is to describe key concepts and related metrics that are necessary to explore part of the proposed NCW value chain.