
Showing papers on "Information integration published in 2004"


Journal ArticleDOI
TL;DR: It is shown that, depending on the type of information, different combination and integration strategies are used and that prior knowledge is often required for interpreting the sensory signals.

1,628 citations


Book
25 Oct 2004
TL;DR: This book introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search, as well as new research areas that rely on evolving text-mining techniques.
Abstract: The growth of the web can be seen as an expanding public digital library collection. Online digital information extends far beyond the web and its publicly available information. Huge amounts of information are private and are of interest to local communities, such as the records of customers of a business. This information is overwhelmingly text and has its record-keeping purpose, but an automated analysis might be desirable to find patterns in the stored records. Analogous to this data mining is text mining, which also finds patterns and trends in information samples but which does so with far less structured--though with greater immediate utility for users--ingredients. This book focuses on the concepts and methods needed to expand horizons beyond structured, numeric data to automated mining of text samples. It introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search. New research areas are explored, such as information extraction and document summarization, that rely on evolving text-mining techniques.

596 citations


Book ChapterDOI
01 Jan 2004
TL;DR: In this paper, the impact of e-business on supply chain integration can be described along the dimensions of information integration, synchronized planning, coordinated workflow, and new business models, and significant value can be created by e-Business enabled supply-chain integration.
Abstract: e-Business has emerged as a key enabler to drive supply chain integration. Businesses can use the Internet to gain global visibility across their extended network of trading partners and help them respond quickly to changing customer demand captured over the Internet. The impact of e-business on supply chain integration can be described along the dimensions of information integration, synchronized planning, coordinated workflow, and new business models. As a result, many of the core supply chain principles and concepts can now be put into practice much more effectively using e-business. Significant value can be created by e-business enabled supply chain integration.

374 citations


Journal Article
TL;DR: Proceedings of the Cooperative Information Systems (CoopIS) 2004 International Conference, with sessions on business process optimization, workflow/process/Web services, database management and transactions, schema integration and agents, events, P2P/collaboration, applications, and trust/security/contracts, together with the Ontologies, DataBases, and Applications of Semantics (ODBASE) 2004 International Conference.
Abstract: Cooperative Information Systems (CoopIS) 2004 International Conference- CoopIS 2004 International Conference (International Conference on Cooperative Information Systems) PC Co-chairs' Message- Keynote- Business Process Optimization- Workflow/Process/Web Services, I- Discovering Workflow Transactional Behavior from Event-Based Log- A Flexible Mediation Process for Large Distributed Information Systems- Exception Handling Through a Workflow- Workflow/Process/Web Services, II- A Flexible and Composite Schema Matching Algorithm- Analysis, Transformation, and Improvements of ebXML Choreographies Based on Workflow Patterns- The Notion of Business Process Revisited- Workflow/Process/Web Services, III- Disjoint and Overlapping Process Changes: Challenges, Solutions, Applications- Untangling Unstructured Cyclic Flows - A Solution Based on Continuations- Making Workflow Models Sound Using Petri Net Controller Synthesis- Database Management/Transaction- Concurrent Undo Operations in Collaborative Environments Using Operational Transformation- Refresco: Improving Query Performance Through Freshness Control in a Database Cluster- Automated Supervision of Data Production - Managing the Creation of Statistical Reports on Periodic Data- Schema Integration/Agents- Deriving Sub-schema Similarities from Semantically Heterogeneous XML Sources- Supporting Similarity Operations Based on Approximate String Matching on the Web- Managing Semantic Compensation in a Multi-agent System- Modelling with Ubiquitous Agents a Web-Based Information System Accessed Through Mobile Devices- Events- A Meta-service for Event Notification- Classification and Analysis of Distributed Event Filtering Algorithms- P2P/Collaboration- A Collaborative Model for Agricultural Supply Chains- FairNet - How to Counter Free Riding in Peer-to-Peer Data Structures- Supporting Collaborative Layouting in Word Processing- A Reliable Content-Based Routing Protocol over Structured Peer-to-Peer Networks- Applications, I- Covering Your Back: Intelligent Virtual Agents in Humanitarian Missions Providing Mutual Support- Dynamic Modelling of Demand Driven Value Networks- An E-marketplace for Auctions and Negotiations in the Constructions Sector- Applications, II- Managing Changes to Engineering Products Through the Co-ordination of Human and Technical Activities- Towards Automatic Deployment in eHome Systems: Description Language and Tool Support- A Prototype of a Context-Based Architecture for Intelligent Home Environments- Trust/Security/Contracts- Trust-Aware Collaborative Filtering for Recommender Systems- Service Graphs for Building Trust- Detecting Violators of Multi-party Contracts- Potpourri- Leadership Maintenance in Group-Based Location Management Scheme- TLS: A Tree-Based DHT Lookup Service for Highly Dynamic Networks- Minimizing the Network Distance in Distributed Web Crawling- Ontologies, DataBases, and Applications of Semantics (ODBASE) 2004 International Conference- ODBASE 2004 International Conference (Ontologies, DataBases, and Applications of Semantics) PC Co-chairs' Message- Keynote- Helping People (and Machines) Understanding Each Other: The Role of Formal Ontology- Knowledge Extraction- Automatic Initiation of an Ontology- Knowledge Extraction from Classification Schemas- Semantic Web in Practice- Generation and Management of a Medical Ontology in a Semantic Web Retrieval System- Semantic Web Based Content Enrichment and Knowledge Reuse in E-science- The Role of Foundational Ontologies in Manufacturing Domain Applications- 
Intellectual Property Rights Management Using a Semantic Web Information System- Ontologies and IR- Intelligent Retrieval of Digital Resources by Exploiting Their Semantic Context- The Chrysostom Knowledge Base: An Ontology of Historical Interactions- Text Simplification for Information-Seeking Applications- Information Integration- Integration of Integrity Constraints in Federated Schemata Based on Tight Constraining- Modal Query Language for Databases with Partial Orders- Composing Mappings Between Schemas Using a Reference Ontology- Assisting Ontology Integration with Existing Thesauri

284 citations


Journal Article
TL;DR: In this paper, an approach for semantically enhanced collaborative filtering is introduced, in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions.
Abstract: Item-based Collaborative Filtering (CF) algorithms have been designed to deal with the scalability problems associated with traditional user-based CF approaches without sacrificing recommendation or prediction accuracy. Item-based algorithms avoid the bottleneck in computing user-user correlations by first considering the relationships among items and performing similarity computations in a reduced space. Because the computation of item similarities is independent of the methods used for generating predictions, multiple knowledge sources, including structured semantic information about items, can be brought to bear in determining similarities among items. The integration of semantic similarities for items with rating- or usage-based similarities allows the system to make inferences based on the underlying reasons for which a user may or may not be interested in a particular item. Furthermore, in cases where little or no rating (or usage) information is available (such as in the case of newly added items, or in very sparse data sets), the system can still use the semantic similarities to provide reasonable recommendations for users. In this paper, we introduce an approach for semantically enhanced collaborative filtering in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions. Our experimental results demonstrate that the integrated approach yields significant advantages both in terms of improving accuracy, as well as in dealing with very sparse data sets or new items.
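
As a rough illustration of the combined similarity idea described in this abstract, the sketch below linearly blends an ontology-derived semantic similarity with a rating-based similarity and uses the result for item-based prediction. The weighting parameter alpha, the cosine measures, and all names are illustrative assumptions, not the paper's exact formulation.

    # Hedged sketch: combine semantic and rating-based item similarities for
    # item-based collaborative filtering. The linear weight `alpha` and the
    # cosine measures are assumptions, not the paper's exact formulation.
    import numpy as np

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    def combined_item_similarity(sem_i, sem_j, ratings_i, ratings_j, alpha=0.5):
        sem_sim = cosine(sem_i, sem_j)             # from ontology-derived item features
        rating_sim = cosine(ratings_i, ratings_j)  # from the user-item rating matrix
        return alpha * sem_sim + (1 - alpha) * rating_sim

    def predict_rating(user_ratings, item_sims, k=20):
        # user_ratings: {item_id: rating}; item_sims: {item_id: similarity to target item}
        neighbours = sorted(user_ratings, key=lambda i: item_sims.get(i, 0.0), reverse=True)[:k]
        num = sum(item_sims.get(i, 0.0) * user_ratings[i] for i in neighbours)
        den = sum(abs(item_sims.get(i, 0.0)) for i in neighbours)
        return num / den if den else 0.0

    # For a new item with no ratings, the rating-based term is zero and the
    # semantic term still yields a usable similarity (the sparse-data case above).
    print(combined_item_similarity(np.array([1., 0., 1.]), np.array([1., 1., 0.]),
                                   np.array([4., 5., 0.]), np.array([4., 4., 1.])))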

224 citations


Journal ArticleDOI
01 Sep 2004
TL;DR: The pros and cons of the current approaches and systems are identified, and what an integration system for biologists ought to be is discussed.
Abstract: This paper surveys the area of biological and genomic sources integration, which has recently become a major focus of the data integration research field. The challenges that an integration system for biological sources must face are due to several factors such as the variety and amount of data available, the representational heterogeneity of the data in the different sources, and the autonomy and differing capabilities of the sources. This survey describes the main integration approaches that have been adopted. They include warehouse integration, mediator-based integration, and navigational integration. Then we look at the four major existing integration systems that have been developed for the biological domain: SRS, BioKleisli, TAMBIS, and DiscoveryLink. After analyzing these systems and mentioning a few others, we identify the pros and cons of the current approaches and systems and discuss what an integration system for biologists ought to be.

178 citations


Book ChapterDOI
08 Nov 2004
TL;DR: This paper presents an approach which constructs an instance graph for each individual process instance, based on information in the entire data set, and represents the results in terms of Event-driven Process Chains (EPCs).
Abstract: Deploying process-driven information systems is a time-consuming and error-prone task. Process mining attempts to improve this by automatically generating a process model from event-based data. Existing techniques try to generate a complete process model from the data acquired. However, unless this model is the ultimate goal of mining, such a model is not always required. Instead, a good visualization of each individual process instance can be enough. From these individual instances, an overall model can then be generated if required. In this paper, we present an approach which constructs an instance graph for each individual process instance, based on information in the entire data set. The results are represented in terms of Event-driven Process Chains (EPCs). This representation is used to connect our process mining to a widely used commercial tool for the visualization and analysis of instance EPCs.
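
To make the instance-graph idea concrete, here is a minimal sketch that derives log-wide directly-follows counts and then builds a small graph for a single case. The simple causality heuristic and the dictionary-based log format are illustrative assumptions, and the rendering of the result as an EPC is omitted.

    # Hedged sketch: build a simple instance graph per case from an event log,
    # using directly-follows counts from the entire log as a stand-in for the
    # paper's technique. Converting the graph to an EPC is not shown.
    from collections import defaultdict

    def directly_follows_counts(log):
        # log: {case_id: [activity, activity, ...]} in execution order
        counts = defaultdict(int)
        for trace in log.values():
            for a, b in zip(trace, trace[1:]):
                counts[(a, b)] += 1
        return counts

    def instance_graph(trace, df_counts):
        # Nodes are (position, activity); connect each event to the latest earlier
        # event whose activity is observed, anywhere in the log, to directly precede it.
        nodes = list(enumerate(trace))
        edges = []
        for j, b in nodes[1:]:
            preds = [i for i, a in nodes[:j] if df_counts.get((a, b), 0) > 0]
            if preds:
                edges.append((preds[-1], j))
        return nodes, edges

    log = {"case1": ["A", "B", "C", "D"], "case2": ["A", "C", "B", "D"]}
    print(instance_graph(log["case1"], directly_follows_counts(log)))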

167 citations


Book
20 Oct 2004
TL;DR: Addressing problems like missing conceptual models, unclear system boundaries, and heterogeneous representations, the authors design a framework for ontology-based information sharing in weakly structured environments like the Semantic Web.
Abstract: The large-scale and almost ubiquitous availability of information has become as much of a curse as it is a blessing. The more information is available, the harder it is to locate any particular piece of it. And even when it has been successfully found, it is even harder still to usefully combine it with other information we may already possess. This problem occurs at many different levels, ranging from the overcrowded disks of our own PCs to the mass of unstructured information on the World Wide Web. It is commonly understood that this problem of information sharing can only be solved by giving computers better access to the semantics of the information. While it has been recognized that ontologies play a crucial role in solving the open problems, most approaches rely on the existence of well-established data structures. To overcome these shortcomings, Stuckenschmidt and van Harmelen describe ontology-based approaches for resolving semantic heterogeneity in weakly structured environments, in particular the World Wide Web. Addressing problems like missing conceptual models, unclear system boundaries, and heterogeneous representations, they design a framework for ontology-based information sharing in weakly structured environments like the Semantic Web. For researchers and students in areas related to the Semantic Web, the authors provide not only a comprehensive overview of the state of the art, but also present in detail recent research in areas like ontology design for information integration, metadata generation and management, and representation and management of distributed ontologies. For professionals in areas such as e-commerce (e.g., the exchange of product knowledge) and knowledge management (e.g., in large and distributed organizations), the book provides decision support on the use of novel technologies, information about potential problems, and guidelines for the successful application of existing technologies.

165 citations


Proceedings ArticleDOI
22 Aug 2004
TL;DR: This paper develops the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection, and introduces a new correlation measure, the H-measure, distinct from those proposed in previous work.
Abstract: To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., [author] corresponds to [first name, last name] in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., [first name, last name]) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection. Unlike previous correlation mining algorithms, which mainly focus on finding strong positive correlations, our algorithm considers both positive and negative correlations, especially the subtlety of negative correlations, due to their special importance in schema matching. This leads to the introduction of a new correlation measure, the H-measure, distinct from those proposed in previous work. We evaluate our approach extensively and the results show good accuracy for discovering complex matchings.
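
The core co-occurrence intuition in this abstract can be shown in a few lines of code: grouping attributes tend to be co-present across query interfaces (positive correlation), while synonyms rarely co-occur (negative correlation). The lift-style measure and the thresholds below are illustrative stand-ins; the paper's H-measure is not reproduced here.

    # Hedged sketch: mine positive/negative attribute correlations across Web
    # query interfaces. The measure is a simple lift-style ratio, not the
    # paper's H-measure.
    from itertools import combinations

    def correlation(interfaces, a, b):
        n = len(interfaces)
        f_a = sum(a in i for i in interfaces)
        f_b = sum(b in i for i in interfaces)
        f_ab = sum(a in i and b in i for i in interfaces)
        expected = f_a * f_b / n if n else 0.0
        return (f_ab / expected) if expected else 0.0  # >1 positive, <1 negative

    interfaces = [
        {"title", "author"},
        {"title", "first name", "last name"},
        {"title", "author", "isbn"},
        {"first name", "last name", "isbn"},
    ]
    for a, b in combinations(sorted(set().union(*interfaces)), 2):
        c = correlation(interfaces, a, b)
        if c > 1.5:
            print("grouping candidates:", a, "+", b)   # e.g. first name + last name
        elif c < 0.5:
            print("synonym candidates:", a, "~", b)    # e.g. author ~ first name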

160 citations


Journal ArticleDOI
01 Sep 2004
TL;DR: This article describes how to support mediators in their source selection and query planning process and proposes three new merge operators, which formalize the integration of multiple source responses.
Abstract: For many information domains there are numerous World Wide Web data sources. The sources vary both in their extension and their intension: They represent different real-world entities with possible overlap and provide different attributes of these entities. Mediator-based information systems allow integrated access to such sources by providing a common schema against which the user can pose queries. Given a query, the mediator must determine which participating sources to access and how to integrate the incoming results. This article describes how to support mediators in their source selection and query planning process. We propose three new merge operators, which formalize the integration of multiple source responses. A completeness model describes the usefulness of a source to answer a query. The completeness measure incorporates both extensional value (called coverage) and intensional value (called density) of a source. We show how to determine the completeness of single sources and of combinations of sources under the new merge operators. Finally, we show how to use the measure for source selection and query planning.
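
A small numeric sketch of the coverage/density idea follows. Combining the two scores multiplicatively into a single completeness value, and the toy source format, are assumptions made for illustration; the article's merge operators are not modeled here.

    # Hedged sketch: completeness of a source as coverage (extensional) combined
    # with density (intensional). The multiplicative combination is an assumption.
    def coverage(source_entities, world_size):
        # Fraction of the relevant real-world entities the source represents.
        return len(source_entities) / world_size

    def density(source, query_attributes):
        # Average fraction of queried attributes that are non-null per entity.
        if not source:
            return 0.0
        filled = sum(
            sum(row.get(a) is not None for a in query_attributes) / len(query_attributes)
            for row in source.values()
        )
        return filled / len(source)

    def completeness(source, query_attributes, world_size):
        return coverage(source.keys(), world_size) * density(source, query_attributes)

    src = {"e1": {"name": "Ada", "phone": None},
           "e2": {"name": "Bob", "phone": "555-1234"}}
    print(completeness(src, ["name", "phone"], world_size=10))  # 0.2 * 0.75 = 0.15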

148 citations


Book ChapterDOI
25 Mar 2004
TL;DR: In this article, a generic framework for transforming heterogeneous data within scientific workflows is defined; it relies on a formalized ontology that serves as a simple, unstructured global schema.
Abstract: Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate semantic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, appropriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
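
The registration-mapping idea can be illustrated with a toy example: the structural fields of two services are registered against semantic types from a shared ontology, and a field-to-field transformation is derived wherever the semantic types coincide. All field names and ontology terms below are hypothetical.

    # Hedged sketch: relate structural fields to semantic types and derive a
    # transformation between two services' structures. Names are hypothetical.
    OUTPUT_REGISTRATION = {          # producing service: structural field -> ontology term
        "site/tempC": "AirTemperature",
        "site/latDeg": "Latitude",
    }
    INPUT_REGISTRATION = {           # consuming service: structural field -> ontology term
        "reading/temperature": "AirTemperature",
        "reading/latitude": "Latitude",
    }

    def derive_transformation(out_reg, in_reg):
        # Pair fields whose semantic types coincide, regardless of structure.
        by_semantics = {sem: field for field, sem in out_reg.items()}
        return {in_field: by_semantics[sem]
                for in_field, sem in in_reg.items() if sem in by_semantics}

    print(derive_transformation(OUTPUT_REGISTRATION, INPUT_REGISTRATION))
    # {'reading/temperature': 'site/tempC', 'reading/latitude': 'site/latDeg'}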

Journal ArticleDOI
TL;DR: The enterprise information integration framework defines four levels of the enterprise system to identify the obstacles and the information integration types encountered at each level, and is used to analyse currently used and promising technologies for enterprise information integration.
Abstract: Organizations face the challenging task of integrating their distributed organizational units, information systems, and business processes for improved operation and attainment of organizational goals. There is the difficulty of dealing with heterogeneous applications that use different formats (syntax) and apply different meanings (semantics) to the data. There is the difficulty of coordinating the workflow so that the disparate organizational units act as a harmonious whole. The broad scope of the enterprise integration problem precludes approaches that tackle the entire problem but rather requires approaches that address a limited but useful integration type. The various information integration types and how they are related to each other are poorly defined. This article presents an enterprise information integration framework that aims to coalesce parallel approaches towards integration so that the information integration problem can be better understood as a whole. The enterprise information integration...

Proceedings Article
22 Aug 2004
TL;DR: In this article, a peer can answer queries by reasoning from its local (propositional) theory but can also ask queries to some other peers with which it is semantically related by sharing part of its vocabulary.
Abstract: In a peer-to-peer system, there is no centralized control or hierarchical organization: each peer is equivalent in functionality and cooperates with other peers in order to solve a collective task. Such systems have evolved from simple keyword-based peer-to-peer file sharing systems like Napster and Gnutella to schema-based peer data management systems like Edutella [3] or Piazza [2], which handle semantic data description and support complex queries for data retrieval. In this paper, we are interested in peer-to-peer inference systems in which each peer can answer queries by reasoning from its local (propositional) theory but can also ask queries to some other peers with which it is semantically related by sharing part of its vocabulary. This framework encompasses several applications like peer-to-peer information integration systems or intelligent agents, in which each peer has its own knowledge (about its data or its expertise domain) and some partial knowledge about some other peers. In this setting, when it is solicited to perform a reasoning task and if it cannot solve completely that task locally, a peer must be able to distribute appropriate reasoning subtasks among its acquainted peers. The contribution of this paper is the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We have exhibited a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Our algorithm splits clauses if they involve vocabularies of several peers. Each piece of a split clause is transmitted to the corresponding theory to find its consequences. The consequences that are found for each piece of a split clause must be recomposed to get the consequences of that clause.
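
The clause-splitting step mentioned at the end of the abstract is easy to sketch: each clause is partitioned according to the vocabularies of the acquainted peers, and each piece is sent to the corresponding peer for consequence finding. The literal encoding and peer names below are illustrative, and the recomposition of consequences is not shown.

    # Hedged sketch: split a propositional clause over peer vocabularies.
    def split_clause(clause, peer_vocabularies):
        # clause: set of literals such as "a" or "-c"; vocabularies: peer -> set of atoms
        pieces = {}
        for peer, vocab in peer_vocabularies.items():
            piece = {lit for lit in clause if lit.lstrip("-") in vocab}
            if piece:
                pieces[peer] = piece
        return pieces

    vocab = {"P1": {"a", "b"}, "P2": {"b", "c"}, "P3": {"d"}}
    print(split_clause({"a", "-c", "d"}, vocab))
    # e.g. {'P1': {'a'}, 'P2': {'-c'}, 'P3': {'d'}}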

Journal Article
TL;DR: In this article, the concept of semantic overlay clusters (SOC) is introduced for super-peer networks enabling a controlled distribution of peers to clusters, based on predefined policies defined by human experts.
Abstract: When joining information provider peers to a peer-to-peer network, an arbitrary distribution is sub-optimal. In fact, clustering peers by their characteristics enhances search and integration significantly. Currently, super-peer networks, such as the Edutella network, provide no sophisticated means for such a semantic clustering of peers. We introduce the concept of semantic overlay clusters (SOC) for super-peer networks enabling a controlled distribution of peers to clusters. In contrast to the recently announced semantic overlay network approach designed for flat, pure peer-to-peer topologies and for limited metadata sets, such as simple filenames, we allow a clustering of complex heterogeneous schemes known from relational databases and use advantages of super-peer networks, such as efficient search and broadcast of messages. Our approach is based on predefined policies defined by human experts. Based on such policies, a fully decentralized broadcast-and-match approach distributes the peers automatically to super-peers. Thus we are able to automate the integration of information sources in super-peer networks and reduce flooding of the network with messages.
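
As a rough illustration of policy-driven clustering, the sketch below matches a joining peer's schema attributes against per-super-peer policies and assigns the peer to the best fully satisfied cluster. The policy format (required attributes per cluster) is an assumption, far simpler than the heterogeneous relational schemas discussed above.

    # Hedged sketch: assign an information-provider peer to a super-peer cluster
    # via predefined matching policies (format assumed for illustration).
    POLICIES = {
        "SP-education": {"title", "subject", "audience"},
        "SP-commerce":  {"product", "price"},
    }

    def assign_peer(peer_schema, policies):
        # Broadcast-and-match: choose the super-peer whose required attributes
        # are all covered by the peer's schema, preferring the largest overlap.
        best, best_overlap = None, 0
        for super_peer, required in policies.items():
            overlap = len(required & peer_schema)
            if overlap == len(required) and overlap > best_overlap:
                best, best_overlap = super_peer, overlap
        return best

    print(assign_peer({"title", "subject", "audience", "language"}, POLICIES))  # SP-education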

Book ChapterDOI
TL;DR: A methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention is described and its implementation in the Armadillo system is described.
Abstract: In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon. Retrieved information is then used to partially annotate documents. Annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotation to annotate more documents that will be used to train more complex IE engines and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future work.
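
The bootstrapping cycle described here (seed, annotate, learn, re-annotate) can be caricatured in a few lines. The regex-based "learner" below is a deliberately simplified stand-in for the IE engines used in Armadillo, and the documents, seed lexicon, and proximity window are hypothetical.

    # Hedged sketch: seed-and-bootstrap information extraction. A real system
    # would train progressively more complex IE engines instead of the
    # proximity heuristic used here.
    import re

    def annotate(documents, lexicon):
        return [(doc, sorted({t for t in lexicon if t in doc})) for doc in documents]

    def bootstrap(documents, seed_lexicon, rounds=2):
        lexicon = set(seed_lexicon)
        for _ in range(rounds):
            for doc, mentions in annotate(documents, lexicon):
                for m in mentions:
                    # "Learn": accept capitalized two-word candidates appearing
                    # near an already-annotated mention.
                    for cand in re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", doc):
                        if cand != m and abs(doc.find(cand) - doc.find(m)) < 40:
                            lexicon.add(cand)
        return lexicon

    docs = ["Alice Smith and Bob Jones co-authored the paper.",
            "The paper by Bob Jones cites work by Carol White."]
    print(bootstrap(docs, {"Bob Jones"}))  # grows to include Alice Smith and Carol White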

Journal Article
TL;DR: This paper aims at contrasting the different characteristics of both approaches of information integration and process integration, and concludes with recommendations according to the intended organisational scope of integration.
Abstract: IT managers in administration must decide how to contribute to cross-organisational integration and what strategy and means to choose for achieving interoperability. Comparing the frameworks and guidelines provided by central European and U.S. governmental units, we find information integration and process integration as prominent concepts to guide interoperability efforts, but they seem to point to different directions. This paper aims at contrasting the different characteristics of both approaches and concludes with recommendations according to the intended organisational scope of integration. To be successful in these efforts it is important to understand that (a) interoperability requires a guiding vision of integration, (b) each type of integration points to a different set of interrelated ideas, assumptions and technical means, and (c) integration implies a strategic commitment to explicit forms of cross-organisational cooperation and their implementation.

Book ChapterDOI
08 Nov 2004
TL;DR: In this paper, the authors present a framework for the design of the Data Warehouse back-stage (and the respective ETL processes) based on the key observation that this task fundamentally involves dealing with the specificities of information at very low levels of granularity including transformation rules at the attribute level.
Abstract: In Data Warehouse (DW) scenarios, ETL (Extraction, Transformation, Loading) processes are responsible for the extraction of data from heterogeneous operational data sources, their transformation (conversion, cleaning, normalization, etc.) and their loading into the DW. In this paper, we present a framework for the design of the DW back-stage (and the respective ETL processes) based on the key observation that this task fundamentally involves dealing with the specificities of information at very low levels of granularity including transformation rules at the attribute level. Specifically, we present a disciplined framework for the modeling of the relationships between sources and targets in different levels of granularity (including coarse mappings at the database and table levels to detailed inter-attribute mappings at the attribute level). In order to accomplish this goal, we extend UML (Unified Modeling Language) to model attributes as first-class citizens. In our attempt to provide complementary views of the design artifacts in different levels of detail, our framework is based on a principled approach in the usage of UML packages, to allow zooming in and out the design of a scenario.
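
To make the multi-granularity mapping idea tangible, the sketch below records source-to-target correspondences at the database, table, and attribute levels, with an attribute-level transformation rule attached to each fine-grained mapping. The nested-dictionary representation is illustrative and is not the paper's UML-based notation.

    # Hedged sketch: source-to-target ETL mappings at three levels of
    # granularity, with attribute-level transformation rules.
    MAPPINGS = {
        ("src_db", "dw_db"): {                       # coarse: database level
            ("orders", "fact_sales"): {              # table level
                "amount": ("sales_amount", "round(amount * exchange_rate, 2)"),
                "ts":     ("sale_date",    "date(ts)"),
            },
        },
    }

    def describe(mappings):
        for (src_db, tgt_db), tables in mappings.items():
            for (src_t, tgt_t), attrs in tables.items():
                for src_a, (tgt_a, rule) in attrs.items():
                    print(f"{src_db}.{src_t}.{src_a} -> {tgt_db}.{tgt_t}.{tgt_a} via {rule}")

    describe(MAPPINGS)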

Book ChapterDOI
25 Mar 2004
TL;DR: It is argued that formal principles governing best practices in classification and definition have for too long been neglected in the construction of biomedical ontologies, in ways which have important negative consequences for data integration and ontology alignment, and that the use of such principles in ontology construction can serve as a valuable tool in error-detection and in supporting reliable manual curation.
Abstract: Formal principles governing best practices in classification and definition have for too long been neglected in the construction of biomedical ontologies, in ways which have important negative consequences for data integration and ontology alignment. We argue that the use of such principles in ontology construction can serve as a valuable tool in error-detection and also in supporting reliable manual curation. We argue also that such principles are a prerequisite for the successful application of advanced data integration techniques such as ontology-based multi-database querying, automated ontology alignment and ontology-based text-mining. These theses are illustrated by means of a case study of the Gene Ontology, a project of increasing importance within the field of biomedical data integration.

Book ChapterDOI
30 Aug 2004
TL;DR: In this article, the authors compare information integration and process integration as prominent concepts to guide interoperability efforts, but they seem to point to different directions, and conclude with recommendations according to the intended organisational scope of integration.
Abstract: IT managers in administration must decide how to contribute to cross-organisational integration and what strategy and means to choose for achieving interoperability. Comparing the frameworks and guidelines provided by central European and U.S. governmental units, we find information integration and process integration as prominent concepts to guide interoperability efforts, but they seem to point to different directions. This paper aims at contrasting the different characteristics of both approaches and concludes with recommendations according to the intended organisational scope of integration. To be successful in these efforts it is important to understand that (a) interoperability requires a guiding vision of integration, (b) each type of integration points to a different set of interrelated ideas, assumptions and technical means, and (c) integration implies a strategic commitment to explicit forms of cross-organisational cooperation and their implementation.

Proceedings ArticleDOI
24 May 2004
TL;DR: This research seeks to enhance both the conceptual and practical models of III by building new understanding of the interaction among the social and technical processes in interorganizational information integration.
Abstract: Integrating and sharing information in multi-organizational government settings involves complex interactions within social and technological contexts. These integration processes often involve new work processes and significant organizational change. They are also embedded in larger political and institutional environments, which shape their goals and circumscribe their choices. The purpose of this research is to develop and test dynamic models of information integration in these settings.

Book ChapterDOI
TL;DR: This paper describes how to assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them.
Abstract: e-Science experiments are those performed using computer-based resources such as database searches, simulations or other applications. Like their laboratory-based counterparts, the data associated with an e-Science experiment are of reduced value if other scientists are not able to identify the origin, or provenance, of those data. Provenance is the term given to metadata about experiment processes, the derivation paths of data, and the sources and quality of experimental components, which includes the scientists themselves, related literature, etc. Consequently, provenance metadata are valuable resources for e-Scientists to repeat experiments, track versions of data and experiment runs, verify experiment results, and as a source of experimental insight. One specific kind of in silico experiment is a workflow. In this paper we describe how we can assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them. By associating well-formalized semantics with workflow logs we take a step towards integration of process provenance information and improved knowledge discovery.
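
A minimal sketch of the hyperlink-generation idea: provenance records that share a semantic annotation are linked, so a user can navigate between related experimental components. The log structure and annotation terms are hypothetical.

    # Hedged sketch: derive navigation links between provenance records that
    # share semantic annotations.
    from collections import defaultdict
    from itertools import combinations

    provenance_log = [
        {"id": "run1/blast", "annotations": {"SequenceAlignment", "ProteinSequence"}},
        {"id": "run1/fetch", "annotations": {"ProteinSequence", "UniProtRecord"}},
        {"id": "run2/blast", "annotations": {"SequenceAlignment"}},
    ]

    def build_links(log):
        links = defaultdict(set)
        for a, b in combinations(log, 2):
            if a["annotations"] & b["annotations"]:
                links[a["id"]].add(b["id"])
                links[b["id"]].add(a["id"])
        return dict(links)

    print(build_links(provenance_log))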

Journal ArticleDOI
TL;DR: This work proposes two design patterns that provide a flexible basis for the integration of different tool data at the meta-model level and describes rule-based mechanisms providing generic solutions for managing overlapping and redundant data.
Abstract: Today’s development processes employ a variety of notations and tools, e.g., the Unified Modeling Language UML, the Standard Description Language SDL, requirements databases, design tools, code generators, model checkers, etc. For better process support, the employed tools may be organized within a tool suite or integration platform, e.g., Rational Rose or Eclipse. While these tool-integration platforms usually provide GUI adaption mechanisms and functional adaption via application programming interfaces, they frequently do not provide appropriate means for data integration at the meta-model level. Thus, overlapping and redundant data from different “integrated” tools may easily become inconsistent and unusable. We propose two design patterns that provide a flexible basis for the integration of different tool data at the meta-model level. To achieve consistency between meta-models, we describe rule-based mechanisms providing generic solutions for managing overlapping and redundant data. The proposed mechanisms are widely used within the Fujaba Tool Suite. We report about our implementation and application experiences.

Book
06 Sep 2004
TL;DR: The Buster approach for terminological, spatial, and temporal representation and reasoning, its use for semantic translation, and implementation issues and a system demonstration are described.
Abstract: Contents: Related Work; The Buster Approach for Terminological, Spatial, and Temporal Representation and Reasoning; General Approach of Buster; Terminological Representation and Reasoning, Semantic Translation; Spatial Representation and Reasoning; Temporal Representation and Reasoning; Implementation, Conclusion, and Future Work; Implementation Issues and System Demonstration; Conclusion and Future Work; References.

Book ChapterDOI
08 Nov 2004
TL;DR: This paper presents a set of techniques to provide a lossless mapping of an OWL ontology to a relational schema and the corresponding instances to data and presents preliminary experiments that compare the efficiency of the mapping techniques in terms of query performance.
Abstract: The semantic web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. Ontologies, a cornerstone of the semantic web, have gained wide popularity as a model of information in a given domain that can be used for many purposes, including enterprise integration, database design, information retrieval and information interchange on the World Wide Web. Much of the current focus on ontologies has been on the development of languages such as DAML+OIL and OWL that enable the creation of ontologies and provide extensive semantics for Web data, and on answering intensional queries, that is, queries about the structure of an ontology. However, it is almost certain that many of the semantic web queries will be extensional and, to flourish, the semantic web will need to accommodate the huge amounts of existing data that is described by the ontologies and the applications that operate on them. Given the established record of relational databases to store and query large amounts of data, in this paper we present a set of techniques to provide a lossless mapping of an OWL ontology to a relational schema and the corresponding instances to data. We present preliminary experiments that compare the efficiency of the mapping techniques in terms of query performance.
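
One straightforward way to realize such a mapping is sketched below: each ontology class becomes a table, datatype properties become columns, and object properties become join tables. This simplified scheme is offered only as an illustration; the paper evaluates its own set of mapping techniques, which are not reproduced here.

    # Hedged sketch: a naive OWL-to-relational mapping (class -> table,
    # datatype property -> column, object property -> join table).
    def ontology_to_ddl(classes, object_properties, datatype_properties):
        ddl = []
        for cls in classes:
            cols = ["id VARCHAR PRIMARY KEY"]
            cols += [f"{prop} VARCHAR" for prop, domain in datatype_properties if domain == cls]
            ddl.append(f"CREATE TABLE {cls} ({', '.join(cols)});")
        for prop, domain, rng in object_properties:
            ddl.append(
                f"CREATE TABLE {prop} (subject VARCHAR REFERENCES {domain}(id), "
                f"object VARCHAR REFERENCES {rng}(id));"
            )
        return "\n".join(ddl)

    print(ontology_to_ddl(
        classes=["Author", "Book"],
        object_properties=[("wrote", "Author", "Book")],
        datatype_properties=[("name", "Author"), ("title", "Book")],
    ))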

Proceedings ArticleDOI
09 Aug 2004
TL;DR: This paper explores an aggregate set of metrics for fusion evaluation and demonstrates with information need metrics for dynamic situation analysis.
Abstract: To design information fusion systems, it is important to develop metrics as part of a test and evaluation strategy. In many cases, fusion systems are designed to (1) meet a specific set of user information needs (IN), (2) continuously validate information pedigree and updates, and (3) maintain this performance under changing conditions. A fusion system’s performance is evaluated in many ways. However, developing a consistent set of metrics is important for standardization. For example, many track and identification metrics have been proposed for fusion analysis. To evaluate a complete fusion system performance, level 4 sensor management and level 5 user refinement metrics need to be developed simultaneously to determine whether or not the fusion system is meeting information needs. To describe fusion performance, the fusion community needs to agree on a minimum set of metrics for user assessment and algorithm comparison. We suggest that such a minimum set should include feasible metrics of accuracy, confidence, throughput, timeliness, and cost. These metrics can be computed as confidence (probability), accuracy (error), timeliness (delay), throughput (amount) and cost (dollars). In this paper, we explore an aggregate set of metrics for fusion evaluation and demonstrate with information need metrics for dynamic situation analysis.
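
The suggested minimum metric set can be captured in a small structure, shown below with a weighted aggregate score. The weights and the normalization bounds are assumptions for illustration only; the paper argues for community agreement on such a set rather than prescribing one formula.

    # Hedged sketch: the minimum fusion-metric set (confidence, accuracy,
    # timeliness, throughput, cost) with an assumed weighted aggregate.
    from dataclasses import dataclass

    @dataclass
    class FusionMetrics:
        confidence: float   # probability in [0, 1]
        accuracy: float     # error (lower is better)
        timeliness: float   # delay in seconds (lower is better)
        throughput: float   # reports processed per second
        cost: float         # dollars (lower is better)

        def aggregate(self, weights=(0.3, 0.2, 0.2, 0.2, 0.1),
                      max_error=10.0, max_delay=5.0, max_rate=100.0, max_cost=1e6):
            # Normalize each metric to [0, 1] with 1 = better, then combine.
            scores = (
                self.confidence,
                max(0.0, 1 - self.accuracy / max_error),
                max(0.0, 1 - self.timeliness / max_delay),
                min(1.0, self.throughput / max_rate),
                max(0.0, 1 - self.cost / max_cost),
            )
            return sum(w * s for w, s in zip(weights, scores))

    print(FusionMetrics(0.9, 2.0, 1.0, 60.0, 250000.0).aggregate())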

Journal ArticleDOI
TL;DR: An overview of important integration issues that should be considered when designing a bioinformatics integration system is provided, agent technology is introduced, and it is argued why it provides an appropriate solution for designing bioinformatics integration systems.

Book ChapterDOI
07 Jun 2004
TL;DR: The AutoMed repository and some associated tools, which provide the first implementation of the both-as-view (BAV) approach to data integration, are described, along with how several practical problems in data integration between heterogeneous data sources have been solved.
Abstract: This paper describes the AutoMed repository and some associated tools, which provide the first implementation of the both-as-view (BAV) approach to data integration. Apart from being a highly expressive data integration approach, BAV additionally provides a method to support a wide range of data modelling languages, and describes transformations between those data modelling languages. This paper documents how BAV has been implemented in the AutoMed repository, and how several practical problems in data integration between heterogeneous data sources have been solved. We illustrate the implementation with examples in the relational, ER, and semi-structured data models.
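
The both-as-view idea can be sketched schematically: a pathway of reversible transformation steps, each adding or deleting a schema construct together with a query defining it over the rest of the schema, so that each schema is expressible as a view over the other. This is a generic illustration and does not use the actual AutoMed API; the construct and query strings are hypothetical.

    # Hedged sketch of a BAV-style transformation pathway (not the AutoMed API).
    from dataclasses import dataclass

    @dataclass
    class Step:
        kind: str        # "add" or "delete"
        construct: str   # schema construct being introduced or removed
        query: str       # definition over the rest of the schema

        def reverse(self):
            return Step("delete" if self.kind == "add" else "add", self.construct, self.query)

    pathway = [
        Step("add", "person(name, dept)", "staff(name, dept) UNION student(name, dept)"),
        Step("delete", "staff(name, dept)", "person(name, dept) WHERE role = 'staff'"),
    ]

    # Reversing the pathway maps the target schema back to the source.
    for step in (s.reverse() for s in reversed(pathway)):
        print(step.kind, step.construct, "<=", step.query)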


Journal ArticleDOI
TL;DR: The impact of data mining is examined by reviewing existing applications, including personalized environments, electronic commerce, and search engines, to identify the limitations of current work and raise several directions for future research.
Abstract: The information explosion is a serious challenge for current information institutions. On the other hand, data mining, which is the search for valuable information in large volumes of data, is one of the solutions to face this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, how data mining can enhance their functions is discussed. The reader of this paper is expected to get an overview of the state of the art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

01 Jan 2004
TL;DR: The RAND Corporation has been participating in the Information Superiority Metrics Working Group, whose purpose is to describe key concepts and related metrics that are necessary to explore part of the proposed network-centric warfare value chain.
Abstract: The military is formulating new visions, strategies, and concepts that capitalize on emerging information-age technologies to provide its warfighters with significantly improved capabilities to meet the national security challenges of the 21st century. These programs are described in such documents as the Quadrennial Defense Review, Joint Vision 2020, a variety of publications describing network-centric warfare (NCW), and other documents describing military transformation. Joint Vision 2020 provides an important starting point for describing a future warfighting concept that has since evolved into NCW. A key tenet of Joint Vision 2020 is that information superiority will enable decision dominance, new Joint operational concepts, and a decisive advantage over future adversaries. To create and leverage information superiority, it is foreseen that, under some circumstances, a mix of command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR) capabilities would interoperate with weapon systems and forces on an end-to-end basis through a network-centric information environment to achieve significant improvements in awareness, shared awareness, and synchronization. The military is embarked on a series of analyses and experiments to improve its understanding of the potential of these NCW concepts. The Assistant Secretary of Defense for Networks and Information Integration (ASD NII), through the Command and Control Research Program, asked RAND to help develop methods and tools that could improve the assessment of C4ISR capabilities and processes to the achievement of NCW concepts, including awareness, shared awareness, and synchronization. In response to this request, the RAND Corporation has been participating in the Information Superiority Metrics Working Group, under the auspices of ASD NII. The group's purpose is to describe key concepts and related metrics that are necessary to explore part of the proposed NCW value chain.