
Ontology-based data integration

About: Ontology-based data integration is a research topic. Over the topic's lifetime, 11,065 publications have been published within it, receiving 216,888 citations.


Papers
Journal Issue
TL;DR: Describes a collaboratively engineered, general-purpose knowledge management (KM) ontology for use by practitioners, researchers, and educators; the ontology evolved from a Delphi-like process involving a diverse panel of over 30 KM practitioners and researchers.
Abstract: This article describes a collaboratively engineered general-purpose knowledge management (KM) ontology that can be used by practitioners, researchers, and educators. The ontology is formally characterized in terms of nearly one hundred definitions and axioms that evolved from a Delphi-like process involving a diverse panel of over 30 KM practitioners and researchers. The ontology identifies and relates knowledge manipulation activities that an entity (e.g., an organization) can perform to operate on knowledge resources. It introduces a taxonomy for these resources, which indicates classes of knowledge that may be stored, embedded, and/or represented in an entity. It recognizes factors that influence the conduct of KM both within and across KM episodes. The Delphi panelists judge the ontology favorably overall in terms of its ability to unify KM concepts, its comprehensiveness, and its utility. Moreover, various implications of the ontology for the KM field are examined as indicators of its utility for practitioners, educators, and researchers. © 2005 Wiley Periodicals, Inc.

107 citations

Book Chapter
01 Jun 2008
TL;DR: This work proposes a novel clustered-graph structure that corresponds to only a summary of the original ontology, and adopts several mechanisms for query ranking, which can consider many factors such as the query length, the relevance of ontology elements w.r.t. the query and the importance of ontological elements.
Abstract: The increasing amount of data on the Semantic Web offers opportunities for semantic search. However, formal queries hinder casual users in expressing their information needs, as they might not be familiar with the query syntax or the underlying ontology. Because keyword interfaces are easier for casual users to handle, many approaches aim to translate keywords into formal queries. However, these approaches so far feature only very basic query ranking and do not scale to large repositories. We tackle the scalability problem by proposing a novel clustered-graph structure that corresponds to only a summary of the original ontology. The reduced data space is then explored to compute the top-k queries. Additionally, we adopt several mechanisms for query ranking, which can consider many factors such as the query length, the relevance of ontology elements w.r.t. the query, and the importance of ontology elements. Experimental results obtained with our implemented system, Q2Semantic, show that we achieve good performance on many datasets of different sizes.

107 citations
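To make the summarize-then-explore idea in the Q2Semantic abstract concrete, here is a minimal Python sketch under stated assumptions: instances are clustered by their class (so the summary graph has one node per class), keywords are matched to class labels with a made-up relevance score, candidate queries are the shortest connecting paths found by breadth-first search, and the ranking cost is a hypothetical combination of query length, keyword relevance, and node importance (share of instances). None of the names (`summarize`, `top_k`, the scoring weights) come from Q2Semantic, whose actual clustering and ranking functions may differ.

```python
import heapq
from collections import defaultdict, deque


def summarize(triples, instance_class):
    """Collapse an instance graph into a class-level summary graph (assumed clustering by class)."""
    edges = defaultdict(set)
    for s, p, o in triples:
        cs, co = instance_class[s], instance_class[o]
        edges[cs].add((p, co))
        edges[co].add((p, cs))  # explored as undirected during query construction
    size = defaultdict(int)  # number of instances per summary node
    for cls in instance_class.values():
        size[cls] += 1
    return edges, size


def shortest_path(edges, start, goal):
    """Breadth-first search over the summary graph; returns a list of classes or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for _, nxt in edges[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def match(keyword, classes):
    """Hypothetical relevance: 1.0 for an exact label match, 0.5 for a substring match."""
    kw = keyword.lower()
    scores = {}
    for c in classes:
        if c.lower() == kw:
            scores[c] = 1.0
        elif kw in c.lower():
            scores[c] = 0.5
    return scores


def top_k(kw1, kw2, edges, size, k=3):
    """Rank candidate queries connecting two keywords; lower cost is better."""
    total = sum(size.values())
    classes = list(size)
    candidates = []
    for a, ra in match(kw1, classes).items():
        for b, rb in match(kw2, classes).items():
            path = shortest_path(edges, a, b)
            if path is None:
                continue
            # Importance: average share of instances on the path (hypothetical measure).
            importance = sum(size[c] for c in path) / (total * len(path))
            # Long queries are penalized; relevant and important elements are rewarded.
            cost = len(path) - (ra + rb) - importance
            candidates.append((cost, path))
    return heapq.nsmallest(k, candidates)


instance_class = {"alice": "Person", "bob": "Person", "p1": "Publication",
                  "p2": "Publication", "kit": "University"}
triples = [("alice", "author", "p1"), ("bob", "author", "p2"), ("alice", "worksAt", "kit")]
edges, size = summarize(triples, instance_class)
ranked = top_k("publication", "university", edges, size)
print(ranked[0][1])  # ['Publication', 'Person', 'University']
```

Here five instances collapse into three summary nodes, so the search runs over classes rather than instances; that reduction is the scalability argument the abstract makes, while the specific cost formula is only a placeholder for the paper's ranking mechanisms.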

Book
01 Jan 2007
TL;DR: The book's chapters include "A Practitioner's Guide to Data Management and Data Integration in Bioinformatics" by Barbara A. Eckman; the introduction and the comparative evaluation chapter are by Zoe Lacroix and Terence Critchlow.
Abstract (table of contents):
1 Introduction (Zoe Lacroix and Terence Critchlow): 1.1 Overview; 1.2 Problem and Scope; 1.3 Biological Data Integration; 1.4 Developing a Biological Data Integration System; 1.4.1 Specifications; 1.4.2 Translating Specifications into a Technical Approach; 1.4.3 Development Process; 1.4.4 Evaluation of the System; References
2 Challenges Faced in the Integration of Biological Information (Su Yun Chung and John C. Wooley): 2.1 The Life Science Discovery Process; 2.2 An Information Integration Environment for Life Science Discovery; 2.3 The Nature of Biological Data; 2.3.1 Diversity; 2.3.2 Variability; 2.4 Data Sources in Life Science; 2.4.1 Biological Databases Are Autonomous; 2.4.2 Biological Databases Are Heterogeneous in Data Formats; 2.4.3 Biological Data Sources Are Dynamic; 2.4.4 Computational Analysis Tools Require Specific Input/Output Formats and Broad Domain Knowledge; 2.5 Challenges in Information Integration; 2.5.1 Data Integration; 2.5.2 Meta-Data Specification; 2.5.3 Data Provenance and Data Accuracy; 2.5.4 Ontology; 2.5.5 Web Presentations; Conclusion; References
3 A Practitioner's Guide to Data Management and Data Integration in Bioinformatics (Barbara A. Eckman): 3.1 Introduction; 3.2 Data Management in Bioinformatics; 3.2.1 Data Management Basics; 3.2.2 Two Popular Data Management Strategies and Their Limitations; 3.2.3 Traditional Database Management; 3.3 Dimensions Describing the Space of Integration Solutions; 3.3.1 A Motivating Use Case for Integration; 3.3.2 Browsing vs. Querying; 3.3.3 Syntactic vs. Semantic Integration; 3.3.4 Warehouse vs. Federation; 3.3.5 Declarative vs. Procedural Access; 3.3.6 Generic vs. Hard-Coded; 3.3.7 Relational vs. Non-Relational Data Model; 3.4 Use Cases of Integration Solutions; 3.4.1 Browsing-Driven Solutions; 3.4.2 Data Warehousing Solutions; 3.4.3 Federated Database Systems Approach; 3.4.4 Semantic Data Integration; 3.5 Strengths and Weaknesses of the Various Approaches to Integration; 3.5.1 Browsing and Querying: Strengths and Weaknesses; 3.5.2 Warehousing and Federation: Strengths and Weaknesses; 3.5.3 Procedural Code and Declarative Query Language: Strengths and Weaknesses; 3.5.4 Generic and Hard-Coded Approaches: Strengths and Weaknesses; 3.5.5 Relational and Non-Relational Data Models: Strengths and Weaknesses; 3.5.6 Conclusion: A Hybrid Approach to Integration Is Ideal; 3.6 Tough Problems in Bioinformatics Integration; 3.6.1 Semantic Query Planning Over Web Data Sources; 3.6.2 Schema Management; 3.7 Summary; Acknowledgments; References
4 Issues to Address While Designing a Biological Information System (Zoe Lacroix): 4.1 Legacy; 4.1.1 Biological Data; 4.1.2 Biological Tools and Workflows; 4.2 A Domain in Constant Evolution; 4.2.1 Traditional Database Management and Changes; 4.2.2 Data Fusion; 4.2.3 Fully Structured vs. Semi-Structured; 4.2.4 Scientific Object Identity; 4.2.5 Concepts and Ontologies; 4.3 Biological Queries; 4.3.1 Searching and Mining; 4.3.2 Browsing; 4.3.3 Semantics of Queries; 4.3.4 Tool-Driven vs. Data-Driven Integration; 4.4 Query Processing; 4.4.1 Biological Resources; 4.4.2 Query Planning; 4.4.3 Query Optimization; 4.5 Visualization; 4.5.1 Multimedia Data; 4.5.2 Browsing Scientific Objects; 4.6 Conclusion; Acknowledgments; References
5 SRS: An Integration Platform for Databanks and Analysis Tools in Bioinformatics (Thure Etzold, Howard Harris, and Simon Beaulah): 5.1 Integrating Flat File Databanks; 5.1.1 The SRS Token Server; 5.1.2 Subentry Libraries; 5.2 Integration of XML Databases; 5.2.1 What Makes XML Unique?; 5.2.2 How Are XML Databanks Integrated into SRS?; 5.2.3 Overview of XML Support Features; 5.2.4 How Does SRS Meet the Challenges of XML?; 5.3 Integrating Relational Databases; 5.3.1 Whole Schema Integration; 5.3.2 Capturing the Relational Schema; 5.3.3 Selecting a Hub Table; 5.3.4 Generation of SQL; 5.3.5 Restricting Access to Parts of the Schema; 5.3.6 Query Performance to Relational Databases; 5.3.7 Viewing Entries from a Relational Databank; 5.3.8 Summary; 5.4 The SRS Query Language; 5.4.1 SRS Fields; 5.5 Linking Databanks; 5.5.1 Constructing Links; 5.5.2 The Link Operators; 5.6 The Object Loader; 5.6.1 Creating Complex and Nested Objects; 5.6.2 Support for Loading from XML Databanks; 5.6.3 Using Links to Create Composite Structures; 5.6.4 Exporting Objects to XML; 5.7 Scientific Analysis Tools; 5.7.1 Processing of Input and Output; 5.7.2 Batch Queues; 5.8 Interfaces to SRS; 5.8.1 The Web Interface; 5.8.2 SRS Objects; 5.8.3 SOAP and Web Services; 5.9 Automated Server Maintenance with SRS Prisma; 5.10 Conclusion; References
6 The Kleisli Query System as a Backbone for Bioinformatics Data Integration and Analysis (Jing Chen, Su Yun Chung, and Limsoon Wong): 6.1 Motivating Example; 6.2 Approach; 6.3 Data Model and Representation; 6.4 Query Capability; 6.5 Warehousing Capability; 6.6 Data Sources; 6.7 Optimizations; 6.7.1 Monadic Optimizations; 6.7.2 Context-Sensitive Optimizations; 6.7.3 Relational Optimizations; 6.8 User Interfaces; 6.8.1 Programming Language Interface; 6.8.2 Graphical Interface; 6.9 Other Data Integration Technologies; 6.9.1 SRS; 6.9.2 DiscoveryLink; 6.9.3 Object-Protocol Model (OPM); 6.10 Conclusions; References
7 Complex Query Formulation Over Diverse Information Sources in TAMBIS (Robert Stevens, Carole Goble, Norman W. Paton, Sean Bechhofer, Gary Ng, Patricia Baker, and Andy Brass): 7.1 The Ontology; 7.2 The User Interface; 7.2.1 Exploring the Ontology; 7.2.2 Constructing Queries; 7.2.3 The Role of Reasoning in Query Formulation; 7.3 The Query Processor; 7.3.1 The Sources and Services Model; 7.3.2 The Query Planner; 7.3.3 The Wrappers; 7.4 Related Work; 7.4.1 Information Integration in Bioinformatics; 7.4.2 Knowledge Based Information Integration; 7.4.3 Biological Ontologies; 7.5 Current and Future Developments in TAMBIS; 7.5.1 Summary; Acknowledgments; References
8 The Information Integration System K2 (Val Tannen, Susan B. Davidson, and Scott Harker): 8.1 Approach; 8.2 Data Model and Languages; 8.3 An Example; 8.4 Internal Language; 8.5 Data Sources; 8.6 Query Optimization; 8.7 User Interfaces; 8.8 Scalability; 8.9 Impact; 8.10 Summary; Acknowledgments; References
9 P/FDM Mediator for a Bioinformatics Database Federation (Graham J. L. Kemp and Peter M. D. Gray): 9.1 Approach; 9.1.1 Alternative Architectures for Integrating Databases; 9.1.2 The Functional Data Model; 9.1.3 Schemas in the Federation; 9.1.4 Mediator Architecture; 9.1.5 Example; 9.1.6 Query Capabilities; 9.1.7 Data Sources; 9.2 Analysis; 9.2.1 Optimization; 9.2.2 User Interfaces; 9.2.3 Scalability; 9.3 Conclusions; Acknowledgment; References
10 Integration Challenges in Gene Expression Data Management (Victor M. Markowitz, John Campbell, I-Min A. Chen, Anthony Kosky, Krishna Palaniappan, and Thodoros Topaloglou): 10.1 Gene Expression Data Management: Background; 10.1.1 Gene Expression Data Spaces; 10.1.2 Standards: Benefits and Limitations; 10.2 The GeneExpress System; 10.2.1 GeneExpress System Components; 10.2.2 GeneExpress Deployment and Update Issues; 10.3 Managing Gene Expression Data: Integration Challenges; 10.3.1 Gene Expression Data: Array Versions; 10.3.2 Gene Expression Data: Algorithms and Normalization; 10.3.3 Gene Expression Data: Variability; 10.3.4 Sample Data; 10.3.5 Gene Annotations; 10.4 Integrating Third-Party Gene Expression Data in GeneExpress; 10.4.1 Data Exchange Formats; 10.4.2 Structural Data Transformation Issues; 10.4.3 Semantic Data Mapping Issues; 10.4.4 Data Loading Issues; 10.4.5 Update Issues; 10.5 Summary; Acknowledgments; Trademarks; References
11 DiscoveryLink (Laura M. Haas, Barbara A. Eckman, Prasad Kodali, Eileen T. Lin, Julia E. Rice, and Peter M. Schwarz): 11.1 Approach; 11.1.1 Architecture; 11.1.2 Registration; 11.2 Query Processing Overview; 11.2.1 Query Optimization; 11.2.2 An Example; 11.2.3 Determining Costs; 11.3 Ease of Use, Scalability, and Performance; 11.4 Conclusions; References
12 A Model-Based Mediator System for Scientific Data Management (Bertram Ludäscher, Amarnath Gupta, and Maryann E. Martone): 12.1 Background; 12.2 Scientific Data Integration Across Multiple Worlds: Examples and Challenges from the Neurosciences; 12.2.1 From Terminology and Static Knowledge to Process Context; 12.3 Model-Based Mediation; 12.3.1 Model-Based Mediation: The Protagonists; 12.3.2 Conceptual Models and Registration of Sources at the Mediator; 12.3.3 Interplay Between Mediator and Sources; 12.4 Knowledge Representation for Model-Based Mediation; 12.4.1 Domain Maps; 12.4.2 Process Maps; 12.5 Model-Based Mediator System and Tools; 12.5.1 The KIND Mediator Prototype; 12.5.2 The Cell-Centered Database and SMART Atlas: Retrieval and Navigation Through Multi-Scale Data; 12.6 Related Work and Conclusion; 12.6.1 Related Work; 12.6.2 Summary: Model-Based Mediation and Reason-Able Meta-Data; Acknowledgments; References
13 Compared Evaluation of Scientific Data Management Systems (Zoe Lacroix and Terence Critchlow): 13.1 Performance Model; 13.1.1 Evaluation Matrix; 13.1.2 Cost Model; 13.1.3 Benchmarks; 13.1.4 User Survey; 13.2 Evaluation Criteria; 13.2.1 The Implementation Perspective; 13.2.2 The User Perspective; 13.3 Tradeoffs; 13.3.1 Materialized vs. Non-Materialized; 13.3.2 Data Distribution and Heterogeneity; 13.3.3 Semi-Structured Data vs. Fully Structured Data; 13.3.4 Text Retrieval; 13.3.5 Integrating Applications; 13.4 Summary; References
Concluding Remarks: Summary; Looking Toward the Future
Appendix: Biological Resources
Glossary
System Information: SRS; Kleisli; TAMBIS; K2; P/FDM Mediator; GeneExpress; DiscoveryLink; KIND
Index

107 citations

Book Chapter
03 Jun 2007
TL;DR: This work develops guidelines and a set of methodological tools, based on the notions of "normalization" and "stable metrics", for creating ontology metrics; the guidelines allow the metric author to decide which properties a metric needs to fulfil and to design the desired metric appropriately.
Abstract: You can only control what you can measure. Measuring ontologies is necessary to evaluate them both during engineering and in application. Metrics allow fast and simple assessment of an ontology and make it possible to track its subsequent evolution. In the last few years, a growing number of ontology metrics and measures have been suggested and defined. But many of them suffer from a recurring set of problems; most importantly, they do not properly take the semantics of the ontology language into account. The work presented here is a principled approach to facilitating the creation of ontology metrics, with the clear goal of going beyond structural metrics to proper semantics-aware ontology metrics. We have developed guidelines and a set of methodological tools based on the notions of "normalization" and "stable metrics" for creating ontology metrics. These guidelines allow the metric author to decide which properties metrics need to fulfil and to design the desired metric appropriately. A discussion of an exemplary metric (taken from the literature) illustrates and motivates the issues and suggested solutions.

106 citations
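The metrics abstract argues that a metric should depend on an ontology's semantics rather than its syntax, but it does not list the normalization steps, so the following is only a toy illustration under strong assumptions: an ontology is reduced to subsumption axioms between named classes, and "normalization" is taken to mean materializing all entailed subsumptions, merging mutually subsuming (equivalent) classes under one name, and keeping only direct edges (transitive reduction). The function names are hypothetical and this is not the paper's full procedure. The point it shows: a naive count of asserted subclass axioms tells two semantically equivalent ontologies apart, while the same count on the normalized hierarchy does not.

```python
def closure(axioms):
    """Materialize all entailed subsumptions among named classes (axioms are (sub, sup) pairs)."""
    classes = {c for axiom in axioms for c in axiom}
    reach = {c: set() for c in classes}
    for sub, sup in axioms:
        reach[sub].add(sup)
    changed = True
    while changed:  # naive fixpoint computation of the transitive closure
        changed = False
        for c in classes:
            inherited = set()
            for d in reach[c]:
                inherited |= reach[d]
            if not inherited <= reach[c]:
                reach[c] |= inherited
                changed = True
    return classes, reach


def normalize(axioms):
    """Hypothetical normalization: merge equivalent classes, then keep only direct subsumptions."""
    classes, reach = closure(axioms)
    # Classes that subsume each other are equivalent; name each group by its smallest label.
    rep = {c: min({c} | {d for d in reach[c] if c in reach[d]}) for c in classes}
    reps = set(rep.values())
    strict = {(rep[c], rep[d]) for c in classes for d in reach[c] if rep[c] != rep[d]}
    # Transitive reduction: drop (a, b) if some m sits strictly between a and b.
    return {(a, b) for (a, b) in strict
            if not any((a, m) in strict and (m, b) in strict for m in reps)}


def depth(direct):
    """Length of the longest chain of direct subsumptions in a normalized hierarchy."""
    parents = {}
    for a, b in direct:
        parents.setdefault(a, set()).add(b)
    memo = {}

    def up(c):
        if c not in memo:
            memo[c] = max((1 + up(p) for p in parents.get(c, ())), default=0)
        return memo[c]

    return max((up(c) for c in parents), default=0)


o1 = {("Cat", "Mammal"), ("Mammal", "Animal")}
# Same entailments, different syntax: a redundant axiom plus an equivalent class.
o2 = o1 | {("Cat", "Animal"), ("Feline", "Cat"), ("Cat", "Feline")}
print(len(o1), len(o2))  # 2 5
print(normalize(o1) == normalize(o2), depth(normalize(o2)))  # True 2
```

Counting axioms on the raw input is a purely structural metric; counting them after normalization gives the same value for both ontologies, which is the kind of semantics-aware behavior the guidelines aim for.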

Proceedings Article
01 May 2001
TL;DR: The vision of ontology learning that is proposed here includes a number of complementary disciplines that feed on different types of unstructured, semi-structured and fully structured data in order to support a semi-automatic, cooperative ontology engineering process.
Abstract: The Semantic Web relies heavily on the formal ontologies that structure underlying data for the purpose of comprehensive and transportable machine understanding. Therefore, the success of the Semantic Web depends strongly on the proliferation of ontologies, which requires fast and easy engineering of ontologies and avoidance of a knowledge acquisition bottleneck. Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. The vision of ontology learning that we propose here includes a number of complementary disciplines that feed on different types of unstructured, semi-structured and fully structured data in order to support a semi-automatic, cooperative ontology engineering process. Our ontology learning framework proceeds through ontology import, extraction, pruning, refinement, and evaluation, giving the ontology engineer a wealth of coordinated tools for ontology modeling. Besides the general framework and architecture, we show in this paper some exemplary techniques in the ontology learning cycle that we have implemented in our ontology learning environment, Text-To-Onto, such as ontology learning from free text, from dictionaries, or from legacy ontologies, and refer to others that are needed to complete the architecture, such as reverse engineering of ontologies from database schemata or learning from XML documents.

105 citations
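The ontology-learning abstract names extraction and pruning as steps of the cycle but not the algorithms behind them. As a stand-in, this sketch uses Hearst-style lexico-syntactic patterns (a well-known technique for finding is-a candidates in free text, not necessarily what Text-To-Onto implements) followed by frequency-based pruning. The pattern, `min_support`, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical Hearst-style pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
PATTERN = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")


def extract(texts):
    """Extraction step: count candidate (hyponym, hypernym) pairs found in free text."""
    counts = Counter()
    for text in texts:
        for match in PATTERN.finditer(text.lower()):
            hypernym = match.group(1)
            for hyponym in re.split(r", | and | or ", match.group(2)):
                counts[(hyponym, hypernym)] += 1
    return counts


def prune(counts, min_support=2):
    """Pruning step: keep only candidate pairs seen at least min_support times."""
    return {pair for pair, n in counts.items() if n >= min_support}


corpus = [
    "Mammals such as cats, dogs and whales live on Earth.",
    "We studied mammals such as cats.",
    "Vehicles such as cars appear only once.",
]
counts = extract(corpus)
print(prune(counts))  # {('cats', 'mammals')}
```

The sketch does no lemmatization (so "mammals" stays plural) and leaves refinement and evaluation to the ontology engineer, in line with the semi-automatic, cooperative process the abstract describes.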


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Software development: 73.8K papers, 1.4M citations (84% related)
User interface: 85.4K papers, 1.7M citations (84% related)
Support vector machine: 73.6K papers, 1.7M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years
Year   Papers
2023   37
2022   149
2021   11
2020   11
2019   19
2018   43