
Showing papers on "Ontology-based data integration published in 2021"


Journal ArticleDOI
TL;DR: Reflecting a modern society flooded with information, YAMATO offers a sophisticated theory of informational objects (representations), and its accounts of quality and quantity are carefully organized for greater interoperability of real-world data.
Abstract: An upper ontology plays a critical role in ontology development by giving developers a guideline for how to view the target domain. Although upper ontologies such as DOLCE, BFO, GFO, SUMO, and CYC have already been developed and are extensively used, a careful examination reveals room for improvement in a couple of respects. This paper discusses YAMATO (Yet Another More Advanced Top-level Ontology), which has been developed to cover three features, namely Quality description, Representation, and Process/Event, better than existing ontologies.

57 citations


Journal ArticleDOI
TL;DR: This study selected and reviewed 24 data tools based on common use cases of data across the building life cycle, from design to construction, commissioning, operation, and retrofits, and provides recommendations for future research for the data and buildings community based on the FAIR principles.
Abstract: Building information modeling (BIM) has been widely adopted for representing and exchanging building data across disciplines during building design and construction. However, BIM's use in the building operation phase is limited. With the increasing deployment of low-cost sensors and meters, as well as affordable digital storage and computing technologies, growing volumes of data have been collected from buildings, their energy services systems, and occupants. Such data are crucial to help decision makers understand what, how, and when energy is consumed in buildings—a critical step to improving building performance for energy efficiency, demand flexibility, and resilience. However, practical analyses and use of the collected data are very limited due to various reasons, including poor data quality, ad-hoc representation of data, and lack of data science skills. To unlock value from building data, there is a strong need for a toolchain to curate and represent building information and performance data in common standardized terminologies and schemas, to enable interoperability between tools and applications. This study selected and reviewed 24 data tools based on common use cases of data across the building life cycle, from design to construction, commissioning, operation, and retrofits. The selected data tools are grouped into three categories: (1) data dictionary or terminology, (2) data ontology and schemas, and (3) data platforms. The data are grouped into ten typologies covering most types of data collected in buildings. 
This study resulted in five main findings: (1) most data representation tools can represent their intended data typologies well, such as Green Button for smart meter data and Brick schema for metadata of sensors in buildings and HVAC systems, but none of the tools cover all ten types of data; (2) there is a need for data schemas to represent the basis of design data and metadata of occupant data; (3) standard terminologies such as those defined in BEDES are adopted in only a few data tools; (4) integrating data across various stages in the building life cycle remains a challenge; and (5) most data tools were developed and maintained by different parties for different purposes; their flexibility and interoperability could be improved to support broader use cases. Finally, recommendations for future research on building data tools are provided for the data and buildings community based on the FAIR principles to make data Findable, Accessible, Interoperable, and Reusable.

20 citations



Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this article, an ontology-based approach to integrating vehicle-related data is proposed: application-specific data is semantically annotated with a well-defined semantic model that accounts for its streaming nature.
Abstract: Vehicle architectures have evolved over the past two decades to support data-driven functionalities. The typical approach in this domain has been application-centric, leading to data models that are disparate, repetitive, and hardly maintainable in the long run. As a result, software complexity increases, while the knowledge remains hidden in the applications' code. We argue that it is essential to enrich the data with standard semantic models to enable a smooth integration of heterogeneous data. In this paper, we propose an ontology-based approach to integrating vehicle-related data: application-specific data is semantically annotated with a well-defined semantic model that accounts for its streaming nature. Three applications that use vehicle data are implemented and annotated with the presented procedure. The resulting semantic data is validated against elaborate analytical competency questions that combine application-specific data; these questions are answered by queries that follow the patterns of the semantic model. Our work shows that ontology-based data integration is a suitable component for vehicle architectures. This type of integration enables one-time implementation of queries that remain stable over time, reuse of application-specific data, and richer semantics.
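The annotation step the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's actual model: the vocabulary terms (sosa:Observation, ex:VehicleSpeed, etc.) are assumed stand-ins, with SOSA-style names chosen because that ontology is commonly used for sensor streams.

```python
# Sketch: wrap raw, application-specific vehicle signals in semantic
# annotations (triples), then query them by ontology term rather than
# by application-internal names. Vocabulary is illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: object

def annotate(signal_name: str, value: float, timestamp: int) -> list:
    """Turn one raw stream sample into semantically annotated triples."""
    obs = f"ex:obs/{signal_name}/{timestamp}"
    return [
        Triple(obs, "rdf:type", "sosa:Observation"),
        Triple(obs, "sosa:observedProperty", f"ex:{signal_name}"),
        Triple(obs, "sosa:hasSimpleResult", value),
        Triple(obs, "sosa:resultTime", timestamp),
    ]

def query(graph: list, predicate: str, obj: object) -> list:
    """Return subjects matching a (?, predicate, obj) pattern."""
    return [t.subject for t in graph if t.predicate == predicate and t.obj == obj]

# Two application signals integrated into one graph, queried by ontology term.
graph = annotate("VehicleSpeed", 87.5, 1000) + annotate("EngineRPM", 2300.0, 1000)
speed_obs = query(graph, "sosa:observedProperty", "ex:VehicleSpeed")
```

Because applications query the shared model rather than each other's internal schemas, such queries stay stable even when an application's raw data layout changes, which is the "one-time implementation" benefit the abstract claims.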

4 citations


Proceedings ArticleDOI
07 Oct 2021
TL;DR: In this article, the authors present a method to integrate heterogeneous data from various clinical sources in ontological format for adverse drug reactions (ADRs), which may occur after intake of a drug and require quick treatment to reduce the risk to life.
Abstract: There is increasing interest in transforming and integrating data obtained from a variety of sources in the health care domain. This transformation and integration is particularly beneficial for adverse drug reactions (ADRs), which may occur after intake of a drug and require quick treatment to reduce the risk to life. It has been shown in the literature that an expert system based on prior knowledge or supervised learning is beneficial for the diagnosis of ADRs. A ubiquitous, semantically rich knowledge data set is essential for machine-learning-based clinical expert systems to deal with emergencies in case of an ADR. This paper presents a method to integrate heterogeneous data from various clinical sources in ontological format for ADRs. The data in ontological format is then linked to standard medical terminologies such as SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) and MedDRA (Medical Dictionary for Regulatory Activities) to make it more general, standardized, and semantically rich, with the help of TopQuadrant SPIN (SPARQL Inferencing Notation) and SHACL (Shapes Constraint Language) rules. Furthermore, the ontology is mapped to the domain ontology OAE (Ontology for Adverse Events) to make it a more powerful knowledge base for query retrieval and searching. The objective of this work is to obtain an integrated ontology, in RDF (Resource Description Framework) Turtle format, from heterogeneous resources (XML data, relational data, and ASCII text data). Further, it ensures linking to medical standards by utilizing different capabilities and plugins of TopBraid Composer Maestro Edition.
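The SHACL step the abstract mentions is essentially a shape check over the integrated triples. Below is a hedged, library-free sketch of one such check; the property names (ex:hasSnomedCode) and the specific constraint are invented for illustration, since the paper's actual shapes are not given here.

```python
# Sketch of a SHACL-like shape check: every integrated ADR event must
# carry a SNOMED CT code. Property/class names are illustrative only.

def validate_adr(graph):
    """Return ADR event nodes that violate the 'must have a SNOMED code' shape."""
    adr_events = {s for s, p, o in graph if p == "rdf:type" and o == "oae:AdverseEvent"}
    coded = {s for s, p, o in graph if p == "ex:hasSnomedCode"}
    return sorted(adr_events - coded)

graph = [
    ("ex:event1", "rdf:type", "oae:AdverseEvent"),
    ("ex:event1", "ex:hasSnomedCode", "snomed:271807003"),
    ("ex:event2", "rdf:type", "oae:AdverseEvent"),  # integrated from XML, code missing
]
violations = validate_adr(graph)
```

In a real pipeline this role is played by a SHACL engine (e.g. inside TopBraid Composer), which reports such violations as a validation graph instead of a Python list.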

3 citations


Proceedings ArticleDOI
22 Mar 2021
TL;DR: In this article, a knowledge-graph-based approach is proposed to integrate data according to abstract semantic models for Point-of-Interest (POI) recommendation scenarios, enriching data with information about attributes, relationships, and their meaning.
Abstract: Context-aware Recommender Systems (CARS) are becoming an integral part of everyday life by allowing users to retrieve relevant information based on their contextual situation. To increase predictive power, information from heterogeneous sources should be leveraged, considering many parameters such as mood, hunger level, and user preferences. However, these data sources are typically isolated and unexplored, and efforts to integrate them are exacerbated by the variety of data structures used to model them and by costly pre-processing operations. We propose a knowledge-graph-based approach that integrates data according to abstract semantic models for Point-of-Interest (POI) recommendation scenarios. By enriching data with information about attributes, relationships, and their meaning, additional knowledge can be derived from what already exists. We demonstrate the applicability of the proposed approach with a concrete example showing the benefits of retrieving dispersed data through a unified access mechanism.
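The "unified access mechanism" idea can be made concrete with a toy knowledge graph: triples from a POI source and a user-context source land in one graph and are joined by a single pattern query. All entity and property names (ex:locatedIn, ex:currentDistrict, etc.) are assumptions for illustration, not the paper's schema.

```python
# Sketch: two isolated sources merged into one KG, then queried uniformly.
# Names are invented; the paper's actual semantic model is not reproduced here.

def match(graph, s=None, p=None, o=None):
    """Triple-pattern match; None acts as a wildcard, like a SPARQL variable."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

poi_source = [("ex:cafe1", "rdf:type", "ex:Cafe"),
              ("ex:cafe1", "ex:locatedIn", "ex:district5")]
context_source = [("ex:user1", "ex:currentDistrict", "ex:district5"),
                  ("ex:user1", "ex:mood", "ex:hungry")]
kg = poi_source + context_source  # integration: one graph, one vocabulary

def recommend(kg, user):
    """Join user context with POI data: POIs in the user's current district."""
    district = match(kg, s=user, p="ex:currentDistrict")[0][2]
    return [t[0] for t in match(kg, p="ex:locatedIn", o=district)]

recs = recommend(kg, "ex:user1")
```

The point is that `recommend` never needs to know which source a triple came from; that is the dispersed-data benefit the abstract demonstrates.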

3 citations


Journal ArticleDOI
TL;DR: The paper introduces a new concept, the trans-forest, intended for modeling educational, language, IT, marketing, geological, and other situations from the point of view of graph theory and game theory.
Abstract: The paper introduces a new concept, the trans-forest. A forest is a complex of additional connected trees; the skeleton of the graph is the basic multi-hierarchy. Subjects and “forces” are sources of moves on the playing field of the trans-forest, governed by the rules of Montague generative grammars. Each move of every generative grammar is synchronized with a meta-generative grammar, the “daemon of time”. Each turn over a trans-forest is made by moving the present marker on the time tree to the “Next” position. Particular attention is paid to the format tree and the trans-connections between trees and their levels, since it is the format tree that determines human behavior in certain situations. An algorithm of evolutionarily justified expansions is proposed, whose purpose is to ensure well-being for the subject and for other subjects. The proposed trans-forest model is intended for modeling educational, language, IT, marketing, geological, and other situations from the point of view of graph theory and game theory.

3 citations


Journal ArticleDOI
Zhu Lilu, Su Xiaolu, Hu Yanfeng, Tai Xianqing, Fu Kun 
TL;DR: This paper proposes a spatio-temporal local association query algorithm for remote sensing data (STLAQ), which uses partition-based clustering and spectral clustering to measure the correlation between spatio-temporal correlation networks.
Abstract: It is extremely important to extract valuable information from remote sensing data and integrate it efficiently. The multi-source, heterogeneous nature of remote sensing data makes the relationships within it increasingly complex, so processing based on data ontology alone can no longer meet requirements. On the other hand, the multi-dimensional features of remote sensing data complicate data query and analysis, especially for datasets with a lot of noise. Data quality has therefore become the bottleneck of data value discovery, and a single batch query is not enough to support an optimal combination of global data resources. In this paper, we propose a spatio-temporal local association query algorithm for remote sensing data (STLAQ). First, we design a spatio-temporal data model and a bottom-up spatio-temporal correlation network. Then, we use partition-based clustering and spectral clustering to measure the correlation between spatio-temporal correlation networks. Finally, we construct a spatio-temporal index to provide joint query capabilities. We carry out local association query efficiency experiments to verify the feasibility of STLAQ on multi-scale datasets. The results show that STLAQ lowers the barriers between remote sensing datasets and effectively improves their application value.
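The joint spatial-plus-temporal index the abstract ends with can be illustrated with a simple bucketed structure. This is a rough sketch under stated assumptions: a fixed grid cell and time window stand in for STLAQ's actual index, which is built from clustering over a spatio-temporal correlation network.

```python
# Sketch of a joint spatio-temporal index: scenes bucketed by (grid cell,
# time window), so one lookup answers "what is nearby in space AND time".
# Granularities are arbitrary illustration values, not STLAQ's parameters.
from collections import defaultdict

class STIndex:
    def __init__(self, cell_deg=1.0, window_s=3600):
        self.cell_deg = cell_deg
        self.window_s = window_s
        self.buckets = defaultdict(list)

    def _key(self, lat, lon, t):
        return (int(lat // self.cell_deg),
                int(lon // self.cell_deg),
                int(t // self.window_s))

    def insert(self, scene_id, lat, lon, t):
        self.buckets[self._key(lat, lon, t)].append(scene_id)

    def query(self, lat, lon, t):
        """Scenes falling in the same spatial cell and time window."""
        return self.buckets.get(self._key(lat, lon, t), [])

idx = STIndex()
idx.insert("scene-A", 30.2, 114.1, 100)
hits = idx.query(30.5, 114.3, 200)   # same cell, same window
```

A batch query would scan every dataset; the bucketed lookup is what makes local association queries cheap at scale.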

1 citation


Journal ArticleDOI
TL;DR: This paper proposes a unified semantic model called OEDO (Ocean Environmental Data Ontology) to represent heterogeneous ocean data via metadata and publish it as data services.
Abstract: The massive ocean data acquired by various observing platforms and sensors poses new challenges to data management and utilization. Typically, it is difficult to find desired data among the large number of datasets efficiently and effectively. Most existing methods for data discovery are based on keyword retrieval or direct semantic reasoning, and they are either limited in data access rate or do not take time cost into account. In this paper, we design and implement a novel system that alleviates the problem by introducing semantics with ontologies, referred to as Data Ontology and List-Based Publishing (DOLP). Specifically, we improve ocean data services in the following three respects. First, we propose a unified semantic model called OEDO (Ocean Environmental Data Ontology) to represent heterogeneous ocean data by metadata and publish it as data services. Second, we propose an optimized quick service query list (QSQL) data structure that stores pre-inferred, semantically related services and reduces service querying time. Third, we propose two algorithms for optimizing the QSQL hierarchically and horizontally, respectively, which extend the semantic relationships of the data services and improve the data access rate. Experimental results show that DOLP outperforms the benchmark methods. First, our QSQL-based data discovery methods obtain a higher recall rate than the keyword-based method and are faster than the traditional semantic method based on direct reasoning. Second, DOLP can handle more complex semantic relationships than the existing methods.
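The core QSQL trade-off (pre-infer offline so lookup needs no on-line reasoning) can be sketched as follows. The concept hierarchy and service names here are invented examples, not OEDO's actual terms.

```python
# Sketch of the QSQL idea: index each service under its concept AND all
# broader concepts at build time, so a query is a plain dictionary lookup
# instead of on-line semantic reasoning. Hierarchy is illustrative only.

SUBCLASS_OF = {"ex:SeaSurfaceTemp": "ex:Temperature",
               "ex:Temperature": "ex:OceanVariable"}

def ancestors(concept):
    """Walk the subclass chain up to the root."""
    out = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        out.append(concept)
    return out

def build_qsql(services):
    """services: {service_id: concept}. Pre-infer once, query many times."""
    qsql = {}
    for sid, concept in services.items():
        for c in [concept, *ancestors(concept)]:
            qsql.setdefault(c, []).append(sid)
    return qsql

qsql = build_qsql({"svc-sst": "ex:SeaSurfaceTemp"})
# A query for the broad term "ex:OceanVariable" now finds the SST service
# without any reasoning at query time.
```

This mirrors why the abstract reports keyword-level speed with reasoning-level recall: the reasoning cost is paid once when the list is built.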

1 citation


Patent
02 Apr 2021
TL;DR: In this article, a data model comparison method based on an uncertainty support vector machine (SVM) is proposed. It builds a high-dimensional data model and introduces uncertainty parameters into the training of the SVM classification model, improving the efficiency and accuracy of recognizing and comparing ontology data models and ultimately achieving efficient data integration.
Abstract: The invention discloses a data model comparison method based on an uncertainty support vector machine. The method comprises building a high-dimensional data model and introducing uncertainty parameters into the training of the support vector machine's classification model, thereby improving the efficiency and accuracy of recognizing and comparing ontology data models and finally achieving efficient data integration. The method adopts support vector machine learning to rapidly classify data ontology models. By introducing uncertainty parameters, noise arising from management differences and uncertain factors among different departments and different responsibility subjects is prevented from interfering with the normal operation of the classification model. The method considers both linear classification and kernel-based nonlinear classification, and it adapts well to different application scenarios.
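The idea of discounting uncertain training samples can be illustrated with a deliberately simple linear classifier (a weighted perceptron rather than a true SVM, to keep the sketch dependency-free). This is not the patented method; the per-sample certainty weight is the only element carried over from the abstract's description.

```python
# Illustrative only: a linear classifier whose updates are scaled by a
# per-sample certainty weight, so noisy/uncertain examples move the
# decision boundary less. A real implementation would use a weighted SVM.

def train(samples, epochs=20, lr=0.1):
    """samples: list of (features, label in {-1,+1}, certainty in [0,1])."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y, c in samples:
            margin = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * margin <= 0:                       # misclassified sample
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y                       # low c => small update
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Third sample is labeled uncertain (c=0.3), so it influences training less.
samples = [((2.0,), 1, 1.0), ((-2.0,), -1, 1.0), ((1.5,), 1, 0.3)]
model = train(samples)
```

A production version would instead pass such weights to a kernel SVM's per-sample cost (many SVM libraries accept sample weights), which matches the patent's mention of both linear and kernel-based classification.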

Book ChapterDOI
26 Jul 2021
TL;DR: In this paper, the authors propose the DIKG2 approach to structure semi-structured data collected through Web forms and stored in a relational database (RDB), and to facilitate dynamic data integration into knowledge graphs (KGs).
Abstract: Over the last two decades, semantic-based data integration has become one of the major data challenges. The use of semantic Web standards and Linked Open Data (LOD), especially knowledge graphs (KGs), provides support for addressing issues related to data access and integration. In this paper, we propose the DIKG2 approach to structure semi-structured data collected through Web forms and stored in an RDB, and to facilitate dynamic data integration into KGs. By exploiting a common vocabulary shared between the Web forms and the entities of our domain ontology, OAFE, we describe the mapping process performed between the RDB schema and the ontology's conceptual schema.
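The mapping step can be sketched as a table-driven translation from relational rows to triples. The vocabulary entries and column names below are hypothetical (the OAFE property names are invented for this example); the real approach would express such mappings declaratively, e.g. in R2RML.

```python
# Sketch: a shared vocabulary maps form/table column names to ontology
# properties, turning each RDB row into KG triples. Names are illustrative.

VOCAB = {"farm_name": "oafe:hasName",     # hypothetical OAFE properties
         "area_ha": "oafe:hasAreaHa"}

def row_to_triples(table, pk, row):
    """Translate one relational row into triples via the shared vocabulary.
    Columns absent from the vocabulary (e.g. the surrogate key) are skipped."""
    subject = f"ex:{table}/{row[pk]}"
    return [(subject, VOCAB[col], val)
            for col, val in row.items() if col in VOCAB]

triples = row_to_triples("farm", "id",
                         {"id": 7, "farm_name": "Oak Farm", "area_ha": 12})
```

Keeping the vocabulary as data rather than code is what makes the integration dynamic: a new form field only needs a new VOCAB entry, not a new mapping program.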