
Showing papers on "Semantic Web published in 2017"


Journal ArticleDOI
TL;DR: Results show that Semantic Web technologies have a key role to play in logic-based applications and in applications that require information from multiple application areas, and that beneficial implementation approaches rely on appropriate combinations of declarative and procedural programming techniques, semantic and legacy data formats, user input, and automated procedures.

282 citations


Journal ArticleDOI
TL;DR: This survey analyzes 62 different SQA systems, systematically and manually selected using predefined inclusion and exclusion criteria (leading to 72 selected publications out of 1960 candidates), identifies common challenges, structures solutions, and provides recommendations for future systems.
Abstract: Semantic Question Answering (SQA) removes two major access requirements to the Semantic Web: the mastery of a formal query language like SPARQL and knowledge of a specific vocabulary. Because of the complexity of natural language, SQA presents difficult challenges and many research opportunities. Instead of a shared effort, however, many essential components are redeveloped, which is an inefficient use of researchers' time and resources. This survey analyzes 62 different SQA systems, which were systematically and manually selected using predefined inclusion and exclusion criteria, leading to 72 selected publications out of 1960 candidates. We identify common challenges, structure solutions, and provide recommendations for future systems. This work is based on publications from the end of 2010 to July 2015 and is also compared to older but similar surveys.

205 citations


Journal ArticleDOI
TL;DR: Details about FRED's capabilities, design issues, implementation and evaluation are provided, which make the tool suitable to be used as a semantic middleware for domain- or task-specific applications.
Abstract: A machine reader is a tool able to transform natural language text into formal structured knowledge so that the latter can be interpreted by machines, according to a shared semantics. FRED is a machine reader for the Semantic Web: its output is an RDF/OWL graph whose design is based on frame semantics. Nevertheless, FRED's graphs are domain- and task-independent, making the tool suitable to be used as a semantic middleware for domain- or task-specific applications. To serve this purpose, it is available both as a REST service and as a Python library. This paper provides details about FRED's capabilities, design issues, implementation and evaluation.

164 citations


Journal ArticleDOI
TL;DR: An IoT-based Semantic Interoperability Model (IoT-SIM) is proposed to provide semantic interoperability among heterogeneous IoT devices in the healthcare domain and to provide annotations for data.
Abstract: Interoperability remains a significant burden for the developers of Internet of Things systems, first because IoT devices are highly heterogeneous in terms of underlying communication protocols, data formats, and technologies, and second because, due to a lack of worldwide accepted standards, interoperability tools remain limited. In this paper, we propose an IoT-based Semantic Interoperability Model (IoT-SIM) to provide semantic interoperability among heterogeneous IoT devices in the healthcare domain. Physicians communicate with their patients via heterogeneous IoT devices to monitor their current health status, and the information exchanged between physician and patient is semantically annotated and communicated in a meaningful way. A lightweight model for the semantic annotation of data from heterogeneous devices in the IoT is proposed to provide annotations for data. The Resource Description Framework (RDF) is a Semantic Web framework that relates things using triples to make them semantically meaningful; RDF-annotated patient data is thereby made semantically interoperable. SPARQL queries are used to extract records from the RDF graph. For the simulation of the system, we used the Tableau, Gruff-6.2.0, and MySQL tools.
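The triple-and-query pattern this paper relies on can be sketched without any RDF tooling. The resource names below are invented for illustration, and a real system would use an RDF store queried with SPARQL; this is only a minimal plain-Python analogue of a one-triple basic graph pattern:

```python
# RDF-style triples as plain tuples. Resource names (ex:patient1, etc.)
# are invented for illustration; a real deployment would use an RDF
# library such as rdflib and query with SPARQL.
triples = [
    ("ex:patient1", "ex:hasSensor", "ex:pulseOximeter1"),
    ("ex:pulseOximeter1", "ex:observes", "ex:SpO2"),
    ("ex:pulseOximeter1", "ex:hasReading", "94"),
    ("ex:patient1", "ex:treatedBy", "ex:drSmith"),
]

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    mirroring a SPARQL basic graph pattern with a single triple."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# "Which sensors does patient1 have?" -- analogous to
# SELECT ?sensor WHERE { ex:patient1 ex:hasSensor ?sensor }
sensors = [o for (_, _, o) in match("ex:patient1", "ex:hasSensor")]
print(sensors)  # ['ex:pulseOximeter1']
```

The same wildcard idea generalizes to multi-pattern queries by joining the results on shared variables, which is essentially what a SPARQL engine does.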

140 citations


Journal ArticleDOI
TL;DR: Various types of heterogeneity are described, with emphasis on semantic heterogeneity, and an implementation of semantic heterogeneity reduction is introduced, focusing on the use of Semantic Web technologies for data integration.
Abstract: The ongoing adoption of Industry 4.0 is a research trend that includes more intensive utilization of cyber-physical systems (CPSs). The computerization of manufacturing will bring many advantages, but the heterogeneity problem must be faced when integrating various CPSs to enable this progress. In this paper, we describe various types of heterogeneity, with emphasis on semantic heterogeneity. The CPS integration problem is classified into two different challenges. Next, we introduce the approach and the implementation of semantic heterogeneity reduction, focusing on the use of Semantic Web technologies for data integration. Then, a Big Data approach is described for facilitating the implementation. Finally, the possible solution is demonstrated on our proposed semantic Big Data historian.

128 citations


Journal ArticleDOI
TL;DR: This work develops a novel method for feature learning on biological knowledge graphs that combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs.
Abstract: Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data have become available, but have not yet been widely applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing number of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation: https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary information: Supplementary data are available at Bioinformatics online.
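The walk-based side of such feature learning can be sketched in a few lines. This toy example only generates random walks over an invented, unlabeled graph; the published pipeline (walking-rdf-and-owl) walks over edge-labeled RDF graphs after deductive closure and feeds the walks into word2vec, both of which are omitted here:

```python
import random

# Toy knowledge graph as adjacency lists; node names are invented,
# and edge labels are dropped for brevity.
graph = {
    "GeneA":    ["Disease1", "ProteinX"],
    "ProteinX": ["GeneA", "ProteinY"],
    "ProteinY": ["ProteinX", "Disease1"],
    "Disease1": ["GeneA", "ProteinY"],
}

def random_walk(start, length, rng):
    """Generate one random walk of `length` nodes starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

rng = random.Random(42)
# A small corpus of walks: 10 walks of 5 nodes from every node.
corpus = [random_walk(node, 5, rng) for node in graph for _ in range(10)]
# In the real pipeline, this corpus would be fed to word2vec so that
# co-occurring nodes end up with nearby embedding vectors.
```

Edge prediction then reduces to training a classifier on pairs of the resulting node embeddings.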

124 citations


Journal ArticleDOI
TL;DR: This model is used to recommend medicines, with their side effects, for different symptoms collected from heterogeneous IoT sensors, and makes the data semantically interoperable, delivering semantic interoperability among heterogeneous IoT devices in the healthcare domain.

117 citations


Book ChapterDOI
28 May 2017
TL;DR: The past years have seen a growing amount of research on question answering (QA) over Semantic Web data, shaping an interaction paradigm that allows end users to profit from the expressive power ofSemantic Web standards while, at the same time, hiding their complexity behind an intuitive and easy-to-use interface.
Abstract: The past years have seen a growing amount of research on question answering (QA) over Semantic Web data, shaping an interaction paradigm that allows end users to profit from the expressive power of Semantic Web standards while, at the same time, hiding their complexity behind an intuitive and easy-to-use interface. On the other hand, the growing amount of data has led to a heterogeneous data landscape where QA systems struggle to keep up with the volume, variety and veracity of the underlying knowledge.

116 citations


Journal ArticleDOI
TL;DR: Harmonizing definitions of concepts, as proposed by TOP, forms the basis for better integration of data across heterogeneous data sets and terminologies, thereby increasing the potential for data reuse and enhanced scientific synthesis.
Abstract: Ecological research produces a tremendous amount of data, but the diversity in scales and topics covered and the ways in which studies are carried out result in large numbers of small, idiosyncratic data sets using heterogeneous terminologies. Such heterogeneity can be attributed, in part, to a lack of standards for acquiring, organizing and describing data. Here, we propose a terminological resource, a Thesaurus Of Plant characteristics (TOP), whose aim is to harmonize and formalize concepts for plant characteristics widely used in ecology. TOP concentrates on two types of plant characteristics: traits and environmental associations. It builds on previous initiatives for several aspects: (i) characteristics are designed following the entity-quality (EQ) model (a characteristic is modelled as the ‘Quality’ of an ‘Entity’ ) used in the context of Open Biological Ontologies; (ii) whenever possible, the Entities and Qualities are taken from existing terminology standards, mainly the Plant Ontology (PO) and Phenotypic Quality Ontology (PATO) ontologies; and (iii) whenever a characteristic already has a definition, if appropriate, it is reused and referenced. The development of TOP, which complies with semantic web principles, was carried out through the involvement of experts from both the ecology and the semantics research communities. Regular updates of TOP are planned, based on community feedback and involvement. TOP provides names, definitions, units, synonyms and related terms for about 850 plant characteristics. TOP is available online (www.top-thesaurus.org), and can be browsed using an alphabetical list of characteristics, a hierarchical tree of characteristics, a faceted and a free-text search, and through an Application Programming Interface. Synthesis. 
Harmonizing definitions of concepts, as proposed by TOP, forms the basis for better integration of data across heterogeneous data sets and terminologies, thereby increasing the potential for data reuse. It also allows enhanced scientific synthesis. TOP therefore has the potential to improve research and communication not only within the field of ecology, but also in related fields with interest in plant functioning and distribution.

104 citations


Journal ArticleDOI
TL;DR: This paper proposes and implements a framework for a smart e-learning ecosystem using ontology and SWRL, and fosters the creation of four separate ontologies for the personalized full learning package, which is composed of the learner model and all the learning process components.

103 citations


Journal ArticleDOI
TL;DR: This work proposes a novel approach to computing semantic distance, based on network science methodology, and demonstrates how this approach addresses key issues in cognitive theory, namely the breadth of the spreading activation process and the effect of semantic distance on memory retrieval.
Abstract: Semantic distance is a determining factor in cognitive processes, such as semantic priming, operating upon semantic memory. The main computational approach to computing semantic distance is latent semantic analysis (LSA). However, objections have been raised against this approach, mainly for its failure at predicting semantic priming. We propose a novel approach to computing semantic distance, based on network science methodology. Path length in a semantic network represents the number of steps needed to traverse from one word in the network to another. We examine whether path length can be used as a measure of semantic distance by investigating how path length affects performance in a semantic relatedness judgment task and recall from memory. Our results show a differential effect on performance: up to 4 steps separating word-pairs, participants exhibit an increase in reaction time (RT) and a decrease in the percentage of word-pairs judged as related. From 4 steps onward, participants exhibit a significant decrease in RT and the word-pairs are dominantly judged as unrelated. Furthermore, we show that as path length between word-pairs increases, success in free- and cued-recall decreases. Finally, we demonstrate how our measure outperforms computational methods for measuring semantic distance (LSA and positive pointwise mutual information) in predicting participants' RT and subjective judgments of semantic strength. Thus, we provide a computational alternative for computing semantic distance. Furthermore, this approach addresses key issues in cognitive theory, namely the breadth of the spreading activation process and the effect of semantic distance on memory retrieval.
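The path-length measure is easy to sketch: breadth-first search over a toy association network (edges invented for illustration, not taken from the paper's data) counts the number of steps between two words:

```python
from collections import deque

# Toy free-association network; the words and edges are invented
# purely to illustrate the path-length measure.
network = {
    "cat":    ["dog", "mouse"],
    "dog":    ["cat", "bone"],
    "mouse":  ["cat", "cheese"],
    "bone":   ["dog"],
    "cheese": ["mouse", "milk"],
    "milk":   ["cheese"],
}

def path_length(source, target):
    """Shortest number of edges between two words (breadth-first
    search); this is the path-length notion of semantic distance."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        word, dist = frontier.popleft()
        for nxt in network[word]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # disconnected words have no defined distance

print(path_length("cat", "bone"))  # 2
print(path_length("dog", "milk"))  # 4
```

Under the paper's findings, word pairs like ("cat", "bone") at distance 2 would tend to be judged related, while pairs 4 or more steps apart would tend to be judged unrelated.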

Journal ArticleDOI
TL;DR: This research demonstrates how it is possible to represent a huge number of specialized information models with appropriate LOD and Grade in a BIM environment and then guarantee complete interoperability with the IFC/RDF format.

01 Jan 2017
TL;DR: The development and future outlooks for the OntoLex-Lemon model are described, some of its current applications are briefly reviewed, and two use cases are examined: representing dictionaries and the WordNet Collaborative Interlingual Index.
Abstract: The lemon model has become the primary mechanism for the representation of lexical data on the Semantic Web. The lemon model has been further developed in the context of the W3C OntoLex community group, resulting in the new OntoLex-Lemon model, recently published as a W3C report. In this paper, we describe the development and future outlooks for this model as well as briefly review some of its current applications. The recent evolution of lemon into OntoLex-Lemon, in the context of the community group, has led to improvements on the model that further extend its application domain from formal applications such as question answering and semantic parsing to the representation of general machine-readable dictionaries, including WordNet and digitized versions of existing dictionaries. We look at two use cases of the OntoLex-Lemon model: in representing dictionaries and in the WordNet Collaborative Interlingual Index. Finally, we consider the future of the OntoLex-Lemon model, which we intend to continue developing, having recently identified areas that would increase the applicability and value of the model to more users.

Journal ArticleDOI
TL;DR: This paper examines, reviews and presents current IoT technologies, starting from the physical layer up to the application and data layer, and focuses on future IoT key enabling technologies such as the new fifth-generation (5G) networks and the Semantic Web.
Abstract: The Internet of Things (IoT) is the communications paradigm that can provide the potential of ultimate communication. The IoT paradigm describes communication not only human to human (H2H) but also machine to machine (M2M) without the need of human interference. In this paper, we examine, review and present the current IoT technologies starting from the physical layer to the application and data layer. Additionally, we focus on future IoT key enabling technologies like the new fifth generation (5G) networks and Semantic Web. Finally, we present main IoT application domains like smart cities, transportation, logistics, and healthcare.

Journal ArticleDOI
01 Apr 2017
TL;DR: This paper discusses the work that has been done in the area for the case of RDF/S datasets, with emphasis on session-based interaction schemes for exploratory search, and introduces a small but concise formal model of the interaction that captures the core functionalities.
Abstract: The amount of available Semantic Web (SW) data (including Linked Open Data) constantly increases. Users would like to browse and explore such information spaces effectively without having to be acquainted with the various vocabularies and query language syntaxes. This paper discusses the work that has been done in the area for the case of RDF/S datasets, with emphasis on session-based interaction schemes for exploratory search. In particular, it surveys the related works according to various aspects, such as assumed user goals, structuring of the underlying information space, generality and configuration requirements, and various (state space-based) features of the navigation structure. Subsequently, it introduces a small but concise formal model of the interaction (capturing the core functionalities), which is used as a reference model for describing what the existing systems support. Finally, the paper describes the evaluation methods that have been used. Overall, the presented analysis aids the understanding and comparison of the various approaches that have been proposed so far.

Book ChapterDOI
21 Oct 2017
TL;DR: WIDOCO is a WIzard for DOCumenting Ontologies that guides users through the documentation process of their vocabularies and creates a documentation with diagrams, human readable descriptions of the ontology terms and a summary of changes with respect to previous versions of theOntology.
Abstract: In this paper we describe WIDOCO, a WIzard for DOCumenting Ontologies that guides users through the documentation process of their vocabularies. Given an RDF vocabulary, WIDOCO detects missing vocabulary metadata and creates documentation with diagrams, human-readable descriptions of the ontology terms, and a summary of changes with respect to previous versions of the ontology. The documentation consists of a set of linked, enriched HTML pages that can be further extended by end users. WIDOCO is open source and builds on well-established Semantic Web tools. So far, WIDOCO has been used to document more than one hundred ontologies in different domains.

Journal ArticleDOI
TL;DR: The Optique platform is introduced as a suitable OBDA solution for Siemens, with a number of novel techniques and components including a deployment module, BootOX for ontology and mapping bootstrapping, a query language, STARQL, that allows for uniform querying of both streaming and static data, and a query formulation interface, OptiqueVQS, that allows users to formulate STARQL queries without prior knowledge of its formal syntax.

Journal ArticleDOI
TL;DR: This paper presents a semantic knowledge management service and domain ontology which support a novel cloud-edge solution by unifying domestic socio-technical water systems with clean and waste networks at an urban scale, to deliver value-added services for consumers and network operators.

Journal ArticleDOI
Simon Mayer1, Jack Hodges1, Dan Yu1, Mareike Kritzler1, Florian Michahelles1 
TL;DR: The authors show how the Open Semantic Framework can increase worker safety in industrial settings by mitigating workplace safety hazards through the automatic insertion of safety-relevant actions directly in the workflow, based on a model of applicable workplace safety law and regulation.
Abstract: This article introduces the Open Semantic Framework (OSF), a one-stop shop for creating and deploying semantic applications as well as their lifecycle management. The authors discuss how the OSF supports knowledge acquisition into semantic knowledge models via novel interface technologies, manages these models internally within core ontologies and pluggable knowledge packs, and provides moderated access to them via a REST API and prefabricated SPARQL queries. As an example of OSF's usage in a practical scenario, the authors show how it can increase worker safety in industrial settings by mitigating workplace safety hazards through the automatic insertion of safety-relevant actions directly in the workflow, based on a model of applicable workplace safety law and regulation. By facilitating the creation, integration, and provisioning of knowledge, the OSF represents an important step on the way to integrating knowledge models worldwide, enabling semantic applications globally to "stand on the shoulders of giants."

Proceedings ArticleDOI
04 Jul 2017
TL;DR: A simple Building Topology Ontology (BOT) only covering the core concepts of a building is proposed, and three methods for extending this with domain specific ontologies are proposed.
Abstract: In the last years, several ontologies focused on structuring domain specific information within the scope of Architecture, Engineering and Construction (AEC) have emerged. Several of these individual ontologies redefine core concepts of a building already specified in the publicly available ontology version of the ISO standardised Industry Foundation Classes (IFC) schema, thereby violating the W3C best practice rule of minimum redundancy. The voluminous IFC schema with origins in a closed world assumption is likewise violating this rule by redefining concepts about time, location, units etc. already available from other sources, and it is furthermore violating the rule of keeping ontologies simple for easy maintenance. Based on all the available ontologies, we propose a simple Building Topology Ontology (BOT) only covering the core concepts of a building, and three methods for extending this with domain specific ontologies. This approach makes it (1) possible to work with a limited set of core building classes, and (2) extend those as needed towards specific domain ontologies that are in hands of business professionals or domain-specific standardisation bodies, such as the European Telecommunications Standards Institute (ETSI), buildingSMART, the Open Geospatial Consortium (OGC), and so forth.

Journal ArticleDOI
01 Jan 2017-Database
TL;DR: The Consortium of European Taxonomic Facilities agreed on a common system of HTTP-URI-based stable identifiers which is now rolled out to its member organizations, facilitating seamless integration into the growing semantic web.
Abstract: With biodiversity research activities being increasingly shifted to the web, the need for a system of persistent and stable identifiers for physical collection objects becomes increasingly pressing. The Consortium of European Taxonomic Facilities agreed on a common system of HTTP-URI-based stable identifiers which is now rolled out to its member organizations. The system follows Linked Open Data principles and implements redirection mechanisms to human-readable and machine-readable representations of specimens facilitating seamless integration into the growing semantic web. The implementation of stable identifiers across collection organizations is supported with open source provider software scripts, best practices documentations and recommendations for RDF metadata elements facilitating harmonized access to collection information in web portals. Database URL : http://cetaf.org/cetaf-stable-identifiers.

Book
29 Sep 2017
TL;DR: This book introduces data validation and describes its practical use in day-to-day data exchange; using Web addresses as identifiers for data elements enables the construction of distributed databases on a global scale.
Abstract: RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic Web offers a bold, new take on how to organize, distribute, index, and share data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an information revolution, and also like the Web, it is encumbered by data quality issues. The quality of Semantic Web data is compromised by the lack of resources for data curation, for maintenance, and for developing globally applicable data models. At the enterprise scale, these problems have conventional solutions. Master data management provides an enterprise-wide vocabulary, while constraint l...
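The kind of day-to-day validation the book motivates can be illustrated with a tiny hand-rolled check in plain Python. The resource names are invented, and the check mimics only one constraint shape (a SHACL-style minimum cardinality); real deployments would use a SHACL or ShEx engine:

```python
# A SHACL-like "minCount 1" check sketched in plain Python: every
# resource typed ex:Person must have at least one ex:name value.
# All names here are invented for illustration.
data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),  # missing ex:name -> violation
]

def validate_min_count(triples, cls, prop):
    """Return the subjects of type `cls` that lack any value for `prop`,
    i.e. the resources violating a minCount-1 constraint."""
    subjects = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    having = {s for s, p, o in triples if p == prop}
    return sorted(subjects - having)

print(validate_min_count(data, "ex:Person", "ex:name"))  # ['ex:bob']
```

An empty result list means the data conforms to this one constraint; a full validator aggregates many such shape checks into a validation report.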

Journal ArticleDOI
TL;DR: Fujitsu Laboratories Limited’s CONICYT/FONDECYT Project and the Millennium Nucleus Center for Semantic Web Research aim to provide real-time information about the “building blocks of knowledge” for semantic web research.
Abstract: Fujitsu Laboratories Limited CONICYT/FONDECYT Project 3130617 FONDECYT Project 11140900 DGIP Project 116.24.1 Millennium Nucleus Center for Semantic Web Research NC120004

Journal ArticleDOI
TL;DR: A Compact Interactive Memetic Algorithm (CIMA) based collaborative ontology matching technology, which can reduce users' workload by adaptively determining when to get users involved, presenting the most problematic correspondences to users, and helping users automatically validate multiple conflicting mappings, while increasing the value of user involvement.
Abstract: Ontology is the kernel technology of the Semantic Web, playing a prominent role in achieving interoperability across heterogeneous systems and applications by formally describing the semantics of the data that characterize a particular application domain. However, different ontology engineers might have potentially opposing world views, which can yield different descriptions of the same ontology entity, raising the so-called ontology heterogeneity problem. Ontology matching, which aims at identifying the correspondences between the entities of heterogeneous ontologies, is recognized as an effective technology for solving the ontology heterogeneity problem. Due to the complexity of the ontology matching process, ontology alignments generated by automatic ontology matchers should be validated by users to ensure their quality, and the technology that makes multiple users collaborate with each other to help the automatic tool create high-quality matchings in a reasonable amount of time is called collaborative ontology matching. Such collaborative ontology matching poses a new challenge: how to reduce users' workload while, at the same time, increasing the value of their involvement. To address this challenge, in this paper we propose a Compact Interactive Memetic Algorithm (CIMA) based collaborative ontology matching technology, which can reduce users' workload by adaptively determining when to get users involved, presenting the most problematic correspondences to users, and helping users automatically validate multiple conflicting mappings, and which increases the value of user involvement by propagating the collaborative validation and decreasing the negative effect of erroneous user validations. The experimental results show that our proposal is able to efficiently exploit collaborative validation to improve on its non-interactive version, and that both the runtime and the alignment quality of our approach outperform state-of-the-art interactive ontology matching systems under different user error rates.

Journal ArticleDOI
TL;DR: The proposed approach extends the existing framework of representing temporal information in ontologies by allowing for representation of concepts evolving in time and of their properties in terms of qualitative descriptions in addition to quantitative ones, as well as integrating temporal reasoning support into the proposed representation.
Abstract: The representation of temporal information has been in the center of intensive research activities over the years in the areas of knowledge representation, databases and more recently, the Semantic Web. The proposed approach extends the existing framework of representing temporal information in ontologies by allowing for representation of concepts evolving in time (referred to as “dynamic” information) and of their properties in terms of qualitative descriptions in addition to quantitative ones (i.e., dates, time instants and intervals). For this purpose, we advocate the use of natural language expressions, such as “before” or “after”, for temporal entities whose exact durations or starting and ending points in time are unknown. Reasoning over all types of temporal information (such as the above) is also an important research problem. The current work addresses all these issues as follows: The representation of dynamic concepts is achieved using the “4D-fluents” or, alternatively, the “N-ary relations” mechanism. Both mechanisms are thoroughly explored and are expanded for representing qualitative and quantitative temporal information in OWL. In turn, temporal information is expressed using either intervals or time instants. Qualitative temporal information representation in particular, is realized using sets of SWRL rules and OWL axioms leading to a sound, complete and tractable reasoning procedure based on path consistency applied on the existing relation sets. Building upon existing Semantic Web standards (OWL), tools and member submissions (SWRL), as well as integrating temporal reasoning support into the proposed representation, are important design features of our approach.
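The qualitative reasoning described above can be illustrated with the simpler point algebra (relations before/equal/after between time instants) rather than the paper's full interval machinery. A single path-consistency refinement step, which is the core operation the paper's SWRL rules implement over interval relation sets, looks roughly like this:

```python
# Point-algebra relations between time instants: before (<),
# equal (=), after (>). COMP[(r1, r2)] gives the possible relations
# of (a, c) given "a r1 b" and "b r2 c".
ALL = {"<", "=", ">"}
COMP = {
    ("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): ALL,
    ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
    (">", "<"): ALL,   (">", "="): {">"}, (">", ">"): {">"},
}

def refine(r_ab, r_bc, r_ac):
    """One path-consistency step: intersect the current constraint on
    (a, c) with everything derivable by composing (a, b) and (b, c)."""
    derived = set()
    for r1 in r_ab:
        for r2 in r_bc:
            derived |= COMP[(r1, r2)]
    return r_ac & derived

# "a before b" and "b before c" force "a before c":
print(refine({"<"}, {"<"}, ALL))  # {'<'}
```

Iterating this step over all triples of variables until a fixpoint (and reporting an inconsistency if any relation set becomes empty) yields the sound and tractable path-consistency procedure the paper builds on, there applied to interval relations rather than points.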

Journal ArticleDOI
TL;DR: In this paper, two topic modeling algorithms are explored, namely LSI (based on SVD) and Mr.LDA, for learning topic ontologies; the objective is to determine the statistical relationship between documents and terms to build a topic ontology and ontology graph with minimum human intervention.

Book ChapterDOI
21 Oct 2017
TL;DR: CodeOntology makes it possible to generate Linked Data from any Java project, thereby enabling the execution of highly expressive queries over source code by means of a powerful language like SPARQL.
Abstract: In this paper, we leverage advances in the Semantic Web area, including data modeling (RDF) and data management and querying (Jena and SPARQL), to develop CodeOntology, a community-shared software framework supporting expressive queries over source code. The project consists of two main contributions: an ontology that provides a formal representation of object-oriented programming languages, and a parser that is able to analyze Java source code and serialize it into RDF triples. The parser has been successfully applied to the source code of OpenJDK 8, gathering a structured dataset consisting of more than 2 million RDF triples. CodeOntology makes it possible to generate Linked Data from any Java project, thereby enabling the execution of highly expressive queries over source code by means of a powerful language like SPARQL.

Journal ArticleDOI
TL;DR: A comprehensive data model for smart cities that integrates several data sources, including, geo-referenced data, public transportation, urban fault reporting, road maintenance and municipal waste collection is presented.

01 Jan 2017
TL;DR: A survey reflecting on OBDI applications in the context of Multi-Disciplinary Engineering Environments (MDEE), which analyzes and compares 23 OBDI applications from both the Semantic Web and the Automation System Engineering research communities and provides recommendation guidelines for the selection of OBDI variants and technologies.
Abstract: Today's industrial production plants are complex mechatronic systems. In the course of the production plant lifecycle, engineers from a variety of disciplines (e.g., mechanics, electronics, automation) need to collaborate in multi-disciplinary settings that are characterized by heterogeneity in terminology, methods, and tools. This collaboration yields a variety of engineering artifacts that need to be linked and integrated, which on the technical level is reflected in the need to integrate heterogeneous data. Semantic Web technologies, in particular ontology-based data integration (OBDI), are promising for tackling this challenge, which has attracted strong interest from the engineering research community. This interest has resulted in a growing body of literature that is dispersed across the Semantic Web and Automation System Engineering research communities and has not been systematically reviewed so far. We address this gap with a survey reflecting on OBDI applications in the context of Multi-Disciplinary Engineering Environments (MDEE). To this end, we analyze and compare 23 OBDI applications from both the Semantic Web and the Automation System Engineering research communities. Based on this analysis, we (i) categorize the OBDI variants used in MDEE, (ii) identify key problem context characteristics, (iii) compare the strengths and limitations of OBDI variants as a function of problem context, and (iv) provide recommendation guidelines for the selection of OBDI variants and technologies in MDEE.

Book ChapterDOI
21 Oct 2017
TL;DR: WebIsALOD is introduced, a Linked Open Data release of the IsA database, containing 400M hypernymy relations, each provided with rich provenance information, and runs a machine learning algorithm to assign confidence scores to the individual statements.
Abstract: Hypernymy relations are an important asset in many applications, and a central ingredient to Semantic Web ontologies. The IsA database is a large collection of such hypernymy relations extracted from the Common Crawl. In this paper, we introduce WebIsALOD, a Linked Open Data release of the IsA database, containing 400M hypernymy relations, each provided with rich provenance information. As the original dataset contained more than 80% wrong, noisy extractions, we run a machine learning algorithm to assign confidence scores to the individual statements. Furthermore, 2.5M links to DBpedia and 23.7k links to the YAGO class hierarchy were created at a precision of 97%. In total, the dataset contains 5.4B triples.