
Showing papers by "Christian Bizer published in 2009"


Journal ArticleDOI
TL;DR: The authors describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked data community as it moves forward.
Abstract: The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions— the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
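The core idea — structured data published as RDF triples, with explicit links connecting entities across data sources into one global data space — can be sketched in a few lines. This is a minimal illustration using Python tuples in place of a real RDF library; the GeoNames URI and the triples themselves are illustrative examples:

```python
# RDF-style (subject, predicate, object) triples from two separate
# data sources; URIs below are illustrative.
dbpedia = [
    ("http://dbpedia.org/resource/Berlin", "rdfs:label", "Berlin"),
    ("http://dbpedia.org/resource/Berlin", "dbo:country", "http://dbpedia.org/resource/Germany"),
]
geonames = [
    ("http://sws.geonames.org/2950159/", "gn:name", "Berlin"),
]
# An RDF link whose subject and object live in different datasets;
# such links weave the sources into a single Web of Data.
links = [
    ("http://dbpedia.org/resource/Berlin", "owl:sameAs", "http://sws.geonames.org/2950159/"),
]
web_of_data = dbpedia + geonames + links

# Any client can now traverse from the DBpedia resource to GeoNames:
same_as = [o for s, p, o in web_of_data
           if s == "http://dbpedia.org/resource/Berlin" and p == "owl:sameAs"]
```

The point of the sketch is that a cross-dataset link is just another triple; no central registry is involved.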

5,113 citations


Journal ArticleDOI
TL;DR: The extraction of the DBpedia knowledge base is described, the current status of interlinking DBpedia with other data sources on the Web is discussed, and an overview of applications that facilitate the Web of Data around DBpedia is given.

2,224 citations


Journal ArticleDOI
TL;DR: The Berlin SPARQL Benchmark (BSBM) is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about the products.
Abstract: The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open Web settings. As SPARQL is taken up by the community, there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores as well as systems that rewrite SPARQL queries to SQL queries against non-RDF relational databases. This article introduces the Berlin SPARQL Benchmark (BSBM) for comparing the performance of native RDF stores with the performance of SPARQL-to-SQL rewriters across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix emulates the search and navigation pattern of a consumer looking for a product. The article discusses the design of the BSBM benchmark and presents the results of a benchmark experiment comparing the performance of four popular RDF stores (Sesame, Virtuoso, Jena TDB, and Jena SDB) with the performance of two SPARQL-to-SQL rewriters (D2R Server and Virtuoso RDF Views) as well as the performance of two relational database management systems (MySQL and Virtuoso RDBMS).
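A benchmark query mix of this kind is typically driven by parameterized query templates that are instantiated with concrete values for each run. The sketch below shows that mechanism with an illustrative product-search query; the template is not one of the literal BSBM queries, and the feature URI is hypothetical:

```python
# Illustrative BSBM-style query template; %(...)s placeholders are
# filled in by the benchmark driver for each query-mix run.
TEMPLATE = """
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
SELECT ?product ?label
WHERE {
  ?product rdfs:label ?label .
  ?product bsbm:productFeature %(feature)s .
  ?product bsbm:productPropertyNumeric1 ?value .
  FILTER (?value > %(threshold)d)
}
ORDER BY ?label
LIMIT 10
"""

def instantiate(feature_uri: str, threshold: int) -> str:
    """Substitute concrete parameters into the query template."""
    return TEMPLATE % {"feature": f"<{feature_uri}>", "threshold": threshold}

query = instantiate("http://example.org/feature/42", threshold=500)
```

The same instantiated query string can be sent to a native RDF store or to a SPARQL-to-SQL rewriter, which is what makes the benchmark comparable across architectures.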

634 citations


01 Apr 2009
TL;DR: The Silk - Link Discovery Framework is presented, a tool for finding relationships between entities within different data sources and features a declarative language for specifying which types of RDF links should be discovered between data sources as well as which conditions entities must fulfill in order to be interlinked.
Abstract: The Web of Data is built upon two simple ideas: employ the RDF data model to publish structured data on the Web, and set explicit RDF links between entities within different data sources. This paper presents the Silk - Link Discovery Framework, a tool for finding relationships between entities within different data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. Silk features a declarative language for specifying which types of RDF links should be discovered between data sources, as well as which conditions entities must fulfill in order to be interlinked. Link conditions may be based on various similarity metrics and can take the graph around entities into account, using a path-based selector language. Silk accesses data sources over the SPARQL protocol and can thus be used without having to replicate datasets locally.
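The essence of a similarity-based link condition can be sketched as follows. This is not Silk's declarative syntax — it is a minimal Python illustration of the underlying idea, using a standard-library string metric and hypothetical URIs:

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def discover_links(source, target, threshold=0.9):
    """Compare (uri, label) pairs from two data sources and yield
    (source_uri, 'owl:sameAs', target_uri) when the condition is met."""
    for s_uri, s_label in source:
        for t_uri, t_label in target:
            if similarity(s_label, t_label) >= threshold:
                yield (s_uri, "owl:sameAs", t_uri)

drugs_a = [("http://example.org/a/aspirin", "Aspirin")]
drugs_b = [("http://example.org/b/123", "aspirin"),
           ("http://example.org/b/456", "Ibuprofen")]
links = list(discover_links(drugs_a, drugs_b))
```

In Silk itself the metric, the compared properties, and the generated link type are all declared in the linking specification rather than hard-coded.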

464 citations


Book ChapterDOI
06 Nov 2009
TL;DR: An approach to executing SPARQL queries over the Web of Linked Data is presented that uses an iterator-based pipeline to discover data that might be relevant for answering a query during the query execution itself; to avoid blocking on the latency of HTTP requests, an extension of the iterator paradigm is proposed.
Abstract: The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
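The discover-while-executing idea can be sketched in a self-contained way by simulating the Web as a dictionary from URIs to the triples their documents contain. This is an illustration of link-traversal evaluation, not the paper's actual iterator formalization; all URIs are hypothetical:

```python
# Simulated "Web": each URI resolves to a document containing triples.
WEB = {
    "http://ex.org/alice": [
        ("http://ex.org/alice", "foaf:knows", "http://ex.org/bob"),
    ],
    "http://ex.org/bob": [
        ("http://ex.org/bob", "foaf:name", "Bob"),
    ],
}

def dereference(uri, dataset, fetched):
    """Simulate an HTTP lookup: add the document's triples to the
    queried dataset (once per URI)."""
    if uri in WEB and uri not in fetched:
        fetched.add(uri)
        dataset.extend(WEB[uri])

def match(pattern, dataset, fetched):
    """Iterator over solutions for one triple pattern. Before matching,
    it resolves the pattern's subject URI, growing the dataset."""
    s, p, o = pattern
    if s is not None:
        dereference(s, dataset, fetched)
    for ts, tp, to in list(dataset):
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield (ts, tp, to)

dataset, fetched = [], set()
# Pattern 1: who does alice know? (seed URI taken from the query)
friends = [o for _, _, o in match(("http://ex.org/alice", "foaf:knows", None), dataset, fetched)]
# Pattern 2: the URI discovered in the partial result is resolved in turn.
names = [o for _, _, o in match((friends[0], "foaf:name", None), dataset, fetched)]
```

The second pattern only becomes answerable because resolving the URI found by the first pattern added new triples to the queried dataset — exactly the behavior the paper formalizes, including the blocking problem when such lookups are slow.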

386 citations


Book ChapterDOI
06 Nov 2009
TL;DR: The Silk - Linking Framework is presented, a toolkit for discovering and maintaining data links between Web data sources. It allows data sources to exchange both linksets and detailed change information, and enables continuous link recomputation.
Abstract: The Web of Data is built upon two simple ideas: employ the RDF data model to publish structured data on the Web, and create explicit data links between entities within different data sources. This paper presents the Silk - Linking Framework, a toolkit for discovering and maintaining data links between Web data sources. Silk consists of three components: 1. A link discovery engine, which computes links between data sources based on a declarative specification of the conditions that entities must fulfill in order to be interlinked; 2. A tool for evaluating the generated data links in order to fine-tune the linking specification; 3. A protocol for maintaining data links between continuously changing data sources. The protocol allows data sources to exchange both linksets and detailed change information, and enables continuous link recomputation. The interplay of all the components is demonstrated within a life science use case.
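The maintenance component rests on a simple observation: when a source dataset changes, the linkset is recomputed and only the differences need to be exchanged. A minimal sketch of that diff step (the link tuples are illustrative; this is not the protocol's actual message format):

```python
def linkset_diff(old_links, new_links):
    """Return (added, removed) links between two linkset versions,
    so that only the changes need to be transmitted."""
    old, new = set(old_links), set(new_links)
    return new - old, old - new

# Linkset as published yesterday vs. after today's recomputation.
v1 = {("a:1", "owl:sameAs", "b:1"), ("a:2", "owl:sameAs", "b:2")}
v2 = {("a:1", "owl:sameAs", "b:1"), ("a:3", "owl:sameAs", "b:3")}
added, removed = linkset_diff(v1, v2)
```

Exchanging `added` and `removed` rather than the full linkset is what makes continuous recomputation between large, changing sources practical.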

353 citations


Book ChapterDOI
31 May 2009
TL;DR: Current projects, ongoing development, and further research carried out in a joint collaboration between the BBC, Freie Universität Berlin, and Rattle Research are described, with the goal of using DBpedia as the controlled vocabulary and semantic backbone for the whole BBC.
Abstract: In this paper, we describe how the BBC is working to integrate data and link documents across BBC domains by using Semantic Web technology, in particular Linked Data, MusicBrainz and DBpedia. We cover the work of BBC Programmes and BBC Music in building Linked Data sites for all music and programme related brands, and we describe existing projects, ongoing development, and further research we are doing in a joint collaboration between the BBC, Freie Universität Berlin and Rattle Research in order to use DBpedia as the controlled vocabulary and semantic backbone for the whole BBC.

297 citations


Journal ArticleDOI
TL;DR: The paper concludes by stating that the Web has succeeded as a single global information space that has dramatically changed the way the authors use information, disrupted business models, and led to profound societal change.
Abstract: The paper discusses the Semantic Web and Linked Data. The classic World Wide Web is built upon the idea of setting hyperlinks between Web documents. These hyperlinks are the basis for navigating and crawling the Web. Technologically, the core idea of Linked Data is to use HTTP URLs not only to identify Web documents, but also to identify arbitrary real-world entities. Data about these entities is represented using the Resource Description Framework (RDF). Whenever a Web client resolves one of these URLs, the corresponding Web server provides an RDF/XML or RDFa description of the identified entity. These descriptions can contain links to entities described by other data sources. The Web of Linked Data can be seen as an additional layer that is tightly interwoven with the classic document Web. The author mentions the application of Linked Data in media, publications, life sciences, geographic data, user-generated content, and cross-domain data sources. The paper concludes by stating that the Web has succeeded as a single global information space that has dramatically changed the way we use information, disrupted business models, and led to profound societal change.
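The resolve-a-URL-get-RDF mechanism can be made concrete with an in-memory stand-in for a Linked Data server. This is a sketch only — the URIs and triples are hypothetical, and a real server would additionally perform HTTP content negotiation between RDF/XML and HTML:

```python
# Simulated Linked Data server: HTTP URLs identify real-world entities,
# and resolving one yields an RDF description of that entity.
ENTITY_DESCRIPTIONS = {
    "http://example.org/id/Berlin": [
        ("http://example.org/id/Berlin", "rdf:type", "dbo:City"),
        # A link into another data source, connecting the layers:
        ("http://example.org/id/Berlin", "owl:sameAs", "http://dbpedia.org/resource/Berlin"),
    ],
}

def resolve(url: str):
    """Simulate dereferencing an entity URL into its RDF description."""
    return ENTITY_DESCRIPTIONS.get(url, [])

description = resolve("http://example.org/id/Berlin")
```

A client that resolves the returned `owl:sameAs` target in the same way is already traversing the Web of Linked Data.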

293 citations


Journal ArticleDOI
TL;DR: The WIQA - Information Quality Assessment Framework enables information consumers to apply a wide range of policies to filter information and generates explanations of why information satisfies a specific policy.

228 citations


Journal ArticleDOI
TL;DR: DBpedia Mobile, a location-aware Semantic Web client that can be used on an iPhone and other mobile devices, is described; published content is interlinked with a nearby DBpedia resource and thus contributes to the overall richness of the Geospatial Semantic Web.

114 citations


01 Jan 2009
TL;DR: The applicability and potential benefits of using Linked Data to connect drug and clinical trials related data sources are examined, and an overview is given of ongoing work within the W3C's Semantic Web for Health Care and Life Sciences Interest Group on publishing drug-related data sets on the Web and interlinking them with existing Linked Data sources.
Abstract: Advances in the biological sciences are allowing pharmaceutical companies to meet the health care crisis with drugs that are more suitable for preventive and tailored treatment, thereby holding the promise of enabling more cost-effective care with greater efficacy and reduced side effects. However, this shift in business model increases the need for companies to integrate data across drug discovery, drug development, and clinical practice. This is a fundamental shift from the approach of limiting integration activities to functional areas. The Linked Data approach holds much potential for enabling such connectivity between data silos, thereby enabling pharmaceutical companies to meet the urgent needs in society for more tailored health care. This paper examines the applicability and potential benefits of using Linked Data to connect drug and clinical trials related data sources and gives an overview of ongoing work within the W3C's Semantic Web for Health Care and Life Sciences Interest Group on publishing drug-related data sets on the Web and interlinking them with existing Linked Data sources. A use case is provided that demonstrates the immediate benefit of this work in enabling data to be browsed from disease to clinical trials, drugs, targets, and companies.
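The browsing use case amounts to following typed data links across sources. A toy illustration (all resources and link properties below are invented for the sketch, not taken from the actual datasets):

```python
# Toy linked-data graph: (resource, link property) -> linked resources.
GRAPH = {
    ("ex:Diabetes", "ex:studiedIn"):      ["ex:Trial42"],
    ("ex:Trial42", "ex:testsDrug"):       ["ex:Metformin"],
    ("ex:Metformin", "ex:hasTarget"):     ["ex:AMPK"],
    ("ex:Metformin", "ex:manufacturedBy"): ["ex:AcmePharma"],
}

def follow(start, *properties):
    """Follow a chain of link properties from a starting resource."""
    frontier = [start]
    for prop in properties:
        frontier = [o for s in frontier for o in GRAPH.get((s, prop), [])]
    return frontier

# Browse: disease -> clinical trial -> drug -> manufacturing company.
companies = follow("ex:Diabetes", "ex:studiedIn", "ex:testsDrug", "ex:manufacturedBy")
```

In the interlinked datasets the same traversal crosses data-source boundaries, which is precisely what the integration effort enables.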

Journal ArticleDOI
TL;DR: DBpedia is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web; the resulting DBpedia knowledge base currently describes over 2.6 million entities.
Abstract: The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of Data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia.
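DBpedia's globally unique identifiers follow a fixed URL scheme: an abstract resource URI that dereferences either to an RDF description or to a human-readable page. A sketch of that mapping (the `/resource/`, `/data/` and `/page/` paths are DBpedia's published conventions; the helper function itself is illustrative):

```python
def dbpedia_uris(name: str) -> dict:
    """Map an entity name to DBpedia's identifier and the two
    representations it can be dereferenced into."""
    return {
        "resource": f"http://dbpedia.org/resource/{name}",   # the entity itself
        "rdf":      f"http://dbpedia.org/data/{name}.rdf",   # machine-readable RDF
        "html":     f"http://dbpedia.org/page/{name}",       # human-readable page
    }

uris = dbpedia_uris("Berlin")
```

Data publishers setting links to DBpedia target the `resource` URI, which is what makes it usable as an interlinking hub.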

01 Jan 2009
TL;DR: A software framework for fusing RDF datasets based on different conflict resolution strategies is presented and the framework to fuse infobox data that has been extracted from the English, German, Italian and French editions of Wikipedia is applied.
Abstract: There are currently Wikipedia editions in 264 different languages. Each of these editions contains infoboxes that provide structured data about the topic of the article in which an infobox is contained. The content of infoboxes about the same topic in different Wikipedia editions varies in completeness, coverage and quality. This paper examines the hypothesis that by extracting infobox data from multiple Wikipedia editions and by fusing the extracted data among editions it should be possible to complement data from one edition with previously missing values from other editions and to increase the overall quality of the extracted dataset by choosing property values that are most likely correct in case of inconsistencies among editions. We will present a software framework for fusing RDF datasets based on different conflict resolution strategies. We will apply the framework to fuse infobox data that has been extracted from the English, German, Italian and French editions of Wikipedia and will discuss the accuracy of the conflict resolution strategies that were used in this experiment.
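Two conflict resolution strategies of the kind the framework supports can be sketched directly: a voting strategy that keeps the value asserted by the most editions, and a preference strategy that trusts editions in a fixed order. The strategy names, language order, and data values below are illustrative, not the paper's experimental setup:

```python
from collections import Counter

def fuse_vote(values_by_edition: dict):
    """Conflict resolution by voting: keep the most frequent value."""
    counts = Counter(values_by_edition.values())
    return counts.most_common(1)[0][0]

def fuse_prefer(values_by_edition: dict, preference=("en", "de", "fr", "it")):
    """Conflict resolution by edition preference: keep the value from
    the most trusted edition that provides one."""
    for lang in preference:
        if lang in values_by_edition:
            return values_by_edition[lang]
    return None

# One infobox property, extracted from four Wikipedia editions:
population = {"en": 3431700, "de": 3431700, "fr": 3400000, "it": 3431700}
fused = fuse_vote(population)
```

The same per-property fusion also fills gaps: an edition that lacks the property simply contributes no value, so another edition's value survives.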

01 Jan 2009
TL;DR: An overview about the DBpedia project is given and how application developers can make use of DBpedia knowledge within their applications are described.
Abstract: The DBpedia project has extracted a rich knowledge base from Wikipedia and serves this knowledge base as Linked Data on the Web. DBpedia’s knowledge base currently provides 274 million pieces of information about 2.6 million concepts. As DBpedia covers a wide range of domains and has a high degree of conceptual overlap with various openlicense datasets that are already available on the Web, an increasing number of data publishers has started to set data links from their data sources to DBpedia, making DBpedia one of the central interlinking hubs of the emerging Web of Data. This paper gives an overview about the DBpedia project and describes how application developers can make use of DBpedia knowledge within their applications.

Dataset
01 Jan 2009
TL;DR: Microformat, Microdata, and RDFa data are extracted from the 2009 Common Crawl web corpus; structured data is found within 147 million HTML pages out of the 2 billion pages contained in the crawl.
Abstract: Microformat, Microdata and RDFa data from the 2009 Common Crawl web corpus. We found structured data within 147 million HTML pages out of the 2 billion pages contained in the crawl (5%). These pages originate from 19 million different pay-level-domains. Altogether, the extracted data sets consist of 5 billion RDF quads.
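An RDF quad is a triple plus the URL of the page it was extracted from, commonly serialized as N-Quads. A minimal parsing sketch for the URI-only case (real N-Quads also allows literals and blank nodes, which this regex deliberately ignores; the example line is invented):

```python
import re

# Match four angle-bracketed URIs followed by the terminating dot.
QUAD = re.compile(r"<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.")

def parse_quad(line: str):
    """Return (subject, predicate, object, graph) or None."""
    m = QUAD.match(line.strip())
    return m.groups() if m else None

quad = parse_quad(
    "<http://ex.org/product/1> <http://ex.org/price> "
    "<http://ex.org/val/9> <http://ex.org/page.html> ."
)
```

The fourth component — the source page — is what lets consumers of the corpus trace every extracted statement back to the crawled HTML it came from.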

Book ChapterDOI
01 Jan 2009
TL;DR: Various architectures and approaches for generating RDF data from existing Web 2.0 data sources, for interlinking the extracted data, and for publishing the data on the Web are discussed using concrete examples.
Abstract: Semantic mashups are applications that use interlinked data from multiple Web data sources via standardized data formats and access mechanisms. The article gives an overview of the idea and motivation behind interlinking data. Various architectures and approaches for generating RDF data from existing Web 2.0 data sources, for interlinking the extracted data, and for publishing the data on the Web are discussed using concrete examples. Particular attention is paid to data sources that have emerged from social interactions. Finally, an overview of various semantic mashups freely accessible on the Web is given, and lightweight inference approaches are discussed by means of which the functionality of semantic mashups can be further improved.

Journal ArticleDOI
TL;DR: DBpedia Mobile is a location-aware Semantic Web client for the iPhone and other mobile devices that renders a map indicating nearby locations from the DBpedia data set.
Abstract: The Geospatial Semantic Web makes locations first-class citizens of the Web by representing them as original Web resources. This allows locations to be described in an open and distributed manner using the Resource Description Framework and provides for interlinking data about locations between data sources. In addition to using geo-coordinates to express geographical proximity, the Geospatial Semantic Web provides for relating locations as well as regions to each other using explicit semantic relationship types such as containment or shared borders. This article gives an overview of the Geospatial Semantic Web and describes DBpedia Mobile, a location-aware Semantic Web client that can be used on an iPhone and other mobile devices. Based on the current GPS position, DBpedia Mobile renders a map indicating nearby locations from the DBpedia data set. Starting from this map, the user can explore background information about his surroundings by navigating along data links into other data sources. DBpedia Mobile has been designed for the use case of a tourist exploring a city. Besides accessing Web data, DBpedia Mobile also enables users to publish their current location, pictures and reviews to the Semantic Web so that they can be used by other Semantic Web applications. Instead of simply being tagged with geographical coordinates, published content is interlinked with a nearby DBpedia resource and thus contributes to the overall richness of the Geospatial Semantic Web.
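The "nearby locations" step reduces to filtering geo-tagged resources by great-circle distance from the current GPS position. A self-contained sketch (the resource names and coordinates are illustrative, and DBpedia Mobile's actual filtering is server-side):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby(position, resources, radius_km=2.0):
    """Keep resources whose geo-coordinates lie within the radius."""
    lat, lon = position
    return [uri for uri, (rlat, rlon) in resources
            if haversine_km(lat, lon, rlat, rlon) <= radius_km]

resources = [
    ("dbpedia:Brandenburg_Gate", (52.5163, 13.3777)),
    ("dbpedia:Potsdam",          (52.4009, 13.0591)),
]
hits = nearby((52.5200, 13.4050), resources)
```

Each surviving resource URI can then be dereferenced to pull in the background information the user navigates through.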