Book Chapter

Processing RDF Using Hadoop

TLDR
This work proposes a framework which is constructed using Hadoop to store and retrieve massive numbers of RDF triples by taking advantage of the cloud computing paradigm, and results confirm that the proposed framework offers multi-fold efficiencies and benefits which include on-demand processing, operational scalability, competence, cost efficiency and local access to enormous data.
Abstract
The basic inspiration of the Semantic Web is to extend the existing human-readable web by encoding some of the semantics of resources in a machine-understandable form. Various formats and technologies make this possible. These include the Resource Description Framework (RDF), an assortment of data interchange formats such as RDF/XML, N3, and N-Triples, and representations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which help to provide a proper description of the concepts, terms, and associations in a particular knowledge domain. Some frameworks for Semantic Web technologies already exist, but they have limitations for large RDF graphs; storing and efficiently querying a large number of RDF triples is therefore a challenging and important problem. We propose a framework built on Hadoop to store and retrieve massive numbers of RDF triples by taking advantage of the cloud computing paradigm. Hadoop permits the development of reliable, scalable, proficient, cost-effective, distributed computing using very simple Java interfaces. Hadoop includes a distributed file system, HDFS, in which we store the RDF data, and the Hadoop MapReduce framework is used to answer queries. A MapReduce job divides the input data set into independent chunks which are processed in parallel by the map tasks, whose outputs then serve as inputs to the reduce tasks. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks. The uniqueness of our approach is its efficient, automatic distribution of data and work across machines, which in turn exploits the underlying parallelism of the CPU cores. Results confirm that our proposed framework offers manifold efficiencies and benefits over traditional approaches, including on-demand processing, operational scalability, competence, cost efficiency, and local access to enormous data.
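To make the map/reduce structure concrete, below is a minimal illustrative Hadoop job in Java (the interface language the abstract names) that counts how often each predicate occurs in an N-Triples file stored on HDFS. This is a sketch under our own assumptions, not the authors' implementation: the class name, parsing logic, and paths are hypothetical, but the division into parallel map tasks whose outputs feed aggregating reduce tasks mirrors the workflow described above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch: counts predicate occurrences in N-Triples data on HDFS.
// The paper's framework answers richer queries, but the MapReduce skeleton
// (parallel map tasks feeding reduce tasks) is the same.
public class PredicateCount {

    public static class TripleMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text predicate = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // An N-Triples line has the form: <subject> <predicate> <object> .
            String[] parts = value.toString().trim().split("\\s+", 3);
            if (parts.length == 3) {
                predicate.set(parts[1]);      // emit (predicate, 1)
                context.write(predicate, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();               // aggregate partial counts
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "predicate count");
        job.setJarByClass(PredicateCount.class);
        job.setMapperClass(TripleMapper.class);
        job.setCombinerClass(SumReducer.class);   // local pre-aggregation
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Submitted with hadoop jar, such a job leaves the scheduling, monitoring, and re-execution of failed tasks to the Hadoop runtime, exactly the division of labour the abstract attributes to the framework.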


Citations
Dissertation

Towards a semantic web of things for smart cities

Shaun Howell
TL;DR: The work demonstrates that a Semantic Web of Things approach does improve application-layer interoperability, and shows that integrating this with rich domain context is beneficial in promoting interoperability and discoverability.
Proceedings Article

JOTR: Join-Optimistic Triple Reordering Approach for SPARQL Query Optimization on Big RDF Data

TL;DR: JOTR is presented: a SPARQL query optimization technique for Big RDF data that reorders triple patterns on a distributed Hadoop-based RDF system; it gives notable performance on distributed RDF systems and is applicable to centralized systems as well.
Proceedings Article

HyPSo: Hybrid Partitioning for Big RDF Storage and Query Processing

TL;DR: A hybrid RDF partitioning scheme is proposed to speed up SPARQL query processing for Big RDF data; compared with two existing distributed RDF frameworks in terms of storage space and query execution time, HyPSo demonstrates a significant improvement in performance.

Data Transfers in Hadoop: A Comparative Study

TL;DR: A state-of-the-art comparative study of the various tools for importing and exporting data in Hadoop is presented, offering guidance on when to prefer one tool over another for transferring data to and from the Hadoop system.
Journal Article

Scalable visualization for DBpedia ontology analysis using Hadoop

TL;DR: This work proposes a Hadoop-based system for ontological analysis that scales to large ontologies, and evaluates the method's performance by measuring execution times and analyzing experimental results obtained in the visualization process.
References
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Journal Article

LUBM: A benchmark for OWL knowledge base systems

TL;DR: This work describes a method for benchmarking Semantic Web knowledge base systems with respect to their use in large OWL applications, and presents the Lehigh University Benchmark (LUBM) as an example of how to design such benchmarks.