Open Access Proceedings Article

D-Hive: Data Bees Pollinating RDF, Text, and Time

Abstract
Although the problem of integrating IR and DB solutions is considered “old”, the increasing importance of big data analytics and its formidable demands for both enriched functionality and scalable performance create the need to revisit the problem and to view possible solutions from a new perspective. Our goal is to develop a system that will make large corpora aware of entities and relationships (ER), addressing the challenges in searching and analyzing ER patterns in web data and social media. We put forward D-Hive, a system that facilitates analytics over RDF-style (SPO) triples augmented with text and (validity / transaction) time, and that is capable of addressing the functionality and scalability requirements which current solutions cannot meet. We consider various alternatives for the data modeling, storage, indexing, and query processing engines of D-Hive, paying attention to the challenges that must be met, which include i) scalable joint indexing of SPO-text-time tuples (quads, quints, octs, etc.), ii) efficient processing of complex queries that involve RDF star and path joins, filtering and grouping on text phrases, band joins over time, and more, and iii) optimizing the execution plans for such analytics.

