D-Hive: Data Bees Pollinating RDF, Text, and Time

Proceedings Article (Open Access)

TLDR

D-Hive is a system that facilitates analytics over RDF-style (SPO) triples augmented with text and (validity/transaction) time, and that can meet functionality and scalability requirements which current solutions cannot.

Abstract:

Although the problem of integrating IR and DB solutions is considered “old”, the growing importance of big data analytics, with its formidable demands for both richer functionality and scalable performance, makes it necessary to revisit the problem and to look at possible solutions from a new perspective. Our goal is to develop a system that makes large corpora aware of entities and relationships (ER), addressing the challenges of searching and analyzing ER patterns in web data and social media. We put forward D-Hive, a system that facilitates analytics over RDF-style (SPO) triples augmented with text and (validity/transaction) time, and that can meet functionality and scalability requirements which current solutions cannot. We consider several alternatives for the data modeling, storage, indexing, and query processing engines of D-Hive, paying attention to the challenges that must be met. These include i) scalable joint indexing of SPO-text-time tuples (quads, quints, octs, etc.), ii) efficient processing of complex queries that involve RDF star and path joins, filtering and grouping on text phrases, band joins over time, and more, and iii) optimizing the execution plans for such analytics.
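To make the query classes named in the abstract more concrete, here is a minimal in-memory sketch. It assumes a hypothetical "quint" layout (subject, predicate, object, text snippet, validity interval) and shows a star join, a path join, a phrase filter backed by an inverted index, a band join over start times, and a validity time-slice. The names `Quint`, `QuintStore`, its methods, and the `DATA` records are illustrative assumptions, not D-Hive's actual data model, storage layer, or API; the paper targets scalable joint indexes, which plain Python dictionaries do not attempt to model.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Quint:
    """Hypothetical SPO triple plus a text snippet and a validity interval.

    Illustrative layout only; not D-Hive's actual data model.
    """
    s: str
    p: str
    o: str
    text: str
    valid_from: int  # assumed integer time unit (e.g. years), inclusive
    valid_to: int    # inclusive end of the validity interval


class QuintStore:
    """Toy in-memory store: a predicate index plus an inverted index on text tokens."""

    def __init__(self, quints):
        self.quints = list(quints)
        self.by_pred = defaultdict(list)  # predicate -> quints with that predicate
        self.by_token = defaultdict(set)  # lower-cased token -> positions in self.quints
        for i, q in enumerate(self.quints):
            self.by_pred[q.p].append(q)
            for tok in q.text.lower().split():
                self.by_token[tok].add(i)

    def star_join(self, p1, p2):
        """Subjects having both predicates p1 and p2 (subject-subject join)."""
        left = {q.s for q in self.by_pred[p1]}
        right = {q.s for q in self.by_pred[p2]}
        return sorted(left & right)

    def path_join(self, p1, p2):
        """Pairs (x, z) such that x -p1-> y and y -p2-> z (object-subject join)."""
        successors = defaultdict(list)
        for q in self.by_pred[p2]:
            successors[q.s].append(q.o)
        return sorted((q.s, z) for q in self.by_pred[p1] for z in successors[q.o])

    def phrase_filter(self, phrase):
        """Quints whose text contains the phrase: intersect token postings, then verify."""
        needle = phrase.lower()
        tokens = needle.split()
        if not tokens:
            return []
        candidates = set.intersection(*(self.by_token[t] for t in tokens))
        return [self.quints[i] for i in sorted(candidates)
                if needle in self.quints[i].text.lower()]

    def band_join(self, p1, p2, delta):
        """Pairs (a, b) with a.p == p1, b.p == p2 and |a.valid_from - b.valid_from| <= delta.

        Sort-based band join: sort the right side by start time once, then use
        binary search to find the matching window for each left-side quint.
        """
        right = sorted(self.by_pred[p2], key=lambda q: q.valid_from)
        starts = [q.valid_from for q in right]
        pairs = []
        for a in self.by_pred[p1]:
            lo = bisect_left(starts, a.valid_from - delta)
            hi = bisect_right(starts, a.valid_from + delta)
            pairs.extend((a, b) for b in right[lo:hi])
        return pairs

    def valid_at(self, t):
        """Time-slice: quints whose validity interval contains t."""
        return [q for q in self.quints if q.valid_from <= t <= q.valid_to]


# Hypothetical example data (made-up entities).
DATA = [
    Quint("alice", "worksFor", "acme", "joined Acme as data engineer", 2015, 2019),
    Quint("alice", "livesIn", "berlin", "moved to Berlin", 2016, 2020),
    Quint("bob", "worksFor", "acme", "hired at Acme as data engineer", 2017, 2021),
    Quint("acme", "locatedIn", "berlin", "Acme opened a Berlin office", 2010, 2024),
]

store = QuintStore(DATA)

if __name__ == "__main__":
    print(store.star_join("worksFor", "livesIn"))    # ['alice']
    print(store.path_join("worksFor", "locatedIn"))  # [('alice', 'berlin'), ('bob', 'berlin')]
    print([q.s for q in store.phrase_filter("data engineer")])
    print([(a.s, b.s) for a, b in store.band_join("worksFor", "livesIn", 1)])
```

Sorting one side and binary-searching for each probe is a common way to evaluate a band predicate without comparing every pair, and the phrase filter first narrows candidates through token postings before verifying the exact phrase. At the scale the abstract targets, these operators would need distributed, jointly built SPO-text-time indexes and a cost-based plan choice, which this sketch deliberately leaves out.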
