Open Access Proceedings Article
D-Hive: Data Bees Pollinating RDF, Text, and Time
TLDR
D-Hive is put forward: a system that facilitates analytics over RDF-style (SPO) triples augmented with text and (validity/transaction) time, and that can meet functionality and scalability requirements which current solutions cannot.
Abstract:
Although the problem of integrating IR and DB solutions is considered “old”, the growing importance of big data analytics, with its formidable demands for both enriched functionality and scalable performance, creates the need to revisit the problem itself and to see possible solutions from a new perspective. Our goal is to develop a system that will make large corpora aware of entities and relationships (ER), addressing the challenges of searching and analyzing ER patterns in web data and social media. We put forward D-Hive, a system facilitating analytics over RDF-style (SPO) triples augmented with text and (validity/transaction) time, capable of meeting functionality and scalability requirements that current solutions cannot. We consider various alternatives for the data modeling, storage, indexing, and query processing engines of D-Hive, paying attention to the challenges that must be met. These include i) scalable joint indexing of SPO-text-time tuples (quads, quints, octs, etc.), ii) efficient processing of complex queries that involve RDF star and path joins, filtering and grouping on text phrases, band joins over time, and more, and iii) optimizing the execution plans for such analytics.
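To make the tuple model and two of the operators named in the abstract more concrete, the following Python sketch represents an SPO triple augmented with supporting text and a validity interval, filters such tuples on a text phrase, and performs a band join over time. All names (`Fact`, `text_filter`, `band_join`) and the toy facts are hypothetical illustrations, not D-Hive's actual data model or engine. The band join shown is one simple variant, assumed here for illustration: it pairs tuples with equal subjects whose start times differ by at most `band`, using a sort followed by binary search.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical SPO triple augmented with supporting text and a validity interval."""
    s: str
    p: str
    o: str
    text: str   # supporting text, e.g. the sentence the fact was extracted from
    begin: int  # start of validity interval (year), inclusive
    end: int    # end of validity interval (year), inclusive


def text_filter(facts: Iterable[Fact], phrase: str) -> Iterator[Fact]:
    """Keep facts whose supporting text contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return (f for f in facts if needle in f.text.lower())


def band_join(left: List[Fact], right: List[Fact], band: int) -> List[Tuple[Fact, Fact]]:
    """Pair facts with the same subject whose start times differ by at most `band`.

    The right input is sorted by start time once; for each left fact, binary
    search locates the window [begin - band, begin + band] in the sorted input.
    """
    right_sorted = sorted(right, key=lambda f: f.begin)
    begins = [f.begin for f in right_sorted]
    result = []
    for lf in left:
        lo = bisect_left(begins, lf.begin - band)
        hi = bisect_right(begins, lf.begin + band)
        result.extend((lf, rf) for rf in right_sorted[lo:hi] if rf.s == lf.s)
    return result


# Toy facts for illustration only.
facts = [
    Fact("Obama", "holdsOffice", "US_President", "Obama was sworn in as president", 2009, 2017),
    Fact("Obama", "wonAward", "Nobel_Peace_Prize", "Obama received the Nobel Peace Prize", 2009, 2009),
    Fact("Merkel", "holdsOffice", "Chancellor", "Merkel became chancellor", 2005, 2021),
]

offices = [f for f in facts if f.p == "holdsOffice"]
awards = list(text_filter(facts, "prize"))
pairs = band_join(offices, awards, band=1)
for lf, rf in pairs:
    print(lf.s, lf.o, rf.o)  # Obama US_President Nobel_Peace_Prize
```

Sorting the right input per query keeps the sketch short; a system at the scale the abstract targets would more plausibly read such time windows from an index over the time attribute, which is part of what challenge i) is about.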
Citations
Proceedings Article
Wearable queries: adapting common retrieval needs to data and users
Barbara Catania, Giovanna Guerrini, Alberto Belussi, Federica Mandreoli, Riccardo Martoglia, Wilma Penzo
TL;DR: By interpreting a request in a novel way through a Wearable Query (WQ), i.e., a query that captures the specificities of the user and the request, this paper envisions an approach to address repeated information needs in distributed, heterogeneous, dynamic environments, with emphasis on the geo-spatial dimension and on data quality.
References
Book
Linked Data: Evolving the Web into a Global Data Space
TL;DR: This Synthesis lecture provides readers with a detailed technical introduction to Linked Data, including coverage of relevant aspects of Web architecture, as the basis for application development, research or further study.
Proceedings Article
Hive - a petabyte scale data warehouse using Hadoop
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu, Raghotham Murthy
TL;DR: Hive is presented, an open-source data warehousing solution built on top of Hadoop that supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs executed using Hadoop.
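As a rough illustration of what "compiled into map-reduce jobs" means, the sketch below decomposes a simple aggregation such as `SELECT p, COUNT(*) FROM triples GROUP BY p` into map, shuffle, and reduce phases, simulated in a single Python process. The table, the query, and the function names are hypothetical; this is not Hive's compiler or Hadoop's API, only the shape of the plan such a query would plausibly map to.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str]  # (subject, predicate, object)


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    # Map side of GROUP BY p / COUNT(*): emit (group key, 1) for every row.
    for _s, p, _o in rows:
        yield p, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework's shuffle step would.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce side: sum the partial counts for each key.
    return {key: sum(values) for key, values in groups.items()}


triples = [
    ("Obama", "holdsOffice", "US_President"),
    ("Merkel", "holdsOffice", "Chancellor"),
    ("Obama", "wonAward", "Nobel_Peace_Prize"),
]
counts = reduce_phase(shuffle(map_phase(triples)))
print(counts)  # {'holdsOffice': 2, 'wonAward': 1}
```

In a real cluster the map and reduce functions run as many parallel tasks and the shuffle moves data across the network; the single-process version only shows how the declarative query splits into those phases.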
Journal Article
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
TL;DR: This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Journal Article
The RDF-3X engine for scalable management of RDF data
Thomas Neumann, Gerhard Weikum
TL;DR: The RDF-3X engine is presented, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing, and can outperform the previously best alternatives by one or two orders of magnitude.
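The "streamlined indexing" in RDF-3X rests on dictionary-encoding all terms to integer ids and storing every triple under all six orderings of subject, predicate, and object, so that any triple pattern becomes a single range scan. The Python sketch below imitates that idea with sorted in-memory lists and binary search; the class and method names are hypothetical, and the sketch leaves out the compressed clustered B+-trees and the query processing that the actual engine provides.

```python
from bisect import bisect_left, bisect_right
from itertools import permutations
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


class PermutationIndex:
    """Toy in-memory analogue of exhaustive SPO permutation indexing.

    Terms are dictionary-encoded to integer ids, and all six orderings of
    (s, p, o) are kept as sorted lists of id tuples, so a triple pattern is
    answered by one range scan over the ordering whose prefix holds the
    pattern's bound positions.
    """

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.ids: Dict[str, int] = {}
        self.names: List[str] = []
        encoded = [tuple(self._encode(term) for term in t) for t in triples]
        # Keys are position orders, e.g. (0, 1, 2) = SPO and (1, 2, 0) = POS.
        self.orders: Dict[Tuple[int, ...], List[Tuple[int, ...]]] = {
            perm: sorted(tuple(t[i] for i in perm) for t in encoded)
            for perm in permutations(range(3))
        }

    def _encode(self, term: str) -> int:
        if term not in self.ids:
            self.ids[term] = len(self.names)
            self.names.append(term)
        return self.ids[term]

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        pattern = (s, p, o)
        bound = [i for i in range(3) if pattern[i] is not None]
        free = [i for i in range(3) if pattern[i] is None]
        if any(pattern[i] not in self.ids for i in bound):
            return []  # a constant never seen in the data cannot match
        perm = tuple(bound + free)  # ordering with all bound positions first
        prefix = tuple(self.ids[pattern[i]] for i in bound)
        rows = self.orders[perm]
        lo = bisect_left(rows, prefix)
        hi = bisect_right(rows, prefix + (float("inf"),) * len(free))
        result = []
        for row in rows[lo:hi]:
            spo = [""] * 3
            for k, pos in enumerate(perm):  # undo the permutation
                spo[pos] = self.names[row[k]]
            result.append((spo[0], spo[1], spo[2]))
        return result


triples = [
    ("Obama", "holdsOffice", "US_President"),
    ("Merkel", "holdsOffice", "Chancellor"),
    ("Obama", "wonAward", "Nobel_Peace_Prize"),
]
idx = PermutationIndex(triples)
print(idx.match(p="holdsOffice"))
# [('Obama', 'holdsOffice', 'US_President'), ('Merkel', 'holdsOffice', 'Chancellor')]
```

Storing six copies trades space for the guarantee that no pattern ever needs a full scan; the engine itself relies on compression to keep that redundancy affordable.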
Journal Article
Scalable SPARQL querying of large RDF graphs
TL;DR: This paper introduces a scalable RDF data management system that is up to three orders of magnitude more efficient than popular multi-node RDF data management systems.