Showing papers by "Johann-Christoph Freytag" published in 2014


Journal Article
01 Dec 2014
TL;DR: The paper presents the overall system architecture design decisions, introduces Stratosphere through example queries, and examines the internal workings of the system's components that relate to extensibility, programming model, optimization, and query execution.
Abstract: We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere's features include "in situ" data processing, a declarative query language, treatment of user-defined functions as first-class citizens, automatic program parallelization and optimization, support for iterative programs, and a scalable and efficient execution engine. Stratosphere covers a variety of "Big Data" use cases, such as data warehousing, information extraction and integration, data cleansing, graph analysis, and statistical analysis applications. In this paper, we present the overall system architecture design decisions, introduce Stratosphere through example queries, and then dive into the internal workings of the system's components that relate to extensibility, programming model, optimization, and query execution. We experimentally compare Stratosphere against popular open-source alternatives, and we conclude with a research outlook for the next years.

491 citations


Proceedings Article
01 Jan 2014
TL;DR: This paper adapts the B+-Tree and prefix B-Tree by changing the search algorithm on inner nodes from binary search to k-ary search, and introduces two tree adaptations that satisfy the specific constraints of SIMD instructions.
Abstract: In this paper, we accelerate the processing of tree-based index structures by using SIMD instructions. We adapt the B+-Tree and prefix B-Tree (trie) by changing the search algorithm on inner nodes from binary search to k-ary search. The k-ary search enables the use of SIMD instructions, which are commonly available on most modern processors today. The main challenge for using SIMD instructions on CPUs is their inherent requirement for consecutive memory loads. The data for one SIMD load instruction must be located in consecutive memory locations and cannot be scattered over the entire memory. The original layout of tree-based index structures does not satisfy this constraint and must be adapted to enable SIMD usage. Thus, we introduce two tree adaptations that satisfy the specific constraints of SIMD instructions. We present two different algorithms for transforming the original tree layout into a SIMD-friendly layout. Additionally, we introduce two SIMD-friendly search algorithms designed for the new layout. Our adapted B+-Tree speeds up search processes by a factor of up to eight for small data types compared to the original B+-Tree using binary search. Furthermore, our adapted prefix B-Tree enables a high search performance even for larger data types. We report a constant 14-fold speedup and an 8-fold reduction in memory consumption compared to the original B+-Tree.
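Illustration: the k-ary search that the abstract describes can be sketched in scalar Python as follows. This is a minimal sketch, not the authors' implementation. The function name `kary_search` and its parameters are hypothetical, and the separator comparisons that a SIMD instruction would perform at once are emulated one by one. It also does not reproduce the SIMD-friendly memory layout from the paper; it only shows how each step narrows the range to one of up to k partitions.

```python
def kary_search(keys, key, k=4):
    """Return how many entries of the sorted list `keys` are <= `key`.

    For a B+-Tree inner node, this count is the index of the child to follow.
    `kary_search` is a hypothetical name used only for this illustration.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = 0, len(keys)  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        n = hi - lo
        # Up to k-1 distinct separator positions splitting [lo, hi).
        seps = sorted({lo + (n * i) // k for i in range(1, k)})
        # A SIMD version would compare all separators in one instruction
        # and count the set bits of the result mask; emulated here.
        matches = sum(1 for s in seps if keys[s] <= key)
        if matches > 0:
            # Everything up to the last matching separator is <= key.
            lo = seps[matches - 1] + 1
        if matches < len(seps):
            # The first non-matching separator and all later keys are > key.
            hi = seps[matches]
    return lo
```

With k = 2 this degenerates to binary search. For example, `kary_search([2, 4, 6, 8, 10, 12, 14, 16, 18], 7, k=3)` returns 3, the same result as `bisect.bisect_right` on that list.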

28 citations


Proceedings Article
01 Jan 2014
TL;DR: This paper classifies common DBMS by their scheduling strategies and chunk sizes, and proposes a task model called Query Task Model (QTM) that opens a design space for database schedules and generalizes the modeling of parallel query execution such that different approaches become comparable.
Abstract: Over the last decade, several approaches for parallel query execution have emerged. The performance of these approaches is mainly affected by the non-manageable cache hierarchy. However, each approach exploits the capabilities of modern processors differently. Furthermore, the comparison is difficult due to different operator-to-resource assignments during run-time (scheduling strategy) and the number of tuples each operator processes (chunk size). In this paper, we first classify common DBMS by their scheduling strategies and chunk sizes. Then, we propose a task model called Query Task Model (QTM) that opens a design space for database schedules. With QTM, we generalize the modeling of parallel query execution such that different approaches become comparable. Using QTM, we

12 citations


Journal Article
TL;DR: Introduction: In an age of comprehensive digitalization that reaches every area of daily life, more and more data is systematically collected, stored, analyzed, and made accessible to users, both centrally and decentrally.
Abstract: Introduction: In an age of comprehensive digitalization that reaches every area of daily life, more and more data is systematically collected, stored, analyzed, and made accessible to users, both centrally and decentrally. These rapidly growing data volumes are generated on mobile phones, cameras, and sensors, through interactions between ubiquitous, location- and time-aware systems and service providers (servers), and by sensors that measure location- and time-dependent quantities. This development in data generation has triggered a wave of new technological developments that are currently grouped under the term Big Data. However, current approaches in the Big Data field appear to be strongly technology-centered rather than user-oriented. As a result, the potential of Big Data technology can be exploited only insufficiently. For this reason, this article argues that a user-oriented further development of Big Data technologies is necessary. Drawing on previous developments in the Big Data field and on requirements from various application domains, it outlines a vision for a Big Data platform guided by two basic principles from the database field: scalability and declarative specification. As the reactions to the NSA affair in 2013 show, societal acceptance of this technology will require that the protection of privacy be ensured not only through legislation but also, as far as possible, technologically. For this reason, the article concludes with an overview of technical developments for protecting privacy.

5 citations