Matthias J. Sax

Journal Article

The Stratosphere platform for big data analytics

TL;DR: The paper presents the overall system architecture and its design decisions, introduces Stratosphere through example queries, and dives into the internal workings of the system's components that relate to extensibility, programming model, optimization, and query execution.

Posted Content

Opening the Black Boxes in Data Flow Optimization

Fabian Hueske and 6 co-authors

01 Aug 2012

arXiv: Databases

TL;DR: This paper addresses data flow optimization at a level of abstraction where the semantics of operators are not known, by statically analyzing the general-purpose code of their user-defined functions.

Journal Article

Opening the black boxes in data flow optimization

Fabian Hueske and 6 co-authors

TL;DR: This work designs and implements an optimizer for parallel data flows that does not assume knowledge of the semantics or algebraic properties of operators and can optimize the operator order of nonrelational data flows, a unique feature among today's systems.

Proceedings Article

Streams and Tables: Two Sides of the Same Coin

Matthias J. Sax and 3 co-authors

TL;DR: This model presents the result of an operator as a stream of successive updates, which induces a duality of results and streams. The duality provides a natural way to cope with inconsistencies between the physical and logical order of streaming data in a continuous manner, without explicit buffering and reordering.

Proceedings Article

Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka

Guozhang Wang and 11 co-authors

TL;DR: Kafka Streams is a scalable stream processing client library in Apache Kafka that defines processing logic as read-process-write cycles, in which all processing state updates and result outputs are captured as log appends.