Reynold Xin

Researcher at University of California, Berkeley

Publications: 37
Citations: 9069

Reynold Xin is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: SQL & Spark (mathematics). The author has an h-index of 21 and has co-authored 37 publications receiving 8011 citations. Previous affiliations of Reynold Xin include Yahoo! & Google.

Papers
Proceedings Article

Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark

TL;DR: Structured Streaming is a new high-level streaming API in Apache Spark, based on experience with Spark Streaming, that achieves high performance via Spark SQL's code generation engine and can outperform Apache Flink by up to 2x and Apache Kafka Streams by 90x.
Proceedings Article

The case for tiny tasks in compute clusters

TL;DR: This paper argues for breaking data-parallel jobs in compute clusters into tiny tasks that each complete in hundreds of milliseconds, and demonstrates a 5.2× improvement in response times from the use of smaller tasks.
Journal Article

Scaling spark in the real world: performance and usability

TL;DR: This paper describes the main challenges and requirements that arose in taking Spark to a wide set of users, and the usability and performance improvements made to the engine in response.
Proceedings Article

Fine-grained partitioning for aggressive data skipping

TL;DR: This paper proposes a fine-grained blocking technique that reorganizes data tuples into blocks so that queries can skip blocks aggressively, and shows that this technique leads to a 2-5x improvement in query response time over traditional range-based blocking techniques.
Proceedings Article

GraphFrames: an integrated API for mixing graph and relational queries

TL;DR: This paper presents GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching, and relational queries, and that optimizes work across them, enabling optimizations across workflow steps that cannot occur in current systems.