Reynold Xin
Researcher at University of California, Berkeley
Publications - 37
Citations - 9069
Reynold Xin is an academic researcher at the University of California, Berkeley. The author has contributed to research in topics including SQL and Apache Spark. The author has an h-index of 21 and has co-authored 37 publications receiving 8011 citations. Previous affiliations of Reynold Xin include Yahoo! and Google.
Papers
Proceedings Article
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark
Michael Armbrust, Tathagata Das, Joseph Torres, Burak Yavuz, Shixiong Zhu, Reynold Xin, Ali Ghodsi, Ion Stoica, Matei Zaharia
TL;DR: Structured Streaming is a new high-level streaming API in Apache Spark, built on experience with Spark Streaming, that achieves high performance via Spark SQL's code generation engine and can outperform Apache Flink by up to 2x and Apache Kafka Streams by 90x.
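The declarative model behind Structured Streaming lets users write a query as if it ran over a static table, while Spark runs it incrementally; the Spark documentation describes the stream as an unbounded input table that keeps being appended to. The plain-Python sketch below illustrates only that idea and does not use Spark. It assumes a word-count query and hand-made micro-batches, and `batch_word_count` and `IncrementalWordCount` are hypothetical names, not Spark APIs. After each batch it checks that the incremental answer equals rerunning the batch query over all input seen so far.

```python
from collections import Counter
from typing import Iterable, List


def batch_word_count(lines: Iterable[str]) -> Counter:
    """The 'static' query: count words over a complete table of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Hypothetical incremental runner: keeps running counts as state."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def process_batch(self, new_lines: List[str]) -> Counter:
        # Only newly arrived rows are scanned; earlier input lives in the state.
        self.state.update(batch_word_count(new_lines))
        return Counter(self.state)  # snapshot of the full result table


micro_batches = [["a b", "b c"], ["c c"], ["a"]]  # assumed toy input
query = IncrementalWordCount()
seen: List[str] = []
for batch in micro_batches:
    result = query.process_batch(batch)
    seen.extend(batch)
    # Incremental result must match the batch query over the input prefix.
    assert result == batch_word_count(seen)

print(dict(result))
```

The check inside the loop is the point of the sketch: the user only ever writes `batch_word_count`, and the incremental execution is an engine concern that must not change the answer.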
Proceedings Article
The case for tiny tasks in compute clusters
Kay Ousterhout, Aurojit Panda, Joshua Rosen, Shivaram Venkataraman, Reynold Xin, Sylvia Ratnasamy, Scott Shenker, Ion Stoica
TL;DR: The paper argues for breaking data-parallel jobs in compute clusters into tiny tasks that each complete in hundreds of milliseconds, and demonstrates a 5.2× improvement in response times from using smaller tasks.
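A toy simulation can illustrate one intuition behind this argument: when a job's work is skewed across a few large tasks, the job waits on the largest one, while many tiny tasks can be spread evenly across the available slots. This is a sketch, not the paper's scheduler or evaluation. It assumes greedy scheduling (each task goes to the slot that frees up first), a fixed number of slots, and zero per-task overhead; `makespan` and `split_into_tiny` are hypothetical helpers.

```python
import heapq
from typing import List, Sequence


def makespan(task_durations: Sequence[float], slots: int) -> float:
    """Greedy list scheduling: each task runs on whichever slot frees up first."""
    free_at = [0.0] * slots  # time at which each slot next becomes free
    heapq.heapify(free_at)
    for duration in task_durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)  # the job finishes when its last task finishes


def split_into_tiny(tasks: List[int], piece: int) -> List[int]:
    """Hypothetical helper: cut each task's work into pieces of at most `piece` units."""
    tiny: List[int] = []
    for work in tasks:
        full, rest = divmod(work, piece)
        tiny.extend([piece] * full)
        if rest:
            tiny.append(rest)  # leftover work becomes one smaller task
    return tiny


coarse = [6, 2, 2, 2]  # assumed skew: one partition holds half of the work
print("coarse tasks:", makespan(coarse, slots=4))
print("tiny tasks:", makespan(split_into_tiny(coarse, 1), slots=4))
```

With four slots, the coarse job finishes at time 6 because one slot runs the six-unit task alone, while the same work cut into unit-size tasks finishes at time 3. Real systems pay a scheduling and launch cost per task, so tasks this small only pay off when that cost is kept low.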
Journal Article
Scaling Spark in the real world: performance and usability
Michael Armbrust, Tathagata Das, Aaron Davidson, Ali Ghodsi, Andrew Or, Josh Rosen, Ion Stoica, Patrick Wendell, Reynold Xin, Matei Zaharia
TL;DR: This paper describes the main challenges and requirements that arose in taking Spark to a wide set of users, and the usability and performance improvements made to the engine in response.
Proceedings Article
Fine-grained partitioning for aggressive data skipping
TL;DR: This paper proposes a fine-grained blocking technique that reorganizes data tuples into blocks with the goal of enabling queries to skip blocks aggressively, and shows that this technique yields a 2-5x improvement in query response time over traditional range-based blocking techniques.
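Block skipping generally relies on small per-block metadata, such as the minimum and maximum value of a column, that lets a query rule out blocks without reading them; how tuples are assigned to blocks decides how much can be skipped. The sketch below shows only that mechanism under assumed toy data, comparing an unsorted layout with a sorted, range-based one. It does not reproduce the paper's fine-grained blocking algorithm, and `Block`, `make_blocks` and `range_query` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    rows: List[int]
    # Per-block metadata stored alongside the data (min/max of the column).
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self) -> None:
        self.lo = min(self.rows)
        self.hi = max(self.rows)


def make_blocks(values: List[int], block_size: int) -> List[Block]:
    """Cut the column into consecutive fixed-size blocks."""
    return [Block(values[i:i + block_size]) for i in range(0, len(values), block_size)]


def range_query(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return rows with low <= v <= high and the number of blocks actually scanned."""
    matches: List[int] = []
    scanned = 0
    for block in blocks:
        if block.hi < low or block.lo > high:
            continue  # metadata proves no row can match: skip the block
        scanned += 1
        matches.extend(v for v in block.rows if low <= v <= high)
    return matches, scanned


unsorted = [5, 91, 13, 77, 42, 8, 66, 30]  # assumed toy column
print(range_query(make_blocks(unsorted, 2), 40, 50))
print(range_query(make_blocks(sorted(unsorted), 2), 40, 50))
```

Both layouts return the same rows, but the sorted layout lets the query scan one block instead of four. The paper's contribution lies in choosing which tuples share a block more carefully than range-based blocking, so that more queries can skip more blocks.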
Proceedings Article
GraphFrames: an integrated API for mixing graph and relational queries
TL;DR: This paper presents GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them, including optimizations across workflow steps that cannot occur in current systems.
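Graph patterns and relational operators can be mixed in one system because a pattern over edges is ultimately a join over the edge table. The plain-Python sketch below makes that concrete with assumed toy data: the two-edge cycle pattern, which GraphFrames writes as `(a)-[e]->(b); (b)-[e2]->(a)`, is evaluated as a hash self-join on the edge list, and an ordinary attribute filter is then applied to the matches. It does not use GraphFrames and does not implement the paper's cross-step optimizations; `two_cycles` and `older_mutual_pairs` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Person(NamedTuple):
    name: str
    age: int


# Toy vertex and edge "tables" (assumed data).
vertices: Dict[str, Person] = {
    "a": Person("Alice", 34),
    "b": Person("Bob", 36),
    "c": Person("Charlie", 30),
    "d": Person("David", 29),
}
edges: List[Tuple[str, str]] = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")]


def two_cycles(edge_table: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pattern (x)-[e1]->(y); (y)-[e2]->(x) as a self-join of the edge table:
    e1.dst == e2.src AND e1.src == e2.dst."""
    index: Set[Tuple[str, str]] = set(edge_table)  # hash index for the join probe
    return [(x, y) for (x, y) in edge_table if (y, x) in index]


def older_mutual_pairs(min_age: int) -> List[Tuple[str, str]]:
    matches = two_cycles(edges)
    # Relational step: filter on vertex attributes, keeping one copy of each pair.
    return [
        (vertices[x].name, vertices[y].name)
        for (x, y) in matches
        if x < y and vertices[x].age > min_age and vertices[y].age > min_age
    ]


cycles = two_cycles(edges)
result = older_mutual_pairs(30)
print(result)
```

Here the filter runs after the join; because both steps are expressed relationally, an optimizer is free to reorder them, for example by filtering vertices before joining, which is the kind of cross-step planning the summary refers to.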