Reynold Xin

Researcher at University of California, Berkeley

Publications - 37
Citations - 9069

Reynold Xin is an academic researcher from the University of California, Berkeley. The author has contributed to research in topics: SQL & Spark (mathematics). The author has an h-index of 21 and has co-authored 37 publications receiving 8011 citations. Previous affiliations of Reynold Xin include Yahoo! & Google.

Papers
Proceedings Article

Spark and Scala (keynote)

Reynold Xin
TL;DR: This talk reviews the evolution of Spark over the last seven years and the experience of using Scala as the main programming language in a high-profile open-source project with a distributed team. It also outlines language features that the authors can't live without and features they wish were designed differently.

Go with the Flow: Graphs, Streaming and Relational Computations over Distributed Dataflow

Reynold Xin
TL;DR: This dissertation builds on Apache Spark, a distributed dataflow engine, and creates three related systems: Spark SQL, Structured Streaming, and GraphX. Together they demonstrate the feasibility and advantages of unifying disparate, specialized data systems on top of distributed dataflow systems.

The End of an Architectural Era for Analytical Databases

TL;DR: In this article, the authors propose a new generation of data warehouse systems that are modular, high performance, fault-tolerant, easy to provision, and designed to support both SQL query processing and machine learning applications.
Posted Content

The End of an Architectural Era for Analytical Databases

TL;DR: In this paper, the authors propose a new generation of data warehouse systems, which should be modular, high performance, fault-tolerant, easy to provision, and designed to support both SQL query processing and machine learning applications.

Improving Data Management Applications Using Microtask Platforms

TL;DR: Properly performing such data management tasks requires human input to provide information that is missing from the structured data that machines can read, to perform computationally difficult functions, and to match, rank, or aggregate results based on fuzzy criteria.