Reynold Xin
Researcher at University of California, Berkeley
Publications - 37
Citations - 9069
Reynold Xin is an academic researcher from the University of California, Berkeley. The author has contributed to research on the topics SQL and Spark (mathematics), has an h-index of 21, and has co-authored 37 publications receiving 8011 citations. Previous affiliations of Reynold Xin include Yahoo! and Google.
Papers
Journal Article
Apache Spark: a unified engine for big data processing
Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph E. Gonzalez, Scott Shenker, Ion Stoica
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Journal Article
MLlib: machine learning in Apache Spark
Xiangrui Meng, Joseph K. Bradley, Burak Yavuz, Evan R. Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Bosagh Zadeh, Matei Zaharia, Ameet Talwalkar
TL;DR: MLlib is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.
Proceedings Article
Spark SQL: Relational Data Processing in Spark
Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia
TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API, and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.
Proceedings Article
GraphX: graph processing in a distributed dataflow framework
TL;DR: This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. It demonstrates that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
Proceedings Article
CrowdDB: answering queries with crowdsourcing
TL;DR: This paper describes the design of CrowdDB, notes that a major change is that the traditional closed-world assumption for query processing does not hold for human input, and outlines important avenues for future work in the development of crowdsourced query processing systems.