Reynold Xin

Researcher at University of California, Berkeley

Publications: 37
Citations: 9069

Reynold Xin is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: SQL & Spark (mathematics). The author has an h-index of 21 and has co-authored 37 publications receiving 8011 citations. Previous affiliations of Reynold Xin include Yahoo! & Google.

Papers
Journal Article

MLlib: Machine Learning in Apache Spark

TL;DR: MLlib is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.
Proceedings Article

Spark SQL: Relational Data Processing in Spark

TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API, and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.
Proceedings Article

GraphX: Graph Processing in a Distributed Dataflow Framework

TL;DR: This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system, and demonstrates that GraphX achieves an order-of-magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
Proceedings Article

CrowdDB: Answering Queries with Crowdsourcing

TL;DR: This paper describes the design of CrowdDB, observes that the traditional closed-world assumption for query processing does not hold for human input, and outlines important avenues for future work in the development of crowdsourced query processing systems.