Reynold Xin

Journal Article

Apache Spark: a unified engine for big data processing

28 Oct 2016

TL;DR: This open-source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.


Journal Article

MLlib: machine learning in Apache Spark

Xiangrui Meng, +15 more

01 Jan 2016

Journal of Machine Learning Research

TL;DR: MLlib is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.


Proceedings Article

Spark SQL: Relational Data Processing in Spark

Michael Armbrust, +10 more

TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.


Proceedings Article

GraphX: graph processing in a distributed dataflow framework

Joseph E. Gonzalez, +5 more

TL;DR: This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. It demonstrates that GraphX achieves an order-of-magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.


Proceedings Article

CrowdDB: answering queries with crowdsourcing

Michael J. Franklin, +4 more

TL;DR: The paper describes the design of CrowdDB, noting a major change: the traditional closed-world assumption for query processing does not hold for human input. It also outlines important avenues for future work in developing crowdsourced query processing systems.
