Evan R. Sparks

Researcher at University of California, Berkeley

Publications -  27
Citations -  2896

Evan R. Sparks is an academic researcher from the University of California, Berkeley. The author has contributed to research on topics including Scalability and Spark (mathematics). The author has an h-index of 18 and has co-authored 27 publications that have received 2638 citations. Previous affiliations of Evan R. Sparks include Dartmouth College.

Papers
Journal Article

MLlib: Machine Learning in Apache Spark

TL;DR: MLlib, as presented in this paper, is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.
Posted Content

MLI: An API for Distributed Machine Learning

TL;DR: The initial results show that this interface can be used to build distributed implementations of a wide variety of common Machine Learning algorithms with minimal complexity and highly competitive performance and scalability.
Proceedings Article

Paleo: A Performance Model for Deep Neural Networks

TL;DR: This work introduces an analytical performance model called PALEO, which can efficiently and accurately model the expected scalability and performance of a putative deep learning system and is robust to the choice of network architecture, hardware, software, communication schemes, and parallelization strategies.
Proceedings Article

Automating Model Search for Large Scale Machine Learning

TL;DR: This work proposes an architecture for automatic machine learning at scale, comprising a cost-based cluster resource allocation estimator, advanced hyper-parameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation.
Proceedings Article

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

TL;DR: This work presents KeystoneML, a system that captures and optimizes end-to-end large-scale machine learning applications for high-throughput training in a distributed environment, with a high-level API that offers greater ease of use and higher performance than existing systems for large-scale learning.