Open Access · Proceedings Article

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

TLDR
KeystoneML is a system that captures and optimizes end-to-end, large-scale machine learning applications for high-throughput training in a distributed environment. Its high-level API offers greater ease of use and higher performance than existing systems for large-scale learning.
Abstract
Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements. We present KeystoneML, a system that captures and optimizes end-to-end, large-scale machine learning applications for high-throughput training in a distributed environment with a high-level API. This approach offers increased ease of use and higher performance over existing systems for large-scale learning. We demonstrate the effectiveness of KeystoneML in achieving high statistical accuracy and scalable training using real-world datasets in several domains.


Citations
Proceedings Article

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

TL;DR: This paper presents TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. The platform standardized components, simplified platform configuration, and reduced the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.
Posted Content

SparkNet: Training Deep Networks in Spark

TL;DR: SparkNet is a framework for training deep networks in Spark. It includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library.
Journal Article

Data Lifecycle Challenges in Production Machine Learning: A Survey

TL;DR: This survey explores challenges in data understanding, data validation and cleaning, and data preparation, and how the constraints on solutions depend on where in a model's lifecycle the problems are encountered and who encounters them.
Journal Article

Automating large-scale data quality verification

TL;DR: This work presents a system for automating the verification of data quality at scale that meets the requirements of production use cases. Its declarative API combines common quality constraints with user-defined validation code, thereby enabling 'unit tests' for data.