KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics
Evan R. Sparks,Shivaram Venkataraman,Tomer Kaftan,Michael J. Franklin,Benjamin Recht +4 more
- Vol. 2016, pp 535-546
TLDR
KeystoneML is a system that captures and optimizes end-to-end large-scale machine learning applications for high-throughput training in a distributed environment through a high-level API, offering increased ease of use and higher performance than existing systems for large-scale learning.
Abstract:
Modern advanced analytics applications make use of machine learning techniques and contain multiple steps of domain-specific and general-purpose processing with high resource requirements. We present KeystoneML, a system that captures and optimizes end-to-end large-scale machine learning applications for high-throughput training in a distributed environment with a high-level API. This approach offers increased ease of use and higher performance over existing systems for large-scale learning. We demonstrate the effectiveness of KeystoneML in achieving high-quality statistical accuracy and scalable training using real-world datasets in several domains.
Citations
Proceedings ArticleDOI
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Denis Baylor,Eric Breck,Heng-Tze Cheng,Noah Fiedel,Chuan Yu Foo,Zakaria Haque,Salem Haykal,Mustafa Ispir,Vihan Jain,Levent Koc,Chiu Yuen Koo,Lukasz Lew,Clemens Mewald,Akshay Naresh Modi,Neoklis Polyzotis,Sukriti Ramesh,Sudip Roy,Steven Euijong Whang,Martin Wicke,Jarek Wilkiewicz,Xin Zhang,Martin Zinkevich +21 more
TL;DR: TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google, is presented; it standardized the components, simplified the platform configuration, and reduced the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.
Posted Content
SparkNet: Training Deep Networks in Spark
TL;DR: SparkNet is a framework for training deep networks in Spark; it includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library.
Journal ArticleDOI
Data Lifecycle Challenges in Production Machine Learning: A Survey
TL;DR: Challenges in data understanding, data validation and cleaning, and data preparation are explored, including how different constraints are imposed on the solutions depending on where in the lifecycle of a model the problems are encountered and who encounters them.
Journal ArticleDOI
Automating large-scale data quality verification
Sebastian Schelter,Dustin Lange,Philipp Schmidt,Meltem Celikel,Felix Biessmann,Andreas Grafberger +5 more
TL;DR: This work presents a system for automating the verification of data quality at scale, which meets the requirements of production use cases and provides a declarative API, which combines common quality constraints with user-defined validation code, and thereby enables 'unit tests' for data.
References
Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa,Gaël Varoquaux,Alexandre Gramfort,Vincent Michel,Bertrand Thirion,Olivier Grisel,Mathieu Blondel,Peter Prettenhofer,Ron Weiss,Vincent Dubourg,Jake Vanderplas,Alexandre Passos,David Cournapeau,Matthieu Brucher,Matthieu Perrot,Edouard Duchesnay +15 more
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Posted Content
Scikit-learn: Machine Learning in Python
Fabian Pedregosa,Gaël Varoquaux,Alexandre Gramfort,Vincent Michel,Bertrand Thirion,Olivier Grisel,Mathieu Blondel,Andreas Müller,Joel Nothman,Gilles Louppe,Peter Prettenhofer,Ron Weiss,Vincent Dubourg,Jake Vanderplas,Alexandre Passos,David Cournapeau,Matthieu Brucher,Matthieu Perrot,Edouard Duchesnay +18 more
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Proceedings ArticleDOI
Object recognition from local scale-invariant features
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Proceedings ArticleDOI
Rethinking the Inception Architecture for Computer Vision
TL;DR: In this article, the authors explore ways to scale up networks that aim to use the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
Posted Content
Rethinking the Inception Architecture for Computer Vision
TL;DR: This work explores ways to scale up networks that aim to use the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
Related Papers (5)
Scikit-learn: Machine Learning in Python
TensorFlow: a system for large-scale machine learning
Martín Abadi,Paul Barham,Jianmin Chen,Zhifeng Chen,Andy Davis,Jeffrey Dean,Matthieu Devin,Sanjay Ghemawat,Geoffrey Irving,Michael Isard,Manjunath Kudlur,Josh Levenberg,Rajat Monga,Sherry Moore,Derek G. Murray,Benoit Steiner,Paul A. Tucker,Vijay K. Vasudevan,Pete Warden,Martin Wicke,Yuan Yu,Xiaoqiang Zheng +21 more