Open Access | Posted Content
API design for machine learning software: experiences from the scikit-learn project
Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux
TL;DR: Scikit-learn is a machine learning library written in Python, designed to be simple and efficient, accessible to non-experts, and reusable in various contexts.
Abstract:
Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
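To make the idea of a single interface shared by learning and processing units more concrete, here is a minimal sketch in plain Python. It is not scikit-learn code: the classes MeanCenterer, ThresholdClassifier, and SimplePipeline are hypothetical toys that only imitate the fit / transform / predict naming convention, under the assumption that every unit learns its state in fit and that a chain of such units can itself expose the same methods.

```python
from statistics import mean


class MeanCenterer:
    """Hypothetical processing unit: learns a mean in fit, subtracts it in transform."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)  # learned state, stored with a trailing underscore
        return self           # returning self allows call chaining

    def transform(self, X):
        return [x - self.mean_ for x in X]


class ThresholdClassifier:
    """Hypothetical learning unit: splits two classes at the midpoint of their means."""

    def fit(self, X, y):
        positives = [x for x, label in zip(X, y) if label == 1]
        negatives = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (mean(positives) + mean(negatives)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


class SimplePipeline:
    """Hypothetical composition: chains transformers and ends with a predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each transformer in turn and pass its output to the next step.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the already-fitted transformers, then delegate to the predictor.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
y_train = [0, 0, 0, 1, 1, 1]

model = SimplePipeline([MeanCenterer(), ThresholdClassifier()]).fit(X_train, y_train)
predictions = model.predict([2.5, 7.5])
print(predictions)
```

Because the composed object exposes the same fit and predict methods as its parts, it can be used anywhere a single unit is expected, which is one way to read the abstract's point about composition and reusability.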
Citations
Journal Article
MLlib: machine learning in Apache Spark
Xiangrui Meng, Joseph K. Bradley, Burak Yavuz, Evan R. Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Bosagh Zadeh, Matei Zaharia, Ameet Talwalkar
TL;DR: MLlib is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.
Proceedings Article
Auto-Keras: An Efficient Neural Architecture Search System
Haifeng Jin, Qingquan Song, Xia Hu
TL;DR: The authors propose a framework in which Bayesian optimization guides network morphism for efficient neural architecture search. Network morphism keeps the functionality of a neural network while changing its architecture, enabling more efficient training during the search.
Journal Article
Scikit-learn: Machine Learning Without Learning the Machinery
TL;DR: A quick introduction to scikit-learn, as well as to machine-learning basics, is given.
Journal Article
CatBoost for big data: an interdisciplinary review
TL;DR: This survey takes an interdisciplinary approach to cover studies related to CatBoost in a single work, giving researchers an in-depth understanding that helps clarify the proper application of CatBoost in solving problems.
Journal Article
Magellan: toward building entity matching management systems
Pradap Konda, Sanjib Das, G C Paul Suganthan, AnHai Doan, Adel Ardalan, Jeff Ballard, Han Li, Fatemah Panahi, Haojun Zhang, Jeffrey F. Naughton, Shishir Prasad, Ganesh Krishnan, Rohit Deep, Vijay Raghavendra
TL;DR: Magellan is novel in four important aspects: it provides how-to guides that tell users what to do in each EM scenario, step by step, and provides tools to help users do these steps; the tools seek to cover the entire EM pipeline, not just matching and blocking as current EM systems do.
References
Journal Article
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article
Matplotlib: A 2D Graphics Environment
TL;DR: Matplotlib is a 2D graphics package for Python, used for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems.
Journal Article
The WEKA data mining software: an update
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Journal Article
Random search for hyper-parameter optimization
James Bergstra, Yoshua Bengio
TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
Journal Article
OpenMP: an industry standard API for shared-memory programming
TL;DR: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and separately, C and C++) to express shared-memory parallelism, and it leaves the base language unspecified.