Open Access, Posted Content

API design for machine learning software: experiences from the scikit-learn project

Abstract
Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
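
As a rough illustration of what a shared interface and its composition can look like, the sketch below defines three toy classes in plain Python. The class names (MeanCenterer, NearestCentroid, SimplePipeline) and their internals are hypothetical and written only for this example; they are not scikit-learn's own code. The sketch assumes the common convention that learning happens in a fit method that returns the object itself, with transform and predict applying what was learned.

```python
"""Toy illustration of a uniform estimator interface (not scikit-learn code)."""
from statistics import mean


class MeanCenterer:
    """Hypothetical transformer: learns column means in fit, subtracts them in transform."""

    def fit(self, X, y=None):
        # Learned state lives on the object; fit returns self so calls can be chained.
        self.means_ = [mean(col) for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class NearestCentroid:
    """Hypothetical classifier: predicts the class whose centroid is closest."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


class SimplePipeline:
    """Hypothetical composite: chains transformers and a final predictor,
    and exposes the same fit/predict interface as its parts."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each transformer in turn and pass its output to the next step.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the already-fitted transformers, then delegate to the last step.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.2, 3.9]]
y_train = ["a", "a", "b", "b"]
model = SimplePipeline([MeanCenterer(), NearestCentroid()]).fit(X_train, y_train)
predictions = model.predict([[0.1, 0.0], [4.1, 4.0]])
```

Because the composite offers the same methods as its components, code that accepts any object with fit and predict can use a single step or a whole chain interchangeably, which is the kind of reusability the abstract refers to.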

