Open Access Proceedings Article

API design for machine learning software: experiences from the scikit-learn project

TLDR
Scikit-learn is a machine learning library written in Python, designed to be simple and efficient, accessible to non-experts, and reusable in various contexts.
Abstract
Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
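
As a rough illustration of what a shared interface and composition can look like, the sketch below uses plain Python with no dependencies. The class names (`MeanCenterer`, `NearestCentroidClassifier`, `SimplePipeline`) are hypothetical and are not taken from the paper or from scikit-learn's source. The sketch only assumes the widely used conventions of `fit`, `transform`, and `predict` methods, with `fit` returning the object itself.

```python
# Illustrative sketch only: hypothetical classes, not scikit-learn code.

class MeanCenterer:
    """Transformer: learns per-column means in fit, subtracts them in transform."""

    def fit(self, X, y=None):
        n_rows = len(X)
        # Learned state is stored with a trailing underscore (assumed convention).
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self  # returning self allows chaining: obj.fit(X).transform(X)

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class NearestCentroidClassifier:
    """Predictor: assigns each sample to the class with the closest centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


class SimplePipeline:
    """Chains transformers and a final predictor behind the same interface."""

    def __init__(self, transformers, predictor):
        self.transformers = transformers
        self.predictor = predictor

    def fit(self, X, y):
        for t in self.transformers:
            X = t.fit(X, y).transform(X)
        self.predictor.fit(X, y)
        return self

    def predict(self, X):
        for t in self.transformers:
            X = t.transform(X)
        return self.predictor.predict(X)


X_train = [[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [7.8, 9.2]]
y_train = ["a", "a", "b", "b"]

model = SimplePipeline([MeanCenterer()], NearestCentroidClassifier())
predictions = model.fit(X_train, y_train).predict([[1.1, 2.1], [8.1, 8.9]])
```

Because the pipeline exposes the same `fit` and `predict` methods as its parts, it can be used anywhere a single predictor is expected, which is the kind of composition and reuse the abstract refers to.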


Citations
Journal Article

Robust Smartphone App Identification via Encrypted Network Traffic Analysis

TL;DR: This paper shows that a passive eavesdropper can feasibly identify smartphone apps by fingerprinting the network traffic that they send. The apps identified this way can reveal much information about a user, such as their medical conditions, sexual orientation, or religious beliefs.
Posted Content

Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics.

TL;DR: This workshop tutorial motivates the opportunity to reconcile the cause of safety with that of financial inclusion, and offers a simple prototype capable of navigating the graph and observing model performance on illicit activity over time.
Posted Content

How Does Mixup Help With Robustness and Generalization

TL;DR: It is shown that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss, which explains why models obtained by Mixup training exhibit robustness to several kinds of adversarial attacks, such as the Fast Gradient Sign Method.
Posted Content

Provably efficient machine learning for quantum many-body problems.

TL;DR: It is proved that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions, after learning from data obtained by measuring other Hamiltonians in the same quantum phase of matter.
Proceedings Article

Word embeddings for Arabic sentiment analysis

TL;DR: This paper relies on word embeddings as the main source of features for opinion mining in Arabic text, such as tweets, consumer reviews, and news articles, and achieves slightly better accuracy than the top hand-crafted methods.
References
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Journal Article

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article

Matplotlib: A 2D Graphics Environment

TL;DR: Matplotlib is a 2D graphics package for Python, used for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems.
Journal Article

The WEKA data mining software: an update

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Journal Article

The NumPy Array: A Structure for Efficient Numerical Computation

TL;DR: The authors show how to improve the performance of NumPy code by vectorizing calculations, avoiding copies of data in memory, and minimizing operation counts.