A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost

Luís Ferreira (EPMQ - IT, CCG ZGDV Institute; ALGORITMI Center, University of Minho, Guimarães, Portugal) - luis.ferreira@ccg.pt
André Pilastri (EPMQ - IT, CCG ZGDV Institute, Guimarães, Portugal) - andre.pilastri@ccg.pt
Carlos Manuel Martins (WeDo Technologies, Braga, Portugal) - carlos.mmartins@mobileum.com
Pedro Miguel Pires (WeDo Technologies, Braga, Portugal) - pedro.mpires@mobileum.com
Paulo Cortez (ALGORITMI Center, Dep. Information Systems, University of Minho, Guimarães, Portugal) - pcortez@dsi.uminho.pt
Abstract—This paper presents a benchmark of supervised Automated Machine Learning (AutoML) tools. Firstly, we analyze the characteristics of eight recent open-source AutoML tools (Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT and TransmogrifAI) and describe twelve popular OpenML datasets that were used in the benchmark (divided into regression, binary and multi-class classification tasks). Then, we perform a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), Deep Learning (DL) and XGBoost (XGB). To select the best tool, we used a lexicographic approach, considering first the average prediction score for each task and then the computational effort. The best predictive results were achieved for GML, which were further compared with the best OpenML public results. Overall, the best GML AutoML tools obtained competitive results, outperforming the best OpenML models in five datasets. These results confirm the potential of the general-purpose AutoML tools to fully automate the Machine Learning (ML) algorithm selection and tuning.
Index Terms—Automated Deep Learning (AutoDL), Automated Machine Learning (AutoML), Benchmarking, Neural Architecture Search (NAS), Software, Supervised Learning.
I. INTRODUCTION
A Machine Learning (ML) application typically includes several steps: data preparation, feature engineering, algorithm selection and hyperparameter tuning. Most of these steps require trial-and-error approaches, especially for non-ML-experts. More experienced practitioners often use heuristics to explore the vast hyperparameter space [1]. With the increasing number of non-specialists working with ML [2], in recent years there have been attempts to automate several components of the ML workflow, giving rise to the concept of Automated Machine Learning (AutoML) [3].
This paper focuses on the selection of the best supervised ML algorithm and its hyperparameter tuning. The comparison study considers eight recent open-source AutoML technologies: Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT, and TransmogrifAI. To assess these tools, we use twelve popular datasets retrieved from the OpenML platform, divided into regression, binary and multi-class classification tasks. In particular, we design three main scenarios for the benchmark study: General ML (GML) algorithm selection; Deep Learning (DL) selection; and XGBoost (XGB) hyperparameter tuning. Each tool is measured in terms of its predictive performance (using an external 10-fold cross-validation) and computational cost (measured as the elapsed time). Moreover, the best AutoML tools are further compared with the best public OpenML predictive results (which are assumed as the "gold standard").
The paper is organized as follows. Section 2 presents the
related work. Next, Section 3 describes the AutoML tools
and datasets. Section 4 details the benchmark design. Then,
Section 5 presents the obtained results. Finally, Section 6
discusses the main conclusions.
II. RELATED WORK
The state-of-the-art works that compare AutoML tools can be grouped into three major categories. The first category includes publications that introduce a novel AutoML tool and then compare it with existing ones. The second category (similar to our work) relates to the comparison of distinct tools, without proposing a new AutoML framework. Finally, the third category (less common) focuses on the characteristics of the technologies rather than their predictive performances.
Table I summarizes the related works using the following columns: Ref. - the study reference; Cat. - the AutoML study category; Dat. - the number of analyzed datasets; Tools - the number of compared AutoML tools; GML - if General ML algorithms (not DL) were tested, such as Naïve Bayes (NB), Support Vector Machine (SVM) or XGB; DL - if DL was included in the comparison; Ext. - the external validation method used (if any); C. - if computational effort was measured; and Description - a brief explanation of the comparison approach. The majority of the related works (14 studies) are from the year 2020, which confirms that AutoML tool comparison is a hot research topic. Some studies explore a large number of datasets [4], [5]. Our comparison adopts 12 datasets, which is below the two mentioned works but still higher than the number used in eleven other studies (e.g., [6], [7]). More importantly, we consider eight AutoML technologies, a number only surpassed by [8] (which tested only one dataset) and [9] (which did not use any datasets).

In particular, we benchmark the following recent tools: Auto-PyTorch - only studied in [10] and compared in [9]; rminer - not considered by the related works; and TransmogrifAI - only compared in [11]. Most works target GML. There are four studies that only address DL (e.g., [6], [12]). Similar to our approach, there are seven studies that consider both GML and DL. Of the 21 surveyed works, only 12 employ an external validation. Most of these studies (8 of 12) use a single holdout train/test split, which is less robust than a 10-fold cross-validation (adopted in four works). In addition, only 9 studies measure the computational effort. Furthermore, few studies contrast the AutoML results with the best human-configured results. Kaggle competition results were included in [6], [13], [14]. This work adopts open science (OpenML) best results, which was only performed in [15].
TABLE I
SUMMARY OF THE RELATED WORK (AUTOML TOOL COMPARISON).
Year Ref. Cat. Dat. Tools GML DL Ext. C. Description
2019 [16] 1 8 5 X new AutoML tool
2019 [6] 1 2 4 X HO X new AutoML tool
2019 [17] 1 53 4 X X 10CV X new AutoML tool
2019 [18] 2 39 4 X X AutoML benchmark
2019 [19] 2 5 3 X X AutoML benchmark
2019 [3] 2 n.d. n.d. X X HO X AutoML competition
2019 [4] 2 300 6 X X HO AutoML benchmark
2020 [5] 1 175 2 X HO X new AutoML tool
2020 [7] 1 3 2 X new AutoML tool
2020 [13] 1 50 6 X 10CV X new AutoML tool
2020 [20] 1 39 2 X 10CV X new AutoML tool
2020 [21] 1 5 2 X HO X new AutoML tool
2020 [12] 1 3 2 X HO new AutoML tool
2020 [22] 1 130 3 X X new AutoML tool
2020 [10] 1 8 4 X X new AutoML tool
2020 [15] 2 12 4 X AutoML benchmark
2020 [11] 2 3 2 X HO X AutoML benchmark (risk management)
2020 [14] 2 137 5 X 10CV survey and benchmark
2020 [8] 3 1 12 X HO literature review
2020 [23] 3 0 7 X qualitative comparison
2020 [9] 3 0 18 X X qualitative comparison
this work - 2 12 8 X X 10CV X benchmark
n.d. - not disclosed.
10CV - 10-fold Cross-Validation (CV).
HO - Hold-Out (HO) validation.
III. MATERIALS AND METHODS
A. AutoML Tools
This study compares eight recent open-source AutoML tools. Whenever possible, all tools were executed with their default values, in order to prevent any bias towards a particular tool, while also corresponding to a natural non-ML-expert choice. When available in the tool documentation, we show the number of hyperparameters (H) tuned by each AutoML tool.
1) Auto-Keras: a Python library based on the Keras module that focuses on automatic DL Neural Architecture Search (NAS) [24]. The search is performed using Bayesian Optimization, with the tool automatically tuning the number of dense layers, units, type of activation functions used, dropout values and other DL hyperparameters. In this work, we adopt Auto-Keras version 1.0.7, which is used in the DL scenario (Section IV).
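As an illustration (not part of the original benchmark code), a minimal Auto-Keras 1.0.x usage sketch for tabular data might look as follows; the data variables are placeholders.

    import autokeras as ak

    # Structured (tabular) data classifier; max_trials bounds the NAS search
    clf = ak.StructuredDataClassifier(max_trials=10, overwrite=True)
    clf.fit(x_train, y_train, epochs=100)   # x_train/y_train: placeholder arrays
    y_pred = clf.predict(x_test)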
2) Auto-PyTorch: another AutoML tool specifically focused on NAS [10]. Auto-PyTorch version 0.0.2 uses the PyTorch framework and a multi-fidelity optimization to search the parameters of the best architecture (e.g., network type, number of layers, activation function). Similarly to Auto-Keras, we use Auto-PyTorch only in the second DL scenario.
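For reference, a hedged sketch of how the early 0.0.x Auto-PyTorch releases were typically invoked; the entry-point name, preset string and arguments below are assumptions and should be checked against the version 0.0.2 documentation.

    from autoPyTorch import AutoNetClassification  # assumed 0.0.x entry point

    # "tiny_cs" is assumed to be one of the predefined search-space presets
    autonet = AutoNetClassification("tiny_cs", max_runtime=3600, log_level="info")
    autonet.fit(X_train, y_train, validation_split=0.25)  # placeholder arrays
    y_pred = autonet.predict(X_test)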
3) Auto-Sklearn: an AutoML library built on top of the Scikit-Learn ML framework. The choice of algorithms and hyperparameters implemented by Auto-Sklearn takes advantage of recent advances in Bayesian optimization, meta-learning, and Ensemble Learning [25]. We use Auto-Sklearn version 0.7.0 in the first GML scenario, since it does not implement automated DL or XGB search. All ML algorithms (when available for the task type) were tested: AdaBoost (H = 4), Bernoulli (H = 2) and Multinomial NB (H = 2), Gaussian NB (H = 0), Decision Tree (DT) (H = 4), Extremely Randomized Trees (XRT) (H = 5), Gradient Boosting Machine (GBM) (H = 6), k-Nearest Neighbors (k-NN) (H = 3), Linear Discriminant Analysis (LDA) (H = 4), Linear SVM (LSVM) (H = 4), Kernel-based SVM (KSVM) (H = 7), Passive Aggressive (H = 3), Quadratic Discriminant Analysis (QDA) (H = 2), Random Forest (RF) (H = 5) and a Multiple Linear Regression (MR) classifier (H = 10).
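A minimal Auto-Sklearn sketch (illustrative only; the time budget and data variables are assumptions mirroring the setup described in Section IV):

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=3600,          # one-hour budget per fit, as in Section IV
        resampling_strategy="cv",
        resampling_strategy_arguments={"folds": 5},
    )
    automl.fit(X_train, y_train)               # placeholder arrays
    y_pred = automl.predict(X_test)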
4) AutoGluon: a Python AutoML toolkit focused on DL [26]. In this work, we consider the tabular prediction feature of AutoGluon version 0.0.13. The tabular prediction executes several ML algorithms and then returns a Stacked Ensemble that combines the distinct ML models in multiple layers. In the GML scenario (Section IV), the ensemble includes all non-DL algorithms: GBM, CatBoost Boosted Trees, RF, Extra Trees (XT), k-NN and MR. For the DL scenario, AutoGluon uses a dense DL architecture that relies on heuristics to set the hidden layer sizes, also employing ReLU activation functions, dropout regularization and batch normalization layers [26].
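A usage sketch for the legacy 0.0.x AutoGluon tabular API (the import path, argument names and file/column names below reflect that older release and are assumptions; newer versions expose autogluon.tabular.TabularPredictor instead):

    from autogluon import TabularPrediction as task  # legacy 0.0.x import path

    train_data = task.Dataset(file_path="train.csv")  # placeholder CSV with a "class" column
    predictor = task.fit(train_data=train_data, label="class", time_limits=3600)
    y_pred = predictor.predict(test_data)              # test_data: placeholder dataset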
5) H2O AutoML: the H2O open-source module for AutoML [27]. The tool adopts a grid search to perform the ML model selection. In this paper, we use H2O AutoML version 3.30.1.2 for the three comparison scenarios: GML, DL and XGB. In GML, all ML algorithms were explored (except DL): Generalized Linear Model (GLM) (H = 1), GBM (H = 8), RF (H = 0), XRT (H = 0), XGB (H = 9) and two Stacked Ensembles: Best, with only the best models per ML family; and All, with all trained algorithms. For the DL scenario, the H2O tool uses a fully connected multi-layer perceptron trained with a stochastic gradient descent back-propagation algorithm. The searched H = 7 hyperparameters include the number of hidden layers and hidden units per layer, the learning rate, the number of training epochs, the activation functions, and the input and hidden layer dropout values. Finally, for the XGB scenario, the tool tunes the same H = 9 hyperparameters as in GML.
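An illustrative H2O AutoML sketch for the GML scenario (the input frame, target column name and exclusion list are assumptions consistent with the setup described above):

    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    train = h2o.import_file("train.csv")              # placeholder dataset
    aml = H2OAutoML(max_runtime_secs=3600,            # one-hour budget per fit
                    exclude_algos=["DeepLearning"],   # GML scenario: all algorithms except DL
                    nfolds=5)
    aml.train(y="target", training_frame=train)       # "target" is a placeholder column name
    print(aml.leaderboard)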
6) rminer: an R package that aims to facilitate the use of ML algorithms [28]. In its most recent version (1.4.6), rminer implements AutoML functions. The rminer AutoML executions can be completely customized by the user, who can define the searched ML algorithms, hyperparameter ranges and validation metrics of the assumed grid search. For less experienced users, rminer includes three predefined AutoML search templates (https://CRAN.R-project.org/package=rminer). Similarly to H2O, we test this tool in the GML and XGB scenarios. In GML, we used the "automl3" template, which searches for the best model among: GLM (H = 2), Gaussian kernel SVM (H = 2 for classification and H = 3 for regression), shallow multilayer perceptron (with one hidden layer, H = 1), RF (H = 1), XGB (H = 1) and a Stacked Ensemble (H = 2, similar to the H2O Stacked Best).
7) TPOT: a tool written in Python that automates several ML phases (e.g., feature selection, algorithm selection) by using Genetic Programming [29]. The GML scenario tested all TPOT version 0.11.5 algorithms: DT, RF, XGB, (multinomial) Logistic Regression (LR) and k-NN. TPOT was not included in the third comparison scenario (XGB, Section IV) because the tool does not allow the selection of a single algorithm, such as XGB.
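A minimal TPOT sketch (illustrative; the generation/population values are assumptions, and the one-hour cap mirrors the budget described in Section IV):

    from tpot import TPOTClassifier

    tpot = TPOTClassifier(generations=100, population_size=100,
                          cv=5, scoring="roc_auc",
                          max_time_mins=60, random_state=0)
    tpot.fit(X_train, y_train)                 # placeholder arrays
    print(tpot.score(X_test, y_test))
    tpot.export("best_pipeline.py")            # exports the selected pipeline as Python code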
8) TransmogrifAI: an AutoML tool for structured data that runs on top of Apache Spark [30]. TransmogrifAI version 0.7.0 uses a grid search to select the best ML model. In the GML scenario, the tool was tested with all its ML algorithms: NB, DT, Gradient Boosted Trees (GBT), RF, MR, LR and LSVM.
9) Summary: Table II summarizes the AutoML tools that were used. For each tool, we detail the base ML Framework, the available Application Programming Interface (API) programming Language, the compatible Operating Systems, and whether it supports DL (Auto-Keras and Auto-PyTorch only address DL).
B. Data
The analyzed datasets (Table III) were retrieved from OpenML [31]. The selection criterion was to choose the most downloaded datasets that did not include missing data and that reflected three supervised learning tasks: regression, binary and multi-class classification. The datasets reflect different numbers of instances (Rows), input variables (Cols.) and output target response values (Classes/levels, from 2 to 257; the last column details the Target domain values).
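These datasets can be fetched programmatically, for example with scikit-learn's OpenML loader (the dataset name and version below are placeholders, not the exact identifiers used in the paper):

    from sklearn.datasets import fetch_openml

    # e.g., one of the binary classification tasks listed in Table III
    X, y = fetch_openml(name="diabetes", version=1,
                        as_frame=True, return_X_y=True)
    print(X.shape, y.value_counts())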
IV. BENCHMARK DESIGN
The comparison study assumes three main scenarios (Table II). The first GML scenario executes all ML algorithms from the AutoML tools except DL, aiming to perform a more horizontal, ML-family-agnostic search. DL was discarded since some of the tools do not implement DL (Table II), the training of DL models often requires a higher computational effort, and the second scenario is exclusively devoted to DL. The second DL scenario focuses on NAS, as implemented by the Auto-Keras, Auto-PyTorch, AutoGluon and H2O AutoML tools. Finally, the third scenario is more vertical, considering only the XGB algorithm.
TABLE II
DESCRIPTION OF THE COMPARED AUTOML TOOLS.
AutoML Tool    | Framework     | API Lang.       | Operating Systems          | DL         | Scenario
Auto-Keras     | Keras         | Python          | MacOS, Linux, Windows      | Yes (only) | DL
Auto-PyTorch   | PyTorch       | Python          | MacOS, Linux, Windows      | Yes (only) | DL
Auto-Sklearn   | Scikit-Learn  | Python          | Linux                      | No         | GML
AutoGluon      | PyTorch       | Python          | MacOS (P.), Linux          | Yes        | GML, DL
H2O AutoML     | H2O           | Java, Python, R | MacOS, Linux, Windows (P.) | Yes        | GML, DL, XGB
rminer AutoML  | rminer        | R               | MacOS, Linux, Windows      | No         | GML, XGB
TPOT           | Scikit-Learn  | Python          | MacOS, Linux, Windows      | No         | GML
TransmogrifAI  | Spark (MLlib) | Scala           | MacOS, Linux, Windows      | No         | GML
P. - partially supported (with less capabilities).
TABLE III
DESCRIPTION OF THE SELECTED OPENML DATASETS.
Dataset         | Task        | Rows | Cols. | Classes/levels | Target values
Cholesterol     | regression  | 303  | 14    | 152            | [126, 564]
Churn           | binary      | 5000 | 21    | 2              | {0,1}
Cloud           | regression  | 108  | 7     | 94             | [0, 6]
Cmc             | multi-class | 1473 | 10    | 10             | {0,1,...,9}
Credit          | binary      | 1000 | 21    | 2              | {0,1}
Diabetes        | binary      | 768  | 9     | 2              | {0,1}
Dmft            | multi-class | 797  | 5     | 6              | {0,1,...,5}
Liver disorders | regression  | 345  | 6     | 16             | [0, 20]
Mfeat           | multi-class | 2000 | 7     | 10             | {0,1,...,9}
Plasma          | regression  | 315  | 14    | 257            | [179, 1727]
Qsar            | binary      | 1055 | 42    | 2              | {0,1}
Vehicle         | multi-class | 846  | 19    | 4              | {0,1,...,3}
XGB was selected since it is a recently proposed non-DL algorithm that includes a large number of hyperparameters (e.g., the H2O documentation mentions 40 hyperparameters, of which only H = 9 are tuned). In this scenario, we test H2O and rminer, since they are the AutoML tools that allow running the single XGB algorithm.
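For instance, with H2O the search can be restricted to XGBoost alone via the include_algos option (a sketch under the same assumptions as the H2O example above; not the paper's exact code):

    from h2o.automl import H2OAutoML

    # Third scenario: tune only the XGBoost algorithm within the one-hour budget
    aml_xgb = H2OAutoML(max_runtime_secs=3600,
                        include_algos=["XGBoost"],
                        nfolds=5)
    aml_xgb.train(y="target", training_frame=train)  # reuses the placeholder frame from above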
For every predictive experiment, the datasets were equally divided into ten folds, used for the external cross-validation. In order to create validation sets (to select the best ML algorithms and hyperparameters), we adopted an internal 5-fold validation. For instance, if the data contains 100 instances, then in the first external 10-fold iteration 90 examples are used by the tool for fitting purposes (model selection and training), with the remaining 10 instances being used for the external testing. The 90 fit examples are further divided into 5 folds. In the first internal fold, each ML algorithm is trained with 72 instances and 18 are used for validation purposes (allowing the best model to be selected). Since neither Auto-Keras nor Auto-PyTorch natively support cross-validation during the fitting phase, we used a simpler holdout train (75%) and test (25%) split to select and fit the models.
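A minimal sketch of this nested evaluation scheme (the fit_automl and score callables are hypothetical stand-ins for an AutoML tool's fit routine and a task-specific metric; X and y are assumed to be numpy arrays):

    import numpy as np
    from sklearn.model_selection import KFold

    def nested_evaluation(X, y, fit_automl, score):
        # External 10-fold cross-validation, used only for test-set estimates
        outer = KFold(n_splits=10, shuffle=True, random_state=0)
        test_scores = []
        for fit_idx, test_idx in outer.split(X):
            X_fit, y_fit = X[fit_idx], y[fit_idx]      # e.g., 90 of 100 instances
            X_test, y_test = X[test_idx], y[test_idx]  # remaining 10 instances
            # The AutoML tool is assumed to run its own internal 5-fold validation
            # (72/18 splits in the 100-instance example) to select the best model
            model = fit_automl(X_fit, y_fit, inner_folds=5)
            test_scores.append(score(y_test, model.predict(X_test)))
        return np.mean(test_scores)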
In all three scenarios, the same measures are used to evaluate the performance of the external 10-fold test set predictions. Popular prediction measures were selected: regression - Mean Absolute Error (MAE) ([0.0, ∞[, where 0.0 denotes a perfect predictor); binary classification - Area Under the receiver operating characteristic Curve (AUC) ([0.0, 1.0], where 1.0 denotes the ideal classifier); multi-class classification - Macro F1-score ([0.0, 1.0], where 1.0 denotes the perfect model). Whenever allowed by the AutoML tool, we adopted the same measures for the internal AutoML validation set model comparison. The exceptions were the multi-class datasets and the Auto-Keras and Auto-PyTorch tools, which do not allow using a Macro F1-score for validation; thus, the default loss function was adopted for these tools.
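These measures map directly to standard scikit-learn metrics; a short sketch (the y_* variables are placeholders for true labels, predicted labels and predicted class probabilities):

    from sklearn.metrics import mean_absolute_error, roc_auc_score, f1_score

    mae = mean_absolute_error(y_true_reg, y_pred_reg)           # regression: 0.0 is a perfect predictor
    auc = roc_auc_score(y_true_bin, y_prob_bin)                 # binary: 1.0 is the ideal classifier
    macro_f1 = f1_score(y_true_mc, y_pred_mc, average="macro")  # multi-class: 1.0 is the perfect model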
All experiments were executed using an Intel Xeon 1.70GHz server with 56 cores and 2TB of disk space. For each external fold, we also recorded the computational effort (in terms of elapsed time) of the AutoML fit (model selection and training). When the AutoML tool allowed specifying a time limit for training, the chosen time was one hour (3,600 s). Also, for the tools that implement an early stopping AutoML parameter, we fixed the value at three rounds. To aggregate the distinct external 10-fold results, we compute the average values. We also provide the 10-fold average t-distribution 95% confidence intervals, which can be used to check whether the tool differences are statistically significant (e.g., by verifying that two confidence intervals do not overlap). Nevertheless, given that there is a very large number of comparisons, to select the best tool for each task we adopt a lexicographic approach [32], which considers first the best average predictive performance (with a precision up to 1% or 0.01 points) and then the average computational effort (precision in s). To facilitate the lexicographic regression analysis, we compute the Normalized MAE (NMAE) score, a scale-independent measure defined as NMAE = MAE / (max(y) - min(y)), where y denotes the output target.
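A small sketch of these aggregation and selection steps (illustrative helper functions, not the paper's code; results is assumed to map each tool name to its average score and average fit time):

    import numpy as np
    from scipy import stats

    def nmae(y_true, y_pred):
        """Scale-independent NMAE = MAE / (max(y) - min(y))."""
        mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
        return mae / (np.max(y_true) - np.min(y_true))

    def ci95(fold_scores):
        """Half-width of the t-distribution 95% confidence interval over the 10 folds."""
        s = np.asarray(fold_scores, dtype=float)
        return stats.t.ppf(0.975, df=len(s) - 1) * stats.sem(s)

    def lexicographic_best(results, higher_is_better=True, precision=0.01):
        """Best rounded average score first; ties broken by lower average fit time (s)."""
        def key(item):
            tool, (avg_score, avg_time) = item
            score = round(avg_score / precision) * precision
            return (-score if higher_is_better else score, round(avg_time))
        return min(results.items(), key=key)[0]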
V. RESULTS
Figures 1 and 2 summarize the main scenario (GML) results. In total, there were 12 (datasets) × 6 (tools) × 10 (folds) = 720 AutoML executions. Figure 1 presents the average computational effort (in s) for each external 10-fold iteration. Figure 2 shows the average external test scores (grouped in terms of the binary, multi-class and regression tasks). To facilitate the visualization of the regression scores, on the right of Figure 2 we use the NMAE score on the y-axis.
For GML, Auto-Sklearn always requires the maximum allowed computational effort (3,600 s), followed by TPOT (average of 858 s per external fold and dataset). The other tools are much faster: AutoGluon has the lowest average value (70 s) and is the best in 5 datasets; H2O has the second lowest average value (158 s) and is the best in 5 datasets; TransmogrifAI has the third best average (317 s); and rminer has the fourth best average (408 s), being the best in 2 datasets. Regarding the prediction performances, there is a high overall correlation between the validation and test scores (not shown in Figure 2, although the same effect is present in Tables IV and V) when considering all tool execution values: 0.75 for binary; 0.90 for multi-class; and 0.92 for regression. For binary classification, and when considering the test set results, the AutoML differences are smaller for churn (maximum difference of 3 percentage points - pp) and higher for the other datasets (10 pp for diabetes, 15 pp for credit and 16 pp for qsar). TransmogrifAI is the best tool in 3 of the datasets (churn, credit and qsar), also obtaining the best average AUC per dataset (88%). An almost identical average (87%) is achieved by H2O (best in churn and credit), rminer (best in diabetes) and TPOT (best in churn). AutoGluon and Auto-Sklearn produced the worst overall results (average AUCs per dataset of 78% and 80%). Turning to the multi-class tasks, the AutoML differences (best tool test result minus the worst one) are smaller when compared with the binary task: 4 pp for Cmc; 5 pp for Dmft; 6 pp for Mfeat; and 8 pp for Vehicle. The best test dataset average is obtained by AutoGluon (Macro F1-score of 58%), followed by Auto-Sklearn, H2O and TPOT (Macro F1-score of 57%), then TransmogrifAI (56%) and finally rminer (53%). In terms of datasets, the best results were: Cmc - Auto-Sklearn (54%); Dmft - TransmogrifAI (24%); Mfeat - AutoGluon, Auto-Sklearn and TPOT (74%); and Vehicle - AutoGluon and Auto-Sklearn (82%). As for the regression tasks, the AutoML tool differences for each dataset are very small, corresponding to 1 pp in terms of NMAE for all three datasets. In effect, all tools obtain the same average NMAE per dataset (9%). Using the lexicographic selection (Section IV), the GML tool recommendation is: binary - TransmogrifAI; multi-class - AutoGluon; regression - rminer.
The DL benchmark consisted of 12 (datasets) × 4 (tools) × 10 (folds) = 480 AutoML executions. Table IV shows the average DL 10-fold results (± the 95% confidence intervals) in terms of the external computational effort (Time), internal validation (Val.) and test scores (Test). The Auto-Keras and Auto-PyTorch validation scores are omitted, since they are not disclosed by the tools. Regarding execution time, AutoGluon is much faster than the other tools, requiring an average fit time of just 24 s. The second fastest DL tool is Auto-Keras (average of 984 s), followed by H2O (3,458 s) and then Auto-PyTorch (3,600 s). As for the prediction performances, the average test values per dataset are: binary (AUC) - H2O (85%), Auto-PyTorch (77%), AutoGluon (72%) and Auto-Keras (69%); multi-class (Macro F1-score) - AutoGluon (57%), Auto-PyTorch (56%), H2O (50%) and Auto-Keras (43%); regression (NMAE) - H2O (10%), Auto-PyTorch and AutoGluon (11%), Auto-Keras (13%). While only four tools are compared, larger differences among the tools were obtained for the DL scenario when compared with GML: binary - ranging from 11 pp (Qsar) to 24 pp (credit); multi-class - from 4 pp to 30 pp; and regression - from 2 pp to 10 pp.

Fig. 1. Execution time (y-axis) for the GML scenario (bars denote external 10-fold average values with 95% confidence intervals; the Auto-Sklearn values were omitted from the graph because they are always constant and equal to 3,600 s).
Fig. 2. Predictive results for the GML scenario (bars denote external 10-fold average values with 95% confidence intervals).
