A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost

Luís Ferreira (EPMQ - IT, CCG ZGDV Institute; ALGORITMI Center, University of Minho, Guimarães, Portugal) - luis.ferreira@ccg.pt
André Pilastri (EPMQ - IT, CCG ZGDV Institute, Guimarães, Portugal) - andre.pilastri@ccg.pt
Carlos Manuel Martins (WeDo Technologies, Braga, Portugal) - carlos.mmartins@mobileum.com
Pedro Miguel Pires (WeDo Technologies, Braga, Portugal) - pedro.mpires@mobileum.com
Paulo Cortez (ALGORITMI Center, Dep. Information Systems, University of Minho, Guimarães, Portugal) - pcortez@dsi.uminho.pt
Abstract—This paper presents a benchmark of supervised Automated Machine Learning (AutoML) tools. Firstly, we analyze the characteristics of eight recent open-source AutoML tools (Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT and TransmogrifAI) and describe twelve popular OpenML datasets that were used in the benchmark (divided into regression, binary and multi-class classification tasks). Then, we perform a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), Deep Learning (DL) and XGBoost (XGB). To select the best tool, we used a lexicographic approach, considering first the average prediction score for each task and then the computational effort. The best predictive results were achieved for GML, which were further compared with the best OpenML public results. Overall, the best GML AutoML tools obtained competitive results, outperforming the best OpenML models in five datasets. These results confirm the potential of the general-purpose AutoML tools to fully automate the Machine Learning (ML) algorithm selection and tuning.
Index Terms—Automated Deep Learning (AutoDL), Automated Machine Learning (AutoML), Benchmarking, Neural Architecture Search (NAS), Software, Supervised Learning.
I. INTRODUCTION
A Machine Learning (ML) application typically includes several steps: data preparation, feature engineering, algorithm selection and hyperparameter tuning. Most of these steps require trial-and-error approaches, especially for non-ML-experts. More experienced practitioners often use heuristics to explore the vast hyperparameter space [1]. With the increasing number of non-specialists working with ML [2], in recent years there have been attempts to automate several components of the ML workflow, giving rise to the concept of Automated Machine Learning (AutoML) [3].
This paper focuses on the selection of the best supervised ML algorithm and its hyperparameter tuning. The comparison study considers eight recent open-source AutoML technologies: Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT, and TransmogrifAI. To assess these tools, we use twelve popular datasets retrieved from the OpenML platform, divided into regression, binary and multi-class classification tasks. In particular, we design three main scenarios for the benchmark study: General ML (GML) algorithm selection; Deep Learning (DL) selection; and XGBoost (XGB) hyperparameter tuning. Each tool is measured in terms of its predictive performance (using an external 10-fold cross-validation) and computational cost (measured as the elapsed time). Moreover, the best AutoML tools are further compared with the best public OpenML predictive results (which are assumed as the "gold standard").
The paper is organized as follows. Section 2 presents the
related work. Next, Section 3 describes the AutoML tools
and datasets. Section 4 details the benchmark design. Then,
Section 5 presents the obtained results. Finally, Section 6
discusses the main conclusions.
II. RELATED WORK
The state-of-the-art works that compare AutoML tools can be grouped into three major categories. The first category includes publications that introduce a novel AutoML tool and then compare it with existing ones. The second category (similar to our work) relates to the comparison of distinct tools, without proposing a new AutoML framework. Finally, the third category (less common) focuses on the characteristics of the technologies rather than their predictive performances.
Table I summarizes the related works using the following columns: Ref. - the study reference; Cat. - the AutoML study category; Dat. - the number of analyzed datasets; Tools - the number of compared AutoML tools; GML - if General ML algorithms (not DL) were tested, such as Naïve Bayes (NB), Support Vector Machine (SVM) or XGB; DL - if DL was included in the comparison; Ext. - the external validation method used (if any); C. - if computational effort was measured; and Description - a brief explanation of the comparison approach. The majority of the related works (14 studies) are from the year 2020, which confirms that AutoML tool comparison is a hot research topic. Some studies explore a large number of datasets [4], [5]. Our comparison adopts 12 datasets, which is below the two mentioned works but still higher than the number used in eleven other studies (e.g., [6], [7]). More importantly, we consider eight AutoML technologies, a number only surpassed by [8] (which tested only one dataset) and [9] (which did not use any datasets).

In particular, we benchmark the following recent tools: Auto-PyTorch - only studied in [10] and compared in [9]; rminer - not considered by the related works; and TransmogrifAI - only compared in [11]. Most works target GML. There are four studies that only address DL (e.g., [6], [12]). Similar to our approach, there are seven studies that consider both GML and DL. Of the 21 surveyed works, only 12 employ an external validation. Most of these studies (8 of 12) use a single holdout train/test split, which is less robust than a 10-fold cross-validation (adopted in four works). In addition, only 9 studies measure the computational effort. Furthermore, few studies contrast the AutoML results with the best human-configured results. Kaggle competition results were included in [6], [13], [14]. This work adopts open science (OpenML) best results, which was only performed in [15].
TABLE I
SUMMARY OF THE RELATED WORK (AUTOML TOOL COMPARISON).
Year Ref. Cat. Dat. Tools GML DL Ext. C. Description
2019 [16] 1 8 5 X new AutoML tool
2019 [6] 1 2 4 X HO X new AutoML tool
2019 [17] 1 53 4 X X 10CV X new AutoML tool
2019 [18] 2 39 4 X X AutoML benchmark
2019 [19] 2 5 3 X X AutoML benchmark
2019 [3] 2 n.d. n.d. X X HO X AutoML competition
2019 [4] 2 300 6 X X HO AutoML benchmark
2020 [5] 1 175 2 X HO X new AutoML tool
2020 [7] 1 3 2 X new AutoML tool
2020 [13] 1 50 6 X 10CV X new AutoML tool
2020 [20] 1 39 2 X 10CV X new AutoML tool
2020 [21] 1 5 2 X HO X new AutoML tool
2020 [12] 1 3 2 X HO new AutoML tool
2020 [22] 1 130 3 X X new AutoML tool
2020 [10] 1 8 4 X X new AutoML tool
2020 [15] 2 12 4 X AutoML benchmark
2020 [11] 2 3 2 X HO X AutoML benchmark (risk management)
2020 [14] 2 137 5 X 10CV survey and benchmark
2020 [8] 3 1 12 X HO literature review
2020 [23] 3 0 7 X qualitative comparison
2020 [9] 3 0 18 X X qualitative comparison
this work - 2 12 8 X X 10CV X benchmark
n.d. - not disclosed.
10CV - 10-fold Cross-Validation (CV).
HO - Hold-Out (HO) validation.
III. MATERIALS AND METHODS
A. AutoML Tools
This study compares eight recent open-source AutoML tools. Whenever possible, all tools were executed with their default values, in order to prevent any bias towards a particular tool, while also corresponding to a natural non-ML-expert choice. When available in the tool documentation, we show the number of hyperparameters (H) tuned by each AutoML tool.
1) Auto-Keras: a Python library based on the Keras module that focuses on automatic DL Neural Architecture Search (NAS) [24]. The search is performed using Bayesian Optimization, with the tool automatically tuning the number of dense layers, units, type of activation functions used, dropout values and other DL hyperparameters. In this work, we adopt Auto-Keras version 1.0.7, which is used in the DL scenario (Section IV).
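As an illustration (not part of the original benchmark code), a minimal Auto-Keras 1.0.x usage sketch for tabular data might look as follows; the data variables are placeholders.

    import autokeras as ak

    # Structured (tabular) data classifier; max_trials bounds the NAS search
    clf = ak.StructuredDataClassifier(max_trials=10, overwrite=True)
    clf.fit(x_train, y_train, epochs=100)   # x_train/y_train: placeholder arrays
    y_pred = clf.predict(x_test)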
2) Auto-PyTorch: another AutoML tool specifically focused on NAS [10]. Auto-PyTorch version 0.0.2 uses the PyTorch framework and a multi-fidelity optimization to search the parameters of the best architecture (e.g., network type, number of layers, activation function). Similarly to Auto-Keras, we use Auto-PyTorch only in the second DL scenario.
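For reference, a hedged sketch of how the early 0.0.x Auto-PyTorch releases were typically invoked; the entry-point name, preset string and arguments below are assumptions and should be checked against the version 0.0.2 documentation.

    from autoPyTorch import AutoNetClassification  # assumed 0.0.x entry point

    # "tiny_cs" is assumed to be one of the predefined search-space presets
    autonet = AutoNetClassification("tiny_cs", max_runtime=3600, log_level="info")
    autonet.fit(X_train, y_train, validation_split=0.25)  # placeholder arrays
    y_pred = autonet.predict(X_test)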
3) Auto-Sklearn: an AutoML library built on top of the Scikit-Learn ML framework. The choice of algorithms and hyperparameters implemented by Auto-Sklearn takes advantage of recent advances in Bayesian optimization, meta-learning, and Ensemble Learning [25]. We use Auto-Sklearn version 0.7.0 in the first GML scenario, since it does not implement automated DL or XGB search. All ML algorithms (when available for the task type) were tested: AdaBoost (H = 4), Bernoulli (H = 2) and Multinomial NB (H = 2), Gaussian NB (H = 0), Decision Tree (DT) (H = 4), Extremely Randomized Trees (XRT) (H = 5), Gradient Boosting Machine (GBM) (H = 6), k-Nearest Neighbors (k-NN) (H = 3), Linear Discriminant Analysis (LDA) (H = 4), Linear SVM (LSVM) (H = 4), Kernel-based SVM (KSVM) (H = 7), Passive Aggressive (H = 3), Quadratic Discriminant Analysis (QDA) (H = 2), Random Forest (RF) (H = 5) and a Multiple Linear Regression (MR) classifier (H = 10).
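A minimal Auto-Sklearn sketch (illustrative only; the time budget and data variables are assumptions mirroring the setup described in Section IV):

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=3600,          # one-hour budget per fit, as in Section IV
        resampling_strategy="cv",
        resampling_strategy_arguments={"folds": 5},
    )
    automl.fit(X_train, y_train)               # placeholder arrays
    y_pred = automl.predict(X_test)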
4) AutoGluon: a Python AutoML toolkit focused on DL [26]. In this work, we consider the tabular prediction feature of AutoGluon version 0.0.13. The tabular prediction executes several ML algorithms and then returns a Stacked Ensemble that combines the distinct ML models in multiple layers. In the GML scenario (Section IV), the ensemble includes all non-DL algorithms: GBM, CatBoost Boosted Trees, RF, Extra Trees (XT), k-NN and MR. For the DL scenario, AutoGluon uses a dense DL architecture that relies on heuristics to set the hidden layer sizes, also employing ReLU activation functions, dropout regularization and batch normalization layers [26].
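A usage sketch for the legacy 0.0.x AutoGluon tabular API (the import path, argument names and file/column names below reflect that older release and are assumptions; newer versions expose autogluon.tabular.TabularPredictor instead):

    from autogluon import TabularPrediction as task  # legacy 0.0.x import path

    train_data = task.Dataset(file_path="train.csv")  # placeholder CSV with a "class" column
    predictor = task.fit(train_data=train_data, label="class", time_limits=3600)
    y_pred = predictor.predict(test_data)              # test_data: placeholder dataset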
5) H2O AutoML: the H2O open-source module for AutoML [27]. The tool adopts a grid search to perform the ML model selection. In this paper, we use H2O AutoML version 3.30.1.2 for the three comparison scenarios: GML, DL and XGB. In GML, all ML algorithms were explored (except DL): Generalized Linear Model (GLM) (H = 1), GBM (H = 8), RF (H = 0), XRT (H = 0), XGB (H = 9) and two Stacked Ensembles: Best, with only the best models per ML family; and All, with all trained algorithms. For the DL scenario, the H2O tool uses a fully connected multi-layer perceptron trained with a stochastic gradient descent back-propagation algorithm. The searched H = 7 hyperparameters include the number of hidden layers and hidden units per layer, the learning rate, the number of training epochs, the activation functions, and the input and hidden layer dropout values. Finally, for the XGB scenario, the tool tunes the same H = 9 hyperparameters as in GML.
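An illustrative H2O AutoML sketch for the GML scenario (the input frame, target column name and exclusion list are assumptions consistent with the setup described above):

    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    train = h2o.import_file("train.csv")              # placeholder dataset
    aml = H2OAutoML(max_runtime_secs=3600,            # one-hour budget per fit
                    exclude_algos=["DeepLearning"],   # GML scenario: all algorithms except DL
                    nfolds=5)
    aml.train(y="target", training_frame=train)       # "target" is a placeholder column name
    print(aml.leaderboard)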
6) rminer: an R package that aims to facilitate the use of ML algorithms [28]. In its most recent version (1.4.6), rminer implements AutoML functions. The rminer AutoML executions can be completely customized by the user, who can define the searched ML algorithms, hyperparameter ranges and validation metrics of the assumed grid search. For less experienced users, rminer includes three predefined AutoML search templates (https://CRAN.R-project.org/package=rminer). Similarly to H2O, we test this tool in the GML and XGB scenarios. In GML, we used the "automl3" template, which searches for the best model among: GLM (H = 2), Gaussian kernel SVM (H = 2 for classification and H = 3 for regression), shallow multilayer perceptron (with one hidden layer, H = 1), RF (H = 1), XGB (H = 1) and a Stacked Ensemble (H = 2, similar to the H2O Stacked Best).
7) TPOT: a tool written in Python that automates several ML phases (e.g., feature selection, algorithm selection) by using Genetic Programming [29]. The GML scenario tested all TPOT version 0.11.5 algorithms: DT, RF, XGB, (multinomial) Logistic Regression (LR) and k-NN. TPOT was not included in the third comparison scenario (XGB, Section IV) because the tool does not allow the selection of a single algorithm, such as XGB.
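A minimal TPOT sketch (illustrative; the generation/population values are assumptions, and the one-hour cap mirrors the budget described in Section IV):

    from tpot import TPOTClassifier

    tpot = TPOTClassifier(generations=100, population_size=100,
                          cv=5, scoring="roc_auc",
                          max_time_mins=60, random_state=0)
    tpot.fit(X_train, y_train)                 # placeholder arrays
    print(tpot.score(X_test, y_test))
    tpot.export("best_pipeline.py")            # exports the selected pipeline as Python code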
8) TransmogrifAI: an AutoML tool for structured data that runs on top of Apache Spark [30]. TransmogrifAI version 0.7.0 uses a grid search to select the best ML model. In the GML scenario, the tool was tested with all its ML algorithms: NB, DT, Gradient Boosted Trees (GBT), RF, MR, LR and LSVM.
9) Summary: Table II summarizes the AutoML tools that were used. For each tool, we detail the base ML Framework, the available Application Programming Interface (API) programming Language, the compatible Operating Systems, and whether it supports DL (Auto-Keras and Auto-PyTorch only address DL).
B. Data
The analyzed datasets (Table III) were retrieved from OpenML [31]. The selection criterion was to choose the most downloaded datasets that did not include missing data and that reflected three supervised learning tasks: regression, binary and multi-class classification. The datasets reflect different numbers of instances (Rows), input variables (Cols.) and output target response values (Classes/levels, from 2 to 257; the last column details the Target domain values).
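These datasets can be fetched programmatically, for example with scikit-learn's OpenML loader (the dataset name and version below are placeholders, not the exact identifiers used in the paper):

    from sklearn.datasets import fetch_openml

    # e.g., one of the binary classification tasks listed in Table III
    X, y = fetch_openml(name="diabetes", version=1,
                        as_frame=True, return_X_y=True)
    print(X.shape, y.value_counts())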
IV. BENCHMARK DESIGN
The comparison study assumes three main scenarios (Table II). The first GML scenario executes all ML algorithms from the AutoML tools except DL, aiming to perform a more horizontal, ML-family-agnostic search. DL was discarded since some of the tools do not implement DL (Table II), the training of DL models often requires a higher computational effort, and the second scenario is exclusively devoted to DL. The second DL scenario focuses on NAS, as implemented by the Auto-Keras, Auto-PyTorch, AutoGluon and H2O AutoML tools. Finally, the third scenario is more vertical, considering only the XGB algorithm.
TABLE II
DESCRIPTION OF THE COMPARED AUTOML TOOLS.
AutoML Tool    | Framework     | API Lang.       | Operating Systems          | DL         | Scenario
Auto-Keras     | Keras         | Python          | MacOS, Linux, Windows      | Yes (only) | DL
Auto-PyTorch   | PyTorch       | Python          | MacOS, Linux, Windows      | Yes (only) | DL
Auto-Sklearn   | Scikit-Learn  | Python          | Linux                      | No         | GML
AutoGluon      | PyTorch       | Python          | MacOS (P.), Linux          | Yes        | GML, DL
H2O AutoML     | H2O           | Java, Python, R | MacOS, Linux, Windows (P.) | Yes        | GML, DL, XGB
rminer AutoML  | rminer        | R               | MacOS, Linux, Windows      | No         | GML, XGB
TPOT           | Scikit-Learn  | Python          | MacOS, Linux, Windows      | No         | GML
TransmogrifAI  | Spark (MLlib) | Scala           | MacOS, Linux, Windows      | No         | GML
P. - partially supported (with less capabilities).
TABLE III
DESCRIPTION OF THE SELECTED OPENML DATASETS.
Dataset         | Task        | Rows | Cols. | Classes/levels | Target values
Cholesterol     | regression  | 303  | 14    | 152            | [126, 564]
Churn           | binary      | 5000 | 21    | 2              | {0,1}
Cloud           | regression  | 108  | 7     | 94             | [0, 6]
Cmc             | multi-class | 1473 | 10    | 10             | {0,1,...,9}
Credit          | binary      | 1000 | 21    | 2              | {0,1}
Diabetes        | binary      | 768  | 9     | 2              | {0,1}
Dmft            | multi-class | 797  | 5     | 6              | {0,1,...,5}
Liver disorders | regression  | 345  | 6     | 16             | [0, 20]
Mfeat           | multi-class | 2000 | 7     | 10             | {0,1,...,9}
Plasma          | regression  | 315  | 14    | 257            | [179, 1727]
Qsar            | binary      | 1055 | 42    | 2              | {0,1}
Vehicle         | multi-class | 846  | 19    | 4              | {0,1,...,3}
XGB was selected since it is a recently proposed non-DL algorithm that includes a large number of hyperparameters (e.g., the H2O documentation mentions 40 hyperparameters, of which only H = 9 are tuned). In this scenario, we test H2O and rminer, since they are the AutoML tools that allow running the single XGB algorithm.
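For instance, with H2O the search can be restricted to XGBoost alone via the include_algos option (a sketch under the same assumptions as the H2O example above; not the paper's exact code):

    from h2o.automl import H2OAutoML

    # Third scenario: tune only the XGBoost algorithm within the one-hour budget
    aml_xgb = H2OAutoML(max_runtime_secs=3600,
                        include_algos=["XGBoost"],
                        nfolds=5)
    aml_xgb.train(y="target", training_frame=train)  # reuses the placeholder frame from above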
For every predictive experiment, the datasets were equally divided into ten folds, used for the external cross-validation. In order to create validation sets (to select the best ML algorithms and hyperparameters), we adopted an internal 5-fold validation. For instance, if the data contains 100 instances, then in the first external 10-fold iteration 90 examples are used by the tool for fitting purposes (model selection and training), with the remaining 10 instances being used for the external testing. The 90 fit examples are further divided into 5 folds. In the first internal fold, each ML algorithm is trained with 72 instances and 18 are used for validation purposes (allowing the best model to be selected). Since neither Auto-Keras nor Auto-PyTorch natively support cross-validation during the fitting phase, we used a simpler holdout train (75%) and test (25%) split to select and fit the models.
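A minimal sketch of this nested evaluation scheme (the fit_automl and score callables are hypothetical stand-ins for an AutoML tool's fit routine and a task-specific metric; X and y are assumed to be numpy arrays):

    import numpy as np
    from sklearn.model_selection import KFold

    def nested_evaluation(X, y, fit_automl, score):
        # External 10-fold cross-validation, used only for test-set estimates
        outer = KFold(n_splits=10, shuffle=True, random_state=0)
        test_scores = []
        for fit_idx, test_idx in outer.split(X):
            X_fit, y_fit = X[fit_idx], y[fit_idx]      # e.g., 90 of 100 instances
            X_test, y_test = X[test_idx], y[test_idx]  # remaining 10 instances
            # The AutoML tool is assumed to run its own internal 5-fold validation
            # (72/18 splits in the 100-instance example) to select the best model
            model = fit_automl(X_fit, y_fit, inner_folds=5)
            test_scores.append(score(y_test, model.predict(X_test)))
        return np.mean(test_scores)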
In all three scenarios, the same measures are used to evaluate the performance of the external 10-fold test set predictions. Popular prediction measures were selected: regression - Mean Absolute Error (MAE) ([0.0, ∞[, where 0.0 denotes a perfect predictor); binary classification - Area Under the receiver operating characteristic Curve (AUC) ([0.0, 1.0], where 1.0 denotes the ideal classifier); multi-class classification - Macro F1-score ([0.0, 1.0], where 1.0 denotes the perfect model). Whenever allowed by the AutoML tool, we adopted the same measures for the internal AutoML validation set model comparison. The exceptions were the multi-class datasets and the Auto-Keras and Auto-PyTorch tools, which do not allow using a Macro F1-score for validation; thus, the default loss function was adopted for these tools.
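These measures map directly to standard scikit-learn metrics; a short sketch (the y_* variables are placeholders for true labels, predicted labels and predicted class probabilities):

    from sklearn.metrics import mean_absolute_error, roc_auc_score, f1_score

    mae = mean_absolute_error(y_true_reg, y_pred_reg)           # regression: 0.0 is a perfect predictor
    auc = roc_auc_score(y_true_bin, y_prob_bin)                 # binary: 1.0 is the ideal classifier
    macro_f1 = f1_score(y_true_mc, y_pred_mc, average="macro")  # multi-class: 1.0 is the perfect model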
All experiments were executed using an Intel Xeon 1.70GHz server with 56 cores and 2TB of disk space. For each external fold, we also recorded the computational effort (in terms of elapsed time) of the AutoML fit (model selection and training). When the AutoML tool allowed specifying a time limit for training, the chosen time was one hour (3,600 s). Also, for the tools that implement an early stopping AutoML parameter, we fixed the value at three rounds. To aggregate the distinct external 10-fold results, we compute the average values. We also provide the 10-fold average t-distribution 95% confidence intervals, which can be used to check whether the tool differences are statistically significant (e.g., by verifying that two confidence intervals do not overlap). Nevertheless, given that there is a very large number of comparisons, to select the best tool for each task we adopt a lexicographic approach [32], which considers first the best average predictive performance (with a precision up to 1% or 0.01 points) and then the average computational effort (precision in s). To facilitate the lexicographic regression analysis, we compute the Normalized MAE (NMAE) score, a scale-independent measure defined as NMAE = MAE / (max(y) - min(y)), where y denotes the output target.
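A small sketch of these aggregation and selection steps (illustrative helper functions, not the paper's code; results is assumed to map each tool name to its average score and average fit time):

    import numpy as np
    from scipy import stats

    def nmae(y_true, y_pred):
        """Scale-independent NMAE = MAE / (max(y) - min(y))."""
        mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
        return mae / (np.max(y_true) - np.min(y_true))

    def ci95(fold_scores):
        """Half-width of the t-distribution 95% confidence interval over the 10 folds."""
        s = np.asarray(fold_scores, dtype=float)
        return stats.t.ppf(0.975, df=len(s) - 1) * stats.sem(s)

    def lexicographic_best(results, higher_is_better=True, precision=0.01):
        """Best rounded average score first; ties broken by lower average fit time (s)."""
        def key(item):
            tool, (avg_score, avg_time) = item
            score = round(avg_score / precision) * precision
            return (-score if higher_is_better else score, round(avg_time))
        return min(results.items(), key=key)[0]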
V. RESULTS
Figures 1 and 2 summarize the main scenario (GML) results. In total, there were 12 (datasets) × 6 (tools) × 10 (folds) = 720 AutoML executions. Figure 1 presents the average computational effort (in s) for each external 10-fold iteration. Figure 2 shows the average external test scores (grouped in terms of the binary, multi-class and regression tasks). To facilitate the visualization of the regression scores, on the right of Figure 2 we use the NMAE score on the y-axis.
For GML, Auto-Sklearn always requires the maximum allowed computational effort (3,600 s), followed by TPOT (average of 858 s per external fold and dataset). The other tools are much faster: AutoGluon has the lowest average value (70 s) and is the best in 5 datasets; H2O has the second lowest average value (158 s) and is the best in 5 datasets; TransmogrifAI has the third best average (317 s); and rminer has the fourth best average (408 s), being the best in 2 datasets. Regarding the prediction performances, there is a high overall correlation between the validation and test scores (not shown in Figure 2, although the same effect is present in Tables IV and V) when considering all tool execution values: 0.75 for binary; 0.90 for multi-class; and 0.92 for regression. For binary classification, and when considering the test set results, the AutoML differences are smaller for churn (maximum difference of 3 percentage points - pp) and higher for the other datasets (10 pp for diabetes, 15 pp for credit and 16 pp for qsar). TransmogrifAI is the best tool in 3 of the datasets (churn, credit and qsar), also obtaining the best average AUC per dataset (88%). An almost identical average (87%) is achieved by H2O (best in churn and credit), rminer (best in diabetes) and TPOT (best in churn). AutoGluon and Auto-Sklearn produced the worst overall results (average AUCs per dataset of 78% and 80%). Turning to the multi-class tasks, the AutoML differences (best tool test result minus the worst one) are smaller when compared with the binary task: 4 pp for Cmc; 5 pp for Dmft; 6 pp for Mfeat; and 8 pp for Vehicle. The best test dataset average is obtained by AutoGluon (Macro F1-score of 58%), followed by Auto-Sklearn, H2O and TPOT (Macro F1-score of 57%), then TransmogrifAI (56%) and finally rminer (53%). In terms of datasets, the best results were: Cmc - Auto-Sklearn (54%); Dmft - TransmogrifAI (24%); Mfeat - AutoGluon, Auto-Sklearn and TPOT (74%); and Vehicle - AutoGluon and Auto-Sklearn (82%). As for the regression tasks, the AutoML tool differences for each dataset are very small, corresponding to 1 pp in terms of NMAE for all three datasets. In effect, all tools obtain the same average NMAE per dataset (9%). Using the lexicographic selection (Section IV), the GML tool recommendation is: binary - TransmogrifAI; multi-class - AutoGluon; regression - rminer.
The DL benchmark consisted of 12 (datasets) × 4 (tools) × 10 (folds) = 480 AutoML executions. Table IV shows the average DL 10-fold results (± the 95% confidence intervals) in terms of the external computational effort (Time), internal validation (Val.) and test scores (Test). The Auto-Keras and Auto-PyTorch validation scores are omitted, since they are not disclosed by the tools. Regarding execution time, AutoGluon is much faster than the other tools, requiring an average fit time of just 24 s. The second fastest DL tool is Auto-Keras (average of 984 s), followed by H2O (3,458 s) and then Auto-PyTorch (3,600 s). As for the prediction performances, the average test values per dataset are: binary (AUC) - H2O (85%), Auto-PyTorch (77%), AutoGluon (72%) and Auto-Keras (69%); multi-class (Macro F1-score) - AutoGluon (57%), Auto-PyTorch (56%), H2O (50%) and Auto-Keras (43%); regression (NMAE) - H2O (10%), Auto-PyTorch and AutoGluon (11%), Auto-Keras (13%). While only four tools are compared, larger differences among the tools were obtained for the DL scenario when compared with GML: binary - ranging from 11 pp (Qsar) to 24 pp (credit); multi-class - from 4 pp to 30 pp; and regression - from 2 pp to 10 pp.

Fig. 1. Execution time (y-axis) for the GML scenario (bars denote external 10-fold average values with 95% confidence intervals; the Auto-Sklearn values were omitted from the graph because they are always constant and equal to 3,600 s).
Fig. 2. Predictive results for the GML scenario (bars denote external 10-fold average values with 95% confidence intervals).
