Enhancing Performance Prediction Robustness
by Combining Analytical Modeling and Machine Learning
Diego Didona¹, Francesco Quaglia², Paolo Romano¹, Ennio Torre²
¹ INESC-ID / Instituto Superior Técnico, Universidade de Lisboa
² Sapienza, Università di Roma
ABSTRACT
Classical approaches to performance prediction rely on two,
typically antithetic, techniques: Machine Learning (ML)
and Analytical Modeling (AM). ML takes a black box ap-
proach, whose accuracy strongly depends on the represen-
tativeness of the dataset used during the initial training
phase. Specifically, it can achieve very good accuracy in
areas of the features’ space that have been sufficiently ex-
plored during the training process. Conversely, AM tech-
niques require no or minimal training, hence exhibiting the
potential for supporting prompt instantiation of the perfor-
mance model of the target system. However, in order to
ensure their tractability, they typically rely on a set of sim-
plifying assumptions. Consequently, AM’s accuracy can be
seriously challenged in scenarios (e.g., workload conditions)
in which such assumptions are not matched. In this paper
we explore several hybrid/gray box techniques that exploit
AM and ML in synergy in order to get the best of the two
worlds. We evaluate the proposed techniques in case stud-
ies targeting two complex and widely adopted middleware
systems: a NoSQL distributed key-value store and a Total
Order Broadcast (TOB) service.
1. INTRODUCTION
Predicting the performance of applications and systems
is a primary concern for various purposes such as capacity
planning, elastic scaling and anomaly detection. Existing
approaches to performance prediction typically rely on two,
antithetic, techniques, namely Analytical Modeling (AM)
and Machine Learning (ML).
AM has been, for decades, the reference technique to carry
out performance evaluation and prediction of computing
platforms, in a wide range of application contexts (see, e.g.,
[43, 20]). AM takes advantage of available expertise on the
This work has been supported by FCT - Fundação para a Ciência e a Tecnologia through PEst-OE/EEI/LA0021/2013, project specSTM (PTDC/EIA-EIA/122785/2010) and project GreenTM (EXPL/EEI-ESS/0361/2013).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICPE'15, Jan. 31–Feb. 4, 2015, Austin, Texas, USA.
Copyright © 2015 ACM 978-1-4503-3248-4/15/01 ...$15.00.
http://dx.doi.org/10.1145/2668930.2688047.
internal dynamics of systems and/or applications, and en-
codes such knowledge into a mathematical model aimed at
capturing how (tunable) parameters map onto performance.
AM techniques typically require no or minimal training in
order to operatively carry out predictions in the target sce-
nario, and have been shown to achieve a good overall accu-
racy. On the other hand, in order to be instantiated and/or
be made tractable, AMs typically rely on simplifying as-
sumptions on how the modeled system and/or its workload
behave. Their accuracy can hence be seriously challenged in
scenarios (i.e., areas of the features’ space or specific work-
load conditions) in which such assumptions are not matched.
ML-based modeling lies on the opposite side of the spec-
trum, given that it requires no knowledge about the tar-
get system/application’s internal behavior. Specifically, ML
takes a black box approach that relies on observing the sys-
tem’s actual behavior under different settings in order to
infer a statistical behavioral model, e.g., in terms of deliv-
ered performance. In recent years, ML techniques have become increasingly popular as tools for performance prediction of complex systems. There are two main reasons behind this trend. On one side, the ever-increasing complexity
of modern computing architectures represents a challenge for
the accuracy of existing white box modeling techniques. On
the other side, difficulties arise when employing white box
models in virtualized, multi-tenant Cloud Computing envi-
ronments, where details about the infrastructure physically
hosting the application are normally (intentionally) hidden
away from the users, restricting the possibility of employing
detailed white box models for relevant parts of the system
(e.g., the interconnection/networking infrastructure).
However, ML-based approaches are not the silver bullet
for the problem of performance prediction. Their key draw-
back is that the accuracy they can reach strongly depends
on the representativeness of the dataset used during the ini-
tial training phase. In fact, predictions targeting areas of
the features’ space that have not been sufficiently explored
during the training process have typically very poor accu-
racy [2]. Unfortunately, the space of all possible configurations for a target application grows exponentially with the number of variables (a.k.a. features, in the ML terminology) that can affect its performance, the so-called curse of dimensionality [1]. Hence, in complex systems comprising
large ecosystems of hardware and software components, the
cost of conducting an exhaustive training process, spanning
all possible input configurations, can quickly become pro-
hibitive. Overall, pure ML approaches appear as not fully
suited for contexts, like the Cloud, in which it is relevant

to promptly build models capable of determining configura-
tions that guarantee optimal performance (and consequently
resource usage).
In this paper we explore the problem of how to combine
white and black box performance modeling and prediction
methodologies by proposing and evaluating three techniques
based on the common idea of building an ensemble of differ-
ent methodologies. By exploiting AM and ML in synergy,
we aim at building a performance model that is more ro-
bust, i.e., less prone to error than a model based on any
of the two techniques implemented alone. The gray box
techniques that we propose serve this purpose in a twofold
fashion: i) by incorporating some ML component, they al-
low for increasing the prediction accuracy over time as new
data from the operational system are collected; ii) by rely-
ing on a pre-built analytical performance model, they can
be instantiated with a lower training time than conventional,
pure ML-based predictors.
In particular, we take inspiration from the literature on en-
sembles of ML models, which has been targeted at studying
how to combine multiple black box ML techniques, and pro-
pose three algorithms that allow for the synergistic use of
AM and ML models:
K Nearest Neighbors (KNN): during the learning pro-
cess, this algorithm evaluates the accuracy that can be
achieved by the selected AM model(s) of the target sys-
tem and by one (or several) black box ML approaches
(e.g., Decision Trees, Artificial Neural Networks, Sup-
port Vector Machines) in points of the features’ space
that were not included in the training sets used to
build the ML-based learners (namely, a validation set).
When used to predict the performance achievable in a
configuration c , the average error achieved by the AM
model(s) and by the ML-based learner(s) across the
K Nearest Neighbors configurations belonging to the
validation set is used to determine which prediction
method to choose.
Hybrid Boosting (HyBoost): in this technique a chain
(possibly of length one) of ML algorithms is used to
learn the residual errors of some AM. The intuition is
that the function that characterizes the error of the
AM may be learned more easily than the original tar-
get function that describes the relation between input
and output variables. With this approach, the actual
performance prediction in operative phases is based
on the output by AM, adjusted by the error corrector
function.
Probing (PR): The idea at the basis of this algorithm
is to use ML to perform predictions exclusively on the
regions of the features’ space in which the AM does
not achieve sufficient accuracy (rather than across the
whole space). To this end two learners are exploited.
Initially a classifier is used to learn in which regions
of the features’ space the AM incurs a prediction error
larger than some predetermined threshold. In these
regions, a second black box regressor is trained to learn
the desired performance function.
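For concreteness, the region-based dispatching at the core of Probing can be sketched as follows; the toy one-dimensional models, names, and threshold are illustrative assumptions, not the implementation evaluated in this paper:

```python
# Hedged sketch of the Probing (PR) scheme: a classifier routes each query
# either to the analytical model (AM) or to a regressor trained only on the
# regions where the AM is known to be inaccurate.

def make_probing_predictor(am, classifier, regressor):
    """Return a predictor that uses the ML regressor only where the
    classifier says the AM's error exceeds the chosen threshold."""
    def predict(x):
        if classifier(x):          # True => AM deemed inaccurate here
            return regressor(x)    # black box regressor covers this region
        return am(x)               # elsewhere, trust the analytical model
    return predict

# Toy usage: the AM is accurate for x < 10; a regressor trained on the
# high-error region takes over for x >= 10.
am = lambda x: 2.0 * x
classifier = lambda x: x >= 10          # would be learned from AM residuals
regressor = lambda x: 2.0 * x + 5.0    # trained only on the high-error region
pred = make_probing_predictor(am, classifier, regressor)
print(pred(4))   # 8.0  (AM)
print(pred(12))  # 29.0 (ML regressor)
```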
All of the above algorithms allow for reducing the per-
formance model instantiation time compared to pure ML
techniques. In fact, either (i) the employed ML predictors
do not need to reach extremely high precision across the
whole features’ space given that they are complemented
by white-box predictors (as it occurs in KNN) that can nor-
mally provide good accuracy in broad areas of the features’
space; or (ii) they are targeted at estimating a function,
namely the error curve associated with AM, which can be
simpler (i.e., require fewer samples) to learn than the actual
performance function (as it occurs in HyBoost); or (iii) they
need to be trained only in circumscribed regions of the fea-
tures’ space (as it occurs in PR), which again can reduce
the number of samples to be observed during the training
phase.
Also, the structure of the framework is open to the possi-
bility of using a family of AM techniques of recent interest
(see, e.g., [12]), where parametric meta-models (requiring
fewer assumptions on the target system than classical ana-
lytical models, hence widening their applicability) are quickly trained, in order to give rise to the actual AM instance suited
for the target system. This has been shown to be doable
by relying on a very reduced amount of samples of the real
system behavior. Hence, the same training data used for the ML models envisioned in our framework (or a reduced portion thereof) could also be used to carry out the meta-model training phase.
We assess the validity of our proposal through an ex-
tensive experimental evaluation carried out in two different
application domains: throughput prediction of a popular
open-source NoSQL distributed key-value store, Red Hat’s
Infinispan [25], and response time prediction of a total or-
der broadcast service, a key building block for fault-tolerant
replicated systems.
Our experimental results show that the best performing of
our proposed techniques can reduce the Root Mean Square
Error on average by about 40% with respect to AM and
ML, with maximum gains that extend up to a factor 3× vs
AM and 5× vs ML. On the other hand, they also show that
none of the proposed ensemble techniques outperforms all
the others in all the considered scenarios, and that their ac-
curacy is strongly dependent on the correct determination of
their internal meta-parameters. In this work we extensively
investigate this issue and we highlight various interesting
trade-offs that affect the parameters’ tuning of the proposed
algorithms.
The remainder of the paper is organized as follows. Sec-
tion 2 discusses related work. In Section 3 we provide some
background on ML techniques, which will form the basis
for the comprehension of our proposal. The three innova-
tive ensemble algorithms are presented in Section 4. The
experimentation-based evaluation of the effectiveness of our
proposals is provided in Section 5. Finally, Section 6 con-
cludes the paper.
2. RELATED WORK
The body of literature on solutions relying either on AM
or ML to predict applications’ performance is extremely
vast [29, 10, 35, 23, 8, 39, 42, 40]. On the other hand,
to the best of our knowledge, only a few approaches rely on
the synergistic exploitation of AM and ML. We group them
in the discussion depending on how the combination of the
two techniques is achieved.
Estimate and model. These works rely on ML to per-
form workload characterization and to estimate the service
demand of the requests in the system. Next, this information is used to instantiate an AM, e.g., based on queuing
theory. Techniques employed to identify the parameters’
values for the AM include regression [44, 9], clustering [34],
Genetic Programming [18] or a combination of Kalman Fil-
ters and autoregressive models [45]. As ML is only employed
to characterize the workload, the accuracy of these solutions
is ultimately dependent on the accuracy of the adopted AM
technique. The ensemble techniques proposed in our work,
on the other hand, rely on ML to correct the inaccuracies of
an analytical model, and can hence improve accuracy over
time, as new sampling data is collected from the system be-
ing modeled.
Divide and conquer. This technique consists in building
performance models of individual parts of the entire system,
which are either based on AM or on ML. The sub-models are
then combined according to some formula in order to achieve
the prediction curve of the system as a whole. We find ap-
plications of this technique in the context of performance
modeling of distributed transactional applications [14, 16]
and response time prediction of Map-Reduce jobs [22]. In
the former case, AM is employed to capture the effects of
data and CPU contention on performance, whereas ML is
employed to forecast response time of network-bound oper-
ations. In the latter one, AM is exploited to compute some
performance metrics that are input features for the ML pre-
dictor.
The solution we propose in this work is fully complemen-
tary with respect to the divide and conquer approach. In
fact, performance predictors resulting from the adoption of
this technique can still show the limitations typical of the
base AM and ML techniques at their core (resp. inaccuracies
due to approximations and lengthy training phases). Our so-
lution is specifically aimed at mitigating such limitations, by
relying on ensembles of learners to increase accuracy (e.g.,
by discarding the output of some AM/ML predictor in spe-
cific operating points) while jointly reducing the cost of the
training process. We demonstrate the effectiveness of our
proposal by considering the divide and conquer-based model
presented in [16] as the reference performance predictor for
the NoSQL transactional platform case study.
Bootstrapping. This technique, which has been applied
in various contexts ranging from automatic resource provi-
sioning to anomaly detection, consists in relying on an AM
predictor to generate an initial synthetic training set for the
ML, with the purpose of avoiding the initial, long profiling
phase of the target application under different settings [15].
Then, the ML is retrained over time in order to incorpo-
rate the knowledge coming from samples collected from the
operational system [38, 32, 37, 33].
With respect to this solution, which only employs the AM to generate the initial training set for the ML, our ensemble-
based forecasting techniques maintain the AM as a base pre-
dictor, and exploit different ML-based techniques to train
complementary black box models aimed at correcting the
AM’s inaccuracies.
In a previous work [13], we have explored the possibility
to infer at runtime, via a single ML, a corrective function
that, applied to the output of some AM predictor, is able to
increase the overall accuracy. The HyBoost ensemble that
we propose in this work improves over that solution, partic-
ularly by allowing for the combination of multiple learners
to compensate for the error of the base AM.
Generally speaking, one (additional) common shortcom-
ing of the above discussed literature solutions is that they
rely on a single ML in combination with an AM. This repre-
sents a major limitation to the degree of accuracy and pre-
dictive power that AM and ML, combined, could achieve: in
fact, several independent results in the ML field identify in
models’ diversity and heterogeneity the key means to build
a robust and accurate model with low training time [17, 4].
Our results back and extend this claim: by investigating different techniques for combining white box and black box models, relying in turn on the exploitation of several MLs, not only do we assess the benefits of combining the two techniques, but we also show evidence that there is no single hybrid ensemble model that always outperforms the others.
Finally, it is worth noting that nothing prevents our framework from being used to combine ML with other kinds of white box predictors, such as simulation models. Although these are generally considered more expensive (in terms of solution time) than AM ones, the vast literature on high performance parallel simulation provides good support for instantiating simulators that can promptly evaluate the behavior of complex systems (thanks to the speedups achievable via parallel runs [11, 3]). This would make available white box simulation models with features that are still complementary to ML ones, such as reduced instantiation time, hence preserving the possibility of reaching the actual targets of our proposal when such models are employed as an alternative to AM in the presented ensemble algorithms.
3. BACKGROUND ON ML MODELING
Before presenting the proposed gray box ensemble tech-
niques, we recall some basic concepts on ML-based tech-
niques and introduce terminology that will be used in the
remainder of the paper.
From a mathematical perspective, a ML algorithm, noted γ, is a function defined over a set, called training set and noted D_tr = {<x, y>}, where x = <x_1, ..., x_n> is a point in an n-dimensional space, called features' space and noted F, and y is the value of some unknown function φ : F → C. In this paper we consider the case in which the co-domain C of function φ is the set R of real numbers, namely we consider a regression problem. The proposed techniques can, however, be straightforwardly adapted to cope with problems, known under the name of classification problems, in which the co-domain of φ is discrete.
The output of a ML algorithm γ is a function, called model and noted Γ, which represents an approximator of function φ over the features' space F. More precisely, a model Γ : F → C takes as input a point x ∈ F, possibly not observed in D_tr, and returns a value ŷ ∈ C. The process of building a model using a ML algorithm γ over a given training set is also called training phase.
The literature on ML has proposed a number of alternative statistical approaches to infer the model Γ given a training set D_tr, like Decision Trees (DT), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Independently of the specific approach used to derive Γ, these techniques pursue the same objective: minimizing the error of Γ on the training set, while preserving the ability to generalize the information observed during the training phase in order to provide accurate estimations of φ even in regions of the features' space that were not observed during the training phase.
Various definitions of error can be adopted to evaluate this trade-off and, more in general, the accuracy of a prediction model (independently of whether it adopts a black or white box methodology). In this paper we adopt as error function the Root Mean Square Error (RMSE), whose definition we recall in the following. Given a set of actual values y_i ∈ Y and of corresponding predictions ŷ_i ∈ Ŷ, with ŷ_i, y_i ∈ C, the RMSE of Ŷ with respect to Y is defined as:

RMSE(Ŷ, Y) = √( Σ_{ŷ_i ∈ Ŷ} (ŷ_i − y_i)² / |Ŷ| )
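As a concrete illustration, the RMSE above translates directly into code (a minimal sketch; the function name is ours):

```python
import math

def rmse(y_hat, y):
    """Root Mean Square Error between predictions y_hat and actuals y."""
    assert len(y_hat) == len(y) and len(y) > 0
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(y_hat, y)) / len(y))

# Example: errors of 1 and 2 -> sqrt((1 + 4) / 2)
print(rmse([2.0, 5.0], [1.0, 3.0]))  # ~1.5811
```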
4. GRAY BOX ENSEMBLE ALGORITHMS
In this Section we present the three different algorithms that exploit ML techniques in ensemble with a white box analytical model, denoted as Γ_AM. Before presenting the proposed techniques, we provide a generic mathematical formalization of Γ_AM.
Analogously to a ML-based model, an analytical model Γ_AM is a function F_AM → C, which can be queried to predict the performance of the modeled system, ŷ = Γ_AM(x), over a given configuration x ∈ F_AM. For simplicity, we will assume in the following that F_AM = F_ML and refer to both simply using the notation F. In other words, we assume that the domain F_AM over which the analytical model Γ_AM is defined coincides with the features' space, noted F_ML, used by the ML techniques that will be used to learn a correction function for Γ_AM. In practice, this assumption is not strictly required: we simply require that the variables defining the features' space are observable, i.e., that they can be measured in the target system. For instance, the white box model Γ_AM may actually use a smaller subset of the variables defining the features' space of the black box learners used in ensemble with it. This could happen, for instance, if the AM did not account for a set of parameters, say P ⊄ F_AM, whose effects on the system's performance may be too hard to model explicitly via analytical models. The parameters in P could, however, be incorporated in the features' space F_ML, so as to take their values into account when learning the target function.
The key difference of an analytical model Γ_AM with respect to a ML-based model Γ is that the latter is obtained by running a ML algorithm over a training set D_tr (i.e., Γ = γ(D_tr)). Hence, whenever new observations are incorporated in the training set, yielding an updated training set D'_tr ⊇ D_tr, an updated version of the ML-based model, Γ' = γ(D'_tr), can be computed by training the ML-based learner on D'_tr.
Conversely, an analytical model Γ_AM incorporates a priori domain knowledge on the target system; it does not require a training phase, nor can it be dynamically updated. In other words, we consider the analytical model Γ_AM to be a static/immutable object, which cannot be updated based on the feedback obtained from the target system.
One may note that analytical models typically rely on a
number of internal parameters, which can be used to cal-
ibrate the model’s output. Such parameters could be up-
dated, via fitting techniques [26], in order to minimize the
error achieved by the AM over the set of performance sam-
Algorithm 1 K Nearest Neighbors
 1: Set Γ = ∅                                 ▷ Set of models to use
 2: Set γ = {γ_1, ..., γ_M}                   ▷ Set of ML regressors
 3: Set D_val = ∅                             ▷ Validation set
 4:
 5: function init(Analytical Model Γ_AM, Training Set D_tr)
 6:   Γ = {Γ_AM}                              ▷ Initialize with the AM model
 7:   ▷ Build the training set for the ML regressors
 8:   Set D_regr = StratifiedSample(D_tr)
 9:   ▷ Use a disjoint data set as validation set
10:   D_val = D_tr \ D_regr
11:   for m = 1 ... M do
12:     Γ_m = γ_m(D_regr)                     ▷ Train the m-th regressor
13:     Γ = Γ ∪ {Γ_m}
14:   end for
15: end function
16: function forecast(x_s)
17:   Set D_k = {<x_i, y_i> ∈ KNN(x_s, D_val) s.t. ||x_i − x_s|| < c}
18:   for each Γ_i ∈ Γ do
19:     RMSE[i] = compute RMSE of model Γ_i on the set D_k
20:   end for each
21:   µ = argmin_i RMSE[i]                    ▷ Find the learner with lowest RMSE
22:   return Γ_µ(x_s)
23: end function
ples gathered over time from the target system. Also, as discussed in Section 2, gray box performance modeling techniques based on the divide-and-conquer approach couple analytical and ML-based models targeting different, but dependent, subcomponents of the system. Whenever the ML-based models are updated, this leads to changes of the input parameters of the white box analytical models. From this perspective, hence, these gray box techniques can be seen as equivalent to white box analytical models whose internal parameters can be dynamically adjusted.
It is worth noting that, by assuming the analytical model Γ_AM to be an immutable object, we ensure that the proposed techniques can also be employed in case Γ_AM can be dynamically updated: to this end, it simply suffices to treat the updated white box model Γ'_AM as a new/different model. Conversely, had we not imposed such an assumption, we might have admitted techniques (e.g., ensemble techniques designed for "re-trainable" ML-based learners) that would not be applicable in case the analytical model were actually static.
As already mentioned, we present in the following three
ensemble techniques that pursue the same objectives (min-
imizing training time and achieving an accuracy better or
comparable to that of both black and white box techniques)
using different algorithmic approaches. In the light of the
above considerations, the proposed techniques can be seen as
instances of ensemble techniques for ML-based learners, spe-
cialized for the case in which one of the learning algorithms
in the ensemble always outputs the same model, namely the
one coded in the AM formulas, which is essentially indepen-
dent of the actual ML training set.
4.1 K Nearest Neighbors
The pseudo-code of the first presented technique, which
we call K Nearest Neighbors (KNN), is reported in Algo-
rithm 1. This technique relies on an analytical model, noted Γ_AM, and on a set γ of M alternative prediction models, noted γ_1, ..., γ_M. The predictors in γ should be selected to maximize model diversity, which can be achieved in various ways. A first technique consists in considering different ML algorithms, e.g., DT and ANN. One can also train each learner γ_i using a different training set, with the purpose of specializing the various models to predict performance in different regions of the features' space. Model diversity can also be promoted by using different analytical models (focused on capturing different systems' dynamics), or even alternative modeling techniques such as simulation.
The KNN algorithm is initialized via the Init function, by providing Γ_AM and a data set of samples, D_tr = {<x_i, y_i>}, which conveys information on the performance y_i ∈ C of the target system over a set of observed configurations x_i ∈ F. The data set D_tr is not entirely used to train the set Γ of regressors. Conversely, D_tr is split into two disjoint data sets, namely D_regr and D_val.
D_regr is used as the training set for the learners in Γ, and it should be obtained by extracting a random subset amounting to a percentage p_regr of D_tr. In order to enhance the representativeness of the samples included in D_regr, the extraction of D_regr from D_tr is performed by means of the stratified sampling technique [2], which ensures that the distribution of the values y_i ∈ C is the same in the two sets.
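The D_regr/D_val split can be sketched as follows; this stdlib-only illustration assumes unique samples and uses equal-width binning of the y values as a simple stratification criterion (any stratification of the y distribution would do):

```python
import random

def stratified_split(d_tr, p_regr, n_bins=5, seed=0):
    """Split D_tr into (D_regr, D_val): sample a fraction p_regr from each
    stratum (bin of y values) so both sets share the same y distribution.
    Simplified sketch; equal-width binning is an assumption."""
    ys = [y for _, y in d_tr]
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / n_bins or 1.0
    strata = [[] for _ in range(n_bins)]
    for x, y in d_tr:
        i = min(int((y - lo) / width), n_bins - 1)
        strata[i].append((x, y))
    rng = random.Random(seed)
    d_regr = []
    for s in strata:
        rng.shuffle(s)
        d_regr += s[:round(len(s) * p_regr)]
    d_val = [p for p in d_tr if p not in d_regr]  # assumes unique samples
    return d_regr, d_val

# Example: 100 samples whose y values cover 10 levels uniformly.
d_tr = [((i,), float(i % 10)) for i in range(100)]
d_regr, d_val = stratified_split(d_tr, p_regr=0.5)
print(len(d_regr), len(d_val))  # 50 50
```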
D_val is obtained as the complement of D_regr in D_tr, which ensures the disjointness of the two sets D_regr and D_val by construction. D_val is used at query time (function Forecast), when one wants to predict the expected performance of the target system, noted y_s, in the configuration x_s. To this end, we first compute the set D_k that contains the k nearest neighbors {x_1, ..., x_k} ⊆ D_val within distance c from the point x_s. The samples in D_k, for which the corresponding actual performance values are also available, are then used to compute the average accuracy of each of the models in the set Γ (Line 19). This allows for determining the model, noted Γ_µ in the pseudo-code (Line 21), which is expected to maximize prediction accuracy in the region surrounding x_s. Based on this geometrical interpretation, the c parameter can be seen as a cut-off threshold, which allows discarding samples of the validation set that are too far away from x_s and which may hence not be representative of the target configuration x_s.
The relevance of ensuring the disjointness of D_val and D_regr can be understood by recalling that the samples in D_regr are used to train the regressors in Γ. Estimating the accuracy of these models using the same samples that were used to derive them during the training phase would lead to significantly overestimating the accuracy of, so called, over-fitted models, i.e., models that minimize (or even nullify) the error with respect to the configurations observed during the training phase, but which are unable to generalize and thus incur large errors even in regions in the proximity of points contained in the training set.
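Putting the pieces together, the Forecast step of Algorithm 1 can be sketched as follows; the Euclidean distance, toy models, and data are illustrative assumptions:

```python
import math

def knn_forecast(x_s, models, d_val, k=3, c=float("inf")):
    """Pick the model with the lowest RMSE over the k nearest validation
    samples within distance c of x_s, then predict at x_s with it.
    Minimal sketch of Algorithm 1's Forecast; Euclidean distance assumed."""
    dist = lambda a, b: math.dist(a, b)
    neigh = sorted(d_val, key=lambda p: dist(p[0], x_s))[:k]
    neigh = [(x, y) for x, y in neigh if dist(x, x_s) < c]
    if not neigh:  # fall back to the single nearest sample
        neigh = sorted(d_val, key=lambda p: dist(p[0], x_s))[:1]
    best = min(models, key=lambda m: math.sqrt(
        sum((m(x) - y) ** 2 for x, y in neigh) / len(neigh)))
    return best(x_s)

# Toy usage: near x_s the AM (x -> 2x) fits the validation data better than
# a miscalibrated regressor, so the AM's prediction is returned.
am = lambda x: 2.0 * x[0]
ml = lambda x: 2.0 * x[0] + 3.0
d_val = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
print(knn_forecast((2.5,), [am, ml], d_val))  # 5.0
```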
4.2 Hybrid Boosting
The second algorithm we present applies a well-known
technique from the literature on ensembles of black box
learners, which is known as Boosting [2]. In particular,
as we are considering a regression problem (whereas the
boosting technique was defined for classification problems),
we draw inspiration from the Adaptive Logistic Regression
technique [19]. This is a boosting algorithm that was orig-
inally conceived to operate with ML-based regressors, and
which we adapted to support the joint usage of one analyt-
ical model and of a set of black box learners.
Algorithm 2 Hybrid Boosting
 1: Set γ^red = {γ^red_1, ..., γ^red_M}            ▷ ML regressors for residue prediction
 2: Set Γ^red = {Γ^red_1, ..., Γ^red_M}            ▷ Models for residue prediction
 3: Set Γ^per = {Γ^per_0, Γ^per_1, ..., Γ^per_M}   ▷ Models for performance prediction
 4:
 5: function init(Analytical Model Γ_AM, Training Set D_tr)
 6:   Γ^per_0 = Γ_AM                               ▷ Set the AM as the 1st predictor
 7:   for m = 1 ... M do
 8:     D_m = ∅
 9:     for each <x_n, y_n> ∈ D_tr
10:       y_{m,n} = y_n − Γ^per_{m−1}(x_n)         ▷ Compute the residual error
11:       D_m = D_m ∪ {<x_n, y_{m,n}>}             ▷   of the previous learner
12:     end for each
13:     Γ^red_m = γ^red_m(D_m)                     ▷ Train on the residuals
14:     β_m = argmin_β Σ_{n=1..N} (y_n − (Γ^per_{m−1}(x_n) + β·Γ^red_m(x_n)))²
15:     Γ^per_m = Γ^per_{m−1} + β_m·Γ^red_m        ▷ Set the m-th predictor
16:   end for
17: end function
18: function forecast(x_s)
19:   return Γ^per_0(x_s) + Σ_{m=1..M} β_m·Γ^red_m(x_s)
20: end function
The pseudo-code of this technique, which we name Hybrid Boosting (HyBoost), is reported in Algorithm 2. In addition to the analytical model Γ_AM, also in this case we assume the availability of a set of M regressors based on machine learning techniques, which we denote γ^red. Unlike in KNN, however, these learners are not used to build alternative models of the performance of the target system. Conversely, the learners are stacked in a chain (i.e., an ordered set), and each is used to learn the error (residue) introduced by the previous learner in the chain.
More in detail, HyBoost uses two (ordered) sets of predictive models, noted Γ^red and Γ^per, composed of, respectively, M and M+1 models. The first model in Γ^red, i.e., Γ^red_1, is obtained by training the first regressor γ^red_1 with a training set D_1 that characterizes the error (defined as the difference between the actual and the predicted value) of the analytical model Γ_AM for each point in the original training set D_tr.
Any other model Γ
red
i
, with i [1, M], is trained to learn the
prediction error of the model Γ
per
i1
, which incorporates the
knowledge of the AM and of the first i1 ML-based learners
by means of the following recurrence equation (Line 15):
Γ^per_m = Γ^per_{m−1} + β_m · Γ^red_m

where β_m is a coefficient (computed in Line 14) chosen so that the cumulative training error of the resulting m-stage regressor is minimized.
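Line 14 leaves the loss function implicit; assuming, as is standard in boosting for regression, that the training error is the squared error, the one-dimensional minimization over β admits a closed form: with residuals r_n = y_n − Γ^per_{m−1}(x_n) and stage predictions h_n = Γ^red_m(x_n), the optimum is β_m = (Σ_n r_n h_n) / (Σ_n h_n²). A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def optimal_beta(residuals, stage_preds):
    """Least-squares optimal scaling for one boosting stage:
    argmin_beta sum_n (r_n - beta * h_n)^2  =  <r, h> / <h, h>."""
    r = np.asarray(residuals, dtype=float)
    h = np.asarray(stage_preds, dtype=float)
    denom = float(h @ h)
    # Degenerate stage (regressor predicts all zeros): contribute nothing.
    return float(r @ h) / denom if denom > 0 else 0.0
```

For instance, if the stage regressor reproduces the residuals exactly, the optimal scaling is β = 1; if it reproduces them up to a constant factor c, the optimum is β = c.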
The key intuition at the basis of this algorithm, as already hinted, is that learning the residual errors of an analytical model may be easier than learning the original function for which we are trying to build a robust predictor. Analogously to KNN, HyBoost can exploit machine learners based on different algorithms. Moreover, it may be further extended and optimized using well-known techniques from the literature on boosting ML algorithms, such as adaptively re-weighting the elements in the training set of the i-th learner so as to focus it on minimizing its fitting error on the samples over which the (i−1)-th learner incurred the largest errors.
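The end-to-end procedure of Algorithm 2 can be sketched as follows. This is an illustrative implementation under assumptions not fixed by the text: squared error for the β_m minimization of Line 14, sklearn decision trees standing in for the black-box regressors (the evaluation uses learners such as Cubist, ANN and SVM), and a caller-supplied callable playing the role of the analytical model Γ^AM; class and method names mirror the pseudo-code but are otherwise ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class HyBoost:
    """Hybrid Boosting sketch: the analytical model is the first-stage
    predictor; M black-box regressors are chained, each trained on the
    residual errors of the ensemble built so far (Algorithm 2)."""

    def __init__(self, analytical_model, n_stages=3):
        self.am = analytical_model   # Gamma^AM: callable mapping x -> y_hat
        self.M = n_stages
        self.residue_models = []     # Gamma^red_1 .. Gamma^red_M
        self.betas = []              # beta_1 .. beta_M

    def init(self, X, y):
        # Gamma^per_0 = Gamma^AM (Line 6).
        pred = np.array([self.am(x) for x in X])
        for _ in range(self.M):
            residuals = y - pred                       # Lines 9-12
            g = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
            h = g.predict(X)                           # Line 13
            # Closed-form least-squares beta_m (Line 14, squared loss).
            denom = float(h @ h)
            beta = float(residuals @ h) / denom if denom > 0 else 0.0
            self.residue_models.append(g)
            self.betas.append(beta)
            pred = pred + beta * h                     # Line 15
        return self

    def forecast(self, X):
        # Gamma^per_0(x) + sum_m beta_m * Gamma^red_m(x) (Line 19).
        pred = np.array([self.am(x) for x in X])
        for beta, g in zip(self.betas, self.residue_models):
            pred = pred + beta * g.predict(X)
        return pred
```

As a usage example, take a toy system whose true performance is y = 2x + sin(5x) and an "analytical model" that only captures the linear trend, am(x) = 2x: after a few stages the chained regressors absorb the sinusoidal residue, and the ensemble's training error drops below that of the analytical model alone.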
4.3 Probing
We name the last of the three presented techniques Probing, and report its pseudo-code in Algorithm 3. This approach, which to the best of our knowledge has no direct