Enhancing Performance Prediction Robustness
by Combining Analytical Modeling and Machine Learning
Diego Didona¹, Francesco Quaglia², Paolo Romano¹, Ennio Torre²
¹ INESC-ID / Instituto Superior Técnico, Universidade de Lisboa
² Sapienza, Università di Roma
ABSTRACT
Classical approaches to performance prediction rely on two,
typically antithetic, techniques: Machine Learning (ML)
and Analytical Modeling (AM). ML takes a black box ap-
proach, whose accuracy strongly depends on the represen-
tativeness of the dataset used during the initial training
phase. Specifically, it can achieve very good accuracy in
areas of the features’ space that have been sufficiently ex-
plored during the training process. Conversely, AM tech-
niques require no or minimal training, hence exhibiting the
potential for supporting prompt instantiation of the perfor-
mance model of the target system. However, in order to
ensure their tractability, they typically rely on a set of sim-
plifying assumptions. Consequently, AM’s accuracy can be
seriously challenged in scenarios (e.g., workload conditions)
in which such assumptions are not matched. In this paper
we explore several hybrid/gray box techniques that exploit
AM and ML in synergy in order to get the best of the two
worlds. We evaluate the proposed techniques in case stud-
ies targeting two complex and widely adopted middleware
systems: a NoSQL distributed key-value store and a Total
Order Broadcast (TOB) service.
1. INTRODUCTION
Predicting the performance of applications and systems
is a primary concern for various purposes such as capacity
planning, elastic scaling and anomaly detection. Existing
approaches to performance prediction typically rely on two,
antithetic, techniques, namely Analytical Modeling (AM)
and Machine Learning (ML).
AM has been, for decades, the reference technique to carry
out performance evaluation and prediction of computing
platforms, in a wide range of application contexts (see, e.g.,
[43, 20]). AM takes advantage of available expertise on the
This work has been supported by FCT - Fundação para a Ciência e a Tecnologia through PEst-OE/EEI/LA0021/2013, project specSTM (PTDC/EIA-EIA/122785/2010) and project GreenTM (EXPL/EEI-ESS/0361/2013).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICPE'15, Jan. 31–Feb. 4, 2015, Austin, Texas, USA.
Copyright © 2015 ACM 978-1-4503-3248-4/15/01 ...$15.00.
http://dx.doi.org/10.1145/2668930.2688047.
internal dynamics of systems and/or applications, and en-
codes such knowledge into a mathematical model aimed at
capturing how (tunable) parameters map onto performance.
AM techniques typically require no or minimal training in
order to operatively carry out predictions in the target sce-
nario, and have been shown to achieve a good overall accu-
racy. On the other hand, in order to be instantiated and/or
be made tractable, AMs typically rely on simplifying as-
sumptions on how the modeled system and/or its workload
behave. Their accuracy can hence be seriously challenged in
scenarios (i.e., areas of the features’ space or specific work-
load conditions) in which such assumptions are not matched.
ML-based modeling lies on the opposite side of the spec-
trum, given that it requires no knowledge about the tar-
get system/application’s internal behavior. Specifically, ML
takes a black box approach that relies on observing the sys-
tem’s actual behavior under different settings in order to
infer a statistical behavioral model, e.g., in terms of deliv-
ered performance. In recent years, ML techniques have become increasingly popular as tools for performance prediction of complex systems. There are two main reasons behind this trend. On one side, the ever-increasing complexity
of modern computing architectures represents a challenge for
the accuracy of existing white box modeling techniques. On
the other side, difficulties arise when employing white box
models in virtualized, multi-tenant Cloud Computing envi-
ronments, where details about the infrastructure physically
hosting the application are normally (intentionally) hidden
away from the users, restricting the possibility of employing
detailed white box models for relevant parts of the system
(e.g., the interconnection/networking infrastructure).
However, ML-based approaches are not the silver bullet
for the problem of performance prediction. Their key draw-
back is that the accuracy they can reach strongly depends
on the representativeness of the dataset used during the ini-
tial training phase. In fact, predictions targeting areas of
the features’ space that have not been sufficiently explored
during the training process have typically very poor accu-
racy [2]. Unfortunately, the space of all possible configurations for a target application grows exponentially with the number of variables (a.k.a. features, in the ML terminology) that can affect its performance, the so-called curse of dimensionality [1]. Hence, in complex systems comprising
large ecosystems of hardware and software components, the
cost of conducting an exhaustive training process, spanning
all possible input configurations, can quickly become pro-
hibitive. Overall, pure ML approaches appear as not fully
suited for contexts, like the Cloud, in which it is relevant

to promptly build models capable of determining configura-
tions that guarantee optimal performance (and consequently
resource usage).
In this paper we explore the problem of how to combine
white and black box performance modeling and prediction
methodologies by proposing and evaluating three techniques
based on the common idea of building an ensemble of differ-
ent methodologies. By exploiting AM and ML in synergy,
we aim at building a performance model that is more ro-
bust, i.e., less prone to error than a model based on any
of the two techniques implemented alone. The gray box
techniques that we propose serve this purpose in a twofold
fashion: i) by incorporating some ML component, they al-
low for increasing the prediction accuracy over time as new
data from the operational system are collected; ii) by rely-
ing on a pre-built analytical performance model, they can
be instantiated with a lower training time than conventional,
pure ML-based predictors.
In particular, we take inspiration from the literature on en-
sembles of ML models, which has been targeted at studying
how to combine multiple black box ML techniques, and pro-
pose three algorithms that allow for the synergistic use of
AM and ML models:
K Nearest Neighbors (KNN): during the learning pro-
cess, this algorithm evaluates the accuracy that can be
achieved by the selected AM model(s) of the target sys-
tem and by one (or several) black box ML approaches
(e.g., Decision Trees, Artificial Neural Networks, Sup-
port Vector Machines) in points of the features’ space
that were not included in the training sets used to
build the ML-based learners (namely, a validation set).
When used to predict the performance achievable in a
configuration c , the average error achieved by the AM
model(s) and by the ML-based learner(s) across the
K Nearest Neighbors configurations belonging to the
validation set is used to determine which prediction
method to choose.
Hybrid Boosting (HyBoost): in this technique a chain
(possibly of length one) of ML algorithms is used to
learn the residual errors of some AM. The intuition is
that the function that characterizes the error of the
AM may be learned more easily than the original tar-
get function that describes the relation between input
and output variables. With this approach, the actual
performance prediction in operative phases is based
on the output by AM, adjusted by the error corrector
function.
Probing (PR): The idea at the basis of this algorithm
is to use ML to perform predictions exclusively on the
regions of the features’ space in which the AM does
not achieve sufficient accuracy (rather than across the
whole space). To this end two learners are exploited.
Initially a classifier is used to learn in which regions
of the features’ space the AM incurs a prediction error
larger than some predetermined threshold. In these
regions, a second black box regressor is trained to learn
the desired performance function.
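For concreteness, the region-based dispatching at the core of Probing can be sketched as follows; the toy one-dimensional models, names, and threshold are illustrative assumptions, not the implementation evaluated in this paper:

```python
# Hedged sketch of the Probing (PR) scheme: a classifier routes each query
# either to the analytical model (AM) or to a regressor trained only on the
# regions where the AM is known to be inaccurate.

def make_probing_predictor(am, classifier, regressor):
    """Return a predictor that uses the ML regressor only where the
    classifier says the AM's error exceeds the chosen threshold."""
    def predict(x):
        if classifier(x):          # True => AM deemed inaccurate here
            return regressor(x)    # black box regressor covers this region
        return am(x)               # elsewhere, trust the analytical model
    return predict

# Toy usage: the AM is accurate for x < 10; a regressor trained on the
# high-error region takes over for x >= 10.
am = lambda x: 2.0 * x
classifier = lambda x: x >= 10          # would be learned from AM residuals
regressor = lambda x: 2.0 * x + 5.0    # trained only on the high-error region
pred = make_probing_predictor(am, classifier, regressor)
print(pred(4))   # 8.0  (AM)
print(pred(12))  # 29.0 (ML regressor)
```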
All of the above algorithms allow for reducing the per-
formance model instantiation time compared to pure ML
techniques. In fact, either (i) the employed ML predictors
do not need to reach extremely high precision across the
whole features’ space given that they are complemented
by white-box predictors (as it occurs in KNN) that can nor-
mally provide good accuracy in broad areas of the features’
space; or (ii) they are targeted at estimating a function,
namely the error curve associated with AM, which can be
simpler (i.e., require fewer samples) to learn than the actual
performance function (as it occurs in HyBoost); or (iii) they
need to be trained only in circumscribed regions of the fea-
tures’ space (as it occurs in PR), which again can reduce
the number of samples to be observed during the training
phase.
Also, the structure of the framework is open to the possi-
bility of using a family of AM techniques of recent interest
(see, e.g., [12]), where parametric meta-models (requiring
fewer assumptions on the target system than classical ana-
lytical models, hence widening their applicability) are quickly trained, in order to give rise to the actual AM instance suited
for the target system. This has been shown to be doable
by relying on a very reduced amount of samples of the real
system behavior. Hence, the same training data used for the ML models envisioned in our framework (or a reduced portion thereof) could also be used to carry out the meta-model training phase.
We assess the validity of our proposal through an ex-
tensive experimental evaluation carried out in two different
application domains: throughput prediction of a popular
open-source NoSQL distributed key-value store, Red Hat’s
Infinispan [25], and response time prediction of a total or-
der broadcast service, a key building block for fault-tolerant
replicated systems.
Our experimental results show that the best performing of
our proposed techniques can reduce the Root Mean Square
Error on average by about 40% with respect to AM and
ML, with maximum gains that extend up to a factor 3× vs
AM and 5× vs ML. On the other hand, they also show that
none of the proposed ensemble techniques outperforms all
the others in all the considered scenarios, and that their ac-
curacy is strongly dependent on the correct determination of
their internal meta-parameters. In this work we extensively
investigate this issue and we highlight various interesting
trade-offs that affect the parameters’ tuning of the proposed
algorithms.
The remainder of the paper is organized as follows. Sec-
tion 2 discusses related work. In Section 3 we provide some
background on ML techniques, which will form the basis
for the comprehension of our proposal. The three innova-
tive ensemble algorithms are presented in Section 4. The
experimentation-based evaluation of the effectiveness of our
proposals is provided in Section 5. Finally, Section 6 con-
cludes the paper.
2. RELATED WORK
The body of literature on solutions relying either on AM
or ML to predict applications’ performance is extremely
vast [29, 10, 35, 23, 8, 39, 42, 40]. On the other hand,
to the best of our knowledge, only a few approaches rely on
the synergistic exploitation of AM and ML. We group them
in the discussion depending on how the combination of the
two techniques is achieved.
Estimate and model. These works rely on ML to per-
form workload characterization and to estimate the service
demand of the requests in the system. Next, this information is used to instantiate an AM, e.g., based on queuing
theory. Techniques employed to identify the parameters’
values for the AM include regression [44, 9], clustering [34],
Genetic Programming [18] or a combination of Kalman Fil-
ters and autoregressive models [45]. As ML is only employed
to characterize the workload, the accuracy of these solutions
is ultimately dependent on the accuracy of the adopted AM
technique. The ensemble techniques proposed in our work,
on the other hand, rely on ML to correct the inaccuracies of
an analytical model, and can hence improve accuracy over
time, as new sampling data is collected from the system be-
ing modeled.
Divide and conquer. This technique consists in building
performance models of individual parts of the entire system,
which are either based on AM or on ML. The sub-models are
then combined according to some formula in order to achieve
the prediction curve of the system as a whole. We find ap-
plications of this technique in the context of performance
modeling of distributed transactional applications [14, 16]
and response time prediction of Map-Reduce jobs [22]. In
the former case, AM is employed to capture the effects of
data and CPU contention on performance, whereas ML is
employed to forecast response time of network-bound oper-
ations. In the latter one, AM is exploited to compute some
performance metrics that are input features for the ML pre-
dictor.
The solution we propose in this work is fully complemen-
tary with respect to the divide and conquer approach. In
fact, performance predictors resulting from the adoption of
this technique can still show the limitations typical of the
base AM and ML techniques at their core (resp. inaccuracies
due to approximations and lengthy training phases). Our so-
lution is specifically aimed at mitigating such limitations, by
relying on ensembles of learners to increase accuracy (e.g.,
by discarding the output of some AM/ML predictor in spe-
cific operating points) while jointly reducing the cost of the
training process. We demonstrate the effectiveness of our
proposal by considering the divide and conquer-based model
presented in [16] as the reference performance predictor for
the NoSQL transactional platform case study.
Bootstrapping. This technique, which has been applied
in various contexts ranging from automatic resource provi-
sioning to anomaly detection, consists in relying on an AM
predictor to generate an initial synthetic training set for the
ML, with the purpose of avoiding the initial, long profiling
phase of the target application under different settings [15].
Then, the ML is retrained over time in order to incorpo-
rate the knowledge coming from samples collected from the
operational system [38, 32, 37, 33].
With respect to this solution, which only employs the AM to generate the initial training set for the ML, our ensemble-
based forecasting techniques maintain the AM as a base pre-
dictor, and exploit different ML-based techniques to train
complementary black box models aimed at correcting the
AM’s inaccuracies.
In a previous work [13], we have explored the possibility
to infer at runtime, via a single ML, a corrective function
that, applied to the output of some AM predictor, is able to
increase the overall accuracy. The HyBoost ensemble that
we propose in this work improves over that solution, partic-
ularly by allowing for the combination of multiple learners
to compensate for the error of the base AM.
Generally speaking, one (additional) common shortcom-
ing of the above discussed literature solutions is that they
rely on a single ML in combination with an AM. This repre-
sents a major limitation to the degree of accuracy and pre-
dictive power that AM and ML, combined, could achieve: in
fact, several independent results in the ML field identify in
models’ diversity and heterogeneity the key means to build
a robust and accurate model with low training time [17, 4].
Our results back and extend this claim: by investigating different techniques for combining white box and black box models, relying in turn on the exploitation of several MLs, not only do we assess the benefits of combining the two techniques, but we also show evidence that there is no single hybrid ensemble model that always outperforms the others.
Finally, it is worth noting that nothing prevents our framework from being used to combine ML with other kinds of white box predictors, such as simulation models. Although these are generally considered more expensive (in terms of solution time) than AM ones, the vast literature on high performance parallel simulation provides good support for instantiating simulators that can promptly evaluate the behavior of complex systems (thanks to the speedups achievable via parallel runs [11, 3]). This would make available white box simulation models with features that are still complementary to ML ones, such as reduced instantiation time, hence preserving the possibility of reaching the actual targets of our proposal when such models are employed as an alternative to AM in the presented ensemble algorithms.
3. BACKGROUND ON ML MODELING
Before presenting the proposed gray box ensemble tech-
niques, we recall some basic concepts on ML-based tech-
niques and introduce terminology that will be used in the
remainder of the paper.
From a mathematical perspective, a ML algorithm, noted γ, is a function defined over a set, called training set and noted D_tr = {<x, y>}, where x = <x_1, ..., x_n> is a point in an n-dimensional space, called features' space and noted F, and y is the value of some unknown function φ : F → C. In this paper we consider the case in which the co-domain C of function φ is the set R of real numbers, namely we consider a regression problem. The proposed techniques can, however, be straightforwardly adapted to cope with problems, known under the name of classification problems, in which the co-domain of φ is discrete.
The output of a ML algorithm γ is a function, called model and noted Γ, which represents an approximator of function φ over the features' space F. More precisely, a model Γ : F → C takes as input a point x ∈ F, possibly not observed in D_tr, and returns a value ŷ ∈ C. The process of building a model using a ML algorithm γ over a given training set is also called training phase.
The literature on ML has proposed a number of alternative statistical approaches to infer the model Γ given a training set D_tr, like Decision Trees (DT), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Independently of the specific approach used to derive Γ, these techniques pursue the same objective: minimizing the error of Γ on the training set, while preserving the ability to generalize the information observed during the training phase in order to provide accurate estimations of φ even in regions of the features' space that were not observed during the training phase.
Various definitions of error can be adopted to evaluate this trade-off and, more in general, the accuracy of a prediction model (independently of whether it adopts a black or white box methodology). In this paper we adopt as error function the Root Mean Square Error (RMSE), whose definition we recall in the following. Given a set of actual values y_i ∈ Y and of corresponding predictions ŷ_i ∈ Ŷ, with ŷ_i, y_i ∈ C, the RMSE of Ŷ with respect to Y is defined as:

RMSE(Ŷ, Y) = √( Σ_{ŷ_i ∈ Ŷ} (ŷ_i − y_i)² / |Ŷ| )
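As a concrete illustration, the RMSE above translates directly into code (a minimal sketch; the function name is ours):

```python
import math

def rmse(y_hat, y):
    """Root Mean Square Error between predictions y_hat and actuals y."""
    assert len(y_hat) == len(y) and len(y) > 0
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(y_hat, y)) / len(y))

# Example: errors of 1 and 2 -> sqrt((1 + 4) / 2)
print(rmse([2.0, 5.0], [1.0, 3.0]))  # ~1.5811
```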
4. GRAY BOX ENSEMBLE ALGORITHMS
In this Section we present the three different algorithms that exploit ML techniques in ensemble with a white box analytical model, denoted as Γ_AM. Before presenting the proposed techniques, we provide a generic mathematical formalization of Γ_AM.
Analogously to a ML-based model, an analytical model Γ_AM is a function F_AM → C, which can be queried to predict the performance of the modeled system, ŷ = Γ_AM(x), over a given configuration x ∈ F_AM. For simplicity, we will assume in the following that F_AM = F_ML and refer to both simply using the notation F. In other words, we assume that the domain F_AM over which the analytical model Γ_AM is defined coincides with the features' space, noted F_ML, used by the ML techniques that will be used to learn a correction function for Γ_AM. In practice, this assumption is not strictly required: we simply require that the variables defining the features' space are observable, i.e., that they can be measured in the target system. For instance, the white box model Γ_AM may actually use a smaller subset of the variables defining the features' space of the black box learners used in ensemble with it. This could happen, for instance, if the AM did not account for a set of parameters, say P ⊄ F_AM, whose effects on the system's performance may be too hard to model explicitly via analytical models. The parameters in P could, however, be incorporated in the features' space F_ML, so as to take their values into account when learning the target function.
The key difference of an analytical model Γ_AM with respect to a ML-based model Γ is that the latter is obtained by running a ML algorithm over a training set D_tr (i.e., Γ = γ(D_tr)). Hence, whenever new observations are incorporated in the training set, yielding an updated training set D'_tr ⊇ D_tr, an updated version of the ML-based model, Γ' = γ(D'_tr), can be computed by training the ML-based learner on D'_tr.
Conversely, an analytical model Γ_AM incorporates a priori domain knowledge on the target system; it does not require a training phase, nor can it be dynamically updated. In other words, we consider the analytical model Γ_AM to be a static/immutable object, which cannot be updated based on the feedback obtained from the target system.
One may note that analytical models typically rely on a
number of internal parameters, which can be used to cal-
ibrate the model’s output. Such parameters could be up-
dated, via fitting techniques [26], in order to minimize the
error achieved by the AM over the set of performance sam-
Algorithm 1 K Nearest Neighbors
 1: Set Γ = ∅                                 ▷ Set of models to use
 2: Set γ = {γ_1, ..., γ_M}                   ▷ Set of ML regressors
 3: Set D_val = ∅                             ▷ Validation set
 4:
 5: function init(Analytical Model Γ_AM, Training Set D_tr)
 6:   Γ = {Γ_AM}                              ▷ Initialize with the AM model
 7:   ▷ Build the training set for the ML regressors
 8:   Set D_regr = StratifiedSample(D_tr)
 9:   ▷ Use a disjoint data set as validation set
10:   D_val = D_tr \ D_regr
11:   for m = 1 ... M do
12:     Γ_m = γ_m(D_regr)                     ▷ Train the m-th regressor
13:     Γ = Γ ∪ {Γ_m}
14:   end for
15: end function
16: function forecast(x_s)
17:   Set D_k = {<x_i, y_i> ∈ KNN(x_s, D_val) s.t. ||x_i − x_s|| < c}
18:   for each Γ_i ∈ Γ do
19:     RMSE[i] = compute RMSE of model Γ_i on the set D_k
20:   end for each
21:   µ = argmin_i RMSE[i]                    ▷ Find the learner with lowest RMSE
22:   return Γ_µ(x_s)
23: end function
ples gathered over time from the target system. Also, as discussed in Section 2, gray box performance modeling techniques based on the divide-and-conquer approach couple analytical and ML-based models targeting different, but dependent, subcomponents of the system. Whenever the ML-based models are updated, this leads to changes of the input parameters of the white box analytical models. From this perspective, hence, these gray box techniques can be seen as equivalent to white box analytical models whose internal parameters can be dynamically adjusted.
It is worth noting that, by assuming the analytical model Γ_AM to be an immutable object, we ensure that the proposed techniques can also be employed in case Γ_AM can be dynamically updated: to this end, it simply suffices to treat the updated white box model Γ'_AM as a new/different model. Conversely, had we not imposed such an assumption, we might have admitted techniques (e.g., ensemble techniques designed for "re-trainable" ML-based learners) that would not be applicable in case the analytical model were actually static.
As already mentioned, we present in the following three
ensemble techniques that pursue the same objectives (min-
imizing training time and achieving an accuracy better or
comparable to that of both black and white box techniques)
using different algorithmic approaches. In the light of the
above considerations, the proposed techniques can be seen as
instances of ensemble techniques for ML-based learners, spe-
cialized for the case in which one of the learning algorithms
in the ensemble always outputs the same model, namely the
one coded in the AM formulas, which is essentially indepen-
dent of the actual ML training set.
4.1 K Nearest Neighbors
The pseudo-code of the first presented technique, which
we call K Nearest Neighbors (KNN), is reported in Algo-
rithm 1. This technique relies on an analytical model, noted Γ_AM, and on a set γ of M alternative prediction models, noted γ_1, ..., γ_M. The predictors in γ should be selected to maximize model diversity, which can be achieved in various ways. A first technique consists in considering different ML algorithms, e.g., DT and ANN. One can also train each learner γ_i using a different training set, with the purpose of specializing the various models to predict performance in different regions of the features' space. Model diversity can also be promoted by using different analytical models (focused on capturing different systems' dynamics), or even alternative modeling techniques such as simulation.
The KNN algorithm is initialized via the Init function, by providing Γ_AM and a data set of samples, D_tr = {<x_i, y_i>}, which conveys information on the performance y_i ∈ C of the target system over a set of observed configurations x_i ∈ F. The data set D_tr is not entirely used to train the set Γ of regressors. Conversely, D_tr is split into two disjoint data sets, namely D_regr and D_val.
D_regr is used as the training set for the learners in Γ, and it should be obtained by extracting a random subset amounting to a percentage p_regr of D_tr. In order to enhance the representativeness of the samples included in D_regr, the extraction of D_regr from D_tr is performed by means of the stratified sampling technique [2], which ensures that the distribution of the values y_i ∈ C is the same in the two sets.
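The D_regr/D_val split can be sketched as follows; this stdlib-only illustration assumes unique samples and uses equal-width binning of the y values as a simple stratification criterion (any stratification of the y distribution would do):

```python
import random

def stratified_split(d_tr, p_regr, n_bins=5, seed=0):
    """Split D_tr into (D_regr, D_val): sample a fraction p_regr from each
    stratum (bin of y values) so both sets share the same y distribution.
    Simplified sketch; equal-width binning is an assumption."""
    ys = [y for _, y in d_tr]
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / n_bins or 1.0
    strata = [[] for _ in range(n_bins)]
    for x, y in d_tr:
        i = min(int((y - lo) / width), n_bins - 1)
        strata[i].append((x, y))
    rng = random.Random(seed)
    d_regr = []
    for s in strata:
        rng.shuffle(s)
        d_regr += s[:round(len(s) * p_regr)]
    d_val = [p for p in d_tr if p not in d_regr]  # assumes unique samples
    return d_regr, d_val

# Example: 100 samples whose y values cover 10 levels uniformly.
d_tr = [((i,), float(i % 10)) for i in range(100)]
d_regr, d_val = stratified_split(d_tr, p_regr=0.5)
print(len(d_regr), len(d_val))  # 50 50
```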
D_val is obtained as the complement of D_regr in D_tr, which ensures the disjointness of the two sets D_regr and D_val by construction. D_val is used at query time (function Forecast), when one wants to predict the expected performance of the target system, noted y_s, in the configuration x_s. To this end, we first compute the set D_k that contains the k nearest neighbors {x_1, ..., x_k} ⊆ D_val within distance c from the point x_s. The samples in D_k, for which the corresponding actual performance values are also available, are then used to compute the average accuracy of each of the models in the set Γ (Line 19). This allows for determining the model, noted Γ_µ in the pseudo-code (Line 21), which is expected to maximize prediction accuracy in the region surrounding x_s. Based on this geometrical interpretation, the c parameter can be seen as a cut-off threshold, which allows discarding samples of the validation set that are too far away from x_s and which may hence not be representative of the target configuration x_s.
The relevance of ensuring the disjointness of D_val and D_regr can be understood by recalling that the samples in D_regr are used to train the regressors in Γ. Estimating the accuracy of these models using the same samples that were used to derive them during the training phase would lead to significantly overestimating the accuracy of, so called, over-fitted models, i.e., models that minimize (or even nullify) the error with respect to the configurations observed during the training phase, but which are unable to generalize and thus incur large errors even in regions in the proximity of points contained in the training set.
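Putting the pieces together, the Forecast step of Algorithm 1 can be sketched as follows; the Euclidean distance, toy models, and data are illustrative assumptions:

```python
import math

def knn_forecast(x_s, models, d_val, k=3, c=float("inf")):
    """Pick the model with the lowest RMSE over the k nearest validation
    samples within distance c of x_s, then predict at x_s with it.
    Minimal sketch of Algorithm 1's Forecast; Euclidean distance assumed."""
    dist = lambda a, b: math.dist(a, b)
    neigh = sorted(d_val, key=lambda p: dist(p[0], x_s))[:k]
    neigh = [(x, y) for x, y in neigh if dist(x, x_s) < c]
    if not neigh:  # fall back to the single nearest sample
        neigh = sorted(d_val, key=lambda p: dist(p[0], x_s))[:1]
    best = min(models, key=lambda m: math.sqrt(
        sum((m(x) - y) ** 2 for x, y in neigh) / len(neigh)))
    return best(x_s)

# Toy usage: near x_s the AM (x -> 2x) fits the validation data better than
# a miscalibrated regressor, so the AM's prediction is returned.
am = lambda x: 2.0 * x[0]
ml = lambda x: 2.0 * x[0] + 3.0
d_val = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
print(knn_forecast((2.5,), [am, ml], d_val))  # 5.0
```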
4.2 Hybrid Boosting
The second algorithm we present applies a well-known
technique from the literature on ensembles of black box
learners, which is known as Boosting [2]. In particular,
as we are considering a regression problem (whereas the
boosting technique was defined for classification problems),
we draw inspiration from the Adaptive Logistic Regression
technique [19]. This is a boosting algorithm that was orig-
inally conceived to operate with ML-based regressors, and
which we adapted to support the joint usage of one analyt-
ical model and of a set of black box learners.
Algorithm 2 Hybrid Boosting
 1: Set γ^red = {γ^red_1, ..., γ^red_M}            ▷ ML regressors for residue prediction
 2: Set Γ^red = {Γ^red_1, ..., Γ^red_M}            ▷ Models for residue prediction
 3: Set Γ^per = {Γ^per_0, Γ^per_1, ..., Γ^per_M}   ▷ Models for performance prediction
 4:
 5: function init(Analytical Model Γ_AM, Training Set D_tr)
 6:   Γ^per_0 = Γ_AM                               ▷ Set the AM as the 1st predictor
 7:   for m = 1 ... M do
 8:     D_m = ∅
 9:     for each <x_n, y_n> ∈ D_tr
10:       y_{m,n} = y_n − Γ^per_{m−1}(x_n)         ▷ Compute the residual error
11:       D_m = D_m ∪ {<x_n, y_{m,n}>}             ▷   of the previous learner
12:     end for each
13:     Γ^red_m = γ^red_m(D_m)                     ▷ Train on the residuals
14:     β_m = argmin_β Σ_{n=1..N} (y_n − (Γ^per_{m−1}(x_n) + β·Γ^red_m(x_n)))²
15:     Γ^per_m = Γ^per_{m−1} + β_m·Γ^red_m        ▷ Set the m-th predictor
16:   end for
17: end function
18: function forecast(x_s)
19:   return Γ^per_0(x_s) + Σ_{m=1..M} β_m·Γ^red_m(x_s)
20: end function
The pseudo-code of this technique, which we name Hybrid Boosting (HyBoost), is reported in Algorithm 2. In addition to the analytical model Γ_AM, also in this case we assume the availability of a set of M regressors based on machine learning techniques, which we denote γ^red. Unlike in KNN, however, these learners are not used to build alternative models of the performance of the target system. Conversely, the learners are stacked in a chain (i.e., an ordered set), and each is used to learn the error (residue) introduced by the previous learner in the chain.
More in detail, HyBoost uses two (ordered) sets of predictive models, noted Γ^red and Γ^per, composed of, respectively, M and M+1 models. The first model in Γ^red, i.e., Γ^red_1, is obtained by training the first regressor γ^red_1 with a training set D_1 that characterizes the error (defined as the difference between the actual and the predicted value) of the analytical model Γ_AM for each point in the original training set D_tr.
Any other model Γ
red
i
, with i [1, M], is trained to learn the
prediction error of the model Γ
per
i1
, which incorporates the
knowledge of the AM and of the first i1 ML-based learners
by means of the following recurrence equation (Line 15):
Γ^per_m = Γ^per_{m−1} + β_m · Γ^red_m

where β_m is a coefficient (computed in Line 14) chosen so that the cumulative training error of the resulting m-stage regressor is minimized.
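Line 14 leaves the loss function implicit; assuming, as is standard in boosting for regression, that the training error is the squared error, the one-dimensional minimization over β admits a closed form: with residuals r_n = y_n − Γ^per_{m−1}(x_n) and stage predictions h_n = Γ^red_m(x_n), the optimum is β_m = (Σ_n r_n h_n) / (Σ_n h_n²). A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def optimal_beta(residuals, stage_preds):
    """Least-squares optimal scaling for one boosting stage:
    argmin_beta sum_n (r_n - beta * h_n)^2  =  <r, h> / <h, h>."""
    r = np.asarray(residuals, dtype=float)
    h = np.asarray(stage_preds, dtype=float)
    denom = float(h @ h)
    # Degenerate stage (regressor predicts all zeros): contribute nothing.
    return float(r @ h) / denom if denom > 0 else 0.0
```

For instance, if the stage regressor reproduces the residuals exactly, the optimal scaling is β = 1; if it reproduces them up to a constant factor c, the optimum is β = c.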
The key intuition at the basis of this algorithm, as already hinted, is that learning the residual errors of an analytical model may be easier than learning the original function for which we are trying to build a robust predictor. Analogously to KNN, HyBoost can exploit machine learners based on different algorithms. Moreover, it may be further extended and optimized using well-known techniques from the literature on boosting ML algorithms, such as adaptively re-weighting the elements in the training set of the i-th learner so as to focus it on minimizing its fitting error on the samples over which the (i−1)-th learner incurred the largest errors.
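The end-to-end procedure of Algorithm 2 can be sketched as follows. This is an illustrative implementation under assumptions not fixed by the text: squared error for the β_m minimization of Line 14, sklearn decision trees standing in for the black-box regressors (the evaluation uses learners such as Cubist, ANN and SVM), and a caller-supplied callable playing the role of the analytical model Γ^AM; class and method names mirror the pseudo-code but are otherwise ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class HyBoost:
    """Hybrid Boosting sketch: the analytical model is the first-stage
    predictor; M black-box regressors are chained, each trained on the
    residual errors of the ensemble built so far (Algorithm 2)."""

    def __init__(self, analytical_model, n_stages=3):
        self.am = analytical_model   # Gamma^AM: callable mapping x -> y_hat
        self.M = n_stages
        self.residue_models = []     # Gamma^red_1 .. Gamma^red_M
        self.betas = []              # beta_1 .. beta_M

    def init(self, X, y):
        # Gamma^per_0 = Gamma^AM (Line 6).
        pred = np.array([self.am(x) for x in X])
        for _ in range(self.M):
            residuals = y - pred                       # Lines 9-12
            g = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
            h = g.predict(X)                           # Line 13
            # Closed-form least-squares beta_m (Line 14, squared loss).
            denom = float(h @ h)
            beta = float(residuals @ h) / denom if denom > 0 else 0.0
            self.residue_models.append(g)
            self.betas.append(beta)
            pred = pred + beta * h                     # Line 15
        return self

    def forecast(self, X):
        # Gamma^per_0(x) + sum_m beta_m * Gamma^red_m(x) (Line 19).
        pred = np.array([self.am(x) for x in X])
        for beta, g in zip(self.betas, self.residue_models):
            pred = pred + beta * g.predict(X)
        return pred
```

As a usage example, take a toy system whose true performance is y = 2x + sin(5x) and an "analytical model" that only captures the linear trend, am(x) = 2x: after a few stages the chained regressors absorb the sinusoidal residue, and the ensemble's training error drops below that of the analytical model alone.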
4.3 Probing
We name the last of the three presented techniques Probing, and report its pseudo-code in Algorithm 3. This approach, which to the best of our knowledge has no direct