To cite this version: Yoann Desmouceaux, Marcel Enguehard, Thomas Heide Clausen. Joint Monitorless Load-Balancing and Autoscaling for Zero-Wait-Time in Data Centers. IEEE Transactions on Network and Service Management, IEEE, 2021, 18 (1), pp. 672-686. DOI: 10.1109/TNSM.2020.3045059. HAL: hal-03171974.

Joint Monitorless Load-Balancing and Autoscaling
for Zero-Wait-Time in Data Centers
Yoann Desmouceaux, Marcel Enguehard, Thomas H. Clausen
Abstract—Cloud architectures achieve scaling through two main functions: (i) load-balancers, which dispatch queries among replicated virtualized application instances, and (ii) autoscalers, which automatically adjust the number of replicated instances to accommodate variations in load patterns. These functions are often provided through centralized load monitoring, incurring operational complexity. This paper introduces a unified and centralized-monitoring-free architecture achieving both autoscaling and load-balancing, reducing operational overhead while improving response-time performance. Application instances are virtually ordered in a chain, and new queries are forwarded along this chain until an instance, based on its local load, accepts the query. Autoscaling is triggered by the last application instance, which inspects its average load and infers whether its chain is under- or over-provisioned. An analytical model of the system is derived, and proves that the proposed technique can achieve asymptotic zero-wait-time with high (and controllable) probability. This result is confirmed by extensive simulations, which highlight close-to-ideal performance in terms of both response time and resource costs.
Index Terms—Load balancing, auto-scaling, segment routing,
application-aware, performance analysis.
I. INTRODUCTION
Virtualization and cloud architectures, wherein different
tenants share computing resources to run their workloads,
have made fast task allocation and deallocation a commodity
primitive in data centers [1]. To optimize costs while pre-
serving Quality of Service (QoS), applications are thus (i)
replicated among multiple instances running, e.g., in containers
or in virtual machines (VMs) [2], [3], and (ii) the number of
aforementioned instances is automatically scaled up or down
to meet a given Service Level Agreement (SLA) [4]. Two
functions enable this: (i) a load-balancer, which dispatches
queries onto identical replicas of the application, and (ii) an
autoscaler, which monitors these instances and automatically
adjusts their number according to the incoming load.
A challenge for network load-balancers is to provide
performance and resiliency while satisfying per-application
SLAs. Some architectures, such as Equal Cost Multi-Path
(ECMP) [5] or Maglev [6], distribute flows among applica-
tion instances pseudo-randomly, forwarding packets without
terminating Layer-4 connections, and thus providing a high
throughput. The use of consistent hashing also provides re-
siliency for when an existing flow is handed over to another
load-balancer [6]. This requires, nonetheless, that flows be
Y. Desmouceaux is with Cisco Systems, 92130 Issy-les-
Moulineaux, France. M. Enguehard is with Polyconseil, 75008
Paris, France. T. H. Clausen is with École Polytechnique, 91120
Palaiseau, France. Emails: ydesmouc@cisco.com, marcel@enguehard.org,
thomas.clausen@polytechnique.edu
assigned to instances regardless of their load state, even though
it has been demonstrated [7] that considering application load
can greatly improve overall performance. Other load-balancing
architectures do take application state into account, by termi-
nating Layer-4 connections [8], and/or by using centralized monitoring [9], thus incurring both a performance overhead and a degradation in resiliency.
Similarly, autoscalers use centralized monitoring, with an
external agent gathering load metrics from all servers so as to
make scaling decisions [10], [11]. The delay incurred by an
external agent collecting these metrics causes such decisions to
be made based on out-of-date information. Furthermore, such
agents typically collect external metrics (e.g., CPU load of a
VM as seen by the hypervisor), ignoring application-specific
metrics possibly more suitable for making scaling decisions.
A. Statement of Purpose
While workloads lasting hours or minutes (e.g., data pro-
cessing tasks) can be efficiently scheduled with offline op-
timization algorithms [12], and while sub-millisecond work-
loads require over-provisioning, as the time to commission a new instance is too long compared to the application execution time, mid-sized workloads (lasting from 100 ms to 1 s, e.g., Web workloads) are amenable to reactive autoscaling, as container boot times are typically sub-second [13]. Thus, in this paper, the problem of mid-sized workload scalability
under QoS constraints is explored, for replicated applications
deployed, e.g., as containers. In particular, a centralized-
monitoring-free architecture for achieving asymptotic zero-
wait-time is introduced. More precisely, the architecture is
centralized-monitoring-free as it relies on the applications themselves monitoring their load, without reporting load information to a central controller. It yields asymptotic zero-
wait-time in the sense that each incoming query finds, with
probability converging to one as the number of application
instances goes to infinity, an idle application instance. The
architecture relies on two interdependent components: a load-
aware load-balancing algorithm and a decentralized autoscal-
ing policy.
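Stated formally (in our notation, which the paper does not use at this point: W_n denotes the waiting time of a query arriving while n instances are provisioned), the target property is

    \lim_{n \to \infty} \Pr\left[ W_n > 0 \right] = 0,

i.e., the probability that an incoming query finds no idle instance vanishes as the chain grows.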
First, a centralized-monitoring-free load-balancing algorithm
is introduced: Join-the-First-Idle-Queue (JFIQ). JFIQ relies on
ordering the available application instances in a chain along
which incoming queries are directed. Each of the instances in
the chain makes a local decision based on its load, accepting
the query if it has available capacity, and forwarding the query
to the next instance in the chain otherwise. The proposed
architecture operates entirely within the network layer (Layer-
3) using IPv6 Segment Routing (SRv6) [14], thus removing

the need for terminating or proxying network connections.
Second, to achieve asymptotic zero-wait-time, JFIQ is
complemented with a centralized-monitoring-free autoscaling
policy which uses the fact that the busyness of the last
instance in the chain is an indicator of the busyness of the
whole system. This allows offloading autoscaling decisions
to that last instance, by measuring its occupancy ratio over
time. Upscaling/downscaling is triggered if that ratio crosses
pre-determined maximum/minimum thresholds. An analytical
model demonstrates the validity of using this autoscaling
policy conjointly with JFIQ to achieve asymptotic zero-wait-
time, and quantifies the behavior of the system in terms of
response time.
Finally, this analytical model is complemented with exten-
sive simulations, capturing the dynamics of the architecture,
and showing that the proposed mechanism makes it possible to precisely control the tail of the response time distribution. These simulations illustrate that the proposed mechanism reduces the
resource cost (i.e., the number of necessary instances) for an
identical target response time by an order of magnitude in
the evaluated scenario, when compared to the simpler policies
used in consistent-hashing-based load-balancers.
B. Related work
This section discusses the literature on network load-
balancing (section I-B1) and autoscaling (section I-B2).
1) Load-balancing: The goal of a load-balancer is to as-
sign incoming queries for a given service to one of several
distributed instances of this service. As such, this requires:
(i) selecting the instance so as to minimize response time,
and (ii) making sure that the load-balancer does not become
a bottleneck.
Several load-aware load-balancing algorithms exist [15],
including Random (RND), where queries are assigned ran-
domly to one of n application instances, and Round-Robin
(RR), where the i-th query is assigned to the (i mod n)-th
instance. The optimal policy is the Least-Work-Left (LWL)
policy, which assigns queries to the application instance with
the least amount of pending work [16]. A simpler algorithm
is Join-the-Shortest-Queue (JSQ), which assigns queries to
the least loaded of the application instances. JSQ does not
require knowledge of the remaining work time of currently-
served queries, and provides near-optimal performance [17],
even in high-load regimes [18]. JSQ needs to query the state
of all application instances for each incoming query, which
incurs a monitoring overhead of n messages per query. A
more scalable algorithm, Join-the-Idle-Queue (JIQ), has been
proposed in [19]: queries are assigned to an idle application
instance if one exists, or to a random instance otherwise. This
is implemented by maintaining a centralized idle queue of
the identities of currently idle application instances, minimiz-
ing the monitoring overhead as compared to JSQ. Another
algorithm is Join-the-Shortest-of-d-Queues (JSQ_d) [7], which assigns queries to the least loaded of d randomly sampled application instances, and which is therefore more decentralized but less efficient than JIQ (as stated in [20]). The algorithms listed above have been analyzed in the heavy-traffic limit (where the query rate approaches the stability limit), making it possible to quantify the achieved expected waiting time as a function of the number of application instances [20], [21].
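For concreteness, the decision rules of these policies fit in a few lines; the Python sketch below is our illustration only (queues[i] stands for the instantaneous queue length of instance i; function names and the d = 2 default are ours, not taken from the cited works):

    import random

    def rnd(queues):
        # RND: assign to a uniformly random instance.
        return random.randrange(len(queues))

    def jsq(queues):
        # JSQ: assign to the least-loaded instance
        # (requires full state: n messages per query).
        return min(range(len(queues)), key=lambda i: queues[i])

    def jsq_d(queues, d=2):
        # JSQ_d: sample d instances, assign to the least loaded of them.
        sample = random.sample(range(len(queues)), d)
        return min(sample, key=lambda i: queues[i])

    def jiq(queues):
        # JIQ: assign to an idle instance if one exists,
        # to a uniformly random instance otherwise.
        idle = [i for i, q in enumerate(queues) if q == 0]
        return random.choice(idle) if idle else random.randrange(len(queues))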
The above has summarized a set of algorithms for assigning
flows to applications, as well as their key performance charac-
teristics. It is equally important to be able to actually distribute
network flows across application instances, at the network
layer. This consists of directing flows (e.g., TCP packets)
corresponding to queries for a given service (described by a
virtual IP address, VIP) to the physical IP address (PIP) of
a deployed instance. This load-balancing function can itself
be replicated, in which case it is deployed behind a layer
of ECMP routers, which can arbitrarily redistribute packets
between load-balancer instances, for new flows as well as for
already-established flows. It is thus necessary to maintain Per-
Connection-Consistency (PCC), i.e., to ensure that already-
established flows are always directed to the same application
instance, regardless of the load-balancer they are handled by.
Maglev [6] and Ananta [22] use a combination of consistent
hashing and per-flow tables to ensure PCC. This has been com-
plemented by enabling hardware-support [23], [24], [25], or by
using in-packet state to maintain PCC [25], [26], [27]. While
providing per-connection consistency, these architectures do
not consider the application instance load, using a naïve RND
policy at the cost of decreased application performance [15],
[28]. A first step towards considering the load of application
instances is 6LB [29], where consistent hashing is used with
a variant of the JSQ_2 algorithm that assigns queries to the first available from among two candidate instances. Some
architectures [9] rely on Software-Defined Networking (SDN)
to monitor the network and the servers, and thus make load-
aware decisions but at the cost of a monitoring overhead.
2) Autoscaling: Methods to provide autoscaling have been
classified as reactive and proactive [4]. Reactive methods reg-
ularly gather measurements, and take actions when thresholds
are crossed. For instance, in [10] up/downscaling is triggered
when bounds on some observed metrics are reached; a similar
approach can be found in [11], but with dynamic threshold
adjustment. These incur an overhead from gathering statistics,
and a time gap between detection of violations and appropriate
reaction. Similar threshold-based approaches include [30],
[31], [32].
Conversely, proactive approaches consist of anticipating
state and acting correspondingly. For example, in [33], moving
averages are used to anticipate the future value of metrics
of interest. Similarly, [34] uses Machine Learning (ML) to
classify workloads by their resource allocation preferences,
and in [35], neural networks are used to predict CPU load
trends of application instances and provision resources ac-
cordingly. A Tree-Augmented Naive Bayesian network is used
in [36] to detect SLA violations, and scale resources up when
this happens. In [37], [38], control theory is used to track
CPU usage and to allocate resources accordingly, and in [39],
control theory is used to adapt the amount of CPU resources
allocated to each query so that they complete within a deadline.
While solving the issue of timeliness, proactive approaches suffer from the need to collect statistics and perform centralized
computations.

Approaches based on queuing theory have also been proposed [40], [41].
In [42], an autoscaling scheme for JIQ is proposed, by creating
a feedback loop that decommissions application instances that
remain idle for a long period of time, and commissions a new
application instance for each new query. In [43], a similar
token-based mechanism is introduced, with a new application
instance being commissioned only when a task finds all instances busy.
C. Paper Outline
The remainder of this paper is organized as follows. Sec-
tion II gives an overview of the architecture introduced in this
paper. An analytical model for the response time of the system
with a fixed number of instances is introduced in section III,
and the asymptotic behavior of the system is characterized.
Numerical results are given in section IV, along with computa-
tional simulations providing further insight. Finally, section V
concludes this paper.
II. JOINT LOAD-BALANCING AND AUTOSCALING
In this paper, an application is replicated on a set of n
application instances {s_1, . . . , s_n} with identical processing
capacities. The goal is to minimize response time, i.e., queries
should be served with zero waiting time, by way of (i) en-
suring that enough application instances are available, and (ii)
mapping the query to an idle application instance. To address
the challenges introduced in sections I-B1 and I-B2, this
goal is attained through joint load-balancing and autoscaling
strategies which provide not only close-to-ideal algorithmic
performance, but which can also be efficiently implemented,
i.e., both the load-balancing and autoscaling functions must
incur minimal state and network overhead. The proposed
architecture relies on three intertwined building blocks: (i) a
load-balancing algorithm that achieves asymptotic zero-wait-
time if the number of application instances is correctly scaled;
(ii) an enhanced IPv6 dataplane to perform query dispatching
in a decentralized and stateless fashion; (iii) a centralized-
monitoring-free autoscaling technique to adapt the number of
application instances while incurring no monitoring cost.
A. Join-the-First-Idle-Queue Load-Balancing
An ideal load-balancing algorithm should achieve asymp-
totic zero-wait time for a properly-scaled set of application
instances. In particular, this is the behaviour of the reference
JIQ policy, which keeps track of available instances by means
of a centralized idle queue, with instances communicating
their availability to a centralized controller upon completion
of a query. The drawbacks of JIQ are twofold: it requires
centralized communication (which can create implementabil-
ity and scalability issues), and it requires centralized load
monitoring if used in conjunction with an autoscaler. To
address these issues, this paper proposes a new load-balancing
technique: Join-the-First-Idle-Queue (JFIQ), which does not
rely on centralized load tracking.
JFIQ relies on ordering the n application instances in a
chain (since the application instances are assumed to have
Figure 1. Join-the-First-Idle-Queue LB (Algorithm 1) with n = 4 instances.
Figure 2. Example of SR load-balancing [29] with 3 instances, wherein the second one accepts the connection: the client SYN enters the data center, the LB applies the segment list {S1, S2, S3}, S1 refuses, S2 accepts, and the SYN-ACK returns to the client via the LB.
identical capacity, the actual order of the instances in the chain
does not matter, so long as it remains consistent throughout
the lifetime of the system). Then, JFIQ enforces that each of
the first (n−1) instances never serves more than 1 query at a given time (see figure 1). Formally, each query is forwarded along the chain (s_1, . . . , s_n) of n application instances. Each instance s_i ≠ s_n in the list either accepts the query if it is currently idle, and otherwise forwards it to the next instance s_{i+1}. To ensure that all queries are served, the last instance s_n must always accept queries. Thus, each of the first (n−1) instances can hold only 0 or 1 query, ensuring zero waiting time for queries served by those. As shown later in section III-B, JFIQ makes it possible to predictably control the probability of having a blocked task (i.e., a task waiting for the last application instance to become idle) by varying the number n of instances.
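A minimal Python sketch of this acceptance rule follows (our illustration, not the paper's implementation; busy[i] holds the number of queries at instance s_{i+1}):

    def jfiq_dispatch(busy):
        # Walk the chain: the first n-1 instances accept only when idle.
        n = len(busy)
        for i in range(n - 1):
            if busy[i] == 0:   # s_{i+1} is idle: it accepts the query
                busy[i] = 1
                return i
        busy[-1] += 1          # s_n always accepts, possibly queueing the task
        return n - 1

A query is thus blocked exactly when the loop falls through to the last instance while it is already busy.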
B. Network-level JFIQ using SRv6
To achieve JFIQ at the network layer while enabling
application-awareness, this paper leverages the dataplane of
6LB [29] and SHELL [44], summarized in figure 2. This
dataplane is based on SRv6, a source-routing architecture
which allows specifying, within a specific IPv6 Extension
Header [45], a list of segments to be traversed by a given
packet, where each segment is an IPv6 address representing
an instruction to be performed on the packet.
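As a mental model, an SRv6 header can be reduced to a segment list plus a pointer; the sketch below is a simplification of ours (the on-wire encoding of RFC 8754 stores the segment list in reverse order and carries further fields):

    from dataclasses import dataclass

    @dataclass
    class SRHeader:
        # Simplified model of an SRv6 extension header.
        segments: list       # IPv6 addresses, in traversal order
        next_index: int = 0  # index of the next segment to visit

        def advance(self) -> str:
            # Consume one segment, returning the next destination address.
            seg = self.segments[self.next_index]
            self.next_index += 1
            return seg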
First, a control plane provisions the ingress router with a fixed list of application instances to be used by the JFIQ algorithm. Then, when a connection establishment packet (e.g., a TCP SYN) destined for the VIP is received by the ingress router, it inserts an SRv6 header with a list of PIPs corresponding to that list of instances. Instances then implement the
JFIQ algorithm as described in algorithm 1, by either handling
the packet locally or forwarding it to the next instance. To
avoid perpetual triangular traffic, a “stickiness” mechanism
is then used to let subsequent packets within this flow be
directed to the instance having accepted the connection [44].
A specific field of the transport header is used as a covert
channel to encode the index of the application instance that
has accepted the connection; examples of such fields include
QUIC session ID, low-order bits of TCP timestamps, or high-
order bits of TCP sequence numbers. This field must be able

Algorithm 1 Local Connection Request Handling
p ← connection establishment packet  ▷ e.g., TCP SYN
v ← p.lastSegment  ▷ VIP
b ← number of busy threads for v
if b = 0 then  ▷ application instance is available
    p.segmentsLeft ← 0
    p.dst ← v
    forward p to local workload v
else  ▷ forward to next application instance
    p.segmentsLeft ← p.segmentsLeft − 1
    p.dst ← p.nextSegment
    transfer p to p.dst
end if
Figure 3. Autoscaling when n = 3 and µ = 1. The level of red in each application instance shows the average number of concurrently-served queries as computed in section III. When the query rate increases from λ = 1.4 to λ = 1.7, the third instance observes that it has become highly occupied and thus requests upscaling.
to be set by the application instance and transparently echoed
in packets sent by the client, thus allowing the ingress router to
statelessly determine to which application instance non-SYN
packets should be forwarded.
Therefore, the load-balancing function does not require
per-flow state, consisting of (i) applying a fixed SR list on
connection establishment packets, or (ii) applying a one-
segment list on other packets, with a destination address that
depends on the value encoded in the covert channel found in
the packet. This makes the load-balancing function simpler,
thus more amenable to low-latency, high-throughput hardware
implementations. Moreover, as the functionality performed by the
ingress router does not require any synchronization, it can
be distributed among several routers, yielding scalability and
flexibility.
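The resulting ingress behaviour is a pure function of the packet, as the sketch below illustrates (our simplification: packets are modelled as dictionaries, and covert_index is a hypothetical accessor standing for whichever transport field carries the covert channel):

    def ingress_dispatch(packet, sr_list, instance_pips):
        # Stateless ingress: no per-flow table is consulted.
        if packet["is_syn"]:
            # (i) Connection establishment: apply the fixed SR list,
            # so that instances run Algorithm 1 along the chain.
            packet["segments"] = list(sr_list)
        else:
            # (ii) Established flow: the covert channel echoed by the
            # client identifies the accepting instance, so a
            # one-segment list suffices.
            packet["segments"] = [instance_pips[packet["covert_index"]]]
        packet["dst"] = packet["segments"][0]
        return packet

Because ingress_dispatch keeps no state, any replica of the ingress can handle any packet of any flow.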
C. Autoscaling
A key feature of JFIQ (compared, e.g., to JIQ) is that the
last instance has a unique view on whether the system is
overloaded or not. By construction, all instances but the last
only accept queries when idle. This can be exploited to per-
form autoscaling: when the last instance detects that it serves
too many or too few queries, it asks the control plane to scale the chain up or down. The control plane then provisions or deprovisions an instance as needed, and updates the ingress router with the new list of instances to be used by the load-balancing function. This allows for centralized-monitoring-free autoscaling, as illustrated in figure 3.
Algorithm 2 Local Autoscaling at Last Application Instance
p_e^−, p_e^+ ← parameter  ▷ up/downscaling thresholds
r_avg ← parameter  ▷ average application execution time (1)
W ← 1000 × r_avg  ▷ window size for EWMA
t_0 ← time()  ▷ timestamp of last event for EWMA
p̂_e ← 0  ▷ EWMA sample of p_e = P[N_n = 0]
r ← 0  ▷ number of events
for each connection establishment packet p from client do
    r ← r + 1
    N_n ← number of busy threads for v
    α ← 1 − exp(−(time() − t_0)/W)
    p̂_e ← (1 − α) p̂_e + α · 1{N_n = 0}
    t_0 ← time()
    if r > 50 then  ▷ make sure to have a significant sample
        if p̂_e > p_e^+ then
            request downscaling; reset all variables
        else if p̂_e < p_e^− then
            request upscaling; reset all variables
        end if
    end if
    p.segmentsLeft ← 0
    p.dst ← v
    forward p to local workload v
end for
for each connection termination packet p from application do
    r ← r + 1
    α ← 1 − exp(−(time() − t_0)/W)
    p̂_e ← (1 − α) p̂_e  ▷ N_n was > 0 over the last period
    t_0 ← time()
    forward p
end for
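The core of Algorithm 2, i.e., the irregularly-sampled EWMA of the idle probability and the threshold test, can be rendered compactly in Python. The sketch below is our own and, for brevity, handles establishment and termination events through a single observe() method taking the busy count sampled at the event:

    import math
    import time

    class LastInstanceAutoscaler:
        def __init__(self, p_lo, p_hi, window):
            self.p_lo = p_lo       # upscaling threshold (p_e^-)
            self.p_hi = p_hi       # downscaling threshold (p_e^+)
            self.window = window   # W, e.g., 1000 x r_avg
            self.reset()

        def reset(self):
            self.p_hat = 0.0       # EWMA estimate of p_e = P[N_n = 0]
            self.t0 = time.time()  # timestamp of last event
            self.r = 0             # number of observed events

        def observe(self, n_busy):
            # Irregular-interval EWMA: the weight of a sample decays
            # with the time elapsed since the previous event.
            now = time.time()
            alpha = 1.0 - math.exp(-(now - self.t0) / self.window)
            self.p_hat = (1 - alpha) * self.p_hat \
                + alpha * (1.0 if n_busy == 0 else 0.0)
            self.t0 = now
            self.r += 1
            if self.r > 50:        # require a significant sample first
                if self.p_hat > self.p_hi:
                    self.reset()
                    return "downscale"
                if self.p_hat < self.p_lo:
                    self.reset()
                    return "upscale"
            return None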
Figure 4. JFIQ autoscaling: example of upscaling for p_e^∗ = 0.4 and ρ ∈ (20, 25): the number n of instances adapts to maintain p̂_e between p_e^− and p_e^+ (the thick line marks the target p_e^∗). The bottom graph depicts the probability p_e against the request rate ρ = λ/µ, for fixed n ∈ {27, . . . , 34} and for the autoscaling policy; the top graph depicts the corresponding expected response time E[T], numerically computed with the method introduced in section III.
As formalized in Algorithm 2, the last instance in the chain
keeps statistics about its queue size over time. The fraction of time p_e during which the last instance is empty is sampled (with an Exponentially-Weighted Moving Average, EWMA), and the autoscaling mechanism tries to maintain it close to a fixed, tunable, target p_e^∗. When p̂_e goes below a threshold p_e^−, the instance triggers upscaling of the chain. Conversely, when it goes above a threshold p_e^+, the instance triggers downscaling of the chain. To avoid oscillations, the proposed autoscaling method ensures that p_e^{n−1}, the fraction of time

Frequently Asked Questions (8)
Q1. What are the contributions mentioned in the paper "Joint monitorless load-balancing and autoscaling for zero-wait-time in data centers" ?

This paper introduces a unified and centralized-monitoring-free architecture achieving both autoscaling and load-balancing, reducing operational overhead while increasing response time performance. 

In [37], [38], control theory is used to track CPU usage and to allocate resources accordingly, and in [39], control theory is used to adapt the amount of CPU resources allocated to each query so that they complete within a deadline. 

Several load-aware load-balancing algorithms exist [15], including Random (RND), where queries are assigned randomly to one of n application instances, and Round-Robin (RR), where the i-th query is assigned to the (i mod n)-th instance. 

With the JFIQ algorithm, when exposed to a query rate increase, the last instance might have to accept an important number of queries before deciding that upscaling is necessary. 

Due to taking the load of all instances into account, JFIQ autoscaling performs better than policies RND and JSQ2 (when ρ > 1.2), respectively, and yields results close to those of the reference policy JIQ. 

In particular, query rates vary between 300 and 700 req/s, and the expected number of queries injected into the system is:∫ 864000 λ(t)dt = 43.2 ·106. 

To evaluate the performance of JFIQ when using a fixed number of instances, the expected number of queries handled by the system is computed (as described in section III-B) as a function of the query rate ρ, for different values of the number n of instances. 

Each application instance has an identical processing capacity µ > 0, with exponentially-distributed service times (i.e., the probability of a query completing in less than t is 1− e−µt).