This paper proposes a unified, centralized-monitoring-free architecture that achieves both autoscaling and load-balancing, reducing operational overhead while improving response-time performance, and that attains asymptotic zero wait time with high (and controllable) probability.
Abstract:
Cloud architectures achieve scaling through two main functions: (i) load-balancers, which dispatch queries among replicated virtualized application instances, and (ii) autoscalers, which automatically adjust the number of replicated instances to accommodate variations in load patterns. These functions are often provided through centralized load monitoring, incurring operational complexity. This article introduces a unified and centralized-monitoring-free architecture achieving both autoscaling and load-balancing, reducing operational overhead while increasing response time performance. Application instances are virtually ordered in a chain, and new queries are forwarded along this chain until an instance, based on its local load, accepts the query. Autoscaling is triggered by the last application instance, which inspects its average load and infers if its chain is under- or over-provisioned. An analytical model of the system is derived, and proves that the proposed technique can achieve asymptotic zero-wait time with high (and controllable) probability. This result is confirmed by extensive simulations, which highlight close-to-ideal performance in terms of both response time and resource costs.
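The chain-based dispatching described in the abstract can be sketched in a few lines. This is an illustrative approximation only: the class names, the accept threshold, and the accept rule are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch of chain-based dispatching: a query is forwarded along a
# virtual chain of instances until one accepts it based on its local
# load. The threshold and class names are illustrative assumptions.

class Instance:
    def __init__(self, threshold=1):
        self.active_queries = 0
        self.threshold = threshold

    def accepts(self):
        # An instance accepts only if its local load is below its
        # threshold (with threshold == 1, it accepts only when idle).
        return self.active_queries < self.threshold


def dispatch(chain, query):
    """Forward the query along the chain until an instance accepts it.
    The last instance always accepts, so no query is dropped."""
    for instance in chain[:-1]:
        if instance.accepts():
            instance.active_queries += 1
            return instance
    last = chain[-1]
    last.active_queries += 1
    return last


chain = [Instance() for _ in range(4)]
chain[0].active_queries = 1  # first instance is busy
assigned = dispatch(chain, "query-1")
print(chain.index(assigned))  # 1: the second instance accepts
```

Because the last instance absorbs overflow traffic, its local load reflects how saturated the whole chain is, which is why (per the abstract) it is the natural place to trigger up- or down-scaling decisions.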
TL;DR: This paper proposes a dynamic distributed multi-path load-balancing algorithm that relies on dynamic hashing for network-flow distribution in DCNs, adjusting traffic-flow distribution at microsecond level according to the inverse ratio of buffer occupancy.
TL;DR: A Markovian framework for load balancing is introduced, in which classical algorithms such as Power-of-d are combined with asynchronous auto-scaling features that allow the net service capacity to scale up or down in response to the current load, within the same timescale as job dynamics.
TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes are strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.
TL;DR: This work uses a limiting, deterministic model representing the behavior as n → ∞ to approximate the behavior of finite systems, and provides simulations demonstrating that the method accurately predicts system behavior, even for relatively small systems.
TL;DR: It is shown that extremely simple adaptive load sharing policies, which collect very small amounts of system state information and which use this information in very simple ways, yield dramatic performance improvements.
TL;DR: The complete instruction-by-instruction simulation of one computer system on a different system is a well-known computing technique often used for software development when a hardware base is being altered.
TL;DR: Docker, an open-source project that automates the rapid deployment of Linux applications, and Kubernetes, an open-source cluster manager for Docker containers, are examined.
Q1. What are the contributions mentioned in the paper "Joint monitorless load-balancing and autoscaling for zero-wait-time in data centers" ?
This paper introduces a unified and centralized-monitoring-free architecture achieving both autoscaling and load-balancing, reducing operational overhead while increasing response time performance.
Q2. What theory is used to track CPU usage and allocate resources?
In [37], [38], control theory is used to track CPU usage and to allocate resources accordingly, and in [39], control theory is used to adapt the amount of CPU resources allocated to each query so that they complete within a deadline.
Q3. What are some of the common load-balancing algorithms?
Several load-aware load-balancing algorithms exist [15], including Random (RND), where queries are assigned randomly to one of n application instances, and Round-Robin (RR), where the i-th query is assigned to the (i mod n)-th instance.
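The two policies named in the answer can be sketched directly from their definitions (the function names here are illustrative):

```python
import random

def rnd_assign(n):
    # Random (RND): a query is assigned uniformly at random
    # to one of the n application instances.
    return random.randrange(n)

def rr_assign(i, n):
    # Round-Robin (RR): the i-th query is assigned
    # to the (i mod n)-th instance.
    return i % n

n = 4
print([rr_assign(i, n) for i in range(6)])  # [0, 1, 2, 3, 0, 1]
print(0 <= rnd_assign(n) < n)               # True
```

Note that neither policy consults instance load, which is why load-aware alternatives such as the chain-based forwarding studied in this paper can do better on response time.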
Q4. How many times did the last instance have to accept an important number of queries?
With the JFIQ algorithm, when exposed to a query rate increase, the last instance might have to accept a large number of queries before deciding that upscaling is necessary.
Q5. Why does JFIQ autoscaling perform better than RND and JSQ2?
Because it takes the load of all instances into account, JFIQ autoscaling performs better than the RND and JSQ2 policies (when ρ > 1.2), and yields results close to those of the reference policy JIQ.
Q6. what is the expected number of queries injected into the system?
In particular, query rates vary between 300 and 700 req/s, and the expected number of queries injected into the system is ∫₀^86400 λ(t) dt = 43.2 · 10^6.
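A quick sanity check of that figure, assuming λ(t) averages 500 req/s (the midpoint of the 300–700 req/s range) over one day (86 400 s):

```python
# Sanity check: a mean rate of 500 req/s (assumed midpoint of the
# 300-700 req/s range) over one day of 86 400 seconds yields the
# expected total of 43.2 million queries.
mean_rate = 500          # req/s, assumed time-average of lambda(t)
duration = 86_400        # seconds in one day
expected_queries = mean_rate * duration
print(expected_queries)  # 43200000, i.e. 43.2 * 10^6
```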
Q7. What is the simplest way to evaluate the performance of JFIQ?
To evaluate the performance of JFIQ when using a fixed number of instances, the expected number of queries handled by the system is computed (as described in section III-B) as a function of the query rate ρ, for different values of the number n of instances.
Q8. What is the probability of a query completing in less than t?
Each application instance has an identical processing capacity µ > 0, with exponentially-distributed service times (i.e., the probability of a query completing in less than t is 1 − e^(−µt)).
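This completion probability is easy to verify numerically; the sketch below compares the closed form 1 − e^(−µt) against sampled exponential service times (the value µ = 2.0 is an arbitrary assumption for illustration):

```python
import math
import random

mu = 2.0  # service rate in queries per unit time (assumed value)

def completion_probability(t, mu):
    # P(service time < t) = 1 - exp(-mu * t) for exponential service.
    return 1.0 - math.exp(-mu * t)

# Empirical check by sampling exponentially-distributed service times.
random.seed(0)
samples = [random.expovariate(mu) for _ in range(100_000)]
t = 0.5
empirical = sum(s < t for s in samples) / len(samples)
print(round(completion_probability(t, mu), 3))  # 0.632
print(round(empirical, 3))                      # close to the above
```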