Journal ArticleDOI

Universality of load balancing schemes on the diffusion scale

01 Dec 2016-Journal of Applied Probability (University of Sheffield)-Vol. 53, Iss: 4, pp 1111-1124
TL;DR: A stochastic coupling construction is developed to obtain the diffusion limit of the queue process in the Halfin–Whitt heavy-traffic regime, and it is established that this limit does not depend on the value of d, implying that assigning tasks to idle servers is sufficient for diffusion-level optimality.
Abstract: We consider a system of N parallel queues with identical exponential service rates and a single dispatcher where tasks arrive as a Poisson process. When a task arrives, the dispatcher always assigns it to an idle server, if there is any, and to a server with the shortest queue among d randomly selected servers otherwise (1≤d≤N). This load balancing scheme subsumes the so-called join-the-idle queue policy (d=1) and the celebrated join-the-shortest queue policy (d=N) as two crucial special cases. We develop a stochastic coupling construction to obtain the diffusion limit of the queue process in the Halfin‒Whitt heavy-traffic regime, and establish that it does not depend on the value of d, implying that assigning tasks to idle servers is sufficient for diffusion level optimality.
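The dispatch rule described in the abstract can be sketched as follows (an illustrative Python sketch; the function and variable names are ours, not the paper's):

```python
import random

def dispatch(queues, d):
    """Assign an arriving task: to a uniformly chosen idle server if one
    exists, otherwise to the shortest of d uniformly sampled queues.
    `queues` is a list of current queue lengths; returns a server index."""
    idle = [i for i, q in enumerate(queues) if q == 0]
    if idle:
        return random.choice(idle)
    sample = random.sample(range(len(queues)), d)
    return min(sample, key=lambda i: queues[i])

# d = 1 recovers join-the-idle-queue; d = len(queues) recovers JSQ.
queues = [2, 0, 3, 1]
queues[dispatch(queues, d=2)] += 1
```

Setting d between these extremes interpolates between the two policies, which is exactly the family whose diffusion limit the paper shows to be insensitive to d.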
Citations
Posted Content
TL;DR: It is demonstrated how stochastic coupling techniques and stochastic-process limits play an instrumental role in establishing asymptotic optimality, and how this methodology carries over to infinite-server settings, finite buffers, multiple dispatchers, servers arranged on graph topologies, and token-based load balancing, including the popular Join-the-Idle-Queue (JIQ) scheme.
Abstract: The basic load balancing scenario involves a single dispatcher where tasks arrive that must immediately be forwarded to one of $N$ single-server queues. We discuss recent advances on scalable load balancing schemes which provide favorable delay performance when $N$ grows large, and yet only require minimal implementation overhead. Join-the-Shortest-Queue (JSQ) yields vanishing delays as $N$ grows large, as in a centralized queueing arrangement, but involves a prohibitive communication burden. In contrast, power-of-$d$ or JSQ($d$) schemes that assign an incoming task to a server with the shortest queue among $d$ servers selected uniformly at random require little communication, but lead to constant delays. In order to examine this fundamental trade-off between delay performance and implementation overhead, we consider JSQ($d(N)$) schemes where the diversity parameter $d(N)$ depends on $N$ and investigate what growth rate of $d(N)$ is required to asymptotically match the optimal JSQ performance on fluid and diffusion scale. Stochastic coupling techniques and stochastic-process limits play an instrumental role in establishing the asymptotic optimality. We demonstrate how this methodology carries over to infinite-server settings, finite buffers, multiple dispatchers, servers arranged on graph topologies, and token-based load balancing including the popular Join-the-Idle-Queue (JIQ) scheme. In this way we provide a broad overview of the many recent advances in the field. This survey extends the short review presented at ICM 2018 (arXiv:1712.08555).

57 citations


Cites background or methods from "Universality of load balancing sche..."

  • ...Instead, the asymptotic equivalence results in [116, 118, 119] are derived by relating the relevant system occupancy processes to the corresponding processes under a JSQ policy, and showing that the deviation between these processes is asymptotically negligible on either fluid scale or diffusion scale under suitable assumptions on d(N) or GN ....

  • ...In this survey we highlight the stochastic coupling techniques that played an instrumental role in proving the asymptotic equivalence results in [116, 118, 119]....

  • ...1 relies on a novel coupling construction introduced in [118] as described below in detail....

  • ...Indeed, the specific coupling arguments that were developed in [116, 118, 119] are different from those that were originally used in establishing the stochastic optimality properties of the JSQ policy....

  • ...We now turn to the diffusion limit of the JIQ scheme established in [118]....

Posted Content
TL;DR: It is proved that the diffusion limit is exponentially ergodic and that the diffusion-scaled sequence of the steady-state number of idle servers and non-empty buffers is tight, which means that the process-level convergence proved in Eschenfeldt & Gamarnik (2015) implies convergence of steady-state distributions.
Abstract: This paper studies the steady-state properties of the Join the Shortest Queue model in the Halfin-Whitt regime. We focus on the process tracking the number of idle servers, and the number of servers with non-empty buffers. Recently, Eschenfeldt & Gamarnik (2015) proved that a scaled version of this process converges, over finite time intervals, to a two-dimensional diffusion limit as the number of servers goes to infinity. In this paper we prove that the diffusion limit is exponentially ergodic, and that the diffusion scaled sequence of the steady-state number of idle servers and non-empty buffers is tight. Our results mean that the process-level convergence proved in Eschenfeldt & Gamarnik (2015) implies convergence of steady-state distributions. The methodology used is the generator expansion framework based on Stein's method, also referred to as the drift-based fluid limit Lyapunov function approach in Stolyar (2015). One technical contribution to the framework is to show that it can be used as a general tool to establish exponential ergodicity.

41 citations


Cites background or methods from "Universality of load balancing sche..."

  • ...The following result is copied from [32] (but it was first proved in [9])....

  • ...In the asymptotic regime where n→∞, all previous considerations of the diffusion-scaled model [9, 18, 32] have been in the transient setting....

  • ...In [32], the authors work in the Halfin-Whitt regime and show that JIQ is asymptotically optimal, and therefore asymptotically equivalent to JSQ, on the diffusion scale....

Journal ArticleDOI
TL;DR: In this paper, a multi-router generalization of the pull-based customer assignment (routing) algorithm PULL, introduced in Stolyar (Queueing Syst 80(4): 341–361, 2015) for the single-router model, is studied.
Abstract: The model is a service system, consisting of several large server pools. A server's processing speed and buffer size (which may be finite or infinite) depend on the pool. The input flow of customers is split equally among a fixed number of routers, which must assign customers to the servers immediately upon arrival. We consider an asymptotic regime in which the total customer arrival rate and pool sizes scale to infinity simultaneously, in proportion to a scaling parameter n, while the number of routers remains fixed. We define and study a multi-router generalization of the pull-based customer assignment (routing) algorithm PULL, introduced in Stolyar (Queueing Syst 80(4): 341–361, 2015) for the single-router model. Under the PULL algorithm, when a server becomes idle it sends a "pull-message" to a randomly uniformly selected router; each router operates independently: it assigns an arriving customer to a server according to a randomly uniformly chosen available (at this router) pull-message, if there is any, or to a randomly uniformly selected server in the entire system otherwise. Under Markov assumptions (Poisson arrival process and independent exponentially distributed service requirements), and under subcritical system load, we prove asymptotic optimality of PULL: as $$n\rightarrow \infty$$, the steady-state probability of an arriving customer experiencing blocking or waiting vanishes. Furthermore, PULL has an extremely low router–server message exchange rate of one message per customer. These results generalize some of the single-router results in Stolyar (2015).
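The per-router logic of PULL described in this abstract can be sketched roughly as follows (a minimal sketch; the class and method names are illustrative assumptions, not from the paper):

```python
import random

class Router:
    """One of the fixed number of routers under the PULL algorithm:
    idle servers deposit pull-messages, and arrivals consume them."""

    def __init__(self, n_servers):
        self.n_servers = n_servers
        self.pull_messages = []  # ids of servers that reported being idle

    def receive_pull(self, server_id):
        # An idle server sent its pull-message to this router.
        self.pull_messages.append(server_id)

    def assign(self):
        # Use a uniformly chosen available pull-message if there is any;
        # otherwise route to a uniformly selected server in the system.
        if self.pull_messages:
            k = random.randrange(len(self.pull_messages))
            return self.pull_messages.pop(k)
        return random.randrange(self.n_servers)
```

Each pull-message is consumed on use, so a server sends at most one message per customer it serves, matching the one-message-per-customer overhead claim in the abstract.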

40 citations

Proceedings ArticleDOI
05 Jun 2017
TL;DR: It is proved that, under the proposed auto-scaling and load balancing scheme, both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit, thus ensuring scalability in massive data center operations.
Abstract: A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all. Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations.
Extensive simulation experiments corroborate the fluid-limit results, and demonstrate that the proposed scheme can match the user performance and energy consumption of state-of-the-art approaches that do take full advantage of a centralized queue.

32 citations

Proceedings ArticleDOI
01 May 2019
TL;DR: An overview is presented of scalable load balancing algorithms which provide favorable delay performance in large-scale systems yet require only minimal implementation overhead. The results demonstrate that the asymptotics for the JSQ(d(N)) policy are insensitive to the exact growth rate of d(N), as long as the latter is sufficiently fast, implying that the optimality of the JSQ policy can asymptotically be preserved while dramatically reducing the communication overhead.
Abstract: We present an overview of scalable load balancing algorithms which provide favorable delay performance in large-scale systems, and yet only require minimal implementation overhead. Aimed at a broad audience, the paper starts with an introduction to the basic load balancing scenario, consisting of a single dispatcher where tasks arrive that must immediately be forwarded to one of N single-server queues. A popular class of load balancing algorithms are so-called power-of-d or JSQ(d) policies, where an incoming task is assigned to a server with the shortest queue among d servers selected uniformly at random. This class includes the Join-the-Shortest-Queue (JSQ) policy as a special case (d=N), which has strong stochastic optimality properties and yields a mean waiting time that vanishes as N grows large for any fixed subcritical load. However, a nominal implementation of the JSQ policy involves a prohibitive communication burden in large-scale deployments. In contrast, a random assignment policy (d=1) does not entail any communication overhead, but the mean waiting time remains constant as N grows large for any fixed positive load. In order to examine the fundamental trade-off between performance and implementation overhead, we consider an asymptotic regime where d(N) depends on N. We investigate what growth rate of d(N) is required to match the performance of the JSQ policy on fluid and diffusion scale. The results demonstrate that the asymptotics for the JSQ(d(N)) policy are insensitive to the exact growth rate of d(N), as long as the latter is sufficiently fast, implying that the optimality of the JSQ policy can asymptotically be preserved while dramatically reducing the communication overhead. We additionally show how the communication overhead can be reduced yet further by the so-called Join-the-Idle-Queue scheme, leveraging memory at the dispatcher.

29 citations

References
Journal ArticleDOI
TL;DR: This work uses a limiting, deterministic model representing the behavior as n → ∞ to approximate the behavior of finite systems, and provides simulations demonstrating that the method accurately predicts system behavior, even for relatively small systems.
Abstract: We consider the following natural model: customers arrive as a Poisson stream of rate λn, λ < 1, at a collection of n servers. Each customer chooses some constant d servers independently and uniformly at random from the n servers and waits for service at the one with the fewest customers. Customers are served according to the first-in first-out (FIFO) protocol and the service time for a customer is exponentially distributed with mean 1. We call this problem the supermarket model. We wish to know how the system behaves and in particular we are interested in the effect that the parameter d has on the expected time a customer spends in the system in equilibrium. Our approach uses a limiting, deterministic model representing the behavior as n → ∞ to approximate the behavior of finite systems. The analysis of the deterministic model is interesting in its own right. Along with a theoretical justification of this approach, we provide simulations that demonstrate that the method accurately predicts system behavior, even for relatively small systems. Our analysis provides surprising implications. Having d=2 choices leads to exponential improvements in the expected time a customer spends in the system over d=1, whereas having d=3 choices is only a constant factor better than d=2. We discuss the possible implications for system design.
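The effect of d described in this abstract can be reproduced with a toy simulation (a crude uniformized-chain sketch under our own simplifying assumptions, not the paper's model or code):

```python
import random

def supermarket_sim(n, lam, d, steps, seed=0):
    """Toy uniformized simulation of the supermarket model: each step is
    an arrival with probability lam/(1+lam), joining the shortest of d
    uniformly sampled queues; otherwise a service attempt at a uniformly
    chosen server. Returns the time-averaged queue length per server."""
    rng = random.Random(seed)
    queues = [0] * n
    area = 0
    for _ in range(steps):
        if rng.random() < lam / (1 + lam):
            sample = rng.sample(range(n), d)
            queues[min(sample, key=lambda i: queues[i])] += 1
        else:
            j = rng.randrange(n)
            if queues[j] > 0:
                queues[j] -= 1
        area += sum(queues)
    return area / (steps * n)

# Two choices should give markedly shorter queues than one at high load.
one = supermarket_sim(n=25, lam=0.7, d=1, steps=40000)
two = supermarket_sim(n=25, lam=0.7, d=2, steps=40000)
```

With d=1 each server behaves roughly like an M/M/1 queue with load λ, while d=2 already thins the queue-length tail dramatically, in line with the exponential improvement reported in the abstract.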

1,444 citations


"Universality of load balancing sche..." refers background in this paper

  • ...Mean-field limit theorems in [9] and [15] indicate that even a value as small as d = 2 yields significant performance improvements in a many-server regime, in the sense that the tail of the queue length distribution at each individual server falls off much more rapidly compared to a strictly random assignment policy (d = 1)....

Journal ArticleDOI
TL;DR: In this article, two different kinds of heavy-traffic limit theorems have been proved for s-server queues: the first involves a sequence of queueing systems having a fixed number of servers with an associated sequence of traffic intensities that converges to the critical value of one from below.
Abstract: Two different kinds of heavy-traffic limit theorems have been proved for s-server queues. The first kind involves a sequence of queueing systems having a fixed number of servers with an associated sequence of traffic intensities that converges to the critical value of one from below. The second kind, which is often not thought of as heavy traffic, involves a sequence of queueing systems in which the associated sequences of arrival rates and numbers of servers go to infinity while the service time distributions and the traffic intensities remain fixed, with the traffic intensities being less than the critical value of one. In each case the sequence of random variables depicting the steady-state number of customers waiting or being served diverges to infinity but converges to a nondegenerate limit after appropriate normalization. However, in an important respect neither procedure adequately represents a typical queueing system in practice because in the (heavy-traffic) limit an arriving customer is either a...

740 citations

Journal ArticleDOI
TL;DR: This work proposes a novel class of algorithms called Join-Idle-Queue (JIQ) for distributed load balancing in large systems, which effectively results in a reduced system load and produces a 30-fold reduction in queueing overhead compared to Power-of-Two at medium to high load.

393 citations


"Universality of load balancing sche..." refers methods in this paper

  • ...Observe that the JIQ(N) scheme coincides with the ordinary JSQ policy, while the JIQ(1) scheme corresponds to the so-called Join-the-Idle-Queue (JIQ) policy considered in [1, 8, 12]....

  • ...As mentioned earlier, the above-described scheme coincides with the ordinary Join-the-Shortest-Queue (JSQ) policy when d = N, and corresponds to the so-called Join-the-Idle-Queue (JIQ) policy considered in [1, 8, 12] when d = 1....

Journal ArticleDOI
TL;DR: In this article, the authors consider a queuing system with several identical servers, each with its own queue, and assign the arriving customers so as to maximize the number of customers which complete their service by a certain time.
Abstract: We consider a queuing system with several identical servers, each with its own queue. Identical customers arrive according to some stochastic process and as each customer arrives it must be assigned to some server's queue. No jockeying amongst the queues is allowed. We are interested in assigning the arriving customers so as to maximize the number of customers which complete their service by a certain time. If each customer's service time is a random variable with a non-decreasing hazard rate then the strategy which does this is one which assigns each arrival to the shortest queue.

332 citations


"Universality of load balancing sche..." refers background in this paper

  • ...In this canonical case, the so-called join-the-shortest-queue (JSQ) policy has several strong optimality properties, and, in particular, minimizes the overall mean delay among the class of nonanticipating load balancing policies that do not have any advance knowledge of the service requirements [3], [16], [18]....

Journal ArticleDOI
TL;DR: If the queue lengths at both servers are observed then the optimal decision is to route jobs to the shorter queue, whereas if the queue lengths are not observed then it is best to alternate between queues, provided the initial distribution of the two queue sizes is the same.
Abstract: As jobs arrive they have to be routed to one of two similar exponential servers. It is shown that if the queue lengths at both servers are observed then the optimal decision is to route jobs to the shorter queue, whereas if the queue lengths are not observed then it is best to alternate between queues, provided the initial distribution of the two queue sizes is the same. The optimality of these routing strategies is independent of the statistics of the job arrivals.

331 citations


"Universality of load balancing sche..." refers background in this paper

  • ...In this canonical case, the so-called join-the-shortest-queue (JSQ) policy has several strong optimality properties, and, in particular, minimizes the overall mean delay among the class of nonanticipating load balancing policies that do not have any advance knowledge of the service requirements [3], [16], [18]....
