Proceedings ArticleDOI

A simple approach for adapting continuous load balancing processes to discrete settings

TL;DR: The paper introduces a general method that converts a wide class of continuous neighborhood load balancing algorithms into discrete versions achieving asymptotically lower discrepancies than previous methods for general graphs, and presents a randomized version of the algorithm that balances the load if the initial load on every node is large enough.
Abstract: We introduce a general method that converts a wide class of continuous neighborhood load balancing algorithms into a discrete version. Assume that initially the tasks are arbitrarily distributed among the nodes of a graph. In every round every node is allowed to communicate and exchange load with an arbitrary subset of its neighbors. The goal is to balance the load as evenly as possible. Continuous load balancing algorithms that are allowed to split tasks arbitrarily can balance the load perfectly, so that every node has exactly the same load. Discrete load balancing algorithms are not allowed to split tasks and therefore cannot balance the load perfectly. In this paper we consider the problem in a very general setting, where the tasks can have arbitrary weights and the nodes can have different speeds. Given a neighborhood load balancing algorithm that balances the load perfectly in t rounds, we convert the algorithm into a discrete version. This new algorithm is deterministic and balances the load in t rounds so that the difference between the average and the maximum load is at most 2d·w_max, where d is the maximum degree of the network and w_max is the maximum weight of any task. Compared to the previous methods that work for general graphs [12], our method achieves asymptotically lower discrepancies (e.g. O(1) vs. O(log n) for constant-degree expanders and O(r) vs. O(n^(1/r)) for r-dimensional tori) in the same number of rounds. For the case of uniform weights we present a randomized version of our algorithm balancing the load so that the difference between the minimum and the maximum load is at most O(√(d log n)) if the initial load on every node is large enough.

Summary (2 min read)

1 Introduction

  • In this paper the authors consider the problem of neighbourhood load balancing in arbitrary networks.
  • The tasks can have arbitrary weights. (Footnote: This paper is an extended version of [6].)
  • Neighbourhood load balancing algorithms usually work in synchronous rounds.
  • These matchings are then used periodically (periodic matching model).
  • Here all the nodes balance their load with all their neighbours.

1.1 New Results

  • In every round the discrete algorithm imitates the continuous algorithm as closely as possible by trying to send the same amount of load over every edge as the continuous algorithm.
  • That would incur communication overhead proportional to the number of dummy tokens.
  • Furthermore, let T be the time it takes for the continuous process to balance the load (more or less) completely (see Section 3 for details).
  • An additive algorithm, starting with a load distribution D = D1 + D2, transmits the same amount of tasks over every edge as the sum of the amounts it would transmit when starting with D1 and with D2 separately. (Footnote 1: The discrete version of the algorithm has to know the continuous flow f^c_e(t) for every edge e = (u, v).)
  • Algorithm 1 achieves a final max-min discrepancy that is independent of n and of the graph expansion; in particular, it is the only algorithm achieving constant max-min discrepancy for all constant-degree graphs.
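
The flow-imitation idea summarized in this list can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions, not the paper's Algorithm 1: tasks have unit weight, all speeds are equal, the continuous process is FOS with α = 1/(d+1), and dummy tokens are ignored, so the initial load must be large enough that no node runs out. The names `imitate_flow`, `cum_cont` and `cum_disc` are hypothetical.

```python
from fractions import Fraction

def imitate_flow(adj, load, rounds):
    """Run continuous FOS and a discrete imitation of it side by side."""
    d = max(len(nbrs) for nbrs in adj.values())
    alpha = Fraction(1, d + 1)            # assumed diffusion parameter
    cont = {v: Fraction(x) for v, x in load.items()}
    disc = dict(load)                     # integer token counts
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    cum_cont = {e: Fraction(0) for e in edges}   # cumulative continuous flow
    cum_disc = {e: 0 for e in edges}             # cumulative discrete flow
    for _ in range(rounds):
        # Continuous flow of this round, computed from the loads at round start.
        flow = {(u, v): alpha * (cont[u] - cont[v]) for (u, v) in edges}
        for (u, v), f in flow.items():
            cont[u] -= f
            cont[v] += f
            cum_cont[(u, v)] += f
            # Send as many whole tokens as the discrete process still "owes".
            owed = cum_cont[(u, v)] - cum_disc[(u, v)]
            send = int(owed)              # truncates toward zero
            disc[u] -= send
            disc[v] += send
            cum_disc[(u, v)] += send
    return cont, disc

# Hypothetical example: a cycle on 6 nodes with one heavily loaded node.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
start = {i: 10 for i in range(6)}
start[0] = 100
cont, disc = imitate_flow(adj, start, rounds=60)
```

Because the cumulative discrete flow on every edge stays within one token of the cumulative continuous flow, each node's discrete load in this sketch differs from its continuous load by less than its degree.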

2 Existing Algorithms and Techniques

  • The authors give an overview of the results on continuous (Section 2.1) and discrete neighbourhood load balancing (Section 2.2) only.
  • The authors will not consider these models here any further.
  • When not stated otherwise, the results are for the uniform case without speeds and weights.
  • In the following the authors will consider the results both in the discrete and the continuous settings.

2.1 Continuous Load Balancing

  • The first diffusion algorithm (also called first order schedule, FOS) was independently introduced by Cybenko [15] and Boillat [12].
  • Their results were later generalized to the case of non-uniform speeds in [20].
  • To introduce the FOS process the authors first need some additional notation.
  • The SOS method is inspired by a numerical iterative method called successive over-relaxation.
  • The model was originally introduced in [30], together with a distributed edge-colouring algorithm (see also [35, 36]) that can be used to construct the matchings.
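
The periodic matching model mentioned above can be sketched for the continuous case as follows. This is a minimal illustration, assuming uniform speeds and that the matchings are already given (for example, from an edge colouring); the function name `balance_over_matchings` and the 4-cycle example are hypothetical and not taken from the paper.

```python
from fractions import Fraction

def balance_over_matchings(load, matchings, periods):
    """Apply each matching in turn; matched nodes average their load."""
    x = [Fraction(v) for v in load]
    for _ in range(periods):
        for matching in matchings:
            for u, v in matching:
                avg = (x[u] + x[v]) / 2   # continuous load can be split freely
                x[u] = x[v] = avg
    return x

# A 4-cycle 0-1-2-3-0 is covered by two perfect matchings.
matchings = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]
result = balance_over_matchings([8, 0, 0, 0], matchings, periods=1)
```

On this small example one period of the two matchings already balances the load perfectly; on general graphs more periods are needed.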

2.2 Discrete Load Balancing

  • As far as the authors know, existing papers consider only discrete algorithms in the uniform task model.
  • For FOS schemes, [34] left it as an open question to analyze the potential drop when the potential is smaller than O(d^2 n^2).
  • All the edges are assigned weights proportional to their scheduled load transfer.
  • When the continuous flow is rounded down, the final discrepancy is Ω(d · diam(G)) for a discrete FOS process [26, 27] and Ω(diam(G)) for a discrete process in the matching model [27].

2.3 Improved Processes for Discrete Load Balancing

  • The next three subsections discuss three different approaches that were used in order to reduce the difference (caused by the rounding error) in the load distribution between discrete and continuous balancing processes.
  • The authors combine the approach of [37] with analysis techniques for randomized algorithms to show improved discrepancy bounds for general graphs.
  • Note that it is possible to get similar results if the excess tokens are sent to neighbours chosen randomly with replacement or if the neighbours are chosen in a round-robin fashion with a random starting point [5].
  • Note that this algorithm might also create negative load on some of the nodes.

3 Notation and Basic Facts

  • Initially there are in total m tasks which are assigned arbitrarily to the n nodes of the graph G. Tasks may be of different integer weights and the maximum task weight is denoted by wmax.
  • Consider a continuous process A. For the transformations introduced by Algorithm 1 and Algorithm 2, the authors require initial load vectors that do not lead to negative load in the continuous case; that is, they need to ensure that when executing A, the outgoing demand of a node never exceeds its available load.
  • Consider a load balancing process A. Let x′ and x′′ be nonnegative load vectors.
  • The next lemma shows that the class of additive terminating processes includes several well known existing processes.

4 Deterministic Flow Imitation

  • The authors present and analyze an algorithm that transforms a continuous process A into its discrete counterpart which they call D(A).
  • The authors also note that in actual implementation they do not need to create and transfer workload units and consume communication bandwidth for each dummy token.
  • For other algorithms, the result in part (1) of the above theorem automatically holds, and the condition in part (2) can be translated as having sufficient initial load.
  • Now, the result can be obtained using Observation 4.

5 Randomized Flow Imitation

  • Instead of always rounding down the flow that has to be sent over an edge, Algorithm 2 uses randomized rounding.
  • Then each of the random variables Ei, j(t) can assume at most two different values and rounding up or down is independent of other edges (see part (3) of Observation 9).
  • The next lemma provides the two main ingredients for proving Theorem 8.
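
The randomized rounding step described in this section can be illustrated in isolation. The sketch below is an assumption about the general technique, not a transcription of Algorithm 2: a fractional amount is rounded up with probability equal to its fractional part and rounded down otherwise, so the expected value equals the original amount. The helper name `randomized_round` is hypothetical.

```python
import math
import random

def randomized_round(amount, rng):
    """Round up with probability frac(amount), otherwise round down."""
    low = math.floor(amount)
    frac = amount - low
    return low + (1 if rng.random() < frac else 0)

# Rounding 2.25 many times yields only 2 or 3, with mean close to 2.25.
rng = random.Random(0)
samples = [randomized_round(2.25, rng) for _ in range(20000)]
```

Each call takes only two possible values, which matches the observation above that the per-edge random variables can assume at most two different values.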


Citations
Journal ArticleDOI
TL;DR: An improved load-balancing algorithm is proposed that will be effectively executed within the constructed FSW, where nodes consider the capacity and calculate the average effective-load, and compared with two significant diffusion methods presented in the literature.

44 citations


Additional excerpts

  • ...Neighborhood load balancing algorithms (Akbari et al., 2012) are diffusion algorithms that have the advantage that they are very simple and that the vertices do not need any global information to base their balancing decisions on....


Proceedings ArticleDOI
23 Jul 2013
TL;DR: Viewing the parallel rotor walk as a load balancing process, it is proved that the rotor walk falls in the class of bounded-error diffusion processes introduced in [11], which gives discrepancy bounds of O(log^(3/2) n) and O(1) for the hypercube and the r-dimensional torus with r = O(1), respectively, which improve over the best existing bounds.
Abstract: We study the parallel rotor walk process, which works as follows: Consider a graph along with an arbitrary distribution of tokens over its nodes. Every node is equipped with a rotor that points to its neighbours in a fixed circular order. In each round, every node distributes all of its tokens using the rotor. One token is allocated to the neighbour pointed at by the rotor, then the rotor moves to the subsequent neighbour, and so on, until no token remains. The process can be considered as a deterministic analogue of a process in which tokens perform one independent random walk step in each round. We compare the distribution of tokens in the rotor walk process with the expected distribution in the random walk model. The similarity between the two processes is measured by their discrepancy, which is the maximum difference between the corresponding distribution entries over all rounds and nodes. We analyze a lazy variation of rotor walks that simulates a random walk with loop probability of 1/2 on each node, and each node sends not all its tokens, but every other token in each round. Viewing the rotor walk as a load balancing process, we prove that the rotor walk falls in the class of bounded-error diffusion processes introduced in [11]. This gives us discrepancy bounds of O(log^(3/2) n) and O(1) for the hypercube and the r-dimensional torus with r = O(1), respectively, which improve over the best existing bounds of O(log^2 n) and O(n^(1/r)). Also, as a result of switching to the load balancing view, we observe that the existing load balancing results can be translated to rotor walk discrepancy bounds not previously noticed in the rotor walk literature. We also use the idea of rotor walks to propose and analyze a randomized rounding discrete load balancing process that achieves the same balancing quality as similar protocols [11, 3], but uses fewer random bits than [3], and avoids the negative load problem of [11].
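
One round of the (non-lazy) parallel rotor walk described in this abstract could be sketched as follows. This is an illustrative sketch only; the function name `rotor_round`, the adjacency-list representation and the 4-cycle example are assumptions, and the lazy variant analyzed in the cited paper is not modelled.

```python
def rotor_round(adj, tokens, rotor):
    """Every node hands out all its tokens, one per neighbour, in rotor order."""
    new_tokens = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        for _ in range(tokens[v]):
            new_tokens[nbrs[rotor[v]]] += 1          # token goes where the rotor points
            rotor[v] = (rotor[v] + 1) % len(nbrs)    # rotor advances to the next neighbour
    return new_tokens

# 4-cycle with neighbours listed in a fixed circular order; all rotors start at 0.
adj = {0: [1, 3], 1: [2, 0], 2: [3, 1], 3: [0, 2]}
rotor = {v: 0 for v in adj}
after = rotor_round(adj, {0: 5, 1: 0, 2: 0, 3: 0}, rotor)
# after == {0: 0, 1: 3, 2: 0, 3: 2}
```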

35 citations

Proceedings ArticleDOI
21 Jul 2015
TL;DR: In this article, the authors consider the problem of deterministic load balancing of tokens in the discrete model, where each node exchanges some of its tokens with each of its neighbors in the network.
Abstract: We consider the problem of deterministic load balancing of tokens in the discrete model. A set of n processors is connected into a d-regular undirected network. In every time step, each processor exchanges some of its tokens with each of its neighbors in the network. The goal is to minimize the discrepancy between the number of tokens on the most-loaded and the least-loaded processor as quickly as possible. Rabani et al. (1998) present a general technique for the analysis of a wide class of discrete load balancing algorithms. Their approach is to characterize the deviation between the actual loads of a discrete balancing algorithm with the distribution generated by a related Markov chain. The Markov chain can also be regarded as the underlying model of a continuous diffusion algorithm. Rabani et al. showed that after time T = O(log(Kn)/μ), any algorithm of their class achieves a discrepancy of O(d log n/μ), where μ is the spectral gap of the transition matrix of the graph, and K is the initial load discrepancy in the system. In this work we identify some natural additional conditions on deterministic balancing algorithms, resulting in a class of algorithms reaching a smaller discrepancy. This class contains well-known algorithms, e.g., the rotor-router. Specifically, we introduce the notion of cumulatively fair load-balancing algorithms where in any interval of consecutive time steps, the total number of tokens sent out over an edge by a node is the same (up to constants) for all adjacent edges. We prove that algorithms which are cumulatively fair and where every node retains a sufficient part of its load in each step, achieve a discrepancy of O(min{d√(log n/μ), d√n}) in time O(T). We also show that in general neither of these assumptions may be omitted without increasing discrepancy.
We then show by a combinatorial potential reduction argument that any cumulatively fair scheme satisfying some additional assumptions achieves a discrepancy of O(d) almost as quickly as the continuous diffusion process. This positive result applies to some of the simplest and most natural discrete load balancing schemes.

20 citations

Posted Content
TL;DR: In this paper, the authors consider the problem of balancing load items (tokens) in networks, and show that for any regular network in the matching model, all nodes have the same load up to an additive constant in (asymptotically) the same number of rounds as required in the continuous case.
Abstract: We consider the problem of balancing load items (tokens) in networks. Starting with an arbitrary load distribution, we allow nodes to exchange tokens with their neighbors in each round. The goal is to achieve a distribution where all nodes have nearly the same number of tokens. For the continuous case where tokens are arbitrarily divisible, most load balancing schemes correspond to Markov chains, whose convergence is fairly well-understood in terms of their spectral gap. However, in many applications, load items cannot be divided arbitrarily, and we need to deal with the discrete case where the load is composed of indivisible tokens. This discretization entails a non-linear behavior due to its rounding errors, which makes this analysis much harder than in the continuous case. We investigate several randomized protocols for different communication models in the discrete case. As our main result, we prove that for any regular network in the matching model, all nodes have the same load up to an additive constant in (asymptotically) the same number of rounds as required in the continuous case. This generalizes and tightens the previous best result, which only holds for expander graphs, and demonstrates that there is almost no difference between the discrete and continuous cases. Our results also provide a positive answer to the question of how well discrete load balancing can be approximated by (continuous) Markov chains, which has been posed by many researchers.

19 citations

Journal ArticleDOI
TL;DR: A deterministic and randomized version of the algorithm that balances the load up to a discrepancy of $O(\sqrt{d \log n})$ provided that the initial load on every node is large enough.
Abstract: We consider the neighbourhood load balancing problem. Given a network of processors and an arbitrary distribution of tasks over the network, the goal is to balance load by exchanging tasks between neighbours. In the continuous model, tasks can be arbitrarily divided and perfectly balanced state can always be reached. This is not possible in the discrete model where tasks are non-divisible. In this paper we consider the problem in a very general setting, where the tasks can have arbitrary weights and the nodes can have different speeds. Given a continuous load balancing algorithm that balances the load perfectly in $T$ rounds, we convert the algorithm into a discrete version. This new algorithm is deterministic and balances the load in $T$ rounds so that the difference between the average and the maximum load is at most $2d \cdot w_{\max}$, where d is the maximum degree of the network and $w_{\max}$ is the maximum weight of any task. For general graphs, these bounds are asymptotically lower compared to the previous results. The proposed conversion scheme can be applied to a wide class of continuous processes, including first and second order diffusion, dimension exchange, and random matching processes. For the case of identical tasks, we present a randomized version of our algorithm that balances the load up to a discrepancy of $O(\sqrt{d \log n})$ provided that the initial load on every node is large enough.

14 citations

References
Journal ArticleDOI
TL;DR: The balancing flow that is calculated by schemes for homogeneous networks is minimal with regard to the l2-norm, and it is proved that this holds true for generalized schemes, too.
Abstract: Several different diffusion schemes have previously been developed for load balancing on homogeneous processor networks. We generalize existing schemes, in order to deal with heterogeneous networks.

86 citations

Proceedings ArticleDOI
01 Jun 1993
TL;DR: This paper shows that in an n-node network with maximum degree d whose live edges, at every time step, form a μ-expander, the algorithm will balance the load to within an additive O(d log n/μ) term in O(Δ log(nΔ)/μ) time, where Δ is the initial imbalance.
Abstract: This paper presents a simple local algorithm for load balancing in a distributed network. The algorithm makes no assumption about the structure of the network. It can be executed on a synchronous network with fixed topology, a synchronous network with dynamically changing topology, or an asynchronous network. It works quickly and balances well when the network has an expansion property. In particular, we show that in an n-node network with maximum degree d whose live edges, at every time step, form a μ-expander, the algorithm will balance the load to within an additive O(d log n/μ) term in O(Δ log(nΔ)/μ) time, where Δ is the initial imbalance. The algorithm improves upon previous approaches that yield O(n) time bounds in dynamic and asynchronous networks.

82 citations

Journal ArticleDOI
TL;DR: In this paper, the authors compare the random walk and the deterministic Propp machine on the two-dimensional grid and show that the Propp machine always has the same number of tokens on a node as the random walk has in expectation, apart from an additive error of less than eight.
Abstract: Deterministic and randomized balancing schemes are used to distribute workload evenly in networks. In this paper, we compare two very general ones: The random walk and the (deterministic) Propp machine. Roughly speaking, we show that on the two-dimensional grid, the Propp machine always has the same number of tokens on a node as does the random walk in expectation, apart from an additive error of less than eight. This constant is independent of the total number of tokens and the runtime of the two processes. However, we also show that it makes a difference whether the Propp machine serves the neighbors in a circular or non-circular order.

66 citations

Proceedings ArticleDOI
31 May 2009
TL;DR: It is proved that in comparison to the corresponding model of Rabani, Sinclair, and Wanka (1998) with arbitrary roundings, the randomization yields an improvement of roughly a square root of the achieved discrepancy in the same number of time-steps on all graphs.
Abstract: We consider and analyze a new algorithm for balancing indivisible loads on a distributed network with n processors. The aim is minimizing the discrepancy between the maximum and minimum load. In every time-step paired processors balance their load as evenly as possible. The direction of the excess token is chosen according to a randomized rounding of the participating loads. We prove that in comparison to the corresponding model of Rabani, Sinclair, and Wanka (1998) with arbitrary roundings, the randomization yields an improvement of roughly a square root of the achieved discrepancy in the same number of time-steps on all graphs. For the important case of expanders we can even achieve a constant discrepancy in O(log n (log log n)^3) rounds. This is optimal up to loglog-factors while the best previous algorithms in this setting either require Ω(log^2 n) time or can only achieve a logarithmic discrepancy. Our new result also demonstrates that with randomized rounding the difference between discrete and continuous load balancing vanishes almost completely.

58 citations

Journal ArticleDOI
TL;DR: The paper analyzes a load balancing algorithm in which each node examines the number of tokens at each of its neighbors and sends a token to each neighbor with at least 2d+1 fewer tokens, and extends the analysis to a variant of this algorithm for dynamic and asynchronous networks.
Abstract: This paper presents an analysis of the following load balancing algorithm. At each step, each node in a network examines the number of tokens at each of its neighbors and sends a token to each neighbor with at least 2d+1 fewer tokens, where d is the maximum degree of any node in the network. We show that within $O(\Delta / \alpha)$ steps, the algorithm reduces the maximum difference in tokens between any two nodes to at most $O((d^2 \log n)/\alpha)$, where $\Delta$ is the global imbalance in tokens (i.e., the maximum difference between the number of tokens at any node initially and the average number of tokens), n is the number of nodes in the network, and $\alpha$ is the edge expansion of the network. The time bound is tight in the sense that for any graph with edge expansion $\alpha$, and for any value $\Delta$, there exists an initial distribution of tokens with imbalance $\Delta$ for which the time to reduce the imbalance to even $\Delta/2$ is at least $\Omega(\Delta/\alpha)$. The bound on the final imbalance is tight in the sense that there exists a class of networks that can be locally balanced everywhere (i.e., the maximum difference in tokens between any two neighbors is at most 2d), while the global imbalance remains $\Omega((d^2 \log n) / \alpha)$. Furthermore, we show that upon reaching a state with a global imbalance of $O((d^2 \log n)/\alpha)$, the time for this algorithm to locally balance the network can be as large as $\Omega(n^{1/2})$. We extend our analysis to a variant of this algorithm for dynamic and asynchronous networks. We also present tight bounds for a randomized algorithm in which each node sends at most one token in each step.
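
A single synchronous step of the local rule described in this abstract might look as follows. This is a minimal sketch, assuming a static network given as adjacency lists and decisions based on the token counts at the start of the step; the function name `local_balance_step` and the path example are hypothetical.

```python
def local_balance_step(adj, tokens):
    """Each node sends one token to every neighbour with at least 2d+1 fewer tokens."""
    d = max(len(nbrs) for nbrs in adj.values())   # maximum degree of the network
    new_tokens = dict(tokens)
    for u, nbrs in adj.items():
        for v in nbrs:
            # Compare the counts from the start of the step, not the updated ones.
            if tokens[u] - tokens[v] >= 2 * d + 1:
                new_tokens[u] -= 1
                new_tokens[v] += 1
    return new_tokens

# Path 0 - 1 - 2 with maximum degree d = 2, so the threshold is 5 tokens.
adj = {0: [1], 1: [0, 2], 2: [1]}
step = local_balance_step(adj, {0: 10, 1: 0, 2: 0})
# step == {0: 9, 1: 1, 2: 0}
```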

57 citations