Proceedings Article

# A simple approach for adapting continuous load balancing processes to discrete settings

16 Jul 2012, pp. 271-280

TL;DR: A general method converts a wide class of continuous neighborhood load balancing algorithms into discrete versions that achieve asymptotically lower discrepancies than previous methods for general graphs; a randomized version of the algorithm balances the load further if the initial load on every node is large enough.

Abstract: We introduce a general method that converts a wide class of continuous neighborhood load balancing algorithms into a discrete version. Assume that initially the tasks are arbitrarily distributed among the nodes of a graph. In every round every node is allowed to communicate and exchange load with an arbitrary subset of its neighbors. The goal is to balance the load as evenly as possible. Continuous load balancing algorithms that are allowed to split tasks arbitrarily can balance the load perfectly, so that every node has exactly the same load. Discrete load balancing algorithms are not allowed to split tasks and therefore cannot balance the load perfectly. In this paper we consider the problem in a very general setting, where the tasks can have arbitrary weights and the nodes can have different speeds. Given a neighborhood load balancing algorithm that balances the load perfectly in $t$ rounds, we convert the algorithm into a discrete version. This new algorithm is deterministic and balances the load in $t$ rounds so that the difference between the average and the maximum load is at most $2d \cdot w_{\max}$, where $d$ is the maximum degree of the network and $w_{\max}$ is the maximum weight of any task. Compared to the previous methods that work for general graphs [12], our method achieves asymptotically lower discrepancies (e.g. $O(1)$ vs. $O(\log n)$ for constant-degree expanders and $O(r)$ vs. $O(n^{1/r})$ for $r$-dimensional tori) in the same number of rounds. For the case of uniform weights we present a randomized version of our algorithm balancing the load so that the difference between the minimum and the maximum load is at most $O(\sqrt{d \log n})$ if the initial load on every node is large enough.


## Summary (2 min read)

### 1 Introduction

• In this paper the authors consider the problem of neighbourhood load balancing in arbitrary networks.
• The tasks can have arbitrary weights. (This paper is an extended version of [6].)
• Neighbourhood load balancing algorithms usually work in synchronous rounds.
• These matchings are then used periodically (periodic matching model).
• Here all the nodes balance their load with all their neighbours.

### 1.1 New Results

• In every round the discrete algorithm imitates the continuous algorithm as closely as possible by trying to send the same amount of load over every edge as the continuous algorithm.
• That would incur communication overhead proportional to the number of dummy tokens.
• Furthermore, let T be the time it takes for the continuous process to balance the load (more or less) completely (see Section 3 for details).
• An additive algorithm, starting with a load distribution $D = D_1 + D_2$, transmits the same amount of load over every edge as the sum of the amounts it would transmit when started with $D_1$ and with $D_2$ separately.
• The discrete version of the algorithm has to know the continuous flow $f^c_e(t)$ for every edge $e = (u,v)$.
• Algorithm 1 achieves a final max-min discrepancy independent of n and of the graph expansion; in particular, it is the only algorithm achieving constant max-min discrepancy for all constant-degree graphs.

### 2 Existing Algorithms and Techniques

• The authors give an overview of the results on continuous (Section 2.1) and discrete neighbourhood load balancing (Section 2.2) only.
• The authors will not consider these models here any further.
• When not stated otherwise, the results are for the uniform case without speeds and weights.
• In the following the authors will consider the results both in the discrete and the continuous settings.

### 2.1 Continuous Load Balancing

• The first diffusion algorithm (also called first order schedule, FOS) was independently introduced by Cybenko [15] and Boillat [12].
• Their results were later generalized to the case of non-uniform speeds in [20].
• To introduce the FOS process the authors first need some additional notation.
• The SOS method is inspired by a numerical iterative method called successive over-relaxation.
• The model was originally introduced in [30], together with a distributed edge-colouring algorithm (see also [35, 36]) that can be used to construct the matchings.
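As a rough illustration of the continuous first order scheme, the sketch below runs synchronous diffusion rounds in which every node u sends alpha * (x_u - x_v) to each neighbour v. The function name `fos_round`, the adjacency-dict representation, the 4-cycle example, and the choice alpha = 1/(d+1) are assumptions made for this example and are not taken from the paper.

```python
def fos_round(load, adj, alpha):
    """One synchronous round of continuous first-order diffusion (sketch).

    load  : list of real-valued loads, indexed by node
    adj   : dict mapping each node to a list of its neighbours (symmetric)
    alpha : diffusion parameter (assumed here, e.g. 1 / (d + 1))
    """
    new_load = list(load)
    for u, neighbours in adj.items():
        for v in neighbours:
            # Net exchange with v: positive if u is heavier, negative otherwise.
            new_load[u] -= alpha * (load[u] - load[v])
    return new_load


# Hypothetical example: a cycle on 4 nodes (degree 2), all load on node 0.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
load = [8.0, 0.0, 0.0, 0.0]
for _ in range(50):
    load = fos_round(load, cycle, alpha=1 / 3)
print(load)  # every entry should be close to the average load of 2
```

Because the adjacency structure is symmetric, whatever one endpoint of an edge gives up the other endpoint gains, so the total load is preserved in every round.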

### 2.2 Discrete Load Balancing

• As far as the authors know, existing papers consider only discrete algorithms in the uniform task model.
• For FOS schemes, [34] left it as an open question to analyze the potential drop when the potential is smaller than $O(d^2 n^2)$.
• All the edges are assigned weights proportional to their scheduled load transfer.
• When the continuous flow is rounded down, the final discrepancy is Ω(d · diam(G)) for a discrete FOS process [26, 27] and Ω(diam(G)) for a discrete process in the matching model [27].
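To see why rounding down can leave a large discrepancy, the following sketch applies a round-down diffusion step to a path whose loads form a staircase 0, 1, 2, ... Every edge sees a load difference of 1, so the rounded-down flow is 0 everywhere and nothing moves. The function `discrete_fos_round`, the integer parameter `alpha_inv` (standing for 1/alpha), and the instance are assumptions chosen for illustration; this is one stuck configuration, not a proof of the lower bounds cited above.

```python
def discrete_fos_round(load, adj, alpha_inv):
    """One round of diffusion where each edge flow is rounded down (sketch).

    The heavier endpoint u sends floor((x_u - x_v) / alpha_inv) tokens to v.
    """
    new_load = list(load)
    for u, neighbours in adj.items():
        for v in neighbours:
            if load[u] > load[v]:
                sent = (load[u] - load[v]) // alpha_inv  # integer floor
                new_load[u] -= sent
                new_load[v] += sent
    return new_load


# Hypothetical instance: a path on 6 nodes with loads 0, 1, ..., 5.
n = 6
path = {u: [v for v in (u - 1, u + 1) if 0 <= v < n] for u in range(n)}
staircase = list(range(n))

after = discrete_fos_round(staircase, path, alpha_inv=3)
stuck = after == staircase              # no token moved
discrepancy = max(after) - min(after)   # equals the path's diameter here
print(stuck, discrepancy)
```

In this instance the process is already at a fixed point although the max-min discrepancy equals the diameter of the path, which is the kind of behaviour the improved processes in the following sections try to avoid.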

### 2.3 Improved Processes for Discrete Load Balancing

• The next three subsections discuss three different approaches that were used in order to reduce the difference (caused by the rounding error) in the load distribution between discrete and continuous balancing processes.
• The authors combine the approach of [37] with analysis techniques for randomized algorithms to show improved discrepancy bounds for general graphs.
• Note that it is possible to get similar results if the excess tokens are sent to neighbours chosen randomly with replacement or if the neighbours are chosen in a round-robin fashion with a random starting point [5].
• Note that this algorithm might also create negative load on some of the nodes.

### 3 Notation and Basic Facts

• Initially there are in total m tasks which are assigned arbitrarily to the n nodes of the graph G. Tasks may be of different integer weights and the maximum task weight is denoted by wmax.
• Consider a continuous process A. For the transformations introduced by Algorithm 1 and Algorithm 2, the authors require initial load vectors that do not lead to negative load in the continuous case; that is, they need to ensure that when executing A, the outgoing demand of a node never exceeds its available load.
• Consider a load balancing process A. Let x′ and x′′ be nonnegative load vectors.
• The next lemma shows that the class of additive terminating processes includes several well known existing processes.
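Additivity can be checked numerically for a diffusion process. The sketch below records the signed flow over every edge in every round and compares the flows for x′ + x′′ with the sum of the flows for x′ and x′′ taken separately. The helper `fos_flow_history`, the 4-cycle, the example vectors, and the use of signed flows are assumptions of this example; exact rational arithmetic is used so the comparison is not affected by floating-point error.

```python
from fractions import Fraction


def fos_flow_history(load, edges, alpha, rounds):
    """Signed flow alpha * (x_u - x_v) over each edge (u, v) in each round."""
    load = list(load)
    history = []
    for _ in range(rounds):
        # All flows of a round are computed from the load at its start.
        flows = {(u, v): alpha * (load[u] - load[v]) for (u, v) in edges}
        for (u, v), f in flows.items():
            load[u] -= f
            load[v] += f
        history.append(flows)
    return history


# Hypothetical example on a 4-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
alpha = Fraction(1, 3)
x1 = [Fraction(v) for v in (9, 0, 3, 0)]
x2 = [Fraction(v) for v in (0, 6, 0, 2)]
x_sum = [a + b for a, b in zip(x1, x2)]

h1 = fos_flow_history(x1, edges, alpha, 5)
h2 = fos_flow_history(x2, edges, alpha, 5)
h_sum = fos_flow_history(x_sum, edges, alpha, 5)

additive = all(
    h_sum[t][e] == h1[t][e] + h2[t][e] for t in range(5) for e in edges
)
print(additive)
```

The check succeeds because each flow is a linear function of the current load vector, and the load update is linear as well.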

### 4 Deterministic Flow Imitation

• The authors present and analyze an algorithm that transforms a continuous process A into its discrete counterpart which they call D(A).
• The authors also note that in actual implementation they do not need to create and transfer workload units and consume communication bandwidth for each dummy token.
• For other algorithms, the result in part (1) of the above theorem automatically holds, and the condition in part (2) can be translated as having sufficient initial load.
• Now, the result can be obtained using Observation 4.
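The flow imitation idea can be sketched for unit-weight tokens as follows. A continuous diffusion process is simulated alongside the discrete one, and on every edge the discrete process keeps its cumulative number of transferred tokens equal to the cumulative continuous flow rounded towards zero. This is a simplified illustration under assumptions of this example: the name `imitate_fos`, the use of first-order diffusion as the continuous process, and the rounding rule are not taken from the paper, and the paper's handling of task weights, speeds, and dummy tokens is not modelled.

```python
from fractions import Fraction
import math


def imitate_fos(load, edges, alpha, rounds):
    """Discrete process that tracks the cumulative continuous flow per edge."""
    cont = [Fraction(x) for x in load]          # continuous reference loads
    disc = list(load)                           # integer token counts
    cum_cont = {e: Fraction(0) for e in edges}  # cumulative continuous flow
    cum_disc = {e: 0 for e in edges}            # cumulative tokens sent
    for _ in range(rounds):
        flows = {(u, v): alpha * (cont[u] - cont[v]) for (u, v) in edges}
        for (u, v), f in flows.items():
            cont[u] -= f
            cont[v] += f
            cum_cont[(u, v)] += f
            # Send whole tokens so the discrete total matches the
            # continuous total rounded towards zero.
            target = math.trunc(cum_cont[(u, v)])
            send = target - cum_disc[(u, v)]
            disc[u] -= send
            disc[v] += send
            cum_disc[(u, v)] = target
    return cont, disc


# Hypothetical example on a 4-cycle (every node has degree 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cont, disc = imitate_fos([41, 0, 7, 0], edges, Fraction(1, 3), rounds=30)
max_dev = max(abs(d - c) for d, c in zip(disc, cont))
print(disc, float(max_dev))
```

In this sketch the per-edge error between cumulative continuous and discrete flow is always below one token, so a node's discrete load differs from its continuous load by less than its degree. That is the intuition behind a discrepancy bound that depends on the degree rather than on n.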

### 5 Randomized Flow Imitation

• Instead of always rounding down the flow that has to be sent over an edge, Algorithm 2 uses randomized rounding.
• Then each of the random variables $E_{i,j}(t)$ can assume at most two different values and rounding up or down is independent of other edges (see part (3) of Observation 9).
• The next lemma provides the two main ingredients for proving Theorem 8.
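The randomized rounding step itself can be illustrated with a small helper that rounds a real-valued flow up with probability equal to its fractional part and down otherwise, so that the rounded value takes at most two values and has the original flow as its expectation. The function name `randomized_round` and the sampling experiment are assumptions of this sketch; how Algorithm 2 embeds such a step into the balancing process is not shown here.

```python
import math
import random


def randomized_round(x, rng):
    """Round x to floor(x) or floor(x) + 1 with expectation exactly x."""
    lo = math.floor(x)
    frac = x - lo
    # rng.random() is uniform on [0, 1), so this rounds up with
    # probability frac and never rounds an integer up.
    return lo + (1 if rng.random() < frac else 0)


# Hypothetical experiment: independent roundings of the flow value 2.25.
rng = random.Random(0)
samples = [randomized_round(2.25, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(sorted(set(samples)), mean)
```

Each call uses its own random draw, which corresponds to rounding decisions on different edges being independent of each other.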


##### Citations

Journal Article
TL;DR: An improved load-balancing algorithm is proposed that will be effectively executed within the constructed FSW, where nodes consider the capacity and calculate the average effective-load, and compared with two significant diffusion methods presented in the literature.
Abstract: Load-balancing algorithms play a key role in improving the performance of distributed-computing-systems that consist of heterogeneous nodes with different capacities. The performance of load-balancing algorithms and its convergence-rate deteriorate as the number-of-nodes in the system, the network-diameter, and the communication-overhead increase. Moreover, the load-balancing technical-factors significantly affect the performance of rebalancing the load among nodes. Therefore, we propose an approach that improves the performance of load-balancing algorithms by considering the load-balancing technical-factors and the structure of the network that executes the algorithm. We present the design of an overlay network, namely, functional small world (FSW) that facilitates efficient load-balancing in heterogeneous systems. The FSW achieves the efficiency by reducing the number-of-nodes that exchange their information, decreasing the network diameter, minimizing the communication-overhead, and decreasing the time-delay results from the tasks re-migration process. We propose an improved load-balancing algorithm that will be effectively executed within the constructed FSW, where nodes consider the capacity and calculate the average effective-load. We compared our approach with two significant diffusion methods presented in the literature. The simulation results indicate that our approach considerably outperformed the original neighborhood approach and the nearest neighbor approach in terms of response time, throughput, communication overhead, and movements cost. Highlights: We propose a load-balancing algorithm for heterogeneous systems. We construct an overlay network based on small world theory to enhance the algorithm performance. The proposed algorithm improves the distributed system performance.

39 citations

### Additional excerpts

• ...Neighborhood load balancing algorithms (Akbari et al., 2012) are diffusion algorithms that have the advantage that they are very simple and that the vertices do not need any global information to base their balancing decisions on....


Proceedings Article
23 Jul 2013
TL;DR: Viewing the parallel rotor walk as a load balancing process, it is proved that the rotor walk falls in the class of bounded-error diffusion processes introduced in [11], which gives discrepancy bounds of $O(\log^{3/2} n)$ and $O(1)$ for hypercube and $r$-dimensional torus with $r = O(1)$, respectively, which improve over the best existing bounds.
Abstract: We study the parallel rotor walk process, which works as follows: Consider a graph along with an arbitrary distribution of tokens over its nodes. Every node is equipped with a rotor that points to its neighbours in a fixed circular order. In each round, every node distributes all of its tokens using the rotor. One token is allocated to the neighbour pointed at by the rotor, then the rotor moves to the subsequent neighbour, and so on, until no token remains. The process can be considered as a deterministic analogue of a process in which tokens perform one independent random walk step in each round. We compare the distribution of tokens in the rotor walk process with expected distribution in the random walk model. The similarity between the two processes is measured by their discrepancy, which is the maximum difference between the corresponding distribution entries over all rounds and nodes. We analyze a lazy variation of rotor walks that simulates a random walk with loop probability of 1/2 on each node, and each node sends not all its tokens, but every other token in each round. Viewing the rotor walk as a load balancing process, we prove that the rotor walk falls in the class of bounded-error diffusion processes introduced in [11]. This gives us discrepancy bounds of $O(\log^{3/2} n)$ and $O(1)$ for hypercube and $r$-dimensional torus with $r = O(1)$, respectively, which improve over the best existing bounds of $O(\log^2 n)$ and $O(n^{1/r})$. Also, as a result of switching to the load balancing view, we observe that the existing load balancing results can be translated to rotor walk discrepancy bounds not previously noticed in the rotor walk literature. We also use the idea of rotor walks to propose and analyze a randomized rounding discrete load balancing process that achieves the same balancing quality as similar protocols [11, 3], but uses fewer random bits compared to [3], and avoids the negative load problem of [11].

32 citations
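The rotor walk round described in the abstract above can be sketched in a few lines. Each node hands out its tokens one at a time to the neighbour its rotor currently points at and advances the rotor after every token. The function name `rotor_walk_round`, the list-based graph representation, and the 4-cycle example are assumptions of this sketch, and only the basic (non-lazy) variant is shown.

```python
def rotor_walk_round(tokens, adj, rotor):
    """One synchronous round of the parallel rotor walk (non-lazy sketch).

    tokens : list of token counts per node
    adj    : list of neighbour lists in each node's fixed circular order
    rotor  : list of current rotor positions (indices into adj[u]),
             updated in place
    """
    new_tokens = [0] * len(tokens)
    for u, count in enumerate(tokens):
        degree = len(adj[u])
        for _ in range(count):
            new_tokens[adj[u][rotor[u]]] += 1   # send one token
            rotor[u] = (rotor[u] + 1) % degree  # advance the rotor
    return new_tokens


# Hypothetical example: 4-cycle with five tokens on node 0.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
rotor = [0, 0, 0, 0]
result = rotor_walk_round([5, 0, 0, 0], adj, rotor)
print(result, rotor)  # [0, 3, 0, 2] [1, 0, 0, 0]
```

Node 0 sends its tokens alternately to nodes 1 and 3, so the split between neighbours differs by at most one token, which mirrors one step of a random walk without using any randomness.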

Proceedings Article

21 Jul 2015
Abstract: We consider the problem of deterministic load balancing of tokens in the discrete model. A set of n processors is connected into a d-regular undirected network. In every time step, each processor exchanges some of its tokens with each of its neighbors in the network. The goal is to minimize the discrepancy between the number of tokens on the most-loaded and the least-loaded processor as quickly as possible. Rabani et al. (1998) present a general technique for the analysis of a wide class of discrete load balancing algorithms. Their approach is to characterize the deviation between the actual loads of a discrete balancing algorithm with the distribution generated by a related Markov chain. The Markov chain can also be regarded as the underlying model of a continuous diffusion algorithm. Rabani et al. showed that after time $T = O(\log(Kn)/\mu)$, any algorithm of their class achieves a discrepancy of $O(d \log n/\mu)$, where $\mu$ is the spectral gap of the transition matrix of the graph, and $K$ is the initial load discrepancy in the system. In this work we identify some natural additional conditions on deterministic balancing algorithms, resulting in a class of algorithms reaching a smaller discrepancy. This class contains well-known algorithms, e.g., the rotor-router. Specifically, we introduce the notion of cumulatively fair load-balancing algorithms where in any interval of consecutive time steps, the total number of tokens sent out over an edge by a node is the same (up to constants) for all adjacent edges. We prove that algorithms which are cumulatively fair and where every node retains a sufficient part of its load in each step, achieve a discrepancy of $O(\min\{d\sqrt{\log n/\mu},\, d\sqrt{n}\})$ in time $O(T)$. We also show that in general neither of these assumptions may be omitted without increasing discrepancy.
We then show by a combinatorial potential reduction argument that any cumulatively fair scheme satisfying some additional assumptions achieves a discrepancy of O(d) almost as quickly as the continuous diffusion process. This positive result applies to some of the simplest and most natural discrete load balancing schemes.

19 citations

Journal Article
TL;DR: A deterministic and randomized version of the algorithm that balances the load up to a discrepancy of $$\mathscr{O}(\sqrt{d \log n})$$ provided that the initial load on every node is large enough.
Abstract: We consider the neighbourhood load balancing problem. Given a network of processors and an arbitrary distribution of tasks over the network, the goal is to balance load by exchanging tasks between neighbours. In the continuous model, tasks can be arbitrarily divided and perfectly balanced state can always be reached. This is not possible in the discrete model where tasks are non-divisible. In this paper we consider the problem in a very general setting, where the tasks can have arbitrary weights and the nodes can have different speeds. Given a continuous load balancing algorithm that balances the load perfectly in $$T$$ rounds, we convert the algorithm into a discrete version. This new algorithm is deterministic and balances the load in $$T$$ rounds so that the difference between the average and the maximum load is at most $$2d\cdot w_{\max}$$, where d is the maximum degree of the network and $$w_{\max}$$ is the maximum weight of any task. For general graphs, these bounds are asymptotically lower compared to the previous results. The proposed conversion scheme can be applied to a wide class of continuous processes, including first and second order diffusion, dimension exchange, and random matching processes. For the case of identical tasks, we present a randomized version of our algorithm that balances the load up to a discrepancy of $$\mathscr{O}(\sqrt{d \log n})$$ provided that the initial load on every node is large enough.

12 citations

Posted Content
, He Sun1
Abstract: We consider the problem of balancing load items (tokens) in networks. Starting with an arbitrary load distribution, we allow nodes to exchange tokens with their neighbors in each round. The goal is to achieve a distribution where all nodes have nearly the same number of tokens. For the continuous case where tokens are arbitrarily divisible, most load balancing schemes correspond to Markov chains, whose convergence is fairly well-understood in terms of their spectral gap. However, in many applications, load items cannot be divided arbitrarily, and we need to deal with the discrete case where the load is composed of indivisible tokens. This discretization entails a non-linear behavior due to its rounding errors, which makes this analysis much harder than in the continuous case. We investigate several randomized protocols for different communication models in the discrete case. As our main result, we prove that for any regular network in the matching model, all nodes have the same load up to an additive constant in (asymptotically) the same number of rounds as required in the continuous case. This generalizes and tightens the previous best result, which only holds for expander graphs, and demonstrates that there is almost no difference between the discrete and continuous cases. Our results also provide a positive answer to the question of how well discrete load balancing can be approximated by (continuous) Markov chains, which has been posed by many researchers.

12 citations

##### References

Book Chapter
Abstract: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.

8,475 citations

Journal Article
TL;DR: This paper completely analyzes the hypercube network by explicitly computing the eigenstructure of its node adjacency matrix and shows that a diffusion approach to load balancing on a hypercube multiprocessor is inferior to another approach which is called the dimension exchange method.
Abstract: In this paper we study diffusion schemes for dynamic load balancing on message passing multiprocessor networks. One of the main results concerns conditions under which these dynamic schemes converge and their rates of convergence for arbitrary topologies. These results use the eigenstructure of the iteration matrices that arise in dynamic load balancing. We completely analyze the hypercube network by explicitly computing the eigenstructure of its node adjacency matrix. Using a realistic model of interprocessor communications, we show that a diffusion approach to load balancing on a hypercube multiprocessor is inferior to another approach which we call the dimension exchange method. For a d-dimensional hypercube, we compute the rate of convergence to a uniform work distribution and show that after d + 1 iterations of a diffusion type approach, we can guarantee that the work distribution is approximately within e-* of the uniform distribution independent of the hypercube dimension d. Both static and dynamic random models of work distribution are studied.

1,039 citations

Journal Article
Abstract: This paper investigates the notion of negative dependence amongst random variables and attempts to advocate its use as a simple and unifying paradigm for the analysis of random structures and algorithms. The assumption of independence between random variables is often very convenient for several reasons. Firstly, it makes analyses and calculations much simpler. Secondly, one has at hand a whole array of powerful mathematical concepts and tools from classical probability theory for the analysis, such as laws of large numbers, central limit theorems and large deviation bounds which are usually derived under the assumption of independence. Unfortunately, the analysis of most randomized algorithms involves random variables that are not independent. In this case, classical tools from standard probability theory like large deviation theorems, that are valid under the assumption of independence between the random variables involved, cannot be used as such. It is then necessary to determine under what conditions of dependence one can still use the classical tools. It has been observed before [32, 33, 38, 8], that in some situations, even though the variables involved are not independent, one can still apply some of the standard tools that are valid for independent variables (directly or in suitably modified form), provided that the variables are dependent in specific ways. Unfortunately, it appears that in most cases somewhat ad hoc stratagems have been devised, tailored to the specific situation at hand, and that a unifying underlying theory that delves deeper into the nature of dependence amongst the variables involved is lacking. A frequently occurring scenario underlying the analysis of many randomised algorithms and processes involves random variables that are, intuitively, dependent in the following negative way: if one subset of the variables is "high" then a disjoint subset of the variables is "low".
In this paper, we bring to the forefront and systematize some precise notions of negative dependence in the literature, analyse their properties, compare them relative to each other, and illustrate them with several applications. One specific paradigm involving negative dependence is the classical "balls and bins" experiment. Suppose we throw $m$ balls into $n$ bins independently at random. For $i$ in $[n]$, let $B_i$ be the random variable denoting the number of balls in the $i$th bin. We will often refer to these variables as occupancy numbers. This is a classical probabilistic paradigm [16, 22, 26] (see also [31, sec. 3.1]) that underlies the analysis of many probabilistic algorithms and processes. In the case when the balls are identical, this gives rise to the well-known multinomial distribution [16, sec VI.9]: there are $m$ repeated independent trials (balls) where each trial (ball) can result in one of the outcomes $E_1, \ldots, E_n$ (bins). The probability of the realisation of event $E_i$ is $p_i$ for $i$ in $[n]$ for each trial. (Of course the probabilities are subject to the condition $\sum_i p_i = 1$.) Under the multinomial distribution, for any integers $m_1, \ldots, m_n$ such that $\sum_i m_i = m$ the probability that for each $i$ in $[n]$, event $E_i$ occurs $m_i$ times is $\frac{m!}{m_1! \cdots m_n!}\, p_1^{m_1} \cdots p_n^{m_n}$. The balls and bins experiment is a generalisation of the multinomial distribution: in the general case, one can have an arbitrary set of probabilities for each ball: the probability that ball $k$ goes into bin $i$ is $p_{i,k}$, subject only to the natural restriction that for each ball $k$, $\sum_i p_{i,k} = 1$. The joint distribution function correspondingly has a more complicated form. A fundamental natural question of interest is: how are these $B_i$ related? Note that even though the balls are thrown independently of each other, the $B_i$ variables are not independent; in particular, their sum is fixed to $m$.
Intuitively, the $B_i$'s are negatively dependent on each other in the manner described above: if one set of variables is "high", a disjoint set is "low". However, establishing such assertions precisely by a direct calculation from the joint distribution function, though possible in principle, appears to be quite a formidable task, even in the case where the balls are assumed to be identical. One of the major contributions of this paper is establishing that the $B_i$ are negatively dependent in a very strong sense. In particular, we show that the $B_i$ variables satisfy negative association and negative regression, two strong notions of negative dependence that we define precisely below. All the intuitively obvious assertions of negative dependence in the balls and bins experiment follow as easy corollaries. We illustrate the usefulness of these results by showing how to streamline and simplify many existing probabilistic analyses in literature.

324 citations

Journal Article
TL;DR: Fast and simple randomized algorithms for edge coloring a graph in the synchronous distributed point-to-point model of computation and new techniques for proving upper bounds on the tail probabilities of certain random variables which are not stochastically independent are introduced.
Abstract: Certain types of routing, scheduling, and resource-allocation problems in a distributed setting can be modeled as edge-coloring problems. We present fast and simple randomized algorithms for edge coloring a graph in the synchronous distributed point-to-point model of computation. Our algorithms compute an edge coloring of a graph $G$ with $n$ nodes and maximum degree $\Delta$ with at most $1.6 \Delta + O(\log^{1+ \delta} n)$ colors with high probability (arbitrarily close to 1) for any fixed $\delta > 0$; they run in polylogarithmic time. The upper bound on the number of colors improves upon the $(2 \Delta - 1)$-coloring achievable by a simple reduction to vertex coloring. To analyze the performance of our algorithms, we introduce new techniques for proving upper bounds on the tail probabilities of certain random variables. The Chernoff--Hoeffding bounds are fundamental tools that are used very frequently in estimating tail probabilities. However, they assume stochastic independence among certain random variables, which may not always hold. Our results extend the Chernoff--Hoeffding bounds to certain types of random variables which are not stochastically independent. We believe that these results are of independent interest and merit further study.

316 citations

Journal Article
TL;DR: A fully distributed dynamic load balancing algorithm for parallel MIMD architectures that can be described as a system of identical parallel processes, each running on a processor of an arbitrary interconnected network of processors is presented.
Abstract: We present a fully distributed dynamic load balancing algorithm for parallel MIMD architectures. The algorithm can be described as a system of identical parallel processes, each running on a processor of an arbitrary interconnected network of processors. We show that the algorithm can be interpreted as a Poisson (heat) equation in a graph. This equation is analysed using Markov chain techniques and is proved to converge in polynomial time resulting in a global load balance. We also discuss some important parallel architectures and interconnection schemes such as linear processor arrays, tori, hypercubes, etc. Finally we present two applications where the algorithm has been successfully embedded (process mapping and molecular dynamic simulation).

223 citations