
Showing papers on "Load balancing (computing) published in 1990"


Journal ArticleDOI
TL;DR: In this article, a two-stage solution methodology based on a modified simulated annealing technique and the ε-constraint method for general multiobjective optimization problems is developed.
Abstract: A new formulation of the network reconfiguration problem for both loss reduction and load balancing that takes into consideration load constraints and operational constraints is presented. The number of switch-on/switch-off operations involved in network reconfiguration is put into a constraint. The new formulation is a constrained, multiobjective and nondifferentiable optimization problem with both equality and inequality constraints. A two-stage solution methodology based on a modified simulated annealing technique and the ε-constraint method for general multiobjective optimization problems is developed. A salient feature of the solution methodology is that it allows designers to find a desirable, global noninferior solution for the problem. An effective scheme to speed up the solution methodology is presented and analyzed.
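
Illustration (not from the paper): a minimal sketch of the two-stage idea described above, in which the load-balance objective is handled in ε-constraint fashion inside the cost function and the switch-operation budget enters as a hard constraint in a simulated-annealing loop. All function names (cost, neighbor, switch_ops) are hypothetical placeholders supplied by the caller.

    import math
    import random

    def anneal(x0, cost, neighbor, switch_ops, max_ops, T0=1.0, alpha=0.95, iters=2000):
        """Simulated-annealing skeleton; configurations exceeding the
        switch-on/switch-off operation budget are rejected outright."""
        x, fx = x0, cost(x0)
        best, fbest = x, fx
        T = T0
        for _ in range(iters):
            y = neighbor(x)                        # e.g. toggle one switch pair
            if switch_ops(y) > max_ops:            # operation-count constraint
                continue
            fy = cost(y)                           # loss plus constrained balance term
            if fy < fx or random.random() < math.exp(-(fy - fx) / T):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
            T *= alpha                             # geometric cooling schedule
        return best, fbest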

341 citations


Journal ArticleDOI
J. E. Boillat1
TL;DR: A fully distributed dynamic load balancing algorithm for parallel MIMD architectures is presented; it can be described as a system of identical parallel processes, each running on a processor of an arbitrarily interconnected network of processors.
Abstract: We present a fully distributed dynamic load balancing algorithm for parallel MIMD architectures. The algorithm can be described as a system of identical parallel processes, each running on a processor of an arbitrarily interconnected network of processors. We show that the algorithm can be interpreted as a Poisson (heat) equation in a graph. This equation is analysed using Markov chain techniques and is proved to converge in polynomial time, resulting in a global load balance. We also discuss some important parallel architectures and interconnection schemes such as linear processor arrays, tori, hypercubes, etc. Finally we present two applications where the algorithm has been successfully embedded (process mapping and molecular dynamics simulation).
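
Illustration (not from the paper): first-order diffusion on an arbitrary processor graph is the discrete analogue of the heat equation analysed above; each processor repeatedly exchanges a fixed fraction of its load difference with every neighbour. The graph, loads, and diffusion parameter below are illustrative.

    def diffuse(load, neighbors, alpha=0.25, steps=50):
        """One load value per processor; each step moves alpha*(load[j]-load[i])
        across every edge, converging to the global average."""
        for _ in range(steps):
            new = load[:]
            for i, nbrs in enumerate(neighbors):
                new[i] += alpha * sum(load[j] - load[i] for j in nbrs)
            load = new
        return load

    # 4-processor ring: loads converge toward the average (2.5)
    print(diffuse([10, 0, 0, 0], [[1, 3], [0, 2], [1, 3], [0, 2]]))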

229 citations


Journal ArticleDOI
TL;DR: A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system that allows dynamic repartitioning according to changing job requirements.
Abstract: A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system. Such a structure allows dynamic repartitioning according to changing job requirements. The design goals are examined, and the principles of distributed hierarchical control are presented. Control over processors is discussed. Mapping and load balancing with distributed hierarchical control are considered. Support for gang scheduling as well as availability and fault tolerance is addressed. The use of distributed hierarchical control in memory management and I/O is discussed.

166 citations


Journal ArticleDOI
F. Bonomi1, Anurag Kumar1
TL;DR: It is shown that if the arrival streams are all Poisson and all jobs have the same exponentially distributed service requirements, the probabilistic splitting of the generic stream that minimizes the average job response time is such that it balances the server idle times in a weighted least-squares sense, where the weighting coefficients are related to the service speeds of the servers.
Abstract: A model comprising several servers, each equipped with its own queue and with possibly different service speeds, is considered. Each server receives a dedicated arrival stream of jobs; there is also a stream of generic jobs that arrive to a job scheduler and can be individually allocated to any of the servers. It is shown that if the arrival streams are all Poisson and all jobs have the same exponentially distributed service requirements, the probabilistic splitting of the generic stream that minimizes the average job response time is such that it balances the server idle times in a weighted least-squares sense, where the weighting coefficients are related to the service speeds of the servers. The corresponding result holds for nonexponentially distributed service times if the service speeds are all equal. This result is used to develop adaptive quasi-static algorithms for allocating jobs in the generic arrival stream when the load parameters are unknown. The algorithms utilize server idle-time measurements which are sent periodically to the central job scheduler. A model is developed for these measurements, and the result mentioned is used to cast the problem into one of finding a projection of the root of an affine function, when only noisy values of the function can be observed.
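
Illustration (not the paper's exact projection scheme): a quasi-static update in the same spirit, where the scheduler periodically nudges the splitting probabilities toward servers reporting larger weighted idle times and renormalizes. The step size and weighting below are placeholders.

    def update_split(p, idle, weight, step=0.1):
        """Shift splitting probabilities toward servers whose weighted idle
        time is above average, then renormalize onto the simplex."""
        t = [w * x for w, x in zip(weight, idle)]
        avg = sum(t) / len(t)
        p = [max(0.0, pi + step * (ti - avg)) for pi, ti in zip(p, t)]
        s = sum(p) or 1.0
        return [pi / s for pi in p]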

146 citations


Journal ArticleDOI
TL;DR: This paper formulates queuing-theoretic models for each of the algorithms operating in heterogeneous systems under the assumption that the job arrival process at each node is Poisson and the service times and job transfer times are exponentially distributed.

126 citations


Proceedings ArticleDOI
08 Nov 1990
TL;DR: Deceit as mentioned in this paper is a distributed file system that provides flexibility in the fault-tolerance and availability of files, and provides many capabilities to the user: file replication with concurrent reads and writes, a range of update propagation strategies, automatic disk load balancing and the ability to have multiple versions of a file.
Abstract: Deceit, a distributed file system that provides flexibility in the fault-tolerance and availability of files, is described. Deceit provides many capabilities to the user: file replication with concurrent reads and writes, a range of update propagation strategies, automatic disk load balancing and the ability to have multiple versions of a file. Deceit provides Sun Network File System (NFS) protocol compatibility; no change in NFS client software is necessary in order to use Deceit. The purpose of Deceit is to replace large collections of NFS servers. NFS suffers from several problems in an environment where most clients mount most servers. First, if any one server crashes, clients will block or fail when they try to access that server, and, as the number of servers increases, this problem becomes more likely. Second, servers have a (roughly) fixed capacity, yet it is difficult to move files from one NFS server to another without disrupting clients. Third, replicating a file to increase its availability must be managed by the user. Deceit addresses these three problems.

104 citations


Journal ArticleDOI
TL;DR: A solution to the problem of partitioning the work for sparse matrix factorization to individual processors on a multiprocessor system and results from the Intel iPSC/2 are presented for various finite-element problems using both nested dissection and minimum degree orderings.
Abstract: This paper presents a solution to the problem of partitioning the work for sparse matrix factorization to individual processors on a multiprocessor system. The proposed task assignment strategy is based on the structure of the elimination tree associated with the given sparse matrix. The goal of the task scheduling strategy is to achieve load balancing and a high degree of concurrency among the processors while reducing the amount of processor-to-processor data communication, even for arbitrarily unbalanced elimination trees. This is important because popular fill-reducing ordering methods, such as the minimum degree algorithm, often produce unbalanced elimination trees. Results from the Intel iPSC/2 are presented for various finite-element problems using both nested dissection and minimum degree orderings.

98 citations


Journal ArticleDOI
TL;DR: In the case of the binary n-cube processor network, it is proved that after n steps of the integer version, for any initial load distribution, each processor has a load not more than n/2 away from the average.
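
Illustration (not from the paper): the integer dimension-exchange scheme this bound concerns. In step d, every processor averages its load with the neighbour whose index differs in bit d; with integer loads the odd unit must go somewhere, and the residual imbalance is what the n/2 bound captures. The tie-breaking choice below (extra unit to the lower index) is one reasonable convention.

    def dimension_exchange(load):
        """load[i] is the load of processor i in a binary n-cube (len(load) == 2**n)."""
        n = len(load).bit_length() - 1
        for d in range(n):
            for i in range(len(load)):
                j = i ^ (1 << d)                  # neighbour across dimension d
                if i < j:                          # handle each pair once
                    total = load[i] + load[j]
                    load[i], load[j] = total - total // 2, total // 2
        return load

    print(dimension_exchange([8, 0, 0, 0, 0, 0, 0, 0]))  # -> all ones on a 3-cube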

96 citations


Proceedings ArticleDOI
01 Feb 1990
TL;DR: A novel implementation of the progressive refinement radiosity algorithm is described using the capabilities of a multiprocessor graphics workstation and speedups of a factor of 40 or more over the equivalent software implementation are observed.
Abstract: This paper describes a novel implementation of the progressive refinement radiosity algorithm. Algorithm performance is greatly enhanced using the capabilities of a multiprocessor graphics workstation. Hemi-cube item buffers are produced using the graphics hardware while the remaining computations are performed in parallel on the multiple host processors. Speedups of a factor of 40 or more over the equivalent software implementation are observed. Load balancing issues are discussed and a system performance model is developed based on actual results. Additionally, a new user interface scheme is presented where the radiosity calculations and walk-through tasks are separated. At each new iteration, the radiosity algorithm automatically updates colors used by the viewing program via shared memory while simultaneously obtaining hints on where to further refine the solution.

74 citations


Proceedings ArticleDOI
01 Jul 1990
TL;DR: A parallel sort merge join algorithm which uses a divide-and-conquer approach to address the data skew problem, and is shown to be very robust relative to the degree of data skew and the total number of processors.
Abstract: Parallel processing of relational queries has received considerable attention of late. However, in the presence of data skew, the speedup from conventional parallel join algorithms can be very limited, due to load imbalances among the various processors. Even a single large skew element can cause a processor to become overloaded. In this paper, we propose a parallel sort merge join algorithm which uses a divide-and-conquer approach to address the data skew problem. The proposed algorithm adds an extra scheduling phase to the usual sort, transfer and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution for data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase in a CPU-bound environment, and is shown to be very robust relative to the degree of data skew and the total number of processors.
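
Illustration (not the paper's optimization algorithm): a toy version of the extra scheduling phase. After the sort phase the per-key-range costs are known; ranges costlier than a threshold (the large skew elements) are split across several processors, and everything is placed greedily on the least-loaded processor.

    import heapq

    def schedule(work, nproc, split_threshold):
        """work maps a join key range to its estimated cost. Returns a
        processor -> list-of-ranges assignment (greedy, largest first)."""
        loads = [(0.0, p) for p in range(nproc)]
        heapq.heapify(loads)
        assign = {p: [] for p in range(nproc)}
        for rng, cost in sorted(work.items(), key=lambda kv: -kv[1]):
            k = max(1, min(nproc, int(cost // split_threshold)))  # split skew elements
            for _ in range(k):
                load, p = heapq.heappop(loads)
                assign[p].append(rng)
                heapq.heappush(loads, (load + cost / k, p))
        return assign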

Journal ArticleDOI
TL;DR: Some problems in distributed system control, such as load balancing, routing, scheduling in a real-time environment, and reconfiguration, require two-phase execution at a central server.

Proceedings ArticleDOI
01 Feb 1990
TL;DR: This work has developed a dynamic load balancing scheme which is applicable to OR-parallel programs in general and, because of its multi-level hierarchical structure, scalable to any number of processors.
Abstract: Good load balancing is the key to deriving maximal performance from multiprocessors. Several successful dynamic load balancing techniques on tightly-coupled multiprocessors have been developed. However, load balancing is more difficult on loosely-coupled multiprocessors because inter-processor communication overheads are higher. Dynamic load balancing techniques have been employed in a few programs on loosely-coupled multiprocessors, but they are tightly built into the particular programs and little attention is paid to scalability. We have developed a dynamic load balancing scheme which is applicable to OR-parallel programs in general. Processors are grouped, and the work loads of groups and processors are balanced hierarchically. Moreover, the scheme is scalable to any number of processors because of this multi-level hierarchical structure. The scheme is tested on the all-solution exhaustive-search Pentomino program on the mesh-connected loosely-coupled multiprocessor Multi-PSI, and speedups of 28.4 times with 32 processors and 50 times with 64 processors have been attained.

Journal ArticleDOI
TL;DR: It is shown that there is typically a large set of resource locations that all have the minimum load, and that for large average loads the maximum load is near the average load.
Abstract: A set of M resource locations and a set of αM consumers are given. Each consumer requires a specified amount of resource, and is constrained to obtain the resource from a specified subset of locations. The problem of assigning consumers to resource locations so as to balance the load among the resource locations as much as possible is considered. It is shown that there are assignments, termed uniformly most-balanced assignments, that simultaneously minimize certain symmetric, separable, convex cost functions. The problem of finding such assignments is equivalent to a network flow problem with convex cost. Algorithms of both the iterative and combinatorial type are given for computing the assignments. The distribution function of the load at a given location for a uniformly most-balanced assignment is studied, assuming that the set of locations each consumer can use is random. An asymptotic lower bound on the distribution function is given for M tending to infinity, and an upper bound is given on the probable maximum load. It is shown that there is typically a large set of resource locations that all have the minimum load, and that for large average loads the maximum load is near the average load.
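
Illustration (an approximation only): the paper computes uniformly most-balanced assignments exactly via convex-cost network flow; for intuition, a single greedy pass that sends each consumer's demand to the least-loaded location in its allowed subset already tends toward a balanced profile (the exact solution may also split one consumer's demand across locations).

    def greedy_assign(demands, allowed, M):
        """demands[c]: consumer c's requirement; allowed[c]: usable locations."""
        load = [0.0] * M
        where = []
        for d, locs in zip(demands, allowed):
            j = min(locs, key=lambda l: load[l])   # least-loaded permitted location
            load[j] += d
            where.append(j)
        return where, load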

Journal ArticleDOI
TL;DR: A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented.
Abstract: A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented. The partitioning phase of Quicksort, which has been considered a serial bottleneck, is cooperatively executed in parallel by many processors through the use of fetch-and-add. The parallel algorithm maintains the in-place nature of Quicksort, thereby allowing internal sorting of large arrays. A class of fetch-and-add-based algorithms for dynamically scheduling processors to subproblems is presented. Adaptive scheduling algorithms in this class have low overhead and achieve effective processor load balancing. The basic algorithm is shown to execute in an average of O(log(N)) time on an N-processor PRAM (parallel random-access machine) assuming a constant-time fetch-and-add. Estimated speedups, based on simulations, are also presented for cases when the number of items to be sorted is much greater than the number of processors.
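
Illustration (not the paper's in-place version, which avoids the copy): the fetch-and-add idea behind the cooperative partitioning phase. Threads claim destination slots with two atomic counters, one growing from each end of the output array, so no two threads ever write the same slot. Python threads and a lock stand in for PRAM processors and hardware fetch-and-add.

    import threading

    class FetchAndAdd:
        """Lock-based stand-in for the hardware fetch-and-add primitive."""
        def __init__(self, value):
            self.value, self.lock = value, threading.Lock()
        def fetch_add(self, delta):
            with self.lock:
                old, self.value = self.value, self.value + delta
                return old

    def parallel_partition(a, pivot, nthreads=4):
        out = [None] * len(a)
        low, high = FetchAndAdd(0), FetchAndAdd(len(a) - 1)
        def worker(chunk):
            for x in chunk:
                if x < pivot:
                    out[low.fetch_add(1)] = x      # claim next slot from the left
                else:
                    out[high.fetch_add(-1)] = x    # claim next slot from the right
        step = (len(a) + nthreads - 1) // nthreads
        ts = [threading.Thread(target=worker, args=(a[i:i + step],))
              for i in range(0, len(a), step)]
        for t in ts: t.start()
        for t in ts: t.join()
        return out, low.value                      # low.value is the split index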

Proceedings ArticleDOI
28 May 1990
TL;DR: A straightforward and efficient algorithm for optimal load balancing of multiclass jobs is derived and it is shown that for obtaining the optimal solution the authors' algorithm and the Dafermos algorithm require comparable computation times that are far less than that of the FD algorithm.
Abstract: The model considered is an extension of the Tantawi and Towsley (1985) single-job-class model to multiple job classes. Some properties of the optimal solution are shown. On the basis of these properties, a straightforward and efficient algorithm for optimal load balancing of multiclass jobs is derived. The performance of this algorithm is compared with that of two other well-known algorithms for multiclass jobs, the flow deviation (FD) algorithm and the Dafermos algorithm. The authors' algorithm and the FD algorithm both require a comparable amount of storage that is far less than that required by the Dafermos algorithm. Numerical experiments show that for obtaining the optimal solution the authors' algorithm and the Dafermos algorithm require comparable computation times that are far less than that of the FD algorithm.

Proceedings ArticleDOI
08 Apr 1990
TL;DR: The effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation are illustrated.
Abstract: In this talk we show how graphical animation of the behavior of parallel algorithms can facilitate the design and performance enhancement of algorithms for matrix computations on parallel computer architectures. Using a portable instrumented communication library and a graphical animation package developed at Oak Ridge National Laboratory, we illustrate the effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation. In this talk we focus on distributed-memory parallel architectures in which the processors communicate by passing messages. The linear algebra problems we consider include matrix factorization and the solution of triangular systems.

Journal ArticleDOI
TL;DR: This paper proposes that a resource management system for large distributed systems should have two levels --- a lower one, responsible for export and allocation of resources in local distributed systems, and an upper one, which manages special resources/services that are not provided locally.
Abstract: In this paper, we propose that a resource management system for large distributed systems should have two levels --- a lower one, responsible for export and allocation of resources in local distributed systems, and an upper one, which manages special resources/services that are not provided locally. For a local environment, load balancing (implementing export and allocation of computational resources) is realized in a distributed way; and management of peripheral resources is developed based on a name server, which can be centralized, or distributed and replicated. The upper level has a centralized resource management center, which is responsible for export and allocation of both peripheral and computational resources. It contains two parts: a name server, which stores attributed names of all shareable resources and a resource manager, which allocates resources to requesting users of a large distributed system. Communication between the resource management center and the local systems is facilitated through integrating modules. This system is now designed based on the RHODOS distributed operating system.

Proceedings ArticleDOI
01 Oct 1990
TL;DR: A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube system to determine the effects of various system utilizations, load imbalances, communication and migration overheads, and multicomputer sizes.
Abstract: In this paper, a new adaptive scheme is presented for dynamic load balancing on a message-passing multicomputer. The scheme is based on easy-to-implement heuristics and a variable threshold for migrating processes among the multicomputer nodes. It uses distributed control over all processor nodes, coordinated by a host processor. Four heuristic methods for process migration are presented, distinguished by their policies for process migration and threshold update. A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube system. The load balancing scheme is evaluated with respect to the effects of system utilization, load imbalance, communication and migration overhead, and multicomputer size. Relative merits of the four methods are revealed under various multicomputer conditions.
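
Illustration (the paper compares four specific policies; this is a generic stand-in): one round of sender-initiated migration with a variable threshold, where an overloaded node ships a unit of work to its least-loaded neighbour and the threshold tracks the average load.

    def migrate_round(load, neighbors, threshold):
        """Each node above the threshold pushes one unit to its least-loaded neighbour."""
        moves = []
        for i in range(len(load)):
            if load[i] > threshold:
                j = min(neighbors[i], key=lambda k: load[k])
                if load[j] < load[i] - 1:          # migrate only if it helps
                    load[i] -= 1
                    load[j] += 1
                    moves.append((i, j))
        return moves

    def new_threshold(load, slack=1.0):
        """Variable threshold: average load plus some slack."""
        return sum(load) / len(load) + slack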

Book
01 Jan 1990
TL;DR: This thesis addresses several issues in parallel architectures and parallel algorithms for integrated vision systems, and shows that SIMD, MIMD and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible.
Abstract: Computer vision has been regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing for a high-level application (e.g., object recognition). This thesis addresses several issues in parallel architectures and parallel algorithms for integrated vision systems. First, a model of computation for IVSs is presented. The model captures computational requirements, defines spatial and temporal data dependencies between tasks, and shows what types of interactions may occur between tasks from different levels of processing. The model is used to develop features and capabilities of a parallel architecture suitable for IVSs. A multiprocessor architecture for IVSs (called NETRA) is presented. NETRA is highly flexible without the use of complex interconnection schemes. NETRA is a recursively defined, hierarchical architecture whose leaf nodes consist of clusters of processors connected with a programmable crossbar with a selective broadcast capability. Hence, it is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. Several refinements in the architecture over the original design are also proposed. Performance of several vision algorithms when they are mapped onto one cluster is presented. It is shown that SIMD, MIMD and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. An extensive analysis of inter-cluster communication strategies in NETRA is presented. A methodology to evaluate performance of algorithms on NETRA is described. Performance analysis of parallel algorithms when mapped across clusters is presented. Parameters are derived from the characteristics of the parallel algorithms and are then used to evaluate the alternative communication strategies in NETRA. The effects of communication interference on the performance of algorithms are studied. It is observed that if communication speeds are matched with the computation speeds, almost linear speedups are possible when algorithms are mapped across clusters. Finally, several techniques to perform data decomposition, and static and dynamic load balancing, for IVS algorithms are described. These techniques can be used to perform load balancing for intermediate- and high-level, data dependent vision algorithms. They are shown to perform well in an implementation of a motion estimation system on a hypercube multiprocessor. (Abstract shortened with permission of author.)

Proceedings Article
01 Sep 1990
TL;DR: This paper extends the concepts of the distributed linear hashed main memory file system with the objective of supporting higher level parallel database operations and investigating the performance of distributed linear hashing and parallel projection.
Abstract: This paper extends the concepts of the distributed linear hashed main memory file system with the objective of supporting higher level parallel database operations. The basic distributed linear hashing technique provides a high speed hash based dynamic file system on a NUMA architecture multi-processor system. Distributed linear hashing has been extended to include the ability to perform high speed parallel scans of the hashed file. The fast scan feature provides load balancing to compensate for uneven distributions of records and uneven processing speed among different processors. These extensions are used to implement a parallel projection capability. The performance of distributed linear hashing and parallel projection is investigated.
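
Illustration (standard linear-hashing addressing, which the distributed scheme builds on; parameter names are illustrative): a key is first hashed with the current-level function, and buckets lying below the split pointer, having already been doubled, are re-hashed with the next-level function.

    def bucket(key, n0, level, split):
        """n0: initial bucket count; level/split: current expansion state."""
        b = hash(key) % (n0 * 2 ** level)
        if b < split:                              # this bucket was already split
            b = hash(key) % (n0 * 2 ** (level + 1))
        return b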

Journal ArticleDOI
TL;DR: In this article, the application of load balancing in a broader context as the emerging standard for analyzing post-tensioned buildings is reviewed and the governing relationships are introduced and discussed.
Abstract: The paper reviews the application of load balancing in a broader context as the emerging standard for analyzing post-tensioned buildings. Terminology, concepts, and current procedures used in the extended scope of load balancing are presented and the governing relationships are introduced and discussed. The redistribution of elastically computed moments due to limited joint plastification is examined and numerical examples illustrate the application of load balancing to more complex structures and the importance of faithful representation of balanced loading.
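
For context (textbook background, not a result stated in the abstract): in the classical load-balancing method for post-tensioned members, a tendon with parabolic drape e over span L carrying prestressing force P exerts an upward equivalent load

    w_{\mathrm{bal}} = \frac{8\,P\,e}{L^{2}}

and the designer chooses P and e so that this balanced load offsets a target fraction of the sustained gravity load; only the unbalanced remainder then produces flexural stresses to be analyzed.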

01 Jan 1990
TL;DR: A parallelizing compiler is developed which, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference, and several message optimizations that address the issues of overhead and synchronization in message transmission are discussed.
Abstract: Parallel computers provide a large degree of computational power for programmers who are willing and able to harness it. The introduction of high-level languages and good compilers made possible the wide use of sequential machines, but the lack of such tools for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. This is a severe burden and opens the door to time-dependent bugs, such as race conditions between reads and writes, which are extremely difficult to detect. In this thesis, we use compile-time analysis and automatic restructuring of programs to exploit a two-level memory hierarchy. Many multiprocessor architectures can be modelled as two-level memory hierarchies, including message-passing machines such as the Intel iPSC/2. We show that such an approach can exploit data locality while avoiding the overhead associated with run-time coherence management. At the same time, it relieves the programmer from the burden of managing process decomposition and synchronization by automatically performing these tasks. We have developed a parallelizing compiler which, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program, for each processor, to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is "data-driven" rather than "program-driven". We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speed-up within sixty to seventy percent of hand-written code.
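
Illustration (a hypothetical sketch of the run-time resolution fallback described above): every processor scans the whole iteration space but executes only the iterations whose left-hand-side data it owns; the block-ownership rule and loop body are illustrative.

    def owner(i, nproc, n):
        """Block ownership: which processor holds element i of an n-element array."""
        return i * nproc // n

    def run_time_resolution(me, nproc, a, b):
        """Keep only iterations writing locally owned data ("data-driven");
        remote operands like b[i-1] would require a message in reality."""
        n = len(a)
        for i in range(1, n):
            if owner(i, nproc, n) == me:
                a[i] = a[i] + b[i - 1]
        return a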

Journal ArticleDOI
TL;DR: The algorithms proposed here increase system performance through load balancing, based on the maximum process queue length and the maximum amount of CPU time of active processes on each host.

Proceedings ArticleDOI
Rajiv Gupta1, P. Gopinath
08 Apr 1990
TL;DR: A hierarchical algorithm for performing dynamic load balancing in a distributed system that keeps all nodes normally loaded by migrating processes from heavily loaded nodes to lightly loaded nodes.
Abstract: In this paper we present a hierarchical algorithm for performing dynamic load balancing in a distributed system. The processors in the system are viewed as being in a lightly loaded, heavily loaded, or normally loaded state. The goal of the algorithm is to keep all nodes normally loaded by migrating processes from heavily loaded nodes to lightly loaded nodes. In addition, the load balancing must involve low communication overhead and respond quickly to load imbalance in the system. The system is partitioned into disjoint groups of processors. First, intra-partition process migration is performed to achieve an acceptable load distribution. If this is not sufficient, inter-partition load balancing is carried out.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: The paper presents theoretical analysis of the deterministic complexity of the load balancing problem (LBP) and shows certain cases of the LBP to be NP-complete.
Abstract: The paper presents theoretical analysis of the deterministic complexity of the load balancing problem (LBP). Because of difficulty of the general problem, research in the area mostly restricts itself to probabilistic or approximation algorithms, or to the average behavior of a network. The paper provides deterministic analysis of the problem for general networks. It focuses on the worst-case complexity analysis of the problem. It shows certain cases of the LBP to be NP-complete. The paper also discusses situations closely related to computer networks, where there is a global view of load distribution in the network; it provides a polynomial algorithm for solving the load balancing problem in this network.

Proceedings ArticleDOI
01 Jan 1990
TL;DR: In this paper, a parallel ray tracing algorithm for DMPCs using a Shared Virtual Memory (SVM) is presented; it has been implemented on an iPSC/2 hypercube and results are given.
Abstract: The production of realistic images by computer requires a huge amount of computation and a large memory capacity. The use of highly parallel computers allows this process to be performed faster. Distributed memory parallel computers (DMPCs), such as hypercubes or transputer-based machines, offer an attractive performance/cost ratio once the load has been balanced and the partitioning of the data domain has been performed. This paper presents a parallel ray tracing algorithm for DMPCs using a Shared Virtual Memory (SVM) which solves these two classical problems. The algorithm has been implemented on an iPSC/2 hypercube and results are given.

Proceedings ArticleDOI
08 Apr 1990
TL;DR: In this paper, the authors present a structured scheme for allowing a programmer to specify the mapping of data to distributed memory multiprocessors, allowing the programmer specify information about communication patterns as well as information about distributing data structures onto processors (including partitioning with replication).
Abstract: The authors present a structured scheme for allowing a programmer to specify the mapping of data to distributed memory multiprocessors. This scheme lets the programmer specify information about communication patterns as well as information about distributing data structures onto processors (including partitioning with replication). This mapping scheme allows the user to map arrays of data to arrays of processors. The user specifies how each axis of the data structure is mapped onto an axis of the processor structure. This mapping may be either one-to-one or one-to-many depending on the parallelism, load balancing, and communication requirements. The authors discuss the basics of how this scheme is implemented in the DINO language, the areas in which it has worked well, the few areas in which there were significant problems, and some ideas for future improvements.
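
Illustration (not DINO syntax): the axis-by-axis flavor of such a mapping, where each data axis is block-mapped onto a processor axis; mapping a data axis onto a length-1 processor axis effectively collapses it, and replication would place the same block on several processors.

    def map_index(i, n, p):
        """Block-map index i of a data axis of extent n onto a processor axis of extent p."""
        return i * p // n

    def owner_of(idx, shape, pshape):
        """Processor coordinates owning a multidimensional data index."""
        return tuple(map_index(i, n, p) for i, n, p in zip(idx, shape, pshape))

    print(owner_of((5, 9), (8, 12), (2, 3)))       # -> (1, 2)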

Patent
09 Mar 1990
TL;DR: In this article, an internal routing algorithm is provided in the internal network to balance the workload among MHPs based on the characteristics of a circuit identification code (CIC); identifying trunk circuits are uniformly distributed, and a load unbalancing factor is formulated approximately as a function of the number of failed MHPs in the worst case.
Abstract: An internal routing method for improving signaling message processing capability by balancing the workload among message handling processors (MHPs) in a common channel signaling system used in electronic exchanges. An internal routing algorithm is provided in the internal network to balance the workload among MHPs based on the characteristics of a circuit identification code (CIC). Identifying trunk circuits are uniformly distributed, and a load unbalancing factor is formulated approximately as a function of the number of failed MHPs in the worst case. With this approach, all MHPs are loaded almost equally regardless of the number of failed MHPs.
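
Illustration (the patent's algorithm is more specific; this shows only the flavor): reducing the CIC modulo the number of live MHPs spreads trunk circuits evenly, and shrinking the table when an MHP fails keeps the survivors near-equally loaded.

    def route(cic, mhps_up):
        """Pick a message handling processor for a message by its circuit id.
        mhps_up: indices of currently working MHPs."""
        return mhps_up[cic % len(mhps_up)]

    # circuits spread uniformly over the survivors after MHP 2 fails
    print([route(cic, [0, 1, 3]) for cic in range(9)])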

Proceedings ArticleDOI
08 Apr 1990
TL;DR: This paper presents a new approach to parallelizing particle-in-cell (PIC) algorithms used in the numerical simulation of three-dimensional plasmas on MIMD multicomputers with two new concepts: unitary load balance and hierarchical decomposition.
Abstract: This paper presents a new approach to parallelizing particle-in-cell (PIC) algorithms used in the numerical simulation of three-dimensional plasmas on MIMD multicomputers. Two new concepts are introduced: unitary load balance and hierarchical decomposition. The combined load for particle and field calculations over the time step is balanced together to form a single spatial decomposition. The unitary load scheme permits the load to be approximately balanced while requiring less communication. Decomposition and dynamic balancing are performed in each of the coordinate directions independently (hierarchically), which is particularly efficient when load imbalance propagates preferentially in a given direction. The hierarchical decomposition also minimizes the number of particles that cross boundary regions, thereby decreasing communication. A local load balancing method is also introduced which allows rows or columns of processors to perform dynamic load balancing locally and in parallel.
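
Illustration (not the paper's load measure, which combines particle and field work): one hierarchical step of such a decomposition, cutting first along x so each slab holds an equal share of the particles, then balancing each slab independently along y. Assumes every slab ends up non-empty.

    def slab_bounds(coords, nslabs):
        """Cut positions along one axis giving ~equal counts per slab."""
        xs = sorted(coords)
        return [xs[(k * len(xs)) // nslabs] for k in range(1, nslabs)]

    def decompose(particles, nx, ny):
        """Hierarchical: split in x, then balance each x-slab independently in y."""
        xcuts = slab_bounds([p[0] for p in particles], nx)
        slabs = [[] for _ in range(nx)]
        for p in particles:
            slabs[sum(p[0] >= c for c in xcuts)].append(p)
        ycuts = [slab_bounds([p[1] for p in s], ny) for s in slabs]
        return xcuts, ycuts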