
Showing papers on "Load balancing (computing) published in 1990"


Journal ArticleDOI
TL;DR: In this article, a two-stage solution methodology based on a modified simulated annealing technique and the ε-constraint method for general multiobjective optimization problems is developed.
Abstract: A new formulation of the network reconfiguration problem for both loss reduction and load balancing that takes into consideration load constraints and operational constraints is presented. The number of switch-on/switch-off operations involved in network reconfiguration is put into a constraint. The new formulation is a constrained, multiobjective and nondifferentiable optimization problem with both equality and inequality constraints. A two-stage solution methodology based on a modified simulated annealing technique and the ε-constraint method for general multiobjective optimization problems is developed. A salient feature of the solution methodology is that it allows designers to find a desirable, global noninferior solution for the problem. An effective scheme to speed up the solution methodology is presented and analyzed.
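
Illustration (not from the paper): a minimal sketch of the two-stage idea described above, in which the load-balance objective is handled in ε-constraint fashion inside the cost function and the switch-operation budget enters as a hard constraint in a simulated-annealing loop. All function names (cost, neighbor, switch_ops) are hypothetical placeholders supplied by the caller.

    import math
    import random

    def anneal(x0, cost, neighbor, switch_ops, max_ops, T0=1.0, alpha=0.95, iters=2000):
        """Simulated-annealing skeleton; configurations exceeding the
        switch-on/switch-off operation budget are rejected outright."""
        x, fx = x0, cost(x0)
        best, fbest = x, fx
        T = T0
        for _ in range(iters):
            y = neighbor(x)                        # e.g. toggle one switch pair
            if switch_ops(y) > max_ops:            # operation-count constraint
                continue
            fy = cost(y)                           # loss plus constrained balance term
            if fy < fx or random.random() < math.exp(-(fy - fx) / T):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
            T *= alpha                             # geometric cooling schedule
        return best, fbest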

341 citations


Journal ArticleDOI
J. E. Boillat1
TL;DR: A fully distributed dynamic load balancing algorithm for parallel MIMD architectures is presented; it can be described as a system of identical parallel processes, each running on a processor of an arbitrarily interconnected network of processors.
Abstract: We present a fully distributed dynamic load balancing algorithm for parallel MIMD architectures. The algorithm can be described as a system of identical parallel processes, each running on a processor of an arbitrarily interconnected network of processors. We show that the algorithm can be interpreted as a Poisson (heat) equation in a graph. This equation is analysed using Markov chain techniques and is proved to converge in polynomial time, resulting in a global load balance. We also discuss some important parallel architectures and interconnection schemes such as linear processor arrays, tori, hypercubes, etc. Finally we present two applications where the algorithm has been successfully embedded (process mapping and molecular dynamics simulation).
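
Illustration (not from the paper): first-order diffusion on an arbitrary processor graph is the discrete analogue of the heat equation analysed above; each processor repeatedly exchanges a fixed fraction of its load difference with every neighbour. The graph, loads, and diffusion parameter below are illustrative.

    def diffuse(load, neighbors, alpha=0.25, steps=50):
        """One load value per processor; each step moves alpha*(load[j]-load[i])
        across every edge, converging to the global average."""
        for _ in range(steps):
            new = load[:]
            for i, nbrs in enumerate(neighbors):
                new[i] += alpha * sum(load[j] - load[i] for j in nbrs)
            load = new
        return load

    # 4-processor ring: loads converge toward the average (2.5)
    print(diffuse([10, 0, 0, 0], [[1, 3], [0, 2], [1, 3], [0, 2]]))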

229 citations


Journal ArticleDOI
TL;DR: A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system that allows dynamic repartitioning according to changing job requirements.
Abstract: A description is given of a novel design, using a hierarchy of controllers, that effectively controls a multiuser, multiprogrammed parallel system. Such a structure allows dynamic repartitioning according to changing job requirements. The design goals are examined, and the principles of distributed hierarchical control are presented. Control over processors is discussed. Mapping and load balancing with distributed hierarchical control are considered. Support for gang scheduling as well as availability and fault tolerance is addressed. The use of distributed hierarchical control in memory management and I/O is discussed.

166 citations


Journal ArticleDOI
F. Bonomi1, Anurag Kumar1
TL;DR: It is shown that if the arrival streams are all Poisson and all jobs have the same exponentially distributed service requirements, the probabilistic splitting of the generic stream that minimizes the average job response time is such that it balances the server idle times in a weighted least-squares sense, where the weighting coefficients are related to the service speeds of the servers.
Abstract: A model comprising several servers, each equipped with its own queue and with possibly different service speeds, is considered. Each server receives a dedicated arrival stream of jobs; there is also a stream of generic jobs that arrive to a job scheduler and can be individually allocated to any of the servers. It is shown that if the arrival streams are all Poisson and all jobs have the same exponentially distributed service requirements, the probabilistic splitting of the generic stream that minimizes the average job response time is such that it balances the server idle times in a weighted least-squares sense, where the weighting coefficients are related to the service speeds of the servers. The corresponding result holds for nonexponentially distributed service times if the service speeds are all equal. This result is used to develop adaptive quasi-static algorithms for allocating jobs in the generic arrival stream when the load parameters are unknown. The algorithms utilize server idle-time measurements which are sent periodically to the central job scheduler. A model is developed for these measurements, and the result mentioned is used to cast the problem into one of finding a projection of the root of an affine function, when only noisy values of the function can be observed.
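
Illustration (not the paper's exact projection scheme): a quasi-static update in the same spirit, where the scheduler periodically nudges the splitting probabilities toward servers reporting larger weighted idle times and renormalizes. The step size and weighting below are placeholders.

    def update_split(p, idle, weight, step=0.1):
        """Shift splitting probabilities toward servers whose weighted idle
        time is above average, then renormalize onto the simplex."""
        t = [w * x for w, x in zip(weight, idle)]
        avg = sum(t) / len(t)
        p = [max(0.0, pi + step * (ti - avg)) for pi, ti in zip(p, t)]
        s = sum(p) or 1.0
        return [pi / s for pi in p]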

146 citations


Journal ArticleDOI
TL;DR: This paper formulates queuing-theoretic models for each of the algorithms operating in heterogeneous systems under the assumption that the job arrival process at each node is Poisson and the service times and job transfer times are exponentially distributed.

126 citations


Proceedings ArticleDOI
08 Nov 1990
TL;DR: Deceit as mentioned in this paper is a distributed file system that provides flexibility in the fault-tolerance and availability of files, and provides many capabilities to the user: file replication with concurrent reads and writes, a range of update propagation strategies, automatic disk load balancing and the ability to have multiple versions of a file.
Abstract: Deceit, a distributed file system that provides flexibility in the fault-tolerance and availability of files, is described. Deceit provides many capabilities to the user: file replication with concurrent reads and writes, a range of update propagation strategies, automatic disk load balancing and the ability to have multiple versions of a file. Deceit provides Sun Network File System (NFS) protocol compatibility; no change in NFS client software is necessary in order to use Deceit. The purpose of Deceit is to replace large collections of NFS servers. NFS suffers from several problems in an environment where most clients mount most servers. First, if any one server crashes, clients will block or fail when they try to access that server, and, as the number of servers increases, this problem becomes more likely. Second, servers have a (roughly) fixed capacity, yet it is difficult to move files from one NFS server to another without disrupting clients. Third, replicating a file to increase its availability must be managed by the user. Deceit addresses these three problems.

104 citations


Journal ArticleDOI
TL;DR: A solution to the problem of partitioning the work for sparse matrix factorization to individual processors on a multiprocessor system and results from the Intel iPSC/2 are presented for various finite-element problems using both nested dissection and minimum degree orderings.
Abstract: This paper presents a solution to the problem of partitioning the work for sparse matrix factorization to individual processors on a multiprocessor system. The proposed task assignment strategy is based on the structure of the elimination tree associated with the given sparse matrix. The goal of the task scheduling strategy is to achieve load balancing and a high degree of concurrency among the processors while reducing the amount of processor-to-processor data communication, even for arbitrarily unbalanced elimination trees. This is important because popular fill-reducing ordering methods, such as the minimum degree algorithm, often produce unbalanced elimination trees. Results from the Intel iPSC/2 are presented for various finite-element problems using both nested dissection and minimum degree orderings.

98 citations


Journal ArticleDOI
TL;DR: In the case of the binary n-cube processor network, it is proved that after n steps of the integer version, for any initial load distribution, each processor has a load not more than n/2 away from the average.
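
Illustration (not from the paper): the integer dimension-exchange scheme this bound concerns. In step d, every processor averages its load with the neighbour whose index differs in bit d; with integer loads the odd unit must go somewhere, and the residual imbalance is what the n/2 bound captures. The tie-breaking choice below (extra unit to the lower index) is one reasonable convention.

    def dimension_exchange(load):
        """load[i] is the load of processor i in a binary n-cube (len(load) == 2**n)."""
        n = len(load).bit_length() - 1
        for d in range(n):
            for i in range(len(load)):
                j = i ^ (1 << d)                  # neighbour across dimension d
                if i < j:                          # handle each pair once
                    total = load[i] + load[j]
                    load[i], load[j] = total - total // 2, total // 2
        return load

    print(dimension_exchange([8, 0, 0, 0, 0, 0, 0, 0]))  # -> all ones on a 3-cube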

96 citations


Proceedings ArticleDOI
01 Feb 1990
TL;DR: A novel implementation of the progressive refinement radiosity algorithm is described using the capabilities of a multiprocessor graphics workstation and speedups of a factor of 40 or more over the equivalent software implementation are observed.
Abstract: This paper describes a novel implementation of the progressive refinement radiosity algorithm. Algorithm performance is greatly enhanced using the capabilities of a multiprocessor graphics workstation. Hemi-cube item buffers are produced using the graphics hardware while the remaining computations are performed in parallel on the multiple host processors. Speedups of a factor of 40 or more over the equivalent software implementation are observed. Load balancing issues are discussed and a system performance model is developed based on actual results. Additionally, a new user interface scheme is presented where the radiosity calculations and walk-through tasks are separated. At each new iteration, the radiosity algorithm automatically updates colors used by the viewing program via shared memory while simultaneously obtaining hints on where to further refine the solution.

74 citations


Proceedings ArticleDOI
01 Jul 1990
TL;DR: A parallel sort merge join algorithm which uses a divide-and-conquer approach to address the data skew problem, and is shown to be very robust relative to the degree of data skew and the total number of processors.
Abstract: Parallel processing of relational queries has received considerable attention of late. However, in the presence of data skew, the speedup from conventional parallel join algorithms can be very limited, due to load imbalances among the various processors. Even a single large skew element can cause a processor to become overloaded. In this paper, we propose a parallel sort merge join algorithm which uses a divide-and-conquer approach to address the data skew problem. The proposed algorithm adds an extra scheduling phase to the usual sort, transfer and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution for data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase in a CPU-bound environment, and is shown to be very robust relative to the degree of data skew and the total number of processors.
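
Illustration (not the paper's optimization algorithm): a toy version of the extra scheduling phase. After the sort phase the per-key-range costs are known; ranges costlier than a threshold (the large skew elements) are split across several processors, and everything is placed greedily on the least-loaded processor.

    import heapq

    def schedule(work, nproc, split_threshold):
        """work maps a join key range to its estimated cost. Returns a
        processor -> list-of-ranges assignment (greedy, largest first)."""
        loads = [(0.0, p) for p in range(nproc)]
        heapq.heapify(loads)
        assign = {p: [] for p in range(nproc)}
        for rng, cost in sorted(work.items(), key=lambda kv: -kv[1]):
            k = max(1, min(nproc, int(cost // split_threshold)))  # split skew elements
            for _ in range(k):
                load, p = heapq.heappop(loads)
                assign[p].append(rng)
                heapq.heappush(loads, (load + cost / k, p))
        return assign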

Journal ArticleDOI
TL;DR: Some problems in distributed system control, such as load balancing, routing, scheduling in a real-time environment, and reconfiguration, require two-phase execution at a central server.

Proceedings ArticleDOI
01 Feb 1990
TL;DR: This work has developed a dynamic load balancing scheme which is applicable to OR-parallel programs in general and, because of its multi-level hierarchical structure, scalable to any number of processors.
Abstract: Good load balancing is the key to deriving maximal performance from multiprocessors. Several successful dynamic load balancing techniques on tightly-coupled multiprocessors have been developed. However, load balancing is more difficult on loosely-coupled multiprocessors because inter-processor communication overheads are higher. Dynamic load balancing techniques have been employed in a few programs on loosely-coupled multiprocessors, but they are tightly built into the particular programs and little attention is paid to scalability. We have developed a dynamic load balancing scheme which is applicable to OR-parallel programs in general. Processors are grouped, and the work loads of groups and processors are balanced hierarchically. Moreover, the scheme is scalable to any number of processors because of this multi-level hierarchical structure. The scheme is tested on the all-solution exhaustive-search Pentomino program on the mesh-connected loosely-coupled multiprocessor Multi-PSI, and speedups of 28.4 times with 32 processors and 50 times with 64 processors have been attained.

Journal ArticleDOI
TL;DR: It is shown that there is typically a large set of resource locations that all have the minimum load, and that for large average loads the maximum load is near the average load.
Abstract: A set of M resource locations and a set of αM consumers are given. Each consumer requires a specified amount of resource, and is constrained to obtain the resource from a specified subset of locations. The problem of assigning consumers to resource locations so as to balance the load among the resource locations as much as possible is considered. It is shown that there are assignments, termed uniformly most-balanced assignments, that simultaneously minimize certain symmetric, separable, convex cost functions. The problem of finding such assignments is equivalent to a network flow problem with convex cost. Algorithms of both the iterative and combinatorial type are given for computing the assignments. The distribution function of the load at a given location for a uniformly most-balanced assignment is studied, assuming that the set of locations each consumer can use is random. An asymptotic lower bound on the distribution function is given for M tending to infinity, and an upper bound is given on the probable maximum load. It is shown that there is typically a large set of resource locations that all have the minimum load, and that for large average loads the maximum load is near the average load.
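
Illustration (an approximation only): the paper computes uniformly most-balanced assignments exactly via convex-cost network flow; for intuition, a single greedy pass that sends each consumer's demand to the least-loaded location in its allowed subset already tends toward a balanced profile (the exact solution may also split one consumer's demand across locations).

    def greedy_assign(demands, allowed, M):
        """demands[c]: consumer c's requirement; allowed[c]: usable locations."""
        load = [0.0] * M
        where = []
        for d, locs in zip(demands, allowed):
            j = min(locs, key=lambda l: load[l])   # least-loaded permitted location
            load[j] += d
            where.append(j)
        return where, load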

Journal ArticleDOI
TL;DR: A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented.
Abstract: A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented. The partitioning phase of Quicksort, which has been considered a serial bottleneck, is cooperatively executed in parallel by many processors through the use of fetch-and-add. The parallel algorithm maintains the in-place nature of Quicksort, thereby allowing internal sorting of large arrays. A class of fetch-and-add-based algorithms for dynamically scheduling processors to subproblems is presented. Adaptive scheduling algorithms in this class have low overhead and achieve effective processor load balancing. The basic algorithm is shown to execute in an average of O(log(N)) time on an N-processor PRAM (parallel random-access machine) assuming a constant-time fetch-and-add. Estimated speedups, based on simulations, are also presented for cases when the number of items to be sorted is much greater than the number of processors.
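
Illustration (not the paper's in-place version, which avoids the copy): the fetch-and-add idea behind the cooperative partitioning phase. Threads claim destination slots with two atomic counters, one growing from each end of the output array, so no two threads ever write the same slot. Python threads and a lock stand in for PRAM processors and hardware fetch-and-add.

    import threading

    class FetchAndAdd:
        """Lock-based stand-in for the hardware fetch-and-add primitive."""
        def __init__(self, value):
            self.value, self.lock = value, threading.Lock()
        def fetch_add(self, delta):
            with self.lock:
                old, self.value = self.value, self.value + delta
                return old

    def parallel_partition(a, pivot, nthreads=4):
        out = [None] * len(a)
        low, high = FetchAndAdd(0), FetchAndAdd(len(a) - 1)
        def worker(chunk):
            for x in chunk:
                if x < pivot:
                    out[low.fetch_add(1)] = x      # claim next slot from the left
                else:
                    out[high.fetch_add(-1)] = x    # claim next slot from the right
        step = (len(a) + nthreads - 1) // nthreads
        ts = [threading.Thread(target=worker, args=(a[i:i + step],))
              for i in range(0, len(a), step)]
        for t in ts: t.start()
        for t in ts: t.join()
        return out, low.value                      # low.value is the split index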

Proceedings ArticleDOI
28 May 1990
TL;DR: A straightforward and efficient algorithm for optimal load balancing of multiclass jobs is derived and it is shown that for obtaining the optimal solution the authors' algorithm and the Dafermos algorithm require comparable computation times that are far less than that of the FD algorithm.
Abstract: The model considered is an extension of the Tantawi and Towsley (1985) single-job-class model to multiple job classes. Some properties of the optimal solution are shown. On the basis of these properties, a straightforward and efficient algorithm for optimal load balancing of multiclass jobs is derived. The performance of this algorithm is compared with that of two other well-known algorithms for multiclass jobs, the flow deviation (FD) algorithm and the Dafermos algorithm. The authors' algorithm and the FD algorithm both require a comparable amount of storage that is far less than that required by the Dafermos algorithm. Numerical experiments show that for obtaining the optimal solution the authors' algorithm and the Dafermos algorithm require comparable computation times that are far less than that of the FD algorithm.

Proceedings ArticleDOI
08 Apr 1990
TL;DR: The effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation are illustrated.
Abstract: In this talk we show how graphical animation of the behavior of parallel algorithms can facilitate the design and performance enhancement of algorithms for matrix computations on parallel computer architectures. Using a portable instrumented communication library and a graphical animation package developed at Oak Ridge National Laboratory, we illustrate the effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation. In this talk we focus on distributed-memory parallel architectures in which the processors communicate by passing messages. The linear algebra problems we consider include matrix factorization and the solution of triangular systems.

Journal ArticleDOI
TL;DR: This paper proposes that a resource management system for large distributed systems should have two levels --- a lower one, responsible for export and allocation of resources in local distributed systems, and an upper one, which manages special resources/services that are not provided locally.
Abstract: In this paper, we propose that a resource management system for large distributed systems should have two levels --- a lower one, responsible for export and allocation of resources in local distributed systems, and an upper one, which manages special resources/services that are not provided locally. For a local environment, load balancing (implementing export and allocation of computational resources) is realized in a distributed way; and management of peripheral resources is developed based on a name server, which can be centralized, or distributed and replicated. The upper level has a centralized resource management center, which is responsible for export and allocation of both peripheral and computational resources. It contains two parts: a name server, which stores attributed names of all shareable resources and a resource manager, which allocates resources to requesting users of a large distributed system. Communication between the resource management center and the local systems is facilitated through integrating modules. This system is now designed based on the RHODOS distributed operating system.

Proceedings ArticleDOI
01 Oct 1990
TL;DR: A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube system to determine the effects of various system utilizations, load imbalances, communication and migration overheads, and multicomputer sizes.
Abstract: In this paper, a new adaptive scheme is presented for dynamic load balancing on a message-passing multicomputer. The scheme is based on easy-to-implement heuristics and a variable threshold for migrating processes among the multicomputer nodes. It uses distributed control over all processor nodes, coordinated by a host processor. Four heuristic methods for process migration are presented, distinguished by their policies for process migration and threshold update. A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube system. The load balancing scheme is evaluated with respect to the effects of system utilization, load imbalance, communication and migration overhead, and multicomputer size. Relative merits of the four methods are revealed under various multicomputer conditions.
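
Illustration (the paper compares four specific policies; this is a generic stand-in): one round of sender-initiated migration with a variable threshold, where an overloaded node ships a unit of work to its least-loaded neighbour and the threshold tracks the average load.

    def migrate_round(load, neighbors, threshold):
        """Each node above the threshold pushes one unit to its least-loaded neighbour."""
        moves = []
        for i in range(len(load)):
            if load[i] > threshold:
                j = min(neighbors[i], key=lambda k: load[k])
                if load[j] < load[i] - 1:          # migrate only if it helps
                    load[i] -= 1
                    load[j] += 1
                    moves.append((i, j))
        return moves

    def new_threshold(load, slack=1.0):
        """Variable threshold: average load plus some slack."""
        return sum(load) / len(load) + slack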

Book
01 Jan 1990
TL;DR: This thesis addresses several issues in parallel architectures and parallel algorithms for integrated vision systems, and shows that SIMD, MIMD and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible.
Abstract: Computer vision has been regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing for a high-level application (e.g., object recognition). This thesis addresses several issues in parallel architectures and parallel algorithms for integrated vision systems. First, a model of computation for IVSs is presented. The model captures computational requirements, defines spatial and temporal data dependencies between tasks, and shows what types of interactions may occur between tasks from different levels of processing. The model is used to develop features and capabilities of a parallel architecture suitable for IVSs. A multiprocessor architecture for IVSs (called NETRA) is presented. NETRA is highly flexible without the use of complex interconnection schemes. NETRA is a recursively defined, hierarchical architecture whose leaf nodes consist of clusters of processors connected with a programmable crossbar with a selective broadcast capability. Hence, it is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. Several refinements in the architecture over the original design are also proposed. Performance of several vision algorithms when they are mapped onto one cluster is presented. It is shown that SIMD, MIMD and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. An extensive analysis of inter-cluster communication strategies in NETRA is presented. A methodology to evaluate performance of algorithms on NETRA is described. Performance analysis of parallel algorithms when mapped across clusters is presented. Parameters are derived from the characteristics of the parallel algorithms and are then used to evaluate the alternative communication strategies in NETRA. The effects of communication interference on the performance of algorithms are studied. It is observed that if communication speeds are matched with the computation speeds, almost linear speedups are possible when algorithms are mapped across clusters. Finally, several techniques to perform data decomposition, and static and dynamic load balancing, for IVS algorithms are described. These techniques can be used to perform load balancing for intermediate- and high-level, data dependent vision algorithms. They are shown to perform well in an implementation of a motion estimation system on a hypercube multiprocessor. (Abstract shortened with permission of author.)

Proceedings Article
01 Sep 1990
TL;DR: This paper extends the concepts of the distributed linear hashed main memory file system with the objective of supporting higher level parallel database operations and investigating the performance of distributed linear hashing and parallel projection.
Abstract: This paper extends the concepts of the distributed linear hashed main memory file system with the objective of supporting higher level parallel database operations. The basic distributed linear hashing technique provides a high speed hash based dynamic file system on a NUMA architecture multi-processor system. Distributed linear hashing has been extended to include the ability to perform high speed parallel scans of the hashed file. The fast scan feature provides load balancing to compensate for uneven distributions of records and uneven processing speed among different processors. These extensions are used to implement a parallel projection capability. The performance of distributed linear hashing and parallel projection is investigated.
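
Illustration (standard linear-hashing addressing, which the distributed scheme builds on; parameter names are illustrative): a key is first hashed with the current-level function, and buckets lying below the split pointer, having already been doubled, are re-hashed with the next-level function.

    def bucket(key, n0, level, split):
        """n0: initial bucket count; level/split: current expansion state."""
        b = hash(key) % (n0 * 2 ** level)
        if b < split:                              # this bucket was already split
            b = hash(key) % (n0 * 2 ** (level + 1))
        return b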

Journal ArticleDOI
TL;DR: In this article, the application of load balancing in a broader context as the emerging standard for analyzing post-tensioned buildings is reviewed and the governing relationships are introduced and discussed.
Abstract: The paper reviews the application of load balancing in a broader context as the emerging standard for analyzing post-tensioned buildings. Terminology, concepts, and current procedures used in the extended scope of load balancing are presented and the governing relationships are introduced and discussed. The redistribution of elastically computed moments due to limited joint plastification is examined and numerical examples illustrate the application of load balancing to more complex structures and the importance of faithful representation of balanced loading.
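
For context (textbook background, not a result stated in the abstract): in the classical load-balancing method for post-tensioned members, a tendon with parabolic drape e over span L carrying prestressing force P exerts an upward equivalent load

    w_{\mathrm{bal}} = \frac{8\,P\,e}{L^{2}}

and the designer chooses P and e so that this balanced load offsets a target fraction of the sustained gravity load; only the unbalanced remainder then produces flexural stresses to be analyzed.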

01 Jan 1990
TL;DR: A parallelizing compiler is developed which, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference, and several message optimizations that address the issues of overhead and synchronization in message transmission are discussed.
Abstract: Parallel computers provide a large degree of computational power for programmers who are willing and able to harness it. The introduction of high-level languages and good compilers made possible the wide use of sequential machines, but the lack of such tools for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. This is a severe burden and opens the door to time-dependent bugs, such as race conditions between reads and writes, which are extremely difficult to detect. In this thesis, we use compile-time analysis and automatic restructuring of programs to exploit a two-level memory hierarchy. Many multiprocessor architectures can be modelled as two-level memory hierarchies, including message-passing machines such as the Intel iPSC/2. We show that such an approach can exploit data locality while avoiding the overhead associated with run-time coherence management. At the same time, it relieves the programmer from the burden of managing process decomposition and synchronization by automatically performing these tasks. We have developed a parallelizing compiler which, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program, for each processor, to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is "data-driven" rather than "program-driven". We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speed-up within sixty to seventy percent of hand-written code.
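
Illustration (a hypothetical sketch of the run-time resolution fallback described above): every processor scans the whole iteration space but executes only the iterations whose left-hand-side data it owns; the block-ownership rule and loop body are illustrative.

    def owner(i, nproc, n):
        """Block ownership: which processor holds element i of an n-element array."""
        return i * nproc // n

    def run_time_resolution(me, nproc, a, b):
        """Keep only iterations writing locally owned data ("data-driven");
        remote operands like b[i-1] would require a message in reality."""
        n = len(a)
        for i in range(1, n):
            if owner(i, nproc, n) == me:
                a[i] = a[i] + b[i - 1]
        return a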

Journal ArticleDOI
TL;DR: The algorithms proposed here increase system performance through load balancing, based on the maximum process queue length and the maximum amount of CPU time of active processes on each host.

Proceedings ArticleDOI
Rajiv Gupta1, P. Gopinath
08 Apr 1990
TL;DR: A hierarchical algorithm for performing dynamic load balancing in a distributed system that keeps all nodes normally loaded by migrating processes from heavily loaded nodes to lightly loaded nodes.
Abstract: In this paper we present a hierarchical algorithm for performing dynamic load balancing in a distributed system. The processors in the system are viewed as being in a lightly loaded, heavily loaded, or normally loaded state. The goal of the algorithm is to keep all nodes normally loaded by migrating processes from heavily loaded nodes to lightly loaded nodes. In addition, the load balancing must involve low communication overhead and respond quickly to load imbalance in the system. The system is partitioned into disjoint groups of processors. First, intra-partition process migration is performed to achieve an acceptable load distribution. If this is not sufficient, inter-partition load balancing is carried out.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: The paper presents theoretical analysis of the deterministic complexity of the load balancing problem (LBP) and shows certain cases of the LBP to be NP-complete.
Abstract: The paper presents theoretical analysis of the deterministic complexity of the load balancing problem (LBP). Because of difficulty of the general problem, research in the area mostly restricts itself to probabilistic or approximation algorithms, or to the average behavior of a network. The paper provides deterministic analysis of the problem for general networks. It focuses on the worst-case complexity analysis of the problem. It shows certain cases of the LBP to be NP-complete. The paper also discusses situations closely related to computer networks, where there is a global view of load distribution in the network; it provides a polynomial algorithm for solving the load balancing problem in this network.

Proceedings ArticleDOI
01 Jan 1990
TL;DR: In this paper, a parallel ray tracing algorithm for DMPCs using a Shared Virtual Memory (SVM) is presented; it has been implemented on an iPSC/2 hypercube and results are given.
Abstract: The production of realistic images by computer requires a huge amount of computation and a large memory capacity. The use of highly parallel computers allows this process to be performed faster. Distributed memory parallel computers (DMPCs), such as hypercubes or transputer-based machines, offer an attractive performance/cost ratio once the load has been balanced and the partitioning of the data domain has been performed. This paper presents a parallel ray tracing algorithm for DMPCs using a Shared Virtual Memory (SVM) which solves these two classical problems. The algorithm has been implemented on an iPSC/2 hypercube and results are given.

Proceedings ArticleDOI
08 Apr 1990
TL;DR: In this paper, the authors present a structured scheme for allowing a programmer to specify the mapping of data to distributed memory multiprocessors, allowing the programmer specify information about communication patterns as well as information about distributing data structures onto processors (including partitioning with replication).
Abstract: The authors present a structured scheme for allowing a programmer to specify the mapping of data to distributed memory multiprocessors. This scheme lets the programmer specify information about communication patterns as well as information about distributing data structures onto processors (including partitioning with replication). This mapping scheme allows the user to map arrays of data to arrays of processors. The user specifies how each axis of the data structure is mapped onto an axis of the processor structure. This mapping may be either one-to-one or one-to-many depending on the parallelism, load balancing, and communication requirements. The authors discuss the basics of how this scheme is implemented in the DINO language, the areas in which it has worked well, the few areas in which there were significant problems, and some ideas for future improvements.
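
Illustration (not DINO syntax): the axis-by-axis flavor of such a mapping, where each data axis is block-mapped onto a processor axis; mapping a data axis onto a length-1 processor axis effectively collapses it, and replication would place the same block on several processors.

    def map_index(i, n, p):
        """Block-map index i of a data axis of extent n onto a processor axis of extent p."""
        return i * p // n

    def owner_of(idx, shape, pshape):
        """Processor coordinates owning a multidimensional data index."""
        return tuple(map_index(i, n, p) for i, n, p in zip(idx, shape, pshape))

    print(owner_of((5, 9), (8, 12), (2, 3)))       # -> (1, 2)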

Patent
09 Mar 1990
TL;DR: In this article, an internal routing algorithm is provided in the internal network to balance the workload among MHPs based on the characteristics of a circuit identification code (CIC); identifying trunk circuits are uniformly distributed, and a load unbalancing factor is formulated approximately as a function of the number of failed MHPs in the worst case.
Abstract: An internal routing method for improving signaling message processing capability by balancing the workload among message handling processors (MHPs) in a common channel signaling system used in electronic exchanges. An internal routing algorithm is provided in the internal network to balance the workload among MHPs based on the characteristics of a circuit identification code (CIC). Identifying trunk circuits are uniformly distributed, and a load unbalancing factor is formulated approximately as a function of the number of failed MHPs in the worst case. With this approach, all MHPs are loaded almost equally regardless of the number of failed MHPs.
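
Illustration (the patent's algorithm is more specific; this shows only the flavor): reducing the CIC modulo the number of live MHPs spreads trunk circuits evenly, and shrinking the table when an MHP fails keeps the survivors near-equally loaded.

    def route(cic, mhps_up):
        """Pick a message handling processor for a message by its circuit id.
        mhps_up: indices of currently working MHPs."""
        return mhps_up[cic % len(mhps_up)]

    # circuits spread uniformly over the survivors after MHP 2 fails
    print([route(cic, [0, 1, 3]) for cic in range(9)])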

Proceedings ArticleDOI
08 Apr 1990
TL;DR: This paper presents a new approach to parallelizing particle-in-cell (PIC) algorithms used in the numerical simulation of three-dimensional plasmas on MIMD multicomputers with two new concepts: unitary load balance and hierarchical decomposition.
Abstract: This paper presents a new approach to parallelizing particle-in-cell (PIC) algorithms used in the numerical simulation of three-dimensional plasmas on MIMD multicomputers. Two new concepts are introduced: unitary load balance and hierarchical decomposition. The combined load for particle and field calculations over the time step is balanced together to form a single spatial decomposition. The unitary load scheme permits the load to be approximately balanced while requiring less communication. Decomposition and dynamic balancing are performed in each of the coordinate directions independently (hierarchically), which is particularly efficient when load imbalance propagates preferentially in a given direction. The hierarchical decomposition also minimizes the number of particles that cross boundary regions, thereby decreasing communication. A local load balancing method is also introduced which allows rows or columns of processors to perform dynamic load balancing locally and in parallel.
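
Illustration (not the paper's load measure, which combines particle and field work): one hierarchical step of such a decomposition, cutting first along x so each slab holds an equal share of the particles, then balancing each slab independently along y. Assumes every slab ends up non-empty.

    def slab_bounds(coords, nslabs):
        """Cut positions along one axis giving ~equal counts per slab."""
        xs = sorted(coords)
        return [xs[(k * len(xs)) // nslabs] for k in range(1, nslabs)]

    def decompose(particles, nx, ny):
        """Hierarchical: split in x, then balance each x-slab independently in y."""
        xcuts = slab_bounds([p[0] for p in particles], nx)
        slabs = [[] for _ in range(nx)]
        for p in particles:
            slabs[sum(p[0] >= c for c in xcuts)].append(p)
        ycuts = [slab_bounds([p[1] for p in s], ny) for s in slabs]
        return xcuts, ycuts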