
Showing papers in "International Journal of Parallel Programming in 1989"


Journal ArticleDOI
TL;DR: This paper introduces a dynamic strategy called WorkCrews for controlling the use of parallelism on small-scale, tightly-coupled multiprocessors; its ordering of queue requests favors coarse-grained subtasks, which further reduces the overhead of task decomposition.
Abstract: In implementing parallel programs, it is important to find strategies for controlling parallelism that make the most effective use of available resources. In this paper, we introduce a dynamic strategy called WorkCrews for controlling the use of parallelism on small-scale, tightly-coupled multiprocessors. In the WorkCrew model, tasks are assigned to a finite set of workers. As in other mechanisms for specifying parallelism, each worker can enqueue subtasks for concurrent evaluation by other workers as they become idle. The WorkCrew paradigm has two advantages. First, much of the work associated with task division can be deferred until a new worker actually undertakes the subtask and avoided altogether if the original worker ends up executing the subtask serially. Second, the ordering of queue requests under WorkCrews favors coarse-grained subtasks, which reduces further the overhead of task decomposition.
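
To make the deferred-splitting idea concrete, here is a minimal Python sketch (the WorkCrew class, its offer/try_retract methods, and the summation example are invented for illustration, not the paper's interface): a divide-and-conquer sum offers its second half to idle workers and, if nobody claims the offer, retracts it and runs that half serially, so the cost of a real split is paid only when help actually arrives. Helpers take the oldest outstanding offer, which tends to be the coarsest one.

```python
import threading
import time

class WorkCrew:
    """Toy work-crew: workers publish 'offers' of help that idle workers may claim."""
    def __init__(self):
        self.lock = threading.Lock()
        self.offers = []                      # outstanding help requests, oldest first

    def offer(self, fn, args):
        """Publish a subtask an idle worker *may* pick up; splitting cost is deferred."""
        entry = {"fn": fn, "args": args, "taken": False,
                 "done": threading.Event(), "result": None}
        with self.lock:
            self.offers.append(entry)
        return entry

    def try_retract(self, entry):
        """Withdraw an unclaimed offer so the original worker can run it serially."""
        with self.lock:
            if not entry["taken"]:
                entry["taken"] = True
                self.offers.remove(entry)
                return True
        return False

    def helper(self, stop):
        """Idle worker: claim the oldest (coarsest) outstanding offer and run it."""
        while not stop.is_set():
            entry = None
            with self.lock:
                if self.offers:
                    entry = self.offers.pop(0)
                    entry["taken"] = True
            if entry is None:
                time.sleep(0.001)
                continue
            entry["result"] = entry["fn"](*entry["args"])
            entry["done"].set()

def crew_sum(crew, xs, lo, hi, grain=4096):
    """Divide-and-conquer sum: offer the right half, do the left half, then
    either retract the offer (serial fallback) or wait for the helper's result."""
    if hi - lo <= grain:
        return sum(xs[lo:hi])
    mid = (lo + hi) // 2
    entry = crew.offer(crew_sum, (crew, xs, mid, hi, grain))
    left = crew_sum(crew, xs, lo, mid, grain)
    if crew.try_retract(entry):
        right = crew_sum(crew, xs, mid, hi, grain)    # nobody helped: no split overhead paid
    else:
        entry["done"].wait()                          # a helper took it: wait for its result
        right = entry["result"]
    return left + right

if __name__ == "__main__":
    crew, stop = WorkCrew(), threading.Event()
    helpers = [threading.Thread(target=crew.helper, args=(stop,)) for _ in range(3)]
    for t in helpers:
        t.start()
    print(crew_sum(crew, list(range(100000)), 0, 100000))   # 4999950000
    stop.set()
    for t in helpers:
        t.join()
```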

107 citations


Journal ArticleDOI
TL;DR: The results of initial investigations using the Intel iPSC/1 hypercube and the Connection Machine for parallel sequence comparisons have wide applicability to the parallel processing of biological sequence comparisons.
Abstract: Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology, and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using the Intel iPSC/1 hypercube and the Connection Machine (CM-I) for these comparisons. Since these machines have very different architectures, the issues and performance trade-offs discussed have a wide applicability for the parallel processing of biological sequence comparisons.
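
The abstract does not spell out the comparison algorithm, but sequence comparison is typically a dynamic-programming recurrence, and a common way to parallelize it is an anti-diagonal (wavefront) sweep, since all cells on one anti-diagonal are mutually independent. The sketch below uses plain edit distance as a stand-in (an assumption for illustration, not necessarily the scoring scheme used in the paper).

```python
def edit_distance_wavefront(a, b):
    """Edit distance computed by anti-diagonals: cells with the same i + j are
    independent, so each anti-diagonal could be split across processors."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                               # boundary: delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j                               # boundary: insert all of b[:j]
    for diag in range(2, n + m + 1):              # sweep anti-diagonals i + j = diag
        cells = [(i, diag - i) for i in range(1, n + 1) if 1 <= diag - i <= m]
        for i, j in cells:                        # independent: could run in parallel
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[n][m]

print(edit_distance_wavefront("GATTACA", "GCATGCU"))   # 4
```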

76 citations


Journal ArticleDOI
TL;DR: This paper describes research that was performed to demonstrate that multiprocessor execution of functional programs on current multiprocessors is feasible, and results in a significant reduction in their execution times.
Abstract: Functional languages have recently gained attention as vehicles for programming in a concise and elegant manner. In addition, it has been suggested that functional programming provides a natural methodology for programming multiprocessor computers. This dissertation demonstrates that multiprocessor execution of functional programs is feasible, and results in a significant reduction in their execution times. Two implementations of the functional language ALFL were built on commercially available multiprocessors. Alfalfa is an implementation on the Intel iPSC hypercube multiprocessor, and Buckwheat is an implementation on the Encore Multimax shared-memory multiprocessor. Each implementation includes a compiler that performs automatic decomposition of ALFL programs. The compiler is responsible for detecting the inherent parallelism in a program, and decomposing the program into a collection of tasks, called serial combinators, that can be executed in parallel. One of the primary goals of the compiler is to generate serial combinators exhibiting the coarsest granularity possible without sacrificing useful parallelism. This dissertation describes the algorithms used by the compiler to analyze, decompose, and optimize functional programs. The abstract machine model supported by Alfalfa and Buckwheat is called heterogeneous graph reduction, which is a hybrid of graph reduction and conventional stack-oriented execution. This model supports parallelism, lazy evaluation, and higher order functions while at the same time making efficient use of the processors in the system. The Alfalfa and Buckwheat run-time systems support dynamic load balancing, interprocessor communication (if required), and storage management. A large number of experiments were performed on Alfalfa and Buckwheat for a variety of programs. The results of these experiments, as well as the conclusions drawn from them, are presented.
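
As a rough illustration of the granularity question (this is a toy of my own, not the ALFL compiler's serial combinators): independent operands of an expression graph may be handed to other workers, while a single-dependency chain stays inside one serial task.

```python
from concurrent.futures import ThreadPoolExecutor

# Expression nodes: ("lit", value) | ("neg", child) | ("add", left, right)
def evaluate(node, pool):
    tag = node[0]
    if tag == "lit":
        return node[1]
    if tag == "neg":
        # Single dependency: keep it in the same (serial) task, like a chain
        # folded into one coarse serial combinator.
        return -evaluate(node[1], pool)
    if tag == "add":
        # Independent operands: one of them may be evaluated by another worker.
        right = pool.submit(evaluate, node[2], pool)
        left = evaluate(node[1], pool)
        return left + right.result()
    raise ValueError(f"unknown node {tag!r}")

expr = ("add", ("neg", ("lit", 3)), ("add", ("lit", 4), ("lit", 5)))
with ThreadPoolExecutor(max_workers=4) as pool:
    print(evaluate(expr, pool))            # (-3) + (4 + 5) = 6
```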

62 citations


Journal ArticleDOI
TL;DR: It is shown that the performance of randomized algorithms is less affected by factors that prevent most parallel deterministic algorithms from attaining their theoretical speedup bounds and reliability is enhanced because the failure of a single processor leads only to degradation, not failure, of the algorithm.
Abstract: Randomized algorithms are algorithms that employ randomness in their solution method. We show that the performance of randomized algorithms is less affected by factors that prevent most parallel deterministic algorithms from attaining their theoretical speedup bounds. A major reason is that the mapping of randomized algorithms onto multiprocessors involves very little scheduling or communication overhead. Furthermore, reliability is enhanced because the failure of a single processor leads only to degradation, not failure, of the algorithm. We present results of an extensive simulation done on a multiprocessor simulator, running a randomized branch-and-bound algorithm. The particular case we consider is the knapsack problem, due to its ease of formulation. We observe the largest speedups in precisely those problems that take large amounts of time to solve.
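
A minimal sketch of the general approach, with illustrative item data and a deliberately simple bound rather than anything from the paper: every worker runs the same exhaustive branch-and-bound with a differently seeded branching order, so there is essentially no scheduling or communication, and losing one worker loses only redundant work.

```python
import random
from concurrent.futures import ProcessPoolExecutor

ITEMS = [(60, 10), (100, 20), (120, 30), (80, 25), (40, 5)]   # (value, weight) pairs
CAPACITY = 50

def branch_and_bound(seed):
    """0/1 knapsack by depth-first branch-and-bound; the seed only varies the
    order in which items are branched on, so every run returns the optimum."""
    rng = random.Random(seed)
    order = list(range(len(ITEMS)))
    rng.shuffle(order)                       # randomized branching order
    rest = [0] * (len(order) + 1)            # rest[i] = total value of items order[i:]
    for i in reversed(range(len(order))):
        rest[i] = rest[i + 1] + ITEMS[order[i]][0]
    best = 0
    def search(i, value, room):
        nonlocal best
        if value > best:
            best = value
        if i == len(order) or value + rest[i] <= best:
            return                           # prune: taking everything left cannot win
        v, w = ITEMS[order[i]]
        if w <= room:
            search(i + 1, value + v, room - w)   # branch: take the item
        search(i + 1, value, room)               # branch: skip the item
    search(0, 0, CAPACITY)
    return best

if __name__ == "__main__":
    # Independent randomized searches need no coordination; losing one worker
    # would only lose that worker's (redundant) result, not the answer.
    with ProcessPoolExecutor() as pool:
        print(max(pool.map(branch_and_bound, range(4))))     # 220
```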

37 citations


Journal ArticleDOI
TL;DR: Four algorithms based on the first fit approach that provide different granularities of parallel access to the allocator's data structures are investigated, showing that simple algorithms are appropriate when the expected number of concurrent requests per memory is low and the request pattern is not bursty.
Abstract: Dynamic storage allocation is a vital component of programming systems intended for multiprocessor architectures that support globally shared memory. Highly parallel algorithms for access to system data structures lie at the core of effective memory allocation strategies as well as solutions to other parallel systems problems. In this paper, we investigate four algorithms, all based on the first fit approach, that provide different granularities of parallel access to the allocator's data structures. These solutions employ a variety of design techniques including specialized locking protocols, the use of atomic fetch-and-Φ operations, and structural modifications. We describe experiments designed to compare the performance of these schemes. The results show that simple algorithms are appropriate when the expected number of concurrent requests per memory is low and the request pattern is not bursty. Algorithms that support finer granularity access while avoiding locking protocols are successful in a range of larger processor/memory ratios.
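
For reference, the coarsest-grained of these schemes amounts to a first-fit free list protected by a single lock. A toy sketch of that baseline (the finer-grained and fetch-and-Φ variants are not reproduced here):

```python
import threading

class FirstFitAllocator:
    """Sketch of the simplest variant: one lock around a first-fit free list."""
    def __init__(self, size):
        self.lock = threading.Lock()
        self.free = [(0, size)]            # sorted list of (offset, length) holes

    def alloc(self, n):
        with self.lock:
            for i, (off, length) in enumerate(self.free):
                if length >= n:            # first hole that fits
                    if length == n:
                        del self.free[i]
                    else:
                        self.free[i] = (off + n, length - n)
                    return off
        return None                        # out of memory

    def free_block(self, off, n):
        with self.lock:
            self.free.append((off, n))
            self.free.sort()
            merged = [self.free[0]]        # coalesce adjacent holes
            for o, length in self.free[1:]:
                po, pl = merged[-1]
                if po + pl == o:
                    merged[-1] = (po, pl + length)
                else:
                    merged.append((o, length))
            self.free = merged

heap = FirstFitAllocator(1024)
a = heap.alloc(100)
b = heap.alloc(200)
heap.free_block(a, 100)
print(heap.alloc(50))                      # 0: first fit reuses the freed hole
```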

28 citations


Journal ArticleDOI
TL;DR: The implementations indicate that transitive closure computations are intrinsically difficult for distributed memory parallel machines because of the need for global information, and the results for shared memory machines exhibited excellent speedups.
Abstract: Practical parallel algorithms, based on classical sequential Union-Find algorithms for computing transitive closures of binary relations, are described and implemented for both shared memory and distributed memory parallel computers. By practical algorithms, we mean algorithms that are efficient for parallel systems with bounded numbers of processors as opposed to algorithms where the number of processors grows with the problem size. Transitive closures are useful for decomposing many applications problems into independent subproblems. The implementations were on an ENCORE Multimax shared memory machine and an NCUBE hypercube. Our implementations indicate that transitive closure computations are intrinsically difficult for distributed memory parallel machines because of the need for global information. By contrast, our results for shared memory machines exhibited excellent speedups.
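
The sequential Union-Find core that such algorithms build on can be sketched as follows (standard path compression and union by rank; the shared-memory and hypercube parallelizations are not shown):

```python
class UnionFind:
    """Sequential Union-Find: equivalence classes of a binary relation."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):                     # path compression (halving)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):                 # union by rank
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

# Transitive closure of the related pairs = components after union-ing each pair.
uf = UnionFind(6)
for a, b in [(0, 1), (1, 2), (4, 5)]:
    uf.union(a, b)
print(uf.find(0) == uf.find(2), uf.find(0) == uf.find(4))   # True False
```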

23 citations


Journal ArticleDOI
TL;DR: A technique for adapting the Morris sliding garbage collection algorithm to execute on parallel machines with shared memory is described, and it is shown how the technique for parallelizing the sequential algorithm can be adapted to a semi-space copying algorithm.
Abstract: This paper describes a technique for adapting the Morris sliding garbage collection algorithm to execute on parallel machines with shared memory. The algorithm is described within the framework of an implementation of the parallel logic language Parlog. However, the algorithm is a general one and can easily be adapted to parallel Prolog systems and to other languages. The performance of the algorithm executing a few simple Parlog benchmarks is analyzed. Finally, it is shown how the technique for parallelizing the sequential algorithm can be adapted for a semi-space copying algorithm.
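
As a reminder of what a sliding (order-preserving, compacting) collection does, here is a toy mark-and-slide pass over an array heap; the cell encoding is invented for this sketch, and neither Morris's pointer-reversal technique nor the paper's parallelization is reproduced.

```python
# Toy heap: each cell is a tuple; integer fields are pointers (heap indices).
HEAP = [("cons", 2, None), ("garbage",), ("cons", 4, None), ("garbage",), ("atom",)]
ROOTS = [0]

def collect(heap, roots):
    marked = set()
    stack = list(roots)
    while stack:                                   # mark phase: trace from the roots
        i = stack.pop()
        if i in marked:
            continue
        marked.add(i)
        stack.extend(f for f in heap[i][1:] if isinstance(f, int))
    new_addr = {}                                  # sliding preserves allocation order
    for i in sorted(marked):
        new_addr[i] = len(new_addr)
    new_heap = []
    for i in sorted(marked):                       # slide live cells down, fixing pointers
        cell = tuple(new_addr.get(f, f) if isinstance(f, int) else f for f in heap[i])
        new_heap.append(cell)
    return new_heap, [new_addr[r] for r in roots]

print(collect(HEAP, ROOTS))   # ([('cons', 1, None), ('cons', 2, None), ('atom',)], [0])
```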

22 citations


Journal ArticleDOI
TL;DR: MultiScheme, the system resulting from these extensions, supports Halstead's future construct as the simple model for parallelism and, by revealing the underlying placeholders on top of which this construct is built, supports a variety of additional parallel programming techniques.
Abstract: The Scheme language can be converted into a parallel processing language by adding two new data types (placeholders and weak pairs), two processor synchronization primitives, and a task distribution mechanism. The mechanisms that support task creation, scheduling, and task synchronization are built using these extensions and features already present in the sequential language. Implementing the core of the parallel processing component in Scheme itself provides a testbed for a variety of experiments and extensions. MultiScheme, the system resulting from these extensions, supports Halstead's future construct as the simple model for parallelism. By revealing the underlying placeholders on top of which this construct is built, MultiScheme supports a variety of additional parallel programming techniques. It supports speculative computation through a simple procedural interface and the automatic garbage collection of tasks. The qlet and qlambda constructs of the QLisp language are also easily implemented in MultiScheme, as are the more familiar fork and join constructs of imperative programming.
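
A minimal sketch of placeholders and a future built on them, in Python rather than Scheme (the names Placeholder, determine, and touch echo the concepts but are not MultiScheme's actual interface):

```python
import threading

class Placeholder:
    """An object standing in for a value that has not been computed yet."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def determine(self, value):            # give the placeholder its value, once
        self._value = value
        self._ready.set()

def touch(x):
    """Force a value: wait for a placeholder to be determined, pass others through."""
    if isinstance(x, Placeholder):
        x._ready.wait()
        return x._value
    return x

def future(thunk):
    """The future construct: return a placeholder immediately and determine it
    from a concurrently running task."""
    p = Placeholder()
    threading.Thread(target=lambda: p.determine(thunk())).start()
    return p

f = future(lambda: sum(range(1_000_000)))  # runs concurrently with the caller
print(touch(f) + 1)                        # touching blocks until the value exists
```

The point of exposing the placeholder rather than only the future is that other mechanisms, such as speculative tasks or qlet-style binding, can be layered on the same primitive.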

16 citations


Journal ArticleDOI
TL;DR: It is demonstrated that, for certain data parallel algorithms, it is possible to determine optimal design parameters analytically using a simple model for the NCUBE hypercube computer.
Abstract: Designing efficient parallel algorithms in a message-based parallel computer should consider both time-space tradeoffs and computation-communication tradeoffs. In order to balance these tradeoffs and achieve the optimal performance of an algorithm, one has to consider various design parameters such as the number of processors required and the size of partitions. In this paper, we demonstrate that, for certain data parallel algorithms, it is possible to determine these design parameters analytically. To serve as a basis for the discussions that follow, a simple model for the NCUBE hypercube computer is introduced. Using this model, we take two examples, array summation and matrix multiplication, to illustrate how their performance can be modeled. By optimizing these expressions, one is able to determine optimal design parameters that lead to efficient execution. Experiments on a 64-node NCUBE verified the accuracy of the analytic results and are used to further support the discussions.
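
A toy version of such a model for array summation on a hypercube, with made-up cost constants rather than the paper's measured ones: each node adds its n/p elements, partial sums are then combined along the log2(p) cube dimensions, and the model-optimal p falls out by minimizing the resulting expression.

```python
import math

t_add = 1.0      # assumed cost of one addition (illustrative, not measured)
t_msg = 50.0     # assumed cost of one nearest-neighbor message

def t_total(n, p):
    """Model: local summation of n/p elements, then log2(p) combine steps,
    each paying one message plus one addition."""
    return (n / p) * t_add + math.log2(p) * (t_msg + t_add)

n = 4096
for p in (1, 4, 16, 64):
    print(f"p={p:2d}  T={t_total(n, p):8.1f}")
best_p = min((2 ** k for k in range(7)), key=lambda p: t_total(n, p))
print("model-optimal p:", best_p)          # 64 for these particular constants
```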

11 citations


Journal ArticleDOI
TL;DR: It is argued that Samal and Henderson's argument makes assumptions about how processors are used, and a counterexample is given that enforces arc consistency in a constant number of steps using O(n^2 a^2 2^(na)) processors, leaving open whether the lower bound holds for a polynomial number of processors.
Abstract: Samal and Henderson claim that any parallel algorithm for enforcing arc consistency in the worst case must have Ω(na) sequential steps, where n is the number of nodes, and a is the number of labels per node. We argue that Samal and Henderson's argument makes assumptions about how processors are used and give a counterexample that enforces arc consistency in a constant number of steps using O(n^2 a^2 2^(na)) processors. It is possible that the lower bound holds for a polynomial number of processors; if such a lower bound were to be proven it would answer an important open question in theoretical computer science concerning the relation between the complexity classes P and NC. The strongest existing lower bound for the arc consistency problem states that it cannot be solved in polynomial log time unless P = NC.
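
For context, what "enforcing arc consistency" computes can be stated as a small sequential routine; the sketch below is AC-3-style and is not the parallel construction discussed in the paper.

```python
from collections import deque

def ac3(domains, constraints):
    """domains: {var: set of labels}; constraints: {(x, y): set of allowed (vx, vy) pairs}.
    Removes every label that has no support in a neighboring domain."""
    arcs = {(x, y) for (x, y) in constraints} | {(y, x) for (x, y) in constraints}
    def allowed(x, vx, y, vy):
        if (x, y) in constraints:
            return (vx, vy) in constraints[(x, y)]
        return (vy, vx) in constraints[(y, x)]
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        revised = False
        for vx in set(domains[x]):
            if not any(allowed(x, vx, y, vy) for vy in domains[y]):
                domains[x].discard(vx)     # vx has no support in y's domain
                revised = True
        if revised:                        # re-examine arcs pointing into x
            queue.extend((z, x) for (z, w) in arcs if w == x and z != y)
    return domains

doms = {"A": {1, 2, 3}, "B": {1, 2, 3}}
cons = {("A", "B"): {(1, 2), (2, 3)}}      # only pairs where B = A + 1 are allowed
print(ac3(doms, cons))                     # {'A': {1, 2}, 'B': {2, 3}}
```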

8 citations


Journal ArticleDOI
TL;DR: It is shown that modes can increase the precision of the backtracking algorithm, though the algorithm allows this precision to be traded off against overhead on a procedure-by-procedure and call-by-call basis.
Abstract: We present the first backtracking algorithm for stream AND-parallel logic programs. It relies on compile-time knowledge of the dataflow graph of each clause to let it figure out efficiently which goals to kill or restart when a goal fails. This crucial information, which we derive from mode declarations, was not available at compile-time in any previous stream AND-parallel system.
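
A toy illustration of how a clause's compile-time dataflow graph can drive backtracking (the clause, variable modes, and restart policy here are invented, and much simpler than the paper's algorithm): when a goal fails, restart a producer of one of its inputs and kill every goal downstream of that producer.

```python
# Clause-level dataflow, derived at compile time from mode declarations.
producers = {"X": "g1", "Y": "g2", "Z": "g3"}                    # variable -> goal that binds it
inputs = {"g1": [], "g2": ["X"], "g3": ["X", "Y"], "g4": ["Z"]}  # goal -> variables it consumes

# Goal-level edges: which earlier goals each goal depends on.
depends_on = {g: {producers[v] for v in vs} for g, vs in inputs.items()}

def on_failure(failed_goal):
    """Choose a producer of the failed goal's input to restart, and collect
    every goal downstream of it that must be killed."""
    restart = max(depends_on[failed_goal], default=None)         # crude 'most recent producer' rule
    if restart is None:
        return None, set()
    kill, frontier = set(), {restart}
    while frontier:
        g = frontier.pop()
        for h, deps in depends_on.items():
            if g in deps and h not in kill:
                kill.add(h)
                frontier.add(h)
    return restart, kill

print(on_failure("g3"))    # restart 'g2'; kill {'g3', 'g4'} (set order may vary)
```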