
Showing papers on "Parallel algorithm published in 1981"


Journal ArticleDOI
TL;DR: An algorithm is proposed that creates mutual exclusion in a computer network whose nodes communicate only by messages and do not share memory, and it is shown that the sequence number it uses can be contained in a fixed amount of memory by storing it as the residue of a modulus.
Abstract: An algorithm is proposed that creates mutual exclusion in a computer network whose nodes communicate only by messages and do not share memory. The algorithm sends only 2*(N - 1) messages per critical-section invocation, where N is the number of nodes in the network. This number of messages is minimal if parallel, distributed, symmetric control is used; hence, the algorithm is optimal in this respect. The time needed to achieve mutual exclusion is also minimal under some general assumptions. As in Lamport's "bakery algorithm," unbounded sequence numbers are used to provide first-come first-served priority into the critical section. It is shown that the sequence number can be contained in a fixed amount of memory by storing it as the residue of a modulus. The number of messages required to implement the exclusion can be reduced by using sequential node-by-node processing, by using broadcast message techniques, or by sending information through timing channels. The "readers and writers" problem is solved by a simple modification of the algorithm, and the modifications necessary to make the algorithm robust are described.
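The message pattern behind the 2*(N - 1) bound can be sketched in a few lines; the helper names below (`messages_per_invocation`, `priority`) are ours for illustration, not the paper's:

```python
N = 5  # example network size

def messages_per_invocation(n):
    """Per critical-section entry: N-1 REQUESTs out, N-1 REPLYs back."""
    requests = n - 1   # one timestamped REQUEST to every other node
    replies = n - 1    # one REPLY collected from every other node
    return requests + replies

def priority(seq, node_id):
    """First-come first-served order: lower (sequence number, node id) wins."""
    return (seq, node_id)

print(messages_per_invocation(N))  # 8, i.e. 2*(N - 1)
# On a sequence-number tie, the lower node id proceeds first:
assert priority(3, 1) < priority(3, 2)
```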

702 citations


Journal ArticleDOI
TL;DR: An approach to carrying out asynchronous, distributed simulation on multiprocessor message-passing architectures in which the total memory required by all processors is bounded and no more than the amount required in sequential simulation.
Abstract: An approach to carrying out asynchronous, distributed simulation on multiprocessor message-passing architectures is presented. This scheme differs from other distributed simulation schemes because (1) the amount of memory required by all processors together is bounded and is no more than the amount required in sequential simulation and (2) the multiprocessor network is allowed to deadlock, the deadlock is detected, and then the deadlock is broken. Proofs for the correctness of this approach are outlined.
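A toy sketch of the detect-and-break idea, under the simplifying assumption that a process is either blocked waiting for input or not; the function names are invented for illustration and are not the paper's:

```python
def deadlocked(blocked):
    """Global deadlock: every logical process is waiting for input."""
    return all(blocked.values())

def break_deadlock(clocks, blocked):
    """Recover by resuming the process with the smallest local clock,
    which cannot receive an earlier message and is therefore safe."""
    victim = min((p for p in blocked if blocked[p]), key=lambda p: clocks[p])
    blocked[victim] = False
    return victim

blocked = {"A": True, "B": True, "C": True}   # all processes waiting
clocks = {"A": 12.0, "B": 7.5, "C": 9.0}      # local simulation times

resumed = break_deadlock(clocks, blocked) if deadlocked(blocked) else None
print(resumed)  # B
```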

686 citations


Journal ArticleDOI
TL;DR: An algorithm is given for routing in permutation networks, that is, for computing the switch settings that implement a given permutation.
Abstract: An algorithm is given for routing in permutation networks, that is, for computing the switch settings that implement a given permutation. The algorithm takes serial time O(n(log n)^2) (for one processor with random access to a memory of O(n) words) or parallel time O((log n)^3) (for n synchronous processors with conflict-free random access to a common memory of O(n) words). These time bounds may be reduced by a further logarithmic factor when all of the switch sizes are integral powers of two.

282 citations


Book ChapterDOI
10 Jun 1981
TL;DR: A model for synchronized parallel computation is described in which all p processors have access to a common memory and this model is used to solve the problems of finding the maximum, merging, and sorting by p processors.
Abstract: A model for synchronized parallel computation is described in which all p processors have access to a common memory. This model is used to solve the problems of finding the maximum, merging, and sorting by p processors.
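The maximum-finding problem mentioned above is the classic pairwise reduction on this model, taking O(log n) synchronous rounds. A sequential simulation of the rounds (a generic sketch, not necessarily the paper's algorithm):

```python
def parallel_max(values):
    """Simulate synchronous rounds: in each round, processor i compares
    a[i] with a[i + step]; the number of rounds is ceil(log2 n)."""
    a = list(values)
    n = len(a)
    rounds = 0
    step = 1
    while step < n:
        for i in range(0, n - step, 2 * step):   # one synchronous round
            a[i] = max(a[i], a[i + step])
        step *= 2
        rounds += 1
    return a[0], rounds

m, r = parallel_max([3, 9, 1, 7, 4, 8, 2, 6])
print(m, r)  # 9 3  (maximum found in log2(8) = 3 rounds)
```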

252 citations


Journal ArticleDOI
Y. F. Tsao, King-Sun Fu
TL;DR: A parallel algorithm for three-dimensional object thinning with two approaches, path connectivity and surface connectivity, is presented, and criteria to avoid excessive deletion and preserve connectivity are described.

239 citations


Journal ArticleDOI
B. Lint, T. Agerwala
TL;DR: Several models of synchronous and asynchronous parallel computation and their use in analyzing algorithms are discussed, and the importance of interprocessor communication in parallel processing is demonstrated.
Abstract: As multiple processor systems become more widely accepted the importance of parallel programming increases. In this paper, approaches to the design and analysis of parallel algorithms are investigated. Through several examples, the importance of interprocessor communication in parallel processing is demonstrated. Various techniques that are applicable in the design and analysis of parallel algorithms are examined with emphasis on those techniques that incorporate communication aspects. The paper discusses several models of synchronous and asynchronous parallel computation and their use in analyzing algorithms. Relatively primitive methodologies for designing parallel algorithms are discussed and the need for more general and practical methodologies is indicated.

86 citations


Book ChapterDOI
01 Jan 1981
TL;DR: The chapter describes some of the lesser-known features of full approximation scheme multigrid processing, such as the high efficiency in solving nonlinear and eigenvalue problems as well as chains of many similar problems.
Abstract: This chapter describes the potential of multigrid or, more generally, multilevel adaptive techniques on computers with many processors, and the interconnection requirements they pose. The description is in terms of finite-difference formulations; however, analogous methods exist also in finite-element formulations. The chapter describes some of the lesser-known features of full approximation scheme multigrid processing, such as the high efficiency in solving nonlinear and eigenvalue problems as well as chains of many similar problems. These features should be taken into account both in designing the parallel algorithms and in comparing the multigrid performance to other methods. The same efficiency, with the same interconnection schemes, is obtained even when flexible local refinements are incorporated. The same operation count is also obtained in solving initial-value problems. The constants in these operation counts are likely to be dominated by the amount of processing required at grid points on or near the boundaries.

85 citations


Proceedings ArticleDOI
28 Oct 1981
TL;DR: A probabilistic parallel algorithm to sort n keys drawn from an arbitrary totally ordered set such that the average runtime is bounded by O(log n); hence the product of time and number of processors meets the information-theoretic lower bound for sorting.
Abstract: We describe a probabilistic parallel algorithm to sort n keys drawn from an arbitrary totally ordered set. This algorithm can be implemented on a parallel computer consisting of n RAMs, each with a small private memory, and a common memory of size O(n) such that the average runtime is bounded by O(log n). Hence for this algorithm the product of time and number of processors meets the information-theoretic lower bound for sorting.
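The probabilistic ingredient in algorithms of this kind is typically a random sample that yields splitters partitioning the keys into roughly equal buckets, one per processor. A sequential sketch of that splitter idea (sample sort, used here only as an illustration; all names are ours):

```python
import random

def sample_sort(keys, p, seed=0):
    """Partition keys into p buckets via random splitters, then sort each
    bucket; on p processors the buckets would be sorted independently."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(len(keys), p * 4)))
    splitters = sample[4::4][: p - 1]          # p - 1 splitters from the sample
    buckets = [[] for _ in range(p)]
    for k in keys:
        i = sum(k > s for s in splitters)      # index of k's bucket
        buckets[i].append(k)
    out = []
    for b in buckets:                          # each bucket sorted "in parallel"
        out.extend(sorted(b))
    return out

keys = list(range(100))
random.Random(1).shuffle(keys)
assert sample_sort(keys, p=4) == sorted(keys)
```

With a sufficiently large sample, the expected bucket size is O(n/p), which is what makes the O(log n) average runtime plausible on n processors.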

61 citations


Proceedings Article
24 Aug 1981
TL;DR: A problem solver is constructed that combines the metaphors of constraint propagation and hypothesize-and-test, and it is empirically found that the parallel algorithm is, on the average, more efficient than a corresponding sequential one.
Abstract: The role of parallel processing in heuristic search is examined by means of an example (cryptarithmetic addition). A problem solver is constructed that combines the metaphors of constraint propagation and hypothesize-and-test. The system is capable of working on many incompatible hypotheses at one time. Furthermore, it is capable of allocating different amounts of processing power to running activities and changing these allocations as computation proceeds. It is empirically found that the parallel algorithm is, on the average, more efficient than a corresponding sequential one. Implications of this for problem solving in general are discussed.

43 citations


Book ChapterDOI
10 Jun 1981
TL;DR: The success in using binary trees for parallel computations indicates that the binary tree is an important and useful design tool for parallel algorithms.
Abstract: This paper examines the use of binary trees in the design of efficient parallel algorithms. Using binary trees, we develop efficient algorithms for several scheduling problems. The shared memory model for parallel computation is used. Our success in using binary trees for parallel computations indicates that the binary tree is an important and useful design tool for parallel algorithms.
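A standard example of the binary-tree design tool is parallel prefix (scan), which runs in O(log n) parallel steps via an up-sweep and a down-sweep over an implicit tree. This is a generic sketch of the technique, not one of the paper's scheduling algorithms:

```python
def prefix_sums(a):
    """Exclusive prefix sums over an implicit binary tree (Blelloch scan).
    Assumes len(a) is a power of two; each while-iteration is one
    parallel step over independent tree nodes."""
    n = len(a)
    t = list(a)
    d = 1
    while d < n:                        # up-sweep: subtree sums
        for i in range(2 * d - 1, n, 2 * d):
            t[i] += t[i - d]
        d *= 2
    t[n - 1] = 0
    d = n // 2
    while d >= 1:                       # down-sweep: distribute partial sums
        for i in range(2 * d - 1, n, 2 * d):
            left = t[i - d]
            t[i - d] = t[i]
            t[i] += left
        d //= 2
    return t

print(prefix_sums([1, 2, 3, 4]))  # [0, 1, 3, 6]
```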

26 citations



ReportDOI
19 Aug 1981
TL;DR: It is argued on the basis of asymptotic analysis that a constant corridor width is preferred even though such lattices cannot make full use of the processor elements for most complex interconnection patterns, e.g., universal interconnection structures like the cube-connected cycles and shuffle-exchange.
Abstract: The main question under study is how wide the corridors of the switch lattice of the Configurable, Highly Parallel (CHiP) computer should be. (The CHiP computer family is introduced and its use for parallel algorithm composition is motivated.) It is argued on the basis of asymptotic analysis that a constant corridor width is preferred even though such lattices cannot make full use of the processor elements for most complex interconnection patterns, e.g., universal interconnection structures like the cube-connected cycles and shuffle-exchange, and for certain 'simple' ones, e.g., certain planar graphs.

Journal ArticleDOI
TL;DR: The proposed algorithms are analytically compared in terms of their time efficiencies and speed-up ratios and theoretical results for their convergence and orders of convergence are presented.
Abstract: In this paper, parallel algorithms are proposed for solving both systems of nonlinear algebraic equations and unconstrained optimization problems. Theoretical results for their convergence and orders of convergence are also presented. The proposed algorithms are analytically compared in terms of their time efficiencies and speed-up ratios.

Proceedings ArticleDOI
28 Oct 1981
TL;DR: It is pointed out that analyses of parallelism in computational problems have practical implications even when multi-processor machines are not available, and a unified framework for cases like this is presented.
Abstract: The goal of this paper is to point out that analyses of parallelism in computational problems have practical implications even when multi-processor machines are not available. This is true because, in many cases, a good parallel algorithm for one problem may turn out to be useful for designing an efficient serial algorithm for another problem. A unified framework for cases like this is presented. Particular cases, which are discussed in this paper, provide motivation for examining parallelism in problems like sorting, selection, minimum-spanning-tree, shortest route, maxflow, matrix multiplication, as well as scheduling and locational problems.

Proceedings ArticleDOI
28 Oct 1981
TL;DR: In this paper, a model of VLSI computation suitable for the description of algorithms at a high level is introduced, which is basically a language to express parallel computations which can be efficiently implemented by a VLSI circuit.
Abstract: A model of VLSI computation suitable for the description of algorithms at a high level is introduced. The model is basically a language to express parallel computations which can be efficiently implemented by a VLSI circuit. This language is used to describe area-time efficient algorithms for a few well known graph problems. The exact complexity of these algorithms and their relevance to recent work on the inherent limitations of VLSI computations are also presented.

ReportDOI
31 Mar 1981
TL;DR: The design decisions relating to process communication primitives are discussed, and AMPL is compared to several other languages for parallel processing.
Abstract: AMPL is an experimental high-level language for expressing parallel algorithms which involve many interdependent and cooperating tasks. AMPL is a strongly-typed language in which all inter-process communication takes place via message passing. The language has been implemented on the Cm* multiprocessor, and a number of programs have been written to perform numeric and symbolic computation. In this report, the design decisions relating to process communication primitives are discussed, and AMPL is compared to several other languages for parallel processing. The implementations of message passing, process creation, and parallel garbage collection are described. Measurements of several AMPL programs are used to study the effects of language design decisions upon program performance and algorithm design.

Proceedings ArticleDOI
05 Aug 1981
TL;DR: Parallel execution of algebraic computation is discussed in the first half of this paper, and it is argued that, although a high efficiency is obtained by parallel execution of divide-and-conquer algorithms, the ratio of the throughput to the number of processors is still small.
Abstract: Parallel execution of algebraic computation is discussed in the first half of this paper. It is argued that, although a high efficiency is obtained by parallel execution of divide-and-conquer algorithms, the ratio of the throughput to the number of processors is still small. Parallel processing will be most successful for modular algorithms and many algorithms in linear algebra. In the second half of this paper, parallel algorithms for symbolic determinants and linear equations are proposed. The algorithms manifest a very high efficiency in a simple parallel processing scheme. These algorithms are also usable in a serial processing scheme.


01 Jan 1981
TL;DR: This thesis derives both upper and lower bounds for parallel algorithms, using a very general model of parallel computation, and presents a large collection of basic algorithms for both the ultracomputer and the paracomputer, showing that both models are extremely effective parallel computer systems.
Abstract: With the advent of VLSI, new opportunities in computer architecture are emerging. Parallel processors composed of many thousands of PEs will soon be practical. In this thesis, we derive both upper and lower bounds for parallel algorithms. Our analyses emphasize two specific models of parallel computation--the "ultracomputer" and the "paracomputer"--but the general ideas and many of the results are much more widely applicable. We present general lower bounds for solving a wide class of problems on direct connection machines, and a sharper lower bound for effecting permutations. This latter bound shows that the permutation problem is not completely parallelizable on any direct connection machine that is not "almost" completely connected. In addition, using a very general model of parallel computation, we study the worst case time complexity of searching in parallel. We then present a large collection of basic algorithms for both the ultracomputer and the paracomputer. Since the performances of many of these algorithms achieve the lower bounds mentioned above, both models are extremely effective parallel computer systems. Finally, a systematic method for generalizing any "dependent-size" algorithm to an "independent-size" one is given.

Journal ArticleDOI
TL;DR: A simple approach based on Shoenberg's theorem is given to test whether a set of border points of a simply 4-connected digital picture is convex; it can be used for decomposing two-dimensional objects into convex sets and for filling concavities.

Journal ArticleDOI
01 Dec 1981-Calcolo
TL;DR: Some O(log n) parallel algorithms for inverting n×n tridiagonal and pentadiagonal matrices and for computing r-th order linear recurrences and the determinant of r-band Hessenberg matrices are illustrated.
Abstract: We illustrate some O(log n) parallel algorithms for inverting n×n tridiagonal and pentadiagonal matrices. Also, an O(log n) parallel algorithm is proposed to compute r-th order linear recurrences and the determinant of r-band Hessenberg matrices.
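One standard way an r-th order linear recurrence becomes an O(log n) parallel computation is to rewrite it as a product of companion matrices and evaluate the product with an associative scan; the paper's own method may differ. A sketch for the second-order case (Fibonacci):

```python
def mat_mul(A, B):
    """2x2 matrix product; matrix multiplication is the associative
    operation that a parallel scan tree would apply."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def fib(n):
    """x_k = x_{k-1} + x_{k-2} via the companion matrix [[1,1],[1,0]];
    computed serially here, but the n-fold product is a scan, hence
    O(log n) parallel steps on n processors."""
    M = [[1, 1], [1, 0]]
    P = [[1, 0], [0, 1]]
    for _ in range(n):
        P = mat_mul(P, M)
    return P[0][1]  # F(n)

print([fib(k) for k in range(1, 8)])  # [1, 1, 2, 3, 5, 8, 13]
```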

Book ChapterDOI
10 Jun 1981
TL;DR: New parallel algorithms are proposed for solving a band system of linear equations with bandwidth 2m+1 and for inverting such a matrix; the computational complexity of these algorithms is the same as that of a band triangular system solver.
Abstract: In this paper new parallel algorithms are proposed for solving a band system of linear equations with bandwidth 2m+1 and for inverting such a matrix. The algorithms are based on the simultaneous computation of m band triangular systems differing from each other only in the right-hand side. Thus, the computational complexity of our algorithm for the band system is the same as that of a band triangular system solver; only the number of processors used differs. Applying the algorithm to the inversion is advantageous if this computation is part of solving the system and only selected rows or columns of the matrix inverse are needed.

Journal ArticleDOI
TL;DR: A parallel algorithm for the solution of linear systems and determinant evaluation suitable for use on the proposed parallel computers of the future is presented.



01 Jan 1981
TL;DR: The use of N microprocessors in the SIMD mode of parallel processing to do classifications almost N times faster than a single microprocessor is discussed in this paper, where examples of contextual classifiers are given, uniprocessor algorithms for performing contextual classifications are presented, and their computational complexity is analyzed.
Abstract: The use of N microprocessors in the SIMD mode of parallel processing to do classifications almost N times faster than a single microprocessor is discussed. Examples of contextual classifiers are given, uniprocessor algorithms for performing contextual classifications are presented, and their computational complexity is analyzed. The SIMD mode of parallel processing is defined and an overview of PASM is given. The presented uniprocessor algorithms are used as a basis for developing parallel algorithms for performing computationally intensive contextual classifications.

Journal ArticleDOI
TL;DR: The language distills the author's experience in designing and programming various image-processing algorithms and may also serve as a readable formalism for describing parallel algorithms.


Book ChapterDOI
10 Jun 1981
TL;DR: The degree of parallelism in the identification procedures which yield univariable or multivariable response functions is studied and the design of parallel processor structures capable of performing parallel algorithms in real-time is presented.
Abstract: We examine three correlation time-of-flight (CTOF) techniques that are most promising for neutron, molecular and ion beam spectroscopy. The techniques use pseudorandom binary sequences as input modulations. In particular we study the degree of parallelism in the identification procedures which yield univariable or multivariable response functions. The time-complexity of the algorithms involved is discussed as a function of the number of parallel processors available. The indirect character of CTOF techniques and the experiment control and assessment make real-time evaluation essential. Therefore the design of parallel processor structures capable of performing parallel algorithms in real-time is presented for the three CTOF methods.

Book ChapterDOI
10 Jun 1981
TL;DR: A group-theoretic interpretation of the well-known fast Fourier transform is made, which shows that this approach can be applied to a wider class of transformations.
Abstract: The subject of the paper is discrete systems (DS) represented by a set of elements and a set of operations. In particular these operations are permutations of DS elements. The DS is represented by subsystems with a sufficiently small number of elements and operations of the same kind. The subsystems are distributed at hierarchical levels. The subsystems of each level can execute their operations in parallel. On the basis of this approach, algorithms are constructed for parallel realisation of the permutations in DS using a generating set of permutations. A group-theoretic interpretation of the well-known fast Fourier transform is made, which shows that this approach can be applied to a wider class of transformations.