
Showing papers on "Parallel algorithm published in 1980"


Book ChapterDOI
TL;DR: This chapter deals with the basic issues and techniques in designing parallel algorithms for various architectures and concludes that issues concerning algorithms for synchronous parallel computers are quite different from those for asynchronous parallel computers.
Abstract: This chapter presents many examples of parallel algorithms and studies them under a uniform framework. It defines a parallel algorithm as a collection of independent task modules that can be executed in parallel and that communicate with one another during the execution of the algorithm. The chapter identifies three important attributes of a parallel algorithm and classifies parallel algorithms in terms of them; these three orthogonal dimensions of the space of parallel algorithms are concurrency control, module granularity, and communication geometry. The classification of parallel algorithms corresponds naturally to that of parallel architectures. Algorithms for synchronous parallel computers are considered first, with examples of algorithms using various communication geometries. Algorithms for asynchronous parallel computers are then considered, together with a number of techniques for dealing with the difficulties arising from the asynchronous behavior of computation; these examples are mainly drawn from results in concurrent database systems. This chapter deals with the basic issues and techniques in designing parallel algorithms for various architectures. The chapter concludes that issues concerning algorithms for synchronous parallel computers are quite different from those for asynchronous parallel computers.

201 citations


Journal ArticleDOI
TL;DR: An optimal algorithm to route data in a mesh-connected parallel computer is presented that uses the minimum number of unit-distance routing steps for every data permutation that can be specified by permuting and complementing the bits of a PE address.
Abstract: An optimal algorithm to route data in a mesh-connected parallel computer is presented. This algorithm can be used to perform any data routing that can be specified by the permutation and complementing of the bits in a PE address. Matrix transpose, bit reversal, vector reversal, and perfect shuffle are examples of data permutations that can be specified in this way. The algorithm presented uses the minimum number of unit-distance routing steps for every data permutation that can be specified as above. KEY WORDS AND PHRASES: parallel algorithm, mesh-connected computer, ILLIAC IV, permutation, complexity, data routing. CR CATEGORIES: 5.25, 5.31, 6.22
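The class of permutations the paper handles can be stated compactly: each datum's destination PE address is obtained by permuting, and optionally complementing, the bits of its source PE address. A minimal Python sketch of that addressing scheme (the addressing only, not the routing algorithm itself; the function and parameter names are illustrative):

```python
def bpc_destination(addr, perm, comp_mask, bits):
    """Destination PE address for a bit-permute-complement permutation.

    perm[i] names the source bit that supplies destination bit i;
    comp_mask has a 1 wherever the resulting bit is complemented.
    """
    dest = 0
    for i in range(bits):
        dest |= ((addr >> perm[i]) & 1) << i
    return dest ^ comp_mask

# Bit reversal on 8 PEs (3 address bits): destination bit i = source bit 2-i.
bits = 3
perm = [2, 1, 0]
mapping = [bpc_destination(a, perm, 0, bits) for a in range(2 ** bits)]
# mapping is the classic bit-reversal order [0, 4, 2, 6, 1, 5, 3, 7]
```

With the identity permutation and a full complement mask the same scheme gives vector reversal (PE i maps to PE 2^b - 1 - i); other choices of the bit permutation yield matrix transpose and perfect shuffle.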

142 citations


01 Jan 1980
TL;DR: It is shown that solutions to geometric problems can be organized to reveal a large amount of parallelism, which can be exploited to substantially reduce the computation time.
Abstract: The existence of parallel computing systems and the important applications of geometric solutions have motivated our study of the design and analysis of algorithms for solving geometric problems on two parallel computing systems: the Shared Memory Machine (SMM) and the Cube-Connected-Cycles (CCC). The value of the SMM model lies in uncovering the inherent data dependence of the problems, while that of the CCC, which complies with VLSI technological constraints, lies in the development of practical parallel algorithms. It is shown that solutions to geometric problems can be organized to reveal a large amount of parallelism, which can be exploited to substantially reduce the computation time.

92 citations


Journal ArticleDOI
TL;DR: A new parallel algorithm is studied that constructs an MST of an N-node graph in time proportional to N lg N, on an (N/lg N)-processor computing system.
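The paper's algorithm is not reproduced in this listing, but the Borůvka step underlying many parallel MST algorithms is easy to sketch: in each round, every component selects its minimum-weight outgoing edge and components merge along those edges, so O(lg N) rounds suffice and the edge scan in each round parallelizes naturally. A sequential Python sketch of the round structure (illustrative only; assumes distinct edge weights so no tie-breaking is needed):

```python
def boruvka_mst(n, edges):
    """MST by repeated Borůvka rounds. edges: list of (weight, u, v)."""
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst_weight, mst_edges, components = 0, [], n
    while components > 1:
        # Each component picks its cheapest outgoing edge (parallel in the paper's setting).
        cheapest = [None] * n
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            if cheapest[ru] is None or w < cheapest[ru][0]:
                cheapest[ru] = (w, u, v)
            if cheapest[rv] is None or w < cheapest[rv][0]:
                cheapest[rv] = (w, u, v)
        progressed = False
        for entry in cheapest:
            if entry is None:
                continue
            w, u, v = entry
            ru, rv = find(u), find(v)
            if ru != rv:  # skip edges already merged this round
                parent[ru] = rv
                mst_weight += w
                mst_edges.append((u, v))
                components -= 1
                progressed = True
        if not progressed:
            break  # graph is disconnected
    return mst_weight, mst_edges

# 4-node example: square 0-1-2-3 with a heavy diagonal
weight, tree = boruvka_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)])
```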

73 citations




Dissertation
01 Jan 1980

24 citations


Journal ArticleDOI
TL;DR: A parallel nongradient algorithm and a parallel variable-metric algorithm are used to search for the initial costate vector that defines the solution to the optimal control problem; results indicate that convergence time is significantly less than that required by highly efficient serial procedures.
Abstract: This paper describes a collection of parallel optimal control algorithms which are suitable for implementation on an advanced computer with the facility for large-scale parallel processing. Specifically, a parallel nongradient algorithm and a parallel variable-metric algorithm are used to search for the initial costate vector that defines the solution to the optimal control problem. To avoid the computational problems sometimes associated with simultaneous forward integration of both the state and costate equations, a parallel shooting procedure based upon partitioning of the integration interval is considered. To further speed computations, parallel integration methods are proposed. Application of this all-parallel procedure to a forced Van der Pol system indicates that convergence time is significantly less than that required by highly efficient serial procedures.
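The interval-partitioning idea behind parallel shooting can be illustrated on a Van der Pol system (unforced here, for brevity): once each segment's initial state is known, the segments can be integrated independently and concatenated, which is what makes the integration parallelizable; the paper's procedure iterates on guessed segment initial states until the pieces match. A Python sketch with a hand-rolled RK4 integrator (step counts and parameters are illustrative):

```python
def vdp(t, y, mu=1.0):
    """Unforced Van der Pol oscillator: x'' - mu*(1 - x^2)*x' + x = 0."""
    x, v = y
    return (v, mu * (1 - x * x) * v - x)

def rk4(f, y0, t0, t1, steps):
    """Classical 4th-order Runge-Kutta over [t0, t1]."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [y[i] + h / 2 * k1[i] for i in range(2)])
        k3 = f(t + h / 2, [y[i] + h / 2 * k2[i] for i in range(2)])
        k4 = f(t + h, [y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        t += h
    return y

# Serial integration over [0, 4] vs two half-interval segments with the same step size:
y_serial = rk4(vdp, [2.0, 0.0], 0.0, 4.0, 4000)
y_mid = rk4(vdp, [2.0, 0.0], 0.0, 2.0, 2000)
y_split = rk4(vdp, y_mid, 2.0, 4.0, 2000)
# with the correct midpoint state, the segments reproduce the serial trajectory
```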

23 citations



Proceedings ArticleDOI
L. Siegel
01 Apr 1980
TL;DR: The use of the SIMD (single instruction stream-multiple data stream) mode of parallelism to perform linear predictive coding analysis is explored and parallel algorithms for the autocorrelation formulation of linear prediction are presented and analyzed.
Abstract: The use of the SIMD (single instruction stream-multiple data stream) mode of parallelism to perform linear predictive coding analysis is explored. Parallel algorithms for the autocorrelation formulation of linear prediction are presented and analyzed. The algorithms are evaluated in terms of the number of arithmetic operations and interprocessor data transfers required.
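The autocorrelation formulation referred to above computes short-time autocorrelation lags and then solves the resulting Toeplitz normal equations, classically via the Levinson-Durbin recursion. A serial Python sketch of that formulation (the paper's contribution is parallelizing these computations across PEs; this only shows the underlying math, with illustrative names):

```python
def lpc_autocorr(x, order):
    """LPC via the autocorrelation method.

    Returns (a, err): predictor coefficients with a[0] = 1, so the model is
    x[n] ~ -sum(a[k] * x[n-k] for k in 1..order), and the residual energy err.
    """
    n = len(x)
    # Autocorrelation lags r[0..order] (the parallelizable inner products).
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):  # Levinson-Durbin recursion
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err             # reflection coefficient
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1 - k * k
    return a, err

coeffs, residual = lpc_autocorr([1.0, 2.0, 3.0, 4.0], 1)
```

For order 1 this reduces to a[1] = -r[1]/r[0], which makes the recursion easy to check by hand.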

8 citations



01 Jan 1980
TL;DR: A parallel program requiring no critical section is given to implement the algorithm and its correctness is proved; a more space-efficient implementation is also given but requires the use of critical sections.
Abstract: Given a sequence of tasks to be performed serially, a parallel algorithm is proposed to accelerate the execution of the tasks on an asynchronous multiprocessor by taking advantage of fluctuations in the execution times. A parallel program requiring no critical section is given to implement the algorithm, and its correctness is proved. A more space-efficient implementation is also given, but it requires the use of critical sections. An analysis is presented for both implementations to estimate the speed-up achievable with the parallel algorithm. When the execution times are exponentially distributed and no critical section is used, the algorithm with k processes yields a speed-up of order √k.


Proceedings ArticleDOI
01 Jan 1980
TL;DR: Experimental results demonstrate that even simple adaptation models can substantially improve algorithm convergence accuracy; a generalization of the algorithm, wherein a hierarchical variable-resolution search is employed, gains major improvements in algorithm convergence speed and robustness.
Abstract: We report on the development of a new class of parallel computation algorithm for low-level scene analysis. The algorithm is a high resolution, high speed estimator for boundary extraction of simple objects imaged under noisy conditions. We explain the algorithm structure and underlying physical models; we then present demonstrative pictorial examples of application to synthetic test imagery. We next introduce a generalization of the algorithm wherein a hierarchical variable resolution search is employed to gain major improvements in algorithm convergence speed and robustness. We discuss the importance of making the algorithm adaptive to local image statistics and show that the algorithm parallel-window topology is consonant with this goal. We present further experimental results that depict the generalized algorithm applied to real data bases; these results demonstrate that even simple adaptation models can substantially improve algorithm convergence accuracy.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: A transformation system is presented, which uses computation graphs as a representation of both the algorithmic structure and the processor configuration, and is able to rewrite the computation graph automatically, depending on the available hardware resources.
Abstract: To reduce computation time in a multiprocessor environment the efficient configuration and utilization of hardware components is necessary. It requires both a restructuring of the considered algorithms and a reconfiguration of the corresponding machine architectures. A transformation system is presented, which uses computation graphs as a representation of both the algorithmic structure and the processor configuration. The system is able to rewrite the computation graph automatically, depending on the available hardware resources. In this paper the design strategy for algorithms and machine models is illustrated by the DFT. Several models for the algorithm are discussed. Finally, the results of time and hardware complexity with regard to the different graph structures and machine architectures are presented.
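As a concrete instance of such restructuring, the DFT's computation graph can be rewritten from the direct O(N²) form into the radix-2 FFT form without changing the result; the two graphs trade node count against communication structure. A minimal Python comparison (illustrative only; the paper's transformation system operates on graph representations, not on code):

```python
import cmath

def dft_direct(x):
    """Direct DFT: N**2 multiply-adds, a dense computation graph."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_radix2(x):
    """Radix-2 decimation-in-time FFT: the restructured O(N lg N) graph.

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft_radix2(x[0::2]), fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):  # butterfly stage combining the two half-size DFTs
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
a, b = dft_direct(x), fft_radix2(x)
# both graph structures compute the same transform
```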