
Showing papers on "Parallel algorithm published in 1983"


Journal ArticleDOI
TL;DR: It is pointed out that analyses of parallelism in computational problems have practical implications even when multi-processor machines are not available, and a unified framework for cases like this is presented.
Abstract: The goal of this paper is to point out that analyses of parallelism in computational problems have practical implications even when multiprocessor machines are not available. This is true because, in many cases, a good parallel algorithm for one problem may turn out to be useful for designing an efficient serial algorithm for another problem. A unified framework for cases like this is presented. Particular cases, which are discussed in this paper, provide motivation for examining parallelism in sorting, selection, minimum-spanning-tree, shortest route, max-flow, and matrix multiplication problems, as well as in scheduling and locational problems.

696 citations


Journal ArticleDOI
01 Nov 1983-Nature
TL;DR: The functional abilities and parallel architecture of the human visual system are a rich source of ideas about visual processing and several parallel algorithms have been found that exploit information implicit in an image to compute intrinsic properties of surfaces, such as surface orientation, reflectance and depth.
Abstract: The functional abilities and parallel architecture of the human visual system are a rich source of ideas about visual processing. Any visual task that we can perform quickly and effortlessly is likely to have a computational solution using a parallel algorithm. Recently, several such parallel algorithms have been found that exploit information implicit in an image to compute intrinsic properties of surfaces, such as surface orientation, reflectance and depth. These algorithms require a computational architecture that has similarities to that of visual cortex in primates.

346 citations


Journal ArticleDOI
TL;DR: In this article, the authors presented and analyzed algorithms for parallel processing of relational database operations in a general multiprocessor framework and introduced an analysis methodology which incorporates I/O, CPU, and message costs.
Abstract: This paper presents and analyzes algorithms for parallel processing of relational database operations in a general multiprocessor framework. To analyze alternative algorithms, we introduce an analysis methodology which incorporates I/O, CPU, and message costs and which can be adjusted to fit different multiprocessor architectures. Algorithms are presented and analyzed for sorting, projection, and join operations. While some of these algorithms have been presented and analyzed previously, we have generalized each in order to handle the case where the number of pages is significantly larger than the number of processors. In addition, we present and analyze algorithms for the parallel execution of update and aggregate operations.
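The partitioning idea behind such parallel join algorithms can be sketched compactly: hash both relations on the join key into p buckets, after which each bucket pair can be processed by a separate processor with no cross-bucket communication. The sketch below is illustrative only (the relation layout and function name are hypothetical, not the paper's notation), and it simulates the per-processor work serially:

```python
from collections import defaultdict

def parallel_hash_join(R, S, p):
    """Partitioned hash join sketch: tuples of R and S are hashed on the
    join key into p buckets; each iteration of the final loop corresponds
    to one processor joining its bucket pair independently."""
    buckets_R = [defaultdict(list) for _ in range(p)]
    buckets_S = [[] for _ in range(p)]
    for key, val in R:
        buckets_R[hash(key) % p][key].append(val)
    for key, val in S:
        buckets_S[hash(key) % p].append((key, val))
    result = []
    for i in range(p):  # independent work: one bucket pair per processor
        for key, sval in buckets_S[i]:
            for rval in buckets_R[i].get(key, []):
                result.append((key, rval, sval))
    return result
```

Matching tuples can only meet inside the same bucket, which is why no further message traffic is needed after the partitioning phase.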

227 citations


Journal ArticleDOI
TL;DR: A highly concurrent Toeplitz system solver, featuring maximum parallelism and localized communication, is developed, and a pipelined processor architecture is proposed which uses only localized interconnections and yet retains the maximum parallelism attainable.
Abstract: The design of VLSI parallel processors requires a fundamental understanding of the parallel computing algorithm and an appreciation of the implementational constraint on communications. Based on such consideration, this paper develops a highly concurrent Toeplitz system solver, featuring maximum parallelism and localized communication. More precisely, a highly parallel algorithm is proposed which achieves O(N) computing time with a linear array of O(N) processors. This compares very favorably to the O(N \log_{2} N) computing time attainable with the traditional Levinson algorithm implemented in parallel. Furthermore, to comply with the communication constraint, a pipelined processor architecture is proposed which uses only localized interconnections and yet retains the maximum parallelism attainable.

161 citations


Journal ArticleDOI
TL;DR: The OTC and OTN can be looked upon as general-purpose parallel processors, since a number of other problems such as sorting and DFT can be solved on them with an area * time^2 performance matching that of other networks.
Abstract: In this paper we describe two interconnection networks for parallel processing, namely the orthogonal trees network and the orthogonal tree cycles (OTN and OTC). Both networks are suitable for VLSI implementation and have been analyzed using Thompson's model of VLSI. While the OTN and OTC have time performances similar to fast networks such as the perfect shuffle network (PSN), the cube connected cycles (CCC), etc., they have substantially better area * time^2 performances for a number of matrix and graph problems. For instance, the connected components and a minimal spanning tree of an undirected N-vertex graph can be found in O(log^4 N) time on the OTC with area * time^2 performances of O(N^2 log^8 N) and O(N^2 log^9 N) respectively. This is asymptotically much better than the performances of the CCC, PSN and Mesh. The OTC and OTN can be looked upon as general-purpose parallel processors, since a number of other problems such as sorting and DFT can be solved on them with an area * time^2 performance matching that of other networks. Finally, programming the OTN and OTC is simple, and they are also amenable to pipelining a series of problems.

116 citations


Journal ArticleDOI
TL;DR: A number of parallel algorithms for thinning elongated shapes are contrasted and compared on a Clip 4 parallel processor and new algorithms are proposed which produce more satisfactory results, but are more expensive in terms of speed and space requirements.

95 citations


Proceedings ArticleDOI
01 Jul 1983
TL;DR: A novel parallel anti-aliasing algorithm is presented in which subpixel coverage by edges is approximated using a look-up table, which is fast and accurate, it is attractive even in a serial environment, and it avoids several artifacts that commonly occur in animated sequences.
Abstract: Popular approaches to speeding up scan conversion often employ parallel processing. Recently, several special-purpose parallel architectures have been suggested. We propose an alternative to these systems: the general-purpose ultracomputer, a parallel processor with many autonomous processing elements and a shared memory. The “serial semantics/parallel execution” feature of this architecture is exploited in the formulation of a scan conversion algorithm. Hidden surfaces are removed using a single scanline, z-buffer algorithm. Since exact anti-aliasing is inherently slow, a novel parallel anti-aliasing algorithm is presented in which subpixel coverage by edges is approximated using a look-up table. The ultimate intensity of a pixel is the weighted sum of the intensity contribution of the closest edge, that of the “losing” edges, and that of the background. The algorithm is fast and accurate, it is attractive even in a serial environment, and it avoids several artifacts that commonly occur in animated sequences.

90 citations


Journal ArticleDOI
TL;DR: A realistic model for divide-and-conquer based algorithms is postulated; the efficiency of some algorithms is analyzed, taking into account all relevant parameters of the model (time, data movement and number of processors.)
Abstract: The well known divide-and-conquer paradigm has proved to be useful for deriving efficient algorithms for many problems. Several researchers have pointed out its usefulness for parallel processing; however, the problem of analyzing such parallel algorithms in a realistic setting has been largely overlooked. In this paper a realistic model for divide-and-conquer based algorithms is postulated; the efficiency of some algorithms is then analyzed, taking into account all relevant parameters of the model (time, data movement and number of processors.)
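As a concrete instance of the paradigm, consider merge sort: the two recursive calls operate on disjoint halves and are therefore independent, which is exactly the structure a parallel divide-and-conquer schedule exploits. This is a minimal serial sketch; the processor assignment and data-movement costs that the paper's model accounts for are not shown:

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def merge_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid])    # independent of ...
    right = merge_sort(xs[mid:])   # ... this call: the two could run on
    return merge(left, right)      # disjoint groups of processors

merge_sort([5, 2, 8, 1, 9, 3])  # returns [1, 2, 3, 5, 8, 9]
```

In a realistic analysis the merge step and the movement of the two halves between processors dominate, which is the kind of cost the paper's model makes explicit.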

88 citations


Journal ArticleDOI
TL;DR: The success in using binary trees for parallel computations indicates that the binary tree is an important and useful design tool for parallel algorithms.
Abstract: This paper examines the use of binary trees in the design of efficient parallel algorithms. Using binary trees, we develop efficient algorithms for several scheduling problems. The shared memory model for parallel computation is used. Our success in using binary trees for parallel computations indicates that the binary tree is an important and useful design tool for parallel algorithms.
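A simple illustration of why binary trees are such a useful design tool: a tree-shaped reduction combines n values in about log2(n) parallel rounds instead of n - 1 serial steps. The sketch below only simulates the rounds serially (no shared-memory processors are actually spawned), with each round standing for one level of the tree:

```python
def tree_reduce(values, combine):
    """Simulate a binary-tree reduction: each round combines adjacent
    pairs 'in parallel', so n values need ceil(log2 n) rounds rather
    than n - 1 sequential combining steps."""
    rounds = 0
    while len(values) > 1:
        # all pairs in one round are independent: one processor per pair
        values = [combine(values[i], values[i + 1]) if i + 1 < len(values)
                  else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds
```

Any associative operation (sum, max, logical AND) can be plugged in as `combine`, which is what makes the tree a reusable building block for the scheduling algorithms discussed.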

80 citations


Journal ArticleDOI
TL;DR: Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time.
Abstract: Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time. The shared memory model of parallel computers is used to obtain fast algorithms.

66 citations


01 Jan 1983
TL;DR: Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering, are developed.
Abstract: Large sparse linear systems of equations require hours to solve on conventional mainframe computers; however, with the advent of parallel architectures such as vector computers or arrays of microprocessors, these problems may be solved in less time. In addition, with hardware becoming cheaper, parallel algorithms for solving problems on these architectures may prove to be cost effective. In this thesis, we develop algorithms for assembling in parallel the sparse system of linear equations that results from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for our algorithms are given.

Book ChapterDOI
21 Aug 1983
TL;DR: The class LOGCFL consists of all sets log space reducible to the class CFL of context-free languages.
Abstract: The class LOGCFL consists of all sets log space reducible to the class CFL of context-free languages. (Here A is log space reducible to B iff there is some log space computable function f such that for all x, x ∈ A iff f(x) ∈ B.) Sudborough [Su] characterized LOGCFL as those sets accepted by a nondeterministic auxiliary pushdown machine in log space and polynomial time. From this, it follows that NL ⊆ LOGCFL. Ruzzo [Ru2] further characterized LOGCFL as those sets accepted by an ATM in log space and polynomial tree size, and proved LOGCFL ⊆ NC^2. Besides context-free languages and members of NL, the class LOGCFL contains the monotone planar circuit value problem [DC], bounded valence subtree isomorphism [Ru3], and basic dynamic programming problems [Gol]. The latter are more naturally expressed as functions, and so it seems that the natural class to consider is CFL* (the closure of CFL under ≤). Proposition 5. LOGCFL ⊆ CFL*. Proof: An inspection of Sudborough's proof [Su] that every set accepted by a nondeterministic auxiliary pushdown machine in log space and polynomial time is log space reducible to CFL shows that the reduction is via an NC^1 computable function. Hence LOGCFL = NC^1(CFL), and the proposition follows.
The above inclusion is proper, not only because CFL* contains functions other than 0-1 functions, but because apparently LOGCFL is not closed under complementation. For example, the complement of the graph accessibility problem does not appear to be in LOGCFL.

Proceedings Article
08 Aug 1983
TL;DR: Four enhancements to the alpha-beta algorithm--iterative deepening, aspiration search, memory tables and principal variation search--are compared separately and in various combinations to determine the most effective alpha-beta implementation.
Abstract: Most of the data on the relative efficiency of different implementations of the alpha-beta algorithm is neither readily available nor in a form suitable for easy comparisons. In the present study four enhancements to the alpha-beta algorithm--iterative deepening, aspiration search, memory tables and principal variation search--are compared separately and in various combinations to determine the most effective alpha-beta implementation. The rationale for this work is to ensure that new parallel algorithms incorporate the best sequential techniques. Rather than relying on simulation or searches of specially constructed trees, a simple chess program was used to provide a uniform basis for comparisons.
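For readers unfamiliar with the baseline being enhanced, a minimal alpha-beta search with an iterative-deepening wrapper looks roughly like this. It is a toy sketch over nested-list game trees with integer leaves; the memory tables, aspiration windows, and principal variation search compared in the paper are omitted:

```python
def alphabeta(node, depth, alpha, beta, maximizing):
    # Leaves are integers; internal nodes are lists of children.
    # A non-leaf reached at depth 0 gets a placeholder heuristic value of 0.
    if isinstance(node, int):
        return node
    if depth == 0:
        return 0
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: remaining siblings cannot affect the root
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff
    return value

def iterative_deepening(root, max_depth):
    # Re-search at increasing depths; real programs reuse results from the
    # shallower searches for move ordering, which is where the payoff lies.
    value = None
    for d in range(1, max_depth + 1):
        value = alphabeta(root, d, float('-inf'), float('inf'), True)
    return value

tree = [[3, 5], [6, 9], [1, 2]]  # max over min: the game value is 6
```

The cutoffs never change the returned value, only the number of nodes visited; the paper's comparisons are about how the four enhancements shrink that visit count further.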

Journal ArticleDOI
TL;DR: Efficient parallel algorithms to obtain the postfix and tree forms of an infix arithmetic expression are developed using the shared memory model of parallel computing.
Abstract: Efficient parallel algorithms to obtain the postfix and tree forms of an infix arithmetic expression are developed. The shared memory model of parallel computing is used.

Journal ArticleDOI
TL;DR: An algorithm for the parallel solution of large sparse sets of linear equations, given their factor matrices, is developed, aimed at efficient practical implementation on a processor of the multiple instruction multiple data stream (MIMD) type.
Abstract: An algorithm for the parallel solution of large sparse sets of linear equations, given their factor matrices, is developed. It is aimed at efficient practical implementation on a processor of the multiple instruction multiple data stream (MIMD) type. The software required to implement the algorithm is described. In addition, the amount of memory necessary for data retention during execution is considered and related to that which is required on single processor systems. Hardware developed for the implementation of the algorithm is described. Bus contention for the system is outlined and shown to be insignificant. Possible bus contention problems for systems differing in the number of processors and speed of processing elements are also considered. A simulator modeling the execution of the algorithm on large systems has been implemented. The performance of the algorithm, in terms of execution speed enhancement relative to the theoretical maximum, is shown to be good.

Journal ArticleDOI
TL;DR: A lower bound for schedule length is established for dense GE DAG's and it is proved that the proposed algorithm produces schedules which achieve these bounds.
Abstract: A parallel algorithm for Gaussian elimination (GE) is described, which solves a linear system of size n using m ≤ n parallel processors and a shared random access memory. Converting the serial GE algorithm to parallel form involves scheduling its computation DAG (directed acyclic graph) on m processors. A lower bound for schedule length is established for dense GE DAG's and it is proved that the proposed algorithm produces schedules which achieve these bounds. Finally, both the construction and execution of the schedule are incorporated into a single concurrent program which is shown to run in optimal time.
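The parallelism the scheduling exploits is visible already in the serial algorithm: within each pivot step k, the updates to the rows below the pivot are mutually independent, so they are natural tasks to distribute over the m processors. A plain serial sketch (not the paper's scheduler) with that loop marked:

```python
def gaussian_elimination(A, b):
    """Serial Gaussian elimination without pivoting. The loop over i is
    the parallelizable part: for a fixed pivot k, the updates to the
    different rows below it touch disjoint data."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):  # independent tasks: one per row
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    # back substitution (serial here; it has its own dependence structure)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x
```

The DAG the paper schedules is exactly the dependence structure of these updates: row updates within a pivot step are parallel, while successive pivot steps are ordered.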

Journal ArticleDOI
TL;DR: The problems of measuring the performance of a highly parallel multiple processor system, such as the 4096 element ICL Distributed Array Processor are presented in relation to the conventional methods used for serial processors in order to provide a framework for the discussion.
Abstract: The problems of measuring the performance of a highly parallel multiple processor system, such as the 4096-element ICL Distributed Array Processor, are presented in relation to the conventional methods used for serial processors; this is preceded by a brief description of the DAP hardware in order to provide a framework for the discussion, together with some of the resulting implications for algorithm design. The importance of choosing algorithms for parallel computation in such a way as to make the best use of the parallelism of the hardware for the problem to be solved is discussed, and examples are given of parallel and hybrid algorithms; in the latter, a mixture of serial and parallel techniques is used. A method of comparison of performance at the problem-solving level is presented, which is illustrated by results obtained by DAP users studying problems which arise in a wide range of application areas.

Journal ArticleDOI
Hockney1
TL;DR: A two-parameter description of any computer is given that characterizes the performance of serial, pipelined, and array-like architectures and a family of FACR direct methods for solving Poisson's equation is optimized on the basis of this characterization.
Abstract: A two-parameter description of any computer is given that characterizes the performance of serial, pipelined, and array-like architectures. The first parameter (r∞) is the traditional maximum performance in megaflops, and the new second parameter (n½) measures the apparent parallelism of the computer. For computers with a single instruction stream (unicomputers), the relative performance of two algorithms on the same computer depends only on n½ and the average vector length of the algorithm. The performance of a family of FACR direct methods for solving Poisson's equation is optimized on the basis of this characterization.
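The two-parameter characterization reduces to one formula: processing a vector of length n takes t = (n + n½)/r∞, so the achieved rate is r∞ · n/(n + n½), reaching exactly half of peak at n = n½. A small sketch of the model (the parameter values in the comment are illustrative, not measured figures from the paper):

```python
def vector_time(n, r_inf, n_half):
    """Hockney model: time to process a vector of length n on a machine
    with asymptotic rate r_inf (flops/s) and half-performance length
    n_half: t = (n + n_half) / r_inf."""
    return (n + n_half) / r_inf

def achieved_rate(n, r_inf, n_half):
    """Achieved performance r_inf * n / (n + n_half): peak is approached
    only when n >> n_half, and exactly half of peak is reached at
    n == n_half."""
    return r_inf * n / (n + n_half)

# e.g. a machine with r_inf = 1e8 flops/s and n_half = 100 runs a
# length-100 vector operation at 5e7 flops/s, half of its peak.
```

This is why, for a unicomputer, comparing two algorithms reduces to comparing their average vector lengths against n½.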

Journal ArticleDOI
TL;DR: A new topological sorting algorithm is formulated using the parallel computation approach and a synchronization of all processors is proposed to avoid contention for logical resources.
Abstract: A new topological sorting algorithm is formulated using the parallel computation approach. The time complexity of this algorithm is of the order of the longest distance between a source node and a sink node in an acyclic digraph representing the partial orderings between elements. An implementation of this algorithm with an SIMD machine is discussed. To avoid contention for logical resources, a synchronization of all processors is proposed and its performance is also discussed.
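The level-synchronous idea behind such an algorithm is easy to simulate: in each round, every vertex whose in-degree has dropped to zero is output simultaneously, and the number of rounds equals the longest source-to-sink path length, matching the stated time bound. A serial simulation (not the paper's SIMD implementation, which also handles the processor synchronization it describes):

```python
from collections import defaultdict

def parallel_topo_levels(vertices, edges):
    """Return the topological 'levels' of a DAG: level k holds exactly
    the vertices a parallel machine could output in round k."""
    indeg = {v: 0 for v in vertices}
    succ = defaultdict(list)
    for u, v in edges:
        indeg[v] += 1
        succ[u].append(v)
    frontier = [v for v in vertices if indeg[v] == 0]
    levels = []
    while frontier:
        levels.append(sorted(frontier))  # one parallel round
        nxt = []
        for u in frontier:
            for v in succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        frontier = nxt
    return levels

# Diamond DAG a->b, a->c, b->d, c->d: three rounds, with b and c together.
```

Concatenating the levels in order gives a valid topological sort; the rounds are where the parallelism lives.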

01 Jan 1983
TL;DR: Parallel techniques are shown to eliminate some types of overhead associated with serial processing, offer the possibility of improved algorithm capability and accuracy, and decrease execution time.
Abstract: Contour extraction is used as an image processing scenario to explore the advantages of parallelism and the architectural requirements for a parallel computer system, such as PASM. Parallel forms of edge-guided thresholding and contour tracing algorithms are developed and analyzed to highlight important aspects of the scenario. Edge-guided thresholding uses adaptive thresholding to allow contour extraction where gray level variations would not allow global thresholding to be effective. Parallel techniques are shown to eliminate some types of overhead associated with serial processing, offer the possibility of improved algorithm capability and accuracy, and decrease execution time. The implications that the parallel scenario has for machine architecture are considered. Various desirable system attributes are established. 30 references.

Journal ArticleDOI
TL;DR: This paper attempts to provide minimization algorithms which are adapted to execution on parallel computers, by discussing in detail their mathematical behavior, when the cooperating processes are either synchronous or asynchronous.
Abstract: This paper attempts to provide minimization algorithms which are adapted to execution on parallel computers. For this purpose, three well-known nongradient methods are examined. From these, three parallel iterative procedures are derived, by discussing in detail their mathematical behavior, when the cooperating processes are either synchronous or asynchronous.

Proceedings ArticleDOI
07 Nov 1983
TL;DR: New paradigms for the construction of efficient parallel graph algorithms, called filtration and funnelled pipelining, are introduced and illustrated with VLSI circuits for computing connected components, minimum spanning forests, and biconnected components.
Abstract: We introduce new paradigms for the construction of efficient parallel graph algorithms. These paradigms, called filtration and funnelled pipelining, are illustrated with VLSI circuits for computing connected components, minimum spanning forests, and biconnected components. These circuits use realistic I/O schedules and require time and area of O(n^{1+ε}). Thus they are essentially optimal. Filtration is a technique used to rapidly discard irrelevant input data. This greatly reduces storage, time, and communications costs in a wide variety of problems. A funnelled pipeline is obtained by building a series of increasingly thorough filter stages. Transition times along such a pipeline of filters form an exponentially increasing sequence. The increasing amount of time exactly balances the increasing degree of filtration. This balance makes possible the cascaded filtration critical to the minimum spanning forest and the biconnected components algorithms.

Journal ArticleDOI
TL;DR: This work establishes the algorithmic and architectural footing for the evolution of the design of VLSI array processors and notes that the systolic and wavefront arrays elegantly avoid global interconnection by effectively managing local data movements.

01 Jan 1983
TL;DR: A network model based on the asymptotic properties of closed queueing networks representing the effects of the network topology, node and communication link speeds, and the internode communication patterns is developed and is shown to be sufficient to make acceptable scheduling decisions.
Abstract: A new paradigm for parallel computation based on large numbers of interconnected microcomputer nodes has recently emerged. Each network node, fabricated as one or two VLSI chips would contain a processor with some local memory, a communication controller that routes messages without delaying the processor, and a few connections to other network nodes. The cooperating tasks of a parallel algorithm would execute asynchronously on different nodes and communicate via message passing. This approach to parallel processing poses several new and interesting problems in network performance evaluation, distributed task scheduling, and parallel algorithm design. The absence of shared memory makes the evaluation and design of an interconnection network capable of efficiently supporting internode communication patterns crucial. A network model based on the asymptotic properties of closed queueing networks representing the effects of the network topology, node and communication link speeds, and the internode communication patterns is developed. With this model, it is possible to compare the performance of different network topologies processing the same workload, determine the range of network sizes over which a given topology can meet specified performance requirements, and calculate the size of computation quanta below which communication delays negate possible gains due to increased parallelism. Because of communication delays, no node can possess an exact description of the entire network state; all scheduling decisions must be made using incomplete and possibly inaccurate status information. The efficacy of distributed scheduling heuristics as a function of network topology, status information accuracy, and the amount of computation represented by each task are examined. Knowledge of a small area surrounding each node is shown to be sufficient to make acceptable scheduling decisions. 
Finally, the importance of partial differential equations as models of many phenomena has motivated the search for solution algorithms suited to multimicrocomputer networks. This work has sought to relax the severe synchronization constraints imposed by most algorithms and to determine an appropriate number of discretization points to place in each network node. This has led to a class of solution algorithms spanning the spectrum from completely sequential and synchronous to completely parallel and fully asynchronous.

Journal ArticleDOI
TL;DR: The convergence of parallel synchronous iterative procedures corresponding to linearly independent direction methods and to mutually conjugate direction methods is discussed, and convergence with finite termination on quadratic objective functions and convergence on sufficiently smooth nonquadratic objective functions is proved.
Abstract: This paper analyzes the mathematical behavior of nongradient parallel minimization algorithms. The convergence of parallel synchronous iterative procedures corresponding to linearly independent direction methods and to mutually conjugate direction methods is discussed. For the latter, convergence with finite termination on quadratic objective functions and convergence on sufficiently smooth nonquadratic objective functions is proved.

Dissertation
01 Jan 1983
TL;DR: The work presented in this thesis is mainly involved in the design and analysis of asynchronous parallel algorithms that can be run on MIMD type parallel computers, in particular the NEPTUNE system at Loughborough University.
Abstract: The work presented in this thesis is mainly involved in the design and analysis of asynchronous parallel algorithms that can be run on MIMD type parallel computers, in particular the NEPTUNE system at Loughborough University. Initially, different types of existing parallel computers including the Data-Flow computers and VLSI technology are described from both the hardware and implementation points of view. Basic ideas of programming such computers are also outlined. Also, the main characteristics of the NEPTUNE MIMD-system are presented together with the principles of synchronisation, the resource demands and the overhead costs of the parallel control structures. Such information is measured frequently in the performance analysis of the algorithms presented in this thesis in order to exploit the potentiality of the NEPTUNE system and parallel computers in general. The Speed-up and Efficiency factors are calculated and the optimum number of processors and processes is suggested in most of the algorithms presented...


01 Jun 1983
TL;DR: In this paper, a hierarchical parallel algorithm for efficient feature matching has also been developed for applications of motion, stereo, and image registration, which integrates aspects of both parallel array processing and associative memories for real-time implementation of motion algorithms.
Abstract: The major focus of the DARPA funded research program revolves around issues of dynamic image processing. The authors have been examining techniques for recovery of environmental information, such as depth maps of the visible surfaces, from a sequence of images produced by a sensor in motion. Algorithms that appear robust have been developed for constrained sensor motion such as pure translation, pure rotation, and motion constrained to a plane. Interesting algorithms with promising preliminary experimental results have also been developed for the case of general sensor motion in images where there are several significant depth discontinuities, and for scenes with multiple independently moving objects. A general hierarchical parallel algorithm for efficient feature matching has also been developed for applications of motion, stereo, and image registration. In addition, they have been designing a highly parallel architecture that integrates aspects of both parallel array processing and associative memories for real-time implementation of motion algorithms. Finally, there has been a continuation of the VISIONS static image interpretation project, with interesting results in top-down processing of a set of complex outdoor house scenes.

01 Jan 1983
TL;DR: Some techniques and data structures to exploit parallelism are categorized and designed for the following problems: minimum spanning tree, one-to-all shortest path, maximum flow, and minimum cost flow.
Abstract: Parallel processing is a new area of computer science which has evolved in the recent years. Development of parallel computers has increased interest in the study of parallel algorithms. The importance of parallel algorithms lies in the fact that the widespread use of parallel computers depends on the availability of efficient parallel algorithms. There are a large number of parallel algorithms for numerical problems, but parallel algorithms for nonnumerical problems are rare. In this dissertation some techniques and data structures to exploit parallelism are categorized. Parallel algorithms are designed for the following problems: minimum spanning tree, one-to-all shortest path, maximum flow, and minimum cost flow. Most of the parallel algorithms have been implemented on an actual parallel computer and/or a parallel computer simulator. (This work was partially supported by U.S. Army Research Office grant number DAAG-82-K-0107 and National Science Foundation equipment grant number MCS-8203868.)

01 Dec 1983
TL;DR: This methodology provides a unified conceptual framework that clearly displays the key properties of parallel systems and can be expressed both informally, in pictorial fashion, and formally, in the language of precedence relations and compositions of functions.
Abstract: Several methods for modeling and analysis of parallel algorithms and architectures have been proposed in recent years. These include recursion-type methods, like recursion equations, z-transform descriptions and 'do-loops' in high-level programming languages, and precedence-graph-type methods like data-flow graphs (marked graphs) and related Petri-net-derived models. This paper presents a new methodology for modeling and analysis of parallel algorithms and architectures. This methodology provides a unified conceptual framework that clearly displays the key properties of parallel systems. It is largely based upon the theory of directed graphs and can, therefore, be expressed both informally, in pictorial fashion, and formally, in the language of precedence relations and compositions of functions. This duality will, hopefully, help to bridge the gap between the two schools of research in this field.