
Showing papers on "Parallel algorithm published in 1990"


Journal ArticleDOI
TL;DR: The best heuristic methods known to date for the flow shop sequencing problem are compared, the complexity of the best of them is improved, and a parallel taboo search algorithm is presented; experimental results show that this heuristic achieves very good speed-up.
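
For orientation, a minimal Python sketch of the kind of search such an algorithm performs (an illustrative reconstruction, not the authors' code: the makespan recurrence and the swap neighbourhood are standard textbook choices, and it is the candidate-evaluation loop that a parallel taboo search distributes over processors):

    # Tabu search for permutation flow shop scheduling (toy version).
    # p[j][k] = processing time of job j on machine k.
    from collections import deque

    def makespan(perm, p, m):
        c = [0.0] * m                        # completion time per machine
        for j in perm:
            c[0] += p[j][0]
            for k in range(1, m):
                c[k] = max(c[k], c[k - 1]) + p[j][k]
        return c[-1]

    def tabu_search(p, m, iters=200, tenure=7):
        n = len(p)
        cur = list(range(n))
        best, best_cost = cur[:], makespan(cur, p, m)
        tabu = deque(maxlen=tenure)          # recently used swap moves
        for _ in range(iters):
            candidates = []
            for i in range(n - 1):
                for j in range(i + 1, n):    # the loop a parallel version splits up
                    if (i, j) in tabu:
                        continue
                    nxt = cur[:]
                    nxt[i], nxt[j] = nxt[j], nxt[i]
                    candidates.append((makespan(nxt, p, m), (i, j), nxt))
            if not candidates:
                break
            cost, move, cur = min(candidates)
            tabu.append(move)
            if cost < best_cost:
                best, best_cost = cur[:], cost
        return best, best_cost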

811 citations


Journal ArticleDOI
TL;DR: Gamma is a relational database machine running on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives; all relations are horizontally partitioned across the disk drives, enabling them to be scanned in parallel.
Abstract: The design of the Gamma database machine and the techniques employed in its implementation are described. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to hundreds of processors. First, all relations are horizontally partitioned across multiple disk drives, enabling relations to be scanned in parallel. Second, parallel algorithms based on hashing are used to implement the complex relational operators, such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinate multioperator queries. By using these techniques, it is possible to control the execution of very complex queries with minimal coordination. The design of the Gamma software is described and a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is presented.
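
A minimal sketch of the hash-partitioning idea behind Gamma's parallel join (illustrative only; the tuple format, node count, and function names are invented here, and the per-node loop stands in for genuinely concurrent execution):

    # Hash-partitioned parallel equijoin, Gamma-style (toy version).
    def partition(tuples, key, n_nodes):
        # Route each tuple by hashing its join attribute, so matching
        # tuples of both relations land on the same node.
        buckets = [[] for _ in range(n_nodes)]
        for t in tuples:
            buckets[hash(t[key]) % n_nodes].append(t)
        return buckets

    def local_hash_join(r_part, s_part, r_key, s_key):
        # Classic build/probe hash join run independently on each node.
        table = {}
        for r in r_part:
            table.setdefault(r[r_key], []).append(r)
        return [(r, s) for s in s_part for r in table.get(s[s_key], [])]

    def parallel_join(R, S, r_key, s_key, n_nodes=4):
        r_parts = partition(R, r_key, n_nodes)
        s_parts = partition(S, s_key, n_nodes)
        out = []
        for node in range(n_nodes):      # conceptually one task per node
            out += local_hash_join(r_parts[node], s_parts[node], r_key, s_key)
        return out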

662 citations


Journal ArticleDOI
TL;DR: An algorithm based on weighted recursive least-squares theory is developed in the wavenumber domain, which is efficient because interpolation and noise removal are performed recursively, and is highly suitable for implementation via the massively parallel computational architectures currently available.
Abstract: In several applications it is required to reconstruct a high-resolution noise-free image from multiple frames of undersampled low-resolution noisy images. Using the aliasing relationship between the undersampled frames and the reference image, an algorithm based on weighted recursive least-squares theory is developed in the wavenumber domain. This algorithm is efficient because interpolation and noise removal are performed recursively, and is highly suitable for implementation via the massively parallel computational architectures currently available. Success in the use of the algorithm is demonstrated through various simulated examples.

567 citations


Journal ArticleDOI
TL;DR: A simple and efficient method for evaluating the performance of an algorithm, rendered as a directed acyclic graph, on any parallel computer is presented and its application to several common algorithms shows that it is surprisingly accurate.
Abstract: A simple and efficient method for evaluating the performance of an algorithm, rendered as a directed acyclic graph, on any parallel computer is presented. The crucial ingredient is an efficient approximation algorithm for a particular scheduling problem. The only parameter of the parallel computer needed by our method is the message-to-instruction ratio $\tau$. Although the method used in this paper does not take into account the number of processors available, its application to several common algorithms shows that it is surprisingly accurate.
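
A crude illustration of why a single parameter can capture so much (this is not the paper's approximation algorithm, just the effect of charging $\tau$ on every edge of a task graph):

    # Longest path through a DAG, with tau charged per precedence edge.
    from functools import lru_cache

    def schedule_length_bound(succ, work, tau):
        # succ: node -> successors; work: node -> instruction count.
        @lru_cache(maxsize=None)
        def finish(v):
            later = [tau + finish(w) for w in succ.get(v, ())]
            return work[v] + (max(later) if later else 0)
        return max(finish(v) for v in work)

    dag = {"a": ("b", "c"), "b": ("d",), "c": ("d",)}
    work = {"a": 1, "b": 5, "c": 5, "d": 1}
    print(schedule_length_bound(dag, work, tau=0))  # 7: free communication
    print(schedule_length_bound(dag, work, tau=3))  # 13: each message costs 3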

422 citations


Proceedings ArticleDOI
01 May 1990
TL;DR: This paper rejects the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, which allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.
Abstract: Many parallel algorithms are naturally expressed at a fine level of granularity, often finer than a MIMD parallel system can exploit efficiently. Most builders of parallel systems have looked to either the programmer or a parallelizing compiler to increase the granularity of such algorithms. In this paper we explore a third approach to the granularity problem by analyzing two strategies for combining parallel tasks dynamically at run-time. We reject the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, where tasks are created only retroactively as processing resources become available. These strategies grew out of work on Mul-T [14], an efficient parallel implementation of Scheme, but could be used with other applicative languages as well. We describe our Mul-T implementations of lazy task creation for two contrasting machines, and present performance statistics which show the method's effectiveness. Lazy task creation allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.

344 citations


Book
01 Dec 1990
TL;DR: This straightforward tutorial explains why parallelism is a powerful and proven way to run programs fast and provides the instruction that will transform ordinary programmers into parallel programmers.
Abstract: In the not-too-distant future every programmer, software engineer, and computer scientist will need to understand parallelism, a powerful and proven way to run programs fast. The authors of this straightforward tutorial explain why this is so and provide the instruction that will transform ordinary programmers into parallel programmers. "How to Write Parallel Programs" focuses on programming techniques for the largest class of parallel machines - general purpose asynchronous or MIMD machines. It outlines the basic parallel algorithm classes and the three basic programming paradigms, takes up the implementation techniques for these paradigms, and presents a series of case studies explaining code and discussing its measured performance. Because parallel programming requires both a computing language and a coordination language, the authors use C and Linda (a language they developed) as a combination that can be simply and efficiently implemented on a wide range of machines. The techniques discussed, however, can be applied in any comparable language environment. Contents: Introduction. The Three Basic Models of Parallelism. Programming Techniques for the Three Basic Models. A Simple Problem, in Detail. Case Studies. From Parallelism to Coordination. Conclusions. Appendix: Linda User's Manual.
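
Because parallel programming requires both a computing language and a coordination language, a taste of the coordination half can be given in a few lines. Below is a toy tuple space in the spirit of Linda's out/in/rd operations (a teaching sketch in Python, not C-Linda; None plays the role of a wildcard field):

    import threading

    class TupleSpace:
        def __init__(self):
            self._tuples = []
            self._cv = threading.Condition()

        def out(self, *tup):                 # deposit a tuple
            with self._cv:
                self._tuples.append(tup)
                self._cv.notify_all()

        def _match(self, pattern):
            for t in self._tuples:
                if len(t) == len(pattern) and all(
                        p is None or p == v for p, v in zip(pattern, t)):
                    return t
            return None

        def rd(self, *pattern):              # read a matching tuple (blocks)
            with self._cv:
                while (t := self._match(pattern)) is None:
                    self._cv.wait()
                return t

        def in_(self, *pattern):             # withdraw a matching tuple
            with self._cv:
                while (t := self._match(pattern)) is None:
                    self._cv.wait()
                self._tuples.remove(t)
                return t

    ts = TupleSpace()
    ts.out("task", 42)
    print(ts.in_("task", None))              # ('task', 42)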

324 citations


01 Jan 1990
TL;DR: This chapter discusses parallel algorithms for shared-memory machines and the variety of abstract models of parallel computation that have been pursued, the closest to the hardware level being the VLSI models, which focus on the technological limits of today's chips, in which gates and wires are packed into a small number of planar layers.
Abstract: Publisher Summary This chapter discusses parallel algorithms for shared-memory machines. Parallel computation is rapidly becoming a dominant theme in all areas of computer science and its applications. It is estimated that, within a decade, virtually all developments in computer architecture, systems programming, computer applications and the design of algorithms will be taking place within the context of parallel computation. In preparation for this revolution, theoretical computer scientists have begun to develop a body of theory centered on parallel algorithms and parallel architectures. As there is no consensus yet on the appropriate logical organization of a massively parallel computer, and as the speed of parallel algorithms is constrained as much by limits on interprocessor communication as it is by purely computational issues, it is not surprising that a variety of abstract models of parallel computation have been pursued. Closest to the hardware level are the VLSI models, which focus on the technological limits of today's chips, in which gates and wires are packed into a small number of planar layers.

284 citations


Journal ArticleDOI
Alan H. Karp1, Horace P. Flatt1
TL;DR: A new metric that has some advantages over the others is introduced that is illustrated with data from the Linpack benchmark report and the winners of the Gordon Bell Award.
Abstract: Many metrics are used for measuring the performance of a parallel algorithm running on a parallel processor. This article introduces a new metric that has some advantages over the others. Its use is illustrated with data from the Linpack benchmark report and the winners of the Gordon Bell Award.
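
The metric introduced here is the experimentally determined serial fraction: given measured speedup psi on p processors, e = (1/psi - 1/p) / (1 - 1/p). An e that stays flat as p grows points to Amdahl-style serial code, while a growing e exposes parallel overhead. A one-line implementation (the formula is standard; the sample numbers are made up):

    def serial_fraction(speedup, p):
        # Karp-Flatt experimentally determined serial fraction.
        return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

    print(round(serial_fraction(6.0, 8), 4))    # 0.0476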

247 citations


Journal ArticleDOI
13 Mar 1990
TL;DR: The relationship between various models of parallel computation is investigated, using a newly defined concept of efficient simulation, and it is proved that the class PE is invariant across the shared memory models (PRAM's) and fully connected message passing machines.
Abstract: Theoretical research on parallel algorithms has focused on NC theory. This motivates the development of parallel algorithms that are extremely fast, but possibly wasteful in their use of processors. Such algorithms seem of limited interest for real applications currently run on parallel computers. This paper explores an alternative approach that emphasizes the efficiency of parallel algorithms. We define a complexity class PE of problems that can be solved by parallel algorithms that are efficient (the speedup is proportional to the number of processors used) and polynomially faster than sequential algorithms. Other complexity classes are also defined, in terms of time and efficiency: A class that has a slightly weaker efficiency requirement than PE, and a class that is a natural generalization of NC. We investigate the relationship between various models of parallel computation, using a newly defined concept of efficient simulation. This includes new models that reflect asynchrony and high communication latency in parallel computers. We prove that the class PE is invariant across the shared memory models (PRAMs) and fully connected message passing machines. These results show that our definitions are robust. Many open problems motivated by our approach are listed.

244 citations


Journal ArticleDOI
TL;DR: The purpose is to review the current status and to provide an overall perspective of parallel algorithms for solving dense, banded, or block-structured problems arising in the major areas of direct solution of linear systems, least squares computations, eigenvalue and singular value computation, and rapid elliptic solvers.
Abstract: Scientific and engineering research is becoming increasingly dependent upon the development and implementation of efficient parallel algorithms on modern high-performance computers. Numerical linear algebra is an indispensable tool in such research and this paper attempts to collect and describe a selection of some of its more important parallel algorithms. The purpose is to review the current status and to provide an overall perspective of parallel algorithms for solving dense, banded, or block-structured problems arising in the major areas of direct solution of linear systems, least squares computations, eigenvalue and singular value computations, and rapid elliptic solvers. A major emphasis is given here to certain computational primitives whose efficient execution on parallel and vector computers is essential in order to obtain high performance algorithms.

203 citations


Journal ArticleDOI
TL;DR: In this paper, a parallel version of the fast multipole method (FMM) is presented for the evaluation of the potential and force fields in systems of particles whose interactions are Coulombic or gravitational in nature.
Abstract: This paper presents a parallel version of the fast multipole method (FMM). The FMM is a recently developed scheme for the evaluation of the potential and force fields in systems of particles whose interactions are Coulombic or gravitational in nature. The sequential method requires O(N) operations to obtain the fields due to N charges, rather than the O(N^2) operations required by the direct calculation. Here, we describe the modifications necessary for implementation of the method on parallel architectures and show that the expected time requirements grow as log N when using N processors. Numerical results are given for a shared memory machine (the Encore Multimax 320).

Journal ArticleDOI
TL;DR: A new recursive prediction error algorithm is derived for the training of feedforward layered neural networks that enables the weights in each neuron of the network to be updated in an efficient parallel manner and has better convergence properties than the classical back propagation algorithm.
Abstract: A new recursive prediction error algorithm is derived for the training of feedforward layered neural networks. The algorithm enables the weights in each neuron of the network to be updated in an efficient parallel manner and has better convergence properties than the classical back propagation algorithm. The relationship between this new parallel algorithm and other existing learning algorithms is discussed. Examples taken from the fields of communication channel equalization and nonlinear systems modelling are used to demonstrate the superior performance of the new algorithm compared with the back propagation routine.
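
To convey the flavour of recursive prediction error updates, here is a generic recursive least-squares update for a single linear-in-parameters unit (a heavily simplified sketch: the paper's algorithm is a Gauss-Newton-style recursion applied through a layered nonlinear network, which this toy does not attempt):

    import numpy as np

    def rls_update(w, P, x, d, lam=0.99):
        # w: weights, P: inverse correlation matrix, x: input, d: target.
        k = P @ x / (lam + x @ P @ x)          # gain vector
        w = w + k * (d - w @ x)                # prediction-error correction
        P = (P - np.outer(k, x @ P)) / lam     # covariance update
        return w, P

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    w, P = np.zeros(2), 100.0 * np.eye(2)
    for _ in range(200):
        x = rng.standard_normal(2)
        w, P = rls_update(w, P, x, true_w @ x)
    print(np.round(w, 3))                      # close to [ 2. -1.]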

Journal ArticleDOI
TL;DR: The authors propose the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube using a novel scheme of algorithm-based error detection, which allows the authors to isolate and replace faulty processors with spare processors.
Abstract: The design of fault-tolerant hypercube multiprocessor architecture is discussed. The authors propose the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube using a novel scheme of algorithm-based error detection. System-level error detection mechanisms have been implemented for three parallel applications on a 16-processor Intel iPSC hypercube multiprocessor: matrix multiplication, Gaussian elimination, and fast Fourier transform. Schemes for other applications are under development. Extensive studies have been done of error coverage of the system-level error detection schemes in the presence of finite-precision arithmetic, which affects the system-level encodings. Two reconfiguration schemes are proposed that allow the authors to isolate and replace faulty processors with spare processors.
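
The checksum idea underlying such algorithm-based schemes fits in a few lines; the sketch below shows a Huang-Abraham-style encoding for matrix multiplication (illustrative; the paper's system-level mechanisms for Gaussian elimination and FFT use analogous but different encodings, and finite-precision effects are handled here only by the tolerance in allclose):

    import numpy as np

    def abft_matmul(A, B):
        Ac = np.vstack([A, A.sum(axis=0, keepdims=True)])  # checksum row
        Br = np.hstack([B, B.sum(axis=1, keepdims=True)])  # checksum column
        Cf = Ac @ Br                                       # full-checksum product
        C = Cf[:-1, :-1]
        ok = (np.allclose(Cf[:-1, -1], C.sum(axis=1)) and
              np.allclose(Cf[-1, :-1], C.sum(axis=0)))
        return C, ok                # ok flips if a processor corrupts a block

    A = np.arange(6.0).reshape(2, 3)
    B = np.arange(12.0).reshape(3, 4)
    C, ok = abft_matmul(A, B)
    print(ok)                       # True on a fault-free run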

Journal ArticleDOI
TL;DR: Several texture segmentation algorithms based on deterministic and stochastic relaxation principles, and their implementation on parallel networks, are described, and results of the various schemes in classifying some real textured images are presented.
Abstract: Several texture segmentation algorithms based on deterministic and stochastic relaxation principles, and their implementation on parallel networks, are described. The segmentation process is posed as an optimization problem and two different optimality criteria are considered. The first criterion involves maximizing the posterior distribution of the intensity field given the label field (maximum a posteriori estimate). The posterior distribution of the texture labels is derived by modeling the textures as Gauss Markov random fields (GMRFs) and characterizing the distribution of different texture labels by a discrete multilevel Markov model. A stochastic learning algorithm is proposed. This iterated hill-climbing algorithm combines fast convergence of deterministic relaxation with the sustained exploration of the stochastic algorithms, but is guaranteed to find only a local minimum. The second optimality criterion requires minimizing the expected percentage of misclassification per pixel by maximizing the posterior marginal distribution, and the maximum posterior marginal algorithm is used to obtain the corresponding solution. All these methods implemented on parallel networks can be easily extended for hierarchical segmentation; results of the various schemes in classifying some real textured images are presented.

Journal ArticleDOI
TL;DR: This work investigates the feasibility of applying connectionist networks with hidden units to forecasting and process control and develops a particular approach which embeds input-output pairs in a state space using delay coordinates.
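
The delay-coordinate construction mentioned is easy to make concrete (a standard embedding; the window length and horizon below are arbitrary):

    import numpy as np

    def delay_embed(series, d=4, horizon=1):
        # Pair each window of d past values with the value to forecast.
        X, y = [], []
        for t in range(d - 1, len(series) - horizon):
            X.append(series[t - d + 1 : t + 1])
            y.append(series[t + horizon])
        return np.array(X), np.array(y)

    s = np.sin(np.linspace(0.0, 20.0, 200))
    X, y = delay_embed(s)
    print(X.shape, y.shape)          # (196, 4) (196,)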

Proceedings ArticleDOI
16 Jun 1990
TL;DR: A parallel 3-D thinning algorithm which conserves medial surfaces is presented and some new topological predicates are given which are very simple to calculate and it is proved that the thinning operation based on those new predicates does not disconnect a3-D object.
Abstract: A parallel 3-D thinning algorithm which conserves medial surfaces is presented. A new characterization of simple points is proposed and some new topological predicates are given which are very simple to calculate. Some new geometrical predicates are also given. It is proved that the thinning operation based on those new predicates does not disconnect a 3-D object. Experiments show that the method gives a satisfactory result.

Journal ArticleDOI
TL;DR: A constant time sorting algorithm is derived on a three-dimensional processor array equipped with a reconfigurable bus system, which is far more feasible than the CRCW PRAM model.

Journal ArticleDOI
TL;DR: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied.
Abstract: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported.
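
The broadcast pattern such strategies build on is the classic dimension-by-dimension sweep: in step k, every node that already holds the message forwards it across dimension k, covering all 2^n nodes in n steps. A simulated sketch (illustrative; the paper's tree broadcasting strategy is an engineered variant of this recursive doubling):

    def hypercube_broadcast(n_dims, root=0):
        have = {root}                       # nodes holding the message
        for k in range(n_dims):
            for node in list(have):
                have.add(node ^ (1 << k))   # partner across dimension k
        return have

    print(sorted(hypercube_broadcast(3)))   # all 8 nodes after 3 steps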

Journal ArticleDOI
TL;DR: An efficient technique for parallel manipulation of data structures that avoids memory access conflicts is presented and is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range.
Abstract: We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and “unicyclic graphs.” Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.
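
One pass of the radix sort can be sketched as follows (sequential Python for readability; in the paper's EREW setting the digit histogram and the exclusive prefix sum are computed by a parallel scan, and the scatter is conflict-free by construction):

    def counting_pass(keys, radix, shift):
        count = [0] * radix
        for k in keys:                       # digit histogram
            count[(k >> shift) % radix] += 1
        start, total = [0] * radix, 0
        for d in range(radix):               # exclusive prefix sum
            start[d], total = total, total + count[d]
        out = [0] * len(keys)
        for k in keys:                       # stable scatter
            d = (k >> shift) % radix
            out[start[d]] = k
            start[d] += 1
        return out

    def radix_sort(keys, radix=16, bits=16):
        for shift in range(0, bits, radix.bit_length() - 1):
            keys = counting_pass(keys, radix, shift)
        return keys

    print(radix_sort([170, 45, 75, 90, 2, 802, 24, 66]))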

Journal ArticleDOI
TL;DR: A parallel algorithm for tiling with polyominoes is presented and can be used for placement of components or cells in a very large-scale integrated circuit (VLSI) chip, designing and compacting printed circuit boards, and solving a variety of two- or three-dimensional packing problems.
Abstract: A parallel algorithm for tiling with polyominoes is presented. The tiling problem is to pack polyominoes in a finite checkerboard. The algorithm using l*m*n processing elements requires O(1) time, where l is the number of different kinds of polyominoes on an m*n checkerboard. The algorithm can be used for placement of components or cells in a very large-scale integrated circuit (VLSI) chip, designing and compacting printed circuit boards, and solving a variety of two- or three-dimensional packing problems.

Journal ArticleDOI
TL;DR: Two algorithms are presented for computing the discrete cosine transform (DCT) on existing VLSI structures, and a new prime factor DCT algorithm is presented for the class of DCTs of length N = N1*N2, where N1 and N2 are relatively prime and odd numbers.
Abstract: Two algorithms are presented for computing the discrete cosine transform (DCT) on existing VLSI structures. First, it is shown that the N-point DCT can be implemented on the existing systolic architecture for the N-point discrete Fourier transform (DFT) by introducing some modifications. Second, a new prime factor DCT algorithm is presented for the class of DCTs of length N = N1*N2, where N1 and N2 are relatively prime and odd numbers. It is shown that the proposed algorithm can be implemented on the already existing VLSI structures for prime factor DFT. The number of multipliers required is comparable to that required for the other fast DCT algorithms. It is shown that the discrete sine transform (DST) can be computed by the same structure.

Journal ArticleDOI
TL;DR: It is shown that the sequential refinement calculus can be used as such for most of the derivation steps of a parallel version of the Gaussian elimination method for solving simultaneous linear equation systems.

Book ChapterDOI
01 Oct 1990
TL;DR: The paper discusses the trade-offs between the communication overheads involved and the number of processors employed, using various communication networks between processors.
Abstract: The paper discusses the parallel implementation of the genetic algorithm on transputer-based parallel processing systems. It considers the implementation of the batch version of the algorithm using a problem from the domain of real-time control. With the problem chosen, the evaluation of a member of the population takes a relatively long time compared with the generation of a member, and so emphasis is laid on parallel evaluation. However, any distribution of processing over a number of processors will involve some communication overheads which are not present when the processing is done on one processor. This overhead will vary depending upon the communication network used. The paper discusses the trade-offs between the communication overheads involved and the number of processors employed, using various communication networks between processors.
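
A master-worker sketch of the parallel-evaluation idea (Python processes standing in for transputers; the fitness function and GA operators are invented placeholders):

    from multiprocessing import Pool
    import random

    def fitness(ind):                 # placeholder for a costly evaluation
        return -sum((g - 0.5) ** 2 for g in ind)

    def next_generation(pop, scores, rate=0.1):
        ranked = [ind for _, ind in sorted(zip(scores, pop), reverse=True)]
        parents = ranked[: len(pop) // 2]
        children = []
        while len(children) < len(pop):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]             # one-point crossover
            children.append([g + random.gauss(0, rate)
                             if random.random() < rate else g
                             for g in child])     # mutation
        return children

    if __name__ == "__main__":
        pop = [[random.random() for _ in range(8)] for _ in range(40)]
        with Pool() as pool:
            for _ in range(20):
                scores = pool.map(fitness, pop)   # evaluations in parallel
                pop = next_generation(pop, scores)
        print(round(max(map(fitness, pop)), 4))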

Journal ArticleDOI
TL;DR: Parallel algorithms on SIMD (single-instruction stream, multiple-data stream) machines for hierarchical clustering and cluster validity computation are proposed; the machine model uses a parallel memory system and an alignment network to facilitate parallel access to both the pattern matrix and the proximity matrix.
Abstract: Parallel algorithms on SIMD (single-instruction stream, multiple-data stream) machines for hierarchical clustering and cluster validity computation are proposed. The machine model uses a parallel memory system and an alignment network to facilitate parallel access to both the pattern matrix and the proximity matrix. For a problem with N patterns, the number of memory accesses is reduced from O(N^3) on a sequential machine to O(N^2) on an SIMD machine with N PEs.

Journal ArticleDOI
TL;DR: It is shown that parallel processing of HTD faults does indeed result in high fault coverage, which is otherwise not achievable by a uniprocessor algorithm, and the parallel algorithm exhibits superlinear speedups in some cases due to search anomalies.
Abstract: For circuits of VLSI complexity, test generation time can be prohibitive. Most of the time is consumed by hard-to-detect (HTD) faults, which might remain undetected even after a large number of backtracks. The problems inherent in a uniprocessor implementation of a test generation algorithm are identified, and a parallel test generation method which tries to achieve a high fault coverage for HTD faults in a reasonable amount of time is proposed. A dynamic search space allocation strategy which allocates disjoint search spaces to minimize the redundant work is proposed. The search space allocation strategy tries to utilize the partial solutions generated by other processors to increase the probability of searching in a solution area. The parallel test generation algorithm has been implemented on an Intel iPSC/2 hypercube. It is shown that parallel processing of HTD faults does indeed result in high fault coverage, which is otherwise not achievable by a uniprocessor algorithm. The parallel algorithm exhibits superlinear speedups in some cases due to search anomalies.

Journal ArticleDOI
TL;DR: The problem of computing a fixed point of a nonexpansive function f is considered, and simulation results illustrating the attainable speedup and the effects of asynchronism are presented.
Abstract: The problem of computing a fixed point of a nonexpansive function f is considered. Sufficient conditions are provided under which a parallel, partially asynchronous implementation of the iteration $x := f(x)$ converges. These results are then applied to (i) quadratic programming subject to box constraints, (ii) strictly convex cost network flow optimization, (iii) an agreement and a Markov chain problem, (iv) neural network optimization, and (v) finding the least element of a polyhedral set determined by a weakly diagonally dominant, Leontief system. Finally, simulation results illustrating the attainable speedup and the effects of asynchronism are presented.
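
A toy rendering of the partially asynchronous iteration (the map f below is an invented contraction, hence nonexpansive, and the random staleness stands in for communication delay):

    import random

    def async_fixed_point(f, x, sweeps=300, max_stale=3):
        history = [list(x)]
        for _ in range(sweeps):
            # Update one coordinate using a possibly outdated iterate.
            stale = history[-random.randint(1, min(max_stale, len(history)))]
            i = random.randrange(len(x))
            x[i] = f(stale)[i]
            history.append(list(x))
        return x

    target = [1.0, 2.0, 3.0]
    f = lambda v: [0.5 * vi + 0.5 * ti for vi, ti in zip(v, target)]
    print([round(v, 3) for v in async_fixed_point(f, [0.0, 0.0, 0.0])])
    # drifts to the fixed point [1.0, 2.0, 3.0] despite asynchrony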

Journal ArticleDOI
TL;DR: A parallel algorithm for the rotation of digitized images is presented; it combines the decomposition of the process into subprocesses, each allocated to a processor for execution, with the decomposition of the data into smaller portions.

Journal ArticleDOI
TL;DR: A new randomized parallel algorithm determines the Smith normal form of a matrix whose entries are univariate polynomials with coefficients in an arbitrary field; the algorithm is probabilistic of Las Vegas type and reduces the problem of Smith form computation to two Hermite form computations.

Journal ArticleDOI
01 Jun 1990
TL;DR: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without the programmer having to take the low-level characteristics of the target distributed computer into account.
Abstract: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer when programming the algorithm. No explicit process definition or interprocess communication is needed. Parallelization is achieved through logical data organization. The Pandore system provides the user with a means to specify data partitioning and data distribution over a domain of virtual processors for each parallel step of the algorithm. At compile time, Pandore splits the original program into parallel processes. Each process executes the appropriate parts of the original code, according to the given data decomposition. In order to achieve correct utilization of the data structures distributed over the processors, the Pandore system provides an execution scheme based on a communication layer, which is an abstraction of a message-passing architecture. This intermediate level is then implemented using the effective primitives of the real architecture (in our specific case, an Intel iPSC/2).
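
The owner-computes mapping that such a data-distribution specification induces can be sketched in a few lines (illustrative only; the block rule and the names below are not Pandore syntax):

    def owner(i, n, n_procs):
        # Block distribution: ceiling-sized contiguous chunks.
        block = (n + n_procs - 1) // n_procs
        return i // block

    def my_indices(me, n, n_procs):
        # The guard a compiler inserts so each process touches only
        # the array elements it owns.
        return [i for i in range(n) if owner(i, n, n_procs) == me]

    n, n_procs = 10, 4
    for p in range(n_procs):
        print(p, my_indices(p, n, n_procs))
    # 0 [0, 1, 2] / 1 [3, 4, 5] / 2 [6, 7, 8] / 3 [9]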

Book
18 Oct 1990
TL;DR: Most of the algorithms in this book are for hypercubes in which the number of processors is a function of problem size; for image processing problems, however, the book also includes algorithms for an MIMD hypercube with a small number of processors.
Abstract: Fundamental algorithms for SIMD and MIMD hypercubes are developed. These include algorithms for such problems as data broadcasting, data sum, prefix sum, shift, data circulation, data accumulation, sorting, random access reads and writes, and data permutation. The fundamental algorithms are then used to obtain efficient hypercube algorithms for matrix multiplication, image processing problems such as convolution, template matching, Hough transform, clustering and image processing transformations, and string editing. Most of the algorithms in this book are for hypercubes in which the number of processors is a function of problem size. However, for image processing problems, the book also includes algorithms for an MIMD hypercube with a small number of processors. Experimental results on an NCUBE/77 MIMD hypercube are also presented. The book is suitable for use in a one-semester or one-quarter course on hypercube algorithms. For students with no prior exposure to parallel algorithms, it is recommended that one week be spent on the material in chapter 1, about six weeks on chapter 2, and one week on chapter 3. The remainder of the term can be spent covering topics from the rest of the book.
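
As a sample of the fundamentals, the data-sum operation on a hypercube is recursive doubling: in step k, every node exchanges its partial sum with its neighbour across dimension k, so after log2(P) steps all P nodes hold the global sum. A simulated sketch (a Python list stands in for the processors):

    def hypercube_allreduce_sum(values):
        vals = list(values)            # vals[node] = node's partial sum
        p, k = len(vals), 1            # p must be a power of two
        while k < p:
            vals = [vals[node] + vals[node ^ k] for node in range(p)]
            k <<= 1
        return vals

    print(hypercube_allreduce_sum([1, 2, 3, 4, 5, 6, 7, 8]))
    # every node ends with 36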