
Showing papers on "Parallel algorithm" published in 1988


Journal ArticleDOI
TL;DR: An alternative method based on the preflow concept of Karzanov is introduced; it runs as fast as any other known method on dense graphs, achieving an O(n³) time bound on an n-vertex graph, and is faster on graphs of moderate density.
Abstract: All previously known efficient maximum-flow algorithms work by finding augmenting paths, either one path at a time (as in the original Ford and Fulkerson algorithm) or all shortest-length augmenting paths at once (using the layered network approach of Dinic). An alternative method based on the preflow concept of Karzanov is introduced. A preflow is like a flow, except that the total amount flowing into a vertex is allowed to exceed the total amount flowing out. The method maintains a preflow in the original network and pushes local flow excess toward the sink along what are estimated to be shortest paths. The algorithm and its analysis are simple and intuitive, yet the algorithm runs as fast as any other known method on dense graphs, achieving an O(n³) time bound on an n-vertex graph. By incorporating the dynamic tree data structure of Sleator and Tarjan, we obtain a version of the algorithm running in O(nm log(n²/m)) time on an n-vertex, m-edge graph. This is as fast as any known method for any graph density and faster on graphs of moderate density. The algorithm also admits efficient distributed and parallel implementations. A parallel implementation running in O(n² log n) time using n processors and O(m) space is obtained. This time bound matches that of the Shiloach-Vishkin algorithm, which also uses n processors but requires O(n²) space.
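
As a rough illustration of the preflow idea, here is a generic push-relabel sketch in Python; it is a teaching baseline, not the paper's optimized O(n³) variant or its dynamic-tree version, and all names are ours.

```python
from collections import defaultdict, deque

def max_flow_push_relabel(n, edges, s, t):
    """Generic push-relabel maximum flow (a simple sketch, not the
    paper's FIFO analysis or dynamic-tree refinement)."""
    cap, adj = defaultdict(int), defaultdict(list)
    for u, v, c in edges:
        if not cap[(u, v)] and not cap[(v, u)]:
            adj[u].append(v); adj[v].append(u)
        cap[(u, v)] += c
    flow = defaultdict(int)
    height, excess = [0] * n, [0] * n
    height[s] = n                        # source starts at height n
    for v in adj[s]:                     # saturate all source edges: a preflow
        c = cap[(s, v)]
        flow[(s, v)] += c; flow[(v, s)] -= c
        excess[v] += c
    active = deque(v for v in adj[s] if excess[v] > 0 and v != t)
    while active:
        u = active.popleft()
        while excess[u] > 0:             # discharge u completely
            pushed = False
            for v in adj[u]:
                # push along admissible residual edges (height drops by 1)
                if cap[(u, v)] - flow[(u, v)] > 0 and height[u] == height[v] + 1:
                    d = min(excess[u], cap[(u, v)] - flow[(u, v)])
                    if excess[v] == 0 and v not in (s, t):
                        active.append(v)
                    flow[(u, v)] += d; flow[(v, u)] -= d
                    excess[u] -= d; excess[v] += d
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:               # relabel: lift u just above its lowest
                height[u] = 1 + min(height[v] for v in adj[u]
                                    if cap[(u, v)] - flow[(u, v)] > 0)
    return sum(flow[(s, v)] for v in adj[s])

print(max_flow_push_relabel(4, [(0, 1, 3), (0, 2, 2), (1, 3, 2), (2, 3, 3)], 0, 3))  # 4
```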

1,700 citations


Book
01 Mar 1988

1,148 citations


Journal ArticleDOI
TL;DR: Hierarchical network structures are developed that have the property that the optimal global estimate based on all the available information can be reconstructed from estimates computed by local processor nodes solely on the basis of their own local information and transmitted to a central processor.
Abstract: Various multisensor network scenarios with signal processing tasks that are amenable to multiprocessor implementation are described. The natural origins of such multitasking are emphasized, and novel parallel structures for state estimation using the Kalman filter are proposed that extend existing results in several directions. In particular, hierarchical network structures are developed that have the property that the optimal global estimate based on all the available information can be reconstructed from estimates computed by local processor nodes solely on the basis of their own local information and transmitted to a central processor. The algorithms potentially yield an approximately linear speedup rate, are reasonably failure-resistant, and are optimized with respect to communication bandwidth and memory requirements at the various processors.
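
As a worked special case of the reconstruction property, the sketch below fuses independent local estimates of a static state in information form; it omits dynamics and any shared prior, so it only hints at the structure of the hierarchical algorithms (names and numbers are illustrative).

```python
import numpy as np

def fuse(estimates, covariances):
    """Information-form fusion of independent local estimates of the
    same state: a simplified, static stand-in for the hierarchical
    reconstruction described in the abstract."""
    info = sum(np.linalg.inv(P) for P in covariances)
    info_state = sum(np.linalg.inv(P) @ x
                     for x, P in zip(estimates, covariances))
    P_global = np.linalg.inv(info)
    return P_global @ info_state, P_global

# two local sensors estimating a scalar state (hypothetical numbers)
x_hat, P = fuse([np.array([1.0]), np.array([1.2])],
                [np.eye(1) * 0.5, np.eye(1) * 1.0])
print(x_hat)   # weighted toward the more confident sensor: ~[1.067]
```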

482 citations


Book
01 Jan 1988
TL;DR: The emphasis of the book is on designing algorithms within the timeless and abstracted context of a high-level programming language rather than depending on highly specific computer architectures.
Abstract: From the Publisher: This text is an introduction to the field of efficient parallel algorithms and to techniques for efficient parallelisation. It is largely self-contained and presumes no special knowledge of parallel computers or particular mathematics. The emphasis of the book is on designing algorithms within the timeless and abstracted context of a high-level programming language rather than depending on highly specific computer architectures. This approach concentrates on the essence of algorithmic theory, and on determining and taking advantage of the inherently parallel nature of certain types of problem. The authors present regularly used techniques and a range of algorithms which includes some of the more celebrated. The text is targeted at non-specialists who are considering entering the field of parallel algorithms. It will be particularly useful for courses aimed at advanced undergraduate or new postgraduate students of computer science and mathematics.

466 citations


01 Jan 1988
TL;DR: A survey of the growing body of theory concerned with parallel algorithms and the complexity of parallel computation, which considers the parallel random-access machine (PRAM), in which it is assumed that each processor has random access in unit time to any cell of a global memory.
Abstract: This paper is a survey of the growing body of theory concerned with parallel algorithms and the complexity of parallel computation. The principal model of computation that we consider is the parallel random-access machine (PRAM), in which it is assumed that each processor has random access in unit time to any cell of a global memory. This model permits the logical structure of parallel computation to be studied in a context divorced from issues of interprocessor communication. Section 2 surveys efficient parallel algorithms for bookkeeping operations such as compacting an array by squeezing out its "dead" elements, for evaluating algebraic expressions, for searching a graph and decomposing it into various kinds of components, and for sorting, merging and selection. These algorithms are typically completely different from the best sequential algorithms for the same problems, and their discovery has required the creation of a new set of paradigms for the construction of parallel algorithms. Section 3 studies the relationships among several variants of the PRAM model which differ in their implementation of concurrent reading and/or concurrent writing, presents lower bounds on the time to solve certain elementary problems on various kinds of PRAMs, and compares the PRAM with other models such as bounded-fan-in and unbounded-fan-in circuits, alternating Turing machines and vector machines. Section 3 also introduces NC, a hierarchy of problems solvable by deterministic algorithms that operate in polylog time using a polynomial-bounded number of processors. Section 4 discusses specific problems within NC. Among the problems shown to lie at low levels within this hierarchy are the basic arithmetic operations, transitive closure and Boolean matrix multiplication, the computation of the determinant, the rank and inverse of a matrix, the evaluation of certain classes of straight-line programs and the construction of a maximal independent set of vertices in a graph. Section 4 also discusses the randomized version of NC, and gives fast randomized parallel algorithms for problems such as finding a maximum matching in a graph. Section 4 concludes by exhibiting several problems that are complete in the sequential complexity class P with respect to logspace reducibility, and hence unlikely to lie in NC.
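
For instance, the "dead"-element compaction mentioned in Section 2 reduces to an exclusive prefix sum; a sequential Python simulation of the PRAM pattern (our names) looks like this.

```python
def compact(a, alive):
    """PRAM-style array compaction: an exclusive prefix sum over the 0/1
    survival flags gives each live element its output slot, after which
    all writes are independent (concurrent on a PRAM)."""
    if not a:
        return []
    idx = [0] * len(a)
    for i in range(1, len(a)):      # a parallel scan does this in O(log n) depth
        idx[i] = idx[i - 1] + alive[i - 1]
    out = [None] * (idx[-1] + alive[-1])
    for i in range(len(a)):         # independent per-element writes
        if alive[i]:
            out[idx[i]] = a[i]
    return out

print(compact([7, 3, 9, 4], [1, 0, 1, 0]))   # [7, 9]
```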

383 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present efficient parallel algorithms for several basic problems in computational geometry: convex hulls, Voronoi diagrams, detecting line segment intersections, triangulating simple polygons, minimizing a circumscribing triangle, and recursive data-structures for three-dimensional queries.
Abstract: We present efficient parallel algorithms for several basic problems in computational geometry: convex hulls, Voronoi diagrams, detecting line segment intersections, triangulating simple polygons, minimizing a circumscribing triangle, and recursive data-structures for three-dimensional queries.

311 citations


Journal ArticleDOI
01 Jun 1988
TL;DR: A parallel algorithm for the rasterization of polygons is presented that is particularly well suited for 3D Z-buffered graphics implementations; the value of each edge function can be interpolated with hardware similar to that required to interpolate color and Z pixel values.
Abstract: A parallel algorithm for the rasterization of polygons is presented that is particularly well suited for 3D Z-buffered graphics implementations. The algorithm represents each edge of a polygon by a linear edge function that has a value greater than zero on one side of the edge and less than zero on the opposite side. The value of the function can be interpolated with hardware similar to that required to interpolate color and Z pixel values. In addition, the edge functions of adjacent pixels may be easily computed in parallel. The coefficients of the edge function can be computed from floating-point endpoints in such a way that sub-pixel precision of the endpoints is retained elegantly.
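
A minimal point-sampling sketch of the edge-function test (our names; a real implementation would evaluate the functions incrementally and in parallel, with the sub-pixel-exact coefficient setup the paper describes; non-negative vertex coordinates are assumed here).

```python
def edge(ax, ay, bx, by, px, py):
    # linear edge function: > 0 on one side of the directed edge a->b, < 0 on the other
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)

def rasterize_triangle(v0, v1, v2):
    """Sign-test rasterizer: a pixel is covered when all three edge
    functions agree in sign at its centre.  Each pixel's test is
    independent, which is what makes the method amenable to SIMD/parallel
    evaluation."""
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    covered = []
    for y in range(int(min(ys)), int(max(ys)) + 1):      # bounding-box scan
        for x in range(int(min(xs)), int(max(xs)) + 1):
            e0 = edge(*v0, *v1, x + 0.5, y + 0.5)
            e1 = edge(*v1, *v2, x + 0.5, y + 0.5)
            e2 = edge(*v2, *v0, x + 0.5, y + 0.5)
            if (e0 >= 0 and e1 >= 0 and e2 >= 0) or \
               (e0 <= 0 and e1 <= 0 and e2 <= 0):
                covered.append((x, y))
    return covered

print(len(rasterize_triangle((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))))  # 36 pixel centres
```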

259 citations


Journal ArticleDOI
24 Oct 1988
TL;DR: A parallel algorithm for the Δ+1 vertex coloring problem with running time O(log³ n log log n) using a linear number of processors on a concurrent-read-concurrent-write parallel random-access machine.
Abstract: Some general techniques are developed for removing randomness from randomized NC algorithms without a blowup in the number of processors. One of the requirements for the application of these techniques is that the analysis of the randomized algorithm uses only pairwise independence. The main new result is a parallel algorithm for the Δ+1 vertex coloring problem with running time O(log³ n log log n) using a linear number of processors on a concurrent-read-concurrent-write parallel random-access machine. The techniques also apply to several other problems, including the maximal-independent-set problem and the maximal-matching problem. The application of the general technique to these last two problems is mostly of academic interest, because NC algorithms using a linear number of processors that have better running times have been previously found.
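
The derandomization idea can be illustrated on a toy problem: when an analysis needs only pairwise independence, the seed space of the family h(x) = ((ax + b) mod p) mod 2 has just p² points and can be searched exhaustively. The sketch below does this for max-cut rather than the paper's coloring algorithm, and the mod-2 step introduces a small bias since p is odd.

```python
def derandomized_max_cut(n, edges, p=31):
    """Exhaustive search of the pairwise-independent seed space (a, b),
    with side(x) = ((a*x + b) % p) & 1.  Under pairwise independence each
    edge is cut with probability ~1/2, so some of the p*p seeds attains
    roughly m/2.  p is any prime >= n (31 is an assumption for small n)."""
    best_cut, best_side = -1, None
    for a in range(p):
        for b in range(p):
            side = [((a * x + b) % p) & 1 for x in range(n)]
            cut = sum(side[u] != side[v] for u, v in edges)
            if cut > best_cut:
                best_cut, best_side = cut, side
    return best_cut, best_side

print(derandomized_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)])[0])   # 4
```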

210 citations


Proceedings Article
01 Jan 1988
TL;DR: It would be very interesting if the authors could combine stages (3) and (4) into a single step whereby the performance of the algorithm is measured as the makespan of the schedule (elapsed time for computing the last result).
Abstract: Harnessing the massively parallel architectures soon to become available into efficient algorithmic cooperation is one of the most important intellectual challenges facing Computer Science today. To the theoretician, the task seems similar to that of understanding the issues involved in the performance of sequential algorithms (which motivated Knuth's books, among other important works), only infinitely more complex. In sequential computation, the design process involves (a) choosing an algorithm and (b) analyzing it (mostly, counting its steps). In the parallel context, however, we have at least four stages: (1) Choose the algorithm (say, a directed acyclic graph (dag) indicating the elementary computations and their interdependence, a model in which evaluation of sequential performance is trivial). (2) Choose a particular multiprocessor architecture. (3) Find a schedule whereby the algorithm is executed on the processors (so that all necessary data are available at the appropriate processor at the time of each computation). (4) Only now can we talk about the performance of the algorithm, measured as the makespan of the schedule (elapsed time for computing the last result). In our opinion, it is this multi-layered nature of the problem that lies at the heart of the difficulties encountered in the development of the necessary ideas, principles, and tools for the design of parallel algorithms. Is there a way to shortcut the process, thus improving our chances of finally gaining some insight into parallel algorithms? It would be very interesting if we could combine stages (3) and (4) into a single step whereby the performance of the algorithm chosen …
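
Stage (3) and the makespan of stage (4) can be made concrete in a simplified setting: a greedy unit-time list schedule of a dag on p identical processors, ignoring communication (a sketch with our names, not the authors' proposal).

```python
def makespan(tasks, deps, p):
    """Greedy list schedule of unit-time tasks on p processors.
    tasks: task ids; deps: dict task -> set of predecessor tasks."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    succ = {t: [] for t in tasks}
    for t, preds in deps.items():
        for q in preds:
            succ[q].append(t)
    ready = [t for t in tasks if indeg[t] == 0]
    time = 0
    while ready:
        running, ready = ready[:p], ready[p:]   # at most p tasks per step
        time += 1
        for t in running:                       # release the successors
            for u in succ[t]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    ready.append(u)
    return time

# diamond dag a -> {b, c} -> d: 4 steps on 1 processor, 3 steps on 2
print(makespan(['a', 'b', 'c', 'd'],
               {'b': {'a'}, 'c': {'a'}, 'd': {'b', 'c'}}, 2))   # 3
```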

176 citations


Journal ArticleDOI
TL;DR: Several parallel algorithms are presented for solving triangular systems of linear equations on distributed-memory multiprocessors and new wavefront algorithms are developed for both row-oriented and column-oriented matrix storage.
Abstract: Several parallel algorithms are presented for solving triangular systems of linear equations on distributed-memory multiprocessors. New wavefront algorithms are developed for both row-oriented and column-oriented matrix storage. Performance of the new algorithms and several previously proposed algorithms is analyzed theoretically and illustrated empirically using implementations on commercially available hypercube multiprocessors.
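
The sequential kernel these wavefront algorithms parallelize is plain forward substitution; a row-oriented baseline (our names) is shown below, with the cross-iteration dependence that the pipelined distributed versions overlap.

```python
def forward_substitution(L, b):
    """Row-oriented solve of Lx = b for lower-triangular L.  Each x[i]
    depends on all earlier x[j]; the paper's wavefront algorithms pipeline
    exactly this dependence across processors holding rows or columns of L."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))   # inner product with row i
        x[i] = (b[i] - s) / L[i][i]
    return x

print(forward_substitution([[2.0, 0.0], [1.0, 4.0]], [2.0, 9.0]))  # [1.0, 2.0]
```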

160 citations


Proceedings Article
21 Aug 1988
TL;DR: This paper presents many different parallel formulations of the A*/Branch-and-Bound search algorithm and identifies problem characteristics that make certain formulations more (or less) suitable for some search problems.
Abstract: This paper presents many different parallel formulations of the A*/Branch-and-Bound search algorithm. The parallel formulations primarily differ in the data structures used. Some formulations are suited only for shared-memory architectures, whereas others are suited for distributed-memory architectures as well. These parallel formulations have been implemented to solve the vertex cover problem and the traveling salesman problem (TSP) on the BBN Butterfly parallel processor. Using appropriate data structures, we are able to obtain fairly linear speedups for as many as 100 processors. We also discovered problem characteristics that make certain formulations more (or less) suitable for some search problems. Since the best-first search paradigm of A*/Branch-and-Bound is very commonly used, we expect these parallel formulations to be effective for a variety of problems. Concurrent and distributed priority queues used in these parallel formulations can be used in many parallel algorithms other than parallel A*/branch-and-bound.
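
A sequential sketch of the best-first branch-and-bound paradigm applied to vertex cover, using one standard formulation (branch on an uncovered edge, bound via a greedy maximal matching); this is not necessarily the exact formulation benchmarked in the paper.

```python
import heapq, itertools

def min_vertex_cover(edges):
    """Best-first branch and bound for minimum vertex cover: branch on an
    uncovered edge (u, v), taking u in one child and v in the other."""
    def bound(cover, rem):
        used, m = set(), 0
        for a, b in rem:                 # greedy maximal matching on the rest
            if a not in used and b not in used:
                used.update((a, b)); m += 1
        return len(cover) + m            # matching size is a valid lower bound
    tick = itertools.count()             # tie-breaker for the heap
    heap = [(bound(set(), edges), next(tick), frozenset(), tuple(edges))]
    while heap:
        _, _, cover, rem = heapq.heappop(heap)
        if not rem:                      # first completed node is optimal
            return set(cover)
        u, v = rem[0]
        for w in (u, v):
            c2 = cover | {w}
            r2 = tuple(e for e in rem if w not in e)
            heapq.heappush(heap, (bound(c2, r2), next(tick), c2, r2))

# triangle plus a pendant edge (hypothetical instance): optimum has size 2
print(min_vertex_cover([(0, 1), (1, 2), (0, 2), (2, 3)]))   # e.g. {0, 2}
```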

Book
01 Jan 1988
TL;DR: Part 1, Fundamentals of parallel computation: general principles of parallel computing, parallel techniques and algorithms, parallel sorting algorithms, and future trends in algorithm development.
Abstract: Part 1, Fundamentals of parallel computation: general principles of parallel computing; parallel techniques and algorithms; parallel sorting algorithms. Part 2, Numerical linear algebra: solution of a system of linear algebraic equations; the symmetric eigenvalue problem (Jacobi method); QR factorization; singular-value decomposition and related problems; future trends in algorithm development.

Journal ArticleDOI
TL;DR: A new contour generating serial algorithm is faster and more efficient than conventional contour tracing and parallel algorithms.
Abstract: A new contour generating serial algorithm is faster and more efficient than conventional contour tracing and parallel algorithms.

Book ChapterDOI
28 Jun 1988
TL;DR: This paper describes a parallel algorithm for list ranking that matches the performance of the Cole-Vishkin [CV86a] algorithm while remaining simple, with reasonable constant factors.
Abstract: In this paper we describe a simple parallel algorithm for list ranking. The algorithm is deterministic and runs in O(log n) time on an EREW PRAM with n/log n processors. The algorithm matches the performance of the Cole-Vishkin [CV86a] algorithm but is simpler and has reasonable constant factors.
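
The underlying idea can be seen in Wyllie's basic pointer-jumping scheme, the O(n log n)-work baseline of which this paper's algorithm is the work-optimal refinement (sketch and names ours).

```python
def list_rank(succ):
    """Wyllie-style pointer jumping.  succ[i] is the next node in the
    list, with succ[tail] == tail; returns each node's distance to the
    tail.  Every round is one PRAM step; ceil(log2 n) rounds suffice."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    succ = list(succ)
    for _ in range(max(1, (n - 1).bit_length())):
        # both arrays are rebuilt from the old ones: a synchronous step
        rank, succ = ([rank[i] + rank[succ[i]] for i in range(n)],
                      [succ[succ[i]] for i in range(n)])
    return rank

print(list_rank([1, 2, 3, 3]))   # [3, 2, 1, 0]
```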

Journal ArticleDOI
01 Mar 1988
TL;DR: It is shown that the SAXPY, GAXPY and DOT algorithms of Dongarra, Gustavson and Karp, as well as parallel versions of the LDMᵀ, LDLᵀ, Doolittle and Cholesky algorithms, can be classified into four task graph models.
Abstract: This paper introduces a graph-theoretic approach to analyse the performances of several parallel Gaussian-like triangularization algorithms on an MIMD computer. We show that the SAXPY, GAXPY and DOT algorithms of Dongarra, Gustavson and Karp, as well as parallel versions of the LDMᵀ, LDLᵀ, Doolittle and Cholesky algorithms, can be classified into four task graph models. We derive new complexity results and compare the asymptotic performances of these parallel versions.
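
For concreteness, here is Doolittle LU without pivoting in the kij ("SAXPY"-style) loop ordering; permuting the three loops yields the GAXPY and DOT variants, which perform the same arithmetic but induce different task graphs (our sketch, not the paper's notation).

```python
def lu_kij(A):
    """Doolittle LU without pivoting, kij ordering: step k applies a
    rank-1 elimination to the trailing rows, one saxpy per row."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                  # multiplier l_ik
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]    # saxpy update of row i
    return A    # L strictly below the diagonal, U on and above it
```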

Journal ArticleDOI
TL;DR: A modified version of the fast parallel thinning algorithm proposed by Zhang and Suen is presented; it preserves the original algorithm's merits, such as immunity to contour noise and good performance in thinning crossing lines, while overcoming its shortcomings.
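
For reference, a sketch of one subiteration of the original Zhang-Suen scheme that the modification builds on; the paper's specific changes are not reproduced here.

```python
def zhang_suen_pass(img, step):
    """One subiteration of the classic Zhang-Suen thinning scheme.
    img: 2D list of 0/1 pixels, with an all-zero border assumed."""
    to_delete = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            if not img[y][x]:
                continue
            # neighbours P2..P9, clockwise from north
            p = [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                 img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]
            b = sum(p)                                       # nonzero neighbours
            a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
            if step == 1:   # P2*P4*P6 == 0 and P4*P6*P8 == 0
                ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
            else:           # P2*P4*P8 == 0 and P2*P6*P8 == 0
                ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
            if 2 <= b <= 6 and a == 1 and ok:
                to_delete.append((y, x))
    for y, x in to_delete:
        img[y][x] = 0
    return bool(to_delete)

def thin(img):
    # alternate the two subiterations until neither deletes a pixel
    while zhang_suen_pass(img, 1) | zhang_suen_pass(img, 2):
        pass
    return img
```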

Journal ArticleDOI
01 Apr 1988
TL;DR: Lower and upper bounds on the deterministic and randomized complexity of parallel search algorithms are derived to establish that randomized parallel algorithms are much more powerful than deterministic ones, and to show that even randomized algorithms cannot make effective use of extremely large numbers of processors.
Abstract: This paper studies parallel search algorithms within the framework of independence systems. It is motivated by earlier work on parallel algorithms for concrete problems such as the determination of a maximal independent set of vertices or a maximum matching in a graph, and by the general question of determining the parallel complexity of a search problem when an oracle is available to solve the associated decision problem. Our results provide a parallel analogue of the self-reducibility process that is so useful in sequential computation. An abstract independence system is specified by a ground set E and a family of subsets of E called the independent sets; it is required that every subset of an independent set be independent. We investigate parallel algorithms for determining a maximal independent set through oracle queries of the form "Is the set A independent?", as well as parallel algorithms for determining a maximum independent set through queries to a more powerful oracle called a rank oracle. We also study these problems for three special types of independence systems: matroids, graphic matroids and partition matroids. We derive lower and upper bounds on the deterministic and randomized complexity of these problems. These bounds are sharp enough to give a clear picture of the processor-time trade-offs that are possible, to establish that randomized parallel algorithms are much more powerful than deterministic ones, and to show that even randomized algorithms cannot make effective use of extremely large numbers of processors.
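
The sequential self-reducibility baseline costs one oracle query per ground-set element; the paper asks how far this chain of adaptive queries can be compressed in parallel. A sketch with a hypothetical partition-matroid oracle:

```python
def maximal_independent_set(ground, is_independent):
    """Greedy sequential baseline: grow S one oracle query at a time.
    is_independent(S) answers the decision query "Is the set S independent?"."""
    S = set()
    for e in ground:
        if is_independent(S | {e}):
            S.add(e)
    return S

# hypothetical partition matroid: independent = at most one element per block
block = {1: "a", 2: "a", 3: "b", 4: "b"}
indep = lambda S: len({block[e] for e in S}) == len(S)
print(maximal_independent_set([1, 2, 3, 4], indep))   # {1, 3}
```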

Patent
24 Nov 1988
TL;DR: In this paper, a parallel algorithm is presented for rendering an important graphic primitive: a smoothly shaded, three-dimensional color triangle with anti-aliased edges.
Abstract: SIMD computer architecture is used in conjunction with a host processor and coordinate processor to render quality, three-dimensional, anti-aliased shaded color images into the frame buffer of a video display system. The method includes a parallel algorithm for rendering an important graphic primitive: a smoothly shaded color three-dimensional triangle with anti-aliased edges. By taking advantage of the SIMD architecture and said parallel algorithm, the very time-consuming pixel-by-pixel computations are broken down for parallel execution. A single coordinate processor computes and transmits an overall triangle record which is essentially the same for all blocks of pixels within a given bounding box, which in turn surrounds each triangle. The individual pixel data is produced by a group of M×N pixel processors and stored in the frame buffer in a series of repetitive steps wherein each step corresponds to the processing of an M×N block of pixels within the bounding box of the triangle. Thus, each pixel processor performs the same operation, modifying its computations in accordance with triangle data received from the coordinate processor and positional data unique to its own sequential connectivity to the frame buffer, thus allowing parallel access to the frame buffer.

Journal ArticleDOI
TL;DR: An off-road vehicle with a suspension system that has eight closed loops is used to illustrate the parallel processor algorithm and to investigate parallel processing speed-up and overhead.
Abstract: A high speed dynamic simulation algorithm that exploits emerging parallel processor computer technology is presented. Medium grain parallelism is defined by the graph structure of a mechanism and the recursive algorithm derived in parts I and II of this paper, for both open and closed loop systems. An off-road vehicle with a suspension system that has eight closed loops is used to illustrate the parallel processor algorithm. A shared memory multiprocessor is used to implement the algorithm and to investigate parallel processing speed-up and overhead. Real-time simulation of a ground vehicle is demonstrated.

Journal ArticleDOI
01 Sep 1988
TL;DR: A number of algorithmic tools that have been found useful in the construction of parallel algorithms are described; among these are prefix computation, ranking, Euler tours, ear decomposition, and matrix calculations.
Abstract: We have described a number of algorithmic tools that have been found useful in the construction of parallel algorithms; among these are prefix computation, ranking, Euler tours, ear decomposition, and matrix calculations. We have also described some of the applications of these tools, and listed many other applications. These algorithms seem likely to be useful not only in their own right, but also as examples of ways to break up other problems into parts suitable for parallel solution.
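
As an example of these tools, the Euler-tour technique links the directed edges of a tree into a single circuit; each successor pointer is computable independently, hence in constant parallel time. A sequential sketch (our names):

```python
def euler_tour(adj, root):
    """Euler tour of a tree from circular adjacency lists: the successor
    of directed edge (u, v) is (v, w), where w follows u in v's list.
    Every successor is independent of the others, so a PRAM computes
    them all in O(1) time."""
    pos = {(v, u): i for v in adj for i, u in enumerate(adj[v])}
    succ = {(u, v): (v, adj[v][(pos[(v, u)] + 1) % len(adj[v])])
            for u in adj for v in adj[u]}
    tour, e = [], (root, adj[root][0])
    for _ in range(sum(len(a) for a in adj.values())):  # 2(n-1) directed edges
        tour.append(e)
        e = succ[e]
    return tour

# star with center 0 (hypothetical tree), adjacency lists in circular order
adj = {0: [1, 2], 1: [0], 2: [0]}
print(euler_tour(adj, 0))   # [(0, 1), (1, 0), (0, 2), (2, 0)]
```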

Proceedings ArticleDOI
06 Jan 1988
TL;DR: This paper presents an algorithm for hidden surface removal for a class of polyhedral surfaces which have the property that they can be ordered relatively quickly, like terrain maps, and also presents a parallel algorithm based on a similar approach.
Abstract: In this paper we present an algorithm for hidden surface removal for a class of polyhedral surfaces which have the property that they can be ordered relatively quickly, like terrain maps. A distinguishing feature of this algorithm is that its running time is sensitive to the actual size of the visible image rather than the total number of intersections in the image plane, which can be much larger than the visible image. The time complexity of this algorithm is O((k + n) log n log log n), where n and k are respectively the input and the output sizes. Thus, in a significant number of situations this will be faster than the worst-case optimal algorithms, which have running time O(n²) irrespective of the output size (whereas the output size k is O(n²) only in the worst case). We also present a parallel algorithm based on a similar approach which runs in time O(log⁴(n + k)) using O((n + k)/log(n + k)) processors in a CREW PRAM model. All our bounds are obtained using amortized analysis.

Journal ArticleDOI
TL;DR: A new parallel algorithm is given to evaluate a straight-line program over a commutative semi-ring R of degree d and size n in O(log n (log nd)) time using M(n) processors.
Abstract: A new parallel algorithm is given to evaluate a straight-line program. The algorithm evaluates a program over a commutative semi-ring R of degree d and size n in time O(log n (log nd)) using M(n) processors, where M(n) is the number of processors required for multiplying n×n matrices over the semi-ring R in O(log n) time.
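
Sequentially, evaluating such a program is a single pass over its instructions; the paper's contribution is doing it in polylog parallel time. A toy evaluator, with our own encoding of instructions:

```python
def evaluate(program, inputs):
    """One sequential pass over a straight-line program.  Each instruction
    is (name, (op, left, right)) with op in {'+', '*'} and operands naming
    inputs or earlier results (our encoding, not the paper's)."""
    vals = dict(inputs)
    for name, (op, l, r) in program:
        vals[name] = vals[l] + vals[r] if op == '+' else vals[l] * vals[r]
    return vals

# (x + y) * x : a size-2, degree-2 program over the usual (+, *) semiring
prog = [("t1", ('+', 'x', 'y')), ("t2", ('*', 't1', 'x'))]
print(evaluate(prog, {'x': 3, 'y': 4})["t2"])   # 21
```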

Journal ArticleDOI
Richard Cole
TL;DR: An optimally efficient parallel algorithm for selection on the EREW PRAM that requires a linear number of operations and O(log n log*n/log log n) time is given.

Book ChapterDOI
01 Jan 1988
TL;DR: A deterministic parallel algorithm for parallel tree contraction that is optimal in the sense that the product P · T is equal to the input size and gives an O(log n) time algorithm when P = n/log n.
Abstract: A deterministic parallel algorithm for parallel tree contraction is presented in this paper. The algorithm takes T = O(n/P) time and uses P (P ≤ n/log n) processors, where n is the number of vertices in the tree, on an Exclusive Read Exclusive Write (EREW) Parallel Random Access Machine (PRAM). This algorithm improves the results of Miller and Reif [MR85,MR87], who use the CRCW randomized PRAM model to get the same complexity and processor count. The algorithm is optimal in the sense that the product P · T is equal to the input size, and it gives an O(log n) time algorithm when P = n/log n. Since the algorithm requires O(n) space, which is the input size, it is optimal in space as well. Techniques for prudent parallel tree contraction are also discussed, as well as implementation techniques for fixed-connection machines.

Journal ArticleDOI
TL;DR: It is concluded that, in the absence of loop-unrolling, LU factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting.
Abstract: In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of LU factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, LU factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.
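
The sequential kernel that all four distributed variants implement is LU factorization with partial pivoting; a row-pivoting baseline (names ours):

```python
def lu_partial_pivoting(A):
    """In-place LU with partial pivoting by rows: at step k, search the
    pivot column, interchange rows, then eliminate the trailing rows."""
    n = len(A)
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # pivot search
        A[k], A[p] = A[p], A[k]                            # row interchange
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A, piv

LU, piv = lu_partial_pivoting([[0.0, 2.0], [1.0, 1.0]])
print(LU, piv)   # [[1.0, 1.0], [0.0, 2.0]] with piv = [1, 0]
```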

Journal ArticleDOI
TL;DR: This work considers solving triangular systems of linear equations on a distributed-memory multiprocessor which allows for a ring embedding and proposes a parallel algorithm, applicable when the triangular matrix is distributed by column in a wrap fashion.
Abstract: We consider solving triangular systems of linear equations on a distributed-memory multiprocessor which allows for a ring embedding. Specifically, we propose a parallel algorithm, applicable when the triangular matrix is distributed by column in a wrap fashion. Numerical experiments indicate that the new algorithm is very efficient in some circumstances (in particular, when the size of the problem is sufficiently large relative to the number of processors).A theoretical analysis confirms that the total running time varies linearly, with respect to the matrix order, up to a threshold value of the matrix order, after which the dependence is quadratic. Moreover, we show that total message traffic is essentially the minimum possible.Finally, we describe an analogous row-oriented algorithm.

Journal ArticleDOI
TL;DR: Experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching are presented, illustrating the power of the SMP cluster programming methodology.

Journal ArticleDOI
TL;DR: A comparison of recently proposed parallel text search methods to alternative available search strategies that use serial processing machines suggests parallel methods do not provide large-scale gains in either retrieval effectiveness or efficiency.
Abstract: A comparison of recently proposed parallel text search methods to alternative available search strategies that use serial processing machines suggests parallel methods do not provide large-scale gains in either retrieval effectiveness or efficiency.

Journal ArticleDOI
TL;DR: In this paper, iterative algorithms based on the conjugate gradient method are developed for hypercubes designed for coarse-grained parallelism, and the communication requirements of different schemes for mapping finite-element meshes onto the processors of a hypercube are analyzed with respect to the effect of communication parameters of the architecture.
Abstract: Finite-element discretization produces linear equations in the form Ax=b, where A is large, sparse, and banded with proper ordering of the variables x. The solution of such equations on distributed-memory message-passing multiprocessors implementing the hypercube topology is addressed. Iterative algorithms based on the conjugate gradient method are developed for hypercubes designed for coarse-grained parallelism. The communication requirements of different schemes for mapping finite-element meshes onto the processors of a hypercube are analyzed with respect to the effect of communication parameters of the architecture. Experimental results for a 16-node Intel 80386-based iPSC/2 hypercube are presented and discussed.
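
For reference, a textbook conjugate gradient iteration; on a hypercube, the matrix-vector product and the two inner products are the steps whose communication cost the paper analyzes (sketch and names ours; float inputs assumed).

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Plain CG for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(len(b)):
        Ap = A @ p                  # distributed mat-vec in the paper's setting
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r              # global reduction across processors
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # ~[0.0909, 0.6364]
```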

Journal ArticleDOI
TL;DR: The proposed parallel algorithm has attractive convergence properties and can be specialized to parallel algorithms for tackling definite quadratic programs, linear programs, systems of linear equations, and systems of generalized nonlinear inequalities.
Abstract: A parallel algorithm is proposed in this paper for solving the problem min{ q(x) : x ∈ C₁ ∩ ⋯ ∩ Cₘ }, where q is a uniformly convex function and the Cᵢ are closed convex sets in Rⁿ. In each iteration of the method, we solve in parallel m independent subproblems, each minimizing a definite quadratic function over an individual set Cᵢ. The method has attractive convergence properties and can be implemented as parallel algorithms for tackling definite quadratic programs, linear programs, systems of linear equations and systems of generalized nonlinear inequalities.
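
A simplified stand-in for the method's structure: when q(x) = ‖x − a‖², each of the m independent subproblems reduces to a Euclidean projection, and one natural way to combine them is averaging. This averaged-projection sketch is not the paper's exact update, and the half-spaces below are hypothetical.

```python
import numpy as np

def parallel_projection_step(x, projections):
    """One iteration: the m projections are independent, so they could run
    on m processors; their results are then combined by averaging."""
    return sum(P(x) for P in projections) / len(projections)

# two half-spaces in R^2 (hypothetical example): x0 >= 1 and x1 >= 2
P1 = lambda x: np.array([max(x[0], 1.0), x[1]])
P2 = lambda x: np.array([x[0], max(x[1], 2.0)])
x = np.zeros(2)
for _ in range(50):
    x = parallel_projection_step(x, [P1, P2])
print(x)   # approaches a point in the intersection, near [1, 2]
```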