
Showing papers on "Parallel algorithm published in 1986"


Journal ArticleDOI
TL;DR: Two basic design strategies are used to develop very simple and fast parallel algorithms for the maximal independent set (MIS) problem.
Abstract: Two basic design strategies are used to develop very simple and fast parallel algorithms for the maximal independent set (MIS) problem. The first strategy consists of assigning identical copies o...

1,117 citations
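A round-based randomized strategy commonly associated with this problem (Luby-style: each surviving vertex draws a random priority, local minima join the set, and winners plus their neighbours drop out) can be sketched as follows. This is a sequential simulation in that spirit, not necessarily the paper's exact first strategy; the function name and graph representation are illustrative:

```python
import random

def randomized_mis(adj):
    """Round-based maximal independent set in the spirit of Luby's
    algorithm.  adj maps each vertex to its set of neighbours.
    Each round: every surviving vertex draws a random priority;
    vertices that beat all surviving neighbours join the MIS, and
    they and their neighbours are removed.  (Sequential simulation
    of the parallel rounds.)"""
    live = set(adj)
    mis = set()
    while live:
        prio = {v: random.random() for v in live}
        # A vertex wins if its priority beats every surviving neighbour.
        winners = {v for v in live
                   if all(prio[v] < prio[u] for u in adj[v] if u in live)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & live
        live -= removed
    return mis
```

Each round removes at least the globally smallest-priority vertex, so the loop always terminates; the point of the analysis in the paper is that, in expectation, a constant fraction of the edges disappears per round.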


Journal ArticleDOI
TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
Abstract: Parallel computers with tens of thousands of processors are typically programmed in a data parallel style, as opposed to the control parallel style used in multiprocessing. The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.

1,000 citations
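A standard example of the data parallel style described here is the log-step inclusive prefix sum, in which every array element is updated by the same whole-array operation at each step. The sketch below follows the scan commonly associated with Hillis and Steele; the exact formulation is illustrative:

```python
def hillis_steele_scan(xs):
    """Inclusive prefix sum in O(log n) data-parallel steps: at step d,
    every element adds the value 2**d positions to its left (if any).
    Each step is one whole-array operation, which is what makes the
    formulation data parallel rather than control parallel."""
    xs = list(xs)
    d = 1
    while d < len(xs):
        xs = [xs[i] + (xs[i - d] if i >= d else 0) for i in range(len(xs))]
        d *= 2
    return xs
```

Prefix sum looks inherently serial when written as a loop carrying a running total, which is exactly the kind of problem the abstract refers to.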


Journal ArticleDOI
Zvi Galil1
TL;DR: The techniques used for designing the most efficient algorithms for finding a maximum cardinality or weighted matching in (general or bipartite) graphs are surveyed.
Abstract: This paper surveys the techniques used for designing the most efficient algorithms for finding a maximum cardinality or weighted matching in (general or bipartite) graphs. It also lists some open problems concerning possible improvements in existing algorithms and the existence of fast parallel algorithms for these problems.

479 citations


Journal ArticleDOI
TL;DR: This paper develops multiresolution iterative algorithms for computing lightness, shape-from-shading, and optical flow, examines their efficiency using synthetic image inputs, and describes a multigrid methodology that is broadly applicable in early vision.
Abstract: Image analysis problems, posed mathematically as variational principles or as partial differential equations, are amenable to numerical solution by relaxation algorithms that are local, iterative, and often parallel. Although they are well suited structurally for implementation on massively parallel, locally interconnected computational architectures, such distributed algorithms are seriously handicapped by an inherent inefficiency at propagating constraints between widely separated processing elements. Hence, they converge extremely slowly when confronted by the large representations of early vision. Application of multigrid methods can overcome this drawback, as we showed in previous work on 3-D surface reconstruction. In this paper, we develop multiresolution iterative algorithms for computing lightness, shape-from-shading, and optical flow, and we examine the efficiency of these algorithms using synthetic image inputs. The multigrid methodology that we describe is broadly applicable in early vision. Notably, it is an appealing strategy to use in conjunction with regularization analysis for the efficient solution of a wide range of ill-posed image analysis problems.

424 citations
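To illustrate why a multigrid cycle propagates constraints faster than pure local relaxation, here is a minimal sketch of a 1-D Poisson V-cycle with weighted-Jacobi smoothing, full-weighting restriction, and linear interpolation. The model problem and all parameters are illustrative, not the vision problems treated in the paper:

```python
def residual(u, f, h):
    """r = f - A u for the 1-D Poisson operator
    (A u)_i = (2 u_i - u_{i-1} - u_{i+1}) / h^2, zero Dirichlet ends."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2 * u[i] - left - right) / h ** 2)
    return r

def jacobi(u, f, h, sweeps, w=2.0 / 3.0):
    """Weighted-Jacobi smoothing: damps high-frequency error quickly,
    but moves information only one grid point per sweep."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + w * h ** 2 / 2 * r[i] for i in range(len(u))]
    return u

def v_cycle(u, f, h):
    """One V-cycle: smooth, restrict the residual to a grid with half
    the points, recursively solve there for a correction, interpolate
    it back, and smooth again.  Grid sizes must be 2**k - 1."""
    if len(u) == 1:
        return [f[0] * h ** 2 / 2]          # exact solve on one point
    u = jacobi(u, f, h, 2)
    r = residual(u, f, h)
    nc = (len(u) - 1) // 2
    # Full-weighting restriction: coarse point i sits at fine index 2i+1.
    rc = [(r[2 * i] + 2 * r[2 * i + 1] + r[2 * i + 2]) / 4 for i in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2 * h)
    e = [0.0] * len(u)
    for i in range(nc):                      # linear interpolation back
        e[2 * i + 1] += ec[i]
        e[2 * i] += ec[i] / 2
        e[2 * i + 2] += ec[i] / 2
    u = [ui + ei for ui, ei in zip(u, e)]
    return jacobi(u, f, h, 2)
```

The coarse-grid correction is what carries long-range constraint information in one step, which plain relaxation can only do one grid point at a time.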


Journal ArticleDOI
TL;DR: The problem of constructing a perfect matching in a graph is in the complexity class Random NC; i.e., the problem is solvable in polylog time by a randomized parallel algorithm using a polynomial-bounded number of processors.
Abstract: We show that the problem of constructing a perfect matching in a graph is in the complexity class Random NC; i.e., the problem is solvable in polylog time by a randomized parallel algorithm using a polynomial-bounded number of processors. We also show that several related problems lie in Random NC. These include:

287 citations
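The randomized machinery behind such results rests on Lovász's lemma: substitute random values for the indeterminates of the Tutte matrix, and with high probability the determinant is nonzero exactly when a perfect matching exists. Below is a sequential sketch of that decision test (the paper goes further and actually constructs the matching); the prime, trial count, and function names are illustrative:

```python
import random

def det_mod(M, p):
    """Determinant of a square matrix over GF(p) by Gaussian
    elimination (p must be prime so pivots are invertible)."""
    n = len(M)
    M = [row[:] for row in M]
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det                        # row swap flips the sign
        det = det * M[c][c] % p
        inv = pow(M[c][c], p - 2, p)          # Fermat inverse mod p
        for r in range(c + 1, n):
            f = M[r][c] * inv % p
            for k in range(c, n):
                M[r][k] = (M[r][k] - f * M[c][k]) % p
    return det % p

def has_perfect_matching(n, edges, p=1_000_003, trials=3):
    """Lovasz's randomized test: the Tutte matrix with random entries
    mod p has nonzero determinant iff the graph has a perfect
    matching, with error probability at most n/p per trial.  The
    error is one-sided: a nonzero determinant is conclusive."""
    for _ in range(trials):
        T = [[0] * n for _ in range(n)]
        for u, v in edges:
            x = random.randrange(1, p)
            T[u][v], T[v][u] = x, (-x) % p    # skew-symmetric entries
        if det_mod(T, p):
            return True
    return False
```

The determinant is what makes the problem parallelizable: it can be evaluated in polylog time with polynomially many processors, whereas no fast deterministic parallel matching algorithm was known.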


Journal ArticleDOI
TL;DR: A parallel randomized algorithm to find a maximal matching is presented that improves the best known deterministic algorithm by a factor of log^2 |E|.

248 citations


Proceedings ArticleDOI
27 Oct 1986
TL;DR: A novel scheduling problem is defined and solved by repeated, rapid, approximate reschedulings, which yields the first optimal PRAM algorithm for list ranking, running in logarithmic time.
Abstract: We study two parallel scheduling problems and their use in designing parallel algorithms. First, we define a novel scheduling problem; it is solved by repeated, rapid, approximate reschedulings. This leads to the first optimal PRAM algorithm for list ranking, which runs in logarithmic time. Our second scheduling result is for computing prefix sums of log n-bit numbers. We give an optimal parallel algorithm for the problem which runs in sublogarithmic time. These two scheduling results together lead to logarithmic-time PRAM algorithms for the connectivity, biconnectivity and minimum spanning tree problems. The connectivity and biconnectivity algorithms are optimal unless m = o(n log* n), in graphs of n vertices and m edges.

196 citations
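For contrast with the optimal algorithm above, the classic non-optimal approach to list ranking is pointer jumping, which the paper improves upon. A sequential simulation of its O(log n) parallel rounds, with illustrative names:

```python
def list_rank(succ):
    """Pointer-jumping list ranking.  succ[i] is the successor of
    node i, with succ[i] == i marking the tail.  Each of the
    O(log n) rounds updates every node at once (simulated here by
    building new arrays), doubling the distance each pointer spans.
    Returns dist[i] = number of links from node i to the tail.
    Work is O(n log n); the paper's contribution is getting this
    down to optimal O(n) work via scheduling."""
    n = len(succ)
    dist = [0 if succ[i] == i else 1 for i in range(n)]
    succ = list(succ)
    for _ in range(n.bit_length()):
        dist = [dist[i] + dist[succ[i]] for i in range(n)]
        succ = [succ[succ[i]] for i in range(n)]
    return dist
```

Once a node's pointer reaches the tail its distance stops growing, so the extra rounds from the crude n.bit_length() bound are harmless.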


Journal ArticleDOI
TL;DR: The improved algorithm overcomes some of the disadvantages found in [5] by preserving necessary and essential structures for certain patterns which should not be deleted, and it maintains very fast speed, running about 1.5 to 2.3 times faster than the four-step and two-step methods described in [3].
Abstract: A fast parallel thinning algorithm for digital patterns is presented. This algorithm is an improved version of the algorithms introduced by Zhang and Suen [5] and Stefanelli and Rosenfeld [3]. An experiment using an Apple II and an Epson printer was conducted. The results show that the improved algorithm overcomes some of the disadvantages found in [5] by preserving necessary and essential structures for certain patterns which should not be deleted, and it maintains very fast speed, running about 1.5 to 2.3 times faster than the four-step and two-step methods described in [3], although the resulting skeletons look basically the same.

146 citations


Journal ArticleDOI
TL;DR: With the correct choice of ordering, the algorithm can be implemented using systolic array processors (Gentleman, personal communication), and it can also be used to compute any CS decomposition of a unitary matrix.
Abstract: An algorithm is described for computing the generalized singular value decomposition of A (m × n) and B (p × n). Unitary matrices U, V and Q are developed so that U^H A Q and V^H B Q have as many nonzero parallel rows as possible, and these correspond to the common row space of the two matrices. The algorithm consists of an iterative sequence of cycles where each cycle is made up of the serial application of 2 × 2 generalized singular value decompositions. Convergence appears to be at least quadratic. With the correct choice of ordering, the algorithm can be implemented using systolic array processors (Gentleman, personal communication). The algorithm can also be used to compute any CS decomposition of a unitary matrix.

137 citations


Journal ArticleDOI
TL;DR: A general method for efficiently searching undirected graphs in parallel, called ear-decomposition search (EDS), based on depth-first search (DFS), is presented.

134 citations


Book ChapterDOI
01 Jun 1986

Journal ArticleDOI
01 May 1986
TL;DR: It is observed that to obtain this limited factor of 10-fold speed-up, it is necessary to exploit parallelism at a very fine granularity, and it is proposed that a suitable architecture to exploit such fine-grain parallelism is a bus-based shared-memory multiprocessor with 32-64 processors.
Abstract: Rule-based systems, on the surface, appear to be capable of exploiting large amounts of parallelism—it is possible to match each rule to the data memory in parallel. In practice, however, we show that the speed-up from parallelism is quite limited, less than 10-fold. The reasons for the small speed-up are: (1) the small number of rules relevant to each change to data memory; (2) the large variation in the processing required by the relevant rules; and (3) the small number of changes made to data memory between synchronization steps. Furthermore, we observe that to obtain this limited factor of 10-fold speed-up, it is necessary to exploit parallelism at a very fine granularity. We propose that a suitable architecture to exploit such fine-grain parallelism is a bus-based shared-memory multiprocessor with 32-64 processors. Using such a multiprocessor (with individual processors working at 2 MIPS), it is possible to obtain execution speeds of about 3800 rule-firings/sec. This speed is significantly higher than that obtained by other proposed parallel implementations of rule-based systems.

Journal ArticleDOI
TL;DR: Two parallel formulations of the statistical cooling algorithm are proposed, i.e. a systolic algorithm and a clustered algorithm, based on the requirement that quasi-equilibrium is preserved throughout the optimization process.

Journal ArticleDOI
TL;DR: A parallel O(log^3 |E|) algorithm for finding a maximal matching in a graph G(V, E) is presented; the model of computation is the CRCW-PRAM, and |V| + |E| processors are used.

Journal ArticleDOI
01 Jul 1986
TL;DR: It is shown that the time lower bound for computing the inverse dynamics of an n-link robot manipulator in parallel using p processors is O(k1⌈n/p⌉ + k2⌈log2 p⌉), where k1 and k2 are constants.
Abstract: It is shown that the time lower bound for computing the inverse dynamics of an n-link robot manipulator in parallel using p processors is O(k1⌈n/p⌉ + k2⌈log2 p⌉), where k1 and k2 are constants. A novel parallel algorithm for computing the inverse dynamics using the Newton-Euler equations of motion was developed for a single-instruction-stream multiple-data-stream computer with p processors to achieve this time lower bound. When p = n, the proposed parallel algorithm achieves Minsky's time lower bound O(⌈log2 n⌉), which is the conjectured bound for parallel evaluation. The proposed p-fold parallel algorithm can best be described as consisting of p parallel blocks with pipelined elements within each block. The results from the computations in the p blocks form a new homogeneous linear recurrence of size p, which can be computed using the recursive doubling algorithm. A modified inverse perfect shuffle interconnection scheme was suggested to interconnect the p processors. Furthermore, the proposed parallel algorithm lends itself to a systolic pipelined architecture, requiring three floating-point operations per complete set of joint torques.
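The recursive doubling step mentioned above can be sketched for a first-order linear recurrence: each term x[i] = a[i]·x[i-1] + b[i] is an affine map, and map composition is associative, so all prefixes can be combined in O(log p) parallel rounds. The formulation below is a simplified scalar illustration, not the paper's dynamics recurrence:

```python
def recurrence_by_doubling(a, b, x0):
    """Solve x[i] = a[i]*x[i-1] + b[i] by recursive doubling.
    Each step is the affine map x -> a*x + b, represented as the
    pair (a, b); composing a later map (a2, b2) after an earlier
    one (a1, b1) gives (a2*a1, a2*b1 + b2).  A Hillis-Steele style
    scan over the maps combines all prefixes in O(log n) rounds."""
    n = len(a)
    maps = list(zip(a, b))                   # (a, b) represents x -> a*x + b
    d = 1
    while d < n:
        maps = [(maps[i][0] * maps[i - d][0],
                 maps[i][0] * maps[i - d][1] + maps[i][1])
                if i >= d else maps[i]
                for i in range(n)]
        d *= 2
    return [ai * x0 + bi for ai, bi in maps]
```

After the scan, maps[i] is the composition of the first i+1 steps, so applying it to x0 yields x[i+1] directly; this is how an apparently serial recurrence parallelizes.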

Proceedings ArticleDOI
01 Nov 1986
TL;DR: It is shown that the rank of a matrix over an arbitrary field can be computed in O(log^2 n) time using a polynomial number of processors.
Abstract: It is shown that the rank of a matrix over an arbitrary field can be computed in O(log^2 n) time using a polynomial number of processors.

Journal ArticleDOI
Marina C. Chen
TL;DR: The fact that Crystal is a general purpose language for parallel programming allows new design methods and synthesis techniques, properties and theorems about problems in specific application domains, and new insights into any given problem to be integrated readily within the existing design framework.

Journal ArticleDOI
TL;DR: It is shown that four complete problems for P (nonsparse versions of unification, path system accessibility, monotone circuit value, and ordered depth-first search) are parallelizable.
Abstract: Previous theoretical work in computational complexity has suggested that any problem which is log-space complete for P is not likely in NC, and thus not parallelizable. In practice, this is not the case. To resolve this paradox, we introduce new complexity classes PC and PC* that capture the practical notion of parallelizability we discuss in this paper. We show that four complete problems for P (nonsparse versions of unification, path system accessibility, monotone circuit value, and ordered depth-first search) are parallelizable. That is, their running times are O(E + V) on a sequential RAM and O(E/P + V log P) on an EXCLUSIVE-READ EXCLUSIVE-WRITE Parallel RAM with P processors, where V and E are the numbers of vertices and edges in the input instance of the problem. These problems are in PC and PC*, since an appropriate choice of P can speed up their sequential running times by a factor of μ(P). Several interesting open questions are raised regarding these new parallel complexity classes PC and PC*. Unification is particularly important because it is a basic operation in theorem proving, in type inference algorithms, and in logic programming languages such as Prolog. A fast parallel implementation of Prolog is needed for software development in the Fifth Generation project.

Journal ArticleDOI
TL;DR: A parallel algorithm is developed for Cholesky factorization on a shared-memory multiprocessor, based on self-scheduling of a pool of tasks; the most promising variant, which the authors call column-Cholesky, is identified and implemented for the Denelcor HEP multiprocessor.

01 Nov 1986
TL;DR: The main tool in this environment is a package called SCHEDULE which has been designed to aid a programmer familiar with a Fortran programming environment to implement a parallel algorithm in a manner that will lend itself to transporting the resulting program across a wide variety of parallel machines.
Abstract: This paper describes an environment for the transportable implementation of parallel algorithms in a Fortran setting. By this we mean that a user's code is virtually identical for each machine. The main tool in this environment is a package called SCHEDULE which has been designed to aid a programmer familiar with a Fortran programming environment to implement a parallel algorithm in a manner that will lend itself to transporting the resulting program across a wide variety of parallel machines. The package is designed to allow existing Fortran subroutines to be called through SCHEDULE, without modification, thereby permitting users access to a wide body of existing library software in a parallel setting. Machine intrinsics are invoked within the SCHEDULE package, and considerable effort may be required on our part to move SCHEDULE from one machine to another. On the other hand, the user of SCHEDULE is relieved of the burden of modifying each code he desires to transport from one machine to another. 17 refs., 11 figs., 1 tab.

Journal ArticleDOI
Joseph W. H. Liu
01 Oct 1986
TL;DR: A new medium-grained model based on column-oriented tasks is introduced, and it is shown to correspond structurally to the filled graph of the given sparse matrix and give an overall scheme for parallel sparse Cholesky factorization, appropriate for parallel machines with shared-memory architecture like the Denelcor HEP.
Abstract: In this paper, a systematic and unified treatment of computational task models for parallel sparse Cholesky factorization is presented. They are classified as fine-, medium-, and large-grained graph models. In particular, a new medium-grained model based on column-oriented tasks is introduced, and it is shown to correspond structurally to the filled graph of the given sparse matrix. The task scheduling problem for the various task graphs is also discussed. A practical algorithm to schedule the column tasks of the medium-grained model for multiple processors is described. It is based on a heuristic critical path scheduling method. This will give an overall scheme for parallel sparse Cholesky factorization, appropriate for parallel machines with shared-memory architecture like the Denelcor HEP.

01 Jan 1986
TL;DR: In this article, a parallel algorithm for Cholesky factorization on a shared-memory multiprocessor is presented. The algorithm is based on self-scheduling of a pool of tasks.
Abstract: A parallel algorithm is developed for Cholesky factorization on a shared-memory multiprocessor. The algorithm is based on self-scheduling of a pool of tasks. The subtasks in several variants of the basic elimination algorithm are analyzed for potential concurrency in terms of precedence relations, work profiles, and processor utilization. This analysis is supported by simulation results. The most promising variant, which the authors call column-Cholesky, is identified and implemented for the Denelcor HEP multiprocessor. Experimental results are given for this machine.
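A sequential sketch of the column-oriented (left-looking) Cholesky variant that these papers parallelize: the task for column j first applies the updates of all earlier columns (commonly called cmod) and then scales by the pivot (cdiv). In the parallel versions discussed above, these column tasks form the pool that processors self-schedule; the scheduling itself is omitted here:

```python
import math

def column_cholesky(A):
    """Left-looking column-Cholesky factorization A = L L^T for a
    symmetric positive definite matrix A (list of lists).  Column j
    is one task: cmod(j, k) subtracts the contribution of each
    earlier column k, then cdiv(j) scales by the pivot.  The
    precedence constraint is that cmod(j, k) needs column k
    finished, which is what the parallel task graphs capture."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col = [A[i][j] for i in range(n)]
        for k in range(j):                   # cmod(j, k) for all k < j
            for i in range(j, n):
                col[i] -= L[i][k] * L[j][k]
        piv = math.sqrt(col[j])              # cdiv(j): scale by the pivot
        for i in range(j, n):
            L[i][j] = col[i] / piv
    return L
```

The sparse setting adds structure on top of this: cmod(j, k) is needed only when L[j][k] is nonzero, which is why the medium-grained task graph mirrors the filled graph of the matrix.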

Journal ArticleDOI
27 Oct 1986
TL;DR: A parallel algorithm is presented for testing a graph for planarity and for finding an embedding of a planar graph; it uses a sophisticated data structure for representing sets of embeddings, the PQ-tree of [Booth and Lueker, 76].
Abstract: We describe a parallel algorithm for testing a graph for planarity, and for finding an embedding of a planar graph. For a graph on n vertices, the algorithm runs in O(log^2 n) steps on n processors of a parallel RAM. The previous best algorithm for planarity testing in parallel polylog time ([Ja'Ja' and Simon, 82]) used a reduction to solving linear systems, and hence required Ω(n^2.49) processors by known methods, whereas our processor bounds are within a polylog factor of optimal. The most significant aspect of our parallel algorithms is the use of a sophisticated data structure for representing sets of embeddings, the PQ-tree of [Booth and Lueker, 76]. Previously no parallel algorithms for PQ-trees were known. We have efficient parallel algorithms for manipulating PQ-trees, which we use in our planarity algorithm.

Journal ArticleDOI
Aggarwal
TL;DR: The problem of finding the maximum of a set of values stored one per processor on a two-dimensional array of processors with a time-shared global bus is considered, and the algorithm given by Bokhari is shown to be optimal, within a multiplicative constant, for this network and for other d-dimensional arrays.
Abstract: The problem of finding the maximum of a set of values stored one per processor on a two-dimensional array of processors with a time-shared global bus is considered. The algorithm given by Bokhari is shown to be optimal, within a multiplicative constant, for this network and for other d-dimensional arrays. We generalize this model and demonstrate optimal bounds for finding the maximum of a set of values stored in a d-dimensional array with k time-shared global buses.

Journal ArticleDOI
TL;DR: An algorithm for merging k sorted lists of n/k elements using k processors is presented, and its worst-case complexity is proved to be 2n, regardless of the number of processors, neglecting the cost arising from possible conflicts on the broadcast channel.
Abstract: The paper addresses ways in which one can use "broadcast communication" in distributed algorithms and the relevant issues of design and complexity. We present an algorithm for merging k sorted lists of n/k elements using k processors and prove its worst-case complexity to be 2n, regardless of the number of processors, neglecting the cost arising from possible conflicts on the broadcast channel. We also show that this algorithm is optimal under single-channel broadcast communication. In a variation of the algorithm, we show that by using an extra local memory of O(k) the number of broadcasts is reduced to n. When the algorithm is used for sorting n elements with k processors, where each processor first sorts its own list and the lists are then merged, it has a complexity of O(n/k log(n/k) + n), and is thus asymptotically optimal for large n. We also discuss the cost incurred by the channel access scheme and prove that resolving conflicts whenever k processors are involved introduces a cost factor of at least log k.
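The merging scheme can be simulated sequentially: in each round the processor holding the smallest current head wins the channel and broadcasts its element, so there is one broadcast per output element. A sketch using a heap to pick the winner (the heap is only a simulation device, not part of the broadcast model, and the function name is illustrative):

```python
import heapq

def broadcast_merge(lists):
    """Merge k sorted lists as in the broadcast model: each round,
    the processor with the smallest current head 'broadcasts' it,
    every processor appends it to the shared output, and the winner
    advances its local pointer.  The heap over the k heads stands in
    for the channel-arbitration step that picks the winner."""
    heads = [(lst[0], j, 0) for j, lst in enumerate(lists) if lst]
    heapq.heapify(heads)
    out = []
    while heads:
        val, j, i = heapq.heappop(heads)
        out.append(val)                      # the broadcast element
        if i + 1 < len(lists[j]):
            heapq.heappush(heads, (lists[j][i + 1], j, i + 1))
    return out
```

One broadcast per output element matches the flavor of the paper's bound, which counts channel usage rather than comparisons.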

Journal ArticleDOI
TL;DR: A new algorithm, which is a variant of the sign algorithm, is proposed for the adaptive adjustment of an FIR digital filter with the aim of improving the original convergence characteristics, yet retaining the advantage of hardware simplicity.
Abstract: A new algorithm, which is a variant of the sign algorithm, is proposed for the adaptive adjustment of an FIR digital filter with the aim of improving the original convergence characteristics, yet retaining the advantage of hardware simplicity. Based on a recently proposed theory for the sign algorithm, a practical design method is derived for the new algorithm, and it is shown by computer simulation that the new algorithm in fact performs significantly better than the original algorithm.
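For reference, the baseline sign algorithm that the paper's variant improves on replaces the LMS error term by its sign, so the weight update needs no multiplication by the error. A sketch with illustrative parameter names (this is the original rule, not the paper's new variant):

```python
def sign_lms(x, d, taps, mu):
    """Baseline sign algorithm for an adaptive FIR filter: like LMS,
    but the update uses only the sign of the error, which is the
    source of its hardware simplicity.
    x: input samples, d: desired samples, taps: filter length,
    mu: step size.  Returns final weights and per-sample errors."""
    w = [0.0] * taps
    errs = []
    for n in range(taps - 1, len(x)):
        frame = x[n - taps + 1:n + 1][::-1]      # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, frame))
        e = d[n] - y
        s = (e > 0) - (e < 0)                    # sign of the error
        w = [wi + mu * s * xi for wi, xi in zip(w, frame)]
        errs.append(e)
    return w, errs
```

The trade-off the paper addresses is visible here: quantizing the error to its sign slows convergence relative to LMS, which is what the proposed variant is designed to recover.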

Journal ArticleDOI
TL;DR: A parallel nonlinear Gauss–Seidel algorithm for approximating the solution of Au + φ(u) = f, where A is an M-matrix, is introduced and studied, and the speed-up on the Denelcor HEP parallel processing computer is recorded.
Abstract: Multi-splittings of a matrix are used to generate parallel algorithms to approximate the solutions of nonlinear algebraic systems. A parallel nonlinear Gauss–Seidel algorithm for approximating the solution of Au + φ(u) = f, where A is an M-matrix, is introduced and studied. Also, a parallel Newton–SOR method is defined for the problem F(u) = 0, where the Jacobian F′(u) is an M-matrix. An illustration and comparison of these methods with their serial versions is given. The speed-up on the Denelcor HEP parallel processing computer is also recorded.

Proceedings ArticleDOI
01 Aug 1986
TL;DR: This work presents techniques which result in improved parallel algorithms for a number of problems whose efficient sequential algorithms use the plane-sweeping paradigm, and never uses the AKS sorting network in any of them.
Abstract: We present techniques which result in improved parallel algorithms for a number of problems whose efficient sequential algorithms use the plane-sweeping paradigm. The problems for which we give improved algorithms include intersection detection, trapezoidal decomposition, triangulation, and planar point location. Our technique can be used to improve on the previous time bound while keeping the space and processor bounds the same, or improve on the previous space bound while keeping the time and processor bounds the same. We also give efficient parallel algorithms for visibility from a point, 3-dimensional maxima, multiple range-counting, and rectilinear segment intersection counting. We never use the AKS sorting network in any of our algorithms.

Journal ArticleDOI
Baru, Su
TL;DR: The architecture of this system is compared to those of conventional local area networks and shared-memory systems in order to establish the distinct nature and characteristics of a multicomputer system based on the SM3 concept.
Abstract: The architecture of a multicomputer system with switchable main memory modules (SM3) is presented. This architecture supports the efficient execution of parallel algorithms for nonnumeric processing by 1) allowing the sharing of switchable main memory modules between computers, 2) supporting dynamic partitioning of the system, and 3) employing global control lines to efficiently support interprocessor communication. Data transfer time is reduced to memory switching time by allowing some main memory modules to be switched between processors. Dynamic partitioning gives a common bus system the capability of an MIMD machine while performing global operations. The global control lines establish a quick and efficient high-level protocol in the system. The network is supervised by a control computer which oversees network partitioning and other global functions. The hardware involved is quite simple and the network is easily extensible. A simulation study using discrete event simulation techniques has been carried out and the results of the study are presented. The architecture of this system is compared to those of conventional local area networks and shared-memory systems in order to establish the distinct nature and characteristics of a multicomputer system based on the SM3 concept.

Journal ArticleDOI
TL;DR: In this article, the authors show that the greedy algorithm introduced in [1] and [5] to perform the parallel QR decomposition of a dense rectangular matrix of size m × n is optimal.
Abstract: We show that the greedy algorithm introduced in [1] and [5] to perform the parallel QR decomposition of a dense rectangular matrix of size m × n is optimal. Then we assume that m/n^2 tends to zero as m and n go to infinity, and prove that the complexity of such a decomposition is asymptotically 2n, when an unlimited number of processors is available.