
Showing papers on "Parallel algorithm published in 1994"


Journal ArticleDOI
TL;DR: A low-complexity heuristic for scheduling parallel tasks on an unbounded number of completely connected processors, named the dominant sequence clustering algorithm (DSC), which guarantees a performance within a factor of 2 of the optimum for general coarse-grain DAG's.
Abstract: We present a low-complexity heuristic, named the dominant sequence clustering algorithm (DSC), for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is, on average, comparable to or even better than that of other higher-complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAG's) is NP-complete. DSC finds optimal schedules for special classes of DAG's, such as fork, join, coarse-grain trees, and some fine-grain trees. It guarantees a performance within a factor of 2 of the optimum for general coarse-grain DAG's. We compare DSC with three higher-complexity general scheduling algorithms: the ETF by J.J. Hwang, Y.C. Chow, F.D. Anger, and C.Y. Lee (1989); V. Sarkar's (1989) clustering algorithm; and the MD by M.Y. Wu and D. Gajski (1990). We also give a sample of important practical applications where DSC has been found useful.
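
To make the idea of a dominant sequence concrete, here is a small Python sketch (not the DSC heuristic itself) that computes the length of the longest path through a clustered task DAG when intra-cluster communication is zeroed out. It ignores serialization of tasks placed in the same cluster, and the task graph, costs, and clusterings are made-up examples.

    # Sketch: length of the "dominant sequence" (critical path including
    # communication delays) of a clustered task DAG -- the quantity DSC
    # repeatedly tries to shorten by merging tasks into clusters.
    from collections import defaultdict

    def dominant_sequence_length(tasks, edges, cluster):
        """tasks: {task: compute cost}; edges: {(u, v): communication cost};
        cluster: {task: cluster id}.  Edges inside a cluster cost nothing."""
        preds = defaultdict(list)
        for (u, v) in edges:
            preds[v].append(u)
        memo = {}

        def finish_time(v):                      # longest path ending at v
            if v not in memo:
                start = 0
                for u in preds[v]:
                    comm = 0 if cluster[u] == cluster[v] else edges[(u, v)]
                    start = max(start, finish_time(u) + comm)
                memo[v] = start + tasks[v]
            return memo[v]

        return max(finish_time(v) for v in tasks)

    # Fork-join example: a -> {b, c} -> d.  Putting a, c and d in one cluster
    # zeroes the heavy a->c and c->d messages and shortens the dominant sequence.
    tasks = {"a": 2, "b": 3, "c": 4, "d": 1}
    edges = {("a", "b"): 5, ("a", "c"): 5, ("b", "d"): 2, ("c", "d"): 2}
    print(dominant_sequence_length(tasks, edges, {"a": 0, "b": 1, "c": 2, "d": 3}))  # 14
    print(dominant_sequence_length(tasks, edges, {"a": 0, "b": 1, "c": 0, "d": 0}))  # 13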

694 citations


Journal ArticleDOI
TL;DR: A classification scheme is described that is based on where the sort from object coordinates to screen coordinates occurs, which it is believed is fundamental whenever both geometry processing and rasterization are performed in parallel.
Abstract: We describe a classification scheme that we believe provides a more structured framework for reasoning about parallel rendering. The scheme is based on where the sort from object coordinates to screen coordinates occurs, which we believe is fundamental whenever both geometry processing and rasterization are performed in parallel. This classification scheme supports the analysis of computational and communication costs, and encompasses the bulk of current and proposed highly parallel renderers - both hardware and software. We begin by reviewing the standard feed-forward rendering pipeline, showing how different ways of parallelizing it lead to three classes of rendering algorithms. Next, we consider each of these classes in detail, analyzing their aggregate processing and communication costs, possible variations, and constraints they may impose on rendering applications. Finally, we use these analyses to compare the classes and identify when each is likely to be preferable.

612 citations


Proceedings ArticleDOI
14 Dec 1994
TL;DR: Serial and parallel one-pass algorithms are presented for solving the system of equations that arises from discretizing the Hamilton-Jacobi equation associated with a trajectory optimization problem with positive running cost.
Abstract: Presents serial and parallel algorithms for solving a system of equations that arises from the discretization of the Hamilton-Jacobi equation associated with a trajectory optimization problem of the following type. A vehicle starts at a prespecified point x_0 and follows a unit speed trajectory x(t) inside a region in R^m, until an unspecified time T at which the region is exited. A trajectory minimising a cost function of the form ∫_0^T r(x(t)) dt + q(x(T)) is sought. The discretized Hamilton-Jacobi equation corresponding to this problem is usually solved using iterative methods. Nevertheless, assuming that the function r is positive, one is able to exploit the problem structure and develop one-pass algorithms for the discretized problem. The first algorithm resembles Dijkstra's shortest path algorithm and runs in time O(n log n), where n is the number of grid points. The second algorithm uses a somewhat different discretization and borrows some ideas from Dial's shortest path algorithm; it runs in time O(n), which is the best possible, under some fairly mild assumptions. Finally, the author shows that the latter algorithm can be efficiently parallelized: for two-dimensional problems and with p processors, its running time becomes O(n/p), provided that p = O(√n / log n).
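
As an illustration of the Dijkstra-like idea, the following Python sketch runs label-setting over a uniform grid with a positive running cost r; the 4-neighbour edge cost h*(r(u)+r(v))/2 is an assumed discretization chosen for the example, not the one used in the paper, and the binary heap gives the O(n log n) behaviour mentioned above.

    # Sketch: Dijkstra-style label-setting on a uniform grid with positive
    # running cost r.  The 4-neighbour edge cost h * (r(u) + r(v)) / 2 is an
    # assumed discretization for illustration, not the one used in the paper.
    import heapq

    def grid_value_function(r, h, goals):
        """r: 2D list of positive running costs; h: grid spacing;
        goals: cells (i, j) where the terminal cost is taken to be zero."""
        n, m = len(r), len(r[0])
        V = [[float("inf")] * m for _ in range(n)]
        heap = []
        for (i, j) in goals:
            V[i][j] = 0.0
            heapq.heappush(heap, (0.0, i, j))
        while heap:                                   # each cell settled once
            v, i, j = heapq.heappop(heap)
            if v > V[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < m:
                    cand = v + h * (r[i][j] + r[a][b]) / 2.0
                    if cand < V[a][b]:
                        V[a][b] = cand
                        heapq.heappush(heap, (cand, a, b))
        return V

    # Uniform cost field: the value function is h times the Manhattan distance.
    r = [[1.0] * 5 for _ in range(5)]
    print(grid_value_function(r, h=0.1, goals=[(0, 0)])[4][4])  # ~0.8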

589 citations


Book
01 Mar 1994
TL;DR: A textbook covering PRAM algorithms, parallel architectures, parallel programming languages, mapping and scheduling, and parallel algorithms for matrix multiplication, the fast Fourier transform, linear systems, sorting, dictionary operations, graph problems, and combinatorial search.
Abstract: Contents: PRAM algorithms; processor arrays, multiprocessors and multicomputers; parallel programming languages; mapping and scheduling; elementary parallel algorithms; matrix multiplication; the fast Fourier transform; solving linear systems; sorting; dictionary operations; graph algorithms; combinatorial search. Appendices: graph theoretic terminology; review of complex numbers; parallel algorithm design strategies.

472 citations


Journal ArticleDOI
TL;DR: A fast parallel algorithm is given that provides good solutions to very large problems in a very short computation time and identifies a type of problem for which taboo search provides an optimal solution in a polynomial mean time in practice.
Abstract: We apply the global optimization technique called taboo search to the job shop scheduling problem and show that our method is typically more efficient than the shifting bottleneck procedure, and also more efficient than a recently proposed simulated annealing implementation. We also identify a type of problem for which taboo search provides an optimal solution in a polynomial mean time in practice, while an implementation of the shifting bottleneck procedure seems to take an exponential amount of computation time. Included are computational results that establish new best solutions for a number of benchmark problems from the literature. Finally, we give a fast parallel algorithm that provides good solutions to very large problems in a very short computation time.
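
The sketch below shows a generic taboo-search loop (swap neighbourhood, fixed-length taboo list, aspiration by best-so-far) on a toy permutation objective; the paper's job-shop moves, neighbourhood, and parallelization are not reproduced here, and all parameter values are illustrative.

    # Sketch: a generic taboo-search loop with a swap neighbourhood, a
    # fixed-length taboo list and a best-so-far aspiration rule.  The
    # objective is a toy stand-in for the job-shop makespan.
    import random

    def taboo_search(cost, n, iterations=200, tenure=7, samples=30, seed=0):
        rng = random.Random(seed)
        current = list(range(n))
        rng.shuffle(current)
        best, best_cost = current[:], cost(current)
        taboo = []                                    # recently applied swaps
        for _ in range(iterations):
            candidates = []
            for _ in range(samples):                  # sample the neighbourhood
                i, j = rng.sample(range(n), 2)
                move = (min(i, j), max(i, j))
                neigh = current[:]
                neigh[i], neigh[j] = neigh[j], neigh[i]
                c = cost(neigh)
                # aspiration: a taboo move is allowed if it beats the best so far
                if move not in taboo or c < best_cost:
                    candidates.append((c, move, neigh))
            if not candidates:
                continue
            c, move, current = min(candidates, key=lambda t: t[0])
            taboo.append(move)
            if len(taboo) > tenure:
                taboo.pop(0)
            if c < best_cost:
                best, best_cost = current[:], c
        return best, best_cost

    # Toy objective: distance of the permutation from sorted order.
    print(taboo_search(lambda p: sum(abs(v - i) for i, v in enumerate(p)), n=12))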

347 citations


Journal ArticleDOI
TL;DR: Initial benchmark results of NESL show that NESL's performance is competitive with that of machine-specific codes for regular dense data, and is often superior for irregular data.

329 citations


Journal ArticleDOI
TL;DR: A new characterization of branch-and-bound algorithms is given, which consists of isolating the performed operations without specifying any particular order for their execution.
Abstract: We present a detailed and up-to-date survey of the literature on parallel branch-and-bound algorithms. We synthesize previous work in this area and propose a new classification of parallel branch-and-bound algorithms. This classification is used to analyze the methods proposed in the literature. To facilitate our analysis, we give a new characterization of branch-and-bound algorithms, which consists of isolating the performed operations without specifying any particular order for their execution.
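
For readers unfamiliar with the operations this characterization isolates, here is a minimal serial branch-and-bound sketch on a 0/1 knapsack instance (selection from a pool of open subproblems, branching, bounding by the fractional relaxation, pruning against the incumbent); the parallel strategies surveyed differ mainly in how the pool and the incumbent are shared, which is not shown.

    # Sketch: the basic branch-and-bound operations -- select a subproblem,
    # branch, bound with the fractional relaxation, prune against the
    # incumbent -- on a 0/1 knapsack instance.
    import heapq

    def knapsack_bb(values, weights, capacity):
        order = sorted(range(len(values)),
                       key=lambda i: values[i] / weights[i], reverse=True)
        v = [values[i] for i in order]
        w = [weights[i] for i in order]

        def bound(k, val, cap):        # fractional relaxation over items k..end
            for i in range(k, len(v)):
                if w[i] <= cap:
                    cap -= w[i]
                    val += v[i]
                else:
                    return val + v[i] * cap / w[i]
            return val

        best = 0
        pool = [(-bound(0, 0, capacity), 0, 0, capacity)]    # best bound first
        while pool:
            neg_ub, k, val, cap = heapq.heappop(pool)
            if -neg_ub <= best or k == len(v):               # prune or leaf
                best = max(best, val)
                continue
            for take in (1, 0):                              # branch on item k
                if take and w[k] > cap:
                    continue
                nval, ncap = val + take * v[k], cap - take * w[k]
                best = max(best, nval)       # the partial selection is feasible
                ub = bound(k + 1, nval, ncap)
                if ub > best:
                    heapq.heappush(pool, (-ub, k + 1, nval, ncap))
        return best

    print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))  # 220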

319 citations


Book
01 Apr 1994
TL;DR: A textbook on computability and computational complexity, covering the classes P and NP, optimization problems, space complexity, probabilistic algorithms, interactive proof systems, and models and algorithms for parallel computation.
Abstract: 1. Mathematical Preliminaries. 2. Elements of Computability Theory. 4. The Class P. 5. The Class NP. 6. The Complexity of Optimization Problems. 7. Beyond NP. 8. Space-Complexity Classes. 9. Probabilistic Algorithms and Complexity Classes. 10. Interactive Proof Systems. 11. Models of Parallel Computers. 12. Parallel Algorithms.

312 citations


Journal ArticleDOI
TL;DR: It is proved that prefix sums of n integers of at most b bits can be found on a COMMON CRCW PRAM with a linear time-processor product, and that the algorithm is optimally fast for any polynomial number of processors.
Abstract: We prove that prefix sums of n integers of at most b bits can be found on a COMMON CRCW PRAM with a linear time-processor product. The algorithm is optimally fast for any polynomial number of processors. This is a generalisation of a previous result, which was valid only for O(log n)-bit numbers. Application of this algorithm to an r-way parallel merge sort algorithm is also considered. We also consider a more realistic PRAM variant, in which the word size, m, may be smaller than b (m ≥ log n). On this model, prefix sums can still be found in optimal time.
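
The paper's exact time bounds did not survive extraction above, so as background only, here is the standard work-efficient prefix-sum (up-sweep/down-sweep) pattern written sequentially; it is not the b-bit CRCW PRAM algorithm of the paper, and each inner loop marks a step that would run in parallel.

    # Sketch: the standard work-efficient prefix-sum pattern (up-sweep /
    # down-sweep), written sequentially; each inner loop is one parallel step.
    def exclusive_scan(a):
        x = list(a)
        n = len(x)
        assert n and n & (n - 1) == 0, "power-of-two length for simplicity"
        d = 1
        while d < n:                       # up-sweep: build partial sums
            for i in range(0, n, 2 * d):
                x[i + 2 * d - 1] += x[i + d - 1]
            d *= 2
        x[n - 1] = 0
        d = n // 2
        while d >= 1:                      # down-sweep: distribute prefixes
            for i in range(0, n, 2 * d):
                t = x[i + d - 1]
                x[i + d - 1] = x[i + 2 * d - 1]
                x[i + 2 * d - 1] += t
            d //= 2
        return x

    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]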

311 citations


Journal ArticleDOI
TL;DR: A parallel volume-rendering algorithm, which consists of two parts: parallel ray tracing and parallel compositing, which is particularly effective for massively parallel processing, as it always uses all processing units by repeatedly subdividing the partial images and distributing them to the appropriate processing units.
Abstract: We describe a parallel volume-rendering algorithm, which consists of two parts: parallel ray tracing and parallel compositing. In the most recent implementation on the Connection Machine CM-5 and networked workstations, the parallel volume renderer evenly distributes data to the computing resources available. Without the need to communicate with other processing units, each subvolume is ray traced locally and generates a partial image. The parallel compositing process then merges all resulting partial images in depth order to produce the complete image. The compositing algorithm is particularly effective for massively parallel processing, as it always uses all processing units by repeatedly subdividing the partial images and distributing them to the appropriate processing units. Test results on both the CM-5 and the workstations are promising. They do, however, expose different performance issues for each platform.
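
A minimal sketch of the compositing half of the pipeline, assuming premultiplied RGBA partial images merged front to back with the "over" operator; the paper's contribution is doing this merge so that all processors stay busy (by repeatedly splitting and exchanging image regions), which is only indicated in the comments.

    # Sketch: depth-ordered "over" compositing of partial RGBA images with
    # premultiplied colour.  In the parallel algorithm each processor would
    # composite only a shrinking sub-image and hand the rest to its partner,
    # which keeps every processor busy; the merge itself is shown serially.
    def over(front, back):
        fr, fg, fb, fa = front
        br, bg, bb, ba = back
        t = 1.0 - fa
        return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)

    def composite_images(partial_images):
        """partial_images: equally sized pixel lists, ordered front to back."""
        result = partial_images[0]
        for image in partial_images[1:]:
            result = [over(f, b) for f, b in zip(result, image)]
        return result

    # Two 1-pixel "images": half-transparent red in front of opaque blue.
    front = [(0.5, 0.0, 0.0, 0.5)]
    back = [(0.0, 0.0, 1.0, 1.0)]
    print(composite_images([front, back]))  # [(0.5, 0.0, 0.5, 1.0)]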

311 citations


Journal ArticleDOI
TL;DR: It is shown that optimality to within a multiplicative factor close to one can be achieved for the problems of Gauss-Jordan elimination and sorting, by transportable algorithms that can be applied for a wide range of values of the parameters p, g, and L.

Book
01 Jun 1994
TL;DR: In this paper, a tight lower bound of the VLSI layout area of the binary de Bruijn multiprocessor network (BDM) is derived; a procedure for an area-optimal VLSI layout is also described.
Abstract: It is shown that the binary de Bruijn multiprocessor network (BDM) can solve a wide variety of classes of problems. The BDM admits an N-node linear array, an N-node ring, (N-1)-node complete binary trees, ((3N/4)-2)-node tree machines, and an N-node one-step shuffle-exchange network, where N (= 2^k, k an integer) is the total number of nodes. The de Bruijn multiprocessor networks are proved to be fault-tolerant as well as extensible. A tight lower bound of the VLSI layout area of the BDM is derived; a procedure for an area-optimal VLSI layout is also described. It is demonstrated that the BDM is more versatile than the shuffle-exchange and the cube-connected cycles. Recent work has classified sorting architectures into (1) sequential input/sequential output, (2) parallel input/sequential output, (3) parallel input/parallel output, (4) sequential input/parallel output, and (5) hybrid input/hybrid output. It is demonstrated that the de Bruijn multiprocessor networks can sort data items in all of the abovementioned categories. No other network which can sort data items in all the categories is known.
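
The shift structure that underlies these embeddings is easy to state: in a binary de Bruijn network with N = 2^k nodes, node x is connected to the nodes obtained by shifting its k-bit label. A small illustrative sketch:

    # Sketch: neighbours in a binary de Bruijn network with N = 2**k nodes.
    # Node x connects to the nodes obtained by shifting its k-bit label left
    # and appending 0 or 1 -- the shuffle structure behind the embeddings.
    def de_bruijn_neighbours(x, k):
        n = 1 << k
        return [((x << 1) | b) % n for b in (0, 1)]

    k = 3
    for x in range(1 << k):
        print(format(x, "03b"), "->",
              [format(y, "03b") for y in de_bruijn_neighbours(x, k)])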

Journal ArticleDOI
TL;DR: This paper analyses the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible to estimate the size of total work at a given processor.

Proceedings ArticleDOI
26 Oct 1994
TL;DR: A new coarse-grained GA architecture, the Injection Island GA (iiGA), is proposed and the preliminary results of iiGA's show them to be a promising new approach to coarse-grain GA's.
Abstract: This paper describes a number of different coarse-grain GA's, including various migration strategies and connectivity schemes to address the premature convergence problem. These approaches are evaluated on a graph partitioning problem. Our experiments showed, first, that the sequential GA's used are not as effective as parallel GA's for this graph partition problem. Second, for coarse-grain GA's, the results indicate that using a large number of nodes and exchanging individuals asynchronously among them is very effective. Third, GA's that exchange solutions based on population similarity instead of a fixed connection topology get better results without any degradation in speed. Finally, we propose a new coarse-grained GA architecture, the Injection Island GA (iiGA). The preliminary results of iiGA's show them to be a promising new approach to coarse-grain GA's.

Journal ArticleDOI
TL;DR: The objectives of this paper are to critically assess the state of the art in the theory of scalability analysis, and to motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures.

Journal ArticleDOI
TL;DR: Theoretical results show that a large class of algorithm-machine combinations is scalable and the scalability can be predicted through premeasured machine parameters, and a harmony between speedup and scalability has been observed.
Abstract: Scalability has become an important consideration in parallel algorithm and machine designs. The word scalable, or scalability, has been widely and often used in the parallel processing community. However, there is no adequate, commonly accepted definition of scalability available. Scalabilities of computer systems and programs are difficult to quantify, evaluate, and compare. In this paper, scalability is formally defined for algorithm-machine combinations. A practical method is proposed to provide a quantitative measurement of the scalability. The relation between the newly proposed scalability and other existing parallel performance metrics is studied. A harmony between speedup and scalability has been observed. Theoretical results show that a large class of algorithm-machine combinations is scalable and the scalability can be predicted through premeasured machine parameters. Two algorithms have been studied on an nCUBE 2 multicomputer and on a MasPar MP-1 computer. These case studies have shown how scalabilities can be measured, computed, and predicted. Performance instrumentation and visualization tools also have been used and developed to understand the scalability-related behavior.
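
As a rough illustration only: one common way to formalize such a metric is an isospeed-style ratio, assumed here rather than taken verbatim from the paper, in which W is the work executed on p processors and W' is the work needed on p' processors to keep the average speed per processor unchanged.

    # Sketch (assumed formalization, not quoted from the paper): an
    # isospeed-style scalability number.  W is the work run on p processors;
    # W_prime is the work needed on p_prime processors to keep the average
    # speed per processor unchanged.  Ideal scaling gives 1.0.
    def isospeed_scalability(p, W, p_prime, W_prime):
        return (p_prime * W) / (p * W_prime)

    # Example: going from 64 to 256 processors needs 5x the work to hold speed.
    print(isospeed_scalability(64, 1.0e9, 256, 5.0e9))  # 0.8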

Journal ArticleDOI
TL;DR: This work considers how to formulate a parallel analytical molecular surface algorithm that has expected linear complexity with respect to the total number of atoms in a molecule, and aims to compute and display these surfaces at interactive rates, by taking advantage of advances in computational geometry.
Abstract: We consider how we set out to formulate a parallel analytical molecular surface algorithm that has expected linear complexity with respect to the total number of atoms in a molecule. To achieve this goal, we avoided computing the complete 3D regular triangulation over the entire set of atoms, a process that takes time O(n log n), where n is the number of atoms in the molecule. We aim to compute and display these surfaces at interactive rates, by taking advantage of advances in computational geometry, making further algorithmic improvements and parallelizing the computations.

Journal ArticleDOI
TL;DR: This article demonstrates that simple and natural parallelizations work very well, the sequential implementations do not have to be fundamentally restructured, and the high degree of temporal locality obviates the need for explicit data distribution and communication management on the best known visualization algorithms.
Abstract: Recently, a new class of scalable, shared-address-space multiprocessors has emerged. Like message-passing machines, these multiprocessors have a distributed interconnection network and physically distributed main memory. However, they provide hardware support for efficient implicit communication through a shared address space, and they automatically exploit temporal locality by caching both local and remote data in a processor's hardware cache. In this article, we show that these architectural characteristics make it much easier to obtain very good speedups on the best known visualization algorithms. Simple and natural parallelizations work very well, the sequential implementations do not have to be fundamentally restructured, and the high degree of temporal locality obviates the need for explicit data distribution and communication management. We demonstrate our claims through parallel versions of three state-of-the-art algorithms: a recent hierarchical radiosity algorithm by Hanrahan et al. (1991), a parallelized ray-casting volume renderer by Levoy (1992), and an optimized ray-tracer by Spach and Pulleyblank (1992). We also discuss a new shear-warp volume rendering algorithm that provides the first demonstration of interactive frame rates for a 256×256×256 voxel data set on a general-purpose multiprocessor.

Journal ArticleDOI
TL;DR: The Maisie simulation language is presented, a set of optimizations are described, and the use of the language in the design of efficient parallel simulations is illustrated.
Abstract: Maisie is a C-based discrete-event simulation language that was designed to cleanly separate a simulation model from the underlying algorithm (sequential or parallel) used for the execution of the model. With few modifications, a Maisie program may be executed by using a sequential simulation algorithm, a parallel conservative algorithm or a parallel optimistic algorithm. The language constructs allow the run-time system to implement optimizations that reduce recomputation and state saving overheads for optimistic simulations and synchronization overheads for conservative implementations. This paper presents the Maisie simulation language, describes a set of optimizations, and illustrates the use of the language in the design of efficient parallel simulations.

Journal ArticleDOI
TL;DR: This work considers the application of the genetic algorithm to a particular problem, the Assembly Line Balancing Problem, and carries out extensive computational testing to find appropriate values for the various parameters associated with this genetic algorithm.
Abstract: Genetic algorithms are one example of the use of a random element within an algorithm for combinatorial optimization. We consider the application of the genetic algorithm to a particular problem, the Assembly Line Balancing Problem. A general description of genetic algorithms is given, and their specialized use on our test-bed problems is discussed. We carry out extensive computational testing to find appropriate values for the various parameters associated with this genetic algorithm. These experiments underscore the importance of the correct choice of a scaling parameter and mutation rate to ensure the good performance of a genetic algorithm. We also describe a parallel implementation of the genetic algorithm and give some comparisons between the parallel and serial implementations. Both versions of the algorithm are shown to be effective in producing good solutions for problems of this type (with appropriately chosen parameters).
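
The sketch below is a minimal generational GA that exposes the two parameters the study highlights, a linear fitness-scaling coefficient and the mutation rate, applied to a toy bit-counting objective rather than the assembly-line encoding; all parameter values are illustrative.

    # Sketch: a minimal generational GA exposing the two parameters the study
    # emphasizes -- a linear fitness-scaling coefficient and the mutation rate.
    # The bit-counting objective is a toy stand-in for line balancing.
    import random

    def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=60,
                          scaling=1.5, mutation_rate=0.02, seed=1):
        rng = random.Random(seed)
        pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            raw = [fitness(ind) for ind in pop]
            avg = sum(raw) / len(raw)
            # linear scaling around the average controls selection pressure
            scaled = [max(avg + scaling * (f - avg), 0.0) for f in raw]
            total = sum(scaled) or 1.0

            def select():                              # roulette-wheel selection
                r, acc = rng.uniform(0, total), 0.0
                for ind, s in zip(pop, scaled):
                    acc += s
                    if acc >= r:
                        return ind
                return pop[-1]

            nxt = []
            while len(nxt) < pop_size:
                a, b = select(), select()
                cut = rng.randrange(1, n_bits)         # one-point crossover
                child = a[:cut] + b[cut:]
                child = [bit ^ (rng.random() < mutation_rate) for bit in child]
                nxt.append(child)
            pop = nxt
        return max(pop, key=fitness)

    best = genetic_algorithm(sum)                      # toy objective: count 1 bits
    print(sum(best), "of 20 bits set")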

Journal ArticleDOI
TL;DR: A synchronous parallel tabu search heuristic for the Vehicle Routing Problem with Time Windows, designed to run on a Multiple-Instruction Multiple-Data (MIMD) computer architecture.

Journal ArticleDOI
TL;DR: The superior convergence property of the parallel hybrid neural network learning algorithm presented in this paper is demonstrated.
Abstract: A new algorithm is presented for training of multilayer feedforward neural networks by integrating a genetic algorithm with an adaptive conjugate gradient neural network learning algorithm. The parallel hybrid learning algorithm has been implemented in C on an MIMD shared memory machine (Cray Y-MP8/864 supercomputer). It has been applied to two different domains, engineering design and image recognition. The performance of the algorithm has been evaluated by applying it to three examples. The superior convergence property of the parallel hybrid neural network learning algorithm presented in this paper is demonstrated.

Journal ArticleDOI
TL;DR: A wide class of problems, the divide & conquer class (D&Q), is shown to be easily and efficiently solvable on the HHC topology, and parallel algorithms are provided to describe how a D&Q problem can be solved efficiently on an HHC structure.
Abstract: Interconnection networks play a crucial role in the performance of parallel systems. This paper introduces a new interconnection topology that is called the hierarchical hypercube (HHC). This topology is suitable for massively parallel systems with thousands of processors. An appealing property of this network is the low number of connections per processor, which enhances the VLSI design and fabrication of the system. Other alluring features include symmetry and logarithmic diameter, which imply easy and fast algorithms for communication. Moreover, the HHC is scalable; that is, it can embed HHC's of lower dimensions. The paper presents two algorithms for data communication in the HHC. The first algorithm is for one-to-one transfer, and the second is for one-to-all broadcasting. Both algorithms take O(log_2 k) time, where k is the total number of processors in the system. A wide class of problems, the divide & conquer class (D&Q), is shown to be easily and efficiently solvable on the HHC topology. Parallel algorithms are provided to describe how a D&Q problem can be solved efficiently on an HHC structure. The solution of a D&Q problem instance having up to k inputs requires O(log_2 k) time.
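
For intuition about the logarithmic broadcast, here is a recursive-doubling one-to-all broadcast on a plain hypercube, simulated round by round; the HHC algorithm in the paper is hierarchical, but it rests on the same doubling idea.

    # Sketch: one-to-all broadcast by recursive doubling on a plain hypercube
    # with N = 2**dim nodes, simulated round by round; every node holds the
    # message after dim = log2(N) rounds.  The HHC version is hierarchical.
    def hypercube_broadcast(dim, source=0):
        has_msg = {source}
        for d in range(dim):                   # one communication round per dimension
            for node in list(has_msg):
                has_msg.add(node ^ (1 << d))   # send across dimension d
        return has_msg

    dim = 4
    print(len(hypercube_broadcast(dim)) == 2 ** dim)   # True, after 4 rounds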

Journal ArticleDOI
TL;DR: A new algorithm for passively estimating the ranges and bearings of multiple narrow-band sources using a uniform linear sensor array is presented, which reduces the global 2D search over range and bearing to 2(m-1) independent 1D searches.
Abstract: A new algorithm for passively estimating the ranges and bearings of multiple narrow-band sources using a uniform linear sensor array is presented. The algorithm is computationally efficient and converges globally. It minimizes the MUSIC cost function subject to geometrical constraints imposed by the curvature of the received wavefronts. The estimation problem is reduced to one of solving a set of two coupled 2D polynomial equations. The proposed algorithm solves this nonlinear problem using a modification of the path-following (or homotopy) method. For an array having m sensors, the algorithm reduces the global 2D search over range and bearing to 2(m-1) independent 1D searches. This imparts a high degree of parallelism that can be exploited to obtain source location estimates very efficiently.

Book ChapterDOI
01 Jan 1994
TL;DR: The alternating direction method of multipliers decomposition algorithm for convex programming, as recently generalized by Eckstein and Bertsekas, is considered; some reformulations of the algorithm are given, and several alternative means for deriving them are discussed.
Abstract: We consider the alternating direction method of multipliers decomposition algorithm for convex programming, as recently generalized by Eckstein and Bertsekas. We give some reformulations of the algorithm, and discuss several alternative means for deriving them. We then apply these reformulations to a number of optimization problems, such as minimum convex-cost transportation and multicommodity flow problems. The convex transportation version is closely related to a linear-cost transportation algorithm proposed earlier by Bertsekas and Tsitsiklis. Finally, we construct a simple data-parallel implementation of the convex-cost transportation algorithm for the CM-5 family of parallel computers, and give computational results. The method appears to converge quite quickly on sparse quadratic-cost transportation problems, even if they are very large; for example, we solve problems with over a million arcs in roughly 100 iterations, which equates to about 30 seconds of run time on a system with 256 processing nodes. Substantially better timings can probably be achieved with a more careful implementation.
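
A generic sketch of the alternating direction method of multipliers iteration (x-update, z-update, dual update), shown here on a small lasso problem rather than the paper's transportation or multicommodity flow formulations; numpy is assumed, and the problem data are synthetic.

    # Sketch: the generic ADMM iteration (x-update, z-update, dual update) on
    #   minimize 0.5*||A x - b||^2 + lam*||z||_1   subject to   x = z.
    # Synthetic data; the paper applies the same splitting to transportation
    # and multicommodity flow problems instead.
    import numpy as np

    def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
        n = A.shape[1]
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u: scaled dual
        M = np.linalg.inv(A.T @ A + rho * np.eye(n))      # factor once, reuse
        Atb = A.T @ b
        for _ in range(iters):
            x = M @ (Atb + rho * (z - u))                 # quadratic x-update
            v = x + u
            z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft threshold
            u = u + x - z                                 # multiplier update
        return z

    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 10))
    x_true = np.zeros(10)
    x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(40)
    print(np.round(admm_lasso(A, b), 2))   # roughly recovers x_true, rest near 0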

Proceedings ArticleDOI
01 Aug 1994
TL;DR: Optimizations for the first two algorithms improve their performance significantly without changing their asymptotic complexity, and a hybrid that combines features of the others is generally the fastest of those tested.
Abstract: This paper presents a comparison of the pragmatic aspects of some parallel algorithms for finding connected components, together with optimizations on these algorithms. The algorithms being compared are two similar algorithms by Shiloach-Vishkin [22] and Awerbuch-Shiloach [2], a randomized contraction algorithm based on algorithms by Reif [21] and Phillips [20], and a hybrid algorithm [11]. Improvements are given for the first two to improve performance significantly, although without improving their asymptotic complexity. The hybrid combines features of the others and is generally the fastest of those tested. Timings were made using NESL [4] code as executed on a Connection Machine 2 and Cray Y-MP/C90.
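
A simplified sequential rendering of the two primitive steps these algorithms share, hooking a root to a smaller neighbouring label and pointer jumping, is sketched below; it is not any of the optimized variants timed in the paper, and each inner loop corresponds to one parallel step.

    # Sketch: connected components by repeated hooking (attach a root to a
    # smaller neighbouring label) plus pointer jumping; each inner loop is a
    # step the parallel algorithms perform simultaneously on all vertices.
    def connected_components(n, edges):
        parent = list(range(n))
        changed = True
        while changed:
            changed = False
            for u, v in edges:                       # hooking
                ru, rv = parent[u], parent[v]
                if ru < rv and parent[rv] == rv:
                    parent[rv] = ru
                    changed = True
                elif rv < ru and parent[ru] == ru:
                    parent[ru] = rv
                    changed = True
            for v in range(n):                       # pointer jumping
                while parent[v] != parent[parent[v]]:
                    parent[v] = parent[parent[v]]
        return parent

    print(connected_components(8, [(0, 1), (1, 2), (4, 5), (6, 4)]))
    # [0, 0, 0, 3, 4, 4, 4, 7]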

Journal ArticleDOI
TL;DR: In this paper, new algorithms for adaptive eigendecomposition of time-varying data covariance matrices are presented, based on a first-order perturbation analysis of the rank-one update for covariance matrix estimates with exponential windows.
Abstract: In this paper, new algorithms for adaptive eigendecomposition of time-varying data covariance matrices are presented. The algorithms are based on a first-order perturbation analysis of the rank-one update for covariance matrix estimates with exponential windows. Different assumptions on the eigenvalue structure lead to three distinct algorithms with varying degrees of complexity. A stabilization technique is presented and both issues of initialization and computational complexity are discussed. Computer simulations indicate that the new algorithms can achieve the same performance as a direct approach in which the exact eigendecomposition of the updated sample covariance matrix is obtained at each iteration. Previous algorithms with similar performance require O(LM^2) complex operations per iteration, where L and M respectively denote the data vector and signal-subspace dimensions, and involve either some form of Gram-Schmidt orthogonalization or a nonlinear eigenvalue search. The new algorithms have parallel structures, sequential operation counts of order O(LM) or less, and do not involve any of the above steps. One particular algorithm can be used to update the complete signal-subspace eigenstructure in 5LM complex operations. This represents an order of magnitude improvement in computational complexity over existing algorithms with similar performance. Finally, a simplified local convergence analysis of one of the algorithms shows that it is stable and converges in the mean to the true eigendecomposition. The convergence is geometrical and is characterized by a single time constant.
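
The following sketch applies generic first-order perturbation formulas to an exponentially windowed rank-one covariance update and checks them against an exact eigendecomposition; it assumes well-separated eigenvalues, uses numpy, and omits the stabilization and cost-reducing structure that distinguish the paper's algorithms (which, unlike this sketch, avoid re-orthogonalization).

    # Sketch: generic first-order perturbation update of an eigendecomposition
    # under the exponentially windowed rank-one update
    #   C_new = beta * C + (1 - beta) * x x^T,
    # assuming well-separated eigenvalues.  The paper's algorithms add
    # stabilization and structure that avoid the re-orthogonalization used here.
    import numpy as np

    def perturbation_update(eigvals, Q, x, beta=0.98):
        alpha = 1.0 - beta
        y = Q.T @ x                                  # update in the eigenbasis
        lam = beta * eigvals + alpha * y ** 2        # first-order eigenvalues
        Q_new = Q.copy()
        for i in range(len(eigvals)):
            c = np.zeros(len(eigvals))
            for j in range(len(eigvals)):
                if j != i:
                    c[j] = alpha * y[i] * y[j] / (beta * (eigvals[i] - eigvals[j]))
            Q_new[:, i] = Q[:, i] + Q @ c            # first-order eigenvectors
        Q_new, _ = np.linalg.qr(Q_new)               # re-orthonormalize for safety
        return lam, Q_new

    rng = np.random.default_rng(0)
    C = np.cov(rng.standard_normal((6, 200)))
    eigvals, Q = np.linalg.eigh(C)
    x = rng.standard_normal(6)
    lam, _ = perturbation_update(eigvals, Q, x)
    exact = np.linalg.eigvalsh(0.98 * C + 0.02 * np.outer(x, x))
    print(np.max(np.abs(np.sort(lam) - exact)))      # small relative to eigenvalues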

Book
01 Jan 1994
TL;DR: This text discusses the design and use of practical parallel algorithms for solving problems in a growing application area whose computational requirements are enormous - VLSI CAD applications.
Abstract: Parallel computing is becoming an increasingly cost-effective and affordable means for providing enormous computing power, and massively parallel (MPP) machines have been relatively easy to build. However, designing good parallel algorithms that can efficiently use the hardware resources to get the maximum performance remains a challenge. This text discusses the design and use of practical parallel algorithms for solving problems in a growing application area whose computational requirements are enormous - VLSI CAD applications. It also examines practical parallel algorithms (written in C and pseudo-C) for all forms of parallel programming - shared-memory MIMD, message-passing distributed MIMD, and SIMD - for a variety of interesting applications, with experimental results.

Journal ArticleDOI
Cherng Min Ma
TL;DR: Sufficient conditions are established under which 3D parallel thinning algorithms preserve topology, extending Ronse's 2D conditions, which let a 2D parallel thinning algorithm be proved topology preserving by checking only a small number of configurations.
Abstract: Topology preservation is a major concern of parallel thinning algorithms for 2D and 3D binary images. To prove that a parallel thinning algorithm preserves topology, one must show that it preserves topology for all possible images. But it would be difficult to check all images, since there are too many possible images. Efficient sufficient conditions which can simplify such proofs for the 2D case were proposed by Ronse [Discrete Appl. Math. 21, 1988, 69-79]. By Ronse's results, a 2D parallel thinning algorithm can be proved to be topology preserving by checking a rather small number of configurations. This paper establishes sufficient conditions for 3D parallel thinning algorithms to preserve topology.
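
As a 2D illustration of the kind of local test such conditions reduce proofs to, here is the classic simple-point check: deleting a pixel preserves topology exactly when its 3x3 neighbourhood has one 8-connected foreground component and one 4-connected background component touching the pixel 4-adjacently. The paper's 3D conditions are not reproduced here.

    # Sketch: the classic 2D simple-point test.  Deleting pixel p preserves
    # topology iff its 3x3 neighbourhood contains exactly one 8-connected
    # foreground component and exactly one 4-connected background component
    # that touches p 4-adjacently.
    OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

    def _components(cells, adjacent):
        comps, seen = [], set()
        for c in cells:
            if c in seen:
                continue
            stack, comp = [c], set()
            while stack:
                q = stack.pop()
                if q in comp:
                    continue
                comp.add(q)
                seen.add(q)
                stack.extend(r for r in cells if r not in comp and adjacent(q, r))
            comps.append(comp)
        return comps

    def is_simple(neigh):
        """neigh: dict {(di, dj): 0 or 1} over the 8 offsets around p."""
        adj8 = lambda a, b: max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1
        adj4 = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
        fg = [o for o in OFFSETS if neigh[o] == 1]
        bg = [o for o in OFFSETS if neigh[o] == 0]
        bg_touching = [c for c in _components(bg, adj4)
                       if any(abs(o[0]) + abs(o[1]) == 1 for o in c)]
        return len(_components(fg, adj8)) == 1 and len(bg_touching) == 1

    # Foreground only above and below p: deleting p would split the object.
    print(is_simple({o: int(o in [(-1, 0), (1, 0)]) for o in OFFSETS}))  # False
    # Foreground filling the whole top row: p can be deleted safely.
    print(is_simple({o: int(o[0] == -1) for o in OFFSETS}))              # True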

Journal ArticleDOI
01 Jun 1994
TL;DR: In this paper, the authors discuss two physical systems from separate disciplines that make use of the same algorithmic and mathematical structures to reduce the number of operations necessary to complete a realistic simulation.
Abstract: We discuss two physical systems from separate disciplines that make use of the same algorithmic and mathematical structures to reduce the number of operations necessary to complete a realistic simulation. In the gravitational N-body problem, the acceleration of an object is given by the familiar Newtonian laws of motion and gravitation. The computational load is reduced by treating groups of bodies as single multipole sources rather than individual bodies. In the simulation of incompressible flows, the flow may be modeled by the dynamics of a set of N interacting vortices. Vortices are vector objects in three dimensions, but their interactions are mathematically similar to those of gravitating masses. The multipole approximation can be used to greatly reduce the time needed to compute the interactions between vortices. Both types of simulations were carried out on the Intel Touchstone Delta, a parallel MIMD computer with 512 processors. Timings are reported for systems of up to 10 million bodies, and demonstrate that the implementation scales well on massively parallel systems. The majority of the code is common between the two applications, which differ only in certain physics modules. In particular, the code for parallel tree construction and traversal is shared.
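
The common kernel of the two applications is replacing a distant group of bodies by a single aggregate source. The sketch below keeps only the lowest (centre-of-mass) term and compares it with the direct sum; the real codes retain higher multipole moments and organize the groups in a parallel tree, neither of which is shown.

    # Sketch: a distant group of bodies replaced by a single aggregate source.
    # Only the lowest (centre-of-mass) term is kept; the real codes retain
    # higher multipole moments and organize the groups in a parallel tree.
    def direct_accel(target, bodies, G=1.0):
        ax = ay = 0.0
        for m, x, y in bodies:
            dx, dy = x - target[0], y - target[1]
            r3 = (dx * dx + dy * dy) ** 1.5
            ax += G * m * dx / r3
            ay += G * m * dy / r3
        return ax, ay

    def centre_of_mass_accel(target, bodies, G=1.0):
        M = sum(m for m, _, _ in bodies)
        cx = sum(m * x for m, x, _ in bodies) / M
        cy = sum(m * y for m, _, y in bodies) / M
        dx, dy = cx - target[0], cy - target[1]
        r3 = (dx * dx + dy * dy) ** 1.5
        return G * M * dx / r3, G * M * dy / r3

    # A tight clump of bodies far away from the evaluation point.
    clump = [(1.0, 100.0 + 0.1 * i, 50.0 - 0.1 * i) for i in range(10)]
    print(direct_accel((0.0, 0.0), clump))
    print(centre_of_mass_accel((0.0, 0.0), clump))   # nearly identical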