
Showing papers on "Parallel algorithm published in 2001"


Journal ArticleDOI
01 May 2001
TL;DR: In this article, component averaging (CAV) is introduced as a new iterative parallel technique suitable for large and sparse unstructured systems of linear equations; it simultaneously projects the current iterate onto all the system's hyperplanes and is thus inherently parallel.
Abstract: Component averaging (CAV) is introduced as a new iterative parallel technique suitable for large and sparse unstructured systems of linear equations. It simultaneously projects the current iterate onto all the system's hyperplanes, and is thus inherently parallel. However, instead of orthogonal projections and scalar weights (as used, for example, in Cimmino's method), it uses oblique projections and diagonal weighting matrices, with weights related to the sparsity of the system matrix. These features provide for a practical convergence rate which approaches that of algebraic reconstruction technique (ART) (Kaczmarz's row-action algorithm) – even on a single processor. Furthermore, the new algorithm also converges in the inconsistent case. A proof of convergence is provided for unit relaxation, and the fast convergence is demonstrated on image reconstruction problems of the Herman head phantom obtained within the SNARK93 image reconstruction software package. Both reconstructed images and convergence plots are presented. The practical consequences of the new technique are far reaching for real-world problems in which iterative algorithms are used for solving large, sparse, unstructured and often inconsistent systems of linear equations.

233 citations
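
As a rough illustration of the update this abstract describes, here is a minimal single-processor sketch of one CAV sweep (the function name and the toy system are ours; the weights s[j] count the nonzeros in column j, as in the paper):

```python
import numpy as np
from scipy.sparse import csr_matrix

def cav_sweep(A, b, x, lam=1.0):
    """One component-averaging sweep: all equations are processed
    simultaneously (hence inherently parallel), with oblique projections
    weighted by s[j] = number of nonzeros in column j of A."""
    s = np.asarray((A != 0).sum(axis=0)).ravel()   # sparsity weights
    norms = A.multiply(A) @ s                      # per-row sum_j s_j * a_ij^2
    residual = b - A @ x
    return x + lam * (A.T @ (residual / norms))    # assumes no all-zero rows

# toy consistent system; iterates approach the solution [1, 2]
A = csr_matrix(np.array([[1.0, 0.0], [1.0, 1.0]]))
b = np.array([1.0, 3.0])
x = np.zeros(2)
for _ in range(50):
    x = cav_sweep(A, b, x)
```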


Journal ArticleDOI
TL;DR: This paper describes the essential elements of a parallel algorithm for the FDTD method using the MPI (message passing interface) library, and uses a new method that makes it unnecessary to split the field components.
Abstract: In this paper, we describe the essential elements of a parallel algorithm for the FDTD method using the MPI (message passing interface) library. To simplify and accelerate the algorithm, an MPI Cartesian 2D topology is used. The inter-process communications are optimized by the use of derived data types. A general approach is also explained for parallelizing the auxiliary tools, such as far-field computation, thin-wire treatment, etc. For PMLs, we have used a new method that makes it unnecessary to split the field components. This considerably simplifies the computer programming, and is compatible with the parallel algorithm.

224 citations
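
A minimal mpi4py sketch of the Cartesian 2D topology and ghost-layer exchange the abstract mentions (all names and sizes are illustrative; the paper uses MPI derived datatypes for the strided column data, whereas here numpy simply copies it):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)          # factor ranks into a 2D grid
cart = comm.Create_cart(dims, periods=[False, False])
left, right = cart.Shift(0, 1)                       # x-neighbors (PROC_NULL at edges)
down, up = cart.Shift(1, 1)                          # y-neighbors

n = 32
field = np.zeros((n + 2, n + 2))                     # local block + one ghost layer

def exchange_x(f):
    """Send our first interior row left, receive the right ghost row.
    (The mirror-image exchange and the y-direction are analogous.)"""
    recv = np.empty(n + 2)
    cart.Sendrecv(np.ascontiguousarray(f[1, :]), dest=left, sendtag=0,
                  recvbuf=recv, source=right, recvtag=0)
    if right != MPI.PROC_NULL:
        f[n + 1, :] = recv
```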


Proceedings ArticleDOI
14 Oct 2001
TL;DR: The main contribution is that the algorithms solve mixed packing and covering problems (in contrast to pure packing or pure covering problems, which have only "≤" or only "≥" inequalities) and run in time independent of the so-called width of the problem.
Abstract: We describe sequential and parallel algorithms that approximately solve linear programs with no negative coefficients (aka mixed packing and covering problems). For explicitly given problems, our fastest sequential algorithm returns a solution satisfying all constraints within a 1±ε factor in O(md log(m)/ε²) time, where m is the number of constraints and d is the maximum number of constraints any variable appears in. Our parallel algorithm runs in time polylogarithmic in the input size times ε⁻⁴ and uses a total number of operations comparable to the sequential algorithm. The main contribution is that the algorithms solve mixed packing and covering problems (in contrast to pure packing or pure covering problems, which have only "≤" or only "≥" inequalities, but not both) and run in time independent of the so-called width of the problem.

202 citations
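
To pin down the problem form and the 1±ε guarantee quoted above, here is a trivial checker (the approximation algorithm itself is the paper's contribution and is not reproduced here; all names are ours):

```python
import numpy as np

def is_mixed_feasible(P, p, C, c, x, eps):
    """x >= 0 approximately satisfies a mixed problem if every packing
    row obeys Px <= (1+eps)p and every covering row obeys Cx >= (1-eps)c.
    All entries of P, C, p, c are nonnegative (no negative coefficients)."""
    x = np.asarray(x, dtype=float)
    return bool((x >= 0).all()
                and (P @ x <= (1 + eps) * p).all()
                and (C @ x >= (1 - eps) * c).all())
```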


Proceedings ArticleDOI
27 May 2001
TL;DR: Results show that PQGA is superior to QGA as well as other conventional genetic algorithms, and is able to possess the two characteristics of exploration and exploitation simultaneously.
Abstract: This paper proposes a new parallel evolutionary algorithm called the parallel quantum-inspired genetic algorithm (PQGA). The quantum-inspired genetic algorithm (QGA) is based on concepts and principles of quantum computing such as qubits and the superposition of states. Instead of a binary, numeric, or symbolic representation, QGA adopts the qubit chromosome, whose probabilistic nature lets it represent a linear superposition of solutions. QGA is suitable for parallel structures because of its rapid convergence and good global search capability; that is, QGA possesses the two characteristics of exploration and exploitation simultaneously. The effectiveness and applicability of PQGA are demonstrated by experimental results on the knapsack problem, a well-known combinatorial optimization problem. The results show that PQGA is superior to QGA as well as other conventional genetic algorithms.

190 citations
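
As a sketch of the qubit-chromosome representation the abstract describes (standard in quantum-inspired GAs; the rotation-gate update that drives the population toward good knapsack solutions is omitted, and the names are ours):

```python
import numpy as np

def observe(theta):
    """Collapse a qubit chromosome to a classical bit string.
    Gene i is the pair (alpha_i, beta_i) = (cos theta_i, sin theta_i),
    so bit i comes out 1 with probability |beta_i|^2 = sin(theta_i)^2."""
    return (np.random.rand(theta.size) < np.sin(theta) ** 2).astype(int)

# a fresh chromosome starts in the uniform superposition (theta = pi/4):
# every bit string is equally likely until rotation gates bias it
chromosome = np.full(8, np.pi / 4)
print(observe(chromosome))        # e.g. [0 1 1 0 0 1 0 1]
```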


Journal ArticleDOI
TL;DR: A new parallel algorithm for data mining of association rules on shared-memory multiprocessors is presented, the degree of parallelism, synchronization, and data locality issues are studied, and proposed optimizations for fast frequency computation are presented.
Abstract: In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speed-up for the parallel algorithm.

168 citations


Proceedings ArticleDOI
29 Nov 2001
TL;DR: A new parallel algorithm MLFPT (multiple local frequent pattern tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating candidate items, and distributing the work fairly among processors.
Abstract: In this paper we introduce a new parallel algorithm MLFPT (multiple local frequent pattern tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating candidate items, and distributing the work fairly among processors. We have devised partitioning strategies at different stages of the mining process to achieve near optimal balancing between processors. We have successfully tested our algorithm on datasets larger than 50 million transactions.

166 citations
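
A toy sketch of the two-scan, candidate-free structure described above (scan 1 finds globally frequent items; scan 2 lets each simulated processor build its own local prefix tree; the FP-growth mining phase and the paper's balancing strategies are omitted, and all names are ours):

```python
from collections import Counter

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_local_trees(db, min_sup, n_workers):
    # scan 1: global item frequencies
    freq = Counter(item for t in db for item in t)
    order = {it: r for r, (it, c) in enumerate(freq.most_common())
             if c >= min_sup}                      # frequent items, ranked
    # scan 2: partition transactions, one local tree per worker
    trees = [Node(None) for _ in range(n_workers)]
    for i, t in enumerate(db):
        node = trees[i % n_workers]
        for it in sorted((x for x in t if x in order), key=order.get):
            node = node.children.setdefault(it, Node(it))
            node.count += 1
    return trees

trees = build_local_trees([{"a","b"}, {"a","c"}, {"a","b","c"}, {"b"}],
                          min_sup=2, n_workers=2)
```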


Journal ArticleDOI
TL;DR: This article is concerned with optimization of very large steel structures subjected to the actual constraints of the American Institute of Steel Construction (AISC) ASD and LRFD specifications on high-performance multiprocessor machines using biologically inspired genetic algorithms.
Abstract: This article is concerned with optimization of very large steel structures subjected to the actual constraints of the American Institute of Steel Construction (AISC) ASD and LRFD specifications on high-performance multiprocessor machines using biologically inspired genetic algorithms. First, parallel fuzzy genetic algorithms (GAs) are presented for optimization of steel structures using distributed memory message passing with the Message Passing Interface (MPI) in two different schemes: the processor farming scheme and the migration scheme. Next, two bilevel parallel GAs are presented for large-scale structural optimization through judicious combination of shared memory data parallel processing using the OpenMP Application Programming Interface (API) and distributed memory message passing parallel processing using MPI. Speedup results are presented for the parallel algorithms.

163 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a data association algorithm, termed m-best S-D, that can determine in O(mSkn³) time (m assignments, S ≥ 3 lists of size n, k relaxations) the (approximately) m-best solutions to an S-D assignment problem.
Abstract: In this paper we describe a novel data association algorithm, termed m-best S-D, that determines in O(mSkn³) time (m assignments, S ≥ 3 lists of size n, k relaxations) the (approximately) m-best solutions to an S-D assignment problem. The m-best S-D algorithm is applicable to tracking problems where either the sensors are synchronized or the sensors and/or the targets are very slow moving. The significance of this work is that the m-best S-D assignment algorithm (in a sliding window mode) can provide for an efficient implementation of a suboptimal multiple hypothesis tracking (MHT) algorithm by obviating the need for a brute force enumeration of an exponential number of joint hypotheses. We first describe the general problem for which the m-best S-D applies. Specifically, given line of sight (LOS) (i.e., incomplete position) measurements from S sensors, sets of complete position measurements are extracted, namely, the 1st, 2nd, ..., mth best (in terms of likelihood) sets of composite measurements are determined by solving a static S-D assignment problem. Utilizing the joint likelihood functions used to determine the m best S-D assignment solutions, the composite measurements are then quantified with a probability of being correct using a JPDA-like (joint probabilistic data association) technique. Lists of composite measurements from successive scans, along with their corresponding probabilities, are used in turn with a state estimator in a dynamic 2-D assignment algorithm to estimate the states of moving targets over time. The dynamic assignment cost coefficients are based on a likelihood function that incorporates the "true" composite measurement probabilities obtained from the (static) m-best S-D assignment solutions. We demonstrate the merits of the m-best S-D algorithm by applying it to a simulated multitarget passive sensor track formation and maintenance problem, consisting of multiple time samples of LOS measurements originating from multiple (S=7) synchronized high frequency direction finding sensors.

161 citations
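
The dynamic 2-D assignment stage mentioned above is, at its core, a linear assignment problem; a minimal sketch with scipy (the cost values are made up, and the m-best enumeration, typically done by Murty-type partitioning, is not shown):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] ~ negative log-likelihood that composite measurement j
# belongs to track i (illustrative numbers only)
cost = np.array([[0.2, 1.5, 2.0],
                 [1.4, 0.3, 1.9],
                 [2.1, 1.6, 0.4]])
rows, cols = linear_sum_assignment(cost)      # optimal one-to-one pairing
print(list(zip(rows, cols)), cost[rows, cols].sum())
```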


Proceedings ArticleDOI
02 May 2001
TL;DR: This paper presents parallel algorithms for constructing state spaces (or Labeled Transition Systems) on a network or a cluster of workstations by using parallelization techniques and shows close to ideal speedups and a good load balancing between network nodes.
Abstract: The verification of concurrent finite-state systems by model-checking often requires generating (a large part of) the state space of the system under analysis. Because of the state explosion problem, this may be a resource-consuming operation, both in terms of memory and CPU time. In this paper, we aim at improving the performance of state space construction by using parallelization techniques. We present parallel algorithms for constructing state spaces (or Labeled Transition Systems) on a network or a cluster of workstations. Each node in the network builds a part of the state space, all parts being merged to form the whole state space upon termination of the parallel computation. These algorithms have been implemented with the CADP verification tool set and experimented on various concurrent applications specified in LOTOS. The results obtained show close to ideal speedups and a good load balancing between network nodes.

150 citations
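
A sequential toy model of the partitioning idea in this abstract: each state is owned by hash(state) mod n nodes, every node keeps its own visited set and worklist, and cross-owner successors are "sent" to their owner (here, simply enqueued). All names are ours:

```python
from collections import deque

def explore(initial, successors, n_nodes):
    owner = lambda s: hash(s) % n_nodes
    visited = [set() for _ in range(n_nodes)]
    queues = [deque() for _ in range(n_nodes)]
    queues[owner(initial)].append(initial)
    while any(queues):
        for q, seen in zip(queues, visited):
            while q:
                s = q.popleft()
                if s in seen:
                    continue
                seen.add(s)
                for t in successors(s):          # "send" t to its owner
                    queues[owner(t)].append(t)
    return visited          # the whole state space, merged over the parts

# toy LTS on states 0..99 with transitions s -> s+1 and s -> 2s (mod 100)
parts = explore(0, lambda s: {(s + 1) % 100, (2 * s) % 100}, n_nodes=4)
print(sum(len(p) for p in parts))                # 100 states, split 4 ways
```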


Book ChapterDOI
14 May 2001
TL;DR: An implementation of the RSA cryptosystem using RNS Montgomery multiplication is described, and an implementation method using the Chinese Remainder Theorem (CRT) is presented.
Abstract: We propose a fast parallel algorithm for Montgomery multiplication based on Residue Number Systems (RNS). An implementation of the RSA cryptosystem using the RNS Montgomery multiplication is described in this paper. We discuss how to choose the base size of the RNS and the number of parallel processing units. An implementation method using the Chinese Remainder Theorem (CRT) is also presented. An LSI prototype adopting the proposed Cox-Rower Architecture achieves 1024-bit RSA transactions in 4.2 msec without CRT and 2.4 msec with CRT, when the operating frequency is 80 MHz and the total number of logic gates is 333 Kgates for 11 parallel processing units.

128 citations
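
The CRT speedup mentioned at the end is standard RSA-CRT; a toy sketch with textbook-sized numbers (the paper's actual contribution, Montgomery multiplication carried out in a residue number system on the Cox-Rower hardware, is not reproduced):

```python
# toy RSA key: n = 61 * 53 = 3233, e = 17, d = 2753
p, q, e, d = 61, 53, 17, 2753
n = p * q

def decrypt_crt(c):
    """Two half-size exponentiations plus Garner recombination."""
    mp = pow(c, d % (p - 1), p)       # c^d mod p
    mq = pow(c, d % (q - 1), q)       # c^d mod q
    q_inv = pow(q, -1, p)             # needs Python >= 3.8
    return mq + q * ((q_inv * (mp - mq)) % p)

c = pow(65, e, n)                     # encrypt m = 65
assert decrypt_crt(c) == 65
```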


Journal ArticleDOI
TL;DR: Some local and parallel discretizations and adaptive finite element algorithms are proposed and analyzed for nonlinear elliptic boundary value problems in both two and three dimensions for finite element solutions on general shape-regular grids.
Abstract: In this paper, some local and parallel discretizations and adaptive finite element algorithms are proposed and analyzed for nonlinear elliptic boundary value problems in both two and three dimensions. The main technique is to use a standard finite element discretization on a coarse grid to approximate low frequencies and then to apply some linearized discretization on a fine grid to correct the resulting residual (which contains mostly high frequencies) by some local/parallel procedures. The theoretical tools for analyzing these methods are some local a priori and a posteriori error estimates for finite element solutions on general shape-regular grids that are also obtained in this paper.
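
In symbols, the coarse-solve-plus-local-correction step described above can be sketched as follows (notation ours, following standard two-grid formulations):

```latex
% coarse, global: nonlinear solve in the coarse space V_H
\text{find } u_H \in V_H : \quad a(u_H;\, v) = (f, v) \quad \forall v \in V_H,
% fine, local/parallel: linearized correction on each subdomain \Omega_j
\text{find } e_h \in V_h(\Omega_j) : \quad
    a'(u_H)(e_h, v) = (f, v) - a(u_H;\, v) \quad \forall v \in V_h(\Omega_j),
% combine: u_h \approx u_H + e_h, the correction carrying the high frequencies
```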

Journal ArticleDOI
TL;DR: Scalability tests of these algorithms for density-functional-theory based electronic-structure calculations show that the linear-scaling DFT algorithm is highly scalable.

Journal ArticleDOI
TL;DR: This paper presents a performance model for long-running parallel computations that execute with checkpointing enabled, discusses how it is relevant to today's parallel computing environments and software, and presents case studies of using the model to select runtime parameters.

Journal ArticleDOI
TL;DR: The algorithm proposed is based on concepts used in parallel genetic algorithms and local search heuristics and employs the Island model in which the migration frequency must not be very high.

Journal ArticleDOI
TL;DR: pSPADE decomposes the original search space into smaller suffix-based classes, which can be solved in main memory using efficient search techniques and simple join operations with no synchronization.

Journal ArticleDOI
TL;DR: The GA, an evolution-like algorithm that is applied to a large population of RNA structures based on a pool of helical stems derived from an RNA sequence, evolves this population in parallel.
Abstract: A massively parallel Genetic Algorithm (GA) has been applied to RNA sequence folding on three different computer architectures. The GA, an evolution-like algorithm that is applied to a large population of RNA structures based on a pool of helical stems derived from an RNA sequence, evolves this population in parallel. The algorithm was originally designed and developed for a 16,384-processor SIMD (Single Instruction Multiple Data) MasPar MP-2. More recently it has been adapted to a 64-processor MIMD (Multiple Instruction Multiple Data) SGI ORIGIN 2000 and a 512-processor MIMD CRAY T3E. The MIMD version of the algorithm raises issues concerning RNA structure data layout and processor communication. In addition, the effects of population variation on the predicted results are discussed. Also presented are the scaling properties of the algorithm from the perspective of the number of physical processors utilized and the number of virtual processors (RNA structures) operated upon.

Journal ArticleDOI
TL;DR: The performance results with a commercial air traffic control simulation demonstrate that cloning can significantly reduce the time required to compute multiple alternate futures.
Abstract: We present a cloning mechanism that enables the evaluation of multiple simulated futures. Performance of the mechanism is analyzed and evaluated experimentally on a shared memory multiprocessor. A running parallel discrete event simulation is dynamically cloned at decision points to explore different execution paths concurrently. In this way, what-if and alternative scenario analysis can be performed in applications such as gaming or tactical and strategic battle management. A construct called virtual logical processes avoids repeating common computations among clones and improves efficiency. The advantages of cloning are preserved regardless of the number of clones (or execution paths). Our performance results with a commercial air traffic control simulation demonstrate that cloning can significantly reduce the time required to compute multiple alternate futures.

Journal ArticleDOI
01 Feb 2001
TL;DR: Numerical tests indicate that the sequential version of the Tabu search algorithm is highly competitive with the best existing heuristics and that the parallel algorithm outperforms all of these algorithms.
Abstract: We present a Tabu search algorithm for the vehicle routing problem under capacity and distance restrictions. The neighborhood search is based on compound moves generated by a node-ejection chain process. During the course of the algorithm, two types of neighborhood structures are used, and crossing infeasible solutions is allowed. Then, a parallel version of the algorithm which exploits the moves' characteristics is described. Parallel processing is used to explore the solution space more extensively and to accelerate the search process. Tests are carried out on a Sun SPARC workstation, and the parallel algorithm uses a network of four of these machines. Numerical tests indicate that the sequential version of the algorithm is highly competitive with the best existing heuristics and that the parallel algorithm outperforms all of these algorithms.

Journal ArticleDOI
TL;DR: A multiblock parallel Euler/Navier-Stokes solver using multigrid and dual-time stepping is developed along with the moving mesh method for unsteady flow calculations of airfoils and wings with deforming shapes as found in flutter simulations.
Abstract: A novel parallel dynamic moving mesh algorithm designed for multiblock parallel unsteady flow calculations using body-fitted grids is presented. The moving grid algorithm within each block uses a method of arc-length-based transfinite interpolation, which is performed independently on local processors where the blocks reside. A spring network approach is used to determine the motion of the corner points of the blocks, which may be connected in an unstructured fashion in a general multiblock method. A smoothing operator is applied to the points of the block face boundaries and edges to maintain grid smoothness and grid angles. A multiblock parallel Euler/Navier-Stokes solver using multigrid and dual-time stepping is developed along with the moving mesh method. Computational results are presented for unsteady flow calculations of airfoils and wings with deforming shapes, as found in flutter simulations.

Journal ArticleDOI
TL;DR: When BICAV is optimized for block size and relaxation parameters, its very first iterates are far superior to those of CAV, and more or less on a par with ART.
Abstract: Component averaging (CAV) was recently introduced by Censor, Gordon, and Gordon as a new iterative parallel technique suitable for large and sparse unstructured systems of linear equations. Based on earlier work of Byrne and Censor, it uses diagonal weighting matrices, with pixel-related weights determined by the sparsity of the system matrix. CAV is inherently parallel (similar to the very slowly converging Cimmino method) but its practical convergence on problems of image reconstruction from projections is similar to that of the algebraic reconstruction technique (ART). Parallel techniques are becoming more important for practical image reconstruction since they are relevant not only for supercomputers but also for the increasingly prevalent multiprocessor workstations. This paper reports on experimental results with a block-iterative version of component averaging (BICAV). When BICAV is optimized for block size and relaxation parameters, its very first iterates are far superior to those of CAV, and more or less on a par with ART. Similar to CAV, BICAV is also inherently parallel. The fast convergence is demonstrated on problems of image reconstruction from projections, using the SNARK93 image reconstruction software package. Detailed plots of various measures of convergence, and reconstructed images are presented.

Journal ArticleDOI
TL;DR: The design of a parallel algorithm that uses moving fluids in a three-dimensional microfluidic system to solve a nondeterministically polynomial complete problem (the maximal clique problem) in polynomial time is described.
Abstract: This paper describes the design of a parallel algorithm that uses moving fluids in a three-dimensional microfluidic system to solve a nondeterministically polynomial complete problem (the maximal clique problem) in polynomial time. This algorithm relies on (i) parallel fabrication of the microfluidic system, (ii) parallel searching of all potential solutions by using fluid flow, and (iii) parallel optical readout of all solutions. This algorithm was implemented to solve the maximal clique problem for a simple graph with six vertices. The successful implementation of this algorithm to compute solutions for small-size graphs with fluids in microchannels is not useful, per se, but does suggest broader application for microfluidics in computation and control.

Journal ArticleDOI
TL;DR: Two new system architectures, overlap-state sequential and split-and-merge parallel, are proposed based on a novel boundary postprocessing technique for the computation of the discrete wavelet transform (DWT) to introduce multilevel partial computations for samples near data boundaries.
Abstract: In this paper, two new system architectures, overlap-state sequential and split-and-merge parallel, are proposed based on a novel boundary postprocessing technique for the computation of the discrete wavelet transform (DWT). The basic idea is to introduce multilevel partial computations for samples near data boundaries based on a finite state machine model of the DWT derived from the lifting scheme. The key observation is that these partially computed (lifted) results can also be stored back to their original locations and the transform can be continued anytime later, as long as these partially computed results are preserved. It is shown that such an extension of the in-place calculation feature of the original lifting algorithm greatly helps to reduce the extra buffer and communication overheads in sequential and parallel system implementations, respectively. Performance analysis and experimental results show that, for the Daubechies (9,7) wavelet filters (see J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247-269, 1998), using the proposed boundary postprocessing technique, the minimal required buffer size in the line-based sequential DWT algorithm is 40% less than in the best available approach. In the parallel DWT algorithm we show 30% faster performance than existing approaches.
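
To illustrate the in-place lifting property the overlap-state idea builds on, here is one level of the simpler Le Gall 5/3 wavelet via lifting (the paper analyzes the (9,7) filters, which have the same in-place structure with more lifting steps; names and boundary handling are ours):

```python
import numpy as np

def lift_53_inplace(x):
    """Forward 5/3 DWT of an even-length float array, no auxiliary buffer:
    after the two lifting steps the even slots hold the low-pass band and
    the odd slots the high-pass band.  A partially lifted block can be
    stored as-is and finished later -- the 'overlap-state' observation."""
    # predict: odd samples become high-pass residuals (symmetric right edge)
    x[1::2] -= np.floor((x[::2] + np.append(x[2::2], x[-2])) / 2)
    # update: even samples become the low-pass signal
    d = x[1::2]
    x[::2] += np.floor((np.append(d[0], d[:-1]) + d + 2) / 4)
    return x

x = np.arange(16, dtype=float)
lift_53_inplace(x)      # evens ~ smoothed signal, odds ~ 0 on this ramp
```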

Proceedings ArticleDOI
01 May 2001
TL;DR: This paper investigates the approach of using low cost PC cluster to parallelize the computation of iceberg-cube queries and recommends a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.
Abstract: In this paper, we investigate the approach of using low cost PC cluster to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets where it is assumed that the total cube has net been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation and I/0. Our main contribution is the development and a comprehensive evaluation of various novel, parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/0 by outputting results in a more efficient way; (3) Algorithm ASL, which maintains cells in a cuboid in a skiplist, is designed to put the utmost priority on load balancing; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible.We present a thorough performance evaluation on all these algorithms on a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that it is not a one-algorithm-fit-all situation. We recommend a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.

Journal ArticleDOI
TL;DR: This paper resolves a long-standing open problem on whether the concurrent write capability of the parallel random access machine (PRAM) is essential for solving fundamental graph problems like connected components and minimum spanning trees in logarithmic time.
Abstract: This paper resolves a long-standing open problem on whether the concurrent write capability of the parallel random access machine (PRAM) is essential for solving fundamental graph problems like connected components and minimum spanning trees in O(log n) time. Specifically, we present a new algorithm to solve these problems in O(log n) time using a linear number of processors on the exclusive-read exclusive-write PRAM. The logarithmic time bound is actually optimal since it is well known that even computing the "OR" of n bits requires Ω(log n) time on the exclusive-write PRAM. The efficiency achieved by the new algorithm is based on a new schedule which can exploit a high degree of parallelism.
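
For contrast, a sequential sketch of the classic hook-and-compress scheme that such parallel connected-components algorithms are built on (each round hooks labels onto smaller neighboring labels, then pointer-jumps; the paper's contribution, scheduling this without concurrent writes, is not modeled here):

```python
def connected_components(n, edges):
    label = list(range(n))
    changed = True
    while changed:                       # O(log n) rounds in the PRAM setting
        changed = False
        for u, v in edges:               # hooking: adopt the smaller label
            a, b = label[u], label[v]
            if a != b:
                label[max(a, b)] = min(a, b)
                changed = True
        for v in range(n):               # pointer jumping: compress chains
            while label[v] != label[label[v]]:
                label[v] = label[label[v]]
    return label

print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))   # [0, 0, 0, 3, 3, 5]
```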

Book ChapterDOI
28 May 2001
TL;DR: Using a simple model of hierarchical memories, mathematics is employed to determine a locally-optimal strategy for blocking matrices and the resulting family of algorithms yields performance that is superior to that of methods that automatically tune such kernels.
Abstract: During the last half-decade, a number of research efforts have centered around developing software for generating automatically tuned matrix multiplication kernels. These include the PHiPAC project and the ATLAS project. The software end-products of both projects employ brute force to search a parameter space for blockings that accommodate multiple levels of memory hierarchy. We take a different approach: using a simple model of hierarchical memories, we employ mathematics to determine a locally-optimal strategy for blocking matrices. The theoretical results show that, depending on the shape of the matrices involved, different strategies are locally-optimal. Rather than determining a blocking strategy at library generation time, the theoretical results show that, ideally, one should pursue a heuristic that allows the blocking strategy to be determined dynamically at run-time as a function of the shapes of the operands. When the resulting family of algorithms is combined with a highly optimized inner-kernel for a small matrix multiplication, the approach yields performance that is superior to that of methods that automatically tune such kernels. Preliminary results, for the Intel Pentium III processor, support the theoretical insights.
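
A minimal sketch of the blocking being discussed (in the paper the block sizes fall out of the memory-hierarchy model and, ideally, the operand shapes at run time; here they are plain parameters, and the names are ours):

```python
import numpy as np

def blocked_matmul(A, B, mb=64, nb=64, kb=64):
    """C = A @ B computed block by block, so each small product
    A[i:i+mb, p:p+kb] @ B[p:p+kb, j:j+nb] stays cache-resident."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(0, m, mb):
        for j in range(0, n, nb):
            for p in range(0, k, kb):
                C[i:i+mb, j:j+nb] += A[i:i+mb, p:p+kb] @ B[p:p+kb, j:j+nb]
    return C

A, B = np.random.rand(200, 300), np.random.rand(300, 100)
assert np.allclose(blocked_matmul(A, B), A @ B)
```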

Journal ArticleDOI
TL;DR: A numerical method designed for modelling different kinds of astrophysical flows in three dimensions employs the local shearing-box technique and uses parallel algorithms to increase the performance of standard serial methods.
Abstract: In this paper we describe a numerical method designed for modelling different kinds of astrophysical flows in three dimensions. Our method is a standard explicit finite difference method employing the local shearing-box technique. To model the features of astrophysical systems, which are usually compressible, magnetised and turbulent, it is desirable to have high spatial resolution and a large domain size so as to model as many features as possible, on various scales, within a particular system. In addition, the time-scales involved are usually wide-ranging, also requiring significant amounts of CPU time. These two limits (resolution and time-scales) place huge demands on computational capabilities. The model we have developed therefore uses parallel algorithms to increase the performance of standard serial methods. The aim of this paper is to report the numerical methods we use and the techniques invoked for parallelising the code. The justification of these methods is given by the extensive tests presented herein.

Journal ArticleDOI
TL;DR: This work presents and analyzes an asynchronous algorithm that is a generalization of the naive two-processor algorithm where the two processes each start at one side of the array and walk towards each other until they collide.
Abstract: The problem of using P processes to write a given value to all positions of a shared array of size N is called the Write-All problem. We present and analyze an asynchronous algorithm with work complexity O(N·P^(log((x+1)/x))), where x ≈ N^(1/log P) (assuming N = x^k and P = 2^k). Our algorithm is a generalization of the naive two-processor algorithm where the two processes each start at one side of the array and walk towards each other until they collide.
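
The naive two-process algorithm that the paper generalizes, sketched with threads (the array starts all 0; each worker writes 1s from its own end and stops at the first cell the other has already covered):

```python
import threading

def write_all_two(n):
    a = [0] * n
    def sweep(indices):
        for i in indices:
            if a[i] == 1:         # collision: the other worker got here first
                break
            a[i] = 1              # both may write the same value; that's fine
    t1 = threading.Thread(target=sweep, args=(range(n),))
    t2 = threading.Thread(target=sweep, args=(reversed(range(n)),))
    t1.start(); t2.start(); t1.join(); t2.join()
    return a

assert write_all_two(10) == [1] * 10   # fully covered even if one worker stalls
```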

Journal ArticleDOI
TL;DR: Numerical results show that this new meshfree contact algorithm can accurately predict the contact as well as the separation of projectile and target.

Journal ArticleDOI
TL;DR: A structured neural network implementing the gradient projection algorithm is developed to solve the quadratic programming problem in constrained model predictive control in a massively parallel fashion with guaranteed convergence to optimal solution.

Journal ArticleDOI
TL;DR: Domain decomposition algorithms for the parallel numerical solution of parabolic equations are studied for steady-state or slowly varying unsteady computation, showing that the resulting schemes are of second-order global accuracy in space and stable in the sense of Osher or in $L_\infty$.
Abstract: Domain decomposition algorithms for the parallel numerical solution of parabolic equations are studied for steady-state or slowly varying unsteady computation. Implicit schemes are used in order to march with large time steps. Parallelization is achieved by approximating interface values using explicit computation. Various techniques are examined, including a multistep second-order explicit scheme and a one-step high-order scheme. We show that the resulting schemes are of second-order global accuracy in space and stable in the sense of Osher or in $L_\infty$. They are optimized with respect to parallel efficiency.
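
A 1-D sketch of the splitting this abstract describes, for the heat equation u_t = u_xx (the interface value is predicted with one explicit step, after which each subdomain does an independent, hence parallel, implicit solve; the discretization details and names are ours, not the paper's exact schemes):

```python
import numpy as np

def dd_heat_step(u, nu, iface):
    """One time step, nu = dt/dx^2.  Step 1: explicit (FTCS) prediction at
    the interface points.  Step 2: backward-Euler solve strictly inside
    each subdomain, using the predicted interfaces as Dirichlet data."""
    u_new = u.copy()
    for i in iface:
        u_new[i] = u[i] + nu * (u[i - 1] - 2 * u[i] + u[i + 1])
    bounds = [0] + list(iface) + [len(u) - 1]
    for a, b in zip(bounds[:-1], bounds[1:]):
        m = b - a - 1                      # unknowns strictly between a and b
        if m <= 0:
            continue
        A = (np.diag((1 + 2 * nu) * np.ones(m))
             + np.diag(-nu * np.ones(m - 1), 1)
             + np.diag(-nu * np.ones(m - 1), -1))
        rhs = u[a + 1:b].copy()
        rhs[0] += nu * u_new[a]            # Dirichlet data from step 1
        rhs[-1] += nu * u_new[b]
        u_new[a + 1:b] = np.linalg.solve(A, rhs)
    return u_new

x = np.linspace(0.0, 1.0, 41)
u = np.sin(np.pi * x)                      # exact solution decays as exp(-pi^2 t)
for _ in range(20):
    u = dd_heat_step(u, nu=0.4, iface=[20])
```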