Showing papers on "Parallel algorithm published in 2000"

PDF

Open Access

Journal Article•DOI•

The GRID: Blueprint for a New Computing Infrastructure

[...]

01 Jan 2000-Scalable Computing: Practice and Experience

TL;DR: The main purpose is to update the designers and users of parallel numerical algorithms with the latest research in the field and present the novel ideas, results and work in progress and advancing state-of-the-art techniques in the area of parallel and distributed computing for numerical and computational optimization problems in scientific and engineering application.

...read moreread less

Abstract: Edited by Tianruo Yang Kluwer Academic Publisher, Dordrech, Netherlands, 1999, 248 pp. ISBN 0-7923-8588-8, $135.00 This book contains a selection of contributed and invited papers presented and the workshop Frontiers of Parallel Numerical Computations and Applications, in the IEEE 7th Symposium on the Frontiers on Massively Parallel Computers (Frontiers '99) at Annapolis, Maryland, February 20-25, 1999. Its main purpose is to update the designers and users of parallel numerical algorithms with the latest research in the field. A broad spectrum of topics on parallel numerical computations, with applications to some of the more challenging engineering problems, is covered. Parallel algorithm designers and engineers who use extensively parallel numerical computations, as well as graduate students in Computer Science, Scientific Computing, various engineering fields and applied mathematics should benefit from reading it. The first part is addressed to a larger audience and presents papers on parallel numerical algorithms. Two new libraries are presented: PSPASSES and PoLAPACK. PSPASSES is a collection of parallel direct solvers, for sparse symmetric positive definite linear systems, which are characterized by high performance and good scalability. PoLAPACK library contains LU and QR codes based on a new blocking strategy that guarantees good performance regardless of the physical block size. Next, an efficient approach to solving stiff ordinary differential equations by diagonal implicitly iterated Runge-Kutta (DIIRK) method is described. DIIRK renders a fast parallel implementation due to a reduced number of function evaluation and an automatic stepsize control mechanism. Finally, minimization of sufficiently smooth non-linear functionals is sought via parallel space decomposition. Here, a theoretical background of the problem and two equivalent algorithms are presented. New research directions for classical solvers are treated in the next three papers: first, reduction of the global synchronization in the biconjugate gradient method, second, a new more efficient Jacobi ordering for the multiple-port hypercubes, and finally, an analysis of the theoretical performance of an improved version of the Quasi-minimal residual method. Parallel numerical applications constitute the second part of the book, with results from fluid mechanics, material sciences, applications to signal and image processing, dynamic systems, semiconductor technology and electronic circuits and systems design. With one exception, the authors expose in detail parallel implementations of the algorithms and numerical results. First, a 3D-elasticity problem is solved using an additive overlapping domain decomposition algorithm. Second, an overlapping mesh technique is used in a parallel solver for the compressible flow problem. Then, a parallel version of a complex numerical algorithm to solve a lubrication problem studied in tribology is introduced. Next, a timid approach to parallel computing of the cavity flow by the finite element method is presented. The problem solved is rather small for today's needs and only up to 6 processors are used. This is also the only paper that does not present results from numerical experiments. The remaining applications discussed in the subsequent chapters are: large scale multidisciplinary design optimization problem with application to the design of a supersonic commercial aircraft, a report on progress in parallel solution of an electromagnetic scattering problem using boundary integral methods and an optimal solution to the convection-diffusion equation modeling the concentration of a pollutant in the air. The book is of definite interest to readers who keep up-to-date with the parallel numerical computation research. The main purpose, to present the novel ideas, results and work in progress and advancing state-of-the-art techniques in the area of parallel and distributed computing for numerical and computational optimization problems in scientific and engineering application is clearly achieved. However, due to its content it cannot serve as a textbook for a computer science or engineering class. Overall, is a reference type book to be kept by specialists and in a library rather than a book to be purchased for self-introduction to the field. Most of the papers presented are results of ongoing research and so they rely heavily on previous results. On the other hand, with only one exception, the results presented in the papers are a great source of information for the researchers currently involved in the field. Michelle Pal, Los Alamos National Laboratory

...read moreread less

4,696 citations

Journal Article•DOI•

Multifrontal parallel distributed symmetric and unsymmetric solvers

[...]

Patrick R. Amestoy, Iain S. Duff, Jean-Yves L'Excellent

14 Apr 2000-Computer Methods in Applied Mechanics and Engineering

TL;DR: In this paper, a new parallel distributed memory multifrontal approach is described to handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been developed.

...read moreread less

940 citations

Journal Article•DOI•

M.DynaMix – a scalable portable parallel MD simulation package for arbitrary molecular mixtures

[...]

Alexander P. Lyubartsev¹, Aatto Laaksonen¹•Institutions (1)

Stockholm University¹

01 Jun 2000-Computer Physics Communications

TL;DR: A general purpose, scalable parallel molecular dynamics package for simulations of arbitrary mixtures of flexible or rigid molecules is presented, which allows use of most types of conventional molecular-mechanical force fields and contains a variety of auxiliary terms for inter- and intramolecular interactions, including an harmonic bond-stretchings.

...read moreread less

378 citations

Journal Article•DOI•

Local and parallel finite element algorithms based on two-grid discretizations

[...]

Jinchao Xu¹, Aihui Zhou²•Institutions (2)

Pennsylvania State University¹, Academia Sinica²

01 Jul 2000-Mathematics of Computation

TL;DR: A number of new local and parallel discretization and adaptive nite element algorithms are proposed and analyzed in this paper for elliptic boundary value problems and the main idea is to use a coarse grid to approximate the low frequencies and then to correct the resulted residue (which contains mostly high frequencies) by some local/parallel procedures.

...read moreread less

Abstract: A number of new local and parallel discretization and adaptive nite element algorithms are proposed and analyzed in this paper for elliptic boundary value problems. These algorithms are motivated by the observation that, for a solution to some elliptic problems, low frequency components can be approximated well by a relatively coarse grid and high frequency components can be computed on a ne grid by some local and parallel procedure. The theoretical tools for analyzing these methods are some local a priori and a posteriori estimates that are also obtained in this paper for nite element solutions on general shape-regular grids. Some numerical experiments are also presented to support the theory. In this paper, we will propose some new parallel techniques for nite element computation. These techniques are based on our understanding of the local and global properties of a nite element solution to some elliptic problems. Simply speaking, the global behavior of a solution is mostly governed by low frequency components while the local behavior is mostly governed by high frequency compo- nents. The main idea of our new algorithms is to use a coarse grid to approximate the low frequencies and then to use a ne grid to correct the resulted residue (which contains mostly high frequencies) by some local/parallel procedures. Let us now give a somewhat more detailed but informal (and hopefully infor- mative) description of the main ideas and results in this paper. We consider the following very simple model problem posed on a convex polygonal domain R 2 : ( u + bru = f; in ;

...read moreread less

209 citations

Journal Article•DOI•

Exploiting multiple levels of parallelism in Molecular Dynamics based calculations via modern techniques and software paradigms on distributed memory computers

[...]

Mark E. Tuckerman¹, D.A. Yarne², Shane O. Samuelson³, Adam Hughes³, Glenn J. Martyna³ - Show less +1 more•Institutions (3)

Courant Institute of Mathematical Sciences¹, University of Pennsylvania², Indiana University³

09 Jun 2000-Computer Physics Communications

TL;DR: Modern molecular dynamics methods are reviewed and their application to quantum manybody systems and electronic structure calculations described, and it is shown how modern object oriented programming paradigms can be employed to implement multilevel parallel algorithms in a large computational package rapidly and efficiently.

...read moreread less

205 citations

Journal Article•DOI•

Efficient parallel genetic algorithms: theory and practice

[...]

Erick Cantú-Paz¹, David E. Goldberg¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

09 Jun 2000-Computer Methods in Applied Mechanics and Engineering

TL;DR: In this article, a simple GA is optimized to reach a desired solution of a desired quality and its execution time is optimized, and two bounding cases of migration rate and topology are analyzed, and the case that yields good speedups is optimized.

...read moreread less

166 citations

Journal Article•DOI•

Scalable parallel data mining for association rules

[...]

Eui-Hong Han¹, George Karypis, Vipin Kumar•Institutions (1)

University of Minnesota¹

01 May 2000-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as much as IDD withrespect to increasing candidate set size.

...read moreread less

Abstract: The authors propose two new parallel formulations of the Apriori algorithm (R Agrawal and R Srikant, 1994) that is used for computing association rules These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD But IDD suffers from the added cost due to communication of transactions among processors HD is a hybrid algorithm that combines the advantages of CD and DD Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size

...read moreread less

165 citations

Journal Article•DOI•

State estimation distributed processing [for power systems]

[...]

R. Ebrahimian, R. Baldick

01 Dec 2000-IEEE Transactions on Power Systems

TL;DR: The authors apply the auxiliary problem principle to develop a distributed state estimator, demonstrating its performance on the Electric Reliability Council of Texas (ERCOT) and the Southwest Power Pool (SPP) systems.

...read moreread less

Abstract: This paper presents an application of a parallel algorithm to power systems state estimation The authors apply the auxiliary problem principle to develop a distributed state estimator, demonstrating its performance on the Electric Reliability Council of Texas (ERCOT) and the Southwest Power Pool (SPP) systems

...read moreread less

158 citations

Book•DOI•

Large-scale parallel data mining

[...]

Mohammed J. Zaki, Ching-Tien Ho

01 Jan 2000

TL;DR: A High Performance Implementation of the Data Space Transfer Protocol (DSTP) and an efficient Parallel Classification Using Dimensional Aggregates for Mining Associations are implemented.

...read moreread less

Abstract: Large-Scale Parallel Data Mining.- Parallel and Distributed Data Mining: An Introduction.- Mining Frameworks.- The Integrated Delivery of Large-Scale Data Mining: The ACSys Data Mining Project.- A High Performance Implementation of the Data Space Transfer Protocol (DSTP).- Active Mining in a Distributed Setting.- Associations and Sequences.- Efficient Parallel Algorithms for Mining Associations.- Parallel Branch-and-Bound Graph Search for Correlated Association Rules.- Parallel Generalized Association Rule Mining on Large Scale PC Cluster.- Parallel Sequence Mining on Shared-Memory Machines.- Classification.- Parallel Predictor Generation.- Efficient Parallel Classification Using Dimensional Aggregates.- Learning Rules from Distributed Data.- Clustering.- Collective, Hierarchical Clustering from Distributed, Heterogeneous Data.- A Data-Clustering Algorithm on Distributed Memory Multiprocessors.

...read moreread less

155 citations

Proceedings Article•DOI•

Segmented min-min: a static mapping algorithm for meta-tasks on heterogeneous computing systems

[...]

Min-You Wu, Wei Shu, H. Zhang

01 May 2000

TL;DR: An algorithm is presented which improves the min-min algorithm by scheduling large tasks first, balances the load well and demonstrates even better performance in both makespan and running time.

...read moreread less

Abstract: The min-min algorithm is a simple algorithm. It runs fast and delivers good performance. However, the min-min algorithm schedules small tasks first, resulting in some load imbalance. We present an algorithm which improves the min-min algorithm by scheduling large tasks first. The new algorithm, segmented min-min, balances the load well and demonstrates even better performance in both makespan and running time.

...read moreread less

150 citations

Proceedings Article•DOI•

A multiobjective genetic algorithm for radio network optimization

[...]

H. Meunier, El-Ghazali Talbi¹, P. Reininger²•Institutions (2)

university of lille¹, CNET²

16 Jul 2000

TL;DR: A genetic algorithm (GA) that aims to approximate the Pareto frontier of the problem, and has been implemented in parallel on a network of workstations to speed up the search.

...read moreread less

Abstract: Engineering of mobile telecommunication networks endures two major problems: the design of the network and the frequency assignment. We address the first problem in this paper, which has been formulated as a multiobjective constrained combinatorial optimisation problem. We propose a genetic algorithm (GA) that aims to approximate the Pareto frontier of the problem. Advanced techniques have been used, such as Pareto ranking, sharing and elitism. The GA has been implemented in parallel on a network of workstations to speed up the search. To evaluate the performance of the GA, we have introduced two new quantitative indicators: the entropy and the contribution. Encouraging results are obtained on real-life problems.

...read moreread less

Journal Article•DOI•

A Scalable Parallel Algorithm for Incomplete Factor Preconditioning

[...]

David Hysom, Alex Pothen

01 Jun 2000-SIAM Journal on Scientific Computing

TL;DR: This work describes a parallel algorithm for computing incomplete factor (ILU) preconditioners that attains a high degree of parallelism through graph partitioning and a two-level ordering strategy and shows that this algorithm is scalable.

...read moreread less

Abstract: We describe a parallel algorithm for computing incomplete factor (ILU) preconditioners. The algorithm attains a high degree of parallelism through graph partitioning and a two-level ordering strategy. Both the subdomains and the nodes within each subdomain are ordered to preserve concurrency. We show through an algorithmic analysis and through computational results that this algorithm is scalable. Experimental results include timings on three parallel platforms for problems with up to 20 million unknowns running on up to 216 processors. The resulting preconditioned Krylov solvers have the desirable property that the number of iterations required for convergence is insensitive to the number of processors.

...read moreread less

Journal Article•DOI•

Applying recursion to serial and parallel QR factorization leads to better performance

[...]

Erik Elmroth¹, Fred G. Gustavson²•Institutions (2)

Umeå University¹, IBM²

01 Jul 2000-Ibm Journal of Research and Development

TL;DR: A hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices is introduced.

...read moreread less

Abstract: We present new recursive serial and parallel algorithms for QR factorization of an m by n matrix. They improve performance. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a standard block algorithm with Level 3 operations. However, there are significant additional costs for creating and performing the updates, which prohibit the efficient use of the recursion for large n. We present a quantitative analysis of these extra costs. This analysis leads us to introduce a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices. Uniprocessor performance results are presented for two IBM RS/6000® SP nodes-a 120-MHz IBM POWER2 node and one processor of a four-way 332-MHz IBM PowerPC® 604e SMP node. The hybrid recursive algorithm reaches more than 90% of the theoretical peak performance of the POWER2 node. Compared to standard block algorithms, the recursive approach also shows a significant advantage in the automatic tuning obtained from its automatic variable blocking. A successful parallel implementation on a four-way 332-MHz IBM PPC604e SMP node based on dynamic load balancing is presented. For two, three, and four processors it shows speedups of up to 1.97, 2.99, and 3.97.

...read moreread less

Book Chapter•DOI•

Parallel Multilevel Algorithms for Multi-Constraint Graph Partitioning

[...]

Kirk Schloegel¹, George Karypis¹, Vipin Kumar¹•Institutions (1)

University of Minnesota¹

29 Aug 2000

TL;DR: In this paper, a parallel formulation of a recently developed multi-constraint graph partitioning algorithm is presented, which is able to efficiently compute partitionings of similar edge-cuts as serial multiconstraints algorithms, and can scale to very large graphs.

...read moreread less

Abstract: Sequential multi-constraint graph partitioners have been developed to address the load balancing requirements of multi-phase simulations. The efficient execution of large multi-phase simulations on high performance parallel computers requires that the multi-constraint partitionings are computed in parallel. This paper presents a parallel formulation of a recently developed multi-constraint graph partitioning algorithm. We describe this algorithm and give experimental results conducted on a 128-processor Cray T3E. We show that our parallel algorithm is able to efficiently compute partitionings of similar edge-cuts as serial multi-constraint algorithms, and can scale to very large graphs. Our parallel multi-constraint graph partitioner is able to compute a three-constraint 128-way partitioning of a 7.5 million node graph in about 7 seconds on 128 processors of a Cray T3E.

...read moreread less

Journal Article•DOI•

Strategies for massively parallel local-orbital-based electronic structure methods

[...]

Mark R. Pederson¹, Dirk Porezag¹, Jens Kortus¹, David C. Patton¹•Institutions (1)

United States Naval Research Laboratory¹

01 Jan 2000-Physica Status Solidi B-basic Solid State Physics

TL;DR: This work discusses several aspects related to massively parallel electronic structure calculations using the gaussian-orbital based Naval Research Laboratory Molecular Orbital Library (NRLMOL), and refers to the algorithms for parallelizing such problems as "honey-bee algorithms" because they are analogous to nature's way of generating honey.

...read moreread less

Abstract: We discuss several aspects related to massively parallel electronic structure calculations using the gaussian-orbital based Naval Research Laboratory Molecular Orbital Library (NRLMOL). While much of the discussion is specific to gaussian-orbital methods, we show that all of the computationally intensive problems encountered in this code are special cases of a general class of problems which allow for the generation of parallel code that is automatically dynamically load balanced. We refer to the algorithms for parallelizing such problems as “honey-bee algorithms” because they are analogous to nature's way of generating honey. With the use of such algorithms, BEOWULF clusters of personal computers are roughly equivalent to higher performance systems on a per processor basis. Further, we show that these algorithms are compatible with more complicated parallel programming architectures that are reasonable to anticipate. After specifically discussing several parallel algorithms, we discuss applications of this program to magnetic molecules.

...read moreread less

Journal Article•DOI•

Efficient hierarchical chaotic image encryption algorithm and its VLSI realisation

[...]

J. C. Yen¹, Jiun-In Guo•Institutions (1)

National United University¹

01 Apr 2000

TL;DR: An efficient hierarchical chaotic image encryption algorithm and its VLSI architecture and its FPGA realisation of its key modules are proposed and its fractal dimension is computed to demonstrate the effectiveness of the proposed scheme.

...read moreread less

Abstract: An efficient hierarchical chaotic image encryption algorithm and its VLSI architecture are proposed. Based on a chaotic system and a permutation scheme, all the partitions of the original image are rearranged and the pixels in each partition are scrambled. Its properties of high security, parallel and pipeline processing, and no distortion are analysed. To implement the algorithm, its VLSI architecture with pipeline processing, real-time processing capability, and low hardware cost is designed and the FPGA realisation of its key modules is given. Finally, the encrypted image is simulated and its fractal dimension is computed to demonstrate the effectiveness of the proposed scheme.

...read moreread less

Book Chapter•DOI•

On Identifying Strongly Connected Components in Parallel

[...]

Lisa Fleischer¹, Bruce Hendrickson², Ali Pinar³•Institutions (3)

Columbia University¹, Sandia National Laboratories², University of Illinois at Urbana–Champaign³

01 May 2000

TL;DR: For a graph with n vertices in which degrees are bounded by a constant, the expected serial running time of their algorithm was O(n log n) as discussed by the authors, where n is the number of vertices.

...read moreread less

Abstract: The standard serial algorithm for strongly connected components is based on depth first search, which is difficult to parallelize. We describe a divide-and-conquer algorithm for this problem which has significantly greater potential for parallelization. For a graph with n vertices in which degrees are bounded by a constant, we sho w the expected serial running time of our algorithm to be O(n log n).

...read moreread less

Book Chapter•DOI•

A Hierarchical Genetic Algorithm Using Multiple Models for Optimization

[...]

Mourad Sefrioui, Jacques Periaux¹•Institutions (1)

Dassault Aviation¹

18 Sep 2000

TL;DR: The overall results are that a Hierarchical Genetic Algorithm using multiple models can achieve the same quality of results as that of a classic GA using only a complex model, but up to three times faster.

...read moreread less

Abstract: This article presents both the theoretical basis and some experimental results on Hierarchical Genetic Algorithms (HGAs). HGAs are explained in details, along with the advantages conferred by their multi-layered hierarchical topology. This topology is an excellent compromise to the classical exploration/exploitation dilemma. Another feature is the introduction of multiple models for optimization problems, within the frame of an HGA. We show that with such an architecture, it is possible to use a mix of simple models that are very fast and more complex models (with slower solvers), and still achieve the same quality as that obtained with only complex models. The different concepts presented in this paper are then illustrated via experiments on a Computational Fluid Dynamics problem, namely a nozzle reconstruction. The overall results are that a Hierarchical Genetic Algorithm using multiple models can achieve the same quality of results as that of a classic GA using only a complex model, but up to three times faster.

...read moreread less

Journal Article•DOI•

A New Paradigm for Parallel Adaptive Meshing Algorithms

[...]

Randolph E. Bank, Michael Holst

01 Apr 2000-SIAM Journal on Scientific Computing

TL;DR: This approach addresses the load balancing problem in a new way, requiring far less communication than current approaches, and allows existing sequential adaptive PDE codes such as PLTMG and MC to run in a parallel environment without a large investment in recoding.

...read moreread less

Abstract: We present a new approach to the use of parallel computers with adaptive finite element methods. This approach addresses the load balancing problem in a new way, requiring far less communication than current approaches. It also allows existing sequential adaptive PDE codes such as PLTMG and MC to run in a parallel environment without a large investment in recoding. In this new approach, the load balancing problem is reduced to the numerical solution of a small elliptic problem on a single processor, using a sequential adaptive solver, without requiring any modifications to the sequential solver. The small elliptic problem is used to produce a posteriori error estimates to predict future element densities in the mesh, which are then used in a weighted recursive spectral bisection of the initial mesh. The bulk of the calculation then takes place independently on each processor, with no communication, using possibly the same sequential adaptive solver. Each processor adapts its region of the mesh independently, and a nearly load-balanced mesh distribution is usually obtained as a result of the initial weighted spectral bisection. Only the initial fan-out of the mesh decomposition to the processors requires communication. Two additional steps requiring boundary exchange communication may be employed after the individual processors reach an adapted solution, namely, the construction of a global conforming mesh from the independent subproblems, followed by a final smoothing phase using the subdomain solutions as an initial guess. We present a series of convincing numerical experiments which illustrate the effectiveness of this approach. The justification of the initial refinement prediction step, as well as the justification of skipping the two communication-intensive steps, is supported by some recent [J. Xu and A. Zhou, Math. Comp., to appear] and not so recent [J. A. Nitsche and A. H. Schatz, Math. Comp., 28 (1974), pp. 937--958; A. H. Schatz and L. B. Wahlbin, Math. Comp., 31 (1977), pp. 414--442; A. H. Schatz and L. B. Wahlbin, Math. Comp., 64 (1995), pp. 907--928] results on local a priori and a posteriori error estimation.

...read moreread less

Journal Article•DOI•

Computation of Gauss-Kronrod of quadrature rules

[...]

Daniela Calvetti¹, Gene H. Golub², William B. Gragg³, Lothar Reichel⁴•Institutions (4)

Case Western Reserve University¹, Stanford University², Naval Postgraduate School³, Kent State University⁴

01 Jul 2000-Mathematics of Computation

TL;DR: A new algorithm is described that does not require the entries of the tridiagonal matrix to be determined, and thereby avoids computations that can be sensitive to perturbations.

...read moreread less

Abstract: Recently Laurie presented a new algorithm for the computation of (2n+1)-point Gauss-Kronrod quadrature rules with real nodes and positive weights. This algorithm first determines a symmetric tridiagonal matrix of order 2n + 1 from certain mixed moments, and then computes a partial spectral factorization. We describe a new algorithm that does not require the entries of the tridiagonal matrix to be determined, and thereby avoids computations that can be sensitive to perturbations. Our algorithm uses the consolidation phase of a divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. We also discuss how the algorithm can be applied to compute Kronrod extensions of Gauss-Radau and Gauss-Lobatto quadrature rules. Throughout the paper we emphasize how the structure of the algorithm makes efficient implementation on parallel computers possible. Numerical examples illustrate the performance of the algorithm.

...read moreread less

Journal Article•DOI•

A parallel finite-element tearing and interconnecting algorithm for solution of the vector wave equation with PML absorbing medium

[...]

C.T. Wolfe, U. Navsariwala, Stephen D. Gedney

01 Feb 2000-IEEE Transactions on Antennas and Propagation

TL;DR: It is shown that this FETI algorithm is highly scalable and is more efficient on parallel platforms when solving large matrices than traditional iterative methods such as a preconditioned conjugate gradient algorithm.

...read moreread less

Abstract: A domain decomposition method based on the finite-element tearing and interconnecting (FETI) algorithm is presented for the solution of the large sparse matrices associated with the finite-element method (FEM) solution of the vector wave equation. The FETI algorithm is based on the method of Lagrange multipliers and leads to a reduced-order system, which is solved using the biconjugate gradient method (BiCGM). It is shown that this method is highly scalable and is more efficient on parallel platforms when solving large matrices than traditional iterative methods such as a preconditioned conjugate gradient algorithm. This is especially true when a perfectly matched layer (PML) absorbing medium is used to terminate the problem domain.

...read moreread less

Journal Article•DOI•

A Parallel Grasp for the Steiner Tree Problem in Graphs Using a Hybrid Local Search Strategy

[...]

Simone Martins¹, Mauricio G. C. Resende², Celso C. Ribeiro¹, Panos M. Pardalos³•Institutions (3)

The Catholic University of America¹, AT&T², University of Florida³

01 Sep 2000-Journal of Global Optimization

TL;DR: A parallel greedy randomized adaptive search procedure (GRASP) for the Steiner problem in graphs and the main contribution of the parallel algorithm concerns the fact that larger speedups of the same order of the number of processors are obtained exactly for the most difficult problems.

...read moreread less

Abstract: In this paper, we present a parallel greedy randomized adaptive search procedure (GRASP) for the Steiner problem in graphs. GRASP is a two-phase metaheuristic. In the first phase, solutions are constructed using a greedy randomized procedure. Local search is applied in the second phase, leading to a local minimum with respect to a specified neighborhood. In the Steiner problem in graphs, feasible solutions can be characterized by their non-terminal nodes (Steiner nodes) or by their key-paths. According to this characterization, two GRASP procedures are described using different local search strategies. Both use an identical construction procedure. The first uses a node-based neighborhood for local search, while the second uses a path-based neighborhood. Computational results comparing the two procedures show that while the node-based variant produces better quality solutions, the path-based variant is about twice as fast. A hybrid GRASP procedure combining the two neighborhood search strategies is then proposed. Computational experiments with a parallel implementation of the hybrid procedure are reported, showing that the algorithm found optimal solutions for 45 out of 60 benchmark instances and was never off by more than 4% of the optimal solution value. The average speedup results observed for the test problems show that increasing the number of processors reduces elapsed times with increasing speedups. Moreover, the main contribution of the parallel algorithm concerns the fact that larger speedups of the same order of the number of processors are obtained exactly for the most difficult problems.

...read moreread less

Journal Article•DOI•

A Note On Parallel Matrix Inversion

[...]

E. S. Quintana¹, Gregorio Quintana¹, Xiaobai Sun¹, Robert A. van de Geijn•Institutions (1)

James I University¹

01 May 2000-SIAM Journal on Scientific Computing

TL;DR: This work presents one-sweep parallel algorithms for the inversion of general and symmetric positive definite matrices that feature simple programming and performance optimization while maintaining the same arithmetic cost and numerical properties of conventional inversion algorithms.

...read moreread less

Abstract: We present one-sweep parallel algorithms for the inversion of general and symmetric positive definite matrices. The algorithms feature simple programming and performance optimization while maintaining the same arithmetic cost and numerical properties of conventional inversion algorithms. Our experiments on a Cray T3E-600 and a Beowulf cluster demonstrate high performance of implementations for distributed memory parallel computers.

...read moreread less

Journal Article•DOI•

Scalable parallel graph coloring algorithms

[...]

Assefaw H. Gebremedhin¹, Fredrik Manne¹•Institutions (1)

University of Bergen¹

01 Oct 2000-Concurrency and Computation: Practice and Experience

TL;DR: A simple and fast parallel graph coloring heuristic that is well suited for shared memory programming and yields an almost linear speedup on the PRAM model is presented.

...read moreread less

Abstract: SUMMARY Finding a good graph coloring quickly is often a crucial phase in the development of efficient, parallel algorithms for many scientific and engineering applications. In this paper we consider the problem of solving the graph coloring problem itself in parallel. We present a simple and fast parallel graph coloring heuristic that is well suited for shared memory programming and yields an almost linear speedup on the PRAM model. We also present a second heuristic that improves on the number of colors used. The heuristics have been implemented using OpenMP. Experiments conducted on an SGI Cray Origin 2000 supercomputer using very large graphs from finite element methods and eigenvalue computations validate the theoretical run-time analysis. Copyright  2000 John Wiley & Sons, Ltd.

...read moreread less

Journal Article•DOI•

Aerodynamic and Aeroacoustic Optimization of Rotorcraft Airfoils via a Parallel Genetic Algorithm

[...]

Brian R. Jones¹, William A. Crossley¹, Anastasios S. Lyrintzis¹•Institutions (1)

Purdue University¹

01 Nov 2000-Journal of Aircraft

TL;DR: A parallel genetic algorithm (GA) methodology was developed to generate a family of two-dimensional airfoil designs that address rotorcraft aerodynamic and aeroacoustic concerns and exhibited favorable performance when compared with typical rotorcraft airfoils under identical design conditions using the same analysis routines.

...read moreread less

Abstract: A parallel genetic algorithm (GA) methodology was developed to generate a family of two-dimensional airfoil designs that address rotorcraft aerodynamic and aeroacoustic concerns The GA operated on 20 design variables, whichconstitutedthecontrolpointsforasplinerepresentingtheairfoilsurfaceTheGAtookadvantageofavailable computer resources by operating in either serial mode, where the GA and function evaluations were run on the same processor or “ manager/worker” parallel mode, where the GA runs on the manager processor and function evaluations areconducted independently on separate workerprocessors The multiple objectives of this work were to minimizethedrag and overall noiseof the airfoil Constraintswereplaced on liftcoefe cient, moment coefe cient, andboundary-layerconvergenceTheaerodynamicanalysiscodeXFOILprovidedpressureandsheardistributions in addition to liftand drag predictions Theaeroacousticanalysis code, WOPWOP, provided thicknessand loading noise predictions The airfoils comprising the resulting Pareto-optimal set exhibited favorable performance when compared with typical rotorcraft airfoils under identical design conditions using the same analysis routines The relationship between the quality of results and the analyses used in the optimization is also discussed The new airfoil shapes could provide starting points for further investigation

...read moreread less

Book Chapter•DOI•

Achieving Scalability in Parallel Reachability Analysis of Very Large Circuits

[...]

Tamir Heyman¹, Daniel Geist², Orna Grumberg¹, Assaf Schuster¹•Institutions (2)

Technion – Israel Institute of Technology¹, IBM²

15 Jul 2000

TL;DR: In this article, a scalable method for parallel symbolic reachability analysis on a distributed-memory environment of workstations is presented, which makes use of an adaptive partitioning algorithm which achieves high reduction of space requirements.

...read moreread less

Abstract: This paper presents a scalable method for parallel symbolic reachability analysis on a distributed-memory environment of workstations. Our method makes use of an adaptive partitioning algorithm which achieves high reduction of space requirements. The memory balance is maintained by dynamically repartitioning the state space throughout the computation. A compact BDD representation allows coordination by shipping BDDs from one machine to another, where different variable orders are allowed. The algorithm uses a distributed termination protocol with none of the memory modules preserving a complete image of the set of reachable states. No external storage is used on the disk; rather, we make use of the network which is much faster.

...read moreread less

Proceedings Article•DOI•

The new model of parallel genetic algorithm in multi-objective optimization problems - divided range multi-objective genetic algorithm

[...]

Tomoyuki Hiroyasu¹, Mitsunori Miki, Shinya Watanabe•Institutions (1)

Doshisha University¹

16 Jul 2000

TL;DR: The DRMOGA is a very suitable GA model for parallel processing, and in some cases it can derive better solutions compared to both the single-population model and the distributed model.

...read moreread less

Abstract: Proposes a divided-range multi-objective genetic algorithm (DRMOGA), which is a model for the parallel processing of genetic algorithms (GAs) for multi-objective problems. In the DRMOGA, the population of GAs is sorted with respect to the values of the objective function and divided into sub-populations. In each sub-population, a simple GA for multi-objective problems is performed. After some generations, all the individuals are gathered and they are sorted again. In this model, the Pareto-optimal solutions which are close to each other are collected into one sub-population. Therefore, this algorithm increases the calculation efficiency and a neighborhood search can be performed. Through numerical examples, the following facts become clear: (i) the DRMOGA is a very suitable GA model for parallel processing, and (ii) in some cases it can derive better solutions compared to both the single-population model and the distributed model.

...read moreread less

Proceedings Article•DOI•

A parallel tabu search based method for reconfigurations of distribution systems

[...]

Hiroyuki Mori¹, Y. Ogita¹•Institutions (1)

Meiji University¹

16 Jul 2000

TL;DR: This paper proposes parallel tabu search to reduce computational effort and enhance solution accuracy and introduces the adaptive memory called tabu list into the hill-climbing method of local search to escape from local minima.

...read moreread less

Abstract: This paper presents an efficient meta-heuristic method for reconfigurations of distribution systems. In this paper, parallel tabu search is used to reconfigure distribution systems so that active power losses are globally minimized with turning on/off sectionalizing switches. The loss minimization problem is one of the most important problems to save the operational cost in distribution systems. As a result, more efficient approaches are required to handle a combinatorial problem. This paper focuses on tabu search (TS) that introduces the adaptive memory called tabu list into the hill-climbing method of local search to escape from local minima. This paper proposes parallel tabu search to reduce computational effort and enhance solution accuracy. A couple of strategies are introduced into TS. One is to decompose the solution neighborhood of TS into the subneighborhood. The other is to consider different tabu lengths that make the solution more diverse. The proposed method is successfully applied to sample systems.

...read moreread less

Book Chapter•DOI•

Island Model Cooperating with Speciation for Multimodal Optimization

[...]

Mourad Bessaou¹, Alain Petrowski, Patrick Siarry²•Institutions (2)

Cergy-Pontoise University¹, University of Paris²

18 Sep 2000

TL;DR: A new method is considered that enables a genetic algorithm (GA) to identify and maintain multiple optima of a multimodal function, by creating subpopulations within the niches defined by themultiple optima, thus warranting a good "diversity".

...read moreread less

Abstract: This paper considers a new method that enables a genetic algorithm (GA) to identify and maintain multiple optima of a multimodal function, by creating subpopulations within the niches defined by the multiple optima, thus warranting a good "diversity". The algorithm is based on a splitting of the traditional GA into a sequence of two processes. Since the GA behavior is determined by the exploration / exploitation balance, during the first step (Exploration), the multipopulation genetic algorithm coupled with a speciation method detects the potential niches by classifying "similar" individuals in the same population. Once the niches are detected, the algorithm achieves an intensification (Exploitation), by allocating a separate portion of the search space to each population. These two steps are alternately performed at a given frequency. Empirical results obtained with F6 Schaffer's function are then presented to show the reliability of the algorithm.

...read moreread less

Journal Article•DOI•

Markov chain models of parallel genetic algorithms

[...]

Erick Cantú-Paz¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Sep 2000-IEEE Transactions on Evolutionary Computation

TL;DR: This paper presents models that predict the effects of the parallel GA parameters on its search quality by finding the probability that each population converges to the correct solution after each restart, and also calculate the long-run chance of success.

...read moreread less

Abstract: Implementations of parallel genetic algorithms (GA) with multiple populations are common, but they introduce several parameters whose effect on the quality of the search is not well understood. Parameters such as the number of populations, their size, the topology of communications, and the migration rate have to be set carefully to reach adequate solutions. This paper presents models that predict the effects of the parallel GA parameters on its search quality. The paper reviews some recent results on the case where each population is connected to all the others and the migration rate is set to the maximum value possible. This bounding case is the simplest to analyze, and it introduces the methodology that is used in the remainder of the paper to analyze parallel GA with arbitrary migration rates and communication topologies. This investigation considers that migration occurs only after each population converges; then, incoming individuals are incorporated into the populations and the algorithm restarts. The models find the probability that each population converges to the correct solution after each restart, and also calculate the long-run chance of success. The accuracy of the models is verified with experiments using one additively decomposable function.

...read moreread less

Collapse