
Showing papers on "Massively parallel" published in 1998


Journal ArticleDOI
12 Jun 1998-Science
TL;DR: The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it to easily route around defects, has significant implications for any future nanometer-scale computational paradigm.
Abstract: Teramac is a massively parallel experimental computer built at Hewlett-Packard Laboratories to investigate a wide range of different computational architectures. This machine contains about 220,000 hardware defects, any one of which could prove fatal to a conventional computer, and yet it operated 100 times faster than a high-end single-processor workstation for some of its configurations. The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it to easily route around defects, has significant implications for any future nanometer-scale computational paradigm. It may be feasible to chemically synthesize individual electronic components with less than a 100 percent yield, assemble them into systems with appreciable uncertainty in their connectivity, and still create a powerful and reliable data communications network. Future nanoscale computers may consist of extremely large-configuration memories that are programmed for specific tasks by a tutor that locates and tags the defects in the system.

895 citations


Journal ArticleDOI
01 Dec 1998
TL;DR: This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use, and briefly describes applications in benchmarking, Fast Fourier Transforms, sorting, and molecular dynamics.
Abstract: BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics.
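To make the DRMA style concrete, here is a minimal BSPlib sketch in C in which every process writes its identity into a registered variable on its right-hand neighbour, with the data becoming visible at the end of the superstep. The neighbour pattern and printout are illustrative choices, not taken from the paper.

```c
/* Minimal BSPlib DRMA example: each process puts its pid into a
 * registered variable on its right-hand neighbour, then synchronises. */
#include <stdio.h>
#include "bsp.h"

int main(void)
{
    bsp_begin(bsp_nprocs());          /* start the SPMD part on all processes */

    int p   = bsp_nprocs();           /* number of processes                   */
    int pid = bsp_pid();              /* my process identifier                 */

    int from_left = -1;
    bsp_push_reg(&from_left, sizeof(int));   /* make this address remotely writable */
    bsp_sync();                               /* registration takes effect           */

    /* DRMA: put my pid into the variable registered by my right neighbour. */
    bsp_put((pid + 1) % p, &pid, &from_left, 0, sizeof(int));
    bsp_sync();                               /* end of superstep: data have arrived */

    printf("process %d received %d\n", pid, from_left);

    bsp_pop_reg(&from_left);
    bsp_end();
    return 0;
}
```

The same exchange could be expressed in the BSMP style using bsp_send and bsp_move instead of the put operation.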

343 citations


Journal ArticleDOI
TL;DR: This paper presents a cohesive, practical load balancing framework that improves upon existing strategies, and exposes a serious deficiency in current load balancing strategies, motivating further work in this area.
Abstract: This paper presents a cohesive, practical load balancing framework that improves upon existing strategies. These techniques are portable to a broad range of prevalent architectures, including massively parallel machines, such as the Cray T3D/E and Intel Paragon, shared memory systems, such as the Silicon Graphics PowerChallenge, and networks of workstations. As part of the work, an adaptive heat diffusion scheme is presented, as well as a task selection mechanism that can preserve or improve communication locality. Unlike many previous efforts in this arena, the techniques have been applied to two large-scale industrial applications on a variety of multicomputers. In the process, this work exposes a serious deficiency in current load balancing strategies, motivating further work in this area.
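As a point of reference for the diffusion idea, the sketch below iterates the classic first-order diffusive load-balancing update on a ring of processors, where each processor trades a fixed fraction of the load difference with its neighbours until the imbalance decays. The ring topology, diffusion parameter, and serial emulation are assumptions for illustration; the paper's adaptive heat diffusion scheme refines this basic update and runs in parallel.

```c
/* Serial illustration of first-order diffusive load balancing on a ring. */
#include <stdio.h>

#define P     8        /* number of processors (ring)            */
#define ALPHA 0.25     /* diffusion parameter, 0 < ALPHA <= 0.5  */
#define STEPS 50       /* diffusion sweeps                       */

int main(void)
{
    double load[P] = { 40, 0, 0, 10, 0, 30, 0, 0 };  /* initial imbalance */
    double next[P];

    for (int s = 0; s < STEPS; s++) {
        for (int i = 0; i < P; i++) {
            double left  = load[(i + P - 1) % P];
            double right = load[(i + 1) % P];
            /* exchange a fraction of the load difference with each neighbour */
            next[i] = load[i] + ALPHA * (left - load[i]) + ALPHA * (right - load[i]);
        }
        for (int i = 0; i < P; i++) load[i] = next[i];
    }

    for (int i = 0; i < P; i++)
        printf("processor %d: load %.3f\n", i, load[i]);
    return 0;
}
```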

205 citations


Journal ArticleDOI
06 Mar 1998-Science
TL;DR: The results of massively parallel three-dimensional molecular dynamics simulations of the perpendicular intersection of extended dislocations in copper are reported, providing insights into this complex atomistic process.
Abstract: The results of massively parallel three-dimensional molecular dynamics simulations of the perpendicular intersection of extended dislocations in copper are reported. The intersection process, which involves three of the four possible {111} glide planes in the face-centered cubic lattice, begins with junction formation, followed by unzipping, partial dislocation bowing, cutting, and, finally, unit jog formation. The investigation provides insights into this complex atomistic process, which is currently not accessible to experimental investigation.

170 citations


Journal ArticleDOI
TL;DR: A new hybrid algorithm is introduced that inherits those aspects of GA that lend themselves to parallelization, and avoids serial bottlenecks of GA approaches by incorporating elements of SA to provide a completely parallel, easily scalable hybrid GA/SA method.
Abstract: Many significant engineering and scientific problems involve optimization of some criteria over a combinatorial configuration space. The two methods most often used to solve these problems effectively, simulated annealing (SA) and genetic algorithms (GA), do not easily lend themselves to massively parallel implementations. Simulated annealing is a naturally serial algorithm, while GA involves a selection process that requires global coordination. This paper introduces a new hybrid algorithm that inherits those aspects of GA that lend themselves to parallelization, and avoids serial bottlenecks of GA approaches by incorporating elements of SA to provide a completely parallel, easily scalable hybrid GA/SA method. This new method, called Genetic Simulated Annealing, does not require parallelization of any problem-specific portions of a serial implementation; existing serial implementations can be incorporated as is. Results of a study on two difficult combinatorial optimization problems, a 100-city traveling salesperson problem and a 24-word, 12-bit error correcting code design problem, performed on a 16K-PE MasPar MP-1, indicate advantages over previous parallel GA and SA approaches. One of the key results is that the performance of the algorithm scales up linearly with the number of processing elements, a feature not demonstrated by any previous parallel GA or SA approaches, which enables the new algorithm to utilize massively parallel architectures with maximum effectiveness. Additionally, the algorithm does not require careful choice of control parameters, a significant advantage over SA and GA.
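The abstract describes the hybrid only at a high level. As a purely illustrative aid, the following serial toy emulates one plausible per-processing-element organisation of such a GA/SA hybrid on a one-max problem: each element recombines with a ring neighbour and accepts the offspring with a Metropolis (simulated annealing) rule, so no global selection step is required. The fitness function, topology, cooling schedule, and all constants are assumptions, not the authors' Genetic Simulated Annealing.

```c
/* Toy serial emulation of a GA/SA hybrid: one individual per emulated
 * processing element, ring-neighbour crossover, Metropolis acceptance. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NPE   64      /* emulated processing elements */
#define NBITS 32      /* genome length                */
#define GENS  200     /* generations                  */

static int fitness(const int *g) {          /* one-max: count the 1 bits */
    int f = 0;
    for (int i = 0; i < NBITS; i++) f += g[i];
    return f;
}

int main(void)
{
    int pop[NPE][NBITS], child[NBITS];
    srand(42);
    for (int p = 0; p < NPE; p++)
        for (int i = 0; i < NBITS; i++) pop[p][i] = rand() & 1;

    double T = 2.0;                           /* initial temperature */
    for (int gen = 0; gen < GENS; gen++, T *= 0.98) {
        for (int p = 0; p < NPE; p++) {
            int *mate = pop[(p + 1) % NPE];   /* ring neighbour */
            int cut = rand() % NBITS;
            for (int i = 0; i < NBITS; i++)   /* one-point crossover */
                child[i] = (i < cut) ? pop[p][i] : mate[i];
            if (rand() % NBITS == 0)          /* occasional mutation */
                child[rand() % NBITS] ^= 1;

            int dE = fitness(child) - fitness(pop[p]);
            /* Metropolis acceptance replaces GA's global selection step */
            if (dE >= 0 || exp(dE / T) > (double)rand() / RAND_MAX)
                for (int i = 0; i < NBITS; i++) pop[p][i] = child[i];
        }
    }

    int best = 0;
    for (int p = 0; p < NPE; p++) {
        int f = fitness(pop[p]);
        if (f > best) best = f;
    }
    printf("best fitness after %d generations: %d/%d\n", GENS, best, NBITS);
    return 0;
}
```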

140 citations


Journal ArticleDOI
TL;DR: A spectral element method in conjunction with a new iterative solution technique is presented, which allows for increased computational efficiency compared to the standard finite element method, together with a significant reduction both in storage requirements and in computational complexity.

134 citations


Journal ArticleDOI
TL;DR: The two-level FETI method is extended to shell problems, a variant approach is described, the preconditioning problem is revisited, the efficient implementation of the corresponding iterative solvers on massively parallel processors is addressed, and the computational price of mathematical optimality is highlighted.

114 citations


Journal ArticleDOI
TL;DR: A parallel three-dimensional Sn algorithm is described for performing eigenvalue calculations on rectangular meshes using the Thinking Machines Connection Machine model CM-200 computer.
Abstract: A parallel three-dimensional Sn algorithm is described for performing eigenvalue calculations on rectangular meshes using the Thinking Machines Connection Machine model CM-200 computer. By using a ...

112 citations


Patent
24 Feb 1998
TL;DR: A synchronization range indicator is provided that can control, by program, whether the parallel processors are available in correspondence to the respective serial processors, in response to a request from a serial processor to use the parallel processors.
Abstract: A multiple parallel-job scheduling method and apparatus are provided that can improve the utilization of all processors in a system when a plurality of parallel jobs are executed concurrently. A plurality of processors constituting a computer system, each having the same function, are logically categorized into serial processors, which execute the serial computing part or the parallel computing part of a parallel job, and a parallel processor group consisting of multiple processors, which execute the parallel computing part of a parallel job in parallel. So that the parallel processors can be shared by a plurality of parallel jobs, a synchronization range indicator is provided which can control by program whether the parallel processors are available in correspondence to the respective serial processors. In response to a request to use the parallel processors from a serial processor for which the parallel processors are set as available by means of the synchronization range indicator, operation can be carried out without invoking an interrupt.

109 citations


Patent
George W. Conner1
04 Sep 1998
TL;DR: Automatic test equipment for semiconductor memories that provides testing of large arrays of semiconductor memory chips in parallel is described, which greatly enhances the economics of testing memory devices made according to the RAMBUS standard, which includes a low-speed port and a medium-speed port.
Abstract: Automatic test equipment for semiconductor memories that provides testing of large arrays of semiconductor memory chips in parallel. Such massively parallel memory testing greatly enhances test throughput, thereby reducing cost. It also greatly enhances the economics of testing memory devices made according to the RAMBUS standard, which includes a low-speed port and a medium-speed port, because it allows the same automatic test equipment to be used economically to test devices with the low-speed port and the medium-speed port.

103 citations


Proceedings ArticleDOI
01 Dec 1998
TL;DR: This paper reviews how the stochastic nature, effective size, and the compartmentalization of genetic networks as well as the information content of gene expression matrices will influence the ability to perform successful reverse engineering.
Abstract: Complementary DNA microarrays and high-density oligonucleotide arrays opened the opportunity for massively parallel biological data acquisition. Application of these technologies will shift the emphasis in biological research from primary data generation to complex quantitative data analysis. Reverse engineering of time-dependent gene-expression matrices is amongst the first complex tools to be developed. The success of reverse engineering will depend on the quantitative features of the genetic networks and the quality of information we can obtain from biological systems. This paper reviews how (1) the stochastic nature, (2) the effective size, and (3) the compartmentalization of genetic networks, as well as (4) the information content of gene expression matrices, will influence our ability to perform successful reverse engineering.

Book
01 Jan 1998
TL;DR: This book introduces state-of-the-art methods for programming parallel systems, including approaches to reverse engineering traditional sequential software, and includes detailed coverage of the critical scheduling problem, compares multiple programming languages and environments, and shows how to measure the performance of parallel systems.
Abstract: The state of the art in high-performance concurrent computing, in theory and practice: detailed coverage of the growing integration between parallel and distributed computing; advanced approaches for programming distributed, parallel systems and adapting traditional sequential software; and creating a Parallel Virtual Machine (PVM) from networked, heterogeneous systems. This is the most up-to-date, comprehensive guide to the rapidly changing field of distributed and parallel systems. The book begins with an introductory survey of distributed and parallel computing: its rationale and evolution. It compares and contrasts a wide variety of approaches to parallelism, from distributed computer networks, to parallelism within processors (such as Intel's MMX), to massively parallel systems. The book introduces state-of-the-art methods for programming parallel systems, including approaches to reverse engineering traditional sequential software. It includes detailed coverage of the critical scheduling problem, compares multiple programming languages and environments, and shows how to measure the performance of parallel systems. The book introduces the Parallel Virtual Machine (PVM) system for writing programs that run on a network of heterogeneous systems; the new Message Passing Interface (MPI-2) standard; and, finally, the growing role of Java in writing distributed and parallel applications.

Book ChapterDOI
TL;DR: This paper presents and discusses the idea of Web-based volunteer computing, which allows people to cooperate in solving a large parallel problem by using standard Web browsers to volunteer their computers' processing power.
Abstract: This paper presents and discusses the idea of Web-based volunteer computing, which allows people to cooperate in solving a large parallel problem by using standard Web browsers to volunteer their computers' processing power. Because volunteering requires no prior human contact and very little technical knowledge, it becomes very easy to build very large volunteer computing networks. At its full potential, volunteer computing can make it possible to build world-wide massively parallel computing networks more powerful than any supercomputer. Even on a smaller, more practical scale, volunteer computing can be used within companies or institutions to provide supercomputer-like facilities by harnessing the computing power of existing workstations. Many interesting variations are possible, including networks of information appliances (NOIAs), paid volunteer systems, and barter trade of compute cycles. In this paper, we discuss these possibilities, and identify several issues that will need to be addressed in order to successfully implement them. We also present an overview of the current work being done in the Bayanihan volunteer computing project.

Journal ArticleDOI
TL;DR: In this paper, a pipelined parallelization of PHOENIX is described, where the necessary data from a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known.
Abstract: We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000-300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers.
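As an illustration of the pipelined idea, the minimal MPI sketch below has each rank wait for the state of the preceding wavelength point from its predecessor, perform its own "solve", and immediately forward the result to its successor. The single double standing in for the transferred state and the trivial update are placeholders, not PHOENIX's actual data structures.

```c
/* Minimal wavelength-pipeline sketch in MPI: receive from rank-1,
 * compute, forward to rank+1 as soon as the result is known. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double state = 1.0;                       /* initial condition at the first point      */
    if (rank > 0)                             /* wait for the previous wavelength point    */
        MPI_Recv(&state, 1, MPI_DOUBLE, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    state *= 0.5;                             /* placeholder for the radiative transfer solve */

    if (rank < size - 1)                      /* hand off to the next point immediately     */
        MPI_Send(&state, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);

    printf("rank %d finished its wavelength point, state = %g\n", rank, state);
    MPI_Finalize();
    return 0;
}
```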

BookDOI
01 Jan 1998
TL;DR: Introducing a mechanism of object archiving and transmission has enabled a natural extension to a parallel algorithm and showed good performance on a networked PC cluster, for sufficiently coarse granularity.
Abstract: With trends toward more complex nuclear reactor designs, advanced methods are required for appropriate reduction of design margins from an economical point of view. As a solution, an algorithm based on an object-oriented approach has been developed. In this algorithm, calculation meshes are represented as calculation objects wherein specific calculation algorithms are encapsulated. Abstracted data, which are neutron current objects, are exchanged between these objects. Calculation objects can retrieve required data having specified data types from the neutron current objects, which leads to a combined use of different calculation methods and algorithms in the same computation. Introducing a mechanism of object archiving and transmission has enabled a natural extension to a parallel algorithm. The parallel solution is identical with the sequential one. The SCOPE code, an actual implementation of our algorithm, showed good performance on a networked PC cluster, for sufficiently coarse granularity.

Book ChapterDOI
30 Mar 1998
TL;DR: Simulation with real workload data shows that a new scheduling method for batch jobs on massively parallel processor architectures, based on the first-come-first-served strategy, is suitable for use in real parallel computers.
Abstract: We present a new scheduling method for batch jobs on massively parallel processor architectures. This method is based on the first-come-first-served strategy and emphasizes the notion of fairness. Severe fragmentation is prevented by using gang scheduling, which is initiated only by highly parallel jobs. Good worst-case behavior of the scheduling approach has already been proven by theoretical analysis. In this paper we show by simulation with real workload data that the algorithm is also suitable for use in real parallel computers. This holds for several different scheduling criteria, such as makespan or the sum of the flow times. Simulation is also used to determine the best parameter set for the new method.

Proceedings ArticleDOI
TL;DR: A 3-D finite-difference elastic wave propagation code that incorporates a number of advanced computational and physics-based enhancements has been developed and will be used to generate an elastic subset of the SEG/EAEG acoustic data set.
Abstract: A 3-D finite-difference elastic wave propagation code that incorporates a number of advanced computational and physics-based enhancements has been developed. These enhancements include full 3-D elastic, viscoelastic, and topographic modeling (anisotropic capabilities are being added), low-level optimization, propagating and variable density grids, hybridization, and parallelization. This code takes advantage of high performance computing and massively parallel processing to make 3-D full-physics simulations of seismic problems feasible. This computational tool will be used to generate an elastic subset of the SEG/EAEG acoustic data set. The acoustic and elastic data will be compared to examine pitfalls with traditional processing, and to test the effectiveness of using elastic data as an aid to seismic imaging.

Journal ArticleDOI
TL;DR: New parallel algorithms for smoothed particle hydrodynamics and contact detection, which turn out to have several key features in common, are described, along with how to join them with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation.

Journal ArticleDOI
01 Dec 1998
TL;DR: The adaptive parallelism used to dynamically adjust the parallelism degree of the application with respect to the system load demonstrates that high-performance computing using a hundred heterogeneous workstations combined with massively parallel machines is feasible for solving large optimization problems.
Abstract: This paper presents a new approach for parallel tabu search based on adaptive parallelism. Adaptive parallelism is used to dynamically adjust the parallelism degree of the application with respect to the system load. Adaptive parallelism demonstrates that high-performance computing using a hundred heterogeneous workstations combined with massively parallel machines is feasible for solving large optimization problems. The parallel tabu search algorithm includes different tabu list sizes and new intensification/diversification mechanisms. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems.
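For readers unfamiliar with the underlying heuristic, the compact serial tabu search skeleton below, applied to a toy quadratic assignment instance, shows the basic moves the paper's parallel algorithm builds on: the best non-tabu swap is taken each iteration, recently used swaps are tabu for a fixed tenure, and an aspiration criterion overrides the tabu status for record-breaking moves. The random instance, tenure, and iteration budget are arbitrary, and the paper's adaptive parallelism and intensification/diversification mechanisms are not modelled.

```c
/* Compact serial tabu search skeleton for a toy quadratic assignment problem. */
#include <stdio.h>
#include <stdlib.h>

#define N      8
#define ITERS  500
#define TENURE 7

static int F[N][N], D[N][N];

static int cost(const int *p)               /* QAP cost: sum F[i][j] * D[p[i]][p[j]] */
{
    int c = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c += F[i][j] * D[p[i]][p[j]];
    return c;
}

int main(void)
{
    srand(11);
    for (int i = 0; i < N; i++)             /* random flow and distance matrices */
        for (int j = 0; j < N; j++) {
            F[i][j] = (i == j) ? 0 : rand() % 10;
            D[i][j] = (i == j) ? 0 : rand() % 10;
        }

    int p[N], best_p[N], tabu[N][N] = {{0}};
    for (int i = 0; i < N; i++) p[i] = best_p[i] = i;
    int best_cost = cost(p);

    for (int it = 1; it <= ITERS; it++) {
        int bi = -1, bj = -1, bc = 1 << 30;
        for (int i = 0; i < N - 1; i++)
            for (int j = i + 1; j < N; j++) {
                int t = p[i]; p[i] = p[j]; p[j] = t;      /* try the swap */
                int c = cost(p);
                t = p[i]; p[i] = p[j]; p[j] = t;          /* undo it      */
                int is_tabu = tabu[i][j] > it;
                /* aspiration: a tabu move is allowed if it beats the best so far */
                if ((!is_tabu || c < best_cost) && c < bc) { bc = c; bi = i; bj = j; }
            }
        if (bi < 0) continue;                              /* every move is tabu    */
        int t = p[bi]; p[bi] = p[bj]; p[bj] = t;           /* apply the best move   */
        tabu[bi][bj] = it + TENURE;                        /* forbid reversing it   */
        if (bc < best_cost) {
            best_cost = bc;
            for (int i = 0; i < N; i++) best_p[i] = p[i];
        }
    }
    printf("best assignment cost found: %d\n", best_cost);
    return 0;
}
```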

Journal ArticleDOI
TL;DR: A new, scalable interconnection topology called the Spanning Multichannel Linked Hypercube (SMLH) is proposed, which is very suitable to massively parallel systems and is highly amenable to optical implementation.
Abstract: A new, scalable interconnection topology called the Spanning Multichannel Linked Hypercube (SMLH) is proposed. This proposed network is very suitable to massively parallel systems and is highly amenable to optical implementation. The SMLH uses the hypercube topology as a basic building block and connects such building blocks using two-dimensional multichannel links (similar to spanning buses). In doing so, the SMLH combines positive features of both the hypercube (small diameter, high connectivity, symmetry, simple routing, and fault tolerance) and the spanning bus hypercube (SBH) (constant node degree, scalability, and ease of physical implementation), while at the same time circumventing their disadvantages. The SMLH topology supports many communication patterns found in different classes of computation, such as bus-based, mesh-based, and tree-based problems, as well as hypercube-based problems. A very attractive feature of the SMLH network is its ability to support a large number of processors with the possibility of maintaining a constant degree and a constant diameter. Other positive features include symmetry, incremental scalability, and fault tolerance. It is shown that the SMLH network provides better average message distance, average traffic density, and queuing delay than many similar networks, including the binary hypercube, the SBH, etc. Additionally, the SMLH has comparable performance to other high-performance hypercubic networks, including the Generalized Hypercube and the Hypermesh. An optical implementation methodology is proposed for SMLH. The implementation methodology combines both the advantages of free space optics with those of wavelength division multiplexing techniques. A detailed analysis of the feasibility of the proposed network is also presented.

Journal ArticleDOI
TL;DR: This work presents an attempt to draw inspiration from biology in the design of a novel digital circuit: a field-programmable gate array (FPGA), endowed with two features motivated and guided by the behavior of biological systems: self-replication and self-repair.
Abstract: Biological organisms are among the most intricate structures known to man, exhibiting highly complex behavior through the massively parallel cooperation of numerous relatively simple elements, the cells. As the development of computing systems approaches levels of complexity such that their synthesis begins to push the limits of human intelligence, engineers are starting to seek inspiration in nature for the design of computing systems, both at the software and at hardware levels. We present one such endeavor, notably an attempt to draw inspiration from biology in the design of a novel digital circuit: a field-programmable gate array (FPGA). This reconfigurable logic circuit will be endowed with two features motivated and guided by the behavior of biological systems: self-replication and self-repair.

Journal ArticleDOI
TL;DR: In this article, a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models is presented, which is directly applicable to a wide range of stochastic cellular automata.
Abstract: We experiment with a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models. The parallel scheme is directly applicable to a wide range of stochastic cellular automata where the discrete events (updates) are Poisson arrivals. For high performance, we utilize a continuous-time, asynchronous parallel version of the n-fold way rejection-free algorithm. Each processing element carries an l × l block of spins, and we employ the fast SHMEM-library routines on the Cray T3E distributed-memory parallel architecture. Different processing elements have different local simulated times. To ensure causality, the algorithm handles the asynchrony in a conservative fashion. Despite relatively low utilization and an intricate relationship between the average time increment and the size of the spin blocks, we find that for sufficiently large l the algorithm outperforms its corresponding parallel Metropolis (non-rejection-free) counterpart. As an example application, we present results for metastable decay in a model ferromagnetic or ferroelectric film, observed with a probe of area smaller than the total system.
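The serial sketch below illustrates the rejection-free (n-fold way) update that the parallel scheme builds on, for a single small block of Ising spins: every step flips some spin with probability proportional to its Metropolis rate and advances the local time by an exponentially distributed increment. The brute-force rate scan (a real n-fold way code groups spins into classes and updates rates incrementally), the temperature, and the lattice size are simplifications, and the conservative asynchronous coupling between blocks on different processing elements is omitted.

```c
/* Serial n-fold way (rejection-free) sketch for a small 2-D Ising block. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define L 16
#define T 1.5                               /* temperature in units of J/k_B */

static int s[L][L];

static double flip_rate(int i, int j)
{
    int nb = s[(i+1)%L][j] + s[(i+L-1)%L][j] + s[i][(j+1)%L] + s[i][(j+L-1)%L];
    double dE = 2.0 * s[i][j] * nb;         /* energy change of flipping s[i][j] */
    return dE <= 0.0 ? 1.0 : exp(-dE / T);  /* Metropolis rate                   */
}

int main(void)
{
    srand(1);
    for (int i = 0; i < L; i++)
        for (int j = 0; j < L; j++) s[i][j] = 1;   /* start fully magnetised */

    double t = 0.0;
    for (long step = 0; step < 20000; step++) {
        double R = 0.0;                      /* total rate of all possible flips */
        for (int i = 0; i < L; i++)
            for (int j = 0; j < L; j++) R += flip_rate(i, j);

        /* pick the event: walk through spins until the cumulative rate exceeds r */
        double r = R * rand() / ((double)RAND_MAX + 1.0);
        double acc = 0.0;
        int fi = L - 1, fj = L - 1;
        for (int i = 0; i < L && acc <= r; i++)
            for (int j = 0; j < L; j++) {
                acc += flip_rate(i, j);
                if (acc > r) { fi = i; fj = j; break; }
            }
        s[fi][fj] = -s[fi][fj];

        /* advance the local simulated time by an exponential waiting time */
        t += -log(1.0 - (double)rand() / ((double)RAND_MAX + 1.0)) / R;
    }
    printf("simulated time reached: %g\n", t);
    return 0;
}
```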

01 Jan 1998
TL;DR: The interface to the MUMPS code and the message passing mechanisms used in the package are described; MUMPS is the direct solver in the PARASOL project, whose main aim is to develop a public domain library of sparse codes for distributed memory parallel computers.
Abstract: We describe aspects of the interface and design of Version 2.0 of the MUltifrontal Massively Parallel Solver MUMPS. This code solves sets of sparse linear equations Ax = b, where the matrix A is unsymmetric. It is written in Fortran 90 and uses MPI for message passing. It also calls the ScaLAPACK code, which in turn uses the BLACS; Level 3 BLAS routines are also used. MUMPS is the direct solver in the PARASOL project, an EU LTR project with twelve partners from five countries. The main aim of PARASOL is to develop a public domain library of sparse codes for distributed memory parallel computers. This report describes the interface to the MUMPS code and the message passing mechanisms that are used in the package.
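As an illustration of the solver's calling pattern, the sketch below drives MUMPS through the C wrapper (dmumps_c) distributed with later public releases to solve a tiny unsymmetric system in assembled coordinate format. The Version 2.0 described here exposed a Fortran 90 interface, so the exact 1998 interface differs; read this only as the general pattern of initialize, describe A and b, run analysis/factorization/solve, and terminate.

```c
/* Hedged sketch of the MUMPS C interface for a tiny unsymmetric Ax = b. */
#include <stdio.h>
#include <mpi.h>
#include "dmumps_c.h"

#define JOB_INIT                  -1
#define JOB_END                   -2
#define JOB_ANALYSE_FACTOR_SOLVE   6
#define USE_COMM_WORLD       -987654   /* MUMPS convention for MPI_COMM_WORLD */

int main(int argc, char **argv)
{
    /* 2x2 example:  [ 2 1 ] [x1]   [ 3 ]
                     [ 0 4 ] [x2] = [ 8 ]   so x1 = 0.5, x2 = 2 */
    MUMPS_INT n = 2, nz = 3;
    MUMPS_INT irn[] = {1, 1, 2};
    MUMPS_INT jcn[] = {1, 2, 2};
    double    a[]   = {2.0, 1.0, 4.0};
    double    rhs[] = {3.0, 8.0};

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    DMUMPS_STRUC_C id;
    id.comm_fortran = USE_COMM_WORLD;
    id.par = 1;                    /* host participates in the computation */
    id.sym = 0;                    /* unsymmetric matrix                   */
    id.job = JOB_INIT;
    dmumps_c(&id);

    id.n = n; id.nz = nz;          /* matrix in assembled coordinate form  */
    id.irn = irn; id.jcn = jcn; id.a = a;
    id.rhs = rhs;

    id.job = JOB_ANALYSE_FACTOR_SOLVE;   /* analysis + factorization + solve */
    dmumps_c(&id);

    id.job = JOB_END;
    dmumps_c(&id);

    if (rank == 0)                 /* rhs is overwritten with the solution on the host */
        printf("solution: x1 = %g, x2 = %g\n", rhs[0], rhs[1]);
    MPI_Finalize();
    return 0;
}
```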

Journal ArticleDOI
TL;DR: Artificial neural networks attempt to emulate the massively parallel and distributed processing of the human brain and are being examined for a variety of problems that have been difficult to solve.
Abstract: Artificial neural networks (ANNs) attempt to emulate the massively parallel and distributed processing of the human brain. They are being examined for a variety of problems that have been difficult...

Journal ArticleDOI
TL;DR: In this paper, a massively parallel algorithm for static and dynamic partitioning of unstructured FEM meshes is presented, in which a fast but inaccurate sequential clustering is determined and used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system.
Abstract: We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balance. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and other measures that are important for the numerical solution method (such as the aspect ratio of the resulting domains).

Journal ArticleDOI
TL;DR: In this paper, a new algorithm to enable the implementation of dual control volume grand canonical molecular dynamics (DCV-GCMD) on massively parallel (MP) architectures is presented.
Abstract: A new algorithm to enable the implementation of dual control volume grand canonical molecular dynamics (DCV-GCMD) on massively parallel (MP) architectures is presented. DCV-GCMD can be thought of as a hybridization of molecular dynamics (MD) and grand canonical Monte Carlo (GCMC) and was developed recently to make possible the simulation of gradient-driven diffusion. The method has broad application to such problems as membrane separations, drug delivery systems, diffusion in polymers and zeolites, etc. The massively parallel algorithm for the DCV-GCMD method has been implemented in a code named LADERA, which employs the short-range Lennard-Jones potential for pure fluids and multicomponent mixtures, including bulk and confined (single pore as well as amorphous solid materials) systems. Like DCV-GCMD, LADERA's MP algorithm can be thought of as a hybridization of two different algorithms, spatial MD and spatial GCMC. The DCV-GCMD method is described fully followed by the DCV-GCMD parallel algorithm employed in ...
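The structural sketch below shows the DCV-GCMD cycle described above: a block of ordinary MD steps alternated with grand canonical Monte Carlo insertions and deletions inside two control volumes held at different chemical potentials, which sustains the gradient that drives diffusion between them. The stub routines, folded prefactors, and all constants are placeholders for illustration; a real code such as LADERA evaluates Lennard-Jones energies and forces and runs in parallel.

```c
/* Structural sketch of a DCV-GCMD cycle: MD blocks alternated with GCMC
 * insertion/deletion attempts in two control volumes. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MD_STEPS_PER_CYCLE  50
#define GCMC_ATTEMPTS       20
#define CYCLES              100

static double mu_cv[2] = { 2.0, 4.0 };   /* chemical potentials of the two control volumes */
static int    n_cv[2]  = { 20, 5 };      /* particle counts in the control volumes (toy bookkeeping) */
static double temperature = 1.2;

static void md_step(void) { /* placeholder: velocity-Verlet integration of all particles */ }

static double insertion_energy(int cv) { (void)cv; return 0.5; }  /* placeholder test-particle energy */
static double deletion_energy(int cv)  { (void)cv; return 0.5; }

static void gcmc_move(int cv)
{
    /* Attempt an insertion or deletion with equal probability, accepted with
       the usual grand canonical Metropolis criterion; volume and thermal
       wavelength prefactors are folded into mu for this toy. */
    double u = (double)rand() / RAND_MAX;
    if (u < 0.5) {
        double acc = exp((mu_cv[cv] - insertion_energy(cv)) / temperature) / (n_cv[cv] + 1);
        if ((double)rand() / RAND_MAX < acc) n_cv[cv]++;
    } else if (n_cv[cv] > 0) {
        double acc = n_cv[cv] * exp((deletion_energy(cv) - mu_cv[cv]) / temperature);
        if ((double)rand() / RAND_MAX < acc) n_cv[cv]--;
    }
}

int main(void)
{
    srand(7);
    for (int c = 0; c < CYCLES; c++) {
        for (int s = 0; s < MD_STEPS_PER_CYCLE; s++) md_step();
        for (int a = 0; a < GCMC_ATTEMPTS; a++) { gcmc_move(0); gcmc_move(1); }
    }
    printf("control-volume populations: %d / %d\n", n_cv[0], n_cv[1]);
    return 0;
}
```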

Journal ArticleDOI
TL;DR: The present version of the LADERA FORTRAN code has the capability of modelling systems with explicit intramolecular interactions such as bonds, angles, and dihedral rotations and includes another new feature, which is the use of neighbour lists in force calculations.
Abstract: This paper, the second part of a series, extends the capabilities of the LADERA FORTRAN code for massively parallel dual control volume grand canonical molecular dynamics (DCV-GCMD). DCV-GCMD is a hybrid of two more common molecular simulation techniques (grand canonical Monte Carlo and molecular dynamics) which allows the direct molecular-level modelling of diffusion under a chemical potential gradient. The present version of the code, LADERA-B, has the capability of modelling systems with explicit intramolecular interactions such as bonds, angles, and dihedral rotations. The utility of the new code for studying gradient-driven diffusion of small molecules through polymers is demonstrated by applying it to two model systems. LADERA-B includes another new feature, which is the use of neighbour lists in force calculations. This feature increases the speed of the code but presents several challenges in the parallel hybrid algorithm. There is discussion on how these problems were addressed and how our implement...
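The following minimal sketch shows the kind of Verlet neighbour list the abstract refers to: pairs within the cutoff plus a skin distance are enumerated once and then reused by the force routine for several steps until particles have moved far enough to require a rebuild. Periodic boundaries, cell binning, and the parallel, domain-decomposed bookkeeping discussed in the paper are omitted.

```c
/* Minimal Verlet neighbour-list construction for a box of random particles. */
#include <stdio.h>
#include <stdlib.h>

#define N         256
#define R_CUT     2.5
#define SKIN      0.3
#define MAX_PAIRS (N * (N - 1) / 2)

static double x[N][3];
static int    pair_i[MAX_PAIRS], pair_j[MAX_PAIRS];

static int build_neighbour_list(void)
{
    double r_list2 = (R_CUT + SKIN) * (R_CUT + SKIN);
    int npairs = 0;
    for (int i = 0; i < N - 1; i++)
        for (int j = i + 1; j < N; j++) {
            double r2 = 0.0;
            for (int d = 0; d < 3; d++) {
                double dxd = x[i][d] - x[j][d];
                r2 += dxd * dxd;
            }
            if (r2 < r_list2) {            /* keep the pair; it may come within R_CUT soon */
                pair_i[npairs] = i;
                pair_j[npairs] = j;
                npairs++;
            }
        }
    return npairs;
}

int main(void)
{
    srand(3);
    for (int i = 0; i < N; i++)            /* random positions in a 10 x 10 x 10 box */
        for (int d = 0; d < 3; d++)
            x[i][d] = 10.0 * rand() / RAND_MAX;

    int npairs = build_neighbour_list();
    printf("neighbour list holds %d of %d possible pairs\n", npairs, N * (N - 1) / 2);
    /* Force routines then loop over these pairs only, instead of all N*(N-1)/2,
       until any particle has moved more than SKIN/2 since the last rebuild. */
    return 0;
}
```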

Proceedings ArticleDOI
02 Sep 1998
TL;DR: Various parallel programming models are discussed, although emphasis is given to a master-slave implementation using the Message Passing Interface (MPI), and a mathematical analysis is given on achieving peak efficiency in multilevel parallelism by selecting the most effective processor partitioning schemes.
Abstract: Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitations which point to the need for multiple levels of parallelism in order to achieve peak parallel performance. Managing multiple simultaneous instances of massively parallel simulations is a challenging software undertaking, especially if the implementation is to be flexible, extensible, and general-purpose. This paper focuses on the design for multilevel parallelism as implemented within the DAKOTA iterator toolkit. Various parallel programming models are discussed, although emphasis is given to a master-slave implementation using the Message Passing Interface (MPI). A mathematical analysis is given on achieving peak efficiency in multilevel parallelism by selecting the most effective processor partitioning schemes. This analysis is verified in some computational experiments.
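The sketch below illustrates the processor-partitioning idea behind multilevel parallelism: MPI_COMM_WORLD is split into several sub-communicators, each of which can run one instance of a parallel simulation while the optimization level coordinates which design point each partition evaluates. The partition count and the placeholder "simulation" are assumptions for illustration, not DAKOTA's actual implementation.

```c
/* Split MPI_COMM_WORLD into partitions so several parallel simulations
 * can run simultaneously, one per sub-communicator. */
#include <stdio.h>
#include <mpi.h>

#define NUM_PARTITIONS 4   /* simultaneous simulation instances (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assign each process to one of NUM_PARTITIONS simulation servers. */
    int color = world_rank % NUM_PARTITIONS;
    MPI_Comm sim_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sim_comm);

    int sim_rank, sim_size;
    MPI_Comm_rank(sim_comm, &sim_rank);
    MPI_Comm_size(sim_comm, &sim_size);

    /* Each partition would now run its own massively parallel analysis;
       rank 0 of each partition reports back to the optimization master. */
    double local = (double)world_rank, result;
    MPI_Reduce(&local, &result, 1, MPI_DOUBLE, MPI_SUM, 0, sim_comm);
    if (sim_rank == 0)
        printf("partition %d (%d procs) finished, placeholder result %g\n",
               color, sim_size, result);

    MPI_Comm_free(&sim_comm);
    MPI_Finalize();
    return 0;
}
```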

Proceedings ArticleDOI
04 May 1998
TL;DR: A genetic algorithm is proposed as a generally applicable global learning method for finding and optimizing parameters of a cellular neural network that have to be insensitive to small perturbations.
Abstract: The operation of a cellular neural network (CNN) is defined by a set of 19 parameters. There is no known general method for finding these parameters; analytic design methods are available for a small class of problems only. Standard learning algorithms cannot be applied due to the lack of gradient information. The authors propose a genetic algorithm as a generally applicable global learning method. In order to be useful for real CNN VLSI chips, the parameters have to be insensitive to small perturbations. Therefore, after the parameters are learnt, they are optimized with respect to robustness in a second genetic processing step. As the simulation of CNNs necessitates the numerical integration of large systems of nonlinear differential equations, the evaluation of the fitness functions is computationally very expensive; a massively parallel supercomputer is used to achieve acceptable run times.

Journal ArticleDOI
TL;DR: A high-performance system with the latest NBISOM_25 chips is presented and integrated in a simulation framework for neural networks that contains software tools for self-organizing maps as well as for neural associative memories, tools for pre- and postprocessing, and tools for graphical analysis of the simulation results.