
Showing papers on "Massively parallel published in 2005"


Journal ArticleDOI
TL;DR: From this project came the people and ideas that underpinned VMware Inc., the original supplier of VMMs for commodity computing hardware, and the implications of having a VMM for commodity platforms intrigued both researchers and entrepreneurs.
Abstract: Developed more than 30 years ago to address mainframe computing problems, virtual machine monitors have resurfaced on commodity platforms, offering novel solutions to challenges in security, reliability, and administration. Stanford University researchers began to look at the potential of virtual machines to overcome difficulties that hardware and operating system limitations imposed: this time the problems stemmed from massively parallel processing (MPP) machines that were difficult to program and could not run existing operating systems. With virtual machines, researchers found they could make these unwieldy architectures look sufficiently similar to existing platforms to leverage the current operating systems. From this project came the people and ideas that underpinned VMware Inc., the original supplier of VMMs for commodity computing hardware. The implications of having a VMM for commodity platforms intrigued both researchers and entrepreneurs.

720 citations


Journal ArticleDOI
TL;DR: The key architectural features of Blue Gene/L are introduced: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.
Abstract: The Blue Gene®/L computer is a massively parallel supercomputer based on IBM system-on-a-chip technology. It is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 teraflops. This paper describes the project objectives and provides an overview of the system architecture that resulted. We discuss our application-based approach and rationale for a low-power, highly integrated design. The key architectural features of Blue Gene/L are introduced in this paper: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.

422 citations


Journal ArticleDOI
TL;DR: Both the architecture and the microarchitecture of the torus and a network performance simulator are described and simulation results and hardware measurements are presented.
Abstract: The main interconnect of the massively parallel Blue Gene®/L is a three-dimensional torus network with dynamic virtual cut-through routing. This paper describes both the architecture and the microarchitecture of the torus and a network performance simulator. Both simulation results and hardware measurements are presented.

361 citations


Journal ArticleDOI
TL;DR: ZEUS-MP as discussed by the authors is a massively parallel implementation of the ZEUS code for simulations on parallel computing platforms, which allows the advection of multiple chemical (or nuclear) species.
Abstract: This paper describes ZEUS-MP, a multi-physics, massively parallel, message-passing implementation of the ZEUS code. ZEUS-MP differs significantly from the ZEUS-2D code, the ZEUS-3D code, and an early "version 1" of ZEUS-MP distributed publicly in 1999. ZEUS-MP offers an MHD algorithm better suited for multidimensional flows than the ZEUS-2D module by virtue of modifications to the Method of Characteristics scheme first suggested by Hawley and Stone (1995), and is shown to compare quite favorably to the TVD scheme described by Ryu et al. (1998). ZEUS-MP is the first publicly available ZEUS code to allow the advection of multiple chemical (or nuclear) species. Radiation hydrodynamic simulations are enabled via an implicit flux-limited radiation diffusion (FLD) module. The hydrodynamic, MHD, and FLD modules may be used in one, two, or three space dimensions. Self-gravity may be included either through the assumption of a GM/r potential or a solution of Poisson's equation using one of three linear solver packages (conjugate-gradient, multigrid, and FFT) provided for that purpose. Point-mass potentials are also supported. Because ZEUS-MP is designed for simulations on parallel computing platforms, considerable attention is paid to the parallel performance characteristics of each module. Strong-scaling tests involving pure hydrodynamics (with and without self-gravity), MHD, and RHD are performed in which large problems (256^3 zones) are distributed among as many as 1024 processors of an IBM SP3. Parallel efficiency is a strong function of the amount of communication required between processors in a given algorithm, but all modules are shown to scale well on up to 1024 processors for the chosen fixed problem size.

333 citations


Book
01 Jan 2005
TL;DR: The interdisciplinary research monograph brings together results of a decade-long study into designing experimental and simulated prototypes of reaction-diffusion computing devices for image processing, path planning, robot navigation, computational geometry, logics and artificial intelligence.
Abstract: The interdisciplinary research monograph, which has been peer-reviewed by several international experts assigned by Elsevier, introduces groundbreaking original results in formal paradigms, architectures, and laboratory implementations of computers based on travelling waves in reaction-diffusion media. The monograph brings together results of a decade-long study into designing experimental and simulated prototypes of reaction-diffusion computing devices for image processing, path planning, robot navigation, computational geometry, logics, and artificial intelligence. The book has had impact in the field of massively parallel computing because of its comprehensive presentation of the theoretical and experimental foundations, cutting-edge computation techniques, chemical laboratory experimental setups, and hardware implementation technology employed in the development of novel nature-inspired computing devices. The monograph resulted from EPSRC grants GR/S63854/01 and EP/C004272/1.

302 citations


Journal ArticleDOI
TL;DR: A new method for the parallel evaluation of distance‐limited pairwise particle interactions that significantly reduces the amount of data transferred between processors by comparison with traditional methods is introduced.
Abstract: Classical molecular dynamics simulations of biological macromolecules in explicitly modeled solvent typically require the evaluation of interactions between all pairs of atoms separated by no more than some distance R, with more distant interactions handled using some less expensive method. Performing such simulations for periods on the order of a millisecond is likely to require the use of massive parallelism. The extent to which such simulations can be efficiently parallelized, however, has historically been limited by the time required for interprocessor communication. This article introduces a new method for the parallel evaluation of distance-limited pairwise particle interactions that significantly reduces the amount of data transferred between processors by comparison with traditional methods. Specifically, the amount of data transferred into and out of a given processor scales as O(R^(3/2) p^(-1/2)), where p is the number of processors, and with constant factors that should yield a substantial performance advantage in practice.

200 citations


Proceedings ArticleDOI
28 Jun 2005
TL;DR: The NonStop advanced architecture (NSAA) uses dual or triple modular redundant fault-tolerant servers built from standard HP 4-way SMP Itanium® 2 server processor modules, memory boards, and power infrastructure to improve system availability and reduce cost.

Abstract: For nearly 30 years the Hewlett Packard NonStop Enterprise Division (formerly Tandem Computers Inc.) has produced highly available, fault-tolerant, massively parallel NonStop computer systems. These vertically integrated systems use a proprietary operating system and specialized hardware for detecting, isolating, and recovering from faults. The NonStop advanced architecture (NSAA) uses dual or triple modular redundant fault-tolerant servers built from standard HP 4-way SMP Itanium® 2 server processor modules, memory boards, and power infrastructure. A unique synchronization mechanism allows fully compared operations from loosely synchronized processor modules. In addition, the NSAA improves system availability by additional hardware fault masking, and significantly lowers cost by leveraging existing high-volume Itanium server components.

185 citations


01 Jan 2005
TL;DR: Catamount is designed to be a low overhead operating system for a parallel computing environment that is limited to the minimum set needed to run a scientific computation.
Abstract: Catamount is designed to be a low overhead operating system for a parallel computing environment. Functionality is limited to the minimum set needed to run a scientific computation. The design choices and implementations will be presented.

126 citations



Journal ArticleDOI
TL;DR: The data structures, parallel implementation, and resulting performance of the IJ, Struct, and semiStruct interfaces are described; their scalability is investigated, successes as well as pitfalls of some of the approaches are presented, and ways of dealing with them are suggested.
Abstract: The software library hypre provides high-performance preconditioners and solvers for the solution of large, sparse linear systems on massively parallel computers as well as conceptual interfaces that allow users to access the library in the way they naturally think about their problems. These interfaces include a stencil-based structured interface (Struct); a semistructured interface (semiStruct), which is appropriate for applications that are mostly structured, for example, block structured grids, composite grids in structured adaptive mesh refinement applications, and overset grids; and a finite element interface (FEI) for unstructured problems, as well as a conventional linear-algebraic interface (IJ). It is extremely important to provide an efficient, scalable implementation of these interfaces in order to support the scalable solvers of the library, especially when using tens of thousands of processors. This article describes the data structures, parallel implementation, and resulting performance of the IJ, Struct and semiStruct interfaces. It investigates their scalability, presents successes as well as pitfalls of some of the approaches and suggests ways of dealing with them.

89 citations
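The IJ (linear-algebraic) interface mentioned above is the most conventional entry point into the library. As a rough illustration of the usage pattern, the sketch below assembles one locally owned row of a distributed matrix; the call names follow hypre's documented C API, but the exact headers, integer types (HYPRE_Int/HYPRE_BigInt), and signatures are assumptions that should be checked against the installed library version.

    /* Minimal sketch (not code from the article): building one row of a
     * distributed matrix through hypre's IJ interface.  Each rank owns the
     * contiguous row range [ilower, iupper]. */
    #include <mpi.h>
    #include "HYPRE.h"
    #include "HYPRE_IJ_mv.h"
    #include "HYPRE_parcsr_ls.h"

    void assemble_example_row(int ilower, int iupper)
    {
        HYPRE_IJMatrix A;
        HYPRE_ParCSRMatrix parcsr_A;

        HYPRE_IJMatrixCreate(MPI_COMM_WORLD, ilower, iupper, ilower, iupper, &A);
        HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR);   /* store as ParCSR */
        HYPRE_IJMatrixInitialize(A);

        /* One 3-point stencil row with illustrative values; assumes the row
         * has neighbours on both sides in the global index space. */
        int row = ilower, ncols = 3;
        int cols[3]    = { row - 1, row, row + 1 };
        double vals[3] = { -1.0, 2.0, -1.0 };
        HYPRE_IJMatrixSetValues(A, 1, &ncols, &row, cols, vals);

        HYPRE_IJMatrixAssemble(A);                      /* exchanges off-rank entries */
        HYPRE_IJMatrixGetObject(A, (void **) &parcsr_A);
        /* parcsr_A can now be handed to a solver such as BoomerAMG. */
    }

The same Initialize/SetValues/Assemble pattern applies to IJ vectors; the Struct and semiStruct interfaces replace explicit row and column indices with stencil and grid descriptions.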


Journal ArticleDOI
TL;DR: A perspective is presented that retains the descriptive richness while providing a unifying framework that will lead to effective and affordable Petaflops-scale computing including the future role of computer centers as facilities for supporting high performance computing environments.
Abstract: In a recent paper, Gordon Bell and Jim Gray (2002) put forth a view of the past, present, and future of high-performance computing (HPC) that is both insightful and thought provoking. Identifying key trends with a grace and candor rarely encountered in a single work, the authors describe an evolutionary past drawn from their vast experience and project an enticing and compelling vision of HPC's future. Yet, the underlying assumptions implicit in their treatment, particularly those related to terminology and dominant trends, conflict with our own experience, common practices, and shared view of HPC's future directions. Taken from our vantage points of the Top500 list, the Lawrence Berkeley National Laboratory NERSC computer center, Beowulf-class computing, and research in petaflops-scale computing architectures, we offer an alternate perspective on several key issues in the form of a constructive counterpoint. One objective of this article is to restore the strength and value of the term "cluster" by degeneralizing its applicability to a restricted subset of parallel computers. We'll further consider this class in conjunction with its complementing terms constellation, Beowulf class, and massively parallel processing systems (MPPs), based on the classification used by the Top500 list, which has tracked the HPC field for more than a decade.

Book ChapterDOI
Jarkko Kari
04 Jul 2005
TL;DR: This paper is a short survey of research on reversible cellular automata over the past forty-plus years and discusses the classic results by Hedlund, Moore and Myhill that relate injectivity, surjectivity and reversibility with each other.
Abstract: Reversible cellular automata (RCA) are models of massively parallel computation that preserve information. This paper is a short survey of research on reversible cellular automata over the past forty-plus years. We discuss the classic results by Hedlund, Moore and Myhill that relate injectivity, surjectivity and reversibility with each other. Then we review algorithmic questions and some results on computational universality. Finally we talk about local reversibility vs. global reversibility.
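A standard way to obtain reversibility by construction, and a concrete way to see the local versus global reversibility distinction, is the second-order technique usually attributed to Fredkin: the next configuration is an arbitrary local function of the current one XORed with the previous one, so every step can be undone. The sketch below is a generic illustration of this construction, not one of the automata discussed in the survey.

    /* Second-order (Fredkin-style) reversible CA over binary states:
     *     next[i] = f(neighbourhood of cur at i) XOR prev[i]
     * is invertible for any local rule f, since prev[i] = f(...) XOR next[i].
     * Generic illustration only; not an automaton from the survey. */
    #include <stddef.h>

    void rca_step(const unsigned char *prev, const unsigned char *cur,
                  unsigned char *next, size_t n)
    {
        for (size_t i = 0; i < n; ++i) {
            unsigned char l = cur[(i + n - 1) % n];        /* periodic boundary */
            unsigned char c = cur[i];
            unsigned char r = cur[(i + 1) % n];
            unsigned char f = (unsigned char)(l ^ c ^ r);  /* any local rule works */
            next[i] = (unsigned char)(f ^ prev[i]);        /* XOR makes the step invertible */
        }
    }

    /* To run backwards, apply the same rule with the roles of prev and next swapped. */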

Journal ArticleDOI
TL;DR: Performance measurements show that message-passing services deliver performance close to the hardware limits of the machine, and dedicating one of the processors of a node to communication functions greatly improves the message-passing bandwidth, whereas running two processes per compute node (virtual node mode) can have a positive impact on application performance.
Abstract: The Blue Gene®/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two BG/L modes of operation: the coprocessor mode and the virtual node mode. Performance measurements show that our message-passing services deliver performance close to the hardware limits of the machine. They also show that dedicating one of the processors of a node to communication functions (coprocessor mode) greatly improves the message-passing bandwidth, whereas running two processes per compute node (virtual node mode) can have a positive impact on application performance.
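The bandwidth figures referred to above are the kind of numbers obtained from a point-to-point ping-pong test. The loop below is ordinary, portable MPI C code showing the measurement pattern; it is not taken from the BG/L message layer or its MPI port.

    /* Generic MPI ping-pong bandwidth probe between ranks 0 and 1.
     * Run with at least two ranks; reports the aggregate transfer rate. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int nbytes = 1 << 20, reps = 100;
        char *buf = malloc(nbytes);
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; ++i) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;

        if (rank == 0)   /* two messages of nbytes move per repetition */
            printf("ping-pong bandwidth: %.1f MB/s\n", 2.0 * reps * nbytes / dt / 1e6);

        MPI_Finalize();
        free(buf);
        return 0;
    }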

Journal ArticleDOI
TL;DR: A linear-scaling algorithm has been developed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory.

Journal ArticleDOI
TL;DR: A bifurcation analysis of a 1.6-million-unknown model of 3D Rayleigh–Bénard convection in a 5 × 5 × 1 box is successfully undertaken, showing that the algorithms can indeed scale to problems of this size while producing solutions of reasonable accuracy.
Abstract: We present the set of bifurcation tracking algorithms which have been developed in the LOCA software library to work with large-scale application codes that use fully coupled Newton's method with iterative linear solvers. Turning point (fold), pitchfork, and Hopf bifurcation tracking algorithms based on Newton's method have been implemented, with particular attention to the scalability to large problem sizes on parallel computers and to the ease of implementation with new application codes. The ease of implementation is accomplished by using block elimination algorithms to solve the Newton iterations of the augmented bifurcation tracking systems. The applicability of such algorithms for large applications is in doubt since the main computational kernel of these routines is the iterative linear solve of the same matrix that is being driven singular by the algorithm. To test the robustness and scalability of these algorithms, the LOCA library has been interfaced with the MPSalsa massively parallel finite element reacting flows code. A bifurcation analysis of a 1.6-million-unknown model of 3D Rayleigh–Bénard convection in a 5 × 5 × 1 box is successfully undertaken, showing that the algorithms can indeed scale to problems of this size while producing solutions of reasonable accuracy.
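To make the block elimination concrete: for turning-point (fold) tracking, a standard bordering scheme (sketched here generically, not necessarily LOCA's exact formulation, with c a normalization vector and g its scalar residual) solves the augmented Newton system using only solves with the original Jacobian J:

    \begin{bmatrix} J & F_{\lambda} \\ c^{T} & 0 \end{bmatrix}
    \begin{bmatrix} \Delta x \\ \Delta\lambda \end{bmatrix}
    = -\begin{bmatrix} F \\ g \end{bmatrix},
    \qquad
    J a = -F, \quad J b = F_{\lambda}, \quad
    \Delta\lambda = \frac{g + c^{T} a}{c^{T} b}, \quad
    \Delta x = a - \Delta\lambda\, b .

Each Newton step therefore costs two iterative solves with the same matrix J that the algorithm is driving toward singularity, which is exactly the robustness concern raised in the abstract.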

Proceedings ArticleDOI
04 Apr 2005
TL;DR: So-called "M×N" research, as part of the Common Component Architecture (CCA) effort, addresses these special and challenging needs, to provide generalized interfaces and tools that support flexible parallel data redistribution and parallel remote method invocation.
Abstract: With the increasing availability of high-performance massively parallel computer systems, the prevalence of sophisticated scientific simulation has grown rapidly. The complexity of the scientific models being simulated has also evolved, leading to a variety of coupled multi-physics simulation codes. Such cooperating parallel programs require fundamentally new interaction capabilities, to efficiently exchange parallel data structures and collectively invoke methods across programs. So-called "M×N" research, as part of the Common Component Architecture (CCA) effort, addresses these special and challenging needs, to provide generalized interfaces and tools that support flexible parallel data redistribution and parallel remote method invocation. Using this technology, distinct simulation codes with disparate distributed data decompositions can work together to achieve greater scientific discoveries.

Book
01 Jan 2005
TL;DR: Computer architecture deals with the physical configuration, logical structure, formats, protocols, and operational sequences for processing data, controlling the configuration, and controlling the operations of a computer.
Abstract: Advanced Computer Architecture and Parallel Processing (Wiley Series on Parallel and Distributed Computing, Vol. 2), by Hesham El-Rewini and Mostafa Abd-El-Barr. Computer architecture deals with the physical configuration, logical structure, formats, protocols, and operational sequences for processing data, as well as controlling the configuration and operations of a computer. The book covers topics such as multiprocessing (the use of two or more central processing units within a single computer system, and the ability of a system to support more than one processor or to allocate tasks between them) and network computing basics, including computer networks and client-server systems.

Proceedings ArticleDOI
12 Feb 2005
TL;DR: This work introduces scatter-add, which is the data-parallel form of the well-known scalar fetch-and-op, specifically tuned for SIMD/vector/stream style memory systems, and details the microarchitecture of a scatter-add implementation on a stream architecture, which requires less than 2% increase in die area yet shows performance speedups ranging from 1.45 to over 11 on a set of applications that require a scatter-add computation.
Abstract: Many important applications exhibit large amounts of data parallelism, and modern computer systems are designed to take advantage of it. While much of the computation in the multimedia and scientific application domains is data parallel, certain operations require costly serialization that increase the run time. Examples include superposition type updates in scientific computing and histogram computations in media processing. We introduce scatter-add, which is the data-parallel form of the well-known scalar fetch-and-op, specifically tuned for SIMD/vector/stream style memory systems. The scatter-add mechanism scatters a set of data values to a set of memory addresses and adds each data value to each referenced memory location instead of overwriting it. This novel architecture extension allows us to efficiently support data-parallel atomic update computations found in parallel programming languages such as HPF, and applies both to single-processor and multiprocessor SIMD data-parallel systems. We detail the microarchitecture of a scatter-add implementation on a stream architecture, which requires less than 2% increase in die area yet shows performance speedups ranging from 1.45 to over 11 on a set of applications that require a scatter-add computation.
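The semantics of scatter-add can be stated with a histogram example. The reference loop below (plain sequential C) is what the mechanism computes; the contribution of the paper is a memory-system implementation that performs these read-modify-write updates in data-parallel fashion, so duplicate indices accumulate correctly instead of overwriting one another as they would with an ordinary vector scatter.

    /* Reference semantics of scatter-add: each element adds its value into the
     * location it maps to.  Sequential C shown for clarity; the hardware
     * mechanism described above performs the atomic updates in parallel. */
    void scatter_add(float *hist, const int *bin, const float *val, int n)
    {
        for (int i = 0; i < n; ++i)
            hist[bin[i]] += val[i];   /* read-modify-write per element */
    }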

Proceedings ArticleDOI
14 Jun 2005
TL;DR: MADbench is presented, a lightweight version of the MADCAP CMB power spectrum estimation code that retains the operational complexity and integrated system requirements, and the integrated performance monitoring (IPM) package is introduced: a portable, lightweight, and scalable tool for effectively extracting MPI message-passing overheads.
Abstract: The cosmic microwave background (CMB) is an exquisitely sensitive probe of the fundamental parameters of cosmology. Extracting this information is computationally intensive, requiring massively parallel computing and sophisticated numerical algorithms. In this work we present MADbench, a lightweight version of the MADCAP CMB power spectrum estimation code that retains the operational complexity and integrated system requirements. In addition, to quantify communication behavior across a variety of architectural platforms, we introduce the integrated performance monitoring (IPM) package: a portable, lightweight, and scalable tool for effectively extracting MPI message-passing overheads. A performance characterization study is conducted on some of the world's most powerful supercomputers, including the superscalar Seaborg (IBM Power3+) and CC-NUMA Columbia (SGI Altix), as well as the vector-based Earth Simulator (NEC SX-6 enhanced) and Phoenix (Cray X1) systems. In-depth analysis shows that in order to bridge the gap between theoretical and sustained system performance, it is critical to gain a clear understanding of how the distinct parts of large-scale parallel applications interact with the individual subcomponents of HEC platforms.

Journal Article
TL;DR: In this article, the Integrated Performance Monitoring (IPM) package, a portable, lightweight, and scalable tool for effectively extracting MPI message-passing overheads, is introduced to quantify communication behavior across a variety of architectural platforms.
Abstract: The Cosmic Microwave Background (CMB) is an exquisitely sensitive probe of the fundamental parameters of cosmology. Extracting this information is computationally intensive, requiring massively parallel computing and sophisticated numerical algorithms. In this work we present MADbench, a lightweight version of the MADCAP CMB power spectrum estimation code that retains the operational complexity and integrated system requirements. In addition, to quantify communication behavior across a variety of architectural platforms, we introduce the Integrated Performance Monitoring (IPM) package: a portable, lightweight, and scalable tool for effectively extracting MPI message-passing overheads. A performance characterization study is conducted on some of the world's most powerful supercomputers, including the superscalar Seaborg (IBM Power3+) and CC-NUMA Columbia (SGI Altix), as well as the vector-based Earth Simulator (NEC SX-6 enhanced) and Phoenix (Cray X1) systems. In-depth analysis shows that in order to bridge the gap between theoretical and sustained system performance, it is critical to gain a clear understanding of how the distinct parts of large-scale parallel applications interact with the individual subcomponents of HEC platforms.

Journal ArticleDOI
TL;DR: The design of a dual-issue single-instruction, multiple-data-like (SIMD-like) extension of the IBM PowerPC® 440 floating-point unit (FPU) core and the compiler and algorithmic techniques to exploit it are described, and measurements show that the combination of algorithm, compiler, and hardware delivers a significant fraction of peak floating-point performance for compute-bound kernels, such as matrix multiplication.
Abstract: We describe the design of a dual-issue single-instruction, multiple-data-like (SIMD-like) extension of the IBM PowerPC® 440 floating-point unit (FPU) core and the compiler and algorithmic techniques to exploit it. This extended FPU is targeted at both the IBM massively parallel Blue Gene®/L machine and the more pervasive embedded platforms. We discuss the hardware and software codesign that was essential in order to fully realize the performance benefits of the FPU when constrained by the memory bandwidth limitations and high penalties for misaligned data access imposed by the memory hierarchy on a Blue Gene/L node. Using both hand-optimized and compiled code for key linear algebraic kernels, we validate the architectural design choices, evaluate the success of the compiler, and quantify the effectiveness of the novel algorithm design techniques. Our measurements show that the combination of algorithm, compiler, and hardware delivers a significant fraction of peak floating-point performance for compute-bound kernels, such as matrix multiplication, and delivers a significant fraction of peak memory bandwidth for memory-bound kernels, such as DAXPY, while remaining largely insensitive to data alignment.
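For reference, DAXPY, the memory-bound kernel cited above, is just the loop below. The plain C form is generic rather than the paper's tuned code; on a Blue Gene/L node its speed is set by memory bandwidth and by whether x and y are aligned so the SIMD-like FPU can issue paired loads and stores.

    /* DAXPY: y := a*x + y.  Streams 2n doubles in and n doubles out, so it is
     * bandwidth-bound; generic C, not the hand-optimized Blue Gene/L kernel. */
    void daxpy(int n, double a, const double *x, double *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }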

Book ChapterDOI
Dong-Sun Kim, Hyun-Sik Kim, Hongsik Kim, Gunhee Han, Duck-Jin Chung
30 May 2005
TL;DR: This paper proposes a high-performance neural network processor whose function can be changed by programming; it is based on a SIMD architecture optimized for neural network and image processing.
Abstract: Artificial Neural Networks (ANNs) and image processing require massively parallel computation of simple operators accompanied by heavy memory access. This type of operation therefore maps naturally onto Single Instruction Multiple Data (SIMD) stream parallel processing with distributed memory. This paper proposes a high-performance neural network processor whose function can be changed by programming. The proposed processor is based on a SIMD architecture that is optimized for neural network and image processing. The proposed processor supports 24 instructions and consists of 16 Processing Units (PUs) per chip. Each PU includes a 24-bit 2K-word Local Memory (LM) and a Processing Element (PE). The proposed architecture allows multichip expansion that minimizes the chip-to-chip communication bottleneck. The proposed processor is verified with an FPGA implementation, and its functionality is demonstrated with a character recognition application.

Book ChapterDOI
11 Sep 2005
TL;DR: A parallel implementation of an interior point method, using object-oriented programming techniques and exploiting different block structures of matrices, outperforms the industry-standard optimizer, shows very good parallel efficiency on a massively parallel architecture, and solves problems of unprecedented sizes reaching 10^9 variables.
Abstract: Solution methods for very large scale optimization problems are addressed in this paper. Interior point methods are demonstrated to provide unequalled efficiency in this context. They need a small (and predictable) number of iterations to solve a problem. A single iteration of an interior point method requires the solution of an indefinite system of equations. This system is regularized to guarantee the existence of a triangular decomposition. Hence the well-understood parallel computing techniques developed for positive definite matrices can be extended to this class of indefinite matrices. A parallel implementation of an interior point method is described in this paper. It uses object-oriented programming techniques and allows for exploiting different block structures of matrices. Our implementation outperforms the industry-standard optimizer, shows very good parallel efficiency on a massively parallel architecture, and solves problems of unprecedented sizes reaching 10^9 variables.
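To indicate what the regularization accomplishes, a generic primal-dual regularization of the interior point augmented system (a sketch of the standard technique, not necessarily the exact scheme used in this work) perturbs both diagonal blocks:

    \begin{bmatrix} -(Q + \Theta^{-1} + R_p) & A^{T} \\ A & R_d \end{bmatrix}
    \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
    =
    \begin{bmatrix} r_1 \\ r_2 \end{bmatrix},
    \qquad R_p \succ 0,\; R_d \succ 0 \ \text{(small, diagonal)},

where \Theta is the diagonal barrier scaling and Q = 0 for linear programs. With both regularization terms positive definite the matrix is symmetric quasidefinite, so a triangular LDL^T factorization with 1-by-1 pivots exists for any symmetric ordering, which is what allows parallel factorization techniques developed for positive definite matrices to be reused.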

Journal ArticleDOI
TL;DR: A new load balanced parallel implementation of a non-adaptive version of Greengard and Rokhlin's fast multipole method for distributed memory architectures with focus on applications in molecular dynamics is presented.

Proceedings ArticleDOI
04 May 2005
TL;DR: A two-state reversible cellular automaton (RCA) is described; it is shown to be capable of universal computation, and evidence is offered that it is also capable of universal construction.
Abstract: A novel two-state reversible cellular automaton (RCA) is described. This three-dimensional RCA is shown to be capable of universal computation. Additionally, evidence is offered that this RCA is capable of universal construction.

Proceedings ArticleDOI
12 Nov 2005
TL;DR: This effort represents the first time that a high-order variable-density incompressible flow solver with species diffusion has demonstrated sustained performance in the TeraFLOPS range.
Abstract: We describe Miranda, a massively parallel spectral/compact solver for variable-density incompressible flow, including viscosity and species diffusivity effects. Miranda utilizes FFTs and band-diagonal matrix solvers to compute spatial derivatives to at least 10th-order accuracy. We have successfully ported this communication-intensive application to BlueGene/L and have explored both direct block parallel and transpose-based parallelization strategies for its implicit solvers. We have discovered a mapping strategy which results in virtually perfect scaling of the transpose method up to 65,536 processors of the BlueGene/L machine. Sustained global communication rates in Miranda typically run at 85% of the theoretical peak speed of the BlueGene/L torus network, while sustained communication plus computation speeds reach 2.76 TeraFLOPS. This effort represents the first time that a high-order variable-density incompressible flow solver with species diffusion has demonstrated sustained performance in the TeraFLOPS range.
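The transpose-based strategy amounts to an all-to-all exchange that re-partitions the data so the dimension being differentiated becomes local to each processor before the FFT or band-diagonal solve is applied. A minimal sketch of such a redistribution step in generic MPI (not Miranda's actual code) is:

    /* Transpose step for a pencil decomposition: each of the p ranks in comm
     * sends one contiguous block of block_elems doubles to every other rank,
     * so the dimension to be differentiated ends up entirely local.  A real
     * solver also reorders elements into the new local layout afterwards. */
    #include <mpi.h>

    void transpose_blocks(const double *sendbuf, double *recvbuf,
                          int block_elems, MPI_Comm comm)
    {
        MPI_Alltoall((void *) sendbuf, block_elems, MPI_DOUBLE,
                     recvbuf,          block_elems, MPI_DOUBLE, comm);
    }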

Book ChapterDOI
TL;DR: A model of visual exploration of a scene by means of localized computations in neural populations, whose architecture allows the emergence of a coherent behaviour of sequential scanning of salient stimuli, is proposed.
Abstract: Although biomimetic autonomous robotics relies on the massively parallel architecture of the brain, the key issue is to temporally organize behaviour. The distributed representation of the sensory information has to be coherently processed to generate relevant actions. In the visual domain, we propose here a model of visual exploration of a scene by means of localized computations in neural populations whose architecture allows the emergence of a coherent behaviour of sequential scanning of salient stimuli. It has been implemented on a real robotic platform exploring a moving and noisy scene including several identical targets.

Journal ArticleDOI
TL;DR: A review of recent advances in simulations of magnetically confined plasmas is presented in this article, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics.
Abstract: Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.

Journal ArticleDOI
TL;DR: The DL_POLY package provides a set of classical molecular dynamics programs that have application over a wide range of atomic and molecular systems, stretching from small systems consisting of a few hundred atoms running on a single processor to systems running on massively parallel computers with thousands of processors.
Abstract: The DL_POLY package provides a set of classical molecular dynamics programs that have application over a wide range of atomic and molecular systems. Written for parallel computers they offer capabilities stretching from small systems consisting of a few hundred atoms running on a single processor, up to systems of several million atoms running on massively parallel computers with thousands of processors. In this article we describe the structure of the programs and some applications.
