
Showing papers on "Massively parallel" published in 1998


Journal ArticleDOI
12 Jun 1998-Science
TL;DR: The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it to easily route around defects, has significant implications for any future nanometer-scale computational paradigm.
Abstract: Teramac is a massively parallel experimental computer built at Hewlett-Packard Laboratories to investigate a wide range of different computational architectures. This machine contains about 220,000 hardware defects, any one of which could prove fatal to a conventional computer, and yet it operated 100 times faster than a high-end single-processor workstation for some of its configurations. The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it to easily route around defects, has significant implications for any future nanometer-scale computational paradigm. It may be feasible to chemically synthesize individual electronic components with less than a 100 percent yield, assemble them into systems with appreciable uncertainty in their connectivity, and still create a powerful and reliable data communications network. Future nanoscale computers may consist of extremely large-configuration memories that are programmed for specific tasks by a tutor that locates and tags the defects in the system.

895 citations


Journal ArticleDOI
01 Dec 1998
TL;DR: This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use, and briefly describes applications in benchmarking, Fast Fourier Transforms, sorting, and molecular dynamics.
Abstract: BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics.
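To make the DRMA style concrete, here is a minimal BSPlib sketch in C in which every process writes its identity into a registered variable on its right-hand neighbour, with the data becoming visible at the end of the superstep. The neighbour pattern and printout are illustrative choices, not taken from the paper.

```c
/* Minimal BSPlib DRMA example: each process puts its pid into a
 * registered variable on its right-hand neighbour, then synchronises. */
#include <stdio.h>
#include "bsp.h"

int main(void)
{
    bsp_begin(bsp_nprocs());          /* start the SPMD part on all processes */

    int p   = bsp_nprocs();           /* number of processes                   */
    int pid = bsp_pid();              /* my process identifier                 */

    int from_left = -1;
    bsp_push_reg(&from_left, sizeof(int));   /* make this address remotely writable */
    bsp_sync();                               /* registration takes effect           */

    /* DRMA: put my pid into the variable registered by my right neighbour. */
    bsp_put((pid + 1) % p, &pid, &from_left, 0, sizeof(int));
    bsp_sync();                               /* end of superstep: data have arrived */

    printf("process %d received %d\n", pid, from_left);

    bsp_pop_reg(&from_left);
    bsp_end();
    return 0;
}
```

The same exchange could be expressed in the BSMP style using bsp_send and bsp_move instead of the put operation.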

343 citations


Journal ArticleDOI
TL;DR: This paper presents a cohesive, practical load balancing framework that improves upon existing strategies, and exposes a serious deficiency in current load balancing strategies, motivating further work in this area.
Abstract: This paper presents a cohesive, practical load balancing framework that improves upon existing strategies. These techniques are portable to a broad range of prevalent architectures, including massively parallel machines, such as the Cray T3D/E and Intel Paragon, shared memory systems, such as the Silicon Graphics PowerChallenge, and networks of workstations. As part of the work, an adaptive heat diffusion scheme is presented, as well as a task selection mechanism that can preserve or improve communication locality. Unlike many previous efforts in this arena, the techniques have been applied to two large-scale industrial applications on a variety of multicomputers. In the process, this work exposes a serious deficiency in current load balancing strategies, motivating further work in this area.
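As a point of reference for the diffusion idea, the sketch below iterates the classic first-order diffusive load-balancing update on a ring of processors, where each processor trades a fixed fraction of the load difference with its neighbours until the imbalance decays. The ring topology, diffusion parameter, and serial emulation are assumptions for illustration; the paper's adaptive heat diffusion scheme refines this basic update and runs in parallel.

```c
/* Serial illustration of first-order diffusive load balancing on a ring. */
#include <stdio.h>

#define P     8        /* number of processors (ring)            */
#define ALPHA 0.25     /* diffusion parameter, 0 < ALPHA <= 0.5  */
#define STEPS 50       /* diffusion sweeps                       */

int main(void)
{
    double load[P] = { 40, 0, 0, 10, 0, 30, 0, 0 };  /* initial imbalance */
    double next[P];

    for (int s = 0; s < STEPS; s++) {
        for (int i = 0; i < P; i++) {
            double left  = load[(i + P - 1) % P];
            double right = load[(i + 1) % P];
            /* exchange a fraction of the load difference with each neighbour */
            next[i] = load[i] + ALPHA * (left - load[i]) + ALPHA * (right - load[i]);
        }
        for (int i = 0; i < P; i++) load[i] = next[i];
    }

    for (int i = 0; i < P; i++)
        printf("processor %d: load %.3f\n", i, load[i]);
    return 0;
}
```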

205 citations


Journal ArticleDOI
06 Mar 1998-Science
TL;DR: The results of massively parallel three-dimensional molecular dynamics simulations of the perpendicular intersection of extended dislocations in copper are reported, providing insights into this complex atomistic process.
Abstract: The results of massively parallel three-dimensional molecular dynamics simulations of the perpendicular intersection of extended dislocations in copper are reported. The intersection process, which involves three of the four possible {111} glide planes in the face-centered cubic lattice, begins with junction formation, followed by unzipping, partial dislocation bowing, cutting, and, finally, unit jog formation. The investigation provides insights into this complex atomistic process, which is currently not accessible to experimental investigation.

170 citations


Journal ArticleDOI
TL;DR: A new hybrid algorithm is introduced that inherits those aspects of GA that lend themselves to parallelization, and avoids serial bottlenecks of GA approaches by incorporating elements of SA to provide a completely parallel, easily scalable hybrid GA/SA method.
Abstract: Many significant engineering and scientific problems involve optimization of some criteria over a combinatorial configuration space. The two methods most often used to solve these problems effectively, simulated annealing (SA) and genetic algorithms (GA), do not easily lend themselves to massively parallel implementations. Simulated annealing is a naturally serial algorithm, while GA involves a selection process that requires global coordination. This paper introduces a new hybrid algorithm that inherits those aspects of GA that lend themselves to parallelization, and avoids serial bottlenecks of GA approaches by incorporating elements of SA to provide a completely parallel, easily scalable hybrid GA/SA method. This new method, called Genetic Simulated Annealing, does not require parallelization of any problem-specific portions of a serial implementation; existing serial implementations can be incorporated as is. Results of a study on two difficult combinatorial optimization problems, a 100-city traveling salesperson problem and a 24-word, 12-bit error correcting code design problem, performed on a 16K-PE MasPar MP-1, indicate advantages over previous parallel GA and SA approaches. One of the key results is that the performance of the algorithm scales up linearly with the number of processing elements, a feature not demonstrated by any previous parallel GA or SA approaches, which enables the new algorithm to utilize massively parallel architectures with maximum effectiveness. Additionally, the algorithm does not require careful choice of control parameters, a significant advantage over SA and GA.
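The abstract describes the hybrid only at a high level. As a purely illustrative aid, the following serial toy emulates one plausible per-processing-element organisation of such a GA/SA hybrid on a one-max problem: each element recombines with a ring neighbour and accepts the offspring with a Metropolis (simulated annealing) rule, so no global selection step is required. The fitness function, topology, cooling schedule, and all constants are assumptions, not the authors' Genetic Simulated Annealing.

```c
/* Toy serial emulation of a GA/SA hybrid: one individual per emulated
 * processing element, ring-neighbour crossover, Metropolis acceptance. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NPE   64      /* emulated processing elements */
#define NBITS 32      /* genome length                */
#define GENS  200     /* generations                  */

static int fitness(const int *g) {          /* one-max: count the 1 bits */
    int f = 0;
    for (int i = 0; i < NBITS; i++) f += g[i];
    return f;
}

int main(void)
{
    int pop[NPE][NBITS], child[NBITS];
    srand(42);
    for (int p = 0; p < NPE; p++)
        for (int i = 0; i < NBITS; i++) pop[p][i] = rand() & 1;

    double T = 2.0;                           /* initial temperature */
    for (int gen = 0; gen < GENS; gen++, T *= 0.98) {
        for (int p = 0; p < NPE; p++) {
            int *mate = pop[(p + 1) % NPE];   /* ring neighbour */
            int cut = rand() % NBITS;
            for (int i = 0; i < NBITS; i++)   /* one-point crossover */
                child[i] = (i < cut) ? pop[p][i] : mate[i];
            if (rand() % NBITS == 0)          /* occasional mutation */
                child[rand() % NBITS] ^= 1;

            int dE = fitness(child) - fitness(pop[p]);
            /* Metropolis acceptance replaces GA's global selection step */
            if (dE >= 0 || exp(dE / T) > (double)rand() / RAND_MAX)
                for (int i = 0; i < NBITS; i++) pop[p][i] = child[i];
        }
    }

    int best = 0;
    for (int p = 0; p < NPE; p++) {
        int f = fitness(pop[p]);
        if (f > best) best = f;
    }
    printf("best fitness after %d generations: %d/%d\n", GENS, best, NBITS);
    return 0;
}
```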

140 citations


Journal ArticleDOI
TL;DR: A spectral element method in conjunction with a new iterative solution technique is presented, which allows for increased computational efficiency compared to the standard finite element method, together with a significant reduction both in storage requirements and in computational complexity.

134 citations


Journal ArticleDOI
TL;DR: The two-level FETI method is extended to shell problems, a variant approach is described, the preconditioning problem is revisited, the efficient implementation of the corresponding iterative solvers on massively parallel processors is addressed, and the computational price of mathematical optimality is highlighted.

114 citations


Journal ArticleDOI
TL;DR: A parallel three-dimensional Sn algorithm is described for performing eigenvalue calculations on rectangular meshes using the Thinking Machines Connection Machine model CM-200 computer.
Abstract: A parallel three-dimensional Sn algorithm is described for performing eigenvalue calculations on rectangular meshes using the Thinking Machines Connection Machine model CM-200 computer. By using a ...

112 citations


Patent
24 Feb 1998
TL;DR: A synchronization range indicator is provided that can control, by program, whether the parallel processors are available in correspondence to the respective serial processors, in response to a request from a serial processor to use the parallel processors.
Abstract: A multiple parallel-job scheduling method and apparatus are provided that can improve the utilization of all processors in a system when a plurality of parallel jobs are executed concurrently. A plurality of processors constituting a computer system, each having the same function, are logically categorized into serial processors, which execute the serial computing part or the parallel computing part of a parallel job, and a parallel processor group consisting of multiple processors, which execute the parallel computing part of a parallel job in parallel. So that the parallel processors can be shared by a plurality of parallel jobs, a synchronization range indicator is provided which can control by program whether the parallel processors are available in correspondence to the respective serial processors. In response to a request to use the parallel processors from a serial processor for which the parallel processors are set as available by means of the synchronization range indicator, operation can be carried out without invoking an interrupt.

109 citations


Patent
George W. Conner1
04 Sep 1998
TL;DR: Automatic test equipment for semiconductor memories that provides testing of large arrays of semiconductor memory chips in parallel is described, which greatly enhances the economics of testing memory devices made according to the RAMBUS standard, which includes a low-speed port and a medium-speed port.
Abstract: Automatic test equipment for semiconductor memories that provides testing of large arrays of semiconductor memory chips in parallel. Such massively parallel memory testing greatly enhances test throughput, thereby reducing cost. It also greatly enhances the economics of testing memory devices made according to the RAMBUS standard, which includes a low-speed port and a medium-speed port, because it allows the same automatic test equipment to be used economically to test devices with the low-speed port and the medium-speed port.

103 citations


Proceedings ArticleDOI
01 Dec 1998
TL;DR: This paper reviews how the stochastic nature, effective size, and the compartmentalization of genetic networks as well as the information content of gene expression matrices will influence the ability to perform successful reverse engineering.
Abstract: Complementary DNA microarrays and high-density oligonucleotide arrays opened the opportunity for massively parallel biological data acquisition. Application of these technologies will shift the emphasis in biological research from primary data generation to complex quantitative data analysis. Reverse engineering of time-dependent gene-expression matrices is amongst the first complex tools to be developed. The success of reverse engineering will depend on the quantitative features of the genetic networks and the quality of information we can obtain from biological systems. This paper reviews how (1) the stochastic nature, (2) the effective size, and (3) the compartmentalization of genetic networks, as well as (4) the information content of gene expression matrices, will influence our ability to perform successful reverse engineering.

Book
01 Jan 1998
TL;DR: This book introduces state-of-the-art methods for programming parallel systems, including approaches to reverse engineering traditional sequential software, and includes detailed coverage of the critical scheduling problem, compares multiple programming languages and environments, and shows how to measure the performance of parallel systems.
Abstract: The state of the art in high-performance concurrent computing, in theory and practice: detailed coverage of the growing integration between parallel and distributed computing; advanced approaches for programming distributed, parallel systems and adapting traditional sequential software; and creating a Parallel Virtual Machine (PVM) from networked, heterogeneous systems. This is the most up-to-date, comprehensive guide to the rapidly changing field of distributed and parallel systems. The book begins with an introductory survey of distributed and parallel computing: its rationale and evolution. It compares and contrasts a wide variety of approaches to parallelism, from distributed computer networks, to parallelism within processors (such as Intel's MMX), to massively parallel systems. The book introduces state-of-the-art methods for programming parallel systems, including approaches to reverse engineering traditional sequential software. It includes detailed coverage of the critical scheduling problem, compares multiple programming languages and environments, and shows how to measure the performance of parallel systems. The book introduces the Parallel Virtual Machine (PVM) system for writing programs that run on a network of heterogeneous systems; the new Message Passing Interface (MPI-2) standard; and, finally, the growing role of Java in writing distributed and parallel applications.

Book ChapterDOI
TL;DR: This paper presents and discusses the idea of Web-based volunteer computing, which allows people to cooperate in solving a large parallel problem by using standard Web browsers to volunteer their computers' processing power.
Abstract: This paper presents and discusses the idea of Web-based volunteer computing, which allows people to cooperate in solving a large parallel problem by using standard Web browsers to volunteer their computers' processing power. Because volunteering requires no prior human contact and very little technical knowledge, it becomes very easy to build very large volunteer computing networks. At its full potential, volunteer computing can make it possible to build world-wide massively parallel computing networks more powerful than any supercomputer. Even on a smaller, more practical scale, volunteer computing can be used within companies or institutions to provide supercomputer-like facilities by harnessing the computing power of existing workstations. Many interesting variations are possible, including networks of information appliances (NOIAs), paid volunteer systems, and barter trade of compute cycles. In this paper, we discuss these possibilities, and identify several issues that will need to be addressed in order to successfully implement them. We also present an overview of the current work being done in the Bayanihan volunteer computing project.

Journal ArticleDOI
TL;DR: In this paper, a pipelined parallelization of PHOENIX is described, where the necessary data from a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known.
Abstract: We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000-300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers.
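As an illustration of the pipelined idea, the minimal MPI sketch below has each rank wait for the state of the preceding wavelength point from its predecessor, perform its own "solve", and immediately forward the result to its successor. The single double standing in for the transferred state and the trivial update are placeholders, not PHOENIX's actual data structures.

```c
/* Minimal wavelength-pipeline sketch in MPI: receive from rank-1,
 * compute, forward to rank+1 as soon as the result is known. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double state = 1.0;                       /* initial condition at the first point      */
    if (rank > 0)                             /* wait for the previous wavelength point    */
        MPI_Recv(&state, 1, MPI_DOUBLE, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    state *= 0.5;                             /* placeholder for the radiative transfer solve */

    if (rank < size - 1)                      /* hand off to the next point immediately     */
        MPI_Send(&state, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);

    printf("rank %d finished its wavelength point, state = %g\n", rank, state);
    MPI_Finalize();
    return 0;
}
```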

BookDOI
01 Jan 1998
TL;DR: Introducing a mechanism of object archiving and transmission has enabled a natural extension to a parallel algorithm and showed good performance on a networked PC cluster, for sufficiently coarse granularity.
Abstract: With trends toward more complex nuclear reactor designs, advanced methods are required for appropriate reduction of design margins from an economical point of view. As a solution, an algorithm based on an object-oriented approach has been developed. In this algorithm, calculation meshes are represented as calculation objects wherein specific calculation algorithms are encapsulated. Abstracted data, which are neutron current objects, are exchanged between these objects. Calculation objects can retrieve required data having specified data types from the neutron current objects, which leads to a combined use of different calculation methods and algorithms in the same computation. Introducing a mechanism of object archiving and transmission has enabled a natural extension to a parallel algorithm. The parallel solution is identical with the sequential one. The SCOPE code, an actual implementation of our algorithm, showed good performance on a networked PC cluster, for sufficiently coarse granularity.

Book ChapterDOI
30 Mar 1998
TL;DR: Simulation with real workload data shows that a new scheduling method for batch jobs on massively parallel processor architectures, based on the first-come-first-served strategy, is suitable for use in real parallel computers.
Abstract: We present a new scheduling method for batch jobs on massively parallel processor architectures. This method is based on the first-come-first-served strategy and emphasizes the notion of fairness. Severe fragmentation is prevented by using gang scheduling, which is initiated only by highly parallel jobs. Good worst-case behavior of the scheduling approach has already been proven by theoretical analysis. In this paper we show by simulation with real workload data that the algorithm is also suitable for use in real parallel computers. This holds for several different scheduling criteria, such as makespan or the sum of the flow times. Simulation is also used to determine the best parameter set for the new method.

Proceedings ArticleDOI
TL;DR: A 3-D finite-difference elastic wave propagation code that incorporates a number of advanced computational and physics-based enhancements has been developed and will be used to generate an elastic subset of the SEG/EAEG acoustic data set.
Abstract: A 3-D finite-difference elastic wave propagation code that incorporates a number of advanced computational and physics-based enhancements has been developed. These enhancements include full 3-D elastic, viscoelastic, and topographic modeling (anisotropic capabilities are being added), low-level optimization, propagating and variable density grids, hybridization, and parallelization. This code takes advantage of high performance computing and massively parallel processing to make 3-D full-physics simulations of seismic problems feasible. This computational tool will be used to generate an elastic subset of the SEG/EAEG acoustic data set. The acoustic and elastic data will be compared to examine pitfalls with traditional processing, and to test the effectiveness of using elastic data as an aid to seismic imaging.

Journal ArticleDOI
TL;DR: New parallel algorithms for smoothed particle hydrodynamics and contact detection, which turn out to have several key features in common, are described, along with how to join them with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation.

Journal ArticleDOI
01 Dec 1998
TL;DR: The adaptive parallelism used to dynamically adjust the parallelism degree of the application with respect to the system load demonstrates that high-performance computing using a hundred heterogeneous workstations combined with massively parallel machines is feasible for solving large optimization problems.
Abstract: This paper presents a new approach for parallel tabu search based on adaptive parallelism. Adaptive parallelism is used to dynamically adjust the parallelism degree of the application with respect to the system load. Adaptive parallelism demonstrates that high-performance computing using a hundred heterogeneous workstations combined with massively parallel machines is feasible for solving large optimization problems. The parallel tabu search algorithm includes different tabu list sizes and new intensification/diversification mechanisms. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems.
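For readers unfamiliar with the underlying heuristic, the compact serial tabu search skeleton below, applied to a toy quadratic assignment instance, shows the basic moves the paper's parallel algorithm builds on: the best non-tabu swap is taken each iteration, recently used swaps are tabu for a fixed tenure, and an aspiration criterion overrides the tabu status for record-breaking moves. The random instance, tenure, and iteration budget are arbitrary, and the paper's adaptive parallelism and intensification/diversification mechanisms are not modelled.

```c
/* Compact serial tabu search skeleton for a toy quadratic assignment problem. */
#include <stdio.h>
#include <stdlib.h>

#define N      8
#define ITERS  500
#define TENURE 7

static int F[N][N], D[N][N];

static int cost(const int *p)               /* QAP cost: sum F[i][j] * D[p[i]][p[j]] */
{
    int c = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c += F[i][j] * D[p[i]][p[j]];
    return c;
}

int main(void)
{
    srand(11);
    for (int i = 0; i < N; i++)             /* random flow and distance matrices */
        for (int j = 0; j < N; j++) {
            F[i][j] = (i == j) ? 0 : rand() % 10;
            D[i][j] = (i == j) ? 0 : rand() % 10;
        }

    int p[N], best_p[N], tabu[N][N] = {{0}};
    for (int i = 0; i < N; i++) p[i] = best_p[i] = i;
    int best_cost = cost(p);

    for (int it = 1; it <= ITERS; it++) {
        int bi = -1, bj = -1, bc = 1 << 30;
        for (int i = 0; i < N - 1; i++)
            for (int j = i + 1; j < N; j++) {
                int t = p[i]; p[i] = p[j]; p[j] = t;      /* try the swap */
                int c = cost(p);
                t = p[i]; p[i] = p[j]; p[j] = t;          /* undo it      */
                int is_tabu = tabu[i][j] > it;
                /* aspiration: a tabu move is allowed if it beats the best so far */
                if ((!is_tabu || c < best_cost) && c < bc) { bc = c; bi = i; bj = j; }
            }
        if (bi < 0) continue;                              /* every move is tabu    */
        int t = p[bi]; p[bi] = p[bj]; p[bj] = t;           /* apply the best move   */
        tabu[bi][bj] = it + TENURE;                        /* forbid reversing it   */
        if (bc < best_cost) {
            best_cost = bc;
            for (int i = 0; i < N; i++) best_p[i] = p[i];
        }
    }
    printf("best assignment cost found: %d\n", best_cost);
    return 0;
}
```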

Journal ArticleDOI
TL;DR: A new, scalable interconnection topology called the Spanning Multichannel Linked Hypercube (SMLH) is proposed, which is very suitable to massively parallel systems and is highly amenable to optical implementation.
Abstract: A new, scalable interconnection topology called the Spanning Multichannel Linked Hypercube (SMLH) is proposed. This proposed network is very suitable to massively parallel systems and is highly amenable to optical implementation. The SMLH uses the hypercube topology as a basic building block and connects such building blocks using two-dimensional multichannel links (similar to spanning buses). In doing so, the SMLH combines positive features of both the hypercube (small diameter, high connectivity, symmetry, simple routing, and fault tolerance) and the spanning bus hypercube (SBH) (constant node degree, scalability, and ease of physical implementation), while at the same time circumventing their disadvantages. The SMLH topology supports many communication patterns found in different classes of computation, such as bus-based, mesh-based, and tree-based problems, as well as hypercube-based problems. A very attractive feature of the SMLH network is its ability to support a large number of processors with the possibility of maintaining a constant degree and a constant diameter. Other positive features include symmetry, incremental scalability, and fault tolerance. It is shown that the SMLH network provides better average message distance, average traffic density, and queuing delay than many similar networks, including the binary hypercube, the SBH, etc. Additionally, the SMLH has comparable performance to other high-performance hypercubic networks, including the Generalized Hypercube and the Hypermesh. An optical implementation methodology is proposed for SMLH. The implementation methodology combines both the advantages of free space optics with those of wavelength division multiplexing techniques. A detailed analysis of the feasibility of the proposed network is also presented.

Journal ArticleDOI
TL;DR: This work presents an attempt to draw inspiration from biology in the design of a novel digital circuit: a field-programmable gate array (FPGA), endowed with two features motivated and guided by the behavior of biological systems: self-replication and self-repair.
Abstract: Biological organisms are among the most intricate structures known to man, exhibiting highly complex behavior through the massively parallel cooperation of numerous relatively simple elements, the cells. As the development of computing systems approaches levels of complexity such that their synthesis begins to push the limits of human intelligence, engineers are starting to seek inspiration in nature for the design of computing systems, both at the software and at hardware levels. We present one such endeavor, notably an attempt to draw inspiration from biology in the design of a novel digital circuit: a field-programmable gate array (FPGA). This reconfigurable logic circuit will be endowed with two features motivated and guided by the behavior of biological systems: self-replication and self-repair.

Journal ArticleDOI
TL;DR: In this article, a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models is presented, which is directly applicable to a wide range of stochastic cellular automata.
Abstract: We experiment with a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models. The parallel scheme is directly applicable to a wide range of stochastic cellular automata where the discrete events (updates) are Poisson arrivals. For high performance, we utilize a continuous-time, asynchronous parallel version of the n-fold way rejection-free algorithm. Each processing element carries an l × l block of spins, and we employ the fast SHMEM-library routines on the Cray T3E distributed-memory parallel architecture. Different processing elements have different local simulated times. To ensure causality, the algorithm handles the asynchrony in a conservative fashion. Despite relatively low utilization and an intricate relationship between the average time increment and the size of the spin blocks, we find that for sufficiently large l the algorithm outperforms its corresponding parallel Metropolis (non-rejection-free) counterpart. As an example application, we present results for metastable decay in a model ferromagnetic or ferroelectric film, observed with a probe of area smaller than the total system.
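The serial sketch below illustrates the rejection-free (n-fold way) update that the parallel scheme builds on, for a single small block of Ising spins: every step flips some spin with probability proportional to its Metropolis rate and advances the local time by an exponentially distributed increment. The brute-force rate scan (a real n-fold way code groups spins into classes and updates rates incrementally), the temperature, and the lattice size are simplifications, and the conservative asynchronous coupling between blocks on different processing elements is omitted.

```c
/* Serial n-fold way (rejection-free) sketch for a small 2-D Ising block. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define L 16
#define T 1.5                               /* temperature in units of J/k_B */

static int s[L][L];

static double flip_rate(int i, int j)
{
    int nb = s[(i+1)%L][j] + s[(i+L-1)%L][j] + s[i][(j+1)%L] + s[i][(j+L-1)%L];
    double dE = 2.0 * s[i][j] * nb;         /* energy change of flipping s[i][j] */
    return dE <= 0.0 ? 1.0 : exp(-dE / T);  /* Metropolis rate                   */
}

int main(void)
{
    srand(1);
    for (int i = 0; i < L; i++)
        for (int j = 0; j < L; j++) s[i][j] = 1;   /* start fully magnetised */

    double t = 0.0;
    for (long step = 0; step < 20000; step++) {
        double R = 0.0;                      /* total rate of all possible flips */
        for (int i = 0; i < L; i++)
            for (int j = 0; j < L; j++) R += flip_rate(i, j);

        /* pick the event: walk through spins until the cumulative rate exceeds r */
        double r = R * rand() / ((double)RAND_MAX + 1.0);
        double acc = 0.0;
        int fi = L - 1, fj = L - 1;
        for (int i = 0; i < L && acc <= r; i++)
            for (int j = 0; j < L; j++) {
                acc += flip_rate(i, j);
                if (acc > r) { fi = i; fj = j; break; }
            }
        s[fi][fj] = -s[fi][fj];

        /* advance the local simulated time by an exponential waiting time */
        t += -log(1.0 - (double)rand() / ((double)RAND_MAX + 1.0)) / R;
    }
    printf("simulated time reached: %g\n", t);
    return 0;
}
```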

01 Jan 1998
TL;DR: The interface to the MUMPS code and the message passing mechanisms used in the package are described; MUMPS is the direct solver in the PARASOL project, whose main aim is to develop a public domain library of sparse codes for distributed memory parallel computers.
Abstract: We describe aspects of the interface and design of Version 2.0 of the MUltifrontal Massively Parallel Solver MUMPS. This code solves sets of sparse linear equations Ax = b, where the matrix A is unsymmetric. It is written in Fortran 90 and uses MPI for message passing. It also calls the ScaLAPACK code, which in turn uses the BLACS; Level 3 BLAS routines are also used. MUMPS is the direct solver in the PARASOL project, an EU LTR project with twelve partners from five countries. The main aim of PARASOL is to develop a public domain library of sparse codes for distributed memory parallel computers. This report describes the interface to the MUMPS code and the message passing mechanisms that are used in the package.
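As an illustration of the solver's calling pattern, the sketch below drives MUMPS through the C wrapper (dmumps_c) distributed with later public releases to solve a tiny unsymmetric system in assembled coordinate format. The Version 2.0 described here exposed a Fortran 90 interface, so the exact 1998 interface differs; read this only as the general pattern of initialize, describe A and b, run analysis/factorization/solve, and terminate.

```c
/* Hedged sketch of the MUMPS C interface for a tiny unsymmetric Ax = b. */
#include <stdio.h>
#include <mpi.h>
#include "dmumps_c.h"

#define JOB_INIT                  -1
#define JOB_END                   -2
#define JOB_ANALYSE_FACTOR_SOLVE   6
#define USE_COMM_WORLD       -987654   /* MUMPS convention for MPI_COMM_WORLD */

int main(int argc, char **argv)
{
    /* 2x2 example:  [ 2 1 ] [x1]   [ 3 ]
                     [ 0 4 ] [x2] = [ 8 ]   so x1 = 0.5, x2 = 2 */
    MUMPS_INT n = 2, nz = 3;
    MUMPS_INT irn[] = {1, 1, 2};
    MUMPS_INT jcn[] = {1, 2, 2};
    double    a[]   = {2.0, 1.0, 4.0};
    double    rhs[] = {3.0, 8.0};

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    DMUMPS_STRUC_C id;
    id.comm_fortran = USE_COMM_WORLD;
    id.par = 1;                    /* host participates in the computation */
    id.sym = 0;                    /* unsymmetric matrix                   */
    id.job = JOB_INIT;
    dmumps_c(&id);

    id.n = n; id.nz = nz;          /* matrix in assembled coordinate form  */
    id.irn = irn; id.jcn = jcn; id.a = a;
    id.rhs = rhs;

    id.job = JOB_ANALYSE_FACTOR_SOLVE;   /* analysis + factorization + solve */
    dmumps_c(&id);

    id.job = JOB_END;
    dmumps_c(&id);

    if (rank == 0)                 /* rhs is overwritten with the solution on the host */
        printf("solution: x1 = %g, x2 = %g\n", rhs[0], rhs[1]);
    MPI_Finalize();
    return 0;
}
```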

Journal ArticleDOI
TL;DR: Artificial neural networks attempt to emulate the massively parallel and distributed processing of the human brain and are being examined for a variety of problems that have been difficult to solve.
Abstract: Artificial neural networks (ANNs) attempt to emulate the massively parallel and distributed processing of the human brain. They are being examined for a variety of problems that have been difficult...

Journal ArticleDOI
TL;DR: In this paper, a massively parallel algorithm for static and dynamic partitioning of unstructured FEM meshes is presented, in which a fast but inaccurate sequential clustering is determined and used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system.
Abstract: We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined, which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balance. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and other measures that are important for the numerical solution method (such as the aspect ratio of the resulting domains).

Journal ArticleDOI
TL;DR: In this paper, a new algorithm to enable the implementation of dual control volume grand canonical molecular dynamics (DCV-GCMD) on massively parallel (MP) architectures is presented.
Abstract: A new algorithm to enable the implementation of dual control volume grand canonical molecular dynamics (DCV-GCMD) on massively parallel (MP) architectures is presented. DCV-GCMD can be thought of as a hybridization of molecular dynamics (MD) and grand canonical Monte Carlo (GCMC) and was developed recently to make possible the simulation of gradient-driven diffusion. The method has broad application to such problems as membrane separations, drug delivery systems, diffusion in polymers and zeolites, etc. The massively parallel algorithm for the DCV-GCMD method has been implemented in a code named LADERA, which employs the short-range Lennard-Jones potential for pure fluids and multicomponent mixtures, including bulk and confined (single pore as well as amorphous solid materials) systems. Like DCV-GCMD, LADERA's MP algorithm can be thought of as a hybridization of two different algorithms, spatial MD and spatial GCMC. The DCV-GCMD method is described fully followed by the DCV-GCMD parallel algorithm employed in ...
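The structural sketch below shows the DCV-GCMD cycle described above: a block of ordinary MD steps alternated with grand canonical Monte Carlo insertions and deletions inside two control volumes held at different chemical potentials, which sustains the gradient that drives diffusion between them. The stub routines, folded prefactors, and all constants are placeholders for illustration; a real code such as LADERA evaluates Lennard-Jones energies and forces and runs in parallel.

```c
/* Structural sketch of a DCV-GCMD cycle: MD blocks alternated with GCMC
 * insertion/deletion attempts in two control volumes. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MD_STEPS_PER_CYCLE  50
#define GCMC_ATTEMPTS       20
#define CYCLES              100

static double mu_cv[2] = { 2.0, 4.0 };   /* chemical potentials of the two control volumes */
static int    n_cv[2]  = { 20, 5 };      /* particle counts in the control volumes (toy bookkeeping) */
static double temperature = 1.2;

static void md_step(void) { /* placeholder: velocity-Verlet integration of all particles */ }

static double insertion_energy(int cv) { (void)cv; return 0.5; }  /* placeholder test-particle energy */
static double deletion_energy(int cv)  { (void)cv; return 0.5; }

static void gcmc_move(int cv)
{
    /* Attempt an insertion or deletion with equal probability, accepted with
       the usual grand canonical Metropolis criterion; volume and thermal
       wavelength prefactors are folded into mu for this toy. */
    double u = (double)rand() / RAND_MAX;
    if (u < 0.5) {
        double acc = exp((mu_cv[cv] - insertion_energy(cv)) / temperature) / (n_cv[cv] + 1);
        if ((double)rand() / RAND_MAX < acc) n_cv[cv]++;
    } else if (n_cv[cv] > 0) {
        double acc = n_cv[cv] * exp((deletion_energy(cv) - mu_cv[cv]) / temperature);
        if ((double)rand() / RAND_MAX < acc) n_cv[cv]--;
    }
}

int main(void)
{
    srand(7);
    for (int c = 0; c < CYCLES; c++) {
        for (int s = 0; s < MD_STEPS_PER_CYCLE; s++) md_step();
        for (int a = 0; a < GCMC_ATTEMPTS; a++) { gcmc_move(0); gcmc_move(1); }
    }
    printf("control-volume populations: %d / %d\n", n_cv[0], n_cv[1]);
    return 0;
}
```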

Journal ArticleDOI
TL;DR: The present version of the LADERA FORTRAN code has the capability of modelling systems with explicit intramolecular interactions such as bonds, angles, and dihedral rotations and includes another new feature, which is the use of neighbour lists in force calculations.
Abstract: This paper, the second part of a series, extends the capabilities of the LADERA FORTRAN code for massively parallel dual control volume grand canonical molecular dynamics (DCV-GCMD). DCV-GCMD is a hybrid of two more common molecular simulation techniques (grand canonical Monte Carlo and molecular dynamics) which allows the direct molecular-level modelling of diffusion under a chemical potential gradient. The present version of the code, LADERA-B, has the capability of modelling systems with explicit intramolecular interactions such as bonds, angles, and dihedral rotations. The utility of the new code for studying gradient-driven diffusion of small molecules through polymers is demonstrated by applying it to two model systems. LADERA-B includes another new feature, which is the use of neighbour lists in force calculations. This feature increases the speed of the code but presents several challenges in the parallel hybrid algorithm. There is discussion on how these problems were addressed and how our implement...
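The following minimal sketch shows the kind of Verlet neighbour list the abstract refers to: pairs within the cutoff plus a skin distance are enumerated once and then reused by the force routine for several steps until particles have moved far enough to require a rebuild. Periodic boundaries, cell binning, and the parallel, domain-decomposed bookkeeping discussed in the paper are omitted.

```c
/* Minimal Verlet neighbour-list construction for a box of random particles. */
#include <stdio.h>
#include <stdlib.h>

#define N         256
#define R_CUT     2.5
#define SKIN      0.3
#define MAX_PAIRS (N * (N - 1) / 2)

static double x[N][3];
static int    pair_i[MAX_PAIRS], pair_j[MAX_PAIRS];

static int build_neighbour_list(void)
{
    double r_list2 = (R_CUT + SKIN) * (R_CUT + SKIN);
    int npairs = 0;
    for (int i = 0; i < N - 1; i++)
        for (int j = i + 1; j < N; j++) {
            double r2 = 0.0;
            for (int d = 0; d < 3; d++) {
                double dxd = x[i][d] - x[j][d];
                r2 += dxd * dxd;
            }
            if (r2 < r_list2) {            /* keep the pair; it may come within R_CUT soon */
                pair_i[npairs] = i;
                pair_j[npairs] = j;
                npairs++;
            }
        }
    return npairs;
}

int main(void)
{
    srand(3);
    for (int i = 0; i < N; i++)            /* random positions in a 10 x 10 x 10 box */
        for (int d = 0; d < 3; d++)
            x[i][d] = 10.0 * rand() / RAND_MAX;

    int npairs = build_neighbour_list();
    printf("neighbour list holds %d of %d possible pairs\n", npairs, N * (N - 1) / 2);
    /* Force routines then loop over these pairs only, instead of all N*(N-1)/2,
       until any particle has moved more than SKIN/2 since the last rebuild. */
    return 0;
}
```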

Proceedings ArticleDOI
02 Sep 1998
TL;DR: Various parallel programming models are discussed, although emphasis is given to a master-slave implementation using the Message Passing Interface (MPI), and a mathematical analysis is given on achieving peak efficiency in multilevel parallelism by selecting the most effective processor partitioning schemes.
Abstract: Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitations which point to the need for multiple levels of parallelism in order to achieve peak parallel performance. Managing multiple simultaneous instances of massively parallel simulations is a challenging software undertaking, especially if the implementation is to be flexible, extensible, and general-purpose. This paper focuses on the design for multilevel parallelism as implemented within the DAKOTA iterator toolkit. Various parallel programming models are discussed, although emphasis is given to a master-slave implementation using the Message Passing Interface (MPI). A mathematical analysis is given on achieving peak efficiency in multilevel parallelism by selecting the most effective processor partitioning schemes. This analysis is verified in some computational experiments.
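The sketch below illustrates the processor-partitioning idea behind multilevel parallelism: MPI_COMM_WORLD is split into several sub-communicators, each of which can run one instance of a parallel simulation while the optimization level coordinates which design point each partition evaluates. The partition count and the placeholder "simulation" are assumptions for illustration, not DAKOTA's actual implementation.

```c
/* Split MPI_COMM_WORLD into partitions so several parallel simulations
 * can run simultaneously, one per sub-communicator. */
#include <stdio.h>
#include <mpi.h>

#define NUM_PARTITIONS 4   /* simultaneous simulation instances (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assign each process to one of NUM_PARTITIONS simulation servers. */
    int color = world_rank % NUM_PARTITIONS;
    MPI_Comm sim_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sim_comm);

    int sim_rank, sim_size;
    MPI_Comm_rank(sim_comm, &sim_rank);
    MPI_Comm_size(sim_comm, &sim_size);

    /* Each partition would now run its own massively parallel analysis;
       rank 0 of each partition reports back to the optimization master. */
    double local = (double)world_rank, result;
    MPI_Reduce(&local, &result, 1, MPI_DOUBLE, MPI_SUM, 0, sim_comm);
    if (sim_rank == 0)
        printf("partition %d (%d procs) finished, placeholder result %g\n",
               color, sim_size, result);

    MPI_Comm_free(&sim_comm);
    MPI_Finalize();
    return 0;
}
```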

Proceedings ArticleDOI
04 May 1998
TL;DR: A genetic algorithm is proposed as a generally applicable global learning method for finding and optimizing parameters of a cellular neural network that have to be insensitive to small perturbations.
Abstract: The operation of a cellular neural network (CNN) is defined by a set of 19 parameters. There is no known general method for finding these parameters; analytic design methods are available for a small class of problems only. Standard learning algorithms cannot be applied due to the lack of gradient information. The authors propose a genetic algorithm as a generally applicable global learning method. In order to be useful for real CNN VLSI chips, the parameters have to be insensitive to small perturbations. Therefore, after the parameters are learnt, they are optimized with respect to robustness in a second genetic processing step. As the simulation of CNNs necessitates the numerical integration of large systems of nonlinear differential equations, the evaluation of the fitness functions is computationally very expensive; a massively parallel supercomputer is used to achieve acceptable run times.

Journal ArticleDOI
TL;DR: A high-performance system with the latest NBISOM_25 chips is presented and integrated in a simulation framework for neural networks that contains software tools for self-organizing maps as well as for neural associative memories, tools for pre- and postprocessing, and tools for graphical analysis of the simulation results.