
Showing papers on "Massively parallel published in 1987"


Journal ArticleDOI
TL;DR: Experiments show that the positional accuracy of points placed in the data space by a model pose obtained via clustering is comparable to the positional accuracies of the sensed data from which pose candidates are computed.
Abstract: The general paradigm of pose clustering is discussed and compared to other techniques applicable to the problem of object detection. Pose clustering is also called hypothesis accumulation and generalized Hough transform and is characterized by a “parallel” accumulation of low level evidence followed by a maxima or clustering step which selects pose hypotheses with strong support from the set of evidence. Examples are given showing the use of pose clustering in both 2D and 3D problems. Experiments show that the positional accuracy of points placed in the data space by a model pose obtained via clustering is comparable to the positional accuracy of the sensed data from which pose candidates are computed. A specific sensing system is described which yields an accuracy of a few millimeters. Complexity of the pose clustering approach relative to alternative approaches is discussed with reference to conventional computers and massively parallel computers. It is conjectured that the pose clustering approach can produce superior results in real time on a massively parallel machine.
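
For intuition, a minimal pose-clustering sketch in Python (translation-only 2D poses and hypothetical point lists; the paper's full method also handles rotation and 3D):

```python
# A minimal sketch of pose clustering / generalized Hough voting, assuming
# translation-only 2D poses; point data below is hypothetical.
import numpy as np

def pose_cluster_2d(model_pts, scene_pts, bin_size=1.0):
    """Accumulate a translation hypothesis from every model/scene pairing,
    then return the pose bin with the strongest support."""
    votes = {}
    for m in model_pts:
        for s in scene_pts:
            t = np.round((np.asarray(s, float) - np.asarray(m, float)) / bin_size)
            key = tuple(t.astype(int))
            votes[key] = votes.get(key, 0) + 1   # low-level evidence accumulates
    best = max(votes, key=votes.get)             # clustering / maxima step
    return np.array(best) * bin_size, votes[best]

model = [(0, 0), (1, 0), (0, 2)]
scene = [(5, 3), (6, 3), (5, 5), (9, 9)]   # model shifted by (5, 3) plus clutter
print(pose_cluster_2d(model, scene))       # -> (array([5., 3.]), 3)
```

Every vote is independent, which is why the accumulation phase maps naturally onto a massively parallel machine.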

266 citations


Proceedings Article
01 Oct 1987
TL;DR: In this paper, a parallel genetic algorithm for a medium-grained hypercube computer is discussed, where each processor runs the genetic algorithm on its own sub-population, periodically selecting the best individuals from the subpopulation and sending copies of them to one of its neighboring processors.
Abstract: This paper discusses a parallel genetic algorithm for a medium-grained hypercube computer. Each processor runs the genetic algorithm on its own sub-population, periodically selecting the best individuals from the sub-population and sending copies of them to one of its neighboring processors. The performance of the parallel algorithm on a function maximization problem is compared to the performance of the serial version. The parallel algorithm achieves comparable results with near-linear speed-up. In addition, some experiments were performed to study the effects of varying the parameters for the parallel model.
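
A toy island-model sketch of the scheme, with the hypercube neighbors simplified to a ring and the per-processor runs simulated sequentially (fitness function, sizes, and rates are all hypothetical):

```python
# A minimal island-model GA sketch in the spirit of the paper; the original
# ran one sub-population per hypercube node, in true parallel.
import random

def fitness(x):                # toy maximization target
    return -(x - 3.14) ** 2

def evolve(pop, gens=10):
    for _ in range(gens):
        pop = sorted(pop, key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]
        children = [random.choice(parents) + random.gauss(0, 0.1)
                    for _ in range(len(pop) - len(parents))]
        pop = parents + children
    return pop

random.seed(0)
islands = [[random.uniform(-10, 10) for _ in range(20)] for _ in range(4)]
for epoch in range(5):
    islands = [evolve(pop) for pop in islands]          # each "processor" runs
    # migration: each island sends a copy of its best to its ring neighbor
    best = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        pop[pop.index(min(pop, key=fitness))] = best[i - 1]
print([round(max(pop, key=fitness), 3) for pop in islands])
```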

237 citations


Book ChapterDOI
15 Jun 1987
TL;DR: This paper describes a recently developed procedure that can learn to perform a recognition task and uses canonical internal representations of the patterns to identify familiar patterns in novel positions.
Abstract: One major goal of research on massively parallel networks of neuron-like processing elements is to discover efficient methods for recognizing patterns. Another goal is to discover general learning procedures that allow networks to construct the internal representations that are required for complex tasks. This paper describes a recently developed procedure that can learn to perform a recognition task. The network is trained on examples in which the input vector represents an instance of a pattern in a particular position and the required output vector represents its name. After prolonged training, the network develops canonical internal representations of the patterns and it uses these canonical representations to identify familiar patterns in novel positions.
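
A minimal one-hidden-layer backpropagation sketch on a toy task; the paper's network and its shift-invariant training set are far larger, so the data and sizes here are hypothetical:

```python
# A small backpropagation sketch: hidden units develop internal
# representations of a toy binary "pattern naming" task.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8)).astype(float)    # random binary inputs
y = (X.sum(axis=1) > 4).astype(float).reshape(-1, 1)   # toy "pattern name"

W1 = rng.normal(0, 0.5, (8, 6)); b1 = np.zeros(6)
W2 = rng.normal(0, 0.5, (6, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    h = sigmoid(X @ W1 + b1)                 # hidden internal representation
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # output error signal
    d_h = (d_out @ W2.T) * h * (1 - h)       # backpropagated hidden error
    W2 -= 0.5 * h.T @ d_out / len(X); b2 -= 0.5 * d_out.mean(axis=0)
    W1 -= 0.5 * X.T @ d_h / len(X);  b1 -= 0.5 * d_h.mean(axis=0)
print("training accuracy:", ((out > 0.5) == y).mean())
```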

205 citations


Journal ArticleDOI
TL;DR: This paper provides an overview of the Image Understanding Architecture (IUA), a massively parallel, multilevel system for supporting real-time image understanding applications and research in knowledge-based computer vision.
Abstract: This paper provides an overview of the Image Understanding Architecture (IUA), a massively parallel, multi-level system for supporting real-time image understanding applications and research in knowledge-based computer vision. The design of the IUA is motivated by considering the architectural requirements for integrated real-time vision in terms of the type of processing elements, control of processing, and communication between processing elements. The IUA integrates parallel processors operating simultaneously at three levels of computational granularity in a tightly-coupled architecture. Each level of the IUA is a parallel processor that is distinctly different from the other two levels, in order to best meet the processing needs at each of the corresponding levels of abstraction in the interpretation process. Communication between levels takes place via parallel data and control paths. The processing elements within each level can also communicate with each other in parallel, via a different mechanism at each level that is designed to meet the specific communication needs of each level of abstraction. An associative processing paradigm, which provides a simple yet general means of managing massive parallelism, has been utilized as the principal control mechanism at the low and intermediate levels. Control of processing in these levels is based upon their rapid responses to queries, involving […]
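
A tiny sketch of the associative query/response control style mentioned above, assuming hypothetical per-element feature words; every "processor" compares its word against a broadcast pattern at once:

```python
# A minimal associative-processing sketch (hypothetical 4-bit feature words);
# the controller broadcasts a masked query and branches on the responses.
import numpy as np

features = np.random.default_rng(1).integers(0, 16, size=(8, 8))
MASK, PATTERN = 0b1100, 0b0100          # query: bits 3..2 must equal 01

responders = (features & MASK) == PATTERN   # all elements respond in parallel
print("responder count:", int(responders.sum()))
if responders.any():
    first = np.argwhere(responders)[0]      # select one responder for readout
    print("first responder at", tuple(first))
```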

184 citations


Journal ArticleDOI
TL;DR: A new approach to learning in a multilayer optical neural network based on holographically interconnected nonlinear devices that performs an approximate implementation of the backpropagation learning procedure in a massively parallel high-speed nonlinear optical network.
Abstract: A new approach to learning in a multilayer optical neural network based on holographically interconnected nonlinear devices is presented. The proposed network can learn the interconnections that form a distributed representation of a desired pattern transformation operation. The interconnections are formed in an adaptive and self-aligning fashion as volume holographic gratings in photorefractive crystals. Parallel arrays of globally space-integrated inner products diffracted by the interconnecting hologram illuminate arrays of nonlinear Fabry-Perot etalons for fast thresholding of the transformed patterns. A phase-conjugated reference wave interferes with a backward-propagating error signal to form holographic interference patterns which are time-integrated in the volume of a photorefractive crystal to modify slowly and learn the appropriate self-aligning interconnections. This multilayer system performs an approximate implementation of the backpropagation learning procedure in a massively parallel high-speed nonlinear optical network.

184 citations


Journal ArticleDOI
TL;DR: This paper presents the principles of constructing hypernets and analyzes their architectural potentials in terms of message routing complexity, cost-effective support for global as well as localized communication, I/O capabilities, and fault tolerance.
Abstract: A new class of modular networks is proposed for hierarchically constructing massively parallel computer systems for distributed supercomputing and AI applications. These networks are called hypernets. They are constructed incrementally with identical cubelets, treelets, or buslets that are well suited for VLSI implementation. Hypernets integrate positive features of both hypercubes and tree-based topologies, and maintain a constant node degree when the network size increases. This paper presents the principles of constructing hypernets and analyzes their architectural potentials in terms of message routing complexity, cost-effective support for global as well as localized communication, I/O capabilities, and fault tolerance. Several algorithms are mapped onto hypernets to illustrate their ability to support parallel processing in a hierarchically structured or data-dependent environment. The emulation of hypercube connections using less hardware is shown. The potential of hypernets for efficient support of connectionist models of computation is also explored.

162 citations


01 Aug 1987
TL;DR: This thesis presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems, and a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation.
Abstract: This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain a speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. However, for production systems that are already relatively fast on uniprocessors, the communication overhead imposed by the implementation environment essentially offsets any gains when productions are split for parallel match. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. Simulation results from a number of OPS5 production systems indicate that partitionings using information about the run-time behaviour of the production systems can improve the match performance by a factor of 1.10 to 1.25, compared to partitionings obtained using various simpler schemes. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity, each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented. This architecture is based on an array of simple processors which can be clustered into groups of potentially different sizes, each group processing an affected production during a cycle of execution. Simulation results for a number of production systems indicate that the proposed algorithm performs better than other proposed massively parallel architectures like DADO or NON-VON that use a much larger number of processors. However, for certain systems, the performance is in the same range as, or sometimes worse than, what can be obtained with a parallel interpreter based on Forgy's RETE algorithm, such as an interpreter using production-level parallelism implemented on a small number of powerful processors, or an interpreter based on Gupta's parallel version of Forgy's RETE algorithm implemented on a shared-memory multiprocessor with 32-64 processors.
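
A sketch of one ingredient, partitioning productions across processors by measured match cost; the thesis uses run-time statistics and a more elaborate approximation algorithm, so the greedy rule and the timings below are illustrative only:

```python
# A greedy longest-processing-time partitioning sketch: heaviest rule first,
# always onto the currently lightest processor (rule costs hypothetical).
import heapq

def partition(rule_costs, n_procs):
    heap = [(0.0, p, []) for p in range(n_procs)]   # (load, proc id, rules)
    heapq.heapify(heap)
    for rule, cost in sorted(rule_costs.items(), key=lambda kv: -kv[1]):
        load, p, rules = heapq.heappop(heap)        # lightest processor
        heapq.heappush(heap, (load + cost, p, rules + [rule]))
    return sorted(heap)

costs = {"r1": 9.0, "r2": 7.0, "r3": 6.0, "r4": 5.0, "r5": 2.0, "r6": 2.0}
for load, p, rules in partition(costs, 3):
    print(f"processor {p}: load={load} rules={rules}")
```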

91 citations


01 Jul 1987
TL;DR: A class of multiscale algorithms for the solution of large sparse linear systems that are particularly well adapted to massively parallel supercomputers is described, using an approximate inverse for smoothing and a super-interpolation operator to move the correction from coarse to fine scales, chosen to optimize the rate of convergence.
Abstract: We describe a class of multiscale algorithms for the solution of large sparse linear systems that are particularly well adapted to massively parallel supercomputers. While standard multigrid algorithms are unable to effectively use all processors when computing on coarse grids, the new algorithms utilize the same number of processors at all times. The basic idea is to solve many coarse scale problems simultaneously, combining the results in an optimal way to provide an improved fine scale solution. As a result, convergence rates are much faster than for standard multigrid methods: we have obtained V-cycle convergence rates as good as 0.0046 with one smoothing application per cycle, and 0.0013 with two smoothings. On massively parallel machines, the improved convergence rate is attained at no extra computational cost, since processors that would otherwise be sitting idle are utilized to provide the better convergence. On serial machines, the algorithm is slower because of the extra time spent on multiple coarse scales, though the improved convergence rate may justify this, particularly in cases where other methods do not converge. In constant coefficient situations the algorithm is easily analyzed theoretically using Fourier methods on a single grid. The fact that only one grid is involved substantially simplifies convergence proofs. A feature of the algorithms is the use of a matched pair of operators: an approximate inverse for smoothing, and a super-interpolation operator to move the correction from coarse to fine scales, chosen to optimize the rate of convergence.
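
A sketch of the core idea — several coarse corrections computed "simultaneously" and combined optimally — on a 1D Poisson model problem; the paper's smoother and super-interpolation operators are chosen far more carefully than the plain Jacobi step and aggregation-based stand-ins below:

```python
# Multiple coarse corrections, combined by least squares to best reduce
# the fine-grid residual (toy 1D Laplacian; operators are simplifications).
import numpy as np

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian
b = np.ones(n)
x = np.zeros(n)

for cycle in range(20):
    x += 0.6 * (b - A @ x) / 2.0                # one weighted-Jacobi smoothing
    r = b - A @ x
    # several coarse problems "in parallel": piecewise-constant aggregates
    # of different widths, each yielding a candidate correction direction
    C = []
    for width in (2, 4, 8):
        P = np.kron(np.eye(n // width), np.ones((width, 1)))  # prolongation
        Ac = P.T @ A @ P                                       # coarse operator
        C.append(P @ np.linalg.solve(Ac, P.T @ r))
    C = np.array(C).T                           # n x 3 correction candidates
    a, *_ = np.linalg.lstsq(A @ C, r, rcond=None)   # optimal combination
    x += C @ a
print("final residual norm:", np.linalg.norm(b - A @ x))
```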

64 citations


Patent
16 Nov 1987
TL;DR: In this article, an array processing system including a plurality of processing elements each including a processor and an associated memory module, the system further including a router network over which each processing element can transfer messages to other random processing elements, a mechanism by which a processor can transmit data to one of four nearest-neighbor processors.
Abstract: An array processing system including a plurality of processing elements, each comprising a processor and an associated memory module. The system further includes a router network over which each processing element can transfer messages to any other processing element, and a mechanism by which a processor can transmit data to one of its four nearest-neighbor processors. In addition, the processing elements are divided into groups of four, in which one of the processing elements can access data in the other processing elements' memory modules. The routing network switches messages in a plurality of switching stages, with each stage connecting to the next stage through communications paths that are divided into groups, each group, in turn, being associated with selected address signals. A communications path continuity test circuit associated with each path detects any discontinuity in the communications path and disables the path. Thus, the stage may attempt to transfer a message over another path associated with the same address.

60 citations


Patent
27 Mar 1987
TL;DR: In this article, a massively parallel vector computer comprises a set of vector processing nodes, each node including a main processor for controlling access to a random access memory through an internal bus, and external busses for interfacing with the internal bus.
Abstract: A massively parallel vector computer comprises a set of vector processing nodes, each node including a main processor for controlling access to a random access memory through an internal bus and a set of ports for interfacing external busses to the internal bus. The external busses interconnect pairs of nodes to form a network through which data may be transmitted from the random access memory in any one node to the random access memory in any other node in the network. Each vector processing node also includes a vector memory accessed through a local bus, the local and internal busses communicating via an additional port controlled by the main processor. A vector processor within each node performs operations on vectors stored in the vector memory and stores the results in the vector memory. A peripheral processing network comprises a set of peripheral processing nodes interconnected via further busses, and wherein selected peripheral processing nodes are coupled to selected vector processing nodes. The peripheral processing nodes are adapted to transmit data to and receive data from peripheral devices.

48 citations


Book
01 Nov 1987
TL;DR: Programming parallel processors; £1.5bn worth of parallel processors will be built in the next five years.
Abstract: Programming parallel processors. مرکز فناوری اطلاعات و اطلاع رسانی کشاورزی (Agricultural Information Technology and Information Center).

Book
01 Nov 1987
TL;DR: An on-line edition of an introductory book on distributed and parallel computing.
Abstract: An on-line edition of an introduction to distributed and parallel computing.

Journal ArticleDOI
TL;DR: A new type of two-dimensional cellular automaton method is introduced for computation of magnetohydrodynamic fluid systems and it is possible to compute both Lorentz-force and magnetic-induction effects.
Abstract: A new type of two-dimensional cellular automaton method is introduced for computation of magnetohydrodynamic fluid systems. Particle population is described by a 36-component tensor referred to a hexagonal lattice. By appropriate choice of the coefficients that control the modified streaming algorithm and the definition of the macroscopic fields, it is possible to compute both Lorentz-force and magnetic-induction effects. The method is local in the microscopic space and therefore suited to massively parallel computations.
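
A toy streaming step on a hexagonal lattice (plain 6-velocity occupancies; the paper's automaton carries a 36-component tensor population and a modified streaming rule, so this only shows the locality that makes the method parallel):

```python
# A minimal lattice-gas streaming sketch on a hexagonal grid.
import numpy as np

H, W = 8, 8
rng = np.random.default_rng(2)
n = rng.integers(0, 2, size=(6, H, W))   # n[k] = particles moving along dir k

# axial-coordinate offsets (row, col) of the six hexagonal directions
HEX_DIRS = [(0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1), (1, 0)]

def stream(n):
    """Every site forwards its k-th population one cell along direction k;
    each shift is a uniform local operation, hence massively parallel."""
    return np.stack([np.roll(n[k], shift=d, axis=(0, 1))
                     for k, d in enumerate(HEX_DIRS)])

n = stream(n)
print("total particles conserved:", int(n.sum()))
```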

Proceedings ArticleDOI
01 Oct 1987
TL;DR: Algorithms and programming techniques needed to develop SUM (Simulation Using Massively parallel computers), a relaxation-based circuit simulator on the Connection Machine, a massively parallel processor with up to 65536 processors are described.
Abstract: Accurate circuit simulation is a very important step in the design of high-performance integrated circuits. The ever increasing size of integrated circuits requires an inordinate amount of computer time to be spent in circuit simulation. Parallel processors have been considered to speed up the simulation process. Massively parallel computers have been made available recently and present an interesting new paradigm for expensive CAD applications. This paper describes the algorithms and programming techniques needed to develop SUM (Simulation Using Massively parallel computers), a relaxation-based circuit simulator on the Connection Machine, a massively parallel processor with up to 65536 processors. SUM can simulate circuits at almost constant CPU time per iteration, regardless of circuit size, and can therefore simulate very large circuits. Circuit simulators running on the largest supercomputers can run circuits of comparable size; however, SUM scales easily as the number of processors in the Connection Machine increases, with almost no increase in CPU time.
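
A sketch of the relaxation idea with one notional processor per circuit node, on a toy resistive network; SUM itself performs full transient simulation on the Connection Machine:

```python
# Gauss-Jacobi relaxation on a small resistive circuit: solve G @ v = I
# for node voltages v (toy, diagonally dominant conductance matrix).
import numpy as np

G = np.array([[ 3.0, -1.0, -1.0],
              [-1.0,  3.0, -1.0],
              [-1.0, -1.0,  3.0]])
I = np.array([1.0, 0.0, 0.0])

v = np.zeros(3)
for it in range(200):
    # every node updates from its neighbors' previous voltages, so all
    # updates can happen simultaneously -- one node per processor
    v_new = (I - (G - np.diag(np.diag(G))) @ v) / np.diag(G)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
print("node voltages:", v, "after", it + 1, "iterations")
```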

Book
01 Oct 1987
TL;DR: This book discusses Deterministic simulation of idealized parallel computers on more realistic ones, and Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region.
Abstract: Contents of this book are the following: Preparata: Deterministic simulation of idealized parallel computers on more realistic ones; Convex hull of randomly chosen points from a polytope; Dataflow computing; Parallel in sequence; Towards the architecture of an elementary cortical processor; Parallel algorithms and static analysis of parallel programs; Parallel processing of combinatorial search; Communications; An O(n log n) cost parallel algorithm for the single function coarsest partition problem; Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region; RELACS - a recursive layout computing system; and Parallel linear conflict-free subtree access.

Journal ArticleDOI
01 Mar 1987
TL;DR: The results indicate that with the use of key features and the combination of a variety of powerful search patterns, the pyramidlike structure is effective and efficient for supporting parallel and hierarchical object recognition algorithms.
Abstract: Pyramidlike parallel hierarchical structures have been shown to be suitable for many computer vision tasks and have the potential for achieving the speeds needed for the real-time processing of real-world images. Algorithms are being developed to explore the pyramid's massively parallel and shallowly serial-hierarchical computing ability in an integrated system that combines both low-level and higher level vision tasks. Micromodular transforms are used to embody the program's knowledge of the different objects it must recognize. Pyramid vision programs are described that, starting with the image, use transforms that assess key features to dynamically imply other feature-detecting and characterizing transforms and additional top-down model-driven processes to apply. Program performance is presented for four real-world images of buildings. The use of key features in pyramid vision programs and the related search and control issues are discussed. To expedite the detection of various key features, feature-adaptable windows are developed. In addition to image-driven bottom-up and model-driven top-down processing, lateral search is used and is shown to be helpful, efficient, and feasible. The results indicate that with the use of key features and the combination of a variety of powerful search patterns, the pyramidlike structure is effective and efficient for supporting parallel and hierarchical object recognition algorithms.
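
A minimal pyramid sketch, assuming a simple 2x2 averaging reduction and a hypothetical bright-square "object"; the paper's system layers feature-detecting transforms and top-down, model-driven search on such a structure:

```python
# Build an image pyramid and do a coarse detection at the top level; on
# pyramid hardware each level is computed from its children in one
# parallel step.
import numpy as np

def build_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        a = pyr[-1]
        a = (a[0::2, 0::2] + a[0::2, 1::2] +
             a[1::2, 0::2] + a[1::2, 1::2]) / 4.0    # 2x2 averaging reduce
        pyr.append(a)
    return pyr

img = np.zeros((16, 16)); img[2:8, 2:8] = 1.0    # a bright square "object"
pyr = build_pyramid(img, 4)
# coarse-to-fine: find the strongest cell at the apex, then refine downward
r, c = np.unravel_index(np.argmax(pyr[-1]), pyr[-1].shape)
print("coarse detection at", (r, c), "of", pyr[-1].shape)
```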

Proceedings Article
01 Oct 1987
TL;DR: Parallel Supercomputing Today and the Cedar Approach (D.J. Kuck et al.) and Questions and Unexpected Answers in Concurrent Computation (G.C. Fox).
Abstract: Parallel Supercomputing Today and the Cedar Approach (D.J. Kuck et al.). An Overview of the NYU Ultracomputer Project (A. Gottlieb). Questions and Unexpected Answers in Concurrent Computation (G.C. Fox). An Introduction to the IBM Research Parallel Processor Prototype (RP3) (G.F. Pfister et al.). Large Scale Parallel Computation on a Loosely Coupled Array of Processors (E. Clementi, J. Detrich). The Manchester Dataflow Computing System (J. Gurd, C. Kirkham, W. Bohm). The Cornell Parallel Supercomputing Effort (A. Nicolau). The GF11 Parallel Computer (J. Beetem, M. Denneau, D. Weingarten). Index.

Patent
20 May 1987
TL;DR: In this paper, a method and apparatus for improving the utilization of a parallel computer by allocating the resources of the parallel computer among a large number of users is described, which is accomplished by means for dividing the parallel computers into a plurality of processor arrays, each of which can be used independently of the others.
Abstract: A method and apparatus are described for improving the utilization of a parallel computer by allocating the resources of the parallel computer among a large number of users. A parallel computer is subdivided among a large number of users to meet the requirements of a multiplicity of data bases and programs that are run simultaneously on the computer. This is accomplished by means for dividing the parallel computer into a plurality of processor arrays, each of which can be used independently of the others. This division is made dynamically in the sense that the division can readily be altered and indeed in a time sharing environment may be altered between two successive time slots of the frame. Further, the parallel computer is organized so as to permit the simulation of additional parallel processors by each physical processor in the array and to provide for communication among the simulated parallel processors. Means are also provided for storing virtual processors in virtual memory. As a result of this design, it is possible to build a parallel computer with a number of physical processors on the order of 1,000,000 and a number of virtual processors on the order of 1,000,000,000,000. Moreover, since the computer can be dynamically reconfigured into a plurality of independent processor arrays, a device this size can be shared by a large number of users with each user operating on only a portion of the entire computer having a capacity appropriate for the problem then being addressed.

Journal ArticleDOI
TL;DR: A distributed diagnostic and structuring algorithm for the RECBAR is presented that enables the architecture to detect faults and structure itself accordingly within 2 · log2(L) + 1 time steps, thus making it a truly fault tolerant architecture.

Journal ArticleDOI
TL;DR: Computer simulations show that this method accounts for apparent-motion phenomena as well as the Minimal Mapping Theory does, although some differences exist, and the results support Ullman's (1979) suggestion that the visual system separates the structure-from-motion process into two stages.
Abstract: Two solutions for the correspondence problem for long-range motion are investigated. The first is a modification of the Minimal Mapping Theory (S. Ullman: The Interpretation of Visual Motion, MIT Press, Cambridge, 1979) that is implemented by a massively parallel network. In this network, every two units are interconnected, and thus its convergence is fast and relatively independent of the number of image features. Computer simulations show that our method accounts for apparent-motion phenomena as well as the Minimal Mapping Theory does, although some differences exist. Mathematical proofs provide conditions for the convergence of the network. The second 'solution' for the correspondence problem is called the Structural Theory. This theory assumes that the three-dimensional structure of viewed objects does not change fast in time. The theory then looks for the correspondence and three-dimensional structure that best fulfill this assumption. A massively parallel network implementation of this theory is also possible. However, its performance is poor due to the high complexity of its solution space. This supports Ullman's (1979) suggestion that the visual system separates the structure-from-motion process into two stages: first a stage for motion measurement, and then a stage for structure recovery.
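
A brute-force sketch of the minimal-mapping criterion — pick the one-to-one correspondence that minimizes total motion distance; the paper instead relaxes this on a fully interconnected parallel network, and the points below are hypothetical:

```python
# Exhaustive minimal mapping between two small frames (exponential in the
# number of features; the network relaxation is the scalable alternative).
import itertools
import numpy as np

frame1 = np.array([(0, 0), (2, 0), (0, 2)], float)
frame2 = np.array([(0.9, 0.1), (2.2, 1.0), (0.1, 2.1)], float)

best_cost, best_perm = float("inf"), None
for perm in itertools.permutations(range(len(frame2))):
    cost = sum(np.linalg.norm(frame1[i] - frame2[j])
               for i, j in enumerate(perm))
    if cost < best_cost:
        best_cost, best_perm = cost, perm
print("matching:", best_perm, "total motion:", round(best_cost, 3))
```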

Patent
28 Aug 1987
TL;DR: In this article, a massively parallel computer (330) allocates the computer's resources among a large number of users (310A-310N) in a time sharing environment.
Abstract: A system for a massively parallel computer (330) allocates the computer's resources among a large number of users (310A-310N) in a time sharing environment. This is accomplished by dynamically dividing the computer (330) into independently usable processor arrays (400), which allows a multiplicity of data bases (470, 480, 490) to be run simultaneously. Further, the 1,000,000 possible physical processors (PPU) are able to simulate additional virtual processors, on the order of 1,000,000,000,000. A device of this size can be shared by a large number of users (310A-310N), with each user operating on only a portion of the entire computer (330).

Proceedings Article
01 Jan 1987
TL;DR: In this article, an interconnection network is presented for a massively parallel fine-grained single-instruction, multiple-data (SIMD) system, called the polymorphic-torus, whose design goal is to provide high communication bandwidth under a packaging constraint.
Abstract: An interconnection network is presented for a massively parallel fine-grained single-instruction, multiple-data (SIMD) system, called the polymorphic-torus, whose design goal is to provide high communication bandwidth under a packaging constraint. This goal is achieved by the polymorphic principle, which injects switches with circuit-switching capability into every node of a base network (e.g. a two-dimensional torus). The polymorphic approach maintains the wiring complexity of the base network but effectively increases the communication bandwidth due to its flexibility in reconfiguring the switches individually and dynamically to match the algorithm graph. Formal analysis of the interpackage wiring (the flux) and the intrapackage wiring (the fluid) is given for the polymorphic-torus and the related torus networks. Three algorithms, namely the Boolean, the max/min, and the sum operations, are developed to illustrate the use of the polymorphic principle in enhancing the communication bandwidth with no penalty in interpackage wiring complexity.

Journal ArticleDOI
TL;DR: Stochastic optical phenomena can be used for generation of random-bit arrays with a prescribed probability law that may vary in space and in time for implementation of Monte Carlo algorithms, including simulated annealing.
Abstract: Stochastic optical phenomena can be used for generation of random-bit arrays with a prescribed probability law that may vary in space and in time. This is useful for implementation of Monte Carlo algorithms, including simulated annealing, with a high recursion rate by monolithic massively parallel arrays of processing elements. Each processing element includes a photosensor. The use and advantages of speckle and of microchannel-plate image intensifiers amplifying single-photon events are considered.
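
A software stand-in for the optically generated bit arrays, with a hypothetical space-varying probability law; the point is one independent biased bit per processing element per step:

```python
# Generate a random-bit array with a prescribed (here spatially uniform,
# but freely varying) probability law, as one annealing substep might use.
import numpy as np

rng = np.random.default_rng(3)
p_accept = np.full((4, 4), 0.3)            # prescribed probability per element
bits = rng.random((4, 4)) < p_accept       # one random bit per element
print(bits.astype(int))

# e.g. a Metropolis-style move: flip spins wherever the local bit came up 1
spins = np.ones((4, 4), int)
spins[bits] *= -1
```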

Journal ArticleDOI
TL;DR: The special features of parallel programming consist mainly in dividing the problem into segments that will execute in parallel and in determining how the processors will communicate and synchronize with one another.
Abstract: The special features of parallel programming are discussed. These consist mainly in dividing the problem into segments that will execute in parallel, and in determining how the processors will communicate and synchronize with one another. The role of architecture is described, and the task of expressing problems in parallel form is addressed. The issue of efficiency is examined.
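
A minimal illustration of these steps on a hypothetical embarrassingly parallel task — divide the data into segments, process them concurrently, and synchronize by collecting the partial results:

```python
# Segment / compute / synchronize with a thread pool. Threads illustrate
# the structure; CPU-bound Python code would need processes for real
# speed-up because of the interpreter lock.
from concurrent.futures import ThreadPoolExecutor

def work(segment):
    return sum(x * x for x in segment)          # per-segment computation

data = list(range(1_000_000))
segments = [data[i::4] for i in range(4)]       # divide the problem
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(work, segments))   # communicate and synchronize
print(sum(partials))
```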

Journal ArticleDOI
TL;DR: In this article, discrete analogues of variational inequalities and quasi-variational inequalities (Q.V.I.), encountered in stochastic control and mathematical physics, are discussed, and it is shown that those discrete V.I.'s and Q.V.I.'s can be written in the fixed point form x = Tx such that either T or some power of T is a contraction.
Abstract: In this paper, discrete analogues of variational inequalities (V.I.) and quasi-variational inequalities (Q.V.I.), encountered in stochastic control and mathematical physics, are discussed. It is shown that those discrete V.I.'s and Q.V.I.'s can be written in the fixed point form x = Tx such that either T or some power of T is a contraction. This leads to globally convergent iterative methods for the solution of discrete V.I.'s and Q.V.I.'s, which are very suitable for implementation on parallel computers with single-instruction, multiple-data architecture, particularly on massively parallel processors (M.P.P.'s).
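
A sketch of the fixed-point iteration x = Tx as projected Jacobi on a toy obstacle-type problem; diagonal dominance makes the Jacobi part a contraction, and the projection is non-expansive, so T is a contraction (data hypothetical):

```python
# Projected Jacobi for a discrete obstacle problem: find x >= psi with the
# usual complementarity, via x_{k+1} = max(psi, x_k - D^{-1}(A x_k - b)).
# All components update simultaneously -- SIMD / massively parallel friendly.
import numpy as np

n = 10
A = 3 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # diagonally dominant
b = np.linspace(-1, 1, n)
psi = np.zeros(n)                                      # obstacle: x >= psi

x = np.zeros(n)
for it in range(500):
    x_new = np.maximum(psi, x - (A @ x - b) / np.diag(A))   # x_new = T(x)
    if np.max(np.abs(x_new - x)) < 1e-12:
        break
    x = x_new
print("solution:", np.round(x, 4), "iterations:", it + 1)
```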

Proceedings Article
01 Jan 1987
TL;DR: Simulations of the general connection system, and its implementation on the Connection Machine, indicate that the time and space requirements are proportional to the product of the average number of connections per neuron and the diameter of the interconnection network.
Abstract: Neural networks have attracted much interest recently, and using parallel architectures to simulate neural networks is a natural and necessary application. The SIMD model of parallel computation is chosen, because systems of this type can be built with large numbers of processing elements. However, such systems are not naturally suited to generalized communication. A method is proposed that allows an implementation of neural network connections on massively parallel SIMD architectures. The key to this system is an algorithm that allows the formation of arbitrary connections between the "neurons". A feature is the ability to add new connections quickly. It also has error recovery ability and is robust over a variety of network topologies. Simulations of the general connection system, and its implementation on the Connection Machine, indicate that the time and space requirements are proportional to the product of the average number of connections per neuron and the diameter of the interconnection network.

Journal ArticleDOI
TL;DR: Novel algorithmic techniques are described, such as vertical pipelining, subproblem partitioning, associative matching, and data duplication, that effectively exploit the massive parallelism available in fine-grained SIMD tree machines while avoiding communication bottlenecks.

Proceedings Article
23 Aug 1987
TL;DR: This paper offers an alternative competition model based upon a meta-network representation scheme called network regions that are analogous to net spaces in partitioned semantic networks that can be used in many ways to clarify the representational structure in massively parallel networks.
Abstract: Winner-take-all (WTA) structures are currently used in massively parallel (connectionist) networks to represent competitive behavior among sets of alternative hypotheses. However, this form of competition might be too rigid and not be appropriate for certain applications. For example, applications that involve noisy and erroneous inputs might mislead WTA structures into selecting a wrong outcome. In addition, for networks that continuously process input data, the outcome must dynamically change with changing inputs; WTA structures might "lock-in" on a previous outcome. This paper offers an alternative competition model for these applications. The model is based upon a meta-network representation scheme called network regions that are analogous to net spaces in partitioned semantic networks. Network regions can be used in many ways to clarify the representational structure in massively parallel networks. This paper focuses on how they are used to provide a flexible and adaptive competition model. Regions can be considered as representational units that represent the conceptual abstraction of a collection of nodes (or hypotheses). Through this higher-level abstraction, regions can better influence the collective behavior of nodes within the region. Several AI applications were used to test and evaluate this model.
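
For contrast, a minimal winner-take-all sketch of the rigid competition the paper argues against (dynamics and constants hypothetical): units self-excite and inhibit one another until a single hypothesis survives, which is exactly the "lock-in" behavior regions are meant to soften.

```python
# Iterative winner-take-all among four competing hypothesis activations.
import numpy as np

act = np.array([0.50, 0.52, 0.30, 0.49])    # competing activations
for _ in range(100):
    inhibition = act.sum() - act             # each unit feels all the others
    act = np.clip(act + 0.1 * (act - 0.5 * inhibition), 0.0, 1.0)
print(np.round(act, 3))   # only the strongest hypothesis remains active
```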

01 Jul 1987
TL;DR: The Massively Parallel Processor is an ideal machine for computer experiments with simulated neural nets as well as more general cellular automata, and the results on problem mapping and computational efficiency apply equally well to the neural nets of Hopfield, Hinton et al., and Geman and Geman.
Abstract: The Massively Parallel Processor (MPP) is an ideal machine for computer experiments with simulated neural nets as well as more general cellular automata. Experiments using the MPP with a formal model neural network are described. The results on problem mapping and computational efficiency apply equally well to the neural nets of Hopfield, Hinton et al., and Geman and Geman.
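
A small Hopfield-style sketch of the kind of synchronous update that maps naturally onto an SIMD array like the MPP (toy 8-neuron example; the experiments in the report are of course much larger):

```python
# Hebbian storage plus synchronous sign updates: the whole step is a
# matrix-vector product, which an SIMD array executes in lock-step.
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                        # Hebbian weights, no self-loops

state = np.array([1, -1, 1, -1, 1, -1, -1, -1])   # noisy version of pattern 0
for _ in range(10):
    state = np.where(W @ state >= 0, 1, -1)   # all neurons update in parallel
print("recalled:", state)                     # converges to the stored pattern
```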

Book ChapterDOI
25 Jun 1987
TL;DR: A classification is presented of both vector (i.e. SIMD) and parallel (i.e. MIMD) computers, and two simple benchmarks are defined to assess performance.
Abstract: A classification is presented of both vector (i.e. SIMD) and parallel (i.e. MIMD) computers, and two simple benchmarks are defined to assess performance. Results are presented for the Cray X-MP vector computer, and the LCAP parallel system of ten FPS-164 computers.