Showing papers in "Concurrency and Computation: Practice and Experience in 1993"


Journal ArticleDOI
TL;DR: This paper shows the expressiveness of MANIFOLD, the feasibility of its implementation, and its usefulness in practice, presenting a series of small MANIFOLD programs that describe the skeletons of some adaptive recursive algorithms of particular interest in computer graphics.
Abstract: Management of the communications among a set of concurrent processes arises in many applications and is a central concern in parallel computing. In this paper we introduce MANIFOLD: a co-ordination language whose sole purpose is to describe and manage complex interconnections among independent, concurrent processes. In the underlying paradigm of this language the primary concern is not with what functionality the individual processes in a parallel system provide. Instead, the emphasis is on how these processes are interconnected and how their interaction patterns change during the execution life of the system. This paper also includes an overview of our implementation of MANIFOLD. As an example of the application of MANIFOLD, we present a series of small manifold programs which describe the skeletons of some adaptive recursive algorithms that are of particular interest in computer graphics. Our concern in this paper is to show the expressiveness of MANIFOLD, the feasibility of its implementation and its usefulness in practice. Issues regarding performance and optimization are beyond the scope of this paper.

160 citations


Journal ArticleDOI
TL;DR: ADAPTIVE provides an integrated environment for developing and experimenting with flexible transport system architectures that support lightweight and adaptive communication protocols for diverse multimedia applications running on high-performance networks.
Abstract: Computer communication systems must undergo significant changes to keep pace with the increasingly demanding and diverse multimedia applications that will run on the next generation of high-performance networks. To facilitate these changes, we are developing A Dynamically Assembled Protocol Transformation, Integration and eValuation Environment (ADAPTIVE). ADAPTIVE provides an integrated environment for developing and experimenting with flexible transport system architectures that support lightweight and adaptive communication protocols for diverse multimedia applications running on high-performance networks. Our approach employs a collection of reusable ‘building-block’ protocol mechanisms that may be composed together automatically based upon functional specifications. The resulting protocols execute in parallel on several target platforms including shared-memory and message-passing multiprocessors. ADAPTIVE provides a framework for (1) determining the functionality of customized lightweight protocol configurations that efficiently support multimedia applications and (2) mapping this functionality onto efficient parallel process architectures.
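
A minimal sketch of the composition idea described above, assuming nothing about ADAPTIVE's real interfaces: the message type, the mechanism names and the chosen stack are all hypothetical; each building block is simply a function applied in sequence to a message.

```c
/* Hypothetical sketch (not ADAPTIVE's actual API): composing a transport
 * protocol from reusable building-block mechanisms selected by a
 * functional specification. Each mechanism transforms the message. */
#include <stdio.h>

typedef struct { char data[256]; int len; } msg_t;
typedef void (*mechanism_fn)(msg_t *);

static void add_seqnum(msg_t *m)   { m->len += 4; }  /* stands in for prepending a sequence number */
static void add_checksum(msg_t *m) { m->len += 2; }  /* stands in for appending a checksum */
static void segment(msg_t *m)      { (void)m; }      /* stands in for MTU-sized segmentation */

int main(void) {
    /* A 'functional specification' picks which mechanisms to stack:
     * here, ordered reliable delivery -> seqnum + checksum + segmentation. */
    mechanism_fn stack[] = { add_seqnum, add_checksum, segment };
    msg_t m = { "payload", 7 };
    for (size_t i = 0; i < sizeof stack / sizeof stack[0]; i++)
        stack[i](&m);                    /* apply each building block in turn */
    printf("composed message length: %d\n", m.len);
    return 0;
}
```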

104 citations


Journal ArticleDOI
TL;DR: This work explains the steps involved in developing model programs and concludes that the study of programming paradigms provides an architectural vision of parallel scientific computing.
Abstract: We describe a programming methodology for computational science based on programming paradigms for multicomputers. Each paradigm is a class of algorithms that have the same control structure. For every paradigm, a general parallel program is developed. The general program is then used to derive two or more model programs, which solve specific problems in science and engineering. These programs have been tested on a Computing Surface and published with every detail open to scrutiny. We explain the steps involved in developing model programs and conclude that the study of programming paradigms provides an architectural vision of parallel scientific computing.

76 citations


Journal ArticleDOI
TL;DR: Experimental results suggest that initially allocating the same amount of work to each processor and letting the dynamic load balancing algorithm adjust the load during program execution yields very good performance.
Abstract: We describe a compiler and run-time system that allow data-parallel programs to execute on a network of heterogeneous UNIX workstations. The programming language supported is Dataparallel C, a SIMD language with virtual processors and a global name space. This parallel programming environment allows the user to take advantage of the power of multiple workstations without adding any message-passing calls to the source program. Because the performance of individual workstations in a multi-user environment may change during the execution of a Dataparallel C program, the run-time system automatically performs dynamic load balancing. We present experimental results that demonstrate the usefulness of dynamic load balancing in a multi-user environment. These results suggest that initially allocating the same amount of work to each processor and letting the dynamic load balancing algorithm adjust the load during program execution yields very good performance. Hence neither the compiler nor the run-time system need a priori knowledge of the speeds of the machines that will participate in a program execution.
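
The allocation policy these results point to can be sketched in a few lines of C. This is an illustration of the idea only, not the Dataparallel C run-time system; the equal initial split and the proportional redistribution follow the abstract, while the variable names and measured speeds are invented.

```c
/* Illustrative sketch: start with an equal split of N work units over P
 * workstations, then rebalance each phase in proportion to the speed each
 * workstation exhibited during the previous phase (rounding ignored). */
#include <stdio.h>

#define P 4
int main(void) {
    long N = 1000000;                            /* virtual processors (work units) */
    long share[P];
    double speed[P] = { 1.0, 1.0, 1.0, 1.0 };    /* measured units/second */

    for (int i = 0; i < P; i++) share[i] = N / P;   /* equal initial allocation */

    /* After a phase, suppose measured speeds now differ (other users): */
    speed[0] = 0.5; speed[3] = 2.0;
    double total = 0;
    for (int i = 0; i < P; i++) total += speed[i];
    for (int i = 0; i < P; i++)                  /* redistribute proportionally */
        share[i] = (long)(N * speed[i] / total);

    for (int i = 0; i < P; i++)
        printf("workstation %d gets %ld units\n", i, share[i]);
    return 0;
}
```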

58 citations


Journal ArticleDOI
TL;DR: A model program for parallel execution of cellular automata on a multicomputer is developed and adapted for simulation of forest fires and numerical solution of Laplace's equation for stationary heat flow.
Abstract: We develop a model program for parallel execution of cellular automata on a multicomputer. The model program is then adapted for simulation of forest fires and numerical solution of Laplace's equation for stationary heat flow. The performance of the parallel program is analyzed and measured on a Computing Surface configured as a matrix of transputers with distributed memory.
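
A serial C sketch of the Laplace computation that the model program distributes may help; assuming the usual five-point relaxation for stationary heat flow, each node of the multicomputer would own a block of rows and exchange boundary rows after every sweep (the code below is the single-node core only).

```c
/* Minimal serial core of Laplace relaxation for stationary heat flow:
 * each interior cell is repeatedly replaced by the average of its four
 * neighbours until the grid stops changing. */
#include <stdio.h>
#include <math.h>

#define N 32
int main(void) {
    double u[N][N] = { 0 }, unew[N][N] = { 0 }, diff;
    for (int j = 0; j < N; j++) u[0][j] = unew[0][j] = 100.0;  /* hot top edge */
    do {
        diff = 0.0;
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++) {
                unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
                diff = fmax(diff, fabs(unew[i][j] - u[i][j]));
            }
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++) u[i][j] = unew[i][j];
    } while (diff > 1e-4);
    printf("centre temperature: %f\n", u[N/2][N/2]);
    return 0;
}
```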

42 citations


Journal ArticleDOI
TL;DR: In this article, the status of the Fortran translator for the Cedar computer at the end of March 1991 is reported: a brief description of the Cedar Fortran language is followed by a discussion of the Fortran 77 to Cedar Fortran parallelizer and the techniques it currently implements.
Abstract: This paper reports on the status of the Fortran translator for the Cedar computer at the end of March 1991. A brief description of the Cedar Fortran language is followed by a discussion of the Fortran 77 to Cedar Fortran parallelizer that describes the techniques currently being implemented. A collection of experiments illustrates the effectiveness of the current implementation, and points toward new approaches to be incorporated into the system in the near future.

40 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that one can parallelize real scientific applications and obtain good performance with little effort if the right tools are used, such as Mentat, an object-oriented parallel processing system for both parallel and distributed architectures.
Abstract: Throughout much of the parallel processing community there is the sense that writing software for distributed-memory parallel processors is subject to a ‘no pain—no gain’ rule: that in order to reap the benefits of parallel computation one must first suffer the pain of converting the application to run on a parallel machine. We believe this is the result of inadequate programming tools and not a problem inherent to parallel processing. We will show that one can parallelize real scientific applications and obtain good performance with little effort if the right tools are used. Our vehicle for this demonstration is a 6000-line DNA and protein sequence comparison application that we have implemented in Mentat, an object-oriented parallel processing system for both parallel and distributed architectures. We briefly describe the application and present performance information for both the Mentat version and a hand-coded parallel version of the application.

33 citations


Journal ArticleDOI
TL;DR: The Butterfly architecture is biased towards the use of remote invocation for kernel operations that perform a significant number of memory references, and current architectural trends are likely to increase this bias in future machines, suggesting that straightforward parallelization of existing kernels is unlikely to yield acceptable performance.
Abstract: In the standard kernel organization on a bus-based multiprocessor, all processors share the code and data of the operating system; explicit synchronization is used to control access to kernel data structures. Distributed-memory multicomputers use an alternative approach, in which each instance of the kernel performs local operations directly and uses remote invocation to perform remote operations. Either approach to interkernel communication can be used in a large-scale shared-memory multiprocessor. In the paper we discuss the issues and architectural features that must be considered when choosing between remote memory access and remote invocation. We focus in particular on experience with the Psyche multiprocessor operating system on the BBN Butterfly Plus. We find that the Butterfly architecture is biased towards the use of remote invocation for kernel operations that perform a significant number of memory references, and that current architectural trends are likely to increase this bias in future machines. This conclusion suggests that straightforward parallelization of existing kernels (e.g. by using semaphores to protect shared data) is unlikely in the future to yield acceptable performance. We note, however, that remote memory access is useful for small, frequently-executed operations, and is likely to remain so.
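
The trade-off the authors quantify can be illustrated with a toy cost model: remote memory access pays a per-reference penalty, while remote invocation pays a fixed per-operation cost, so invocation wins once an operation touches enough remote words. The constants below are illustrative, not Butterfly Plus measurements.

```c
/* Back-of-envelope crossover implied by the paper's argument (numbers
 * are made up for illustration): find the operation size at which a
 * remote invocation becomes cheaper than issuing remote references. */
#include <stdio.h>

int main(void) {
    double local_ref = 1.0;         /* cost of one local memory reference */
    double remote_ref = 12.0;       /* cost of one remote memory reference */
    double invoke_overhead = 100.0; /* fixed cost of one remote invocation */

    /* remote access: n * remote_ref; invocation: overhead + n * local_ref */
    for (int n = 1; n <= 20; n++) {
        double access = n * remote_ref;
        double invoke = invoke_overhead + n * local_ref;
        if (invoke < access) {
            printf("invocation wins from %d references on\n", n);
            break;
        }
    }
    return 0;
}
```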

27 citations


Journal ArticleDOI
TL;DR: The underlying implementation of the ParaScope Editor is described, paying particular attention to the analysis and representation of dependence information and its reconstruction after changes to the program.
Abstract: The ParaScope Editor is a new kind of interactive parallel programming tool for developing scientific Fortran programs. It assists the knowledgeable user by displaying the results of sophisticated program analyses and by providing editing and a set of powerful interactive transformations. After an edit or parallelism-enhancing transformation, the ParaScope Editor incrementally updates both the analyses and source quickly. This paper describes the underlying implementation of the ParaScope Editor, paying particular attention to the analysis and representation of dependence information and its reconstruction after changes to the program.

22 citations


Journal ArticleDOI
TL;DR: This first paper defines the methodology to be used to analyse the benchmark results, and gives an example of a fully analysed application benchmark from General Relativity (GR1); the performance analysis treats the execution time and absolute performance as functions of at least two variables, namely the problem size and the number of processors.
Abstract: This is the first of a series of papers on the Genesis distributed-memory benchmarks, which were developed under the European ESPRIT research program. The benchmarks provide a standard reference Fortran77 uniprocessor version, a distributed-memory MIMD version and, in some cases, a Fortran90 version suitable for SIMD computers. The problems selected all have a scientific origin (mostly from physics or theoretical chemistry), and range from synthetic code fragments designed to measure the basic hardware properties of the computer (especially communication and synchronisation overheads), through commonly used library subroutines, to full application codes. This first paper defines the methodology to be used to analyse the benchmark results, and gives an example of a fully analysed application benchmark from General Relativity (GR1). First, suitable absolute performance metrics are carefully defined; then the performance analysis treats the execution time and absolute performance as functions of at least two variables, namely the problem size and the number of processors. The theoretical predictions are compared with, or fitted to, the measured results, and then used to predict (with due caution) how the performance might scale for larger problems and more processors than were actually available during the benchmarking. Benchmark measurements are given primarily for the German SUPRENUM computer, but also for the IBM 3083J, Convex C210 and a Parsys Supernode with 32 T800-20 transputers.

20 citations


Journal ArticleDOI
TL;DR: The paper discusses algorithms and programs for electron density averaging on a distributed-memory MIMD system, using a user-controlled shared virtual memory and a dynamic load-balancing mechanism.
Abstract: The paper discusses algorithms and programs for electron density averaging using a distributed-memory MIMD system. Electron density averaging is a computationally intensive step needed for phase refinement and extension in the computation of the 3-D structure of macromolecules like proteins and viruses. The determination of a single structure may require thousands of hours of CPU time for traditional supercomputers. The approach discussed in this paper leads to a reduction by one to two orders of magnitude of the computing time. The program runs on an Intel iPSC/860 and on the Touchstone Delta system and uses a user-controlled shared virtual memory and a dynamic load-balancing mechanism.

Journal ArticleDOI
TL;DR: The paper presents a comparative analysis for algorithms that map pyramids onto hypercubes based on some important performance measures from graph theory and actual results from a Connection Machine system CM-2 containing 16K processors.
Abstract: The paper presents a comparative analysis for algorithms that map pyramids onto hypercubes. The analysis is based on some important performance measures from graph theory and actual results from a Connection Machine system CM-2 containing 16K processors. Connection Machine results are presented for pyramid algorithms that compute the perimeter of objects, apply 2-dimensional convolution, and segment images.

Journal ArticleDOI
TL;DR: The algorithm focuses on reducing global communication by exploiting coherence properties in the processor structure, and rules of decomposition are designed to efficiently handle the cases where the required number of processors exceeds the number available.
Abstract: We propose an algorithm for solving region-to-region visibility problems on digital terrain models using data parallel machines. Since global communication is the bottleneck in this kind of algorithm, the algorithm we propose focuses on the reduction of global communication. The algorithm analyses a strip of the source region at a time and sweeps through the source strip by strip. At most four sweeps are needed for the analysis. By exploring the coherence properties in the processor structure, global communication is minimized and complexity is substantially improved. Furthermore, all global write operations are exclusive and concurrency in global read operations is minimized. Since the problem size is usually large, we also designed rules of decomposition to efficiently handle the cases where the required number of processors is greater than the number available. The algorithm has been implemented on a Connection Machine CM-2, and results of computational experiments are presented.

Journal ArticleDOI
TL;DR: The paper presents the Compass SIMD compiler technology developed while working on a number of SIMD compilers and shows how the increased understanding of the SIMD compilation process on each successive project, together with the differences in the targets themselves, has affected the shape of each compiler.
Abstract: SIMD computer systems offer tremendous potential speed-ups, but aggressive compilation strategies are required to realize this potential. The paper presents the Compass SIMD compiler technology developed while working on a number of SIMD compilers. Although the various targets have much in common, our increased understanding of the SIMD compilation process on each successive project and the differences in the targets themselves have affected the shape of each compiler.

Journal ArticleDOI
TL;DR: Two techniques for parallelizing on a MIMD multicomputer a class of learning algorithms (competitive learning) for artificial neural networks widely used in pattern recognition and understanding are reported.
Abstract: The paper reports two techniques for parallelizing on a MIMD multicomputer a class of learning algorithms (competitive learning) for artificial neural networks widely used in pattern recognition and understanding. The first technique, following the divide et impera (divide-and-conquer) strategy, achieves O(n/P + log P) time for n neurons and P processors interconnected as a tree. A modification of the algorithm allows the application of a systolic technique with the processors interconnected as a ring; this technique has the advantage that the communication time does not depend on the number of processors. The two techniques are also compared on the basis of predicted and measured performance on a transputer-based MIMD machine. As the number of processors grows, the advantage of the systolic approach increases; by contrast, the divide et impera approach is more advantageous in the retrieving phase.
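
A serial sketch of the competitive learning step being parallelized, with invented weights and input: the winner search is the O(n) part that the divide-and-conquer technique splits into O(n/P) local searches plus an O(log P) tree reduction of the local winners.

```c
/* Serial core of competitive learning (hedged sketch): find the neuron whose
 * weight vector is closest to the input (the 'winner'), then move only that
 * neuron towards the input. */
#include <stdio.h>

#define NEURONS 8
#define DIM 2

int main(void) {
    double w[NEURONS][DIM] = {{0,0},{1,0},{0,1},{1,1},{2,2},{3,1},{1,3},{2,0}};
    double x[DIM] = { 0.9, 0.2 };          /* input pattern */
    double eta = 0.1;                      /* learning rate */

    int winner = 0; double best = 1e30;
    for (int i = 0; i < NEURONS; i++) {    /* O(n) serially; O(n/P) per node when split */
        double d = 0;
        for (int k = 0; k < DIM; k++) d += (x[k] - w[i][k]) * (x[k] - w[i][k]);
        if (d < best) { best = d; winner = i; }
    }
    for (int k = 0; k < DIM; k++)          /* update only the winner */
        w[winner][k] += eta * (x[k] - w[winner][k]);

    printf("winner %d moved to (%.2f, %.2f)\n", winner, w[winner][0], w[winner][1]);
    return 0;
}
```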

Journal ArticleDOI
TL;DR: In both proposals, the similarities between the chosen data-partitioning strategy, the communications pattern of the visualization processes and the topology of the physical system architecture represent the key points and provide improved software design and efficiency.
Abstract: A solution is proposed to the problem of interactive visualization and rendering of volume data. Designed for parallel distributed-memory MIMD architectures, the volume rendering system is based on the ray tracing (RT) visualization technique, the Sticks representation scheme (a data structure exploiting data coherence for the compression of classified data sets), the use of a slice-partitioning technique for the distribution of the data between the processing nodes and the consequent ray-data-flow parallelizing strategy. The system has been implemented on two different architectures: an inmos Transputer network and a hypercube nCUBE 6400 architecture. The high number of processors of this latter machine has allowed us to exploit a second level of parallelism (parallelism on image space, or parallelism on pixels) in order to arrive at a higher degree of scalability. In both proposals, the similarities between the chosen data-partitioning strategy, the communications pattern of the visualization processes and the topology of the physical system architecture represent the key points and provide improved software design and efficiency. Moreover, the partitioning strategy used and the network interconnection topology reduce the communications overhead and allow for an efficient implementation of a static load-balancing technique based on the prerendering of a low-resolution image. Details of the practical issues involved in the parallelization process of volumetric RT, commonly encountered problems (e.g. termination and deadlock prevention) and the software migration process between different architectures are discussed.

Journal ArticleDOI
TL;DR: By taking advantage of the sparsity structure of the problem, the SOR algorithm was successfully implemented on two massively parallel Single-Instruction-Multiple-Data machines: a Connection Machine CM-2 and a MasPar MP-1.
Abstract: Serial and parallel successive overrelaxation (SOR) solutions of specially structured large-scale quadratic programs with simple bounds are discussed. By taking advantage of the sparsity structure of the problem, the SOR algorithm was successfully implemented on two massively parallel Single-Instruction-Multiple-Data machines: a Connection Machine CM-2 and a MasPar MP-1. Computational results for the well known obstacle problems show the effectiveness of the algorithm. Problems with millions of variables have been solved in a few minutes on these massively parallel machines, and speed-ups of 90% or more were achieved.
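
The per-point operation behind these results is projected SOR: an ordinary SOR update followed by projection onto the bound. The sketch below applies it to a 1-D obstacle problem for brevity; the paper's problems are 2-D and massively parallel, but the update rule is the same.

```c
/* Hedged 1-D sketch of projected SOR for a bound-constrained quadratic
 * program: do a Gauss-Seidel update for -u'' = f, over-relax, then clip
 * the result against the obstacle (a simple lower bound). */
#include <stdio.h>
#include <math.h>

#define N 100
int main(void) {
    double u[N + 2] = { 0 }, lower[N + 2], omega = 1.5, f = 1.0, h = 1.0 / (N + 1);
    for (int i = 0; i <= N + 1; i++)       /* obstacle raised in the middle so it binds */
        lower[i] = (i > N / 4 && i < 3 * N / 4) ? 0.2 : 0.0;

    for (int sweep = 0; sweep < 500; sweep++)
        for (int i = 1; i <= N; i++) {
            double gs = 0.5 * (u[i-1] + u[i+1] + h * h * f); /* Gauss-Seidel value */
            double val = u[i] + omega * (gs - u[i]);         /* over-relaxation */
            u[i] = fmax(val, lower[i]);                      /* project onto bound */
        }
    printf("u at midpoint: %f\n", u[N / 2]);
    return 0;
}
```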

Journal ArticleDOI
TL;DR: The use of the Mother distributed shared memory allowed us to run the same code on the Cray as it ran on the SPARCStatlons, and the authors did not require the complex cache-coherent memory semantics provided by, say, Ivy or Mach to run this application effectively.
Abstract: At the Supcrcomputing Research Center we have built a computing farm consisting of 16 SPARCStation. ELCs. The ELCs all support the Mother distributed shared memory, which has primitives to support efficient synchronization and use of the network and processors. Mother docs not support the traditional consistency semantics provided by, for example, Ivy or Mach external pagers. The first parallel application we ran on the farm was a Monte Carlo radiative heat transfer simulation. The performance we achieved on the farm was within an order of magnitude of the performance we would expect to achieve on a 16-processor model of the C90 supercomputer available from Cray Research. With this application we found that the use of the Mother distributed shared memory allowed us to run the same code on the Cray as we ran on the SPARCStatlons, and we did not require the complex cache-coherent memory semantics provided by, say, Ivy or Mach to run this application effectively.

Journal ArticleDOI
TL;DR: The Dining Philosophers problem is used to illustrate how Coloured Petri nets can overcome limitations, and it is shown how a collection of processes in the Occam programming language can be developed directly from the properties of the net.
Abstract: Petri nets are proposed as a general-purpose design and modelling tool for parallel programs. The advantages of Petri nets for this purpose are discussed, and a solution to the Dining Philosophers problem is developed using simple Place-Transition nets. The limitations of Place-Transition nets are described, and the Dining Philosophers problem is used to illustrate how Coloured Petri nets can overcome these limitations. A more complex example of a Coloured Petri net is then given, and it is shown how a collection of processes in the Occam programming language can be developed directly from the properties of the net. Another Petri net model of a simple process farm is given, and a solution is developed in Parallel C: this further highlights the suitability of Petri nets as a design tool for parallel programs.
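
The essential Petri-net point, that picking up both forks is a single atomic transition enabled only when both fork places hold tokens, can be shown with a toy marking simulator; this is an illustration in C, not the paper's Occam or Parallel C code.

```c
/* Toy Place-Transition net of the Dining Philosophers. Places: one token
 * per fork, plus an 'eating' place per philosopher. Transitions: pick-up
 * (consumes both adjacent fork tokens atomically) and put-down (returns
 * them). Firing only enabled transitions enforces fork exclusion by
 * construction. */
#include <stdio.h>
#include <stdlib.h>

#define PH 5
int fork_tok[PH], eating[PH];

static int can_pickup(int i) { return fork_tok[i] && fork_tok[(i + 1) % PH]; }

int main(void) {
    for (int i = 0; i < PH; i++) { fork_tok[i] = 1; eating[i] = 0; }
    srand(42);
    for (int step = 0; step < 20; step++) {
        int i = rand() % PH;                    /* pick a candidate transition */
        if (!eating[i] && can_pickup(i)) {      /* fire pick-up(i) */
            fork_tok[i] = fork_tok[(i + 1) % PH] = 0;
            eating[i] = 1;
            printf("philosopher %d eats\n", i);
        } else if (eating[i]) {                 /* fire put-down(i) */
            fork_tok[i] = fork_tok[(i + 1) % PH] = 1;
            eating[i] = 0;
            printf("philosopher %d thinks\n", i);
        }
    }
    return 0;
}
```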

Journal ArticleDOI
TL;DR: The nature, construction and creation of system software components are described and the structure of the system software is discussed with particular reference to optimising access to distributed resources.
Abstract: The Flagship system is a graph reduction machine having a distributed physical architecture. Although Flagship sits firmly in the declarative world, explicit state is supported to express the behaviour of the operating system. This state not only has to be isolated from the declarative aspects of the Flagship machine, but also has to be supported with respect to distribution. The mechanisms provided for maintaining consistency of state are discussed with respect to atomic actions at levels in the Flagship system. This approach is used to demonstrate how the software environment was supported by the basic execution mechanism of the machine. The nature, construction and creation of system software components are described and the structure of the system software is discussed with particular reference to optimising access to distributed resources.

Journal ArticleDOI
TL;DR: A performance model from which it is possible to derive several different parallel programming metrics is introduced and an example in which the tool is used successfully to improve the performance of a parallel application.
Abstract: The paper describes a new performance monitoring tool, called Tmon, that has been developed to help application programmers understand the run-time behavior of a parallel system and tune the performance of their programs. Tmon measures resource utilization and traces process activities transparently during execution. A global interrupt technique used by Tmon allows measurement tasks to be executed simultaneously on different processors and an accurate global clock to be maintained with minimal overhead. Experimental results indicate that both the accuracy and the overhead of the monitor are well within an acceptable range. We introduce a performance model from which it is possible to derive several different parallel programming metrics. A weighted critical path analysis tool is also presented that focusses the user's attention on those parts of the program whose modification would most improve performance. An example in which the tool is used successfully to improve the performance of a parallel application is also presented. Tmon is currently implemented on top of the Trollius Operating System and runs on a 74-node transputer-based multicomputer.
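
A hedged sketch of what a weighted critical path analysis computes: given a trace of activity durations and dependences (of the kind a monitor like Tmon records), the longest weighted path through the dependence DAG bounds the run time, so its activities are the ones whose improvement pays off. The graph below is made up.

```c
/* Critical path of a dependence DAG: the finish time of each activity is
 * the latest finish among its predecessors plus its own duration; the
 * final activity's finish time is the critical path length. */
#include <stdio.h>

#define A 6
double dur[A] = { 3, 5, 2, 7, 1, 4 };   /* activity durations (made-up trace) */
int pred[A][A] = { 0 };                 /* pred[i][j]=1 if j must finish before i */

int main(void) {
    /* small hand-built dependence graph: 0->1->3->5 and 0->2->4->5 */
    pred[1][0] = pred[2][0] = pred[3][1] = pred[4][2] = pred[5][3] = pred[5][4] = 1;

    double finish[A];
    for (int i = 0; i < A; i++) {       /* activities indexed in topological order */
        double start = 0;
        for (int j = 0; j < i; j++)
            if (pred[i][j] && finish[j] > start) start = finish[j];
        finish[i] = start + dur[i];
    }
    printf("critical path length: %.1f\n", finish[A - 1]);
    return 0;
}
```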

Journal ArticleDOI
TL;DR: This special issue consists of a selection of papers presented at the First International Symposium on High-Performance Distributed Computing (HPDC-1), sponsored by the IEEE Computer Society and Syracuse University and held at Syracuse in September 1992, which together provide a glimpse into the range of problems that arise in the application of distributed computing systems to computationally intensive problems.
Abstract: The 1980s spawned a revolution in the world of computing: a move away from central mainframe-based computing to distributed networks of workstations. Today workstation servers are fast achieving the levels of CPU performance, memory capacity and I/O bandwidth once available only in mainframes, at a cost orders of magnitude below that of a mainframe. Workstations are now being increasingly used to solve computationally intensive problems in science and engineering that once belonged exclusively to the domain of supercomputers. With the increasing power of workstations comes the intriguing possibility of using a network of these workstations to solve computationally difficult problems. Such a distributed network can potentially provide the processing power of a supercomputer at a small fraction of the cost. Realization of this potential, however, requires advances on a number of fronts, from high-speed communication networks and interfaces to programming languages and tools. The IEEE Symposium on High-Performance Distributed Computing was established in 1992 to address the growing need for a forum where researchers working on each of these enabling technologies can meet and exchange ideas. By addressing all the key technologies for high-performance distributed computing in a single forum, it is hoped that the conference will foster interaction among researchers and encourage them to work towards the common goal of realizing the potential of distributed computing systems. The topics covered by the symposium include high-speed network technologies and interfaces, high-speed communication protocols, distributed algorithms, operating systems, programming tools and paradigms, and applications. This special issue consists of a selection of papers presented at the First International Symposium on High-Performance Distributed Computing (HPDC-1) sponsored by the IEEE Computer Society and Syracuse University at Syracuse in September 1992. Together, they provide a glimpse into the range of problems that arise in the application of distributed computing systems to computationally intensive problems.

Journal ArticleDOI
Gheorghe Almasi, D. Hale, T. McLuckie, Jean Luc Bell, A. Gordon
TL;DR: A simple theoretical model is presented that agrees well with measurements and allows speed-up to be predicted from a knowledge of the ratio of computation to communication, which can be determined empirically before the program is parallelized.
Abstract: We report significant speed-up for seismic migration running in parallel on network-connected IBM RISC/6000 workstations. A sustained performance of 15 MFLOP is obtained on a single entry-level model 320, and speed-ups as high as 5 are obtained for six workstations connected by Ethernet or token ring. Our parallel software uses remote procedure calls provided by NCS (Network Computing System). We have run over a dozen workstations in parallel, but speed-ups become limited by network data rate. Fiber-optic communication should allow much greater speed-ups, and we describe our preliminary results with the fiber-optic serial link adapter of the RISC/6000. We also present a simple theoretical model that agrees well with our measurements and allows speed-up to be predicted from a knowledge of the ratio of computation to communication, which can be determined empirically before the program is parallelized. We conclude with a brief discussion of alternative software approaches and programming models for network-connected parallel systems. In particular, our program was recently ported to PVM and Linda, and preliminary measurements yield speed-ups very close to those described here.
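
One plausible form of such a model (assumed here, not taken from the paper): if computation divides evenly over P workstations while communication does not shrink, then T(P) = Tcomp/P + Tcomm, giving a predicted speed-up S(P) = P/(1 + P/r), where r = Tcomp/Tcomm is the measurable computation-to-communication ratio.

```c
/* Hedged sketch of a speed-up predictor of the kind described above:
 * S(P) = P / (1 + P/r), where r is the computation/communication ratio
 * measured empirically on one workstation. Exact form is an assumption. */
#include <stdio.h>

static double speedup(double P, double r) { return P / (1.0 + P / r); }

int main(void) {
    double r = 10.0;                    /* example computation/communication ratio */
    for (int P = 1; P <= 12; P *= 2)
        printf("P=%2d  predicted S=%.2f\n", P, speedup(P, r));
    return 0;
}
```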

Journal ArticleDOI
TL;DR: An implementation of the lattice Boltzmann method on a homogeneous cluster of IBM RISC System/6000 superscalar workstations is presented.
Abstract: An implementation of the lattice Boltzmann method on a homogeneous cluster of IBM RISC System/6000 superscalar workstations is presented.

Journal ArticleDOI
TL;DR: It is shown that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance.
Abstract: We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.

Journal ArticleDOI
Calton Pu, Danilo Florissi, Patricia Soares, Philip S. Yu, Kun-Lung Wu
TL;DR: Simulation results show that the active-sender/passive-receiver policy is the method of choice in most cases, and all active policies perform far better than the policy without remote caching even in the degenerate case where each node is equally loaded.
Abstract: In a distributed system, data servers (file systems and databases) can easily become bottlenecks. We propose an approach to offloading data access requests from overloaded data servers to nodes that are idle or less busy. This approach is referred to as remote caching, and the idle or less busy nodes are called mutual servers as they help out the busy server nodes on data accesses. In addition to server and client local caches, frequently accessed data are cached in the main memory of mutual servers, thus improving the data access time in the system. We evaluate several data propagation strategies among data servers and mutual servers. These include policies in which senders are active/passive and receivers are active/passive in initiating data propagation. For example, an active sender takes the initiative to offload data onto a passive receiver. Simulation results show that the active-sender/passive-receiver policy is the method of choice in most cases. Active-sender policies are best able to exploit the main memory of other idle nodes in the expected normal condition where some nodes are overloaded and others are less loaded. All active policies perform far better than the policy without remote caching even in the degenerate case where each node is equally loaded.
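
The core decision of an active-sender policy can be sketched as follows; the thresholds, load figures and node structure are invented for illustration and are not the paper's simulation model.

```c
/* Hedged sketch of the active-sender/passive-receiver decision: an
 * overloaded data server pushes hot cached blocks to the least-loaded
 * idle 'mutual server' with spare memory, rather than waiting to be asked. */
#include <stdio.h>

#define NODES 4
typedef struct { double load; int free_mem_blocks; } node_t;

int main(void) {
    node_t node[NODES] = { {0.95, 0}, {0.20, 64}, {0.85, 4}, {0.10, 128} };
    double high = 0.8, low = 0.3;           /* overload / idle thresholds */

    for (int s = 0; s < NODES; s++) {
        if (node[s].load < high) continue;  /* only overloaded nodes send */
        int best = -1;                      /* least-loaded eligible receiver */
        for (int r = 0; r < NODES; r++)
            if (r != s && node[r].load < low && node[r].free_mem_blocks > 0)
                if (best < 0 || node[r].load < node[best].load) best = r;
        if (best >= 0)
            printf("server %d offloads hot blocks to mutual server %d\n", s, best);
    }
    return 0;
}
```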

Journal ArticleDOI
TL;DR: The paper describes techniques which have been developed to provide efficient global communications for machines based on these and similar processors, implemented by a packet-routing kernel adopting a standard, portable communications interface.
Abstract: Modern multicomputer components such as the inmos T800 and the Texas Instruments TMS320C40 provide hardware which supports efficient communication between directly connected processors. The paper describes techniques which have been developed to provide efficient global communications for machines based on these and similar processors. This service is implemented by a packet-routing kernel adopting a standard, portable communications interface. The design of this kernel is discussed with particular emphasis on the manner in which it guarantees reliable, lightweight, deadlock-free communication for arbitrary interconnection topologies. The kernel has been developed and tested on transputers. Extensive results on the performance of the router are presented to demonstrate that the substantial advantages associated with its well founded, structured design are not at odds with high efficiency. This router has been used to support several multi-transputer environments, including a commercially available POSIX-conformant operating system and a distributed occam environment freed from conventional configuration constraints.
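
The kernel itself is topology-independent, but the classical flavour of a deadlock-freedom guarantee can be illustrated with dimension-ordered (e-cube) routing on a hypercube: always correcting the lowest differing address bit imposes a total order on channel acquisition, so no cyclic wait can form. A sketch of the next-hop rule only, not the paper's router:

```c
/* Dimension-ordered (e-cube) routing on a hypercube: at each node, cross
 * the lowest dimension in which the current address differs from the
 * destination. The fixed dimension order rules out cyclic channel waits. */
#include <stdio.h>

static int next_hop(int here, int dest) {
    int diff = here ^ dest;            /* bits still to be corrected */
    if (diff == 0) return here;        /* already at destination */
    int bit = diff & -diff;            /* lowest differing dimension */
    return here ^ bit;                 /* cross that dimension */
}

int main(void) {
    int node = 0, dest = 13;           /* 4-D hypercube: 0000 -> 1101 */
    while (node != dest) {
        int nxt = next_hop(node, dest);
        printf("%d -> %d\n", node, nxt);
        node = nxt;
    }
    return 0;
}
```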

Journal ArticleDOI
TL;DR: This paper addresses the engineering aspects of data dependence testing, particularly focusing on details that are necessary in any competent implementation of a data dependence test.
Abstract: Many papers have been written dealing with the science of data dependence tests, particularly for dependence of arrays in loops; the techniques typically reduce the dependence problem to an algebraic problem, then solve the algebraic problem by an algorithm which has some desired blend of efficiency, generality or precision. While a sound theoretical basis is necessary for dependence-based tools, these papers often leave out many implementation details. This paper addresses the engineering aspects of data dependence testing, particularly focusing on details that are necessary in any competent implementation of a data dependence test.
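
One of the textbook tests whose engineering details such an implementation must get right is the GCD test: for accesses A[a*i + b] and A[c*i + d] in a loop, a dependence requires an integer solution of a*i1 - c*i2 = d - b, which exists iff gcd(a, c) divides d - b. A hedged sketch of that single test (the paper covers much more):

```c
/* GCD dependence test: failing it proves independence; passing it only
 * means a dependence is possible, which is exactly the precision versus
 * efficiency trade-off such papers discuss. */
#include <stdio.h>
#include <stdlib.h>

static int gcd(int x, int y) { return y ? gcd(y, x % y) : abs(x); }

static int gcd_test_may_depend(int a, int b, int c, int d) {
    return (d - b) % gcd(a, c) == 0;   /* 1 = possible dependence */
}

int main(void) {
    /* A[2*i] vs A[2*i + 1]: gcd(2,2)=2 does not divide 1 -> independent */
    printf("A[2i] vs A[2i+1]: %s\n",
           gcd_test_may_depend(2, 0, 2, 1) ? "maybe dependent" : "independent");
    /* A[4*i] vs A[2*i + 2]: gcd(4,2)=2 divides 2 -> maybe dependent */
    printf("A[4i] vs A[2i+2]: %s\n",
           gcd_test_may_depend(4, 0, 2, 2) ? "maybe dependent" : "independent");
    return 0;
}
```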

Journal ArticleDOI
TL;DR: The paper describes the overall open systems-based MOVIE design and itemizes currently implemented, developed and planned components of the system.
Abstract: MOVIE (Multitasking Object-oriented Visual Interactive Environment) is a new software system for high-performance distributed computing (HPDC), currently in the advanced design and implementation stage at Northeast Parallel Architectures Center (NPAC), Syracuse University. The MOVIE system is structured as a multiserver network of interpreters of the high-level object-oriented programming language MovieScript. MovieScript derives from PostScript and extends it in the C++ syntax-based object-oriented interpreted style towards 3D graphics, high-performance computing and general-purpose high-level communication protocol for distributed and MIMD-parallel computing. The paper describes the overall open systems-based MOVIE design and itemizes currently implemented, developed and planned components of the system.

Journal ArticleDOI
TL;DR: The performance evaluation process for a massively parallel distributed-memory SIMD computer is described generally, and the performance in basic computation, grid communication, and computation with grid communication is analysed.
Abstract: The performance evaluation process for a massively parallel distributed-memory SIMD computer is described generally. The performance in basic computation, grid communication, and computation with grid communication is analysed. A practical performance evaluation and analysis study is done for the Connection Machine 2, and conclusions about its performance are drawn.