
Showing papers in "Scientific Programming in 1993"


Journal ArticleDOI
TL;DR: The current language definition and programming style of pC++ are described, examples of parallel linear algebra operations are presented, and a fast Poisson solver is described in complete detail.
Abstract: pC++ is an object-parallel extension to the C++ programming language. This paper describes the current language definition and illustrates the programming style. Examples of parallel linear algebra operations are presented and a fast Poisson solver is described in complete detail.

137 citations


Journal ArticleDOI
TL;DR: A new method for scheduling beyond basic blocks called SHACOOF is presented, which takes advantage of a conventional, high quality basic block Scheduler by first suppressing selected subsequences of instructions and then scheduling the modified sequence of instructions using the basic block scheduler.
Abstract: Instruction scheduling algorithms are used in compilers to reduce run-time delays for the compiled code by the reordering or transformation of program statements, usually at the intermediate language or assembly code level. Considerable research has been carried out on scheduling code within the scope of basic blocks, i.e., straight line sections of code, and very effective basic block schedulers are now included in most modern compilers, especially for pipeline processors. In previous work (Golumbic and Rainish, IBM J. Res. Dev., Vol. 34, pp. 93-97, 1990), we presented code replication techniques for scheduling beyond the scope of basic blocks that provide reasonable improvements in the running time of the compiled code, but still leave room for further improvement. In this article we present a new method for scheduling beyond basic blocks called SHACOOF. This new technique takes advantage of a conventional, high quality basic block scheduler by first suppressing selected subsequences of instructions and then scheduling the modified sequence of instructions using the basic block scheduler. A candidate subsequence for suppression can be found by identifying a region of a program control flow graph, called an S-region, which has a unique entry and a unique exit and meets predetermined criteria. This enables scheduling of a sequence of instructions beyond basic block boundaries, with only minimal changes to an existing compiler, by identifying beneficial opportunities to cover delays that would otherwise have been beyond its scope.

62 citations


Journal ArticleDOI
TL;DR: C++ classes that simplify development of adaptive mesh refinement (AMR) algorithms are described; the use of inheritance has allowed the original AMR algorithm to be extended to other problems with greatly reduced development time.
Abstract: We describe C++ classes that simplify development of adaptive mesh refinement (AMR) algorithms. The classes divide into two groups, generic classes that are broadly useful in adaptive algorithms, and application-specific classes that are the basis for our AMR algorithm. We employ two languages, with C++ responsible for the high-level data structures, and Fortran responsible for low-level numerics. The C++ implementation is as fast as the original Fortran implementation. Use of inheritance has allowed us to extend the original AMR algorithm to other problems with greatly reduced development time.

60 citations


Journal ArticleDOI
TL;DR: The extension of C to $C^H$ for numerical computation of real numbers is described; $C^H$ is a general-purpose block-structured interpretive programming language that retains most features of C from the scientific computing point of view.
Abstract: We have developed a general-purpose block-structured interpretive programming language. The syntax and semantics of this language, called $C^H$, are similar to C. $C^H$ retains most features of C from the scientific computing point of view. In this paper, the extension of C to $C^H$ for numerical computation of real numbers will be described. Metanumbers of −0.0, 0.0, Inf, −Inf, and NaN are introduced in $C^H$. Through these metanumbers, the power of the IEEE 754 arithmetic standard is easily available to the programmer. These metanumbers are extended to commonly used mathematical functions in the spirit of the IEEE 754 standard and ANSI C. Rules for manipulating these metanumbers in I/O; arithmetic, relational, and logic operations; and built-in polymorphic mathematical functions are defined. The capabilities of bitwise, assignment, address and indirection, increment and decrement, as well as type conversion operations in ANSI C are extended in $C^H$. In this paper, mainly new linguistic features of $C^H$ in comparison to C will be described. Example programs written in $C^H$ with metanumbers and polymorphic mathematical functions will demonstrate the capabilities of $C^H$ in scientific computing.

51 citations


Journal ArticleDOI
TL;DR: The handling of complex numbers in the $C^H$ programming language is described, and sample programs show that a computer language that does not distinguish the sign of zeros in complex numbers can also handle the branch cuts of multiple-valued complex functions effectively, so long as it is appropriately designed and implemented.
Abstract: The handling of complex numbers in the $C^H$ programming language will be described in this paper. Complex is a built-in data type in $C^H$. The I/O, arithmetic and relational operations, and built-in mathematical functions are defined for both regular complex numbers and complex metanumbers of ComplexZero, ComplexInf, and ComplexNaN. Due to polymorphism, the syntax of complex arithmetic and relational operations and built-in mathematical functions is the same as that for real numbers. Besides polymorphism, the built-in mathematical functions are implemented with a variable number of arguments, which greatly simplifies computations of different branches of multiple-valued complex functions. The valid lvalues related to complex numbers are defined. Rationales for the design of complex features in $C^H$ are discussed from language design, implementation, and application points of view. Sample $C^H$ programs show that a computer language that does not distinguish the sign of zeros in complex numbers can also handle the branch cuts of multiple-valued complex functions effectively so long as it is appropriately designed and implemented.

20 citations


Journal ArticleDOI
TL;DR: This work implements 2D electromagnetic finite element scattering code in Mentat, an object-oriented parallel processing system, and presents performance results for both a Mentat and a hand-coded parallel Fortran version.
Abstract: The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.

14 citations


Journal ArticleDOI
TL;DR: An object-oriented implementation of numerical integration methods for solving ordinary differential equations is described, in which software components common to many different integration methods are identified and implemented so that they can be reused.
Abstract: We describe an object-oriented implementation of numerical integration methods for solving ordinary differential equations. Software components that are common to many different integration methods have been identified and implemented in such a way that they can be reused. This facilitates the design of a uniform user interface and makes the task of implementing a new integration method fairly modest. The sharing of code in this type of implementation also allows for less subjective comparisons of the result from different integration methods.

11 citations


Journal ArticleDOI
TL;DR: A mechanism by which a compiler can load domain-specific and class-specific optimizations on an as-needed basis is presented, together with a simple interface that enables this feature.
Abstract: So far C++ has made few inroads into the realm of scientific computing, which is still largely dominated by Fortran. Of the few attempts that have been made to apply C++ to numerically intensive codes, the results have often suffered from severe performance problems. A careful examination of these problems indicates that they are unlikely to be solved by incremental improvements in compiler optimization technology. This article will: motivate the discussion by describing a common efficiency problem that arises when numerical codes are programmed in C++; discuss some potential solution strategies that we believe are viable in the near term, but not over the long term; and introduce a mechanism by which a compiler can load domain-specific and class-specific optimizations on an as-needed basis. A simple interface that will enable this feature will be presented. Although our immediate motivation is that of numerically intensive codes, our approach is applicable to all application domains.

9 citations


Journal ArticleDOI
TL;DR: An algorithm that can be used to compute the sensitivity of a dynamical system to a selected parameter is presented along with a driver routine for evaluating the output of a model and its sensitivity to a single parameter.
Abstract: This article introduces basic principles of first order sensitivity analysis and presents an algorithm that can be used to compute the sensitivity of a dynamical system to a selected parameter. This analysis is performed by extending with sensitivity equations the set of differential equations describing the dynamical system. These additional equations require the evaluation of partial derivatives, and so a technique known as the table algorithm, which can be used to exactly and automatically compute these derivatives, is described. A C++ class which can be used to implement the table algorithm is presented along with a driver routine for evaluating the output of a model and its sensitivity to a single parameter. The use of this driver routine is illustrated with a specific application from environmental hazards modeling.

8 citations


Journal ArticleDOI
TL;DR: Experimental results from simulations of half a million particles using multiple methods support the belief that object-oriented approaches are eminently suited to programming distributed-memory machines in a manner that (to the applications programmer) is architecture-independent.
Abstract: This article reports on experiments from our ongoing project whose goal is to develop a C++ library which supports adaptive and irregular data structures on distributed memory supercomputers. We demonstrate the use of our abstractions in implementing "tree codes" for large-scale N-body simulations. These algorithms require dynamically evolving treelike data structures, as well as load-balancing, both of which are widely believed to make the application difficult and cumbersome to program for distributed-memory machines. The ease of writing the application code on top of our C++ library abstractions (which themselves are application independent), and the low overhead of the resulting C++ code (over hand-crafted C code), support our belief that object-oriented approaches are eminently suited to programming distributed-memory machines in a manner that (to the applications programmer) is architecture-independent. Our contribution in parallel programming methodology is to identify and encapsulate general classes of communication and load-balancing strategies useful across applications and MIMD architectures. This article reports experimental results from simulations of half a million particles using multiple methods.

8 citations


Journal ArticleDOI
TL;DR: A set of parallel array classes, MetaMP, implemented in C++ and interfaced to the PVM or Intel NX message-passing systems, is discussed; the programming model maps well to both distributed-memory and shared-memory architectures.
Abstract: We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes - a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.

Journal ArticleDOI
TL;DR: The role of tools such as design aides and project browsers is discussed, and the impact of a framework-based approach upon compilers is examined, and examples are drawn from the prototype C++ based environment.
Abstract: Frameworks are reusable object-oriented designs for domain-specific programs. In our estimation, frameworks are the key to productivity and reuse. However, frameworks require increased support from the programming environment. A framework-based environment must include design aides and project browsers that can mediate between the user and the framework. A framework-based approach also places new requirements on conventional tools such as compilers. This article explores the impact of object-oriented frameworks upon a programming environment, in the context of object-oriented finite element and finite difference codes. The role of tools such as design aides and project browsers is discussed, and the impact of a framework-based approach upon compilers is examined. Examples are drawn from our prototype C++ based environment.

Journal ArticleDOI
TL;DR: A reusable object-oriented array library is developed that encapsulates machine dependencies within this library, so that the optimization of both codes on different architectures will only involve modification to a single library.
Abstract: This article considers the development of a reusable object-oriented array library, as well as the use of this library in the construction of finite difference and finite element codes. The classes in this array library are also generic enough to be used to construct other classes specific to finite difference and finite element methods. We demonstrate the usefulness of this library by inserting it into two existing object-oriented scientific codes developed at Sandia National Laboratories. One of these codes is based on finite difference methods, whereas the other is based on finite element methods. Previously, these codes were separately maintained across a variety of sequential and parallel computing platforms. The use of object-oriented programming allows both codes to make use of common base classes. This offers a number of advantages related to optimization and portability. Optimization efforts, particularly important in large scientific codes, can be focused on a single library. Furthermore, by encapsulating machine dependencies within this library, the optimization of both codes on different architectures will only involve modification to a single library.

Journal ArticleDOI
TL;DR: A hierarchy of C++ classes is constructed that provides potential portability across parallel architectures and leverages the existing compiler technology for translating data-parallel programs onto both SIMD and MIMD hardware.
Abstract: Our goal is to apply the software engineering advantages of object-oriented programming to the raw power of massively parallel architectures. To do this we have constructed a hierarchy of C++ classes to support the data-parallel paradigm. Feasibility studies and initial coding can be supported by any serial machine that has a C++ compiler. Parallel execution requires an extended Cfront, which understands the data-parallel classes and generates C* code (C* is a data-parallel superset of ANSI C developed by Thinking Machines Corporation). This approach provides potential portability across parallel architectures and leverages the existing compiler technology for translating data-parallel programs onto both SIMD and MIMD hardware.

Journal ArticleDOI
TL;DR: An efficient scheme for implementing particle tracking with space charge effects on an INTEL iPSC/860 machine is described and experimental results show that a parallel efficiency of 75% can be obtained.
Abstract: Particle-tracking simulation is one of the scientific applications that is well suited to parallel computations. At the Superconducting Super Collider, it has been theoretically and empirically demonstrated that particle tracking on a designed lattice can achieve very high parallel efficiency on a MIMD Intel iPSC/860 machine. The key to such success is the realization that the particles can be tracked independently without considering their interaction. The perfectly parallel nature of particle tracking is broken if the interaction effects between particles are included. The space charge introduces an electromagnetic force that will affect the motion of tracked particles in three-dimensional (3-D) space. For accurate modeling of the beam dynamics with space charge effects, one needs to solve 3-D Maxwell field equations, usually by a particle-in-cell (PIC) algorithm. This will require each particle to communicate with its neighbor grids to compute the momentum changes at each time step. It is expected that the 3-D PIC method will degrade parallel efficiency of particle-tracking implementation on any parallel computer. In this paper, we describe an efficient scheme for implementing particle tracking with space charge effects on an Intel iPSC/860 machine. Experimental results show that a parallel efficiency of 75% can be obtained.

Journal ArticleDOI
TL;DR: Progress towards the implementation of a C++ compiler capable of incorporating class-specific optimizations is outlined; optimizations of interest include the strength reduction of class::array address calculations, elimination of large temporaries, and the placement of asynchronous send/recv calls so as to achieve computation/communication overlap.
Abstract: Class-specific optimizations are compiler optimizations specified by the class implementor to the compiler. They allow the compiler to take advantage of the semantics of the particular class so as to produce better code. Optimizations of interest include the strength reduction of class::array address calculations, elimination of large temporaries, and the placement of asynchronous send/recv calls so as to achieve computation/communication overlap. We will outline our progress towards the implementation of a C++ compiler capable of incorporating class-specific optimizations.

Journal ArticleDOI
TL;DR: This work discusses experiences with two tools for large grain (or "macro task") parallelism and suggests ways to improve the speed of scientific computations by utilizing more efficient algorithms, particularly those that support parallel computation.
Abstract: The first digital computers consisted of a single processor acting on a single stream of data. In this so-called "von Neumann" architecture, computation speed is limited mainly by the time required to transfer data between the processor and memory. This limiting factor has been referred to as the "von Neumann bottleneck". The concern that the miniaturization of silicon-based integrated circuits will soon reach theoretical limits of size and gate times has led to increased interest in parallel architectures and also spurred research into alternatives to silicon-based implementations of processors. Meanwhile, sequential processors continue to be produced with increased clock rates, more memory locally available to a processor, and higher rates at which data can be transferred to and from memories, networks, and remote storage. The efficiency of compilers and operating systems is also improving over time. Although such hardware characteristics ultimately limit maximum performance, a large improvement in the speed of scientific computations can often be achieved by utilizing more efficient algorithms, particularly those that support parallel computation. This work discusses experiences with two tools for large grain (or "macro task") parallelism.

Journal ArticleDOI
TL;DR: The motivation for using object-oriented techniques in scientific programming is clear: as researchers continue to attempt more and more ambitious models and algorithms, they rapidly run up against the limits of what can be handled using traditional procedural languages such as Fortran.
Abstract: On April 25-27, 1993, Rogue Wave Software, in cooperation with SIAM, sponsored the first annual Object-Oriented Numerics Conference (OON-SKI '93) at Sunriver, Oregon. The intention was to bring together mathematicians, scientists, engineers, and programmers who are interested in object-oriented numerics: the use of modern object-oriented techniques in the design of software solutions to numerical problems. The conference was very successful: there were over 100 attendees from nine countries and more than 40 articles were presented. The motivation for using object-oriented techniques in scientific programming is clear: as researchers continue to attempt more and more ambitious models and algorithms, they rapidly run up against the limits of what can be handled using traditional procedural languages such as Fortran. Object-oriented programming allows you to effectively code algorithms at an appropriate level of abstraction, and not have to worry about irrelevant details. It does this by allowing you to package related code and data together into objects, and then work with the data through a well-defined interface, rather than working with the data directly. Object-oriented languages have been around for years. Why, then, has it not been until recently that they have been used in scientific codes? The answer is that they lacked the key features needed by the numerics community: runtime efficiency, widespread popularity, and ease of use.


Journal ArticleDOI
TL;DR: The program's design and several lessons learned from its C++ implementation are discussed including the appropriate level for object-orientedness in numeric software, maintainability benefits, interfacing to Fortran libraries such as LAPACK, and performance issues.
Abstract: FastScat is a state-of-the-art program for computing electromagnetic scattering and radiation. Its purpose is to support the study of recent algorithmic advancements, such as the fast multipole method, that promise speed-ups of several orders of magnitude over conventional algorithms. The complexity of these algorithms and their associated data structures led us to adopt an object-oriented methodology for FastScat. We discuss the program's design and several lessons learned from its C++ implementation including the appropriate level for object-orientedness in numeric software, maintainability benefits, interfacing to Fortran libraries such as LAPACK, and performance issues.

Journal ArticleDOI
TL;DR: The use of these classes to develop a radio astronomy application is described and some of the performance issues that must be considered when these classes are used are discussed.
Abstract: This article describes a set of C++ classes developed for the AIPS++ project. These classes handle arrays having an arbitrary number of dimensions. We give an overview of the methods available in these classes and show some simple examples of their use. Finally we describe the use of these classes to develop a radio astronomy application and discuss some of the performance issues that must be considered when these classes are used.