
Showing papers in "Scientific Programming in 1992"


Journal ArticleDOI
TL;DR: Experimental results show that ADIFOR can handle real-life codes and that ADIFOR-generated codes are competitive with divided-difference approximations of derivatives, and studies suggest that the source transformation approach to automatic differentiation may improve the time to compute derivatives by orders of magnitude.
Abstract: The numerical methods employed in the solution of many scientific computing problems require the computation of derivatives of a function f: $R^n$→$R^m$. Both the accuracy and the computational requirements of the derivative computation are usually of critical importance for the robustness and speed of the numerical solution. Automatic Differentiation of FORtran (ADIFOR) is a source transformation tool that accepts Fortran 77 code for the computation of a function and writes portable Fortran 77 code for the computation of the derivatives. In contrast to previous approaches, ADIFOR views automatic differentiation as a source transformation problem. ADIFOR employs the data analysis capabilities of the ParaScope Parallel Programming Environment, which enable us to handle arbitrary Fortran 77 codes and to exploit the computational context in the computation of derivatives. Experimental results show that ADIFOR can handle real-life codes and that ADIFOR-generated codes are competitive with divided-difference approximations of derivatives. In addition, studies suggest that the source transformation approach to automatic differentiation may improve the time to compute derivatives by orders of magnitude.

458 citations


Journal ArticleDOI
TL;DR: This paper presents the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features and discusses the advantages of a shared memory programming paradigm while explicitly controlling the data distribution.
Abstract: Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.

260 citations
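Behind a global-index programming model, the compiler must translate each global array reference into an owning processor and a local offset. The sketch below (illustrative arithmetic of my own, not Vienna Fortran syntax) shows the mapping for a simple BLOCK distribution of n elements over p processors.

```c
/* Owner-computes arithmetic for a BLOCK distribution: a global array of
 * n elements is split into contiguous blocks of ceil(n/p), one per
 * processor.  This is the kind of mapping a compiler for a language
 * like Vienna Fortran must generate from a distribution annotation. */
typedef struct { int proc; int local; } owner_t;

static int block_size(int n, int p) { return (n + p - 1) / p; }

/* Which processor owns global index i, and at which local offset? */
owner_t owner(int i, int n, int p) {
    int b = block_size(n, p);
    owner_t o = { i / b, i % b };
    return o;
}

/* Inverse mapping: global index of local element l on processor q. */
int global_index(int q, int l, int n, int p) {
    return q * block_size(n, p) + l;
}
```

For example, with n = 10 and p = 3 the block size is 4, so global index 5 lives on processor 1 at local offset 1.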


Journal ArticleDOI
TL;DR: PCN as mentioned in this paper is a programming system designed to improve the productivity of scientists and engineers using parallel supercomputers by providing a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, and integrated debugging and performance analysis tools.
Abstract: We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

95 citations


Journal ArticleDOI
TL;DR: The problem of potentially misleading performance reporting is discussed in detail, and guidelines for reporting performance are proposed, the adoption of which would raise the level of professionalism and reduce the level of confusion in the field of supercomputing.
Abstract: In a previous humorous note, I outlined 12 ways in which performance figures for scientific supercomputers can be distorted. In this paper, the problem of potentially misleading performance reporting is discussed in detail. Included are some examples that have appeared in recent published scientific papers. This paper also includes some proposed guidelines for reporting performance, the adoption of which would raise the level of professionalism and reduce the level of confusion in the field of supercomputing.

22 citations


Journal ArticleDOI
Alan H. Karp1
TL;DR: This note shows how to cut the time for the slowest part of computing the accelerations of the particles by a factor of 3 or more using standard Fortran.
Abstract: The most time consuming part of an N-body simulation is computing the components of the accelerations of the particles. On most machines the slowest part of computing the acceleration is in evaluating $r^{-3/2}$, which is especially true on machines that do the square root in software. This note shows how to cut the time for this part of the calculation by a factor of 3 or more using standard Fortran.

21 citations
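The note's exact Fortran transformation is not reproduced here, but the underlying idea can be sketched: with s = r² the squared inter-particle distance, the factor $s^{-3/2}$ can be obtained without calling the (slow, possibly software-emulated) square root or power routines, by refining an inverse square root with Newton's method and cubing it. The seed constants below are my own rough linear fit, not the paper's.

```c
#include <math.h>

/* Compute s^(-3/2) without sqrt() or pow(): split off the binary
 * exponent with frexp(), seed a crude estimate of s^(-1/2) from a
 * linear fit on the mantissa, refine by Newton's method, then cube. */
double inv_r3(double s) {
    int e;
    double m = frexp(s, &e);            /* s = m * 2^e, m in [0.5, 1) */
    if (e % 2 != 0) { m *= 2.0; e -= 1; }  /* make the exponent even  */
    /* Linear seed for 1/sqrt(m) on [0.5, 2), accurate to ~20%: */
    double y = ldexp(1.65 - 0.47 * m, -e / 2);
    for (int i = 0; i < 6; i++)            /* quadratic convergence   */
        y = y * (1.5 - 0.5 * s * y * y);   /* Newton: y -> s^(-1/2)   */
    return y * y * y;                      /* (s^(-1/2))^3 = s^(-3/2) */
}
```

Six Newton steps take the ~20% seed error below machine precision; on hardware where the square root is done in software, replacing one sqrt plus a divide per particle pair with a handful of multiply-adds is where the speedup comes from.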


Journal ArticleDOI
TL;DR: A parallel extension of the C programming language, designed for multiprocessors that provide a facility for sharing memory between processors, is described; the split-join programming model is found to have an inherent implementation advantage over the fork-join model when the number of processors in a machine becomes large.
Abstract: We describe a parallel extension of the C programming language designed for multiprocessors that provide a facility for sharing memory between processors. The programming model was initially developed on conventional shared memory machines with small processor counts such as the Sequent Balance and Alliant FX/8, but has more recently been used on a scalable massively parallel machine, the BBN TC2000. The programming model is split-join rather than fork-join. Concurrency is exploited to use a fixed number of processors more efficiently rather than to exploit more processors as in the fork-join model. Team splitting, a mechanism to split the team of processors executing a code into subteams to handle parallel subtasks, is used to provide an efficient mechanism to exploit nested concurrency. We have found the split-join programming model to have an inherent implementation advantage, compared to the fork-join model, when the number of processors in a machine becomes large.

16 citations
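Team splitting amounts to simple index arithmetic: a fixed team of processors is partitioned into contiguous subteams, and each processor computes its subteam, rank, and subteam size from its own id. The sketch below uses names of my own invention, not the paper's API.

```c
/* Illustrative team-splitting arithmetic for the split-join model: the
 * team of nprocs processors is partitioned into k contiguous subteams
 * to handle k parallel subtasks.  The first (nprocs % k) subteams get
 * one extra processor so sizes differ by at most one. */
typedef struct { int subteam; int rank; int size; } team_t;

team_t split_team(int me, int nprocs, int k) {
    int base = nprocs / k, extra = nprocs % k;
    team_t t;
    if (me < extra * (base + 1)) {     /* in one of the larger subteams */
        t.subteam = me / (base + 1);
        t.rank = me % (base + 1);
        t.size = base + 1;
    } else {                           /* in one of the smaller subteams */
        int m = me - extra * (base + 1);
        t.subteam = extra + m / base;
        t.rank = m % base;
        t.size = base;
    }
    return t;
}
```

Because every processor is always a member of exactly one subteam, no processors are created or destroyed at a split point, which is the implementation advantage over fork-join that the paper argues for at large processor counts.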


Journal ArticleDOI
TL;DR: A development methodology for DMM algorithms that is based on different levels of abstraction of the problem, the target architecture, and the CONLAB language itself is presented and illustrated with two examples.
Abstract: CONLAB (CONcurrent LABoratory) is an environment for developing algorithms for parallel computer architectures and for simulating different parallel architectures. A user can experimentally verify and obtain a picture of the real performance of a parallel algorithm executing on a simulated target architecture. CONLAB gives a high-level support for expressing computations and communications in a distributed memory multicomputer (DMM) environment. A development methodology for DMM algorithms that is based on different levels of abstraction of the problem, the target architecture, and the CONLAB language itself is presented and illustrated with two examples. Simulation results for, and real experiments on, the Intel iPSC/2 hypercube are presented. Because CONLAB is developed to run on uniprocessor UNIX workstations, it is an educational tool that offers interactive (simulated) parallel computing to a wide audience.

10 citations


Journal ArticleDOI
TL;DR: A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed.
Abstract: A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evolves. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.

9 citations
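The linked-cell structure mentioned above is a standard MD device and is easy to sketch; the version below is a 1-D reduction of my own (the paper's is 3-D with private per-domain storage). `head[c]` holds one atom in cell c and `next[]` chains the rest, so a neighbor search only visits a cell and its adjacent cells instead of all N atoms.

```c
#define NCELL 4
#define NATOM 8

/* Bin atoms at positions x[] into NCELL cells of width cell_width.
 * head[c] is the first atom in cell c (-1 if empty); next[i] is the
 * next atom in the same cell as atom i (-1 at the end of the chain). */
void build_cells(const double x[NATOM], double cell_width,
                 int head[NCELL], int next[NATOM]) {
    for (int c = 0; c < NCELL; c++) head[c] = -1;
    for (int i = 0; i < NATOM; i++) {
        int c = (int)(x[i] / cell_width);  /* which cell holds atom i */
        next[i] = head[c];                 /* push onto cell's chain  */
        head[c] = i;
    }
}
```

Rebuilding this structure each step (or every few steps) costs O(N), after which the force loop needs only the atoms in neighboring cells, which is what keeps the interdomain communication low when cells are grouped into per-processor domains.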


Journal ArticleDOI
TL;DR: An object-oriented database for materials science data is described that brings together data from heterogeneous non-object-oriented sources and formats, and presents the user with a single, uniform object-oriented schema that transparently integrates these diverse databases.
Abstract: As a part of the scientific database research underway at the Oregon Graduate Institute, we are collaborating with materials scientists in the research and development of an extensible modeling and computation environment for materials science. Materials scientists are prolific users of computers for scientific research. Modeling techniques and algorithms are well known and refined, and computerized databases of chemical and physical property data abound. However, applications are typically developed in isolation, using information models specifically tailored for the needs of each application. Furthermore, available computerized databases in the form of CDs and on-line information services are still accessed manually by the scientist in an off-line fashion. Thus researchers are repeatedly constructing and populating new custom databases for each application. The goal of our research is to bridge this gulf between applications and sources of data. We believe that object-oriented technology in general, and databases in particular, provide powerful tools for transparently bridging the gap between programs and data. An object-oriented database that not only manages data generated by user applications, but also provides access to relevant external data sources can be used to bridge this gap. An object-oriented database for materials science data is described that brings together data from heterogeneous non-object-oriented sources and formats, and presents the user with a single, uniform object-oriented schema that transparently integrates these diverse databases. A unique multilevel architecture is presented that provides a mechanism for efficiently accessing both heterogeneous external data sources and new data stored within the database.

5 citations


Journal ArticleDOI
Tom MacDonald1
TL;DR: A comparison of Standard C and Fortran-77 shows several key deficiencies in C that reduce its ability to adequately solve some numerical problems.
Abstract: The predominant programming language for numeric and scientific applications is Fortran-77, and supercomputers are primarily used to run large-scale numeric and scientific applications. Standard C is not widely used for numerical and scientific programming, yet Standard C provides many desirable linguistic features not present in Fortran-77. Furthermore, the existence of a standard library and preprocessor eliminates the worst portability problems. A comparison of Standard C and Fortran-77 shows several key deficiencies in C that reduce its ability to adequately solve some numerical problems. Some of these problems have already been addressed by the C standard but others remain. Standard C with a few extensions and modifications could be suitable for all numerical applications and could become more popular in supercomputing environments.

2 citations
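One deficiency frequently cited in this era (the paper's full list is not reproduced here) is pointer aliasing: C pointer parameters may refer to overlapping storage, so a compiler must assume every store through one pointer can change data read through another, defeating the vectorization that Fortran 77 permits by forbidding aliased dummy arguments. The example below shows that the aliased and unaliased cases genuinely compute different results, so the compiler's caution is forced. (C did not gain the `restrict` qualifier until C99.)

```c
/* A loop a Fortran compiler could vectorize freely, but a C compiler
 * cannot: if a and b overlap, each store to a[i] may feed a later
 * read of b[j], so the iterations are not independent. */
void axpy(int n, double k, double *a, const double *b) {
    for (int i = 0; i < n; i++)
        a[i] = k * a[i] + b[i];
}
```

Calling `axpy(3, 1.0, v+1, v)` on `v = {1,1,1,1}` makes each iteration read a value the previous iteration just wrote, producing the running sums {1,2,3,4} rather than the {1,2,2,2} an alias-free reading of the loop would give.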


Journal ArticleDOI
TL;DR: A C++ class, called Tripod, was created as a tool to assist with the development of rule-based decision support systems and was tested as part of a prototype decision support system for winter highway maintenance in the Intermountain West.
Abstract: A C++ class, called Tripod, was created as a tool to assist with the development of rule-based decision support systems. The Tripod class contains data structures for the rule-base and member functions for operating on the data. The rule-base is defined by three ASCII files. These files are translated by a preprocessor into a single file that is loaded when a rule-base object is instantiated. The Tripod class was tested as part of a prototype decision support system (DSS) for winter highway maintenance in the Intermountain West. The DSS is composed of two principal modules: the main program, called the wrapper, and a Tripod rule-base object. The wrapper is a procedural module that interfaces with remote sensors and an external meteorological database. The rule-base contains the logic for advising an inexperienced user and for assisting with the decision making process.


Journal ArticleDOI
TL;DR: C-Linda's performance in solving a particular scientific computing problem, the shallow water equations, is discussed, and comparisons with alternatives available on various shared and distributed memory parallel machines are made.
Abstract: Linda is a coordination language invented by David Gelernter at Yale University, which when combined with a computation language (like C) yields a high-level parallel programming language for MIMD machines. Linda is based on a virtual shared associative memory containing objects called tuples. Skeptics have long claimed that Linda programs could not be efficient on distributed memory architectures. In this paper, we address this claim by discussing C-Linda's performance in solving a particular scientific computing problem, the shallow water equations, and make comparisons with alternatives available on various shared and distributed memory parallel machines.
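The tuple-space model can be sketched in a few lines. The toy below is a single-process illustration of my own with integer-tagged tuples, not the C-Linda API: `ts_out` deposits a tuple into the associative memory and `ts_in` withdraws a matching one. Real Linda adds blocking semantics, `rd`, `eval`, and matching on multiple typed fields, and the distributed-memory efficiency question the paper addresses is precisely how to implement this associative lookup without shared hardware memory.

```c
/* Toy single-process tuple space: tuples are (tag, value) pairs held
 * in a fixed-size table; out() deposits one, in() withdraws one that
 * matches the requested tag (associative lookup). */
#define MAXT 64
static struct { int tag; double val; int used; } space[MAXT];

int ts_out(int tag, double val) {        /* deposit a tuple */
    for (int i = 0; i < MAXT; i++)
        if (!space[i].used) {
            space[i].tag = tag; space[i].val = val; space[i].used = 1;
            return 0;
        }
    return -1;  /* table full */
}

int ts_in(int tag, double *val) {        /* withdraw a matching tuple */
    for (int i = 0; i < MAXT; i++)
        if (space[i].used && space[i].tag == tag) {
            *val = space[i].val; space[i].used = 0;
            return 0;
        }
    return -1;  /* no match (real Linda would block until one appears) */
}
```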

Journal ArticleDOI
TL;DR: It is shown that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system, and the techniques used to exploit the multiprocessor (the BBN TC2000) on a network simulation program.
Abstract: We present detailed experimental work involving a commercially available large scale shared memory multiple instruction stream-multiple data stream (MIMD) parallel computer having a software controlled cache coherence mechanism. To make effective use of such an architecture, the programmer is responsible for designing the program's structure to match the underlying multiprocessor's capabilities. We describe the techniques used to exploit our multiprocessor (the BBN TC2000) on a network simulation program, showing the resulting performance gains and the associated programming costs. We show that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system.

Journal ArticleDOI
Michael Metcalf1
TL;DR: A few weeks before the formal publication of the ISO Fortran 90 Standard, NAG announced the world's first f90 compiler as mentioned in this paper, which was evaluated by using it to assess the impact of Fortran 90 on the CERN Program Library.
Abstract: A few weeks before the formal publication of the ISO Fortran 90 Standard, NAG announced the world's first f90 compiler. We have evaluated the compiler by using it to assess the impact of Fortran 90 on the CERN Program Library.