
Showing papers in "Scientific Programming in 1995"


Journal ArticleDOI
TL;DR: A study comparing three EEG representations - the unprocessed signals, a reduced-dimensional representation using the Karhunen-Loève transform, and a frequency-based representation - finds the best classification accuracy on untrained samples is 73% using the frequency-based representation.
Abstract: EEG analysis has played a key role in the modeling of the brain's cortical dynamics, but relatively little effort has been devoted to developing EEG as a limited means of communication. If several mental states can be reliably distinguished by recognizing patterns in EEG, then a paralyzed person could communicate to a device such as a wheelchair by composing sequences of these mental states. EEG pattern recognition is a difficult problem and hinges on the success of finding representations of the EEG signals in which the patterns can be distinguished. In this article, we report on a study comparing three EEG representations: the unprocessed signals, a reduced-dimensional representation using the Karhunen-Loève transform, and a frequency-based representation. Classification is performed with a two-layer neural network implemented on a CNAPS server (128-processor, SIMD architecture) by Adaptive Solutions, Inc. Execution time comparisons show over a hundredfold speedup over a Sun Sparc 10. The best classification accuracy on untrained samples is 73% using the frequency-based representation.
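
The abstract does not spell out the preprocessing pipeline, but the two reduced representations it names are standard; the sketch below (Python/NumPy) shows one plausible form of each - a Karhunen-Loève (principal component) projection and a band-power frequency representation. The channel count, sampling rate, and band edges are illustrative assumptions, not the values used in the study.

    # Minimal sketch of the two reduced EEG representations named in the abstract.
    # Window length, channel count, sampling rate, and frequency bands are
    # illustrative assumptions, not the values used in the study.
    import numpy as np

    def karhunen_loeve(windows, k=8):
        """Project EEG windows onto the top-k principal components (KL transform)."""
        X = windows.reshape(len(windows), -1)          # flatten each window
        Xc = X - X.mean(axis=0)                        # center
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T                           # reduced-dimensional features

    def band_power(window, fs=250.0, bands=((4, 8), (8, 13), (13, 30))):
        """Frequency-based representation: mean spectral power per band per channel."""
        spec = np.abs(np.fft.rfft(window, axis=-1)) ** 2
        freqs = np.fft.rfftfreq(window.shape[-1], d=1.0 / fs)
        return np.array([spec[..., (freqs >= lo) & (freqs < hi)].mean(axis=-1)
                         for lo, hi in bands]).ravel()

    # Example: 100 windows of 6 channels x 500 samples of synthetic EEG.
    windows = np.random.randn(100, 6, 500)
    kl_feats = karhunen_loeve(windows)                        # shape (100, 8)
    freq_feats = np.stack([band_power(w) for w in windows])   # shape (100, 18)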

91 citations


Journal ArticleDOI
TL;DR: This article analyzes the behavior of the cache when data are accessed at a constant stride, and a simple formula is presented that accurately gives the cache efficiency for various cache parameters and data strides.
Abstract: An important issue in obtaining high performance on a scientific application running on a cache-based computer system is the behavior of the cache when data are accessed at a constant stride. Others who have discussed this issue have noted an odd phenomenon in such situations: A few particular innocent-looking strides result in sharply reduced cache efficiency. In this article, this problem is analyzed, and a simple formula is presented that accurately gives the cache efficiency for various cache parameters and data strides.
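
The article's contribution is a closed-form formula; purely as an illustration of the phenomenon it explains, the following Python sketch simulates a direct-mapped cache and measures the hit rate of repeated sweeps over a strided vector. The cache geometry here (64-byte lines, 1,024 lines, 8-byte elements) is an assumption, not the configuration analyzed in the article, but it reproduces the sharp drop at certain power-of-two strides.

    # Illustration only: simulate an assumed direct-mapped cache and measure the
    # hit rate of repeated sweeps over a vector accessed at a constant stride.
    # Certain strides map the whole working set onto only a few cache sets.
    def hit_rate(stride_elems, n_elems=256, sweeps=20,
                 line_bytes=64, n_lines=1024, elem_bytes=8):
        tags = [None] * n_lines              # one resident line tag per cache set
        hits = total = 0
        for _ in range(sweeps):
            for i in range(n_elems):
                line = (i * stride_elems * elem_bytes) // line_bytes
                if tags[line % n_lines] == line:
                    hits += 1
                else:
                    tags[line % n_lines] = line
                total += 1
        return hits / total

    for s in (1, 8, 511, 512, 1023, 1024):
        print(f"stride {s:5d} elements: hit rate {hit_rate(s):.2f}")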

34 citations


Journal ArticleDOI
TL;DR: This article provides a tutorial introduction to the main features of HPF, an informal standard for extensions to Fortran 90 to assist its implementation on parallel architectures, particularly for data-parallel computation.
Abstract: High Performance Fortran (HPF) is an informal standard for extensions to Fortran 90 to assist its implementation on parallel architectures, particularly for data-parallel computation. Among other things, it includes directives for specifying data distribution across multiple memories, and concurrent execution features. This article provides a tutorial introduction to the main features of HPF.
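
In HPF the distribution is expressed with directives such as !HPF$ DISTRIBUTE A(BLOCK); as a language-neutral illustration of what a BLOCK distribution means, the following Python sketch computes which abstract processor owns each element. The array length and processor count are arbitrary.

    # What an HPF-style BLOCK distribution means: element i of an n-element array
    # lives on the processor that owns the contiguous block containing i.
    def block_owner(i, n, p):
        block = -(-n // p)                   # ceiling(n / p) elements per processor
        return i // block

    n, p = 10, 4
    print([block_owner(i, n, p) for i in range(n)])   # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]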

27 citations


Journal ArticleDOI
TL;DR: A modified formulation of Strassen's matrix multiplication algorithm is presented in which the working storage requirement is reduced to O(4^n) and the modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor.
Abstract: In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
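
For reference, the seven-product step that underlies both the O(7^n) and O(4^n) storage figures is the classical Strassen recursion; the Python sketch below shows one level of it. It illustrates the algorithm only - the article's tensor product formulation, nonrecursive code generation, and storage management are not reproduced here.

    # One level of Strassen's algorithm for matrices of even order, using NumPy.
    # Shows the seven-product step only.
    import numpy as np

    def strassen_step(A, B):
        m = A.shape[0] // 2
        A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
        B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
        P1 = (A11 + A22) @ (B11 + B22)
        P2 = (A21 + A22) @ B11
        P3 = A11 @ (B12 - B22)
        P4 = A22 @ (B21 - B11)
        P5 = (A11 + A12) @ B22
        P6 = (A21 - A11) @ (B11 + B12)
        P7 = (A12 - A22) @ (B21 + B22)
        C = np.empty_like(A)
        C[:m, :m] = P1 + P4 - P5 + P7
        C[:m, m:] = P3 + P5
        C[m:, :m] = P2 + P4
        C[m:, m:] = P1 - P2 + P3 + P6
        return C

    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    assert np.allclose(strassen_step(A, B), A @ B)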

21 citations


Journal ArticleDOI
TL;DR: ObjectMath can increase productivity and quality, thus enabling users to solve problems that are too complex to handle with traditional tools, especially in application areas such as machine elements analysis, where complex nonlinear problems are the norm.
Abstract: ObjectMath is a language for scientific computing that integrates object-oriented constructs with features for symbolic and numerical computation. Using ObjectMath, complex mathematical models may be implemented in a natural way. The ObjectMath programming environment provides tools for generating efficient numerical code from such models. Symbolic computation is used to rewrite and simplify equations before code is generated. One novelty of the ObjectMath approach is that it provides a common language and an integrated environment for this kind of mixed symbolic/numerical computation. The motivation for this work is the current low-level state of the art in programming for scientific computing. Much numerical software is still being developed the traditional way in Fortran. This is especially true in application areas such as machine elements analysis, where complex nonlinear problems are the norm. We believe that tools like ObjectMath can increase productivity and quality, thus enabling users to solve problems that are too complex to handle with traditional tools.

19 citations


Journal ArticleDOI
TL;DR: This article presents a survey of language features for distributed memory multiprocessor systems (DMMs), in particular, systems that provide features for data partitioning and distribution.
Abstract: This article presents a survey of language features for distributed memory multiprocessor systems (DMMs), in particular, systems that provide features for data partitioning and distribution. In these systems the programmer is freed from consideration of the low-level details of the target architecture in that there is no need to program explicit processes or specify interprocess communication. Programs are written according to the shared memory programming paradigm but the programmer is required to specify, by means of directives, additional syntax or interactive methods, how the data of the program are decomposed and distributed.
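
Alongside the simple BLOCK mapping, the surveyed systems typically also offer CYCLIC and block-cyclic distributions; the minimal Python sketch below gives the block-cyclic ownership rule. The block size and processor count are arbitrary, and no particular surveyed language is implied.

    # Block-cyclic distribution: blocks of b consecutive elements are dealt out to
    # p processors round-robin, trading locality against load balance.
    def block_cyclic_owner(i, b, p):
        return (i // b) % p

    b, p = 2, 3
    print([block_cyclic_owner(i, b, p) for i in range(12)])
    # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]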

15 citations


Journal ArticleDOI
TL;DR: The modifications needed to achieve a data-parallel version of this model without explicit message passing are outlined and the achieved performance of different numerical solution methods within this model is presented and compared.
Abstract: In this article we describe the implementation of a numerical weather forecast model on a massively parallel computer system. This model is a production code used for routine weather forecasting at the meteorological institutes of several European countries. The modifications needed to achieve a data-parallel version of this model without explicit message passing are outlined. The achieved performance of different numerical solution methods within this model is presented and compared.

11 citations


Journal ArticleDOI
TL;DR: A self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
Abstract: Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
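
The self-similar property - the same kernel applies to the global domain and, unchanged, to each subdomain - can be made concrete with a small sketch. The Jacobi-style stencil and the two-way decomposition below are illustrative assumptions, not the Fortran-P tool chain itself.

    # Illustration of self-similarity: the same relaxation kernel is applied to the
    # global grid and, unchanged, to each halo-padded subdomain.
    import numpy as np

    def relax(u):
        """One Jacobi-style sweep on the interior of a 2-D grid; the kernel is the
        same whether u is the global domain or a subdomain with a one-cell halo."""
        v = u.copy()
        v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
        return v

    global_grid = np.random.rand(16, 16)
    whole = relax(global_grid)

    # Split into two subdomains with one-row halos, apply the identical kernel,
    # and reassemble: the interior result matches the global computation.
    top, bottom = global_grid[:9, :], global_grid[7:, :]
    stitched = np.vstack([relax(top)[:8, :], relax(bottom)[1:, :]])
    assert np.allclose(stitched[1:-1, 1:-1], whole[1:-1, 1:-1])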

8 citations


Journal ArticleDOI
TL;DR: This article describes many of the issues in developing an efficient interface for communication on distributed memory machines, and how changing the interface to match the hardware more closely allows applications with fine-grained communication to run on these machines.
Abstract: This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 μs on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involved in supporting low latency messages on distributed memory machines.
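
A back-of-the-envelope model makes the imbalance concrete: with the roughly 1 μs of hardware latency and 50 μs of software latency quoted in the abstract (the per-byte transfer rate below is an assumption), small messages are dominated almost entirely by software overhead.

    # Back-of-the-envelope message cost model: fixed per-message overheads plus a
    # per-byte transfer term. Overheads follow the abstract; the 10 ns/byte rate
    # is an assumption.
    def message_time_us(nbytes, software_overhead_us=50.0,
                        hardware_latency_us=1.0, per_byte_us=0.01):
        return software_overhead_us + hardware_latency_us + per_byte_us * nbytes

    for n in (8, 64, 1024, 65536):
        t = message_time_us(n)
        print(f"{n:6d} bytes: {t:8.1f} us, effective bandwidth {n / t:8.1f} MB/s")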

7 citations


Journal ArticleDOI
TL;DR: Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or binary-reflected Gray code encoding to the processing nodes of binary cube networks, together with algorithms for locally computing the array indices of the communicated data, which reduces the demand for communications bandwidth.
Abstract: Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or binary-reflected Gray code encoding to the processing nodes of binary cube networks. Algorithms are also given for the local computation of the array indices for the communicated data, thereby reducing the demand for the communications bandwidth. For the Connection Machine system CM-200, Hamiltonian cycle-based all-to-all communication algorithms yield a performance that is a factor of 2 to 10 higher than the performance offered by algorithms based on trees, butterfly networks, or the Connection Machine router. The peak data rate achieved for all-to-all broadcast on a 2,048-node Connection Machine system CM-200 is 5.4 Gbyte/s. The index order of the data in local memory depends on implementation details of the algorithms, but it is well defined. If a linear ordering is desired, then including the time for local data reordering reduces the effective peak data rate to 2.5 Gbyte/s.
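
The binary-reflected Gray code mapping that these algorithms assume is simple to state; the Python sketch below gives the encoding and its inverse and checks the neighbor property on which the Hamiltonian cycle schedules rely. The broadcast and reduction schedules themselves are not reproduced.

    # Binary-reflected Gray code used to map array coordinates to hypercube nodes:
    # consecutive array indices differ in exactly one bit, i.e., they sit on
    # neighboring nodes of the binary cube.
    def gray_encode(i):
        return i ^ (i >> 1)

    def gray_decode(g):
        i = 0
        while g:
            i ^= g
            g >>= 1
        return i

    codes = [gray_encode(i) for i in range(8)]
    print([f"{c:03b}" for c in codes])     # 000 001 011 010 110 111 101 100
    assert all(gray_decode(gray_encode(i)) == i for i in range(1 << 12))
    assert all(bin(codes[i] ^ codes[i + 1]).count("1") == 1 for i in range(7))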

5 citations


Journal ArticleDOI
TL;DR: The University of Southampton has developed the Graphical Benchmark Information Service (GBIS) on the World Wide Web to display interactively graphs of user-selected benchmark results from the GENESIS and PARKBENCH benchmark suites.
Abstract: Unlike single-processor benchmarks, multiprocessor benchmarks can yield tens of numbers for each benchmark on each computer, as factors such as the number of processors and problem size are varied. A graphical display of performance surfaces therefore provides a satisfactory way of comparing results. The University of Southampton has developed the Graphical Benchmark Information Service (GBIS) on the World Wide Web (WWW) to display interactively graphs of user-selected benchmark results from the GENESIS and PARKBENCH benchmark suites.

Journal ArticleDOI
TL;DR: A set of well-characterized Fortran benchmarks spanning a range of computational characteristics was used for the study and the data from the 590 system are compared with those from a single-processor CRAY C90 system as well as with other microprocessor-based systems.
Abstract: The results of benchmark tests on the superscalar IBM RISC System/6000 Model 590 are presented. A set of well-characterized Fortran benchmarks spanning a range of computational characteristics was used for the study. The data from the 590 system are compared with those from a single-processor CRAY C90 system as well as with other microprocessor-based systems, such as the Digital Equipment Corporation AXP 3000/500X and the Hewlett-Packard HP/735.

Journal ArticleDOI
TL;DR: This work describes the implementation of the numerical scheme, and presents experimental results which demonstrate that a problem requiring 600,000 mesh points and 6,000 time steps can be solved in under 8 hours using 32 processors.
Abstract: Flows in estuarial and coastal regions may be described by the shallow-water equations. The processes of pollution transport, sediment transport, and plume dispersion are driven by the underlying hydrodynamics. Accurate resolution of these processes requires a three-dimensional formulation with turbulence modeling, which is very demanding computationally. A numerical scheme has been developed which is both stable and accurate - we show that this scheme is also well suited to parallel processing, making the solution of massive complex problems a practical computing possibility. We describe the implementation of the numerical scheme on a Kendall Square Research KSR-1 multiprocessor, and present experimental results which demonstrate that a problem requiring 600,000 mesh points and 6,000 time steps can be solved in under 8 hours using 32 processors.
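
For reference, the depth-averaged (two-dimensional) shallow-water equations, in standard notation with free-surface elevation η, water depth h, velocities u and v, gravity g, and Coriolis parameter f, and omitting friction and viscous terms, read as follows; the article itself solves a more demanding three-dimensional formulation with turbulence modeling.

    \begin{aligned}
    \frac{\partial \eta}{\partial t} + \frac{\partial (hu)}{\partial x} + \frac{\partial (hv)}{\partial y} &= 0,\\
    \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} - fv &= -g\frac{\partial \eta}{\partial x},\\
    \frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + fu &= -g\frac{\partial \eta}{\partial y}.
    \end{aligned}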

Journal ArticleDOI
TL;DR: Experimental studies on benchmark programs concerning scientific computing show that most communication patterns in application programs are predictable at compile-time, and an execution model is proposed that utilizes this knowledge such that predictable communications are directly compiled and dynamic communications are emulated by scheduling an appropriate set of compiled communications.
Abstract: On most massively parallel architectures, the actual communication performance remains much less than the hardware capabilities. The main reason for this difference lies in the dynamic routing, because the software mechanisms for managing the routing represent a large overhead. This article presents experimental studies on benchmark programs concerning scientific computing; the results show that most communication patterns in application programs are predictable at compile-time. An execution model is proposed that utilizes this knowledge such that predictable communications are directly compiled and dynamic communications are emulated by scheduling an appropriate set of compiled communications. The performance of the model is evaluated, showing that performance is better in static cases and gracefully degrades with the growing complexity and dynamic aspect of the communication patterns.

Journal ArticleDOI
TL;DR: The porting and optimization of an explicit, time-dependent, computational fluid dynamics code on an 8,192-node MasPar MP-1 is described, and the performance of the code is slightly better than on a CRAY Y-MP for a functionally equivalent, optimized two-dimensional code.
Abstract: This article describes the porting and optimization of an explicit, time-dependent, computational fluid dynamics code on an 8,192-node MasPar MP-1. The MasPar is a very fine-grained, single instruction, multiple data parallel computer. The code uses the flux-corrected transport algorithm. We describe the techniques used to port and optimize the code, and the behavior of a test problem. The test problem used to benchmark the flux-corrected transport code on the MasPar was a two-dimensional exploding shock with periodic boundary conditions. We discuss the performance that our code achieved on the MasPar, and compare its performance on the MasPar with its performance on other architectures. The comparisons show that the performance of the code on the MasPar is slightly better than on a CRAY Y-MP for a functionally equivalent, optimized two-dimensional code.
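
The flux-corrected transport idea - advance with a monotone low-order flux, then add back as much of the high-order antidiffusive flux as the solution can accept without creating new extrema - is easiest to see in one dimension. The Python sketch below is a minimal 1-D FCT step for linear advection on a periodic grid with a Zalesak-style limiter; it illustrates the method only, not the article's two-dimensional MasPar code.

    # Minimal 1-D flux-corrected transport step for u_t + a u_x = 0 (a > 0),
    # periodic boundaries. anti[i] is the flux through the interface between
    # cells i and i+1.
    import numpy as np

    def fct_advect(u, a, dt, dx):
        lam = dt / dx
        c = a * lam                                   # Courant number
        up1 = np.roll(u, -1)                          # u[i+1] (periodic)
        f_low = a * u                                 # upwind (monotone) flux
        f_high = 0.5 * a * (u + up1) - 0.5 * c * a * (up1 - u)   # Lax-Wendroff flux
        anti = f_high - f_low                         # antidiffusive flux

        # Low-order ("transported-diffused") solution.
        utd = u - c * (u - np.roll(u, 1))

        # Zalesak-style limiter: the corrected solution may not exceed the local
        # extrema of the low-order solution.
        umax = np.maximum(np.maximum(np.roll(utd, 1), utd), np.roll(utd, -1))
        umin = np.minimum(np.minimum(np.roll(utd, 1), utd), np.roll(utd, -1))
        anti_left = np.roll(anti, 1)                  # flux through interface i-1/2
        p_plus = lam * (np.maximum(anti_left, 0) - np.minimum(anti, 0))
        p_minus = lam * (np.maximum(anti, 0) - np.minimum(anti_left, 0))
        r_plus = np.where(p_plus > 0,
                          np.minimum(1.0, (umax - utd) / np.where(p_plus > 0, p_plus, 1.0)),
                          0.0)
        r_minus = np.where(p_minus > 0,
                           np.minimum(1.0, (utd - umin) / np.where(p_minus > 0, p_minus, 1.0)),
                           0.0)
        limiter = np.where(anti >= 0,
                           np.minimum(np.roll(r_plus, -1), r_minus),
                           np.minimum(r_plus, np.roll(r_minus, -1)))
        anti = limiter * anti
        return utd - lam * (anti - np.roll(anti, 1))

    # Advect a square pulse once around a periodic domain; the profile stays sharp
    # and bounded (no new maxima or minima are created).
    n = 200
    a, dx = 1.0, 1.0 / 200
    dt = 0.4 * dx / a
    u = np.where((np.arange(n) > 40) & (np.arange(n) < 80), 1.0, 0.0)
    for _ in range(500):                              # 500 * a * dt = one period
        u = fct_advect(u, a, dt, dx)
    print(f"min {u.min():.2e}, max {u.max():.6f}")    # stays within [0, 1]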


Journal ArticleDOI
TL;DR: This article details how parallel computing techniques on a KSR-1 deliver, in near real time, the boundary-layer transition predictions needed to design laminar flow control systems for transport aircraft.
Abstract: The performance of transport aircraft can be considerably improved if the process by which the wing boundary layer becomes turbulent can be controlled and extensive areas of laminar flow maintained. In order to design laminar flow control systems, it is necessary to be able to predict the movement of the transition location in response to changes in control variables, e.g., surface suction. At present, the technique which is available to industry requires excessively long computational time - so long that it is not suitable for use in the "design process." Therefore, there is a clear need to produce a system which delivers results in near realtime, i.e., in seconds rather than hours. This article details how parallel computing techniques on a KSR-1 produce these performance improvements.


Journal ArticleDOI
TL;DR: New parallel algorithms for solving the problem of many body interactions in molecular dynamics (MD) using two parallelization methods are presented and demonstrated that they exploit parallelism effectively and can be used to simulate large crystals.
Abstract: We present new parallel algorithms for solving the problem of many body interactions in molecular dynamics (MD). Such algorithms are essential in the simulation of irradiation effects in crystals, where the high energy of the impinging particles dictates computing with large numbers of atoms and for many time cycles. We realized the algorithms using two parallelization methods and compared their performance. Experimental results obtained on a Meiko machine demonstrate that the new algorithms exploit parallelism effectively and can be used to simulate large crystals.
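
The abstract does not identify the two parallelization methods; as a generic illustration of the underlying many-body computation, the Python sketch below evaluates Lennard-Jones forces with an atom decomposition, where each (here simulated) worker computes the forces on its own contiguous slice of atoms. The potential parameters and system size are arbitrary.

    # Generic many-body force computation with atom decomposition: atoms are split
    # into contiguous slices, and each slice's forces can be computed independently
    # (here serially, standing in for parallel workers).
    import numpy as np

    def lj_forces(pos, lo, hi, eps=1.0, sigma=1.0):
        """Lennard-Jones forces on atoms lo..hi-1 due to all atoms (O(N^2), no cutoff)."""
        d = pos[lo:hi, None, :] - pos[None, :, :]      # displacement vectors
        r2 = (d ** 2).sum(-1)
        np.fill_diagonal(r2[:, lo:hi], np.inf)         # exclude self-interaction
        inv_r2 = sigma ** 2 / r2
        mag = 24 * eps * (2 * inv_r2 ** 6 - inv_r2 ** 3) / r2
        return (mag[..., None] * d).sum(axis=1)

    pos = np.random.rand(64, 3) * 10.0
    slices = [(0, 32), (32, 64)]                       # two "workers"
    forces = np.vstack([lj_forces(pos, lo, hi) for lo, hi in slices])
    print(forces.sum(axis=0))                          # ~0 by Newton's third law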


Journal ArticleDOI
TL;DR: The primary motivation behind this special issue stems from a desire to see how various scientists approach the task of scientific programming, and the following levels of abstraction were suggested.
Abstract: The primary motivation behind this special issue stems from a desire to see how various scientists approach the task of scientific programming. In general, each scientist must formulate a problem and derive a solution. The steps in this process are not fixed or prescribed; however, they are nonetheless somewhat universal. In the Call for Papers for this issue, I asked each author to describe the entire process from problem formulation to the realization of a solution. Each step in this process can be characterized by a statement, describing the problem, in a notation or language suitable to the current level of abstraction. By way of guidance, the following levels of abstraction were suggested in the Call for Papers. These were not enforced, but all the articles roughly follow this outline.


Journal ArticleDOI
TL;DR: A description of a combustion simulation's mathematical and computational methods is used to develop a version for parallel execution, yielding a reasonable performance improvement on small numbers of processors.
Abstract: We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.