scispace - formally typeset

Showing papers on "FLOPS" published in 1996


Journal Article
01 Oct 1996
TL;DR: Using a data-parallel thermo-viscoelastic finite element algorithm with a conjugate gradient (CG) method, a 256-processor CM-5 obtained a speed-up factor of about 18 over the CRAY Y-MP for calculating 160 000 element stiffness matrices, at a rate of 3.29 Gflops.
Abstract: A data-parallel implementation of the thermo-viscoelastic finite element algorithm with a conjugate gradient (CG) method on the CM-5 is presented, together with a performance study of thermo-viscoelastic finite element procedures on massively parallel processing machines. Parametric studies on the CM-5 and the CRAY Y-MP, as well as benchmarks for several problem sizes, are shown. The performance of the CG method on the CRAY Y-MP and the CM-5 is evaluated and compared with that of the CRAY sparse matrix and Feable solvers. Using the nine-node isoparametric shell element on a 256-processor CM-5, a speed-up factor of about 18 over the CRAY Y-MP was obtained for calculating 160 000 element stiffness matrices, at a rate of 3.29 Gflops. For the CG method, the CM-5 outperformed the CRAY Y-MP by a factor of 7 when solving 320 800 equations.
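The conjugate gradient method benchmarked in this abstract solves a symmetric positive-definite system such as a finite element stiffness system. The sketch below is a minimal, serial textbook CG in plain Python, assuming a small dense SPD matrix stored as a list of rows; it is not the authors' data-parallel CM-5 code, and the helper names (`dot`, `matvec`, `conjugate_gradient`) and the tolerance are illustrative. In a data-parallel version, the matrix-vector product and the dot products in each iteration would typically be the operations spread across processors.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A (textbook CG, serial)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)           # residual r = b - A x, with initial guess x = 0
    p = list(r)           # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                        # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]       # next A-conjugate direction
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```

In exact arithmetic CG converges in at most n iterations for an n-by-n SPD system, which is why the 2-by-2 example finishes after two steps.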

9 citations


Proceedings Article
01 Jan 1996
TL;DR: Single-processor performance of the AlphaServer 8400 system is compared with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. (Cray) CRAY J90, based on a set of Fortran benchmark codes.
Abstract: The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP): a maximum of two floating-point operations (FLOPS) per CP versus one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report, we compare the single-processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage is that detailed, quantitative analysis of the performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report: a second version of one application. Whereas the older version was written for a vector processor, the newer version is better optimized for microprocessor architectures. Therefore, for the first time, we have an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architectures. All results in this report are from single processors. A subsequent article will explore the shared-memory multiprocessing performance of the 8400 system.
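The architectural difference stated in this abstract is simple arithmetic: peak floating-point rate is floating-point operations per clock period times clock frequency, so doubling the multiply/add throughput per CP doubles the peak rate at a fixed clock. The sketch below illustrates only that ratio; the 200 MHz clock is a hypothetical value chosen for illustration, not a figure from the report, and `peak_mflops` is an illustrative helper name.

```python
def peak_mflops(flops_per_cp: int, clock_mhz: float) -> float:
    """Peak rate in Mflops: operations per clock period times clock periods per microsecond."""
    return flops_per_cp * clock_mhz


# From the abstract: the 21164 can complete up to 2 floating-point ops per CP, the 21064 only 1.
FLOPS_PER_CP = {"21064": 1, "21164": 2}

# Hypothetical clock rate, used only to illustrate the ratio (not a figure from the report).
CLOCK_MHZ = 200.0

if __name__ == "__main__":
    for chip, fpc in FLOPS_PER_CP.items():
        print(chip, peak_mflops(fpc, CLOCK_MHZ), "Mflops peak")
    # prints:
    # 21064 200.0 Mflops peak
    # 21164 400.0 Mflops peak
```

Peak rates are upper bounds only; as the abstract notes, measured performance on real Fortran codes also depends on vectorizability, problem size, and memory access pattern.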

3 citations


Proceedings Article
17 Nov 1996
TL;DR: A set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, demonstrated by detailed performance analysis of systematic runs on up to 512 nodes of an Intel Paragon, achieves an unprecedented 100-fold reduction in time to solution over a single head of a Cray C90.
Abstract: We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package. Detailed performance analysis of systematic runs on up to 512 nodes of an Intel Paragon demonstrates their efficiency and scalability. The preconditioned Conjugate Gradient solver achieves a sustained performance of 18 Gflops. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and more challenging data assimilation problems that are unthinkable on a traditional computer platform such as the Cray C90.
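The abstract credits a preconditioned Conjugate Gradient solver but does not say which preconditioner PSAS uses. The sketch below assumes the simplest choice, a Jacobi (diagonal) preconditioner, in serial plain Python; the function name `pcg_jacobi` is illustrative and this is not the PSAS implementation. Relative to plain CG, each iteration additionally applies z = M^-1 r with M = diag(A), and the scalar r·z replaces r·r in the step-length and direction updates.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [dot(row, x) for row in A]


def pcg_jacobi(A, b, tol=1e-10, max_iter=None):
    """Preconditioned CG for SPD A, assuming M = diag(A) (Jacobi) as the preconditioner."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    inv_diag = [1.0 / A[i][i] for i in range(n)]      # applying M^-1 is a pointwise scale
    x = [0.0] * n
    r = list(b)                                       # r = b - A x, with initial guess x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]        # z = M^-1 r
    p = list(z)
    rz_old = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                    # stop on the unpreconditioned residual
            break
        Ap = matvec(A, p)
        alpha = rz_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz_old
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz_old = rz_new
    return x


if __name__ == "__main__":
    A = [[10.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 2.0]]
    b = [11.0, 7.0, 3.0]
    print(pcg_jacobi(A, b))  # exact solution is [1.0, 1.0, 1.0]
```

A diagonal preconditioner is cheap and purely local (each processor scales only the entries it owns), which makes it a natural first choice on a distributed-memory machine such as the Paragon; whether PSAS used it is not stated in this abstract.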

2 citations