Author

Paul O. Frederickson

Bio: Paul O. Frederickson is an academic researcher from Ames Research Center. The author has contributed to research in topics: Multigrid method & Massively parallel. The author has an h-index of 7, co-authored 10 publications receiving 3388 citations.

Papers
Journal ArticleDOI
01 Sep 1991
TL;DR: A new set of benchmarks that mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications has been developed for the performance evaluation of highly parallel supercomputers.
Abstract: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These consist of five "parallel kernel" benchmarks and three "simulated application" benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their "pencil and paper" specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
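As a flavor of this pencil-and-paper style, here is a minimal C sketch in the spirit of the suite's embarrassingly parallel (EP) kernel, which tallies Gaussian deviates produced by an acceptance-rejection scheme. The generator and tallies below are simplified stand-ins (the benchmark itself specifies a particular linear congruential generator and verification sums), so this is an illustration of the idea, not the benchmark:

```c
/* Sketch in the spirit of the NPB EP kernel: generate uniform pairs,
 * apply the Marsaglia polar method, and tally the resulting Gaussian
 * pairs by annulus.  rand()/RAND_MAX stands in for the LCG that the
 * benchmark actually specifies. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
    enum { N = 1 << 20, NQ = 10 };
    long q[NQ] = {0};             /* counts per annulus */
    double sx = 0.0, sy = 0.0;    /* sums of the deviates */

    for (long i = 0; i < N; i++) {
        double x = 2.0 * rand() / RAND_MAX - 1.0;  /* uniform in (-1,1) */
        double y = 2.0 * rand() / RAND_MAX - 1.0;
        double t = x * x + y * y;
        if (t > 1.0 || t == 0.0) continue;         /* reject */
        double f = sqrt(-2.0 * log(t) / t);        /* polar method */
        double gx = x * f, gy = y * f;             /* Gaussian pair */
        int l = (int)fmax(fabs(gx), fabs(gy));     /* annulus index */
        if (l < NQ) { q[l]++; sx += gx; sy += gy; }
    }
    for (int l = 0; l < NQ; l++)
        printf("annulus %d: %ld pairs\n", l, q[l]);
    printf("sums: %.6f %.6f\n", sx, sy);
    return 0;
}
```

Because every pair is independent, the loop parallelizes trivially, which is why a kernel of this shape serves as an upper bound on a machine's deliverable floating-point performance.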

2,246 citations

Proceedings ArticleDOI
01 Jan 1990
TL;DR: In this paper, high-order accurate finite-volume schemes for solving the Euler equations of gasdynamics were developed, based on the construction of a k-exact reconstruction operator from cell-averaged quantities and on high-order flux quadrature formulas.
Abstract: High order accurate finite-volume schemes for solving the Euler equations of gasdynamics are developed. Central to the development of these methods are the construction of a k-exact reconstruction operator given cell-averaged quantities and the use of high order flux quadrature formulas. General polygonal control volumes (with curved boundary edges) are considered. The formulations presented make no explicit assumption as to complexity or convexity of control volumes. Numerical examples are presented for Ringleb flow to validate the methodology.
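To make the reconstruction idea concrete, here is a minimal sketch, assuming a uniform 1-D grid, of a k-exact reconstruction with k = 2: a quadratic is recovered whose cell averages match three given cell averages, then sampled at a face for use in a flux quadrature. The paper's general polygonal machinery reduces to this closed form in one dimension; the code is an illustration, not the authors' implementation:

```c
/* Minimal 1-D illustration of k-exact reconstruction (k = 2): recover
 * p(x) = a + b*x + c*x^2, centered on cell i, whose average over cells
 * i-1, i, i+1 (uniform width h) matches the given cell averages, then
 * sample it at the right face.  Note the h^2/12 term: matching cell
 * averages is not the same as interpolating point values. */
#include <stdio.h>

int main(void) {
    double h = 0.1;
    /* cell averages of u(x) = x^2 over cells centered at -h, 0, +h:
       the average of x^2 over the cell at x_j is x_j^2 + h^2/12 */
    double um = h*h + h*h/12.0, u0 = h*h/12.0, up = h*h + h*h/12.0;

    double b = (up - um) / (2.0 * h);            /* slope */
    double c = (up - 2.0*u0 + um) / (2.0*h*h);   /* curvature */
    double a = u0 - c * h * h / 12.0;            /* mean-preserving offset */

    double face = h / 2.0;                       /* right face of cell i */
    double p_face = a + b*face + c*face*face;
    printf("reconstructed p(h/2) = %.12f (exact %g)\n",
           p_face, face*face);
    return 0;
}
```

For this quadratic test function the reconstruction is exact, which is precisely the meaning of "k-exact": polynomials of degree at most k are recovered without error from their cell averages.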

586 citations

Journal ArticleDOI
TL;DR: It is demonstrated that high performance efficiencies are attainable for standard multigrid on moderately parallel machines but not on massively parallel computers, as indicated by an example of poor efficiency on 65,536 processors, and that parallel machines open the possibility of finding really new approaches to solving standard problems.
Abstract: Multigrid methods have been established as being among the most efficient techniques for solving complex elliptic equations. We sketch the multigrid idea, emphasizing that a multigrid solution is generally obtainable in a time directly proportional to the number of unknown variables on serial computers. Despite this, even the most powerful serial computers are not adequate for solving the very large systems generated, for instance, by discretization of fluid flow in three dimensions. A breakthrough can be achieved here only by highly parallel supercomputers. On the other hand, parallel computers are having a profound impact on computational science. Recently, highly parallel machines have taken the lead as the fastest supercomputers, a trend that is likely to accelerate in the future. We describe some of these new computers, and issues involved in using them. We describe standard parallel multigrid algorithms and discuss the question of how to implement them efficiently on parallel machines. The natural approach is to use grid partitioning. One intrinsic feature of a parallel machine is the need to perform interprocessor communication. It is important to ensure that time spent on such communication is maintained at a small fraction of computation time. We analyze standard parallel multigrid algorithms in two and three dimensions from this point of view, indicating that high performance efficiencies are attainable under suitable conditions on moderately parallel machines. We also demonstrate that such performance is not attainable for multigrid on massively parallel computers, as indicated by an example of poor efficiency on 65,536 processors. The fundamental difficulty is the inability to keep 65,536 processors busy when operating on very coarse grids. This example indicates that the straightforward parallelization of multigrid (and other) algorithms may not always be optimal. However, parallel machines open the possibility of finding really new approaches to solving standard problems. In particular, we present an intrinsically parallel variant of standard multigrid. This "PSMG" (parallel superconvergent multigrid) method allows all processors to be used at all times, even when processing on the coarsest grid levels. The sequential version of this method is not a sensible algorithm.
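For reference, a textbook serial V-cycle for the 1-D Poisson problem looks like the sketch below; this is the standard multigrid being analyzed, not PSMG. With one grid point per processor, each coarsening halves the number of busy processors, which is exactly the coarse-grid idleness described above:

```c
/* A textbook 1-D multigrid V-cycle for -u'' = f with zero boundary
 * values.  On a parallel machine with one point per processor, every
 * coarsening halves the number of busy processors. */
#include <stdio.h>
#include <stdlib.h>

static void smooth(double *u, const double *f, int n, double h, int sweeps) {
    for (int s = 0; s < sweeps; s++)                 /* Gauss-Seidel */
        for (int i = 1; i < n; i++)
            u[i] = 0.5 * (u[i-1] + u[i+1] + h*h*f[i]);
}

static void vcycle(double *u, const double *f, int n, double h) {
    if (n <= 2) { smooth(u, f, n, h, 50); return; }  /* coarsest grid */
    smooth(u, f, n, h, 3);                           /* pre-smooth */

    int nc = n / 2;
    double *r  = calloc(n + 1,  sizeof *r);
    double *fc = calloc(nc + 1, sizeof *fc);
    double *uc = calloc(nc + 1, sizeof *uc);

    for (int i = 1; i < n; i++)                      /* residual */
        r[i] = f[i] + (u[i-1] - 2.0*u[i] + u[i+1]) / (h*h);
    for (int i = 1; i < nc; i++)                     /* full weighting */
        fc[i] = 0.25 * (r[2*i-1] + 2.0*r[2*i] + r[2*i+1]);

    vcycle(uc, fc, nc, 2.0 * h);                     /* coarse correction */

    for (int i = 1; i < nc; i++) {                   /* interpolate up */
        u[2*i]   += uc[i];
        u[2*i-1] += 0.5 * (uc[i-1] + uc[i]);
    }
    u[n-1] += 0.5 * uc[nc-1];
    smooth(u, f, n, h, 3);                           /* post-smooth */
    free(r); free(fc); free(uc);
}

int main(void) {
    int n = 256;                      /* n intervals, u[0] = u[n] = 0 */
    double h = 1.0 / n;
    double *u = calloc(n + 1, sizeof *u);
    double *f = malloc((n + 1) * sizeof *f);
    for (int i = 0; i <= n; i++) f[i] = 1.0;         /* -u'' = 1 */
    for (int k = 0; k < 10; k++) vcycle(u, f, n, h);
    printf("u(0.5) = %.6f (exact 0.125)\n", u[n/2]); /* u = x(1-x)/2 */
    free(u); free(f);
    return 0;
}
```

PSMG departs from this scheme by keeping all processors occupied even on the coarse levels, per the abstract above.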

100 citations

Journal ArticleDOI
TL;DR: The authors show that PSMG requires less than one-half as many arithmetic and one-fifth as many communication operations, per digit of error reduction, as a parallel standard multigrid algorithm (RBTRB) presented recently by Decker.
Abstract: In a previous paper [“Parallel Superconvergent Multigrid,” in Multigrid Methods, Marcel Dekker, New York, 1988] the authors introduced an efficient multiscale PDE solver for massively parallel architectures, which was called Parallel Superconvergent Multigrid, or PSMG. In this paper, sharp estimates are derived for the normalized work involved in PSMG solution—the number of parallel arithmetic and communication operations required per digit of error reduction. PSMG is shown to provide fourth-order accurate solutions of Poisson-type equations at convergence rates of .00165 per single relaxation iteration, and with parallel operation counts per grid level of 5.75 communications and 8.62 computations for each digit of error reduction. The authors show that PSMG requires less than one-half as many arithmetic and one-fifth as many communication operations, per digit of error reduction, as a parallel standard multigrid algorithm (RBTRB) presented recently by Decker [SIAM J. Sci. Statist. Comput., 12 (1991), pp. 208–220].
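The normalization behind these figures can be read as follows: one relaxation that multiplies the error by the reported factor rho = 0.00165 gains about 2.78 decimal digits of accuracy, so per-iteration operation counts are divided by that figure to obtain the per-digit counts quoted above (a back-of-envelope reading of the metric, not the paper's full derivation):

```latex
% One relaxation that multiplies the error by rho = 0.00165 gains
%   d = -log10(rho) decimal digits,
% so operation counts per digit are per-iteration counts divided by d:
\[
  d \;=\; -\log_{10}\rho \;=\; -\log_{10}(0.00165) \;\approx\; 2.78,
  \qquad
  W_{\text{per digit}} \;=\; \frac{W_{\text{per iteration}}}{d}.
\]
```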

16 citations


Cited by
Book
15 Aug 1998
TL;DR: This book explains the forces behind the convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures, and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architecture across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

* Synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* Presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* Describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case studies
* Illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs

Table of Contents: 1. Introduction; 2. Parallel Programs; 3. Programming for Performance; 4. Workload-Driven Evaluation; 5. Shared Memory Multiprocessors; 6. Snoop-based Multiprocessor Design; 7. Scalable Multiprocessors; 8. Directory-based Cache Coherence; 9. Hardware-Software Tradeoffs; 10. Interconnection Network Design; 11. Latency Tolerance; 12. Future Directions; Appendix A: Parallel Benchmark Suites

1,571 citations

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work proposes area-neutral architectural enhancements, crafted from a fundamental understanding of PCM technology parameters, that address PCM's latency, energy, and endurance limitations and make PCM competitive with DRAM.
Abstract: Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as DRAM. In contrast, phase change memory (PCM) storage relies on scalable current and thermal mechanisms. To exploit PCM's scalability as a DRAM alternative, PCM must be architected to address relatively long latencies, high energy writes, and finite endurance. We propose, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM. A baseline PCM system is 1.6x slower and requires 2.2x more energy than a DRAM system. Buffer reorganizations reduce this delay and energy gap to 1.2x and 1.0x, using narrow rows to mitigate write energy and multiple rows to improve locality and write coalescing. Partial writes enhance memory endurance, providing 5.6 years of lifetime. Process scaling will further reduce PCM energy costs and improve endurance.

1,568 citations

Book
11 Oct 2000
TL;DR: Aimed at working researchers and scientific C/C++ or Fortran programmers, this text introduces a new vocabulary of idioms and techniques for parallelizing software using OpenMP.
Abstract: Aimed at working researchers and scientific C/C++ or Fortran programmers, this text introduces a new vocabulary of idioms and techniques for parallelizing software using OpenMP.
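A minimal example of the kind of idiom meant here, assuming a generic dot-product loop rather than anything taken from the book:

```c
/* Minimal OpenMP idiom: a reduction across a parallel loop.
 * Compile with, e.g., cc -fopenmp. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    double dot = 0.0;
    /* each thread accumulates a private partial sum; OpenMP combines
       the partial sums when the loop ends */
    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < N; i++)
        dot += a[i] * b[i];

    printf("dot = %.1f (max threads: %d)\n", dot, omp_get_max_threads());
    return 0;
}
```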

1,253 citations

Book
01 Jan 2015
TL;DR: This updated edition includes new worked programming examples, expanded coverage and recent literature regarding incompressible flows, the Discontinuous Galerkin Method, the Lattice Boltzmann Method, higher-order spatial schemes, implicit Runge-Kutta methods and code parallelization.
Abstract: Computational Fluid Dynamics: Principles and Applications, Third Edition presents students, engineers, and scientists with all they need to gain a solid understanding of the numerical methods and principles underlying modern computation techniques in fluid dynamics. By providing complete coverage of the essential knowledge required in order to write codes or understand commercial codes, the book gives the reader an overview of fundamentals and solution strategies in the early chapters before moving on to cover the details of different solution techniques. This updated edition includes new worked programming examples, expanded coverage and recent literature regarding incompressible flows, the Discontinuous Galerkin Method, the Lattice Boltzmann Method, higher-order spatial schemes, implicit Runge-Kutta methods and parallelization. An accompanying companion website contains the sources of 1-D and 2-D Euler and Navier-Stokes flow solvers (structured and unstructured) and grid generators, along with tools for Von Neumann stability analysis of 1-D model equations and examples of various parallelization techniques.

* Will provide you with the knowledge required to develop and understand modern flow simulation codes
* Features new worked programming examples and expanded coverage of incompressible flows, implicit Runge-Kutta methods and code parallelization, among other topics
* Includes an accompanying companion website that contains the sources of 1-D and 2-D flow solvers as well as grid generators and examples of parallelization techniques
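As an example of the Von Neumann stability analysis those tools address, applied to the first-order upwind scheme for the 1-D linear advection equation (standard textbook material, not an excerpt from the book):

```latex
% First-order upwind for u_t + a u_x = 0 (a > 0), with CFL number
% c = a*dt/dx:  u_j^{n+1} = u_j^n - c (u_j^n - u_{j-1}^n).
% Substituting the Fourier mode u_j^n = G^n e^{i j theta} gives
\[
  G(\theta) \;=\; 1 - c\left(1 - e^{-i\theta}\right),
  \qquad
  |G(\theta)|^2 \;=\; 1 - 4c(1-c)\sin^2(\theta/2),
\]
% so |G| <= 1 for every mode exactly when 0 <= c <= 1: the scheme is
% stable precisely under the CFL condition.
```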

1,228 citations

Book ChapterDOI
08 Apr 2002
TL;DR: The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain and the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations.
Abstract: We characterize high-performance streaming applications as a new and distinct domain of programs that is becoming increasingly important. The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain. At the same time, the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations. In this paper, we motivate, describe and justify the language features of StreamIt, which include: a structured model of streams, a messaging system for control, a re-initialization mechanism, and a natural textual syntax.
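A rough C analogy of the structured stream model, with filters that pop and push at fixed rates composed into a pipeline; this is not StreamIt syntax, only a sketch of the execution model the language structures:

```c
/* Rough C analogy of a structured stream program (not StreamIt
 * syntax): each filter pops items from an input FIFO and pushes to an
 * output FIFO at a fixed rate, and a pipeline is a chain of filters. */
#include <stdio.h>

#define CAP 64
typedef struct { double buf[CAP]; int head, tail; } Fifo;

static void   push(Fifo *q, double v) { q->buf[q->tail++ % CAP] = v; }
static double pop(Fifo *q)            { return q->buf[q->head++ % CAP]; }
static int    count(const Fifo *q)    { return q->tail - q->head; }

/* filter: pop 1, push 1 -- scale each item */
static void scale(Fifo *in, Fifo *out, double k) {
    while (count(in) > 0) push(out, k * pop(in));
}

/* filter: pop 2, push 1 -- average adjacent pairs (a decimator) */
static void pairavg(Fifo *in, Fifo *out) {
    while (count(in) >= 2) {
        double a = pop(in), b = pop(in);
        push(out, 0.5 * (a + b));
    }
}

int main(void) {
    Fifo s0 = {0}, s1 = {0}, s2 = {0};
    for (int i = 0; i < 8; i++) push(&s0, (double)i);
    scale(&s0, &s1, 10.0);     /* pipeline stage 1 */
    pairavg(&s1, &s2);         /* pipeline stage 2 */
    while (count(&s2) > 0) printf("%g\n", pop(&s2));
    return 0;
}
```

The fixed pop/push rates are what let a compiler schedule and optimize such pipelines statically, which is the opportunity the StreamIt compiler exploits.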

1,224 citations