Entering the petaflop era: the architecture and performance of Roadrunner

doi:10.5555/1413370.1413372

Kevin J. Barker et al.

pp. 1-11

Abstract:

Roadrunner is a 1.38 Pflop/s peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance of Sweep3D on Roadrunner is compared with that of the same code on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
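As a rough sanity check on the peak figure quoted in the abstract, the script below rebuilds it from the device counts. The counts (3,060 nodes, 12,240 PowerXCell 8i processors, 12,240 Opteron cores) come from the abstract. The per-device double-precision rates are assumptions not stated here: 12.8 Gflop/s per SPE and 6.4 Gflop/s for the PPE on a 3.2 GHz PowerXCell 8i (108.8 Gflop/s per chip), and 3.6 Gflop/s per Opteron core (1.8 GHz at 2 flops per cycle). The helper `peak_pflops` is a hypothetical name used only for illustration.

```python
# Back-of-envelope reconstruction of Roadrunner's double-precision peak.
# Device counts come from the abstract; the per-device peak rates below
# are ASSUMED values, not figures stated in the abstract.

NODES = 3_060
CELLS = 12_240          # IBM PowerXCell 8i processors
OPTERON_CORES = 12_240  # AMD Opteron cores

# Assumed per-device double-precision peaks, in Gflop/s.
SPES_PER_CELL = 8
SPE_GFLOPS = 12.8       # per SPE at 3.2 GHz (assumption)
PPE_GFLOPS = 6.4        # PPE contribution per chip (assumption)
CELL_GFLOPS = SPES_PER_CELL * SPE_GFLOPS + PPE_GFLOPS  # about 108.8
OPTERON_CORE_GFLOPS = 1.8 * 2  # 1.8 GHz x 2 flops/cycle (assumption)


def peak_pflops(cells: int, cores: int) -> float:
    """Hypothetical helper: aggregate peak in Pflop/s from device counts."""
    gflops = cells * CELL_GFLOPS + cores * OPTERON_CORE_GFLOPS
    return gflops / 1e6  # 1 Pflop/s = 1e6 Gflop/s


total = peak_pflops(CELLS, OPTERON_CORES)
cell_share = CELLS * CELL_GFLOPS / (total * 1e6)

print(f"Cells per node:         {CELLS // NODES}")
print(f"Opteron cores per node: {OPTERON_CORES // NODES}")
print(f"Estimated peak:         {total:.2f} Pflop/s")
print(f"Share from Cell chips:  {cell_share:.1%}")
```

Under these assumptions the script reports 4 PowerXCell 8i processors and 4 Opteron cores per node and an estimated peak of about 1.38 Pflop/s, with roughly 97% of it coming from the PowerXCell 8i processors. That skew is a quick way to see why an application such as Sweep3D needs to run most of its computation on the Cell processors to benefit from the hybrid design.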
