Open Access · Proceedings Article

Entering the petaflop era: the architecture and performance of Roadrunner

TLDR
The paper presents a detailed architectural description and performance analysis of Roadrunner, together with a case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture.
Abstract
Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper, we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance of Sweep3D on Roadrunner is compared to its performance on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
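
The processor counts quoted in the abstract can be checked against the stated 1.38 Pflop/s peak with a little arithmetic. The sketch below is illustrative only: the counts come from the abstract, but the per-unit double-precision peaks (108.8 Gflop/s per PowerXCell 8i and 3.6 Gflop/s per Opteron core) are assumptions that the abstract does not state, and the function names are hypothetical.

```python
# Figures taken from the abstract.
CELL_PROCESSORS = 12_240
OPTERON_CORES = 12_240
COMPUTE_NODES = 3_060

# ASSUMPTIONS (not stated in the abstract): per-unit double-precision peaks.
# PowerXCell 8i at 3.2 GHz: 8 SPEs x 12.8 Gflop/s + 6.4 Gflop/s for the PPE.
ASSUMED_CELL_GFLOPS = 8 * 12.8 + 6.4
# Opteron core at 1.8 GHz issuing 2 double-precision flops per cycle.
ASSUMED_OPTERON_CORE_GFLOPS = 1.8 * 2


def units_per_node(total_units: int, nodes: int) -> int:
    """Return how many units each node holds, requiring an even split."""
    per_node, remainder = divmod(total_units, nodes)
    if remainder:
        raise ValueError("units do not divide evenly across nodes")
    return per_node


def peak_pflops(cells: int, cores: int,
                cell_gflops: float = ASSUMED_CELL_GFLOPS,
                core_gflops: float = ASSUMED_OPTERON_CORE_GFLOPS) -> float:
    """Sum the theoretical peak of all units and convert Gflop/s to Pflop/s."""
    total_gflops = cells * cell_gflops + cores * core_gflops
    return total_gflops / 1e6


if __name__ == "__main__":
    print("Cell processors per node:", units_per_node(CELL_PROCESSORS, COMPUTE_NODES))
    print("Opteron cores per node:", units_per_node(OPTERON_CORES, COMPUTE_NODES))
    print("Estimated peak (Pflop/s):", round(peak_pflops(CELL_PROCESSORS, OPTERON_CORES), 2))
```

Under these assumed per-unit rates, each compute node holds four PowerXCell 8i processors and four Opteron cores, and the Cell processors contribute nearly all of the estimated peak, which is why the Sweep3D case study focuses on exploiting the hybrid architecture.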


Citations
Journal Article

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, or other accelerators.
Proceedings Article

FTI: high performance fault tolerance interface for hybrid systems

TL;DR: This work proposes a low-overhead, high-frequency multi-level checkpoint technique in which a highly reliable, topology-aware Reed-Solomon encoding in a three-level checkpoint scheme is integrated into the Fault Tolerance Interface (FTI).
Proceedings Article

Efficient resource management for Cloud computing environments

TL;DR: Using power-aware scheduling techniques, variable resource management, live migration, and a minimal virtual machine design, overall system efficiency will be vastly improved in a data center based Cloud with minimal performance overhead.
Proceedings Article

Liszt: a domain specific language for building portable mesh-based PDE solvers

TL;DR: Liszt, a domain-specific language for constructing mesh-based PDE solvers, is presented; its language statements for interacting with an unstructured mesh and storing data at its elements enable the compiler to expose the parallelism, locality, and synchronization of Liszt programs.
Journal Article

State-of-the-art in heterogeneous computing

TL;DR: In this paper, the authors present an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs).