Entering the petaflop era: the architecture and performance of Roadrunner
Kevin J. Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, Scott Pakin, José Carlos Sancho
pp. 1-11
Abstract:
Roadrunner is a hybrid-architecture supercomputer developed by LANL and IBM, with a peak double-precision performance of 1.38 Pflop/s. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and an in-depth performance analysis of the system. We also include a case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture. The performance of Sweep3D on Roadrunner is compared with its performance on the Cell BE, a previous implementation of the Cell Broadband Engine architecture, and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks, we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
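The component counts in the abstract support a quick arithmetic check of the per-node layout and the quoted 1.38 Pflop/s peak. The sketch below is a back-of-the-envelope calculation, not the authors' analysis: the per-device double-precision rates (12.8 Gflop/s per SPE, 6.4 Gflop/s per PPE, 3.6 Gflop/s per 1.8 GHz Opteron core) are assumed nominal figures that the abstract does not state, and the function names are illustrative.

```python
"""Back-of-the-envelope check of Roadrunner's peak from its component counts."""

# Component counts stated in the abstract.
NODES = 3_060
POWERXCELL_CHIPS = 12_240
OPTERON_CORES = 12_240

# ASSUMED nominal double-precision peaks in Gflop/s (not given in the abstract):
# a 3.2 GHz PowerXCell 8i with 8 SPEs and 1 PPE, and a 1.8 GHz Opteron core
# retiring 2 double-precision flops per cycle.
SPE_GFLOPS = 12.8
SPES_PER_CHIP = 8
PPE_GFLOPS = 6.4
OPTERON_CORE_GFLOPS = 1.8 * 2


def per_node_counts() -> tuple[int, int]:
    """Return (PowerXCell 8i chips, Opteron cores) per compute node."""
    return POWERXCELL_CHIPS // NODES, OPTERON_CORES // NODES


def peak_pflops() -> tuple[float, float]:
    """Return (Cell contribution, Opteron contribution) in Pflop/s."""
    cell = POWERXCELL_CHIPS * (SPES_PER_CHIP * SPE_GFLOPS + PPE_GFLOPS)
    opteron = OPTERON_CORES * OPTERON_CORE_GFLOPS
    return cell / 1e6, opteron / 1e6  # convert Gflop/s to Pflop/s


if __name__ == "__main__":
    cells, cores = per_node_counts()
    cell_pf, opteron_pf = peak_pflops()
    total = cell_pf + opteron_pf
    print(f"per node: {cells} PowerXCell 8i, {cores} Opteron cores")
    print(f"peak: {total:.2f} Pflop/s ({cell_pf / total:.1%} from the Cells)")
    # A sustained Linpack rate above 1 Pflop/s on a 1.38 Pflop/s peak
    # implies an efficiency of more than 1/1.38.
    print(f"Linpack efficiency > {1 / 1.38:.1%}")
```

Under these assumptions the script reports 4 PowerXCell 8i chips and 4 Opteron cores per node, a peak of about 1.38 Pflop/s with roughly 97% of it contributed by the Cells, and a lower bound of about 72.5% on Linpack efficiency.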