SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory
Manjunath Gorentla Venkata, Ferrol Aderholdt, Zachary W. Parchman, +2 more
- pp. 145–154
TL;DR
This work proposes and develops the programming abstraction called SHARed data-structure centric Programming abstraction (SharP), a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and a unified programming abstraction for Big-Compute and Big-Data applications.

Abstract
Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such a system typically has a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, programming models have yet to evolve to support either hierarchical-heterogeneous memory systems or this convergence. In this work, we propose and develop the programming abstraction called SHARed data-structure centric Programming abstraction (SharP) to address both of these goals, i.e., to provide (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications. To evaluate SharP, we implement a Stencil benchmark using SharP, port QMCPack, a petascale-capable application, and adapt the Memcached ecosystem, a popular Big-Data framework, to use SharP, and quantify the performance and productivity advantages. Additionally, we demonstrate the simplicity of using SharP on different memories including DRAM, High-Bandwidth Memory (HBM), and Non-Volatile Random Access Memory (NVRAM).
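The abstract does not show SharP's actual API, so the following is only a conceptual sketch of the core idea it describes: data-structure-centric placement across a hierarchy of heterogeneous memories (HBM, DRAM, NVRAM), where an allocation carries a tier-preference list and falls back down the hierarchy when a preferred tier is full. All names, capacities, and the `tier_alloc` function are hypothetical illustrations, not part of SharP.

```python
# Conceptual sketch only -- not SharP's API. Models placing data
# structures on hierarchical-heterogeneous memory tiers with fallback.

# Scaled-down illustrative capacities, in MiB (hypothetical numbers).
TIER_FREE = {"HBM": 16, "DRAM": 192, "NVRAM": 1024}

def tier_alloc(size_mib, prefs):
    """Place an allocation on the first preferred tier with enough
    free space, falling back down the preference list; return the
    chosen tier name, or None if nothing fits."""
    for tier in prefs:
        if TIER_FREE[tier] >= size_mib:
            TIER_FREE[tier] -= size_mib
            return tier
    return None

# A bandwidth-critical stencil halo prefers HBM, falling back to DRAM;
# a large checkpoint buffer targets NVRAM directly.
print(tier_alloc(8, ["HBM", "DRAM"]))    # -> HBM (fits)
print(tier_alloc(64, ["HBM", "DRAM"]))   # -> DRAM (only 8 MiB of HBM left)
print(tier_alloc(512, ["NVRAM"]))        # -> NVRAM
```

The design point this illustrates is that the application expresses *what* a data structure needs (a tier preference), while the runtime decides *where* it lands, which is what makes the same code portable across nodes with different memory hierarchies.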
Citations
Journal Article
A Case For Intra-rack Resource Disaggregation in HPC
George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry R. Dennison, Keren Bergman, John Shalf, +7 more
TL;DR: It is shown that for a rack (cabinet) configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5% probability of finding all the resources it requires inside its rack.
Journal Article
Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation
TL;DR: This paper sheds light upon how big data frameworks can be ported to HPC platforms as a preliminary step towards the convergence of big data and exascale computing ecosystem.
Proceedings Article
SharP Hash: A High-Performing Distributed Hash for Extreme-Scale Systems
TL;DR: SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics and its performance characteristics are demonstrated with a synthetic micro-benchmark and implementation of a Key Value (KV) store, Memcached.
Proceedings Article
Optimizing Data Aggregation by Leveraging the Deep Memory Hierarchy on Large-scale Systems
TL;DR: This paper presents a topology and memory-aware data movement library performing data aggregation on large-scale systems and demonstrates how this approach can decrease the I/O time of a classic workflow by 26%.
Journal Article
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics
George Michelogiannakis, Yehia Arafa, B. D. Cook, Liang Yuan Dai, Abdel-Hameed A. Badawy, Madeleine Glick, Yuyang Wang, Keren Bergman, John Shalf, +8 more
TL;DR: In this article, the authors show that modern photonic components can be co-designed with modern HPC racks to implement flexible intra-rack resource disaggregation, fully meeting the bit error rate (BER) requirements and the high escape bandwidth of all chip types with negligible power overhead.
References
Journal Article
Co-array Fortran for parallel programming
Robert W. Numrich, John Reid, +1 more
TL;DR: The Co-Array Fortran extension is introduced; examples are given to illustrate how clear, powerful, and flexible it can be; and a technical definition is provided.
MPI: A Message-Passing Interface
TL;DR: An overview of MPI, a proposed standard message passing interface for MIMD distributed memory concurrent computers, which includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies is presented.
Proceedings Article
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, Raymond Namyst, +7 more
TL;DR: The Hardware Locality (hwloc) software is introduced, which gathers hardware information about processors, caches, memory nodes, and more, and exposes it to applications and runtime systems in an abstracted and portable hierarchical manner.
Book Chapter
Exascale computing technology challenges
TL;DR: The technology challenges on the road to exascale, their underlying causes, and their effect on the future of HPC system design are described.
Proceedings Article
Global Arrays: a portable "shared-memory" programming model for distributed memory computers
TL;DR: The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes.
Related Papers (5)
Decentralized Offload-based Execution on Memory-centric Compute Cores
Saambhavi Baskaran, Jack Sampson, +1 more