Open Access Proceedings ArticleDOI

SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory

TLDR
This work proposes and develops SharP (SHARed data-structure centric Programming abstraction), a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and a unified programming abstraction for Big-Compute and Big-Data applications.
Abstract
Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such a system typically has a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, programming models have yet to evolve to support either hierarchical-heterogeneous memory systems or this convergence. In this work, we propose and develop a programming abstraction called SHARed data-structure centric Programming abstraction (SharP) to address both of these goals, i.e., to provide (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications. To evaluate SharP, we implement a Stencil benchmark using SharP, port QMCPack, a petascale-capable application, and adapt the Memcached ecosystem, a popular Big-Data framework, to use SharP, and we quantify the performance and productivity advantages. Additionally, we demonstrate the simplicity of using SharP on different memories, including DRAM, High-Bandwidth Memory (HBM), and Non-Volatile Random Access Memory (NVRAM).
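The abstract's central challenge, explicitly placing data structures in different kinds of on-node memory, can be made concrete with a small C sketch. The sketch below uses the memkind library rather than SharP itself (the paper's API is not reproduced on this page), so the DRAM/HBM placement shown is an illustrative assumption about the problem SharP abstracts, not SharP's interface.

```c
/* Tiered on-node allocation with the memkind library; this is NOT
 * SharP's API, only a sketch of the placement problem it abstracts. */
#include <memkind.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = 1 << 20;

    /* Ordinary DRAM allocation. */
    double *dram = memkind_malloc(MEMKIND_DEFAULT, n * sizeof(double));

    /* Prefer high-bandwidth memory; fall back to DRAM if none exists. */
    double *hbm = memkind_malloc(MEMKIND_HBW_PREFERRED, n * sizeof(double));

    if (!dram || !hbm) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    dram[0] = 1.0;
    hbm[0]  = 2.0;
    printf("dram[0]=%g hbm[0]=%g\n", dram[0], hbm[0]);

    memkind_free(MEMKIND_DEFAULT, dram);
    memkind_free(MEMKIND_HBW_PREFERRED, hbm);
    return 0;
}
```

Because MEMKIND_HBW_PREFERRED silently falls back to DRAM on nodes without high-bandwidth memory, the sketch stays runnable on ordinary hardware.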


Citations
Journal ArticleDOI

A Case For Intra-rack Resource Disaggregation in HPC

TL;DR: It is shown that, for a rack (cabinet) configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5% probability of finding all the resources it requires inside its rack.
Journal ArticleDOI

Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation

TL;DR: This paper sheds light on how big data frameworks can be ported to HPC platforms as a preliminary step towards the convergence of the big data and exascale computing ecosystems.
Proceedings ArticleDOI

SharP Hash: A High-Performing Distributed Hash for Extreme-Scale Systems

TL;DR: SharP Hash's high performance is obtained through the use of high-performing networks and one-sided semantics; its performance characteristics are demonstrated with a synthetic micro-benchmark and an implementation of a Key-Value (KV) store, Memcached.
Proceedings ArticleDOI

Optimizing Data Aggregation by Leveraging the Deep Memory Hierarchy on Large-scale Systems

TL;DR: This paper presents a topology- and memory-aware data movement library performing data aggregation on large-scale systems and demonstrates how this approach can decrease the I/O time of a classic workflow by 26%.
Journal ArticleDOI

Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics

TL;DR: This article shows that modern photonic components can be co-designed with modern HPC racks to implement flexible intra-rack resource disaggregation while fully meeting the bit error rate (BER) and high escape bandwidth requirements of all chip types with negligible power overhead.
References
Journal ArticleDOI

Co-array Fortran for parallel programming

TL;DR: This paper introduces the Co-Array Fortran extension, provides examples illustrating how clear, powerful, and flexible it can be, and gives a technical definition.

MPI: A Message-Passing Interface

MPI Forum
TL;DR: An overview is presented of MPI, a proposed standard message-passing interface for MIMD distributed-memory concurrent computers, which includes point-to-point and collective communication routines as well as support for process groups, communication contexts, and application topologies.
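For readers unfamiliar with the interface this reference standardizes, the following self-contained C program exercises the two routine classes the summary mentions, point-to-point and collective communication. It is a generic MPI illustration, not code from the SharP paper.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    /* Point-to-point: rank 0 sends a token to rank 1. */
    if (rank == 0 && size > 1) {
        int token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", token);
    }

    /* Collective: sum every rank's id across the communicator. */
    int sum = 0;
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks = %d\n", sum);

    MPI_Finalize();
    return 0;
}
```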
Proceedings ArticleDOI

hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications

TL;DR: The Hardware Locality (hwloc) software is introduced, which gathers hardware information about processors, caches, memory nodes, and more, and exposes it to applications and runtime systems in an abstracted, portable, hierarchical manner.
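A minimal C example of the hierarchical, portable view hwloc exposes is shown below; it assumes the hwloc 2.x API (the NUMA-node object type was named differently in 1.x) and is an illustration rather than code from the paper.

```c
#include <hwloc.h>
#include <stdio.h>

int main(void) {
    hwloc_topology_t topology;

    /* Build the abstracted hardware tree for this machine. */
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* Count objects at two levels of the hierarchy. */
    int ncores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
    int nnuma  = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
    printf("%d cores, %d NUMA nodes\n", ncores, nnuma);

    hwloc_topology_destroy(topology);
    return 0;
}
```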
Book ChapterDOI

Exascale computing technology challenges

TL;DR: The technology challenges on the road to exascale, their underlying causes, and their effect on the future of HPC system design are described.
Proceedings ArticleDOI

Global Arrays: a portable "shared-memory" programming model for distributed memory computers

TL;DR: The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes.
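The asynchronous, block-wise access to physically distributed arrays described above can be sketched with the public Global Arrays C API; this is an illustrative example assuming an MPI-based GA build, not code from the paper.

```c
#include <stdio.h>
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();
    MA_init(C_DBL, 1000, 1000);   /* memory allocator for GA internals */

    /* Create a 2-D double-precision array distributed across all ranks. */
    int dims[2]  = {1000, 1000};
    int chunk[2] = {-1, -1};      /* let GA choose the distribution */
    int g_a = NGA_Create(C_DBL, 2, dims, "A", chunk);

    /* Any rank may asynchronously write a logical block of the global
     * array, with no explicit cooperation from the ranks that own it. */
    if (GA_Nodeid() == 0) {
        double buf[4] = {1.0, 2.0, 3.0, 4.0};
        int lo[2] = {0, 0}, hi[2] = {1, 1}, ld[1] = {2};
        NGA_Put(g_a, lo, hi, buf, ld);
    }
    GA_Sync();                    /* make the update globally visible */

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```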