Open Access Journal Article

Main Memory in HPC: Do We Need More or Could We Live with Less?

TLDR
In this article, the authors analyzed the memory capacity requirements of important HPC benchmarks and applications and found that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but they also detected applications and use cases that require gigabytes per core.
Abstract
An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now.

This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, but High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that the analysis of memory footprints of production HPC applications is complex and that it requires an understanding of application scalability and target category, i.e., whether the users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also detect applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step toward adoption of this novel technology in the HPC domain.
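The capacity argument in the abstract reduces to simple arithmetic: divide a node's memory by its core count and compare the result against an application's per-core footprint. A minimal sketch of that check; all capacities and footprints below are illustrative assumptions, not figures from the paper:

```python
# Illustrative per-core capacity check for a hypothetical 3D-stacked node.
# All numbers are assumed examples, not measurements from the study.

GiB = 1024 ** 3
MiB = 1024 ** 2

node_memory_bytes = 16 * GiB   # assumed 3D-stacked capacity per node
cores_per_node = 32            # assumed core count

per_core_budget = node_memory_bytes / cores_per_node  # 512 MiB per core

# Hypothetical per-core footprints, in the ranges the abstract reports:
# hundreds of MiB for most applications, gigabytes for some.
footprints = {
    "hpcg_like_app": 300 * MiB,  # hundreds of MiB -> fits
    "hpl_like_app": 2 * GiB,     # gigabytes per core -> exceeds budget
}

for app, bytes_per_core in footprints.items():
    verdict = "fits" if bytes_per_core <= per_core_budget else "exceeds"
    print(f"{app}: {bytes_per_core / MiB:.0f} MiB/core "
          f"{verdict} the {per_core_budget / MiB:.0f} MiB/core budget")
```

The same comparison, repeated over measured footprints of production applications, is essentially what the study's per-core analysis does at scale.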


Citations
Proceedings Article

Quantifying Memory Underutilization in HPC Systems and Using it to Improve Performance via Architecture Support

TL;DR: This paper performs the first large-scale study of system-level memory utilization in the context of HPC systems and proposes the first exploration of architectural techniques to improve memory utilization specifically for HPC systems.
Journal Article

A Case For Intra-rack Resource Disaggregation in HPC

TL;DR: It is shown that, for a rack (cabinet) configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5% probability of finding all the resources it requires inside its rack.
Journal Article

Pricing schemes for energy-efficient HPC systems: Design and exploration

TL;DR: Energy efficiency is of paramount importance for the sustainability of high-performance computing (HPC) systems: energy consumption limits the peak performance of supercomputers and accounts for a...
Book Chapter

A Survey of Application Memory Usage on a National Supercomputer: An Analysis of Memory Requirements on ARCHER

TL;DR: Analysis of memory use by software application type reveals differences in memory use between periodic electronic structure, atomistic N-body, grid-based climate modelling, and grid-based CFD applications.
Journal Article

Pricing Schemes for Energy-Efficient HPC Systems: Design and Exploration

TL;DR: In this article, the authors present a parametrized model to analyze the impact of frequency scaling on energy and to assess the potential total cost benefits for the HPC facility and the user.
References
Proceedings Article

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Proceedings Article

The PARSEC benchmark suite: characterization and architectural implications

TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.

The Landscape of Parallel Computing Research: A View from Berkeley

TL;DR: The parallel landscape is framed with seven questions, and the following recommendations are made for exploring the design space rapidly: the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems, and the target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Journal Article

The LINPACK Benchmark: past, present and future

TL;DR: In addition to the LINPACK benchmark suite, the TOP500 list and the HPL code are presented, and information is given on how to interpret the results of the benchmark and how those results fit into the performance evaluation process.
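The HPL capacity constraint noted in the abstract follows directly from the benchmark's structure: HPL factorizes a dense n-by-n double-precision matrix, so the matrix alone occupies 8n² bytes, and the largest feasible problem size grows only with the square root of memory capacity. A back-of-the-envelope sizing sketch; the node capacities and the 80% fill fraction are assumed examples:

```python
import math

GiB = 1024 ** 3

def max_hpl_problem_size(memory_bytes, fill_fraction=0.8):
    """Largest n such that a dense n x n double-precision (8-byte)
    matrix uses at most fill_fraction of the given memory."""
    return int(math.sqrt(fill_fraction * memory_bytes / 8))

# Assumed capacities: a conventional DIMM node vs. a smaller
# 3D-stacked node with the same compute.
for label, capacity in [("256 GiB DIMM node", 256 * GiB),
                        ("16 GiB 3D-stacked node", 16 * GiB)]:
    print(f"{label}: max HPL n = {max_hpl_problem_size(capacity)}")
```

Because HPL efficiency generally improves with larger problem sizes, a memory-constrained node caps the achievable fraction of peak, which is why the study flags HPL as likely to be limited by 3D memory capacity.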