Toward a New Metric for Ranking High Performance Computing Systems
Summary
1. INTRODUCTION
- The High Performance Linpack (HPL) benchmark is the most widely recognized and discussed metric for ranking high performance computing systems.
- At the same time, HPL rankings of computer systems are no longer strongly correlated to real application performance, especially for the broad set of HPC applications governed by differential equations, which have much stronger needs for high bandwidth and low latency and tend to access data using irregular patterns.
- While Type 1 patterns (high computation-to-data-access ratios with regular memory access, as in HPL) are commonly found in real applications, additional computations and access patterns are also very common.
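A sparse matrix-vector product illustrates the contrast with Type 1 kernels: each nonzero contributes only one multiply-add but requires an indirect, irregular load from the input vector. The sketch below (not the HPCG reference code, which is C++) uses the standard CSR format with hypothetical array names:

```python
import numpy as np

def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for A stored in CSR format.

    The indirect access x[col_idx[k]] is the irregular, memory-bound
    pattern characteristic of Type 2 workloads: one multiply-add per
    nonzero, so performance is bandwidth- and latency-limited.
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# Tiny 3x3 tridiagonal example: A = [[4,1,0],[1,4,1],[0,1,4]]
row_ptr = np.array([0, 2, 5, 7])
col_idx = np.array([0, 1, 0, 1, 2, 1, 2])
vals    = np.array([4., 1., 1., 4., 1., 1., 4.])
x = np.array([1., 1., 1.])
print(spmv_csr(row_ptr, col_idx, vals, x))  # -> [5. 6. 5.]
```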
3. REQUIREMENTS
- Any new metric the authors introduce must satisfy a number of requirements.
- The ranking of computer systems using the new metric must correlate strongly to how their real applications would rank these same systems.
- The new metric must drive improvements to computer systems that benefit real applications.
- The authors will perform thorough validation testing of any proposed benchmark against a suite of applications on current high-end systems using techniques similar to those identified in the Mantevo project [3].
- The authors will furthermore specify restrictions on changes to the reference version of the code to ensure that only changes that have relevance to their application base are permitted.
4. A PRECONDITIONED CONJUGATE GRADIENT BENCHMARK
- As the candidate for a new HPC metric, the authors consider the preconditioned conjugate gradient (PCG) method with a local symmetric Gauss-Seidel preconditioner.
- Set up data structures for the local symmetric Gauss-Seidel preconditioner.
- By doing this the authors can compare the numerical results for “correctness” at the end of each m-iteration phase.
- Since many large-scale applications use C++ for its compile-time polymorphism and object-oriented features, the authors believe it is important to have HPCG be a C++ code.
- Timing and execution rate results are reported.
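The PCG iteration with a symmetric Gauss-Seidel preconditioner can be sketched as follows. This is a minimal dense-matrix illustration of the algorithm, not the HPCG reference implementation (which is C++, operates on a distributed sparse problem, and adds halo exchanges); the comments mark where HPCG's communication and computational patterns appear:

```python
import numpy as np

def sym_gauss_seidel(A, r):
    """Apply one symmetric Gauss-Seidel sweep as the preconditioner z = M^{-1} r.

    Forward solve with the lower triangle, then backward solve with the
    upper triangle -- the fine-grain recursive (triangular-solve) pattern.
    """
    n = len(r)
    z = np.zeros(n)
    for i in range(n):                      # forward sweep
        z[i] = (r[i] - A[i, :i] @ z[:i]) / A[i, i]
    for i in range(n - 1, -1, -1):          # backward sweep
        z[i] = (r[i] - A[i, :i] @ z[:i] - A[i, i+1:] @ z[i+1:]) / A[i, i]
    return z

def pcg(A, b, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradient for SPD A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = sym_gauss_seidel(A, r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p                          # sparse matrix-vector multiply
        alpha = rz / (p @ Ap)               # dot product (global reduction)
        x += alpha * p                      # vector update
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = sym_gauss_seidel(A, r)          # local triangular solves
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD tridiagonal test problem with known solution [1, 1, 1]
A = np.array([[4., 1., 0.],
              [1., 4., 1.],
              [0., 1., 4.]])
b = np.array([5., 6., 5.])
x = pcg(A, b)
```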
5. JUSTIFICATION FOR HPCG BENCHMARK
- HPCG uses hybrid MPI-plus-threading parallelism; this is in contrast with many MPI-only applications today, and presents a big challenge to applications that must certify their computational results and debug in the presence of bitwise variability.
- At the same time, no previous benchmarking effort is suitable to leverage directly, nor do expected trends in algorithms suggest a better candidate at this time.
- As such, the Iterative Solver Benchmark's scope is broader than what the authors propose here, but that benchmark does not address scalable distributed-memory parallelism or nested parallelism.
7. SUMMARY AND CONCLUSIONS
- The High Performance Linpack (HPL) Benchmark is an incredibly successful metric for the high performance computing community.
- The trends it exposes, the focused optimization efforts it inspires and the publicity it brings to our community are very important.
- HPCG is large enough to be mathematically meaningful, yet small enough to easily understand and use.
References
[4] Jack Dongarra, Victor Eijkhout, Henk van der Vorst. Iterative Solver Benchmark.
Frequently Asked Questions (8)
Q2. What can be done to improve the performance of HPCG?
Emerging asynchronous collectives and other latency-hiding techniques can be explored in the context of HPCG and aid in their adoption and optimization on future systems.
Q3. Why is it important to have HPCG be a C++ code?
Since many large-scale applications use C++ for its compile-time polymorphism and object-oriented features, the authors believe it is important to have HPCG be a C++ code.
Q4. What is the restriction on the benchmarker?
The benchmarker is prohibited from exploiting regularity by using, for example, a sparse diagonal format and is prohibited from exploiting value symmetry to reduce storage requirements.
Q5. What can the authors do to verify and validate results?
The authors can compute spectral estimates that bound the error, and use other properties of PCG and SPD matrices to verify and validate results.
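One kind of check that SPD structure makes possible is an algebraic invariant test. The sketch below is illustrative only (HPCG's own verification differs in detail): for a symmetric operator, x'(Ay) must equal y'(Ax) up to rounding, and the Rayleigh quotient x'(Ax) must be positive. The random test matrix here is a hypothetical stand-in for the benchmark's sparse operator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Build a small SPD test matrix (stand-in for the benchmark operator):
# B @ B.T is symmetric positive semidefinite; the shifted diagonal
# makes it safely positive definite.
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)

x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Symmetry check: x'(Ay) == y'(Ax) up to floating-point rounding.
lhs = x @ (A @ y)
rhs = y @ (A @ x)
assert abs(lhs - rhs) <= 1e-8 * abs(lhs)

# Positive-definiteness spot check via the Rayleigh quotient.
assert x @ (A @ x) > 0
```

The same symmetry identity can be applied to a preconditioner apply, which catches a broad class of implementation bugs in optimized kernels.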
Q6. What is the widely recognized and discussed metric for ranking high performance computing systems?
The High Performance Linpack (HPL) benchmark is the most widely recognized and discussed metric for ranking high performance computing systems.
Q7. What are some of the characteristics of Type 2 patterns?
In particular, many important calculations, which the authors call Type 2 patterns, have much lower computation-to-data-access ratios, access memory irregularly, and have fine-grain recursive computations.
Q8. What are the main patterns in the HPCG benchmark?
The major communication (global and neighborhood collectives) and computational patterns (vector updates, dot products, sparse matrix-vector multiplications and local triangular solves) in their production differential equation codes, both implicit and explicit, are present in this benchmark.