Book ChapterDOI

PARCOACH Extension for Hybrid Applications with Interprocedural Analysis

01 Jan 2016-pp 135-146
TL;DR: An extension of the PARallel COntrol flow Anomaly CHecker (PARCOACH) is presented to enable verification of hybrid HPC applications; it statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks.
Abstract: Supercomputers are rapidly evolving, with now millions of processing units, posing the question of their programmability. Despite the emergence of more widespread and functional programming models, developing correct and effective parallel applications still remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions. Furthermore, programming model evolutions stress the requirement for a validation tool able to handle hybrid applications. Indeed, as current scientific applications mainly rely on MPI (Message-Passing Interface), new hardware designed with a larger node-level parallelism advocates for an MPI+X solution, with X a shared-memory model like OpenMP. But integrating two different approaches inside the same application can be error-prone, leading to complex bugs. In an MPI+X program, not only the correctness of MPI should be ensured but also its interactions with the multi-threaded model. For example, identical MPI collective operations cannot be performed by multiple non-synchronized threads. In this paper, we present an extension of the PARallel COntrol flow Anomaly CHecker (PARCOACH) to enable verification of hybrid HPC applications. Relying on a GCC plugin that combines static and dynamic analysis, the first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Based on this analysis, the code is selectively instrumented, displaying an error and interrupting all processes if the actual scheduling leads to a deadlock situation.

Summary (3 min read)

1 Introduction

  • The evolution of supercomputers to Exascale systems raises the issue of choosing the right parallel programming models for applications.
  • With the help of a compiler pass, PARCOACH can extract potential parallel deadlocks related to control-flow divergence and issue warnings during the compilation.
  • This work is based on [9] and extends [10] with more details and an interprocedural analysis.
  • Therefore, each MPI task can have a different control flow within functions, but it goes through the same functions for communications.

1.1 Motivating Examples

  • The MPI specification requires that all MPI processes call the same collective operations (blocking and non-blocking since MPI-3) in the same order [11]; a minimal sketch of a violation follows this list.
  • In Listing 2, MPI calls are executed outside the multithreaded region, so that code is compliant with the MPI_THREAD_SINGLE level.
  • For the MPI call placed inside the master block of Listing 3, the minimum thread level required is MPI_THREAD_FUNNELED.
  • These simple examples illustrate the difficulty for a developer to ensure that MPI calls are correctly used inside a hybrid MPI+OpenMP application.
  • Then, Section 3 describes an interprocedural extension of the PARCOACH static pass.
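
To make the first bullet concrete, here is a minimal, self-contained sketch of that error class (in the spirit of the paper's Listing 1; all variable names are illustrative): the collective is guarded by a rank-dependent branch, so odd-ranked processes never reach it while the others block inside it.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, val = 1, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank % 2 == 0)  /* control-flow divergence on the process rank */
        MPI_Reduce(&val, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Finalize();     /* odd ranks arrive here while even ranks wait in
                           MPI_Reduce: the collective sequences diverge */
    return 0;
}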

2 PARCOACH Static and Dynamic Analyses for Hybrid Applications

  • The first analysis is located in the middle of the compilation chain, where the code is represented as an intermediate form.
  • When a potential deadlock is detected, PARCOACH reports a warning with precise information about the possible deadlock (line and name of the guilty MPI communications, and line of conditionals responsible for the deadlock).
  • Then the warnings are confirmed by a static instrumentation of the code.
  • Note that whenever the compile-time analysis is able to statically prove the correctness of a function, no code is inserted in the program, reducing the impact of their transformation on the execution time.
  • This section describes the following new features of PARCOACH: (i) detection of the minimal MPI thread-level support required by an MPI+OpenMP application (see [9] for more details) and (ii) checking misuse of MPI blocking and nonblocking collectives in a multi-threaded context (extension of [10]).

2.1 MPI Thread-Level Checking

  • This analysis finds the right MPI thread-level support to be used and identifies code fragments that may prevent conformance to a given level.
  • Verifying the compliance of an MPI thread level in MPI+OpenMP code resorts to checking the placement of MPI calls.
  • To determine the thread context in which MPI calls are performed, the authors augment the CFGs by marking the nodes containing MPI calls (point-to-point and collective).
  • As defined in [9], a parallelism word is the sequence of OpenMP parallel constructs (P: parallel, S: single, M: master and B: barrier, covering implicit and explicit barriers) surrounding a node, from the beginning of the function to the node; an illustrative sketch follows this list.
  • Based on this analysis, the following section describes how collective operations can be verified in a multithreaded context.
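
As an illustration of this notation (a sketch built from the definition above, not an excerpt from the paper): the MPI call below is nested inside a parallel construct and then a single construct, so the CFG node containing it carries the parallelism word PS, meaning the call is executed by exactly one thread per MPI process.

#include <mpi.h>

void f(int *in, int *out) {
    #pragma omp parallel        /* P */
    {
        #pragma omp single      /* S */
        {
            /* node containing the MPI call: parallelism word PS */
            MPI_Allreduce(in, out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        }   /* the implicit barrier closing the single region contributes
               a B to the words of the nodes that follow it */
    }
}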

2.2 MPI Collective Communication Verification

  • This analysis proposes a solution to check the sequence of collective communications inside MPI+OpenMP programs; a sketch of the error class it targets follows this list.
  • To this end, the authors use the parallelism words defined in [10].
  • For this part, it is not necessary to separate single from master regions.
  • Finally, the STATIC PASS procedure detects the third category of errors.
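
The error class targeted here can be shown with a minimal sketch (restating the abstract's example, not code from the paper): every thread of the OpenMP team issues the same collective on the same communicator, which the MPI standard forbids for non-synchronized threads even at the MPI_THREAD_MULTIPLE level.

#include <mpi.h>

void f(void) {
    #pragma omp parallel
    {
        /* all threads of every process call the identical collective
           concurrently on MPI_COMM_WORLD: the matching of collectives
           across processes becomes undefined and may deadlock */
        MPI_Barrier(MPI_COMM_WORLD);
    }
}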

2.2.1 Static Instrumentation

  • The compile-time verification outputs warnings for MPI collective operations that may lead to an error or deadlock.
  • Nevertheless, the static analysis can produce false positives when the possible control-flow divergence does not actually occur during the execution.
  • To dynamically verify the total order of MPI collective sequences in each MPI process, validation functions (CCipw and CCcc) are inserted in nodes in the sets Sipw and Scc generated by the static pass (see Algorithm 1).
  • As multiple threads may call CC before return statements, this function is wrapped into a single pragma; a sketch of such a validation check follows this list.
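
The paper names these validation functions CC, CCipw and CCcc, but their signatures are not given on this page, so the following is a hedged sketch of one plausible shape (the id-based protocol and all names are assumptions): each process announces the id of the collective it is about to execute, and any mismatch aborts the run instead of letting it deadlock.

#include <mpi.h>
#include <stdio.h>

/* Hypothetical check in the spirit of PARCOACH's CC functions. */
static void CC_check(int collective_id, const char *name, int line) {
    int local[2] = { collective_id, -collective_id };
    int global[2];
    /* a single MAX reduction over {id, -id} yields the maximum id and
       the negated minimum id across all processes */
    MPI_Allreduce(local, global, 2, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    if (global[0] != -global[1]) {  /* processes disagree on the next collective */
        fprintf(stderr, "Error: %s (line %d) not matched by all processes\n",
                name, line);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

/* In a multithreaded region the inserted check is executed by one thread
   only, matching the single wrapping mentioned in the last bullet. */
void f_instrumented(int *in, int *out) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            CC_check(1, "MPI_Reduce", 8);  /* id and location are illustrative */
            MPI_Reduce(in, out, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        }
    }
}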

3 Interprocedural Analysis

  • Because PARCOACH relies on an intraprocedural analysis, it misses errors that cross function boundaries and may therefore produce false positives as well as false negatives.
  • To extend PARCOACH with an interprocedural mechanism, the authors extended the intraprocedural approach through the application Call Graph (CG): nodes represent functions and edges model possible calls.
  • The main idea is to compute and reuse the summaries of each CFG through a CG traversal in reverse invocation order.
  • For this purpose, the intraprocedural analysis is modified to return the valid sequence of collective operations for each function.
  • Performing their intraprocedural analysis would lead to a deadlock warning on main (for the MPI_Barrier operations) and on g (for the MPI_Allreduce collective); a sketch of this situation follows this list.
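
A hedged reconstruction of the kind of example this bullet refers to (the paper's actual listing is not reproduced on this page): every execution performs the sequence Barrier; Allreduce, but one collective is guarded by a conditional inside g and one path hides its Barrier inside a callee, so a purely intraprocedural pass warns on both main and g.

#include <mpi.h>

void g(int flag) {
    int in = 1, out;
    if (flag)  /* seen alone: a conditional Allreduce, hence a warning on g */
        MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
}

void h(void) {  /* Barrier then Allreduce, invisible in main's CFG */
    MPI_Barrier(MPI_COMM_WORLD);
    g(1);
}

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank % 2 == 0) {  /* seen alone: a Barrier on one path only,
                             hence a warning on main */
        MPI_Barrier(MPI_COMM_WORLD);
        g(1);
    } else {
        h();
    }
    MPI_Finalize();
    return 0;
}

Both branches of main actually produce the same collective sequence (Barrier; Allreduce), which a summary-based traversal of the call graph can establish, discharging the warnings.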

4 Experimental Results

  • The authors extended the PARCOACH implementation (a GCC 4.7 plugin) to add analysis of hybrid applications.
  • To show the impact of the PARCOACH analysis on compilation and execution time, the authors tested the NASMZ [12], the AMG benchmark [1], the EPCC suite [2] and HERA [5].
  • All experiments were conducted on Tera100, a petaflop supercomputer at the CEA.
  • The examples in Figures 8(b) and 8(c) contain one and two calls to an MPI collective, respectively.
  • This interprocedural analysis must be extended with a data-flow analysis that studies these condition variables in order to determine whether or not they depend on the process rank; the sketch after this list illustrates why.
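
A hedged illustration of why such a data-flow extension helps (names are invented for the sketch): when the branch condition has the same value on every process, all ranks take the same path and the code is correct, yet a control-flow-only analysis still flags the conditional collective as a potential deadlock.

#include <mpi.h>

/* 'nsteps' is assumed to be read from an input identical on all ranks,
   so the branch outcome is the same everywhere and no deadlock can
   occur; only a data-flow analysis tracking where the value comes from
   can prove that the condition is rank-independent. */
void compute(int nsteps) {
    if (nsteps > 100)
        MPI_Barrier(MPI_COMM_WORLD);
}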

5 Conclusion

  • The MPI+OpenMP approach is one solution to tackle the increasing node-level parallelism and the decreasing amount of memory per compute unit.
  • Some production codes are already hybrid and other applications are in the development process.
  • That is why the authors developed the PARCOACH platform, which helps application developers determine which interaction support a specific hybrid code requires and checks the correct usage of blocking and non-blocking MPI collective communications in an MPI+OpenMP application.
  • This enables us to reduce the number of false positives returned by the initial static analysis.
  • This interprocedural analysis could be improved to propagate collective-issue information and could be coupled with a data-flow analysis to avoid false positives.



HAL Id: hal-01420655
https://hal.archives-ouvertes.fr/hal-01420655
Submitted on 20 Dec 2016
PARCOACH Extension for Hybrid Applications with
Interprocedural Analysis
Emmanuelle Saillard, Hugo Brunie, Patrick Carribault, Denis Barthou
To cite this version:
Emmanuelle Saillard, Hugo Brunie, Patrick Carribault, Denis Barthou. PARCOACH Extension for Hybrid Applications with Interprocedural Analysis. 9th International Workshop on Parallel Tools for High Performance Computing, Sep 2015, Dresden, Germany. pp. 135-146, 10.1007/978-3-319-39589-0_11. hal-01420655

PARCOACH Extension for Hybrid Applications
with Interprocedural Analysis
Emmanuelle Saillard, Hugo Brunie, Patrick Carribault and Denis Barthou
Abstract Supercomputers are rapidly evolving, with now millions of processing units, posing the question of their programmability. Despite the emergence of more widespread and functional programming models, developing correct and effective parallel applications still remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions. Furthermore, programming model evolutions stress the requirement for a validation tool able to handle hybrid applications. Indeed, as current scientific applications mainly rely on MPI (Message-Passing Interface), new hardware designed with a larger node-level parallelism advocates for an MPI+X solution, with X a shared-memory model like OpenMP. But integrating two different approaches inside the same application can be error-prone, leading to complex bugs. In an MPI+X program, not only the correctness of MPI should be ensured but also its interactions with the multi-threaded model. For example, identical MPI collective operations cannot be performed by multiple non-synchronized threads. In this paper, we present an extension of the PARallel COntrol flow Anomaly CHecker (PARCOACH) to enable verification of hybrid HPC applications. Relying on a GCC plugin that combines static and dynamic analysis, the first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Based on this analysis, the code is selectively instrumented, displaying an error and interrupting all processes if the actual scheduling leads to a deadlock situation.
Emmanuelle Saillard
CEA, DAM, DIF, F-91297 Arpajon, France e-mail: emmanuelle.saillard.ocre@cea.fr
Hugo Brunie
CEA, DAM, DIF, F-91297 Arpajon, France e-mail: hugo.brunie.ocre@cea.fr
Patrick Carribault
CEA, DAM, DIF, F-91297 Arpajon, France e-mail: patrick.carribault@cea.fr
Denis Barthou
Bordeaux Institute of Technology, LaBRI / INRIA, Bordeaux, France e-mail: denis.barthou@labri.fr

1 Introduction
The evolution of supercomputers to Exascale systems raises the issue of choosing the right parallel programming models for applications. Currently, most HPC applications are based on MPI. But the hardware evolution of increasing core counts per node leads to a mix of MPI with shared-memory approaches like OpenMP. However, merging two parallel programming models within the same application requires full interoperability between these models and makes the debugging task more challenging. Therefore, there is a need for tools able to identify functional bugs as early as possible during the development cycle. To tackle this issue, we designed the PARallel COntrol flow Anomaly CHecker (PARCOACH), which combines static and dynamic analyses to enable an early detection of bugs in parallel applications. With the help of a compiler pass, PARCOACH can extract potential parallel deadlocks related to control-flow divergence and issue warnings during the compilation. Not only are the parallel constructs involved in the deadlock identified and printed during the compilation, but the statements responsible for the control-flow divergence are also output. In this paper, we propose an extension of PARCOACH to hybrid MPI+OpenMP applications and an interprocedural analysis to improve the bug detection through a whole program. This work is based on [9] and extends [10] with more details and an interprocedural analysis. To the best of our knowledge, only Marmot [3] is able to detect errors in MPI+OpenMP programs. But as a dynamic tool, Marmot detects errors during the execution, is limited to the dynamic parallel schedule, and only detects errors occurring for a given input set, whereas our approach allows for static bug detection with runtime support and detects bugs for all possible input values.

In the following we assume that all programs are SPMD MPI programs and that all MPI collective operations are called with compatible arguments (only the MPI_COMM_WORLD communicator is supported). Therefore, each MPI task can have a different control flow within functions, but it goes through the same functions for communications. Issues related to MPI arguments can be tested with other tools.
1.1 Motivating Examples
The MPI specification requires that all MPI processes call the same collective operations (blocking and non-blocking since MPI-3) in the same order [11]. These calls do not have to occur at the same line of source code, but the dynamic sequence of collectives should be the same, otherwise a deadlock can occur. In addition, MPI calls should be cautiously located in multi-threaded regions. Focusing only on MPI, in Listing 1, because of the conditional in line 2 (if statement), some processes may call the MPI_Reduce function while others may not. Similarly, in Listing 2, some MPI ranks may perform a blocking barrier (MPI_Barrier) while others will call a non-blocking one (MPI_Ibarrier). The sequence is the same (call to one barrier), but this blocking/non-blocking matching is forbidden by the MPI specification.

Listing 1
1  void f() {
2    if (...)
3    {
4      #pragma omp parallel
5      {
6        #pragma omp single
7        {
8          MPI_Reduce(..)
9        }
10     }
11   }
12 }

Listing 2
1  void f() {
2    if (...)
3      MPI_Barrier(..)
4    else
5      MPI_Ibarrier(...)
6    #pragma omp parallel
7    {
8      //
9    }
10 }

Listing 3
1  void f() {
2    #pragma omp parallel
3    {
4      //
5      #pragma omp master
6      {
7        MPI_Send(..)
8        //
9      }
10   }
11 }

Listing 4
1  void f() {
2    #pragma omp parallel
3    {
4      #pragma omp single nowait
5      {
6        MPI_Reduce(..)
7      }
8      #pragma omp single
9      {
10       MPI_Reduce(..)
11     }
12   }
13 }
Fig. 1 MPI+OpenMP Examples with different uses of MPI calls.
Regarding hybrid MPI+OpenMP applications, the MPI API defines four levels of thread support to indicate how threads should interact with MPI: MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED and MPI_THREAD_MULTIPLE. MPI processes can be multithreaded, but the MPI standard specifies that "it is the user responsibility to prevent races when threads within the same application post conflicting communication calls" [11]. In Listing 2, MPI calls are executed outside the multithreaded region. This piece of code is therefore compliant with the MPI_THREAD_SINGLE level. But MPI communications may appear inside OpenMP blocks. For example, the MPI point-to-point function at line 7 in Listing 3 is inside a master block. The minimum thread level required for this code is therefore MPI_THREAD_FUNNELED. However, calls located inside a single or master block may lead to different thread support. Indeed, in Listing 4, two MPI_Reduce calls are in different single regions. Because of the nowait clause on the first single region, these calls are performed simultaneously by different threads. This example requires the maximum thread support level, i.e., MPI_THREAD_MULTIPLE.
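
The level determined by this analysis is the one the application should request at initialization. As a standard-MPI sketch (not taken from the paper), requesting and verifying a level, here MPI_THREAD_FUNNELED as required by Listing 3, looks as follows; the comparison with < is valid because the standard defines the four constants in monotonically increasing order.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        /* the MPI library cannot guarantee the level this code needs */
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }
    /* ... hybrid code whose MPI calls are all made by the thread that
       initialized MPI, as FUNNELED requires ... */
    MPI_Finalize();
    return 0;
}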

These simple examples illustrate the difficulty for a developer to ensure that MPI calls are correctly used inside a hybrid MPI+OpenMP application. A tool able to check, for each MPI call, in which thread context it can be performed would help the application developer to know which thread level an application requires. Furthermore, beyond this support, checking for deadlocks of MPI collective communications in the presence of OpenMP constructs can be very tricky. In this paper, we propose an extension of PARCOACH to tackle these issues, with the help of an interprocedural analysis to improve the compile-time detection.

Section 2 gives an overview of the PARCOACH platform with a description of its static and dynamic analyses for hybrid MPI+OpenMP applications. Then, Section 3 describes an interprocedural extension of the PARCOACH static pass. Section 4 presents experimental results and finally Section 5 concludes the paper.
2 PARCOACH Static and Dynamic Analyses for Hybrid
Applications
PARCOACH uses a two-step method to verify MPI+OpenMP applications, as shown in Figure 2. The first analysis is located in the middle of the compilation chain, where the code is represented in an intermediate form. Each function of a program is depicted by a graph representation called the Control Flow Graph (CFG). PARCOACH analyses the CFG of each function to detect potential errors or deadlocks in a program. When a potential deadlock is detected, PARCOACH reports a warning with precise information about the possible deadlock (line and name of the guilty MPI communications, and line of the conditionals responsible for the deadlock). Then the warnings are confirmed by a static instrumentation of the code. Note that whenever the compile-time analysis is able to statically prove the correctness of a function, no code is inserted in the program, reducing the impact of our transformation on the execution time. If deadlocks are about to occur at runtime, the program is stopped and PARCOACH returns error messages with compilation information.

This section describes the following new features of PARCOACH: (i) detection of the minimal MPI thread-level support required by an MPI+OpenMP application (see [9] for more details) and (ii) checking misuse of MPI blocking and non-blocking collectives in a multi-threaded context (extension of [10]).

2.1 MPI Thread-Level Checking

This analysis finds the right MPI thread-level support to be used and identifies code fragments that may prevent conformance to a given level. Verifying the compliance of an MPI thread level in MPI+OpenMP code resorts to checking the placement of MPI calls. To determine the thread context in which MPI calls are performed, we augment the CFGs by marking the nodes containing MPI calls (point-to-point and collective).

Citations
More filters
Proceedings ArticleDOI
21 Jun 2021
TL;DR: In this paper, the authors propose MPI-CorrBench as a common test harness that enables a structured comparison of the different tools available w.r.t. various types of errors.
Abstract: The Message Passing Interface (MPI) is the de-facto standard for distributed memory computing in high-performance computing (HPC). To aid developers in writing correct MPI programs, different tools have been proposed, e.g., Intel Trace Analyzer and Collector (ITAC), MUST, Parcoach and MPI-Checker. Unfortunately, the effectiveness of these tools is hard to compare, as they have not been evaluated on a common set of applications. More importantly, well-known and widespread benchmarks, which tend to be well-tested and error-free, were used for their evaluation. To enable a structured comparison and improve the coverage and reliability of available MPI correctness tools, we propose MPI-CorrBench as a common test harness. MPI-CorrBench enables a structured comparison of the different tools available w.r.t. various types of errors. In our evaluation, we use MPI-CorrBench to provide a well-defined set of error cases to MUST, ITAC, Parcoach and MPI-Checker. In particular, we find that ITAC and MUST complement each other in many cases. In general, MUST works better for detecting type errors while ITAC is better at detecting errors in non-blocking operations. Although the most-used functions of MPI are well supported, MPI-CorrBench shows that for one-sided communication, the error detection capability of all evaluated tools needs improvement. Moreover, our experiments reveal an MPI standard violation in the MPICH test suite as well as several cases of discouraged use of MPI functionality.

4 citations

References
More filters
01 Apr 1994
TL;DR: This document contains all the technical features proposed for the interface and the goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs.
Abstract: The Message Passing Interface Forum (MPIF), with participation from over 40 organizations, has been meeting since November 1992 to discuss and define a set of library standards for message passing. MPIF is not sanctioned or supported by any official standards organization. The goal of the Message Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs. As such, the interface should establish a practical, portable, efficient and flexible standard for message passing. This is the final report, Version 1.0, of the Message Passing Interface Forum. This document contains all the technical features proposed for the interface. This copy of the draft was processed by LaTeX on April 21, 1994. Please send comments on MPI to mpi-comments@cs.utk.edu. Your comment will be forwarded to MPIF committee members who will attempt to respond.

3,181 citations

Book ChapterDOI
01 Jan 2005
TL;DR: The development at CEA/DAM of a new AMR multi-physics hydrocode platform led to convincing results on a wide range of applications, from interface instabilities to charge computations in detonics.
Abstract: The development at CEA/DAM of a new AMR multi-physics hydrocode platform led to convincing results on a wide range of applications, from interface instabilities to charge computations in detonics.

34 citations


"PARCOACH Extension for Hybrid Appli..." refers methods in this paper

  • ...To show the impact of PARCOACH analysis on the compilation and execution time, we tested the NASMZ [12], AMG benchmark [1], the EPCC suite [2] and HERA [5]....

    [...]

  • ...The execution time overheads obtained for the NAS benchmarks and HERA are presented in Figure 7 running on MPICH with GCC OpenMP....

    [...]

Journal ArticleDOI
TL;DR: This work examines the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications, and finds performance characteristics which differ significantly between implementations.
Abstract: With the current prevalence of multi-core processors in HPC architectures mixed-mode programming, using both MPI and OpenMP in the same application, is seen as an important technique for achieving high levels of scalability. As there are few standard benchmarks written in this paradigm, it is difficult to assess the likely performance of such programs. To help address this, we examine the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications. We find performance characteristics which differ significantly between implementations, and which highlight possible areas for improvement, especially when multiple OpenMP threads communicate simultaneously via MPI.

20 citations


"PARCOACH Extension for Hybrid Appli..." refers methods in this paper

  • ...To show the impact of PARCOACH analysis on the compilation and execution time, we tested the NASMZ [12], AMG benchmark [1], the EPCC suite [2] and HERA [5]....

    [...]

Journal ArticleDOI
01 Nov 2014
TL;DR: A static analysis to detect when a deadlock occurs in MPI collective operations in single program multiple data applications, assuming MPI calls are not nested in multithreaded regions is proposed.
Abstract: Nowadays most scientific applications are parallelized based on MPI communications. Collective MPI communications have to be executed in the same order by all processes in their communicator and the same number of times, otherwise they do not conform to the standard and a deadlock or other undefined behaviour can occur. As soon as the control flow involving these collective operations becomes more complex, in particular including conditionals on process ranks, ensuring the correction of such code is error-prone. We propose in this paper a static analysis to detect when such a situation occurs, combined with a code transformation that prevents deadlocking. We focus on blocking MPI collective operations in single program multiple data applications, assuming MPI calls are not nested in multithreaded regions. We show on several benchmarks the small impact on performance and the ease of integration of our techniques in the development process.

16 citations


"PARCOACH Extension for Hybrid Appli..." refers methods in this paper

  • ...To verify that we rely on Algorithm 1 proposed in [7] with the extension of nonblocking collectives detailed in [4]....

    [...]

Proceedings ArticleDOI
15 Sep 2013
TL;DR: A static analysis to detect when collective MPI communications deadlock and a code transformation that prevents from deadlocking is proposed, showing on several benchmarks the small impact on performance and the ease of integration of the techniques in the development process.
Abstract: Collective MPI communications have to be executed in the same order by all processes in their communicator and the same number of times, otherwise a deadlock occurs. As soon as the control-flow involving these collective operations becomes more complex, in particular including conditionals on process ranks, ensuring the correction of such code is error-prone. We propose in this paper a static analysis to detect when such situation occurs, combined with a code transformation that prevents from deadlocking. We show on several benchmarks the small impact on performance and the ease of integration of our techniques in the development process.

12 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "Parcoach extension for hybrid applications with interprocedural analysis" ?

In this paper, the authors present an extension of the PARallel COntrol flow Anomaly CHecker ( PARCOACH ) to enable verification of hybrid HPC applications. Relying on a GCC plugin that combines static and dynamic analysis, the first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks.