Proceedings Article

MPACP: An Approach for Automatic Matching of Parallel Application Communication Patterns

09 Dec 2008 - pp 1517-1522
TL;DR: This work proposes a new approach, MPACP (matching of parallel application communication patterns), to automate the analysis of the similarity between two parallel applications, and provides a reliable report that helps users or developers understand the similarity among communication patterns of parallel applications.
Abstract: Current trends in HPC (high performance computing) suggest that clusters will soon consist of hundreds, if not thousands, of processors, and that scientific problems are becoming much larger than before. Many researchers have predicted that communication among these processors will dominate the execution time of scientific parallel applications. Users will need a good understanding of the communication patterns of scientific parallel applications and of their similarities, so that they benefit not only from cost savings when constructing the running environment for these applications but also from better performance. In this paper, we address communication pattern matching and focus on point-to-point communication, which accounts for over 90% of all MPI (message passing interface) calls in most MPI codes and has much more impact on communication performance than collective communication does. Our contribution is a new approach, MPACP (matching of parallel application communication patterns), which automates the analysis of the similarity between two parallel applications and provides a reliable report that helps users or developers understand the similarity among communication patterns of parallel applications. Furthermore, experimental results demonstrate the effectiveness of our scheme for the automatic matching of parallel application communication patterns.
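To make the idea of comparing point-to-point communication patterns concrete, here is a minimal sketch. It is not the MPACP algorithm, which the abstract does not describe in detail. It assumes each application's trace has already been reduced to hypothetical `(source_rank, destination_rank, bytes)` tuples, and it swaps in plain cosine similarity over the resulting communication matrices as the similarity score. The names `comm_matrix` and `cosine_similarity` are illustrative only.

```python
import math


def comm_matrix(events, nprocs):
    """Accumulate bytes sent from each source rank to each destination rank."""
    matrix = [[0] * nprocs for _ in range(nprocs)]
    for src, dst, nbytes in events:
        matrix[src][dst] += nbytes
    return matrix


def cosine_similarity(a, b):
    """Cosine similarity of two equally sized matrices, treated as flat vectors."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an application with no traffic matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical traces on 4 ranks (assumed data, not from the paper).
ring = [(r, (r + 1) % 4, 1024) for r in range(4)]          # send to right neighbour
ring_scaled = [(r, (r + 1) % 4, 4096) for r in range(4)]   # same shape, larger messages
reverse_ring = [(r, (r - 1) % 4, 1024) for r in range(4)]  # send to left neighbour

sim_same_shape = cosine_similarity(comm_matrix(ring, 4), comm_matrix(ring_scaled, 4))
sim_different = cosine_similarity(comm_matrix(ring, 4), comm_matrix(reverse_ring, 4))
```

Cosine similarity ignores overall message volume, so two runs with the same neighbour structure but different problem sizes score as similar. A real matcher would likely also need to account for rank relabelling and temporal ordering, which this sketch does not attempt.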
Citations
Proceedings Article
01 Nov 2019
TL;DR: It is shown here that some proxy/parent pairs do not need the extra detail of dynamic behavior analysis, while others can benefit from it; through this work, a parent/proxy mismatch is also identified and the proxy application improved.
Abstract: In this work we investigate the dynamic communication behavior of parent and proxy applications, and investigate whether or not the dynamic communication behavior of the proxy matches that of its respective parent application. The idea of proxy applications is that they should match their parent well, and should exercise the hardware and perform similarly, so that from them lessons can be learned about how the HPC system and the application can best be utilized. We show here that some proxy/parent pairs do not need the extra detail of dynamic behavior analysis, while others can benefit from it, and through this we also identified a parent/proxy mismatch and improved the proxy application.

6 citations


Cites background from "MPACP: An Approach for Automatic Ma..."

  • ...[12] present the only other work we have identified on characterizing similarity in MPI communication patterns....

    [...]

Proceedings Article
01 Nov 2018
TL;DR: An exploratory effort at making an improved quantification of the correspondence of communication behavior for proxies and their respective parent applications and shows that each proxy analyzed is representative of its parent with respect to communication data.
Abstract: Proxy applications, or proxies, are simple applications meant to exercise systems in a way that mimics real applications (their parents). However, characterizing the relationship between the behavior of parent and proxy applications is not an easy task. In prior work [1], we presented a data-driven methodology to characterize the relationship between parent and proxy applications based on collecting runtime data from both and then using data analytics to find their correspondence or divergence. We showed that it worked well for hardware counter data, but our initial attempt using MPI function data was less satisfactory. In this paper, we present an exploratory effort at making an improved quantification of the correspondence of communication behavior for proxies and their respective parent applications. We present experimental evidence of positive results using four proxy applications from the current ECP Proxy Application Suite and their corresponding parent applications (in the ECP application portfolio). Results show that each proxy analyzed is representative of its parent with respect to communication data. In conjunction with our method presented in [1] (correspondence between computation and memory behavior), we get a strong understanding of how well a proxy predicts the comprehensive performance of its parent.

4 citations


Cites background from "MPACP: An Approach for Automatic Ma..."

  • ...Although there is a lot of older work done in characterization of application communication [24]–[30], relatively little work has been done in characterizing the similarity of communication patterns [31]....

    [...]

References
Journal Article
01 Sep 1991
TL;DR: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers that mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications.
Abstract: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These consist of five "parallel kernel" benchmarks and three "simulated application" benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their "pencil and paper" specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

2,246 citations


"MPACP: An Approach for Automatic Ma..." refers methods in this paper

  • ...These four applications from the NAS Parallel Benchmark Suite [13] are: BT, SP, MG, and LU. There are three problem sizes: A is the smallest, B is the medium one, and C is the largest....

    [...]

Proceedings Article
16 Nov 2002
TL;DR: This work investigates the scalability, architectural requirements, and performance characteristics of eight scalable scientific applications, and distills these factors into common traits and overall recommendations for both users and designers of scalable platforms.
Abstract: We investigate the scalability, architectural requirements, and performance characteristics of eight scalable scientific applications. Our analysis is driven by empirical measurements using statistical and tracing instrumentation for both communication and computation. Based on these measurements, we refine our analysis into precise explanations of the factors that influence performance and scalability for each application; we distill these factors into common traits and overall recommendations for both users and designers of scalable platforms. Our experiments demonstrate that some traits, such as improvements in the scaling and performance of MPI's collective operations, will benefit most applications. We also find specific characteristics of some applications that limit performance. For example, one application's intensive use of a 64-bit, floating-point divide instruction, which has high latency and is not pipelined on the POWER3, limits the performance of the application's primary computation.

83 citations


"MPACP: An Approach for Automatic Ma..." refers background in this paper

  • ...Many teams, including researchers and developers, have realized that the communication patterns of parallel applications play a significant role in performance [4], [5] and [6]....

    [...]

Proceedings Article
12 Nov 2005
TL;DR: Overall results show that HFAST is a promising approach for practically addressing the interconnect requirements of future peta-scale systems.
Abstract: The path towards realizing peta-scale computing is increasingly dependent on scaling up to unprecedented numbers of processors. To prevent the interconnect architecture between processors from dominating the overall cost of such systems, there is a critical need for interconnect solutions that both provide performance to ultra-scale applications and have costs that scale linearly with system size. In this work we propose the Hybrid Flexibly Assignable Switch Topology (HFAST) infrastructure. The HFAST approach uses both passive (circuit switch) and active (packet switch) commodity switch components to deliver all of the flexibility and fault-tolerance of a fully-interconnected network (such as a fat-tree), while preserving the nearly linear cost scaling associated with traditional low-degree interconnect networks. To understand the applicability of this technology, we perform an in-depth study of communication requirements across a broad spectrum of important scientific applications, whose computational methods include: finite-difference, lattice-Boltzmann, particle in cell, sparse linear algebra, particle mesh Ewald, and FFT-based solvers. We use the IPM (Integrated Performance Monitoring) profiling layer to gather detailed messaging statistics with minimal impact on code performance. This profiling provides us with sufficiently detailed communication topology and message volume data to evaluate these applications in the context of the proposed hybrid interconnect. Overall results show that HFAST is a promising approach for practically addressing the interconnect requirements of future peta-scale systems.

80 citations


"MPACP: An Approach for Automatic Ma..." refers background in this paper

  • ...The message sizes and the number of messages are analyzed to determine whether applications are latency or bandwidth bound....

    [...]
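As a rough illustration of how message sizes and message counts can hint at whether an application is latency or bandwidth bound, the sketch below labels a run by the share of small messages. The 2048-byte cutoff, the 50% majority rule, and the function name `classify_messages` are all assumptions made for this example; they are not taken from the cited paper, and a real analysis would calibrate the cutoff against the target interconnect.

```python
def classify_messages(sizes, threshold_bytes=2048):
    """Label a list of message sizes (bytes) by the share of small messages.

    threshold_bytes is a hypothetical cutoff below which per-message latency
    is assumed to dominate transfer time.
    """
    total = len(sizes)
    if total == 0:
        return {"small_fraction": 0.0, "label": "unknown"}
    small = sum(1 for s in sizes if s < threshold_bytes)
    small_fraction = small / total
    # Assumed rule: a majority of small messages suggests latency sensitivity.
    label = "latency-bound" if small_fraction > 0.5 else "bandwidth-bound"
    return {"small_fraction": small_fraction, "label": label}


# Hypothetical message-size samples (assumed data).
halo_exchange = classify_messages([64, 128, 256, 1_000_000])
bulk_transfer = classify_messages([4_000_000, 8_000_000, 512])
```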

Book Chapter
31 Jan 1998
TL;DR: It is found that the locality metrics are relatively insensitive to system and problem size variations making them robust metrics for characterizing the communication patterns of parallel applications.
Abstract: This paper examines the communication patterns of parallel scientific programs, including some of the NAS benchmarks and the Miami Isopycnic Coordinate Ocean Model (MICOM), that use explicit message-passing. Communication locality, including communication event locality, message destination locality, and message size locality, is proposed and studied in addition to the widely accepted metrics of message size, destination, and generation distributions. We find that the locality metrics are relatively insensitive to system and problem size variations making them robust metrics for characterizing the communication patterns of parallel applications. We observe that the communication patterns of the benchmark programs are consistent with those of the actual application. The results of this study will be useful for understanding parallel applications' communication behavior and for designing more realistic synthetic benchmarks.

73 citations


"MPACP: An Approach for Automatic Ma..." refers background in this paper

  • ...A study of the communication volume is performed in [8], [9] and [10]....

    [...]

Proceedings Article
01 Feb 1997
TL;DR: It is shown that it is possible to express the message generation and spatial distribution of an application in terms of commonly used distributions, which can be used in the analysis of ICNs for developing realistic performance models.
Abstract: The interconnection network (ICN) is a vital component of a parallel machine and is often the limiting factor in the performance of several parallel applications. While ICN performance evaluation has been a widely researched topic, there have been very few studies that have used real applications to drive this research. In this paper we develop a framework for characterizing the communication properties of parallel applications. Message generation frequency, spatial distribution of messages and message length are the three attributes that quantify any communication. We develop a methodology to quantify these attributes, in particular the first two attributes. We employ two strategies, namely dynamic and static, in our methodology. In the former, the applications are executed on an execution-driven simulator called SPASM, while in the latter they are executed on a parallel machine, IBM SP2. We gather communication events from these executions and feed them to a 2-D mesh network simulator. The log of the network activity is then analyzed using a statistical analysis package (SAS) to find the message inter-arrival time distribution and spatial distribution via regression analysis. Five shared memory applications and two message passing applications are analyzed to quantify their communication workloads. It is shown that it is possible to express the message generation and spatial distribution of an application in terms of commonly used distributions. These distributions can be used in the analysis of ICNs for developing realistic performance models.

52 citations


"MPACP: An Approach for Automatic Ma..." refers background in this paper

  • ...Most studies concentrate on three aspects of parallel application communication: temporal, spatial and volume, which were first defined in [7]....

    [...]