Author

Guillaume Chapuis

Bio: Guillaume Chapuis is an academic researcher from Los Alamos National Laboratory. The author has contributed to research in topics: Discrete event simulation & Shortest path problem. The author has an h-index of 5 and has co-authored 8 publications receiving 65 citations.

Papers
Journal ArticleDOI
TL;DR: A new approach for solving the All-Pairs Shortest-Path (APSP) problem for planar graphs is presented that exploits the massive on-chip parallelism available in today's Graphics Processing Units (GPUs), and two new algorithms based on this approach are described.

23 citations
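As a point of reference for the APSP problem the paper accelerates, here is a minimal sketch of the classic Floyd-Warshall baseline. The paper's GPU algorithms additionally exploit planarity and on-chip parallelism; the function name and graph representation below are illustrative only:

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest paths on n vertices.

    edges: iterable of (u, v, w) directed, weighted edges.
    Returns an n x n distance matrix (INF = unreachable).
    """
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    # Relax: allow intermediate vertex k on every i -> j path.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

The triple loop is O(n^3) regardless of structure; planar-graph APSP methods of the kind the paper describes beat this by partitioning the graph.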

Proceedings ArticleDOI
15 May 2016
TL;DR: This paper presents a sufficiently detailed interconnect model for Cray's Gemini 3-D torus network; the model has been integrated with an implementation of the Message-Passing Interface that can mimic most of its functions with packet-level accuracy on the target platform.
Abstract: The interconnection network is a critical component of high-performance computing architecture and application co-design. For many scientific applications, the increasing communication complexity poses a serious concern as it may hinder the scaling properties of these applications on novel architectures. It is apparent that a scalable, efficient, and accurate interconnect model would be essential for performance evaluation studies. In this paper, we present an interconnect model for predicting the performance of large-scale applications on high-performance architectures. In particular, we present a sufficiently detailed interconnect model for Cray's Gemini 3-D torus network. The model has been integrated with an implementation of the Message-Passing Interface (MPI) that can mimic most of its functions with packet-level accuracy on the target platform. Extensive experiments show that our integrated model provides good accuracy for predicting the network behavior, while at the same time allowing for good parallel scaling performance.

16 citations
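The paper's Gemini model works at packet level; as a much coarser illustration of what an analytic interconnect model captures, here is a simple latency-bandwidth (alpha-beta) sketch with a per-hop router delay. All constants are assumed placeholder values, not Gemini measurements:

```python
def message_time(nbytes, alpha=1.5e-6, beta=1 / 6.0e9, hops=1, per_hop=50e-9):
    """Predicted point-to-point message time in seconds.

    alpha:   software/injection latency (s)      -- assumed value
    beta:    inverse bandwidth, seconds per byte -- assumed value
    hops:    number of router hops on the torus
    per_hop: per-hop router delay (s)            -- assumed value
    """
    return alpha + hops * per_hop + nbytes * beta
```

Packet-level models such as the one in the paper refine this by simulating individual packets, contention, and routing, at the cost of simulation time.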

Proceedings ArticleDOI
06 Dec 2015
TL;DR: This work introduces La-pdes, a parameterized benchmark application for measuring parallel and serial discrete event simulation (PDES) performance; it demonstrates through instrumentation that La-pdes's distributional assumptions are realistic and presents results of the eight scenarios on the PDES engine Simian.
Abstract: We introduce La-pdes, a parameterized benchmark application for measuring parallel and serial discrete event simulation (PDES) performance. Applying a holistic view of PDES system performance, La-pdes tests the performance factors of (i) the (P)DES engine in terms of event queue efficiency, synchronization mechanism, and load-balancing schemes; (ii) available hardware in terms of handling computationally intensive loads, memory size, cache hierarchy, and clock speed; and (iii) interaction with communication middleware (often MPI) through message buffering. La-pdes consists of seven scenarios for individual performance factors and an agglomerative stress evaluation scenario. The scenarios are implemented through concrete values of input parameters to La-pdes, which include number of entities and events, endtime, inter-send time distributions, computational and event load distributions, memory use distributions, cache-friendliness, and event queue sizes. We demonstrate through instrumentation that La-pdes assumptions regarding distributions are realistic and we present results of the eight scenarios on the PDES engine Simian.

13 citations
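A discrete event simulation engine of the kind La-pdes exercises centers on a time-ordered event queue. A minimal sequential sketch (not the Simian engine; all names here are illustrative):

```python
import heapq

class Engine:
    """A tiny sequential discrete event simulation core."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = 0  # tie-breaker so equal-time events stay FIFO

    def schedule(self, time, handler, *args):
        """Schedule handler(*args) to fire at simulated time `time`."""
        heapq.heappush(self._queue, (time, self._seq, handler, args))
        self._seq += 1

    def run(self, end_time=float("inf")):
        """Process events in timestamp order until the queue drains."""
        while self._queue and self._queue[0][0] <= end_time:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)
```

La-pdes stresses exactly the components visible here: event-queue efficiency, plus (in the parallel case) synchronization and load balancing across many such engines.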

Proceedings ArticleDOI
04 Jan 2016
TL;DR: The GPU Module of a Performance Prediction Toolkit developed at Los Alamos National Laboratory is presented, which enables code developers to efficiently test novel algorithmic ideas particularly for large-scale computational physics codes.
Abstract: We present the GPU Module of a Performance Prediction Toolkit developed at Los Alamos National Laboratory, which enables code developers to efficiently test novel algorithmic ideas particularly for large-scale computational physics codes. The GPU Module is a heavily-parameterized model of the GPU hardware that takes as input a sequence of abstracted instructions that the user provides as a representation of the application or can also be read in from the GPU intermediate representation PTX format. These instructions are then executed in a discrete event simulation framework of the entire computing infrastructure that can include multi-GPU and also multi-node components as typically found in high performance computing applications. Our GPU Module aims at a trade-off between the cycle-accuracy of GPU simulators and the fast execution times of analytical models. This trade-off is achieved by simulating at cycle level only a portion of the computations and using this partial runtime to analytically predict the total execution of the modeled application. We present GPU models that we validate against three different benchmark applications that cover the range from bandwidth- to cycle-limited. Our runtime predictions are within an error of 20%. We then predict performance of a next-generation GPU (Nvidia’s Pascal) for the same benchmark applications.

8 citations
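The trade-off described above — cycle-level simulation of only a sample of the computation, extrapolated analytically to the whole run — can be sketched as follows. The latency table and function names are assumptions for illustration, not measured GPU parameters:

```python
# Illustrative per-instruction latencies in cycles (assumed values).
LATENCY = {"fadd": 4, "fmul": 4, "ld_shared": 30, "ld_global": 400}

def simulate_cycles(instructions):
    """Cycle-level cost of a small sample of abstracted instructions."""
    return sum(LATENCY[op] for op in instructions)

def predict_runtime(sample, sample_fraction, clock_hz):
    """Extrapolate the partial cycle-level result to the full application.

    sample_fraction: share of the total work the sample represents.
    """
    return simulate_cycles(sample) / sample_fraction / clock_hz
```

A real model of this kind (per the abstract) also accounts for parallelism across warps and multiprocessors rather than summing latencies serially.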

Proceedings ArticleDOI
11 Dec 2016
TL;DR: SPHSim is intended to be a computational physicist's tool for quickly predicting the performance of novel algorithmic ideas on novel exascale-style hardware, such as GPUs, with a focus on extreme parallelism.
Abstract: We present performance prediction studies and trade-offs of Smoothed Particle Hydrodynamics (SPH) codes that rely on a Hashed OctTree data structure to efficiently respond to neighborhood queries. We use the Performance Prediction Toolkit (PPT) to (i) build a loop-structure model (SPHSim) of an SPH code, where parameters capture the specific physics of the problem and method controls that SPH offers, (ii) validate SPHSim against SPH runs on mid-range clusters, (iii) show strong- and weak-scaling results for SPHSim, which test the underlying discrete simulation engine, and (iv) use SPHSim to run design parameter scans showing trade-offs of interconnect latency and physics computation costs across a wide range of values for physics, method and hardware parameters. SPHSim is intended to be a computational physicist's tool for quickly predicting the performance of novel algorithmic ideas on novel exascale-style hardware, such as GPUs, with a focus on extreme parallelism.

6 citations
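The neighborhood queries that SPH codes issue can be served by spatial hashing; here is a minimal sketch using a flat uniform hash grid rather than the paper's Hashed OctTree (all names are illustrative):

```python
from collections import defaultdict
from itertools import product

def build_grid(points, h):
    """Bucket 3-D points into cells of side h (the smoothing length)."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        grid[(int(x // h), int(y // h), int(z // h))].append(i)
    return grid

def neighbors(points, grid, h, i):
    """Indices of points within distance h of point i (excluding i)."""
    x, y, z = points[i]
    cx, cy, cz = int(x // h), int(y // h), int(z // h)
    found = []
    # Only the 27 surrounding cells can contain a point within h.
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for j in grid.get((cx + dx, cy + dy, cz + dz), []):
            px, py, pz = points[j]
            if j != i and (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= h * h:
                found.append(j)
    return found
```

An octree variant trades this fixed cell size for adaptive resolution, which matters when particle density varies widely.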


Cited by
01 Jan 2016
Using MPI: Portable Parallel Programming with the Message-Passing Interface

593 citations

BookDOI
01 Jan 2016
TL;DR: It is shown that for every ε > 0, fk(Km,n) ≤ (1 + ε)·dk·mn/k, provided that both m and n are sufficiently large, where dk depends only on k; this asymptotically coincides with the lower bound fk(G) ≥ dk·e(G)/k, valid for all bipartite graphs.
Abstract: Let G be a graph. A k-radius sequence for G is a sequence of vertices of G such that for every edge uv of G, the vertices u and v appear at least once within distance k in the sequence. The length of a shortest k-radius sequence for G is denoted by fk(G). Such sequences appear in a problem related to computing values of some 2-argument functions. Suppose we have a set V of large objects, stored in an external database, and our cache can accommodate at most k + 1 objects from V at one time. If we are given a set E of pairs of objects for which we want to compute the value of some 2-argument function, and assume that our cache is managed in FIFO manner, then fk(G) (where G = (V, E)) is the minimum number of times we need to copy an object from the database to the cache. We give an asymptotically tight estimation on fk(G) for complete bipartite graphs. We show that for every ε > 0 we have fk(Km,n) ≤ (1 + ε)·dk·mn/k, provided that both m and n are sufficiently large, where dk depends only on k. This upper bound asymptotically coincides with the lower bound fk(G) ≥ dk·e(G)/k, valid for all bipartite graphs. We also show that determining fk(G) for an arbitrary graph G is NP-hard for every constant k > 1.

67 citations
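The k-radius property itself is straightforward to check: every edge must have its two endpoints occurring within k positions of each other somewhere in the sequence. A small sketch of a verifier (names are illustrative):

```python
def is_k_radius_sequence(seq, edges, k):
    """True iff seq is a k-radius sequence for the graph given by edges.

    For each edge (u, v), some occurrence of u must lie within
    k positions of an occurrence of v in seq.
    """
    for u, v in edges:
        if not any(seq[i] == u and v in seq[max(0, i - k): i + k + 1]
                   for i in range(len(seq))):
            return False
    return True
```

The hard part, per the abstract, is not verification but finding a shortest such sequence, which is NP-hard for every constant k > 1.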

Journal ArticleDOI
TL;DR: This paper improves on the roofline model following a quantitative approach and presents a completely automated GPU performance prediction technique that utilizes micro-benchmarking and profiling in a “black box” fashion as no inspection of source/binary code is required.

48 citations
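The classic roofline model that the paper improves upon bounds attainable performance by the lesser of the compute roof and the memory roof; a one-line sketch:

```python
def roofline(intensity, peak_flops, peak_bw):
    """Attainable FLOP/s under the basic roofline model.

    intensity:  arithmetic intensity in FLOPs per byte
    peak_flops: compute roof in FLOP/s
    peak_bw:    memory bandwidth roof in bytes/s
    """
    return min(peak_flops, peak_bw * intensity)
```

The quantitative refinement described in the TL;DR replaces these two idealized ceilings with values obtained from micro-benchmarking and profiling of the actual GPU.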

Journal ArticleDOI
TL;DR: This paper presents PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast and accurate manner on different GPU architectures.
Abstract: Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast and accurate manner on different GPU architectures. PPT-GPU is part of the open-source Performance Prediction Toolkit (PPT) developed at the Los Alamos National Laboratory. We extend the earlier GPU model in PPT, which predicts the runtimes of computational physics codes, to offer better prediction accuracy; to this end, we add models for the different memory hierarchies found in GPUs and latencies for different instructions. To further show the utility of PPT-GPU, we compare our model against real GPU device(s) and the widely used cycle-accurate simulator GPGPU-Sim, using different workloads from the RODINIA and Parboil benchmarks. The results indicate that the predicted performance of PPT-GPU is within a 10 percent error compared to the real device(s). In addition, PPT-GPU is highly scalable: it is up to 450x faster than GPGPU-Sim, with more accurate results.

31 citations

Proceedings ArticleDOI
06 Dec 2015
TL;DR: Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++.
Abstract: We introduce Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python. Simian reaps the benefits of interpreted languages---ease of use, fast development time, enhanced readability and a high degree of portability on different platforms---and, through the optional use of Just-In-Time (JIT) compilation, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++. This paper describes the main design concepts of Simian, and presents a benchmark performance study, comparing four Simian implementations (written in Python and Lua, with and without using JIT) against a traditionally compiled simulator, MiniSSF, written in C++. Our experiments show that Simian in Lua with JIT outperforms MiniSSF, sometimes by a factor of three under high computational workloads.

23 citations