Author

Guillaume Chapuis

Bio: Guillaume Chapuis is an academic researcher from Los Alamos National Laboratory. The author has contributed to research in topics: Discrete event simulation & Shortest path problem. The author has an h-index of 5 and has co-authored 8 publications receiving 65 citations.

Papers
Journal ArticleDOI
TL;DR: A new approach for solving the All-Pairs Shortest-Path (APSP) problem for planar graphs is presented that exploits the massive on-chip parallelism available in today's Graphics Processing Units (GPUs), and two new algorithms based on this approach are described.

23 citations
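As a point of reference for the APSP problem the paper accelerates, here is a minimal sketch of the classic Floyd-Warshall baseline. The paper's GPU algorithms additionally exploit planarity and on-chip parallelism; the function name and graph representation below are illustrative only:

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest paths on n vertices.

    edges: iterable of (u, v, w) directed, weighted edges.
    Returns an n x n distance matrix (INF = unreachable).
    """
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    # Relax: allow intermediate vertex k on every i -> j path.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

The triple loop is O(n^3) regardless of structure; planar-graph APSP methods of the kind the paper describes beat this by partitioning the graph.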

Proceedings ArticleDOI
15 May 2016
TL;DR: This paper presents a sufficiently detailed interconnect model for Cray's Gemini 3-D torus network; the model has been integrated with an implementation of the Message-Passing Interface that can mimic most of its functions with packet-level accuracy on the target platform.
Abstract: The interconnection network is a critical component of high-performance computing architecture and application co-design. For many scientific applications, the increasing communication complexity poses a serious concern as it may hinder the scaling properties of these applications on novel architectures. It is apparent that a scalable, efficient, and accurate interconnect model would be essential for performance evaluation studies. In this paper, we present an interconnect model for predicting the performance of large-scale applications on high-performance architectures. In particular, we present a sufficiently detailed interconnect model for Cray's Gemini 3-D torus network. The model has been integrated with an implementation of the Message-Passing Interface (MPI) that can mimic most of its functions with packet-level accuracy on the target platform. Extensive experiments show that our integrated model provides good accuracy for predicting the network behavior, while at the same time allowing for good parallel scaling performance.

16 citations
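The paper's Gemini model works at packet level; as a much coarser illustration of what an analytic interconnect model captures, here is a simple latency-bandwidth (alpha-beta) sketch with a per-hop router delay. All constants are assumed placeholder values, not Gemini measurements:

```python
def message_time(nbytes, alpha=1.5e-6, beta=1 / 6.0e9, hops=1, per_hop=50e-9):
    """Predicted point-to-point message time in seconds.

    alpha:   software/injection latency (s)      -- assumed value
    beta:    inverse bandwidth, seconds per byte -- assumed value
    hops:    number of router hops on the torus
    per_hop: per-hop router delay (s)            -- assumed value
    """
    return alpha + hops * per_hop + nbytes * beta
```

Packet-level models such as the one in the paper refine this by simulating individual packets, contention, and routing, at the cost of simulation time.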

Proceedings ArticleDOI
06 Dec 2015
TL;DR: This work introduces La-pdes, a parameterized benchmark application for measuring parallel and serial discrete event simulation (PDES) performance; it demonstrates through instrumentation that La-pdes's distributional assumptions are realistic and presents results of the eight scenarios on the PDES engine Simian.
Abstract: We introduce La-pdes, a parameterized benchmark application for measuring parallel and serial discrete event simulation (PDES) performance. Applying a holistic view of PDES system performance, La-pdes tests the performance factors of (i) the (P)DES engine in terms of event queue efficiency, synchronization mechanism, and load-balancing schemes; (ii) available hardware in terms of handling computationally intensive loads, memory size, cache hierarchy, and clock speed; and (iii) interaction with communication middleware (often MPI) through message buffering. La-pdes consists of seven scenarios for individual performance factors and an agglomerative stress evaluation scenario. The scenarios are implemented through concrete values of input parameters to La-pdes, which include number of entities and events, endtime, inter-send time distributions, computational and event load distributions, memory use distributions, cache-friendliness, and event queue sizes. We demonstrate through instrumentation that La-pdes assumptions regarding distributions are realistic and we present results of the eight scenarios on the PDES engine Simian.

13 citations
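A discrete event simulation engine of the kind La-pdes exercises centers on a time-ordered event queue. A minimal sequential sketch (not the Simian engine; all names here are illustrative):

```python
import heapq

class Engine:
    """A tiny sequential discrete event simulation core."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = 0  # tie-breaker so equal-time events stay FIFO

    def schedule(self, time, handler, *args):
        """Schedule handler(*args) to fire at simulated time `time`."""
        heapq.heappush(self._queue, (time, self._seq, handler, args))
        self._seq += 1

    def run(self, end_time=float("inf")):
        """Process events in timestamp order until the queue drains."""
        while self._queue and self._queue[0][0] <= end_time:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)
```

La-pdes stresses exactly the components visible here: event-queue efficiency, plus (in the parallel case) synchronization and load balancing across many such engines.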

Proceedings ArticleDOI
04 Jan 2016
TL;DR: The GPU Module of a Performance Prediction Toolkit developed at Los Alamos National Laboratory is presented, which enables code developers to efficiently test novel algorithmic ideas particularly for large-scale computational physics codes.
Abstract: We present the GPU Module of a Performance Prediction Toolkit developed at Los Alamos National Laboratory, which enables code developers to efficiently test novel algorithmic ideas particularly for large-scale computational physics codes. The GPU Module is a heavily-parameterized model of the GPU hardware that takes as input a sequence of abstracted instructions that the user provides as a representation of the application or can also be read in from the GPU intermediate representation PTX format. These instructions are then executed in a discrete event simulation framework of the entire computing infrastructure that can include multi-GPU and also multi-node components as typically found in high performance computing applications. Our GPU Module aims at a trade-off between the cycle-accuracy of GPU simulators and the fast execution times of analytical models. This trade-off is achieved by simulating at cycle level only a portion of the computations and using this partial runtime to analytically predict the total execution of the modeled application. We present GPU models that we validate against three different benchmark applications that cover the range from bandwidth- to cycle-limited. Our runtime predictions are within an error of 20%. We then predict performance of a next-generation GPU (Nvidia’s Pascal) for the same benchmark applications.

8 citations
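The trade-off described above — cycle-level simulation of only a sample of the computation, extrapolated analytically to the whole run — can be sketched as follows. The latency table and function names are assumptions for illustration, not measured GPU parameters:

```python
# Illustrative per-instruction latencies in cycles (assumed values).
LATENCY = {"fadd": 4, "fmul": 4, "ld_shared": 30, "ld_global": 400}

def simulate_cycles(instructions):
    """Cycle-level cost of a small sample of abstracted instructions."""
    return sum(LATENCY[op] for op in instructions)

def predict_runtime(sample, sample_fraction, clock_hz):
    """Extrapolate the partial cycle-level result to the full application.

    sample_fraction: share of the total work the sample represents.
    """
    return simulate_cycles(sample) / sample_fraction / clock_hz
```

A real model of this kind (per the abstract) also accounts for parallelism across warps and multiprocessors rather than summing latencies serially.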

Proceedings ArticleDOI
11 Dec 2016
TL;DR: SPHSim is intended to be a computational physicist's tool for quickly predicting the performance of novel algorithmic ideas on novel exascale-style hardware, such as GPUs, with a focus on extreme parallelism.
Abstract: We present performance prediction studies and trade-offs of Smoothed Particle Hydrodynamics (SPH) codes that rely on a Hashed OctTree data structure to efficiently respond to neighborhood queries. We use the Performance Prediction Toolkit (PPT) to (i) build a loop-structure model (SPHSim) of an SPH code, where parameters capture the specific physics of the problem and method controls that SPH offers, (ii) validate SPHSim against SPH runs on mid-range clusters, (iii) show strong- and weak-scaling results for SPHSim, which test the underlying discrete simulation engine, and (iv) use SPHSim to run design parameter scans showing trade-offs of interconnect latency and physics computation costs across a wide range of values for physics, method and hardware parameters. SPHSim is intended to be a computational physicist's tool for quickly predicting the performance of novel algorithmic ideas on novel exascale-style hardware, such as GPUs, with a focus on extreme parallelism.

6 citations
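The neighborhood queries that SPH codes issue can be served by spatial hashing; here is a minimal sketch using a flat uniform hash grid rather than the paper's Hashed OctTree (all names are illustrative):

```python
from collections import defaultdict
from itertools import product

def build_grid(points, h):
    """Bucket 3-D points into cells of side h (the smoothing length)."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        grid[(int(x // h), int(y // h), int(z // h))].append(i)
    return grid

def neighbors(points, grid, h, i):
    """Indices of points within distance h of point i (excluding i)."""
    x, y, z = points[i]
    cx, cy, cz = int(x // h), int(y // h), int(z // h)
    found = []
    # Only the 27 surrounding cells can contain a point within h.
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for j in grid.get((cx + dx, cy + dy, cz + dz), []):
            px, py, pz = points[j]
            if j != i and (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= h * h:
                found.append(j)
    return found
```

An octree variant trades this fixed cell size for adaptive resolution, which matters when particle density varies widely.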


Cited by
01 Jan 2016
Using MPI: Portable Parallel Programming with the Message-Passing Interface

593 citations

BookDOI
01 Jan 2016
TL;DR: It is shown that for every ε > 0, fk(Km,n) ≤ (1 + ε)·dk·mn/k, provided that both m and n are sufficiently large, where dk depends only on k; this asymptotically coincides with the lower bound fk(G) ≥ dk·e(G)/k, valid for all bipartite graphs.
Abstract: Let G be a graph. A k-radius sequence for G is a sequence of vertices of G such that for every edge uv of G, the vertices u and v appear at least once within distance k in the sequence. The length of a shortest k-radius sequence for G is denoted by fk(G). Such sequences appear in a problem related to computing values of some 2-argument functions. Suppose we have a set V of large objects, stored in an external database, and our cache can accommodate at most k + 1 objects from V at one time. If we are given a set E of pairs of objects for which we want to compute the value of some 2-argument function, and assume that our cache is managed in FIFO manner, then fk(G) (where G = (V, E)) is the minimum number of times we need to copy an object from the database to the cache. We give an asymptotically tight estimation on fk(G) for complete bipartite graphs. We show that for every ε > 0 we have fk(Km,n) ≤ (1 + ε)·dk·mn/k, provided that both m and n are sufficiently large, where dk depends only on k. This upper bound asymptotically coincides with the lower bound fk(G) ≥ dk·e(G)/k, valid for all bipartite graphs. We also show that determining fk(G) for an arbitrary graph G is NP-hard for every constant k > 1.

67 citations
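The k-radius property itself is straightforward to check: every edge must have its two endpoints occurring within k positions of each other somewhere in the sequence. A small sketch of a verifier (names are illustrative):

```python
def is_k_radius_sequence(seq, edges, k):
    """True iff seq is a k-radius sequence for the graph given by edges.

    For each edge (u, v), some occurrence of u must lie within
    k positions of an occurrence of v in seq.
    """
    for u, v in edges:
        if not any(seq[i] == u and v in seq[max(0, i - k): i + k + 1]
                   for i in range(len(seq))):
            return False
    return True
```

The hard part, per the abstract, is not verification but finding a shortest such sequence, which is NP-hard for every constant k > 1.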

Journal ArticleDOI
TL;DR: This paper improves on the roofline model following a quantitative approach and presents a completely automated GPU performance prediction technique that utilizes micro-benchmarking and profiling in a “black box” fashion as no inspection of source/binary code is required.

48 citations
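The classic roofline model that the paper improves upon bounds attainable performance by the lesser of the compute roof and the memory roof; a one-line sketch:

```python
def roofline(intensity, peak_flops, peak_bw):
    """Attainable FLOP/s under the basic roofline model.

    intensity:  arithmetic intensity in FLOPs per byte
    peak_flops: compute roof in FLOP/s
    peak_bw:    memory bandwidth roof in bytes/s
    """
    return min(peak_flops, peak_bw * intensity)
```

The quantitative refinement described in the TL;DR replaces these two idealized ceilings with values obtained from micro-benchmarking and profiling of the actual GPU.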

Journal ArticleDOI
TL;DR: This paper presents PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast and accurate manner on different GPU architectures.
Abstract: Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast and accurate manner on different GPU architectures. PPT-GPU is part of the open-source Performance Prediction Toolkit (PPT) developed at the Los Alamos National Laboratory. We extend the earlier GPU model in PPT, which predicts the runtimes of computational physics codes, to offer better prediction accuracy; to this end, we add models for the different memory hierarchies found in GPUs and latencies for different instructions. To further show the utility of PPT-GPU, we compare our model against real GPU device(s) and the widely used cycle-accurate simulator GPGPU-Sim, using different workloads from the RODINIA and Parboil benchmarks. The results indicate that the predicted performance of PPT-GPU is within a 10 percent error compared to the real device(s). In addition, PPT-GPU is highly scalable: it is up to 450x faster than GPGPU-Sim, with more accurate results.

31 citations

Proceedings ArticleDOI
06 Dec 2015
TL;DR: Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++.
Abstract: We introduce Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python. Simian reaps the benefits of interpreted languages---ease of use, fast development time, enhanced readability and a high degree of portability on different platforms---and, through the optional use of Just-In-Time (JIT) compilation, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++. This paper describes the main design concepts of Simian, and presents a benchmark performance study, comparing four Simian implementations (written in Python and Lua, with and without using JIT) against a traditionally compiled simulator, MiniSSF, written in C++. Our experiments show that Simian in Lua with JIT outperforms MiniSSF, sometimes by a factor of three under high computational workloads.

23 citations