Author

Aurelien Bouteiller

Bio: Aurelien Bouteiller is an academic researcher at the University of Tennessee whose work focuses on fault tolerance and scalability. He has an h-index of 29 and has co-authored 87 publications receiving 3,159 citations. His previous affiliations include University of Paris-Sud and the French Institute for Research in Computer Science and Automation (INRIA).


Papers
Proceedings Article
16 Nov 2002
TL;DR: This work presents MPICH-V, an automatic volatility-tolerant MPI environment based on uncoordinated checkpoint/rollback and distributed message logging, and gives a detailed performance evaluation of every component as well as global performance for non-trivial parallel applications.
Abstract: Global Computing platforms, large-scale clusters, and future TeraGRID systems gather thousands of nodes for computing parallel scientific applications. At this scale, node failures or disconnections are frequent events. This volatility reduces the MTBF of the whole system to the range of hours or minutes. We present MPICH-V, an automatic volatility-tolerant MPI environment based on uncoordinated checkpoint/rollback and distributed message logging. The MPICH-V architecture relies on Channel Memories, Checkpoint Servers, and theoretically proven protocols to execute existing or new SPMD and Master-Worker MPI applications on volatile nodes. To evaluate its capabilities, we run MPICH-V within a framework in which the number of nodes, Channel Memories, and Checkpoint Servers, as well as the node volatility, can be fully configured. We present a detailed performance evaluation of every component of MPICH-V and of its global performance for non-trivial parallel applications. Experimental results demonstrate good scalability and high tolerance to node volatility.
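To make the protocol concrete, here is a minimal Python sketch of the channel-memory idea: every message transits through a reliable intermediary that logs it, so a crashed process can roll back to its own uncoordinated checkpoint and replay its receptions without forcing any other process to roll back. All class and method names are illustrative, not the MPICH-V API.

```python
# Toy simulation of uncoordinated checkpoint/rollback with message logging,
# in the spirit of MPICH-V's Channel Memories. Names are illustrative.

class ChannelMemory:
    """Reliable intermediary that logs every message routed through it."""
    def __init__(self):
        self.log = {}  # dest -> ordered list of (src, payload)

    def send(self, src, dest, payload):
        self.log.setdefault(dest, []).append((src, payload))

    def replay(self, dest, from_index):
        """Messages a restarted process must re-receive after its checkpoint."""
        return self.log.get(dest, [])[from_index:]

class Process:
    def __init__(self, rank, channel):
        self.rank, self.channel = rank, channel
        self.state, self.received = 0, 0
        self.checkpoint = None

    def recv(self, src, payload):
        self.state += payload
        self.received += 1

    def take_checkpoint(self):
        # Uncoordinated: each process checkpoints independently.
        self.checkpoint = (self.state, self.received)

    def restart(self):
        # Roll back to the last local checkpoint, then replay logged messages.
        self.state, self.received = self.checkpoint
        for src, payload in self.channel.replay(self.rank, self.received):
            self.recv(src, payload)

channel = ChannelMemory()
p1 = Process(rank=1, channel=channel)
channel.send(0, 1, 10); p1.recv(0, 10)
p1.take_checkpoint()
channel.send(0, 1, 5); p1.recv(0, 5)
assert p1.state == 15
p1.restart()                 # simulate a crash after the checkpoint
assert p1.state == 15        # replaying the log restores the pre-crash state
```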

323 citations

Journal Article
01 Jan 2012
TL;DR: DAGuE is presented: a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures, using a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority.
Abstract: The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies, in a totally distributed fashion. DAGuE assigns computation threads to cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case.
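As a rough illustration of the execution model (not DAGuE's actual compact representation or distributed scheduler), the following Python sketch runs a small dependency chain through a priority-ordered, data-availability-driven scheduler; the task names loosely echo tiles of a Cholesky factorization, the paper's use case.

```python
# Toy data-flow scheduler: tasks become ready when their input data exist,
# and ready tasks are picked by priority. Illustrative only; DAGuE's compact
# DAG representation and distributed scheduler are far more sophisticated.
import heapq

# Each task: name -> (inputs, outputs, priority). Edges are data dependencies.
tasks = {
    "potrf": ((), ("A11",), 3),
    "trsm":  (("A11",), ("A21",), 2),
    "syrk":  (("A21",), ("A22",), 1),
}

data_ready, done, heap = set(), [], []

def push_ready():
    for name, (ins, _, prio) in tasks.items():
        if name not in done and all(d in data_ready for d in ins) \
           and all(name != n for _, n in heap):
            heapq.heappush(heap, (-prio, name))  # highest priority first

push_ready()
while heap:
    _, name = heapq.heappop(heap)
    done.append(name)
    data_ready.update(tasks[name][1])  # the task's outputs become available
    push_ready()

print(done)  # ['potrf', 'trsm', 'syrk'] -- dependency order respected
```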

251 citations

Journal Article
01 Nov 2013
TL;DR: In this article, the authors present an approach based on task parallelism that reveals the application's parallelism by expressing its algorithm as a task flow, which allows the algorithm to be decoupled from the data distribution and the underlying hardware.
Abstract: New high-performance computing system designs with steeply escalating processor and core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable memory access times call for one or more dramatically new programming paradigms. These new approaches must react and adapt quickly to unexpected contentions and delays, and they must provide the execution environment with sufficient intelligence and flexibility to rearrange the execution to improve resource utilization. The authors present an approach based on task parallelism that reveals the application's parallelism by expressing its algorithm as a task flow. This strategy allows the algorithm to be decoupled from the data distribution and the underlying hardware, since the algorithm is entirely expressed as flows of data. This kind of layering provides a clear separation of concerns among architecture, algorithm, and data distribution. Developers benefit from this separation because they can focus solely on the algorithmic level without the constraints involved with programming for current and future hardware trends.
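A small hypothetical sketch of the separation of concerns described above: the task flow is written against symbolic data items, and a separate distribution function decides where each item (and therefore each task) is placed, so swapping the distribution never touches the algorithm. All names are invented for illustration.

```python
# Sketch: the algorithm is written against symbolic data items; where each
# item lives is decided by a separate distribution function, so changing
# the distribution does not touch the algorithm. Names are illustrative.

def block_cyclic(item, nnodes):
    """One possible data distribution: block-cyclic over the tile index."""
    return item[1] % nnodes  # item = ("A", tile_index)

def owner_computes(task_inputs, distribution, nnodes):
    """Place each task on the node owning its first input (one simple rule)."""
    return distribution(task_inputs[0], nnodes)

algorithm = [  # the task flow: (task, consumed data) -- no node ranks here
    ("factor", [("A", 0)]),
    ("update", [("A", 1)]),
    ("update", [("A", 2)]),
]

for task, inputs in algorithm:
    node = owner_computes(inputs, block_cyclic, nnodes=2)
    print(f"{task}{inputs} -> node {node}")
```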

247 citations

Proceedings Article
15 Nov 2003
TL;DR: Experimental results demonstrate that MPICH-V2 provides performance close to MPICH-P4 for applications using large messages while dramatically reducing the number of reliable nodes required compared to MPICH-V1.
Abstract: The execution of MPI applications on clusters and Grid deployments suffering from node and network failures motivates the use of fault-tolerant MPI implementations. We present MPICH-V2 (the second protocol of the MPICH-V project), an automatic fault-tolerant MPI implementation using an innovative protocol that removes the most limiting factor of the pessimistic message-logging approach: reliable logging of in-transit messages. MPICH-V2 relies on uncoordinated checkpointing, sender-based message logging, and remote reliable logging of message logical clocks. This paper presents the architecture of MPICH-V2, its theoretical foundation, and the performance of the implementation. We compare MPICH-V2 to MPICH-V1 and MPICH-P4, evaluating a) its point-to-point performance, b) its performance on the NAS benchmarks, and c) application performance when many faults occur during execution. Experimental results demonstrate that MPICH-V2 provides performance close to MPICH-P4 for applications using large messages while dramatically reducing the number of reliable nodes required compared to MPICH-V1.
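The key trick, sketched below in a toy Python simulation with invented names, is that message payloads remain in sender memory while only small determinants (source and sequence number, standing in for the logical clocks) go to a reliable remote logger; recovery re-fetches payloads from the senders in the logged order.

```python
# Toy sketch of sender-based logging: payloads stay in sender memory, only
# small determinants (source, send sequence number) are logged reliably.
# Illustrative; MPICH-V2's actual protocol also handles logger interaction
# during checkpointing and garbage collection of the logs.

class Sender:
    def __init__(self, rank):
        self.rank, self.seq, self.payloads = rank, 0, {}
    def send(self, payload):
        self.seq += 1
        self.payloads[self.seq] = payload   # payload kept locally, not copied
        return (self.rank, self.seq, payload)

class EventLogger:
    """Reliable node storing only tiny determinants, never message data."""
    def __init__(self):
        self.determinants = []
    def log(self, src, seq):
        self.determinants.append((src, seq))

senders = {0: Sender(0), 1: Sender(1)}
logger = EventLogger()

state = 0
for src, seq, payload in (senders[0].send(7), senders[1].send(4)):
    logger.log(src, seq)     # remote logging of the delivery order only
    state += payload

# Recovery: re-fetch payloads from the senders in the logged order.
replayed = sum(senders[src].payloads[seq] for src, seq in logger.determinants)
assert replayed == state == 11
```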

204 citations

Journal Article
01 Aug 2013
TL;DR: This paper presents a set of extensions to MPI that allow communication capabilities to be restored, while maintaining the extreme level of performance to which MPI users have become accustomed.
Abstract: As supercomputers enter an era of massive parallelism where the frequency of faults is increasing, the MPI Standard remains distressingly vague on the consequences of failures for MPI communications. Advanced fault-tolerance techniques have the potential to prevent full-scale application restart and therefore lower the cost incurred for each failure, but they demand from MPI the capability to detect failures and resume communications afterward. In this paper, we present a set of extensions to MPI that allow communication capabilities to be restored while maintaining the extreme level of performance to which MPI users have become accustomed. The motivations behind the design choices are weighed against alternatives, a task that requires considering MPI simultaneously from the viewpoints of both the user and the implementor. The usability of the interfaces for expressing advanced recovery techniques is then discussed, including the difficult issue of enabling separate software layers to coordinate their recovery.
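These extensions are the basis of what became known as User-Level Failure Mitigation (ULFM). The pure-Python simulation below sketches the recovery pattern they enable: an operation reports a peer failure, the application revokes the communicator so all ranks observe the failure, then shrinks it to the survivors. The real interfaces are MPI calls along the lines of MPI_Comm_revoke and MPI_Comm_shrink; this toy model only mimics their semantics.

```python
# Toy model of the ULFM-style recovery pattern: detect a failure, revoke the
# communicator so everyone observes it, then shrink it to the survivors and
# continue. Pure-Python simulation, not an MPI binding.

class ProcFailed(Exception): pass   # analogous to MPI_ERR_PROC_FAILED
class Revoked(Exception): pass      # analogous to MPI_ERR_REVOKED

class Comm:
    def __init__(self, ranks):
        self.ranks, self.failed, self.revoked = set(ranks), set(), False
    def send(self, dest):
        if self.revoked:
            raise Revoked            # all pending ops fail once revoked
        if dest in self.failed:
            raise ProcFailed         # failure reported to the caller
    def revoke(self):
        self.revoked = True          # propagates failure knowledge to all ranks
    def shrink(self):
        return Comm(self.ranks - self.failed)  # survivors-only communicator

comm = Comm(range(4))
comm.failed.add(2)                   # rank 2 dies
try:
    comm.send(dest=2)
except ProcFailed:
    comm.revoke()                    # make the failure globally visible
    comm = comm.shrink()             # restore communication capability
assert comm.ranks == {0, 1, 3} and not comm.revoked
```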

188 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models that can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90%, and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
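As a flavor of the third (spatial-decomposition) algorithm, the toy 1-D Python sketch below gives each "processor" a slab of the domain; because forces are short-range, a slab only needs ghost atoms within the cutoff distance of its borders. The paper's algorithm is 3-D with real message passing and neighbor lists; everything here is illustrative.

```python
# Toy 1-D spatial decomposition for short-range forces: each "processor"
# owns a slab of the domain and only needs ghost atoms within the cutoff
# from the neighboring slabs. Illustrative only.
CUTOFF, DOMAIN, NPROC = 1.0, 8.0, 4
atoms = [0.4, 1.2, 2.1, 2.2, 3.9, 4.1, 6.5, 7.7]  # 1-D positions

slab = DOMAIN / NPROC
owned = {p: [x for x in atoms if p * slab <= x < (p + 1) * slab]
         for p in range(NPROC)}

def ghosts(p):
    """Atoms from neighboring slabs within the cutoff of slab p's borders."""
    lo, hi = p * slab, (p + 1) * slab
    return [x for x in atoms
            if (lo - CUTOFF <= x < lo) or (hi <= x < hi + CUTOFF)]

def local_pairs(xs, gs):
    """Interacting pairs seen by this processor (stand-in for forces)."""
    pts = xs + gs
    return sum(1 for a in xs for b in pts if b != a and abs(a - b) < CUTOFF)

for p in range(NPROC):
    print(p, owned[p], ghosts(p), local_pairs(owned[p], ghosts(p)))
```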

29,323 citations

Proceedings Article
30 Mar 2011
TL;DR: The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.
Abstract: We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine. To support the sophisticated schedulers of today's frameworks, Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. Our results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.
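The resource-offer mechanism reduces to a small protocol, sketched here in Python with invented names: the first scheduling level (the master) decides what to offer each framework; the second level (each framework) decides which offers to accept, which is how frameworks obtain data locality.

```python
# Toy version of Mesos-style two-level scheduling via resource offers.
# Names and data are illustrative, not the Mesos API.

offers = [  # master-side decision: (node, cpus, data stored on that node)
    ("n1", 4, {"blockA"}),
    ("n2", 4, {"blockB"}),
]

class Framework:
    def __init__(self, name, wanted_data):
        self.name, self.wanted = name, wanted_data
    def accept(self, offer):
        node, cpus, data = offer
        # Second level: the framework itself chooses offers that give it
        # data locality, and declines the rest.
        return bool(self.wanted & data)

hadoop = Framework("hadoop", {"blockA"})
for offer in offers:
    verdict = "accepts" if hadoop.accept(offer) else "declines"
    print(f"{hadoop.name} {verdict} {offer[0]}")
```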

1,786 citations

Book Chapter
TL;DR: Open MPI provides a unique combination of novel features previously unavailable in an open-source, production-quality implementation of MPI; its component architecture provides a stable platform for third-party research while enabling the run-time composition of independent software add-ons.
Abstract: A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. The result is a myriad of incompatible MPI implementations, all of which require separate installation, and the combination of which presents significant logistical challenges for end users. Building upon prior research, and influenced by experience gained from the code bases of the LAM/MPI, LA-MPI, and FT-MPI projects, Open MPI is an all-new, production-quality MPI-2 implementation that is fundamentally centered around component concepts. Open MPI provides a unique combination of novel features previously unavailable in an open-source, production-quality implementation of MPI. Its component architecture provides both a stable platform for third-party research and the run-time composition of independent software add-ons. This paper presents a high-level overview of the goals, design, and implementation of Open MPI.
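The component idea can be sketched with a toy registry in Python; the framework and component names ("btl", "tcp") echo Open MPI's Modular Component Architecture, but the code is only an illustration of run-time composition, not Open MPI's actual machinery.

```python
# Toy component registry in the spirit of a modular component architecture:
# independently developed components register under a framework name and
# are selected at run time. Illustrative names only.

registry = {}

def register(framework, name):
    def wrap(cls):
        registry.setdefault(framework, {})[name] = cls
        return cls
    return wrap

@register("btl", "tcp")
class TcpTransport:
    def send(self, msg): return f"tcp:{msg}"

@register("btl", "shmem")
class ShmemTransport:
    def send(self, msg): return f"shmem:{msg}"

def open_component(framework, preference):
    # Run-time composition: pick a component without recompiling callers.
    return registry[framework][preference]()

transport = open_component("btl", "tcp")
print(transport.send("hello"))  # tcp:hello
```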

1,603 citations

Proceedings Article
01 Jan 2015
TL;DR: This work couples blocked algorithms with dynamic, memory-aware task scheduling to achieve a parallel, out-of-core NumPy clone, and shows how this extends the effective scale of modern hardware to larger datasets.
Abstract: Dask enables parallel and out-of-core computation. We couple blocked algorithms with dynamic and memory-aware task scheduling to achieve a parallel and out-of-core NumPy clone. We show how this extends the effective scale of modern hardware to larger datasets and discuss how these ideas can be more broadly applied to other parallel collections.
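Because Dask is a Python library, the abstract's claim can be shown directly with its public array API (assuming dask is installed): a larger-than-memory array is split into chunks, and the reduction below is executed chunk by chunk by the scheduler rather than materialized whole.

```python
# Blocked algorithms + task scheduling as seen by a Dask user.
# Requires `pip install dask` (dask.array is part of the public API).
import dask.array as da

x = da.ones((20000, 20000), chunks=(1000, 1000))  # 400 chunks, built lazily
result = (x + x.T).sum()   # builds a task graph; nothing is computed yet
print(result.compute())    # the scheduler executes chunk-sized tasks
```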

496 citations

Journal Article
01 May 2014
TL;DR: A report produced by the workshop 'Addressing failures in exascale computing', held in Park City, Utah, 4-11 August 2012, which summarizes and builds on the workshop's discussions on resilience.
Abstract: We present here a report produced by a workshop on 'Addressing failures in exascale computing' held in Park City, Utah, 4-11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.

406 citations