Home
/
Authors
/
Pierre-André Wacrenier

Author

Pierre-André Wacrenier

Other affiliations: L'Abri, French Institute for Research in Computer Science and Automation

Bio: Pierre-André Wacrenier is an academic researcher from University of Bordeaux. The author has contributed to research in topics: Scheduling (computing) & Thread (computing). The author has an hindex of 5, co-authored 7 publications receiving 1208 citations. Previous affiliations of Pierre-André Wacrenier include L'Abri & French Institute for Research in Computer Science and Automation.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

[...]

Cédric Augonnet¹, Samuel Thibault¹, Raymond Namyst¹, Pierre-André Wacrenier¹•Institutions (1)

University of Bordeaux¹

01 Feb 2011

TL;DR: StarPU as mentioned in this paper is a runtime system that provides a high-level unified execution model for numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware and easily develop and tune powerful scheduling algorithms.

...read moreread less

Abstract: In the field of HPC, the current hardware trend is to design multiprocessor architectures featuring heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE) or data-parallel accelerators (e.g. GPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offload parts of the computations. However, designing an execution model that unifies all computing units and associated embedded memory remains a main challenge. We therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and easily develop and tune powerful scheduling algorithms on the other hand. We have developed several strategies that can be selected seamlessly at run-time, and we have analyzed their efficiency on several algorithms running simultaneously over multiple cores and a GPU. In addition to substantial improvements regarding execution times, we have obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine. We eventually show that our dynamic approach competes with the highly optimized MAGMA library and overcomes the limitations of the corresponding static scheduling in a portable way. Copyright © 2010 John Wiley & Sons, Ltd.

...read moreread less

1,116 citations

Book Chapter•DOI•

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

[...]

Cédric Augonnet¹, Samuel Thibault¹, Raymond Namyst¹, Pierre-André Wacrenier¹•Institutions (1)

University of Bordeaux¹

01 Jan 2009-Lecture Notes in Computer Science

TL;DR: StarPU as discussed by the authors is a runtime system that provides a high-level unified execution model for numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware and easily develop and tune powerful scheduling algorithms.

...read moreread less

82 citations

Posted Content•

Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework

[...]

Samuel Thibault¹, Raymond Namyst¹, Pierre-André Wacrenier¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

14 Jun 2007-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors present a framework that allows scheduling experts to implement and experiment with customized thread schedulers, and provide a powerful API for dynamically distributing bubbles among the machine in a highlevel, portable, and efficient way.

...read moreread less

Abstract: Exploiting full computational power of current more and more hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. Unfortunately, most operating systems only provide a poor scheduling API that does not allow applications to transmit valuable scheduling hints to the system. In a previous paper, we showed that using a bubble-based thread scheduler can significantly improve applications' performance in a portable way. However, since multithreaded applications have various scheduling requirements, there is no universal scheduler that could meet all these needs. In this paper, we present a framework that allows scheduling experts to implement and experiment with customized thread schedulers. It provides a powerful API for dynamically distributing bubbles among the machine in a high-level, portable, and efficient way. Several examples show how experts can then develop, debug and tune their own portable bubble schedulers.

...read moreread less

36 citations

Book Chapter•DOI•

Building portable thread schedulers for hierarchical multiprocessors: the bubblesched framework

[...]

Samuel Thibault¹, Raymond Namyst¹, Pierre-André Wacrenier¹•Institutions (1)

L'Abri¹

28 Aug 2007

TL;DR: This paper presents a framework that allows scheduling experts to implement and experiment with customized thread schedulers and provides a powerful API for dynamically distributing bubbles among the machine in a high-level, portable, and efficient way.

...read moreread less

Abstract: Exploiting full computational power of current more and more hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. Unfortunately, most operating systems only provide a poor scheduling API that does not allow applications to transmit valuable scheduling hints to the system. In a previous paper [1], we showed that using a bubble-based thread scheduler can significantly improve applications' performance in a portable way. However, since multithreaded applications have various scheduling requirements, there is no universal scheduler that could meet all these needs. In this paper, we present a framework that allows scheduling experts to implement and experiment with customized thread schedulers. It provides a powerful API for dynamically distributing bubbles among the machine in a high-level, portable, and efficient way. Several examples show how experts can then develop, debug and tune their own portable bubble schedulers.

...read moreread less

35 citations

Book Chapter•DOI•

An Efficient OpenMP Runtime System for Hierarchical Architectures

[...]

Samuel Thibault¹, François Broquedis¹, Brice Goglin¹, Raymond Namyst¹, Pierre-André Wacrenier¹ - Show less +1 more•Institutions (1)

L'Abri¹

03 Jun 2007

TL;DR: BubbleSched as discussed by the authors uses the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, which can easily transpose affinities of thread teams into scheduling hints using abstractions called bubbles.

...read moreread less

Abstract: Exploiting the full computational power of always deeper hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. The emergence of multi-core chips and NUMA machines makes it important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization steps. By using the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, we are able to easily transpose affinities of thread teams into scheduling hints using abstractions called bubbles. We then propose a scheduling strategy suited to nested OpenMP parallelism. The resulting preliminary performance evaluations show an important improvement of the speedup on a typical NAS OpenMP benchmark application.

...read moreread less

17 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

[...]

Cédric Augonnet¹, Samuel Thibault¹, Raymond Namyst¹, Pierre-André Wacrenier¹•Institutions (1)

University of Bordeaux¹

01 Feb 2011

...read moreread less

1,116 citations

Journal Article•DOI•

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

[...]

H. Carter Edwards¹, Christian Robert Trott¹, Daniel Sunderland¹•Institutions (1)

Sandia National Laboratories¹

01 Dec 2014-Journal of Parallel and Distributed Computing

TL;DR: Kokkos’ abstractions are described, its application programmer interface (API) is summarized, performance results for unit-test kernels and mini-applications are presented, and an incremental strategy for migrating legacy C++ codes to Kokkos is outlined.

...read moreread less

682 citations

Journal Article•DOI•

A Survey of CPU-GPU Heterogeneous Computing Techniques

[...]

Sparsh Mittal¹, Jeffrey S. Vetter¹•Institutions (1)

Oak Ridge National Laboratory¹

21 Jul 2015-ACM Computing Surveys

TL;DR: This article surveys Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency and reviews both discrete and fused CPU-GPU systems.

...read moreread less

Abstract: As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these Processing Units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this article, we survey Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating Heterogeneous Computing Systems (HCSs). We believe that this article will provide insights into the workings and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.

...read moreread less

414 citations

Proceedings Article•DOI•

Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system

[...]

Richard M. Yoo¹, Anthony Romano¹, Christos Kozyrakis¹•Institutions (1)

Stanford University¹

04 Oct 2009

TL;DR: This work optimizes Phoenix, a MapReduce runtime for shared-memory multi-cores and multiprocessors, on a quad-chip, 32-core, 256-thread UltraSPARC T2+ system with NUMA characteristics and shows how a multi-layered approach leads to significant speedup improvements with 256 threads.

...read moreread less

Abstract: Dynamic runtimes can simplify parallel programming by automatically managing concurrency and locality without further burdening the programmer. Nevertheless, implementing such runtime systems for large-scale, shared-memory systems can be challenging. This work optimizes Phoenix, a MapReduce runtime for shared-memory multi-cores and multiprocessors, on a quad-chip, 32-core, 256-thread UltraSPARC T2+ system with NUMA characteristics. We show how a multi-layered approach that comprises optimizations on the algorithm, implementation, and OS interaction leads to significant speedup improvements with 256 threads (average of 2.5× higher speedup, maximum of 19×). We also identify the roadblocks that limit the scalability of parallel runtimes on shared-memory systems, which are inherently tied to the OS scalability on large-scale systems.

...read moreread less

278 citations

Journal Article•DOI•

DAGuE: A generic distributed DAG engine for High Performance Computing

[...]

George Bosilca¹, Aurelien Bouteiller¹, Anthony Danalis¹, Thomas Herault¹, Pierre Lemarinier², Jack Dongarra³ - Show less +2 more•Institutions (3)

University of Tennessee¹, University of Rennes², Oak Ridge National Laboratory³

01 Jan 2012

TL;DR: DAGuE is presented, a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures and uses a dynamic, fully-distributed scheduler based on cache awareness, data-locality and task priority.

...read moreread less

Abstract: The frenetic development of the current architectures places a strain on the current state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. Applications we consider can be expressed as a Direct Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size independent format that can be queried on-demand to discover data dependencies, in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on cache awareness, data-locality and task priority. We demonstrate the efficiency of our approach, using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case.

...read moreread less

251 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse