Author

Rudolf Eigenmann

Bio: Rudolf Eigenmann is an academic researcher at the University of Delaware. His research topics include compilers and automatic parallelization. He has an h-index of 40 and has co-authored 183 publications receiving 5,914 citations. His previous affiliations include Lawrence Livermore National Laboratory and Purdue University.


Papers
Proceedings ArticleDOI
14 Feb 2009
TL;DR: This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications and identifies several key transformation techniques that enable efficient GPU global memory access and achieve high performance.
Abstract: GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial).
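
As a minimal, hand-written sketch of the kind of mapping the paper describes (not the translator's actual output; the kernel name, variable names, and launch parameters are illustrative assumptions): each iteration of an OpenMP work-sharing loop becomes one CUDA thread, and indexing is arranged so that consecutive threads touch consecutive global-memory addresses, the coalesced access pattern the paper's optimizations target.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Input form: a typical OpenMP work-sharing loop the translator starts from.
//   #pragma omp parallel for
//   for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];

// Sketch of the kind of kernel such a translator might emit: one OpenMP loop
// iteration per CUDA thread, with consecutive threads accessing consecutive
// addresses (coalesced global-memory access).
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one iteration per thread
    if (i < n)                                      // guard the last partial block
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;       // cover all n iterations
    saxpy_kernel<<<blocks, threads>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                    // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```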

476 citations

Journal ArticleDOI
TL;DR: Discusses parallel programming with Polaris, an experimental translator of conventional Fortran programs that targets machines such as the Cray T3D; such compilers would liberate programmers from the complexities of explicit, machine-oriented parallel programming.
Abstract: Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine-oriented parallel programming. The paper discusses parallel programming with Polaris, an experimental translator of conventional Fortran programs that targets machines such as the Cray T3D.
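
Polaris operates on Fortran source, so the fragment below is only an illustrative C++/OpenMP sketch (not Polaris output) of one transformation such translators perform: a sequential summation loop carries a dependence on its accumulator, but a parallelizing compiler can recognize the idiom as a reduction and emit a safe parallel form.

```cuda
#include <cstdio>
#include <omp.h>

int main() {
    const int n = 1000000;
    double sum = 0.0;

    // Sequential original:
    //   for (int i = 0; i < n; ++i) sum += 1.0 / (i + 1);
    // Every iteration reads and writes `sum`, but the updates are associative,
    // so the loop can be recognized as a reduction and parallelized.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += 1.0 / (i + 1);

    printf("harmonic(%d) = %f\n", n, sum);  // roughly 14.39 for n = 10^6
    return 0;
}
```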

350 citations

Journal ArticleDOI
01 Feb 1993
TL;DR: An overview of automatic program parallelization techniques is presented, which covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do-loop transformations, and parallelization of recursive routines.
Abstract: An overview of automatic program parallelization techniques is presented. It covers dependence analysis techniques, followed by a discussion of program transformations, including straight-line code parallelization, do-loop transformations, and parallelization of recursive routines. Several experimental studies on the effectiveness of parallelizing compilers are surveyed.
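
A tiny sketch of what dependence analysis distinguishes (illustrative only; real compilers use far more general tests): the first loop below has no loop-carried dependence and its iterations can run in parallel, while the second carries a flow dependence from one iteration to the next and must stay serial or be handled as a scan/reduction.

```cuda
#include <cstdio>

int main() {
    const int n = 8;
    double a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    double b[8], s[9] = {0};

    // Independent: iteration i reads a[i] and writes only b[i], so all
    // iterations may execute concurrently.
    for (int i = 0; i < n; ++i)
        b[i] = 2.0 * a[i];

    // Loop-carried flow dependence: iteration i reads s[i], which iteration
    // i-1 wrote. A parallelizing compiler must keep this loop serial or
    // recognize it as a prefix sum and use a specialized parallel form.
    for (int i = 0; i < n; ++i)
        s[i + 1] = s[i] + a[i];

    printf("b[3] = %.1f, prefix sum = %.1f\n", b[3], s[n]);  // 2.0, 8.0
    return 0;
}
```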

313 citations

Proceedings ArticleDOI
13 Nov 2010
TL;DR: This paper proposes OpenMPC, a programming interface that builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level control over the involved parameters and optimizations, together with a fully automatic compilation and user-assisted tuning system that supports it.
Abstract: General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for high performance computing. The CUDA (Compute Unified Device Architecture) programming model provides improved programmability for general computing on GPGPUs. However, its unique execution model and memory model still pose significant challenges for developers of efficient GPGPU code. This paper proposes a new programming interface, called OpenMPC, which builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level controls of the involved parameters and optimizations. We have developed a fully automatic compilation and user-assisted tuning system supporting OpenMPC. In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants. Our results demonstrate that OpenMPC offers both programmability and tunability. Our system achieves 88% of the performance of the hand-coded CUDA programs.
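
The sketch below illustrates just one dimension of the tuning space such a system navigates: the same CUDA kernel is timed at several thread-block sizes and the fastest configuration is kept. This is a hand-written illustration, not OpenMPC's tuner, which explores many more parameters and whole compilation variants; the kernel and its workload are placeholder assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(int n, float a, float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 22;
    float *x;
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int best_block = 0;
    float best_ms = 1e30f;
    // Sweep one tuning parameter (thread-block size); no warm-up or
    // averaging here, so the timings are purely illustrative.
    for (int block = 64; block <= 1024; block *= 2) {
        int grid = (n + block - 1) / block;
        cudaEventRecord(start);
        scale<<<grid, block>>>(n, 1.5f, x);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block=%4d  %.3f ms\n", block, ms);
        if (ms < best_ms) { best_ms = ms; best_block = block; }
    }
    printf("best block size: %d\n", best_block);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    return 0;
}
```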

250 citations

Book ChapterDOI
30 Jul 2001
TL;DR: Presents an overview of SPEComp, a new benchmark suite for parallel computers that targets mid-size parallel servers and includes a number of science/engineering and data-processing applications.
Abstract: We present a new benchmark suite for parallel computers. SPEComp targets mid-size parallel servers. It includes a number of science/engineering and data processing applications. Parallelism is expressed in the OpenMP API. The suite includes two data sets, Medium and Large, of approximately 1.6 and 4 GB in size. Our overview also describes the organization developing SPEComp, issues in creating OpenMP parallel benchmarks, the benchmarking methodology underlying SPEComp, and basic performance characteristics.
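
A plain C++/OpenMP sketch (not SPEComp code) of the kind of measurement an OpenMP benchmark suite is built around: the same work-sharing loop is timed at increasing thread counts to observe parallel scaling. The array size and loop body are arbitrary placeholders.

```cuda
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const int n = 1 << 24;
    std::vector<double> a(n);

    for (int threads = 1; threads <= 8; threads *= 2) {
        omp_set_num_threads(threads);     // request this many OpenMP threads
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            a[i] = 0.5 * i + 1.0;         // trivial stand-in for benchmark work
        double t1 = omp_get_wtime();
        printf("%d thread(s): %.4f s\n", threads, t1 - t0);
    }
    return 0;
}
```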

239 citations


Cited by
Journal ArticleDOI
12 Jun 2005
TL;DR: The goals of Pin are to provide easy-to-use, portable, transparent, and efficient instrumentation; to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Abstract: Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application's original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin's versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.
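
As a concrete illustration of the Pintool style the abstract describes, here is a sketch modeled on the well-known instruction-counting example from Pin's documentation (plain C++ host code): an instrumentation routine registers an analysis callback before every instruction, and a fini routine reports the total when the application exits. Treat it as a sketch; exact API details can vary across Pin versions.

```cuda
#include <iostream>
#include "pin.H"   // Pin's instrumentation API

static UINT64 icount = 0;

// Analysis routine: executed before every instrumented instruction.
static VOID docount() { icount++; }

// Instrumentation routine: called when Pin first encounters an instruction;
// inserts a call to the analysis routine ahead of it.
static VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

// Called when the instrumented application exits.
static VOID Fini(INT32 code, VOID *v) {
    std::cerr << "executed instructions: " << icount << std::endl;
}

int main(int argc, char *argv[]) {
    if (PIN_Init(argc, argv)) return 1;        // parse Pin/tool options
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();                        // never returns
    return 0;
}
```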

4,019 citations

Book
15 Aug 1998
TL;DR: This book explains the forces behind the convergence of shared-memory, message-passing, data-parallel, and data-driven computing architectures, and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on an understanding of hardware-software interactions.
Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architecture across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions. The book:
* synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case-studies
* illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs
Table of Contents: 1 Introduction; 2 Parallel Programs; 3 Programming for Performance; 4 Workload-Driven Evaluation; 5 Shared Memory Multiprocessors; 6 Snoop-based Multiprocessor Design; 7 Scalable Multiprocessors; 8 Directory-based Cache Coherence; 9 Hardware-Software Tradeoffs; 10 Interconnection Network Design; 11 Latency Tolerance; 12 Future Directions; Appendix A: Parallel Benchmark Suites

1,571 citations

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work proposes area-neutral architectural enhancements, crafted from a fundamental understanding of PCM technology parameters, that address PCM's long latencies, high-energy writes, and finite endurance and make PCM competitive with DRAM.
Abstract: Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as DRAM. In contrast, phase change memory (PCM) storage relies on scalable current and thermal mechanisms. To exploit PCM's scalability as a DRAM alternative, PCM must be architected to address relatively long latencies, high energy writes, and finite endurance. We propose, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM. A baseline PCM system is 1.6x slower and requires 2.2x more energy than a DRAM system. Buffer reorganizations reduce this delay and energy gap to 1.2x and 1.0x, using narrow rows to mitigate write energy and multiple rows to improve locality and write coalescing. Partial writes enhance memory endurance, providing 5.6 years of lifetime. Process scaling will further reduce PCM energy costs and improve endurance.

1,568 citations

Journal ArticleDOI
TL;DR: The authors argue that public managers should look inside the "black box" of collaboration processes, where they will find a complex construct of five variable dimensions: governance, administration, organizational autonomy, mutuality, and norms.
Abstract: Social science research contains a wealth of knowledge for people seeking to understand collaboration processes. The authors argue that public managers should look inside the “black box” of collaboration processes. Inside, they will find a complex construct of five variable dimensions: governance, administration, organizational autonomy, mutuality, and norms. Public managers must know these five dimensions and manage them intentionally in order to collaborate effectively.

1,115 citations