Author

John Shalf

Bio: John Shalf is an academic researcher from Lawrence Berkeley National Laboratory. The author has contributed to research in topics: Supercomputer & Cache. The author has an h-index of 55 and has co-authored 236 publications receiving 13,215 citations. Previous affiliations of John Shalf include National Center for Supercomputing Applications & University of Illinois at Urbana–Champaign.


Papers
18 Dec 2006
TL;DR: The parallel landscape is framed with seven questions, and the following are recommended: the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems, and the target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Abstract: Author(s): Asanovic, K; Bodik, R; Catanzaro, B; Gebis, J; Husbands, P; Keutzer, K; Patterson, D; Plishker, W; Shalf, J; Williams, SW | Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met for nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work for 2- or 8-processor systems, but is likely to face diminishing returns as 16- and 32-processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
• The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
• The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
• Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
• “Autotuners” should play a larger role than conventional compilers in translating parallel programs.
• To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
• To be successful, programming models should be independent of the number of processors.
• To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
• Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
• Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
• To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.
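To make the report's autotuning recommendation concrete, here is a minimal sketch (an illustration, not code from the report): it times two hand-written variants of the same kernel and keeps the faster one. The kernel, the variant set, and the timing harness are all assumptions chosen for brevity.

    /* Minimal autotuning sketch (illustrative only): time two hand-written
       variants of the same kernel and keep the faster one, in the spirit of
       the report's "autotuner" recommendation. */
    #include <stdio.h>
    #include <time.h>

    #define N 1000000
    #define REPS 100
    static double a[N], b[N];

    /* Two hypothetical variants of the same vector update. */
    static void kernel_baseline(void) {
        for (int i = 0; i < N; i++) a[i] += 2.0 * b[i];
    }
    static void kernel_unrolled(void) {
        for (int i = 0; i < N; i += 4) {          /* N assumed divisible by 4 */
            a[i]     += 2.0 * b[i];
            a[i + 1] += 2.0 * b[i + 1];
            a[i + 2] += 2.0 * b[i + 2];
            a[i + 3] += 2.0 * b[i + 3];
        }
    }

    int main(void) {
        void (*variants[])(void) = { kernel_baseline, kernel_unrolled };
        const char *names[] = { "baseline", "unrolled" };
        int best = 0;
        double best_time = 1e30;

        for (int v = 0; v < 2; v++) {
            clock_t t0 = clock();
            for (int r = 0; r < REPS; r++) variants[v]();
            double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
            printf("%-8s %.4f s\n", names[v], t);
            if (t < best_time) { best_time = t; best = v; }
        }
        printf("autotuner selects: %s\n", names[best]);
        return 0;
    }

Production autotuners search far larger spaces (blockings, unrollings, prefetch distances) and typically cache the winning configuration per machine, but the select-by-measurement idea is the same.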

2,262 citations

Journal ArticleDOI
01 Feb 2011
TL;DR: The work of the community to prepare for the challenges of exascale computing is described, ultimately combining their efforts in a coordinated International Exascale Software Project.
Abstract: Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and graphics processing units. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.

736 citations

Proceedings ArticleDOI
30 Nov 2010
TL;DR: This work represents the most comprehensive evaluation to date comparing conventional HPC platforms to Amazon EC2, using real applications representative of the workload at a typical supercomputing center; results indicate that EC2 is six times slower than a typical mid-range Linux cluster and twenty times slower than a modern HPC system.
Abstract: Cloud computing has seen tremendous growth, particularly for commercial web applications. The on-demand, pay-as-you-go model creates a flexible and cost-effective means to access compute resources. For these reasons, the scientific computing community has shown increasing interest in exploring cloud computing. However, the underlying implementation and performance of clouds are very different from those at traditional supercomputing centers. It is therefore critical to evaluate the performance of HPC applications in today’s cloud environments to understand the tradeoffs inherent in migrating to the cloud. This work represents the most comprehensive evaluation to date comparing conventional HPC platforms to Amazon EC2, using real applications representative of the workload at a typical supercomputing center. Overall results indicate that EC2 is six times slower than a typical mid-range Linux cluster, and twenty times slower than a modern HPC system. The interconnect on the EC2 cloud platform severely limits performance and causes significant variability.

593 citations

Proceedings ArticleDOI
15 Nov 2008
TL;DR: This work explores multicore stencil (nearest-neighbor) computations --- a class of algorithms at the heart of many structured grid codes, including PDE solvers --- develops a number of effective optimization strategies, and builds an auto-tuning environment that searches over these strategies to minimize runtime while maximizing performance portability.
Abstract: Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations --- a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies, and build an auto-tuning environment that searches over our optimizations and their parameters to minimize runtime, while maximizing performance portability. To evaluate the effectiveness of these strategies we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, IBM QS22 PowerXCell 8i, and NVIDIA GTX280. Overall, our auto-tuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural tradeoffs of emerging multicore designs and their implications on scientific algorithm development.
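For readers unfamiliar with the kernel class, a stencil computation updates each grid point from a fixed pattern of nearest neighbors. The 7-point 3D Jacobi sweep below is a minimal, untuned illustration; the grid dimensions and weights are assumptions, not the paper's auto-tuned code.

    /* Minimal 7-point 3D stencil sweep (Jacobi style): each interior point is
       updated from itself and its six nearest neighbors. Grid size and
       weights are illustrative only. */
    #define NX 64
    #define NY 64
    #define NZ 64
    #define IDX(i, j, k) ((i) * NY * NZ + (j) * NZ + (k))

    void stencil_sweep(const double *in, double *out) {
        const double c0 = 0.5, c1 = 1.0 / 12.0;   /* example weights */
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                for (int k = 1; k < NZ - 1; k++)
                    out[IDX(i, j, k)] =
                        c0 * in[IDX(i, j, k)] +
                        c1 * (in[IDX(i - 1, j, k)] + in[IDX(i + 1, j, k)] +
                              in[IDX(i, j - 1, k)] + in[IDX(i, j + 1, k)] +
                              in[IDX(i, j, k - 1)] + in[IDX(i, j, k + 1)]);
    }

Because each point touches only a handful of neighbors while streaming through a large grid, kernels like this are memory-bandwidth bound, which is why blocking, loop reordering, and the other auto-tuned transformations studied in the paper matter so much.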

544 citations

Journal ArticleDOI
01 Mar 2009
TL;DR: This work examines sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs, and presents several optimization strategies especially effective for the multicore environment.
Abstract: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
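As a baseline for what SpMV does, the routine below computes y = A*x with A stored in compressed sparse row (CSR) format. This is the generic textbook kernel such studies typically start from, not the paper's optimized implementations, and the struct layout is an assumption made for the sketch.

    /* Baseline sparse matrix-vector multiply, y = A*x, with A in compressed
       sparse row (CSR) storage. Illustrative baseline only. */
    typedef struct {
        int nrows;            /* number of rows                               */
        const int *rowptr;    /* rowptr[i]..rowptr[i+1]-1 index row i's nonzeros */
        const int *colind;    /* column index of each stored nonzero          */
        const double *val;    /* value of each stored nonzero                 */
    } csr_matrix;

    void spmv_csr(const csr_matrix *A, const double *x, double *y) {
        for (int i = 0; i < A->nrows; i++) {
            double sum = 0.0;
            for (int jj = A->rowptr[i]; jj < A->rowptr[i + 1]; jj++)
                sum += A->val[jj] * x[A->colind[jj]];
            y[i] = sum;
        }
    }

Each matrix entry is read once and reused never, so the kernel is memory-bound; the multicore optimizations the paper evaluates (blocking, index compression, thread and NUMA-aware partitioning, prefetching) all aim at reducing or better scheduling that memory traffic.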

513 citations


Cited by
Journal ArticleDOI
01 Aug 2001
TL;DR: The authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing.
Abstract: "Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high performance orientation. In this article, the authors define this new field. First, they review the "Grid problem," which is defined as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources--what is referred to as virtual organizations. In such settings, unique authentication, authorization, resource access, resource discovery, and other challenges are encountered. It is this class of problem that is addressed by Grid technologies. Next, the authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. The authors describe requirements that they believe any such mechanisms must satisfy and discuss the importance of defining a compact set of intergrid protocols to enable interoperability among different Grid systems. Finally, the authors discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. They maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.

6,716 citations

Journal ArticleDOI
Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, Lanyu Xu
TL;DR: The definition of edge computing is introduced, followed by several case studies, ranging from cloud offloading to smart home and city, as well as collaborative edge, that materialize the concept of edge computing.
Abstract: The proliferation of Internet of Things (IoT) and the success of rich cloud services have pushed the horizon of a new computing paradigm, edge computing, which calls for processing the data at the edge of the network. Edge computing has the potential to address the concerns of response time requirement, battery life constraint, bandwidth cost saving, as well as data safety and privacy. In this paper, we introduce the definition of edge computing, followed by several case studies, ranging from cloud offloading to smart home and city, as well as collaborative edge to materialize the concept of edge computing. Finally, we present several challenges and opportunities in the field of edge computing, and hope this paper will gain attention from the community and inspire more research in this direction.

5,198 citations

Posted Content
TL;DR: This article reviews the "Grid problem," and presents an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing.
Abstract: "Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources-what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.

3,595 citations

Journal ArticleDOI
01 Jun 1997
TL;DR: The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network; the long-term goal is an adaptive wide area resource environment (AWARE), an integrated set of higher-level services that enable applications to adapt to heterogeneous and dynamically changing metacomputing environments.
Abstract: The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network. A low-level toolkit provides basic mechanisms such as communication, authentication, network information, and data access. These mechanisms are used to construct various higher level metacomputing services, such as parallel programming tools and schedulers. The long-term goal is to build an adaptive wide area resource environment (AWARE), an integrated set of higher level services that enable applications to adapt to heterogeneous and dynamically changing metacomputing environments. Preliminary versions of Globus components were deployed successfully as part of the I-WAY networking experiment.

3,450 citations

Proceedings ArticleDOI
04 Oct 2009
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques, and power consumption, and it has led to some important architectural insights, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques, and power consumption, and it has led to some important architectural insights, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,697 citations