Author

Jennifer M. Anderson

Bio: Jennifer M. Anderson is an academic researcher from VMware. The author has contributed to research in topics: Compiler & Virtual machine. The author has an h-index of 26 and has co-authored 29 publications receiving 3,909 citations. Previous affiliations of Jennifer M. Anderson include Stanford University & University of California, Irvine.

Papers
Journal ArticleDOI
TL;DR: In this paper, the authors describe automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs.
Abstract: This article describes automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs. Parallelizing compilers for multiprocessors face many hurdles. However, SUIF's robust analysis and memory optimization techniques enabled speedups on three fourths of the NAS and SPECfp95 benchmark programs.
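
For illustration only (the kernel, array names, and sizes below are assumptions, not taken from the paper), this is the kind of array-based loop nest such a parallelizer targets; the OpenMP pragma stands in for the parallel code an automatic parallelizer would generate.

```c
/* Hypothetical example of an array-based numerical kernel. The outer loop
   carries no dependences, so its iterations can run on different processors;
   the pragma represents compiler-inserted parallelism, not hand annotation. */
#define N 1024

void scale_rows(double a[N][N], const double s[N]) {
    #pragma omp parallel for        /* illustrative: what the compiler derives automatically */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = s[i] * a[i][j];
}
```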

592 citations

Journal ArticleDOI
TL;DR: The SUIF compiler has been built into a powerful, flexible system that may be useful to many other researchers; the authors invite readers to use it and welcome contributions to the infrastructure.
Abstract: Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building the SUIF compiler into a powerful, flexible system, and we would now like to share the results of our efforts. SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. The kernel defines the intermediate representation, provides functions to access and manipulate the intermediate representation, and structures the interface between compiler passes. The toolkit currently includes C and Fortran front ends, a loop-level parallelism and locality optimizer, an optimizing MIPS back end, a set of compiler development tools, and support for instructional use. Although we do not expect SUIF to be suitable for everyone, we think it may be useful for many other researchers. We thus invite you to use SUIF and welcome your contributions to this infrastructure. Directions for obtaining the SUIF software are included at the end of this paper.
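
The kernel-plus-passes structure can be pictured with a generic sketch; the types and pass names below are hypothetical and are not SUIF's actual API.

```c
/* Hypothetical sketch of a kernel-plus-passes compiler organization (not
   SUIF's real interface): the kernel owns the intermediate representation,
   and each pass is a function that reads and rewrites an IR unit in place. */
#include <stdio.h>

typedef struct IRUnit {
    const char *name;               /* stands in for the real intermediate representation */
} IRUnit;

typedef void (*Pass)(IRUnit *unit);

static void parallelize(IRUnit *u)       { printf("parallelizing %s\n", u->name); }
static void optimize_locality(IRUnit *u) { printf("tiling loops in %s\n", u->name); }
static void lower_to_backend(IRUnit *u)  { printf("generating code for %s\n", u->name); }

int main(void) {
    IRUnit unit = { "example.f" };
    Pass pipeline[] = { parallelize, optimize_locality, lower_to_backend };
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++)
        pipeline[i](&unit);          /* passes run over the shared IR in sequence */
    return 0;
}
```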

587 citations

Journal ArticleDOI
01 Oct 1997
TL;DR: The Digital Continuous Profiling Infrastructure is a sampling-based profiling system designed to run continuously on production systems; it supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel.
Abstract: This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec per 333-MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
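
A toy sketch of the statistical-sampling idea follows; it is not the DCPI implementation (which drives sampling from hardware performance counters and attributes samples to individual instructions system-wide), only an illustration of how periodic interrupts yield a low-overhead time breakdown. The phase names and rates are made up.

```c
/* Minimal sampling sketch: a periodic CPU-time timer delivers SIGPROF, and
   the handler charges the sample to whatever phase the program says it is in.
   Relative sample counts then approximate where CPU time is being spent. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

enum { PHASE_SETUP, PHASE_COMPUTE, PHASE_OUTPUT, NPHASES };
static volatile sig_atomic_t current_phase = PHASE_SETUP;
static volatile long samples[NPHASES];

static void on_sample(int sig) {
    (void)sig;
    samples[current_phase]++;        /* attribute the sample to the running phase */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigaction(SIGPROF, &sa, NULL);

    struct itimerval it = { { 0, 1000 }, { 0, 1000 } };  /* ~1000 samples/sec of CPU time */
    setitimer(ITIMER_PROF, &it, NULL);

    current_phase = PHASE_COMPUTE;
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; i++)
        x += (double)i * 0.5;

    current_phase = PHASE_OUTPUT;
    printf("result %.1f\n", x);
    for (int p = 0; p < NPHASES; p++)
        printf("phase %d: %ld samples\n", p, samples[p]);
    return 0;
}
```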

545 citations

Proceedings ArticleDOI
01 Jun 1993
TL;DR: A compiler algorithm that automatically finds computation and data decompositions optimizing both parallelism and locality, designed for use with both distributed and shared address space machines.
Abstract: Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures. This paper describes a compiler algorithm that automatically finds computation and data decompositions that optimize both parallelism and locality. This algorithm is designed for use with both distributed and shared address space machines. The scope of our algorithm is dense matrix computations where the array accesses are affine functions of the loop indices. Our algorithm can handle programs with general nestings of parallel and sequential loops. We present a mathematical framework that enables us to systematically derive the decompositions. Our algorithm can exploit parallelism both in fully parallelizable loops and in loops that require explicit synchronization. The algorithm will trade off extra degrees of parallelism to eliminate communication. If communication is needed, the algorithm will try to introduce the least expensive forms of communication into those parts of the program that are least frequently executed.
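
In outline, and with notation chosen here for illustration rather than quoted from the paper: the computation decomposition maps each loop iteration to a processor and the data decomposition maps each array element to a processor, both as affine functions, and communication is avoided when every reference's data lands on the processor that executes the iteration.

```latex
% Sketch of the affine decomposition constraint (notation is illustrative).
% Iterations \vec{i} and array elements \vec{a} are assigned to processors by
% affine decompositions:
\[
  C(\vec{i}) = C_0\,\vec{i} + \vec{c},
  \qquad
  D(\vec{a}) = D_0\,\vec{a} + \vec{d}.
\]
% For an array reference with affine access function F\vec{i} + \vec{f},
% communication-free execution requires
\[
  D\bigl(F\vec{i} + \vec{f}\bigr) = C(\vec{i})
  \quad \text{for every iteration } \vec{i};
\]
% when no useful solution exists, parallelism is traded off or the cheapest
% communication is placed in the least frequently executed parts of the program.
```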

388 citations

Proceedings ArticleDOI
01 Aug 1995
TL;DR: This work has developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance, and shows that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
Abstract: Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance. Our optimization algorithm consists of two steps. The first step chooses the parallelization and computation assignment such that synchronization and data sharing are minimized. The second step then restructures the layout of the data in the shared address space with an algorithm that is based on a new data transformation framework. We ran our compiler on a set of application programs and measured their performance on the Stanford DASH multiprocessor. Our results show that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
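
As an illustrative sketch only (the array names, sizes, and column-block partition are assumptions, not the paper's experiments): when the computation assignment gives each processor a block of columns of a row-major array, restructuring the layout so that each processor's block is stored contiguously removes the long strides and the false sharing they cause.

```c
/* Illustrative data-layout transformation: from a row-major array partitioned
   by columns (strided per-processor data) to a layout where each processor's
   block of columns is contiguous in the shared address space. */
#define N 1024
#define P 4                          /* assumed processor count */

/* Original layout: processor p owns columns [p*N/P, (p+1)*N/P), but its
   elements are scattered with stride N in memory. */
double a_rowmajor[N][N];

/* Transformed layout: one dense block per processor. */
double a_blocked[P][N][N / P];

/* Access element (i, j) under the transformed layout. */
static inline double *elem(int i, int j) {
    int p  = j / (N / P);            /* owning processor */
    int jj = j % (N / P);            /* column offset within that processor's block */
    return &a_blocked[p][i][jj];
}
```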

248 citations


Cited by
Proceedings ArticleDOI
22 Apr 2001
TL;DR: A series of experiments is described that obtained detailed measurements of the energy consumption of an IEEE 802.11 wireless network interface operating in an ad hoc networking environment, and some implications for protocol design and evaluation in ad hoc networks are discussed.
Abstract: Energy-aware design and evaluation of network protocols requires knowledge of the energy consumption behavior of actual wireless interfaces. But little practical information is available about the energy consumption behavior of well-known wireless network interfaces and device specifications do not provide information in a form that is helpful to protocol developers. This paper describes a series of experiments which obtained detailed measurements of the energy consumption of an IEEE 802.11 wireless network interface operating in an ad hoc networking environment. The data is presented as a collection of linear equations for calculating the energy consumed in sending, receiving and discarding broadcast and point-to-point data packets of various sizes. Some implications for protocol design and evaluation in ad hoc networks are discussed.
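
The linear-equation form can be summarized abstractly; the symbols below are placeholders, and the measured coefficients are reported only in the paper.

```latex
% Generic shape of the per-packet energy model: cost linear in packet size,
% with separate coefficients measured for each operation and traffic type
% (broadcast vs. point-to-point). m and b here are symbolic, not measured values.
\[
  E_{\text{op}}(\text{size}) = m_{\text{op}} \times \text{size} + b_{\text{op}},
  \qquad
  \text{op} \in \{\text{send}, \text{receive}, \text{discard}\}.
\]
```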

1,810 citations

Book
15 Aug 1998
TL;DR: This book explains the forces behind the convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architectures across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

* synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case studies
* illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs

Table of Contents: 1 Introduction; 2 Parallel Programs; 3 Programming for Performance; 4 Workload-Driven Evaluation; 5 Shared Memory Multiprocessors; 6 Snoop-based Multiprocessor Design; 7 Scalable Multiprocessors; 8 Directory-based Cache Coherence; 9 Hardware-Software Tradeoffs; 10 Interconnection Network Design; 11 Latency Tolerance; 12 Future Directions; Appendix A: Parallel Benchmark Suites

1,571 citations

Proceedings ArticleDOI
01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
Abstract: This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.
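
For reference, the elementary transforms named in the abstract correspond, in the standard formulation for a two-deep loop nest, to unimodular matrices acting on the iteration vector; this is a sketch of that formulation, not material quoted from the paper.

```latex
% Standard unimodular view for a two-deep nest with iteration vector (i, j):
% each transform is a matrix T with |det T| = 1, applied as
% (i', j')^T = T (i, j)^T.
\[
  \text{interchange: } \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
  \qquad
  \text{reversal (inner loop): } \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
  \qquad
  \text{skewing (inner by outer): } \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.
\]
% Tiling is not unimodular; it is applied to the transformed nest afterwards
% to block the iteration space for cache reuse.
```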

1,352 citations

Journal ArticleDOI
TL;DR: The authors argue that public managers should look inside the "black box" of collaboration processes and find a complex construct of five variable dimensions: governance, administration, organizational autonomy, mutuality, and norms.
Abstract: Social science research contains a wealth of knowledge for people seeking to understand collaboration processes. The authors argue that public managers should look inside the “black box” of collaboration processes. Inside, they will find a complex construct of five variable dimensions: governance, administration, organizational autonomy, mutuality, and norms. Public managers must know these five dimensions and manage them intentionally in order to collaborate effectively.

1,115 citations

Book ChapterDOI
08 Apr 2002
TL;DR: The structure of CIL is described, with a focus on how it disambiguates those features of C that were found to be most confusing for program analysis and transformation, along with a whole-program merger that allows a complete project to be viewed as a single compilation unit.
Abstract: This paper describes the C Intermediate Language: a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs. Compared to C, CIL has fewer constructs. It breaks down certain complicated constructs of C into simpler ones, and thus it works at a lower level than abstract-syntax trees. But CIL is also more high-level than typical intermediate languages (e.g., three-address code) designed for compilation. As a result, what we have is a representation that makes it easy to analyze and manipulate C programs, and emit them in a form that resembles the original source. Moreover, it comes with a front-end that translates to CIL not only ANSI C programs but also those using Microsoft C or GNU C extensions. We describe the structure of CIL with a focus on how it disambiguates those features of C that we found to be most confusing for program analysis and transformation. We also describe a whole-program merger based on structural type equality, allowing a complete project to be viewed as a single compilation unit. As a representative application of CIL, we show a transformation aimed at making code immune to stack-smashing attacks. We are currently using CIL as part of a system that analyzes and instruments C programs with run-time checks to ensure type safety. CIL has served us very well in this project, and we believe it can usefully be applied in other situations as well.
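
To give a flavor of this kind of lowering (the example is hand-written here, not output of the actual tool): a short-circuit expression with an embedded side effect is broken into a temporary and explicit control flow, so every expression in the result is side-effect-free and simple to analyze.

```c
/* Before: short-circuit evaluation and a side effect mixed in one expression. */
int before(int *p, int n) {
    return p && *p > n--;
}

/* After a CIL-style simplification (illustrative): a temporary and explicit
   branches make evaluation order and the conditional side effect explicit. */
int after(int *p, int n) {
    int tmp;
    if (p) {
        tmp = (*p > n);
        n = n - 1;                /* n-- happens only when the right operand is evaluated */
    } else {
        tmp = 0;
    }
    return tmp;
}
```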

1,065 citations