Author

Jennifer M. Anderson

Bio: Jennifer M. Anderson is an academic researcher from VMware. The author has contributed to research in topics: Compiler & Virtual machine. The author has an h-index of 26 and has co-authored 29 publications receiving 3,909 citations. Previous affiliations of Jennifer M. Anderson include Stanford University & University of California, Irvine.

Papers
Journal ArticleDOI
TL;DR: In this paper, the authors describe automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs.
Abstract: This article describes automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs. Parallelizing compilers for multiprocessors face many hurdles. However, SUIF's robust analysis and memory optimization techniques enabled speedups on three fourths of the NAS and SPECfp95 benchmark programs.
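
For illustration only (the kernel, array names, and sizes below are assumptions, not taken from the paper), this is the kind of array-based loop nest such a parallelizer targets; the OpenMP pragma stands in for the parallel code an automatic parallelizer would generate.

```c
/* Hypothetical example of an array-based numerical kernel. The outer loop
   carries no dependences, so its iterations can run on different processors;
   the pragma represents compiler-inserted parallelism, not hand annotation. */
#define N 1024

void scale_rows(double a[N][N], const double s[N]) {
    #pragma omp parallel for        /* illustrative: what the compiler derives automatically */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = s[i] * a[i][j];
}
```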

592 citations

Journal ArticleDOI
TL;DR: The SUIF compiler has been built into a powerful, flexible system that may be useful to many other researchers; the authors invite readers to use it and welcome contributions to the infrastructure.
Abstract: Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building the SUIF compiler into a powerful, flexible system, and we would now like to share the results of our efforts. SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. The kernel defines the intermediate representation, provides functions to access and manipulate the intermediate representation, and structures the interface between compiler passes. The toolkit currently includes C and Fortran front ends, a loop-level parallelism and locality optimizer, an optimizing MIPS back end, a set of compiler development tools, and support for instructional use. Although we do not expect SUIF to be suitable for everyone, we think it may be useful for many other researchers. We thus invite you to use SUIF and welcome your contributions to this infrastructure. Directions for obtaining the SUIF software are included at the end of this paper.
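
The kernel-plus-passes structure can be pictured with a generic sketch; the types and pass names below are hypothetical and are not SUIF's actual API.

```c
/* Hypothetical sketch of a kernel-plus-passes compiler organization (not
   SUIF's real interface): the kernel owns the intermediate representation,
   and each pass is a function that reads and rewrites an IR unit in place. */
#include <stdio.h>

typedef struct IRUnit {
    const char *name;               /* stands in for the real intermediate representation */
} IRUnit;

typedef void (*Pass)(IRUnit *unit);

static void parallelize(IRUnit *u)       { printf("parallelizing %s\n", u->name); }
static void optimize_locality(IRUnit *u) { printf("tiling loops in %s\n", u->name); }
static void lower_to_backend(IRUnit *u)  { printf("generating code for %s\n", u->name); }

int main(void) {
    IRUnit unit = { "example.f" };
    Pass pipeline[] = { parallelize, optimize_locality, lower_to_backend };
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++)
        pipeline[i](&unit);          /* passes run over the shared IR in sequence */
    return 0;
}
```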

587 citations

Journal ArticleDOI
01 Oct 1997
TL;DR: The Digital Continuous Profiling Infrastructure is a sampling-based profiling system designed to run continuously on production systems; it supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel.
Abstract: This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec per 333-MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
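
A toy sketch of the statistical-sampling idea follows; it is not the DCPI implementation (which drives sampling from hardware performance counters and attributes samples to individual instructions system-wide), only an illustration of how periodic interrupts yield a low-overhead time breakdown. The phase names and rates are made up.

```c
/* Minimal sampling sketch: a periodic CPU-time timer delivers SIGPROF, and
   the handler charges the sample to whatever phase the program says it is in.
   Relative sample counts then approximate where CPU time is being spent. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

enum { PHASE_SETUP, PHASE_COMPUTE, PHASE_OUTPUT, NPHASES };
static volatile sig_atomic_t current_phase = PHASE_SETUP;
static volatile long samples[NPHASES];

static void on_sample(int sig) {
    (void)sig;
    samples[current_phase]++;        /* attribute the sample to the running phase */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigaction(SIGPROF, &sa, NULL);

    struct itimerval it = { { 0, 1000 }, { 0, 1000 } };  /* ~1000 samples/sec of CPU time */
    setitimer(ITIMER_PROF, &it, NULL);

    current_phase = PHASE_COMPUTE;
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; i++)
        x += (double)i * 0.5;

    current_phase = PHASE_OUTPUT;
    printf("result %.1f\n", x);
    for (int p = 0; p < NPHASES; p++)
        printf("phase %d: %ld samples\n", p, samples[p]);
    return 0;
}
```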

545 citations

Proceedings ArticleDOI
01 Jun 1993
TL;DR: A compiler algorithm that automatically finds computation and data decompositions optimizing both parallelism and locality, designed for use with both distributed and shared address space machines.
Abstract: Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures. This paper describes a compiler algorithm that automatically finds computation and data decompositions that optimize both parallelism and locality. This algorithm is designed for use with both distributed and shared address space machines. The scope of our algorithm is dense matrix computations where the array accesses are affine functions of the loop indices. Our algorithm can handle programs with general nestings of parallel and sequential loops. We present a mathematical framework that enables us to systematically derive the decompositions. Our algorithm can exploit parallelism both in fully parallelizable loops and in loops that require explicit synchronization. The algorithm will trade off extra degrees of parallelism to eliminate communication. If communication is needed, the algorithm will try to introduce the least expensive forms of communication into those parts of the program that are least frequently executed.
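
In outline, and with notation chosen here for illustration rather than quoted from the paper: the computation decomposition maps each loop iteration to a processor and the data decomposition maps each array element to a processor, both as affine functions, and communication is avoided when every reference's data lands on the processor that executes the iteration.

```latex
% Sketch of the affine decomposition constraint (notation is illustrative).
% Iterations \vec{i} and array elements \vec{a} are assigned to processors by
% affine decompositions:
\[
  C(\vec{i}) = C_0\,\vec{i} + \vec{c},
  \qquad
  D(\vec{a}) = D_0\,\vec{a} + \vec{d}.
\]
% For an array reference with affine access function F\vec{i} + \vec{f},
% communication-free execution requires
\[
  D\bigl(F\vec{i} + \vec{f}\bigr) = C(\vec{i})
  \quad \text{for every iteration } \vec{i};
\]
% when no useful solution exists, parallelism is traded off or the cheapest
% communication is placed in the least frequently executed parts of the program.
```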

388 citations

Proceedings ArticleDOI
01 Aug 1995
TL;DR: This work has developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance, and shows that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
Abstract: Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance. Our optimization algorithm consists of two steps. The first step chooses the parallelization and computation assignment such that synchronization and data sharing are minimized. The second step then restructures the layout of the data in the shared address space with an algorithm that is based on a new data transformation framework. We ran our compiler on a set of application programs and measured their performance on the Stanford DASH multiprocessor. Our results show that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
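
As an illustrative sketch only (the array names, sizes, and column-block partition are assumptions, not the paper's experiments): when the computation assignment gives each processor a block of columns of a row-major array, restructuring the layout so that each processor's block is stored contiguously removes the long strides and the false sharing they cause.

```c
/* Illustrative data-layout transformation: from a row-major array partitioned
   by columns (strided per-processor data) to a layout where each processor's
   block of columns is contiguous in the shared address space. */
#define N 1024
#define P 4                          /* assumed processor count */

/* Original layout: processor p owns columns [p*N/P, (p+1)*N/P), but its
   elements are scattered with stride N in memory. */
double a_rowmajor[N][N];

/* Transformed layout: one dense block per processor. */
double a_blocked[P][N][N / P];

/* Access element (i, j) under the transformed layout. */
static inline double *elem(int i, int j) {
    int p  = j / (N / P);            /* owning processor */
    int jj = j % (N / P);            /* column offset within that processor's block */
    return &a_blocked[p][i][jj];
}
```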

248 citations


Cited by
Proceedings ArticleDOI
22 Apr 2001
TL;DR: A series of experiments is described that obtained detailed measurements of the energy consumption of an IEEE 802.11 wireless network interface operating in an ad hoc networking environment, and some implications for protocol design and evaluation in ad hoc networks are discussed.
Abstract: Energy-aware design and evaluation of network protocols requires knowledge of the energy consumption behavior of actual wireless interfaces. But little practical information is available about the energy consumption behavior of well-known wireless network interfaces and device specifications do not provide information in a form that is helpful to protocol developers. This paper describes a series of experiments which obtained detailed measurements of the energy consumption of an IEEE 802.11 wireless network interface operating in an ad hoc networking environment. The data is presented as a collection of linear equations for calculating the energy consumed in sending, receiving and discarding broadcast and point-to-point data packets of various sizes. Some implications for protocol design and evaluation in ad hoc networks are discussed.
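
The linear-equation form can be summarized abstractly; the symbols below are placeholders, and the measured coefficients are reported only in the paper.

```latex
% Generic shape of the per-packet energy model: cost linear in packet size,
% with separate coefficients measured for each operation and traffic type
% (broadcast vs. point-to-point). m and b here are symbolic, not measured values.
\[
  E_{\text{op}}(\text{size}) = m_{\text{op}} \times \text{size} + b_{\text{op}},
  \qquad
  \text{op} \in \{\text{send}, \text{receive}, \text{discard}\}.
\]
```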

1,810 citations

Book
15 Aug 1998
TL;DR: This book explains the forces behind the convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architectures across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

* synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case studies
* illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs

Table of Contents: 1 Introduction; 2 Parallel Programs; 3 Programming for Performance; 4 Workload-Driven Evaluation; 5 Shared Memory Multiprocessors; 6 Snoop-based Multiprocessor Design; 7 Scalable Multiprocessors; 8 Directory-based Cache Coherence; 9 Hardware-Software Tradeoffs; 10 Interconnection Network Design; 11 Latency Tolerance; 12 Future Directions; Appendix A: Parallel Benchmark Suites

1,571 citations

Proceedings ArticleDOI
01 May 1991
TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.
Abstract: This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.
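
For reference, the elementary transforms named in the abstract correspond, in the standard formulation for a two-deep loop nest, to unimodular matrices acting on the iteration vector; this is a sketch of that formulation, not material quoted from the paper.

```latex
% Standard unimodular view for a two-deep nest with iteration vector (i, j):
% each transform is a matrix T with |det T| = 1, applied as
% (i', j')^T = T (i, j)^T.
\[
  \text{interchange: } \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
  \qquad
  \text{reversal (inner loop): } \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
  \qquad
  \text{skewing (inner by outer): } \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.
\]
% Tiling is not unimodular; it is applied to the transformed nest afterwards
% to block the iteration space for cache reuse.
```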

1,352 citations

Journal ArticleDOI
TL;DR: The authors argue that public managers should look inside the "black box" of collaboration processes and find a complex construct of five variable dimensions: governance, administration, organizational autonomy, mutuality, and norms.
Abstract: Social science research contains a wealth of knowledge for people seeking to understand collaboration processes. The authors argue that public managers should look inside the “black box” of collaboration processes. Inside, they will find a complex construct of five variable dimensions: governance, administration, organizational autonomy, mutuality, and norms. Public managers must know these five dimensions and manage them intentionally in order to collaborate effectively.

1,115 citations

Book ChapterDOI
08 Apr 2002
TL;DR: The structure of CIL is described, with a focus on how it disambiguates those features of C that were found to be most confusing for program analysis and transformation, along with a whole-program merger that allows a complete project to be viewed as a single compilation unit.
Abstract: This paper describes the C Intermediate Language: a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs. Compared to C, CIL has fewer constructs. It breaks down certain complicated constructs of C into simpler ones, and thus it works at a lower level than abstract-syntax trees. But CIL is also more high-level than typical intermediate languages (e.g., three-address code) designed for compilation. As a result, what we have is a representation that makes it easy to analyze and manipulate C programs, and emit them in a form that resembles the original source. Moreover, it comes with a front-end that translates to CIL not only ANSI C programs but also those using Microsoft C or GNU C extensions. We describe the structure of CIL with a focus on how it disambiguates those features of C that we found to be most confusing for program analysis and transformation. We also describe a whole-program merger based on structural type equality, allowing a complete project to be viewed as a single compilation unit. As a representative application of CIL, we show a transformation aimed at making code immune to stack-smashing attacks. We are currently using CIL as part of a system that analyzes and instruments C programs with run-time checks to ensure type safety. CIL has served us very well in this project, and we believe it can usefully be applied in other situations as well.
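
To give a flavor of this kind of lowering (the example is hand-written here, not output of the actual tool): a short-circuit expression with an embedded side effect is broken into a temporary and explicit control flow, so every expression in the result is side-effect-free and simple to analyze.

```c
/* Before: short-circuit evaluation and a side effect mixed in one expression. */
int before(int *p, int n) {
    return p && *p > n--;
}

/* After a CIL-style simplification (illustrative): a temporary and explicit
   branches make evaluation order and the conditional side effect explicit. */
int after(int *p, int n) {
    int tmp;
    if (p) {
        tmp = (*p > n);
        n = n - 1;                /* n-- happens only when the right operand is evaluated */
    } else {
        tmp = 0;
    }
    return tmp;
}
```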

1,065 citations