Author

John L. Hennessy

Bio: John L. Hennessy is an academic researcher from Stanford University. The author has contributed to research in the topics of Cache & Shared memory. The author has an h-index of 64 and has co-authored 166 publications receiving 28,572 citations. Previous affiliations of John L. Hennessy include University of Wisconsin-Madison & University of California, Berkeley.


Papers
Book
01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

11,671 citations

Book
01 Jan 1993
TL;DR: The third edition of the book has been updated throughout with new pedagogical features, including new information and challenging exercises for the advanced student, and a complete index covering the material in both the printed book and the CD.
Abstract: What's New in the Third Edition, Revised Printing: The same great book gets better! This revised printing features all of the original content along with these additional features:
- Appendix A (Assemblers, Linkers, and the SPIM Simulator) has been moved from the CD-ROM into the printed book
- Corrections and bug fixes

Third Edition features

New pedagogical features:
- Understanding Program Performance: analyzes key performance issues from the programmer's perspective
- Check Yourself Questions: help students assess their understanding of key points of a section
- Computers In the Real World: illustrates the diversity of applications of computing technology beyond traditional desktop and server machines
- For More Practice: provides students with additional problems they can tackle
- In More Depth: presents new information and challenging exercises for the advanced student

New reference features:
- Highlighted glossary terms and definitions appear on the book page, as bold-faced entries in the index, and as a separate and searchable reference on the CD
- A complete index of the material in the book and on the CD appears in the printed index, and the CD includes a fully searchable version of the same index
- Historical Perspectives and Further Readings have been updated and expanded to include the history of software R&D
- CD-Library provides materials collected from the web which directly support the text

In addition to thoroughly updating every aspect of the text to reflect the most current computing technology, the third edition:
- Uses standard 32-bit MIPS 32 as the primary teaching ISA
- Presents the assembler-to-HLL translations in both C and Java
- Highlights the latest developments in architecture in Real Stuff sections: Intel IA-32, PowerPC 604, Google's PC cluster, Pentium P4, the SPEC CPU2000 benchmark suite for processors, the SPEC Web99 benchmark for web servers, the EEMBC benchmark for embedded systems, the AMD Opteron memory hierarchy, and AMD vs. IA-64

New support for distinct course goals: Many of the adopters who have used our book throughout its two editions are refining their courses with a greater hardware or software focus. We have provided new material to support these course goals:

New material to support a Hardware Focus:
- Using logic design conventions
- Designing with hardware description languages
- Advanced pipelining
- Designing with FPGAs
- HDL simulators and tutorials
- Xilinx CAD tools

New material to support a Software Focus:
- How compilers work
- How to optimize compilers
- How to implement object-oriented languages
- MIPS simulator and tutorial
- History sections on programming languages, compilers, operating systems, and databases

On the CD:
- NEW: a search function covering content on both the CD-ROM and the printed text
- CD-Bars: full-length sections that are introduced in the book and presented on the CD
- CD-Appendixes: Appendices B-D
- CD-Library: materials collected from the web which directly support the text
- CD-Exercises: For More Practice provides exercises and solutions for self-study; In More Depth presents new information and challenging exercises for the advanced or curious student
- Glossary: terms that are defined in the text are collected in this searchable reference
- Further Reading: references are organized by the chapter they support
- Software: HDL simulators, MIPS simulators, and FPGA design tools
- Tutorials: SPIM, Verilog, and VHDL
- Additional Support: processor models, labs, homeworks, and an index covering the book and CD contents

Instructor Support, provided on textbooks.elsevier.com:
- Solutions to all the exercises
- Figures from the book in a number of formats
- Lecture slides prepared by the authors and other instructors
- Lecture notes

1,521 citations

Proceedings ArticleDOI
01 May 1990
TL;DR: A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced and is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization.
Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.
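
The acquire/release distinction the paper formalizes lives on in today's language-level memory models. As a loose illustration (not the paper's formalism), the C++11 sketch below marks one store as a release and one load as an acquire; the names payload, ready, producer, and consumer are assumptions made up for the example.

```cpp
// Minimal sketch of acquire/release synchronization, the idea release
// consistency generalizes. Names here are illustrative, not from the paper.
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary shared data, no per-access ordering
std::atomic<bool> ready{false};  // synchronization variable

void producer() {
    payload = 42;  // plain write: hardware may buffer or pipeline it...
    ready.store(true, std::memory_order_release);  // ...but not past this release
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin: once the acquire observes `ready` as true, all writes
        // before the matching release are guaranteed visible
    }
    assert(payload == 42);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```

Between the labeled synchronization points the implementation is free to buffer and reorder the ordinary accesses, which is exactly the latitude release consistency grants relative to sequential consistency.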

1,169 citations

Journal ArticleDOI
TL;DR: The directory architecture for shared memory (Dash) allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance; a distributed directory-based protocol provides cache coherence without compromising scalability.
Abstract: The overall goals and major features of the directory architecture for shared memory (Dash) are presented. The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches. The Dash architecture is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This performance results from distributing the memory among processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance. A distributed directory-based protocol that provides cache coherence without compromising scalability is discussed in detail. The Dash prototype machine and the corresponding software support are described.
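
As a rough sketch of the directory idea (an assumed structure, not the Dash prototype's actual data layout), each memory block's home node can track the block's state plus a bit vector of caching nodes, so a write triggers point-to-point invalidations instead of a broadcast:

```cpp
// Hypothetical per-block directory entry in the spirit of Dash's
// uncached / shared / dirty classification.
#include <bitset>
#include <cstdint>

constexpr int kNodes = 64;  // illustrative number of processing nodes

enum class DirState : std::uint8_t {
    Uncached,  // no cache holds the block
    Shared,    // one or more read-only copies exist
    Dirty      // exactly one node holds a modified copy
};

struct DirectoryEntry {
    DirState state = DirState::Uncached;
    std::bitset<kNodes> sharers;  // bit i set => node i caches the block

    // On a write miss the home node invalidates every current sharer,
    // then records the writer as the sole (dirty) owner.
    void onWriteMiss(int writer) {
        // (sending the actual invalidation messages is elided here)
        sharers.reset();
        sharers.set(writer);
        state = DirState::Dirty;
    }
};
```

Because only the nodes named in `sharers` receive coherence traffic, the protocol avoids the broadcast bottleneck of snooping schemes, which is what lets caching coexist with scalability.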

961 citations


Cited by
Book
01 Dec 2010
TL;DR: This book discusses quantum information theory, public-key cryptography and the RSA cryptosystem, and the proof of Lieb's theorem.
Abstract: Part I. Fundamental Concepts: 1. Introduction and overview; 2. Introduction to quantum mechanics; 3. Introduction to computer science. Part II. Quantum Computation: 4. Quantum circuits; 5. The quantum Fourier transform and its application; 6. Quantum search algorithms; 7. Quantum computers: physical realization. Part III. Quantum Information: 8. Quantum noise and quantum operations; 9. Distance measures for quantum information; 10. Quantum error-correction; 11. Entropy and information; 12. Quantum information theory. Appendices. References. Index.

14,825 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: In this paper, the authors present a database containing ground truth segmentations produced by humans for images of a wide variety of natural scenes, and define an error measure which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.
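
To make the refinement-tolerant flavor of such a measure concrete, here is a hedged sketch in the style of a local consistency error: a pixel costs nothing when its segment in one map is a subset of its segment in the other, and per-pixel disagreements are averaged. The function name and exact form are assumptions, not the database's released code.

```cpp
// Sketch of a refinement-tolerant segmentation consistency measure.
// seg1/seg2 hold per-pixel segment labels for the same image.
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

double localConsistencyError(const std::vector<int>& seg1,
                             const std::vector<int>& seg2) {
    const std::size_t n = seg1.size();
    if (n == 0) return 0.0;
    std::map<int, double> size1, size2;             // segment sizes
    std::map<std::pair<int, int>, double> overlap;  // pairwise overlaps
    for (std::size_t p = 0; p < n; ++p) {
        ++size1[seg1[p]];
        ++size2[seg2[p]];
        ++overlap[{seg1[p], seg2[p]}];
    }
    double total = 0.0;
    for (std::size_t p = 0; p < n; ++p) {
        const double both = overlap[{seg1[p], seg2[p]}];
        // Fraction of p's segment in one map that the other map's segment
        // fails to cover; zero when one segment refines the other.
        const double e12 = (size1[seg1[p]] - both) / size1[seg1[p]];
        const double e21 = (size2[seg2[p]] - both) / size2[seg2[p]];
        total += std::min(e12, e21);  // forgive refinement in either direction
    }
    return total / static_cast<double>(n);
}
```

A measure shaped like this returns zero for any pair of mutually refining segmentations, which is how differing granularities can both count as "consistent".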

6,505 citations

Proceedings ArticleDOI
01 May 1995
TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.
Abstract: The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well. The properties we study include the computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.
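
The working-set characterization can be pictured with a simple experiment (a sketch of the general methodology, not SPLASH-2's actual tooling): replay a block-address trace through fully associative LRU caches of growing capacity and look for the knees where the miss rate flattens.

```cpp
// Sketch: miss rate of a fully associative LRU cache over an address trace.
// Sweeping the capacity exposes working-set knees in the miss-rate curve.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <list>
#include <unordered_map>
#include <vector>

double lruMissRate(const std::vector<std::uint64_t>& blocks, std::size_t capacity) {
    std::list<std::uint64_t> lru;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> where;
    std::size_t misses = 0;
    for (std::uint64_t b : blocks) {
        auto hit = where.find(b);
        if (hit != where.end()) {
            lru.splice(lru.begin(), lru, hit->second);  // hit: move to front
        } else {
            ++misses;
            if (lru.size() == capacity) {  // full: evict least recently used
                where.erase(lru.back());
                lru.pop_back();
            }
            lru.push_front(b);
            where[b] = lru.begin();
        }
    }
    return blocks.empty() ? 0.0 : static_cast<double>(misses) / blocks.size();
}

int main() {
    // Hypothetical block-address trace from an instrumented run.
    std::vector<std::uint64_t> trace = {1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5};
    for (std::size_t cap : {2, 4, 8, 16})
        std::printf("%zu blocks -> miss rate %.2f\n", cap, lruMissRate(trace, cap));
}
```

Where adding capacity stops reducing misses (here around four blocks), a working set has been captured; the paper's point is that evaluations should choose cache and problem sizes with such knees in mind.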

4,002 citations

Proceedings ArticleDOI
02 Dec 2001
TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.
Abstract: This paper examines a set of commercially representative embedded programs and compares them to an existing benchmark suite, SPEC2000. A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors. Several characteristics distinguish the representative embedded programs from the existing SPEC benchmarks including instruction distribution, memory behavior, and available parallelism. The embedded benchmarks, called MiBench, are freely available to all researchers.
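
A toy version of the instruction-distribution part of that characterization might look like the sketch below; the trace format and category names are assumptions, not SimpleScalar's actual output.

```cpp
// Sketch: dynamic instruction mix from a hypothetical category-per-
// instruction trace. Shifts in these shares are one way embedded codes
// like MiBench diverge from SPEC workloads.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

int main() {
    // Hypothetical per-instruction categories emitted by a simulator run.
    std::vector<std::string> trace = {"load", "alu", "alu", "store",
                                      "branch", "load", "alu", "mul"};
    std::map<std::string, std::size_t> mix;
    for (const auto& kind : trace) ++mix[kind];

    for (const auto& [kind, count] : mix)
        std::printf("%-6s %5.1f%%\n", kind.c_str(),
                    100.0 * static_cast<double>(count) / trace.size());
}
```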

3,548 citations

Proceedings ArticleDOI
25 Oct 2008
TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.
Abstract: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.
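
One synchronization pattern of the kind earlier HPC-centric suites rarely exercised is pipeline parallelism, with stages coupled by bounded queues. The sketch below is a generic illustration of that pattern (an assumption for illustration, not PARSEC code):

```cpp
// Sketch: two pipeline stages coupled by a bounded queue, a common
// producer/consumer synchronization style in multithreaded programs.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

    void push(T v) {  // blocks while the queue is full
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return q_.size() < cap_; });
        q_.push(std::move(v));
        cv_.notify_all();
    }

    std::optional<T> pop() {  // empty optional => upstream stage finished
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop();
        cv_.notify_all();
        return v;
    }

    void close() {  // signal that no more items will arrive
        std::lock_guard<std::mutex> lk(m_);
        closed_ = true;
        cv_.notify_all();
    }

private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t cap_;
    bool closed_ = false;
};

int main() {
    BoundedQueue<int> stage(4);  // small capacity keeps the stages coupled
    std::thread producer([&] {
        for (int i = 0; i < 8; ++i) stage.push(i * i);
        stage.close();
    });
    std::thread consumer([&] {
        while (auto v = stage.pop()) std::printf("got %d\n", *v);
    });
    producer.join();
    consumer.join();
}
```

The small queue capacity is what couples the stages: a fast producer stalls rather than flooding memory, which is also why such workloads stress synchronization and off-chip traffic differently than barrier-style HPC codes.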

3,514 citations