scispace - formally typeset
Author

Doug Burger

Bio: Doug Burger is an academic researcher from Microsoft. The author has contributed to research in topic(s): Microarchitecture & Cache. The author has an hindex of 67, co-authored 182 publication(s) receiving 26634 citation(s). Previous affiliations of Doug Burger include University of Wisconsin-Madison & University of Michigan.


Papers
More filters
Journal ArticleDOI
TL;DR: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors.
Abstract: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors. The new release offers more tools and capabilities, precompiled binaries, cleaner interfaces, better documentation, easier installation, improved portability, and higher performance. This paper contains a complete description of the tool set, including retrieval and installation instructions, a description of how to use the tools, a description of the target SimpleScalar architecture, and many details about the internals of the tools and how to customize them. With this guide, the tool set can be brought up and generating results in under an hour (on supported platforms).

3,061 citations

Proceedings ArticleDOI
23 Jun 2002
TL;DR: An end-to-end model is described and validated that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs and predicts that the SER per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SERper chip of unprotected memory elements.
Abstract: This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs. The model captures the effects of two important masking phenomena, electrical masking and latching-window masking, which inhibit soft errors in combinational logic. We quantify the SER due to high-energy neutrons in SRAM cells, latches, and logic circuits for feature sizes from 600 nm to 50 nm and clock periods from 16 to 6 fan-out-of-4 inverter delays. Our model predicts that the SER per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SER per chip of unprotected memory elements. Our result emphasizes that computer system designers must address the risks of soft errors in logic circuits for future designs.

1,474 citations

Journal ArticleDOI
TL;DR: A comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity-dark silicon-is timely and crucial.
Abstract: A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity-dark silicon-is timely and crucial.

1,463 citations

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work proposes, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM.
Abstract: Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as DRAM. In contrast, phase change memory (PCM) storage relies on scalable current and thermal mechanisms. To exploit PCM's scalability as a DRAM alternative, PCM must be architected to address relatively long latencies, high energy writes, and finite endurance.We propose, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM. A baseline PCM system is 1.6x slower and requires 2.2x more energy than a DRAM system. Buffer reorganizations reduce this delay and energy gap to 1.2x and 1.0x, using narrow rows to mitigate write energy and multiple rows to improve locality and write coalescing. Partial writes enhance memory endurance, providing 5.6 years of lifetime. Process scaling will further reduce PCM energy costs and improve endurance.

1,459 citations

Proceedings ArticleDOI
04 Jun 2011
TL;DR: The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community.
Abstract: Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.

1,352 citations


Cited by
More filters
Proceedings ArticleDOI
02 Dec 2001
TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.
Abstract: This paper examines a set of commercially representative embedded programs and compares them to an existing benchmark suite, SPEC2000. A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors. Several characteristics distinguish the representative embedded programs from the existing SPEC benchmarks including instruction distribution, memory behavior, and available parallelism. The embedded benchmarks, called MiBench, are freely available to all researchers.

3,418 citations

Journal ArticleDOI
TL;DR: It is shown how this model can be used to draw together a wide range of diverse data from cognitive, social, developmental, personality, clinical, and neuropsychological autobiographical memory research.
Abstract: The authors describe a model of autobiographical memory in which memories are transitory mental constructions within a self-memory system (SMS). The SMS contains an autobiographical knowledge base and current goals of the working self. Within the SMS, control processes modulate access to the knowledge base by successively shaping cues used to activate autobiographical memory knowledge structures and, in this way, form specific memories. The relation of the knowledge base to active goals is reciprocal, and the knowledge base "grounds" the goals of the working self. It is shown how this model can be used to draw together a wide range of diverse data from cognitive, social, developmental, personality, clinical, and neuropsychological autobiographical memory research.

3,133 citations

Journal ArticleDOI
TL;DR: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors.
Abstract: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors. The new release offers more tools and capabilities, precompiled binaries, cleaner interfaces, better documentation, easier installation, improved portability, and higher performance. This paper contains a complete description of the tool set, including retrieval and installation instructions, a description of how to use the tools, a description of the target SimpleScalar architecture, and many details about the internals of the tools and how to customize them. With this guide, the tool set can be brought up and generating results in under an hour (on supported platforms).

3,061 citations

Proceedings ArticleDOI
01 May 2000
TL;DR: Wattch is presented, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level and opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
Abstract: Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level. Wattch is 1000X or more faster than existing layout-level power tools, and yet maintains accuracy within 10% of their estimates as verified using industry tools on leading-edge designs. This paper presents several validations of Wattch's accuracy. In addition, we present three examples that demonstrate how architects or compiler writers might use Wattch to evaluate power consumption in their design process.We see Wattch as a complement to existing lower-level tools; it allows architects to explore and cull the design space early on, using faster, higher-level tools. It also opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

2,828 citations

Proceedings ArticleDOI
12 Dec 2009
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taking into account configuring clusters with 4 cores gives thebest EDA2P and EDAP.
Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.

2,287 citations