
Showing papers in "IEEE Micro in 2006"


Journal ArticleDOI
TL;DR: The M5 simulator provides features necessary for simulating networked hosts, including full-system capability, a detailed I/O subsystem, and the ability to simulate multiple networked systems deterministically.
Abstract: The M5 simulator was developed specifically to enable research in TCP/IP networking. It provides the features necessary for simulating networked hosts, including full-system capability, a detailed I/O subsystem, and the ability to simulate multiple networked systems deterministically. M5's usefulness as a general-purpose architecture simulator and its liberal open-source license have led to its adoption by several academic and commercial groups.

839 citations


Journal ArticleDOI
Cary Gunn1
TL;DR: Luxtera has demonstrated the technology required to implement CMOS photonics, including everything needed for 10-Gbps operation and for scaling to 100 Gbps and 1 Tbps, and product development is underway.
Abstract: Luxtera has demonstrated the technology required to implement CMOS photonics, and product development is underway. It has also demonstrated all the technology required for 10-Gbps operation, in addition to that required to scale to 100 Gbps and 1 Tbps. A single 10-Gbps channel today integrates tens of optical components into a single die alongside circuitry of modest gate count, roughly 100,000 gates per transceiver. For the first time, high-speed optical communications directly between silicon die are possible at a price-performance point competitive with traditional electrical interconnects.

493 citations


Journal ArticleDOI
TL;DR: The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and represents a reaffirmation of the RISC principles of combining leading-edge architecture and compiler optimizations.
Abstract: Eight synergistic processor units enable the Cell Broadband Engine's breakthrough performance. The SPU architecture implements a novel, pervasively data-parallel architecture combining scalar and SIMD processing on a wide data path. A large number of SPUs per chip provide high thread-level parallelism. The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and represents a reaffirmation of the RISC principles of combining leading-edge architecture and compiler optimizations. These design decisions have enabled the Cell BE to deliver unprecedented supercomputer-class compute power for consumer applications.

463 citations


Journal ArticleDOI
TL;DR: The authors analyze the Cell processor's communication network, using a series of benchmarks involving various DMA traffic patterns and synchronization protocols, to illuminate this important aspect of multicore processor design.
Abstract: Multicore designs promise various power-performance and area-performance benefits. But inadequate design of the on-chip communication network can deprive applications of these benefits. To illuminate this important point in multicore processor design, the authors analyze the Cell processor's communication network, using a series of benchmarks involving various DMA traffic patterns and synchronization protocols.

391 citations


Journal ArticleDOI
TL;DR: Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabling thousand-way simulation parallelism.
Abstract: Timing-accurate full-system multiprocessor simulations can take years because of architecture and application complexity. Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabling thousand-way simulation parallelism

339 citations
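To make the scale of that claim concrete, here is a small back-of-the-envelope sketch in C. All the numbers in it (workload length, simulator speeds, sample count and size) are illustrative assumptions, not figures from the article; they are chosen only to show how detailed simulation of a tiny fraction of a workload, plus fast functional warming of the rest, yields a speedup on the order of ten thousand, and why independent samples also parallelize trivially.

/*
 * Back-of-the-envelope sketch of why statistical sampling pays off.
 * The workload size, sample counts, and simulator speeds below are
 * illustrative assumptions, not figures from the article.
 */
#include <stdio.h>

int main(void) {
    double total_insns = 1e12;   /* assumed full workload length            */
    double detail_rate = 1e5;    /* assumed detailed-sim speed (insns/sec)  */
    double func_rate   = 1e9;    /* assumed functional-warming speed        */
    double samples     = 10000;  /* number of measurement samples           */
    double sample_len  = 1000;   /* detailed instructions per sample        */

    double full_detail_s = total_insns / detail_rate;
    double sampled_s     = (total_insns / func_rate)             /* fast-forward */
                         + (samples * sample_len) / detail_rate; /* measure      */

    printf("full detailed simulation : %.1f days\n", full_detail_s / 86400.0);
    printf("sampled simulation       : %.1f hours\n", sampled_s / 3600.0);
    printf("speedup                  : %.0fx\n", full_detail_s / sampled_s);
    /* Each sample is statistically independent, so the detailed portion
       can also be farmed out to thousands of hosts, which is the
       "thousand-way simulation parallelism" the abstract refers to. */
    return 0;
}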


Journal ArticleDOI
TL;DR: A hardware implementation of unbounded transactional memory, called UTM, is described, which exploits the common case for performance without sacrificing correctness on transactions whose footprint can be nearly as large as virtual memory.
Abstract: This article advances the following thesis: transactional memory should be virtualized to support transactions of arbitrary footprint and duration. Such support should be provided through hardware and be made visible to software through the machine's instruction set architecture. We call a transactional memory system unbounded if the system can handle transactions of arbitrary duration that have footprints nearly as big as the system's virtual memory. The primary goal of unbounded transactional memory is to make concurrent programming easier without incurring much implementation overhead. Unbounded transactional-memory architectures can achieve high performance in the common case of small transactions, without sacrificing correctness in large transactions.

295 citations
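The sketch below illustrates the programming model such a system aims to support: a critical region whose read/write footprint can be arbitrarily large is simply bracketed by transaction boundaries. The begin_transaction/end_transaction names are hypothetical placeholders for ISA-level support, and they are emulated here with a pthread mutex purely so the example compiles and runs on ordinary hardware.

/*
 * Hedged sketch of the programming model an unbounded transactional
 * memory aims to support.  begin_transaction()/end_transaction() are
 * hypothetical placeholders for ISA-level transaction boundaries; here
 * they are emulated with a global mutex so the sketch runs anywhere.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t stand_in = PTHREAD_MUTEX_INITIALIZER;

static void begin_transaction(void) { pthread_mutex_lock(&stand_in); }
static void end_transaction(void)   { pthread_mutex_unlock(&stand_in); }

static long account[2] = { 1000, 1000 };

static void *transfer(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        begin_transaction();
        /* In an unbounded design, the read/write footprint of this
           region could grow nearly as large as virtual memory and still
           commit atomically; no lock-granularity decisions are needed. */
        account[0] -= 1;
        account[1] += 1;
        end_transaction();
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, transfer, NULL);
    pthread_create(&t2, NULL, transfer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance: %ld (expected 2000)\n", account[0] + account[1]);
    return 0;
}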


Journal ArticleDOI
TL;DR: Richard Mateosian reviews old and new books, including Weinberg on Writing--The Fieldstone Method, The Art of Computer Programming, From Java to Ruby--Things Every Manager Should Know, and Introduction to DITA--A User Guide to the Darwin Information Typing Architecture.
Abstract: Richard Mateosian reviews old and new books, including Weinberg on Writing--The Fieldstone Method, The Art of Computer Programming, From Java to Ruby--Things Every Manager Should Know, and Introduction to DITA--A User Guide to the Darwin Information Typing Architecture.

228 citations


Journal ArticleDOI
TL;DR: This article challenges the commonly held view that IPC accurately reflects performance, at least for multithreaded workloads running on multiprocessors, and concludes that work-related metrics, such as time per transaction, are the most accurate and reliable way to estimate multiprocessor workload performance.
Abstract: Many architectural simulation studies use instructions per cycle (IPC) to analyze performance. In this article, we challenge the commonly held view that IPC accurately reflects performance, at least for multithreaded workloads running on multiprocessors. Work-related metrics, such as time per transaction, are the most accurate and reliable way to estimate multiprocessor workload performance.

150 citations
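A tiny worked example makes the argument concrete. The instruction, cycle, and transaction counts below are invented for illustration; they model a contended run in which threads burn extra instructions spin-waiting, so IPC rises even as useful work per unit time falls.

/*
 * Illustrative arithmetic only: the counts below are made up to show
 * how IPC can move in the opposite direction from real work on a
 * multithreaded workload.
 */
#include <stdio.h>

int main(void) {
    /* Run A: little lock contention, few spin-loop instructions. */
    double insns_a = 8.0e9,  cycles_a = 1.0e10, txns_a = 1.0e6;
    /* Run B: heavy contention; threads burn instructions spinning,
       so the instruction count rises but fewer transactions finish. */
    double insns_b = 1.2e10, cycles_b = 1.0e10, txns_b = 0.8e6;

    printf("Run A: IPC = %.2f, cycles/txn = %.0f\n",
           insns_a / cycles_a, cycles_a / txns_a);
    printf("Run B: IPC = %.2f, cycles/txn = %.0f\n",
           insns_b / cycles_b, cycles_b / txns_b);
    /* Run B has the higher IPC (1.20 vs 0.80) yet does less useful work
       per unit time (12500 vs 10000 cycles per transaction), which is
       why work-related metrics are the safer yardstick. */
    return 0;
}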


Journal ArticleDOI
TL;DR: Digitally assisted analog circuits can exploit digital circuits' high density and low energy per computation to enable a new generation of interface electronics based on minimal-precision, low-complexity analog building blocks.
Abstract: Today's interfaces between digital and "real world" analog signals rely mainly on complex analog circuit components that strictly limit achievable power efficiency and throughput. Digitally assisted analog circuits can exploit digital circuits' high density and low energy per computation to enable a new generation of interface electronics based on minimal-precision, low-complexity analog building blocks

145 citations


Journal ArticleDOI
TL;DR: Leakage current in the nanometer regime has become a significant portion of power dissipation in CMOS circuits as threshold voltage, channel length, and gate oxide thickness scale downward.
Abstract: Leakage current in the nanometer regime has become a significant portion of power dissipation in CMOS circuits as threshold voltage, channel length, and gate oxide thickness scale downward. Various techniques are available to reduce leakage power in high-performance systems

137 citations


Journal ArticleDOI
TL;DR: Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches.
Abstract: Parameter variations, which are increasing along with advances in process technologies, affect both timing and power. Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches

Journal ArticleDOI
TL;DR: Connecting a built-in current sensor to the bulk of a digital design increases sensitivity for detecting transient upsets in combinational and sequential logic.
Abstract: Connecting a built-in current sensor to the bulk of a digital design increases sensitivity for detecting transient upsets in combinational and sequential logic. SPICE simulations validate this approach and show only minor penalties in terms of area, performance, and power consumption.

Journal ArticleDOI
TL;DR: The SeaStar was designed specifically to support Sandia National Laboratories' ASC Red Storm, a distributed-memory parallel computing platform containing more than 11,000 network end-points and presented designers with several challenging goals that were commensurate with a high-performance network for a system of that scale.
Abstract: The SeaStar, a new ASIC from Cray, is a full system-on-chip design that integrates high-speed serial links, a 3D router, traditional network interface functionality, and an embedded processor in a single chip. Cray Inc. designed the SeaStar specifically to support Sandia National Laboratories' ASC Red Storm, a distributed-memory parallel computing platform containing more than 11,000 network end-points. SeaStar presented designers with several challenging goals that were commensurate with a high-performance network for a system of that scale. The primary challenge was to provide a well-balanced, highly scalable, highly reliable network. From the Red Storm perspective, a balanced network is one that maximizes network performance relative to the computational power of the network end-points. A main challenge for SeaStar was to maximize the bytes-to-flops ratio of network bandwidth; that is, to maximize the amount of network bandwidth relative to each node's floating-point capability.

Journal ArticleDOI
TL;DR: The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads that implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions.
Abstract: This article covers the Xbox 360's high-level technical requirements, a short system overview, and details of the CPU and the GPU. The Xbox 360 contains an aggressive hardware architecture and implementation targeted at game console workloads. The core silicon implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions. The core chips include the standard conceptual blocks of CPU, graphics processing unit (GPU), memory, and I/O. Each of these components and their interconnections are customized to provide a user-friendly game console product. The authors describe their architectural trade-offs and summarize the system's software programming support

Journal ArticleDOI
TL;DR: For runahead execution to be efficiently implemented in current or future high-performance processors, which will be energy-constrained, processor designers must develop techniques to reduce the extra instructions it executes.
Abstract: Today's high-performance processors face main-memory latencies on the order of hundreds of processor clock cycles. As a result, even the most aggressive processors spend a significant portion of their execution time stalling and waiting for main-memory accesses to return data to the execution core. Runahead execution is a promising way to tolerate long main-memory latencies because it has modest hardware cost and doesn't significantly increase processor complexity. Runahead execution improves a processor's performance by speculatively pre-executing the application program while the processor services a long-latency (L2) data cache miss, instead of stalling the processor for the duration of the L2 miss. This pre-execution, however, significantly increases the number of instructions the processor executes. For runahead execution to be efficiently implemented in current or future high-performance processors, which will be energy-constrained, processor designers must develop techniques to reduce these extra instructions. Our solution to this problem includes both hardware and software mechanisms that are simple, implementable, and effective.
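The following toy timing model (not the authors' simulator) contrasts a core that stalls on every L2 miss with one that keeps pre-executing the instruction stream during the miss purely to prefetch. The cache geometry, miss latency, and address stream are assumptions, and the stream is deliberately prefetch-friendly, so the gap it shows is exaggerated relative to real workloads; the point is only to show where runahead's benefit, and its extra executed instructions, come from.

/*
 * Toy timing sketch: a single in-order core with a small direct-mapped
 * data cache, comparing "stall on every miss" against a simplified
 * runahead mode that pre-executes ahead of a miss purely to prefetch.
 * Latencies, cache size, and the address stream are all assumptions.
 */
#include <stdio.h>

#define N_INSNS     4000
#define CACHE_LINES 64
#define LINE_SHIFT  6            /* 64-byte lines */
#define MISS_CYCLES 200

static long stream[N_INSNS];     /* byte address per load, -1 = non-memory */

static void build_stream(void) {
    for (int i = 0; i < N_INSNS; i++)               /* one load every 4 insns,  */
        stream[i] = (i % 4 == 0) ? (long)(i / 4) * 64 : -1; /* new line each time */
}

static int lookup(long tags[], long addr, int install) {
    long line = addr >> LINE_SHIFT;
    int  set  = (int)(line % CACHE_LINES);
    if (tags[set] == line) return 1;                /* hit  */
    if (install) tags[set] = line;                  /* fill */
    return 0;
}

static long run(int runahead) {
    long tags[CACHE_LINES], cycles = 0;
    for (int i = 0; i < CACHE_LINES; i++) tags[i] = -1;
    for (int i = 0; i < N_INSNS; i++) {
        cycles++;                                   /* 1 cycle per instruction */
        if (stream[i] < 0) continue;
        if (lookup(tags, stream[i], 1)) continue;   /* cache hit               */
        cycles += MISS_CYCLES;                      /* blocking miss           */
        if (runahead) {
            /* Pre-execute up to MISS_CYCLES instructions ahead; any loads
               found there are turned into prefetches whose lines are
               installed; the pre-executed results themselves are discarded. */
            for (int j = i + 1; j < N_INSNS && j <= i + MISS_CYCLES; j++)
                if (stream[j] >= 0) lookup(tags, stream[j], 1);
        }
    }
    return cycles;
}

int main(void) {
    build_stream();
    long stall = run(0), ra = run(1);
    printf("stall-on-miss: %ld cycles\nrunahead     : %ld cycles (%.1fx faster)\n",
           stall, ra, (double)stall / (double)ra);
    return 0;
}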

Journal ArticleDOI
TL;DR: Two CGCT implementations are presented, RegionScout and Region Coherence Arrays, and simulation results for a broadcast-based multiprocessor system running commercial, scientific, and multiprogrammed workloads are provided.
Abstract: Cache-coherent shared-memory multiprocessors have wide-ranging applications, from commercial transaction processing and database services to large-scale scientific computing. Coarse-grain coherence tracking (CGCT) is a new technique that extends a conventional coherence mechanism and optimizes coherence enforcement. It monitors the coherence status of large regions of memory and uses that information to avoid unnecessary broadcasts and filter unnecessary cache tag lookups, thus improving system performance and power consumption. This article presents two CGCT implementations, RegionScout and Region Coherence Arrays, and provides simulation results for a broadcast-based multiprocessor system running commercial, scientific, and multiprogrammed workloads
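The sketch below captures the core idea in a few dozen lines of C: a per-node table records, for large memory regions rather than individual lines, whether any other node may be caching data from that region, and misses to regions known to be private skip the snoop broadcast. The region size, table organization, and traffic pattern are illustrative assumptions, not the RegionScout or Region Coherence Array designs.

/*
 * Minimal sketch of coarse-grain coherence tracking: a table records,
 * per large memory region, whether any other node may hold lines from
 * that region, so misses to "private" regions skip the snoop broadcast.
 * Region size, table layout, and the traffic below are assumptions.
 */
#include <stdio.h>
#include <stdbool.h>

#define REGION_SHIFT 10                 /* assumed 1-KB regions         */
#define N_REGIONS    1024               /* toy physical memory: 1 MB    */

static bool shared_region[N_REGIONS];   /* true = others may cache it   */
static long broadcasts, filtered;

/* Called on a local cache miss. */
static void miss(long addr) {
    long r = (addr >> REGION_SHIFT) % N_REGIONS;
    if (shared_region[r])
        broadcasts++;                   /* must snoop the other nodes   */
    else
        filtered++;                     /* region known private: go
                                           straight to memory           */
}

/* Called when a snoop reveals another node touching one of our regions. */
static void remote_touch(long addr) {
    shared_region[(addr >> REGION_SHIFT) % N_REGIONS] = true;
}

int main(void) {
    /* Pretend another node shares only the first 64 regions. */
    for (long a = 0; a < 64 * (1L << REGION_SHIFT); a += 1L << REGION_SHIFT)
        remote_touch(a);

    /* Local misses spread over the whole toy address space. */
    for (long a = 0; a < (long)N_REGIONS << REGION_SHIFT; a += 64)
        miss(a);

    printf("broadcasts sent    : %ld\n", broadcasts);
    printf("broadcasts avoided : %ld\n", filtered);
    return 0;
}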

Journal ArticleDOI
TL;DR: The Corning-IBM joint optical shared memory supercomputer interconnect system (Osmosis) project explores the opportunity to advance the role of optical-switching technologies in high-performance computing systems.
Abstract: A crucial part of any high-performance computing (HPC) system is its interconnection network. Corning and IBM are jointly developing a demonstration interconnect based on optical cell switching with electronic control. The Corning-IBM joint optical shared memory supercomputer interconnect system (Osmosis) project explores the opportunity to advance the role of optical-switching technologies in such systems. Key innovations in the scheduler architecture directly address the main HPC requirements: low latency, high throughput, efficient multicast support, and high reliability

Journal ArticleDOI
TL;DR: The authors target better coverage while incurring minimal performance degradation by opportunistically using redundancy in future commodity microprocessors.
Abstract: CMOS scaling continues to enable faster transistors and lower supply voltage, improving microprocessor performance and reducing per-transistor power. The downside of scaling is increased susceptibility to soft errors due to strikes by cosmic particles and radiation from packaging materials. The result is degraded reliability in future commodity microprocessors. The authors target better coverage while incurring minimal performance degradation by opportunistically using redundancy

Journal ArticleDOI
TL;DR: Among proposed strategies for congestion management, only the regional explicit congestion notification (RECN) mechanism achieves both the required efficiency and the scalability that emerging systems demand.
Abstract: Compared to the overdimensioned designs of the past, current interconnection networks operate closer to the point of saturation and run a higher risk of congestion. Among proposed strategies for congestion management, only the regional explicit congestion notification (RECN) mechanism achieves both the required efficiency and the scalability that emerging systems demand

Journal ArticleDOI
TL;DR: Through careful codesign and optimization of an architecture with a new string matching algorithm, the authors show it is possible to build a system that is almost 12 times more efficient than the currently best known approaches.
Abstract: String matching is a critical element of modern intrusion detection systems because it lets a system make decisions based not just on headers, but actual content flowing through the network. Through careful codesign and optimization of an architecture with a new string matching algorithm, the authors show it is possible to build a system that is almost 12 times more efficient than the currently best known approaches

Journal ArticleDOI
TL;DR: A dynamic-compiler-driven runtime voltage and frequency optimizer is proposed for microprocessors that achieves energy savings of up to 70 percent and can be implemented and deployed in a real system.
Abstract: A general dynamic-compilation environment offers power and performance control opportunities for microprocessors. The authors propose a dynamic-compiler-driven runtime voltage and frequency optimizer. A prototype of their design, implemented and deployed in a real system, achieves energy savings of up to 70 percent

Journal ArticleDOI
TL;DR: Sirius, a thermal modeling and simulation framework, combines with ThermalHerd, a distributed runtime scheme for thermal management, to offer a path to thermally efficient on-chip network design.
Abstract: On-chip networks are becoming increasingly popular as a way to connect high-performance single-chip computer systems, but thermal issues greatly limit network design. Sirius, a thermal modeling and simulation framework, combines with ThermalHerd, a distributed runtime scheme for thermal management, to offer a path to thermally efficient on-chip network design.

Journal ArticleDOI
TL;DR: Swich is introduced, an FPGA-based prototype of a new cache-level scheme that keeps two live checkpoints at all times, forming a sliding rollback window that maintains a large minimum and average length.
Abstract: Existing cache-level checkpointing schemes do not continuously support a large rollback window. Immediately after a checkpoint, the number of instructions that the processor can undo falls to zero. To address this problem, we introduce Swich, an FPGA-based prototype of a new cache-level scheme that keeps two live checkpoints at all times, forming a sliding rollback window that maintains a large minimum and average length
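A short sketch of the checkpointing policy the abstract describes, with assumed interval and instruction counts: two checkpoints are always live, and when the younger one reaches the checkpoint interval the older one is retired, so the number of undoable instructions slides between one and two intervals instead of collapsing to zero.

/*
 * Sketch of the sliding-rollback-window policy: keep two live
 * checkpoints at all times so the number of instructions that can be
 * undone never collapses to zero right after taking a checkpoint.
 * The interval and instruction counts are assumptions.
 */
#include <stdio.h>

#define CKPT_INTERVAL 1000     /* assumed instructions between checkpoints */

int main(void) {
    long older = 0, younger = 0;        /* instruction counts at the two
                                           live checkpoints                */
    long min_window = -1, max_window = 0;

    for (long insn = 1; insn <= 10000; insn++) {
        if (insn - younger == CKPT_INTERVAL) {
            older = younger;            /* retire the old checkpoint ...   */
            younger = insn;             /* ... and start a new one         */
        }
        long window = insn - older;     /* instructions we can still undo  */
        if (insn > 2 * CKPT_INTERVAL) { /* ignore the warm-up interval     */
            if (min_window < 0 || window < min_window) min_window = window;
            if (window > max_window) max_window = window;
        }
    }
    printf("rollback window: min %ld, max %ld instructions\n",
           min_window, max_window);
    /* With a single checkpoint the minimum would be 0 (right after each
       checkpoint); with two live checkpoints it never drops below one
       full interval. */
    return 0;
}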

Journal ArticleDOI
TL;DR: A new memory scheduler is presented that makes decisions based on the history of recently scheduled operations, providing two advantages: it can better reason about the delays associated with complex DRAM structures, and it can adapt to different observed workloads.
Abstract: Careful memory scheduling can increase memory bandwidth and overall system performance. We present a new memory scheduler that makes decisions based on the history of recently scheduled operations, providing two advantages: it can better reason about the delays associated with complex DRAM structures, and it can adapt to different observed workloads.
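The sketch below shows one way such a history-based scheduler can be structured: each queued DRAM request is scored against the last few operations that were actually scheduled (open-row reuse, read/write bus turnaround), and the cheapest request goes next. The request queue, cost weights, and two-entry history are illustrative assumptions, not the scheduler the article evaluates.

/*
 * Sketch of a history-based DRAM scheduler: the next request is chosen
 * by scoring each queued request against the operations most recently
 * scheduled (row reuse, read/write turnaround).  The cost model and the
 * queue contents below are illustrative assumptions.
 */
#include <stdio.h>

typedef struct { int row; int is_write; } Req;

#define QLEN    6
#define HISTORY 2

static Req history[HISTORY];             /* most recently scheduled first */

static int cost(Req r) {
    int c = 10;                                     /* base access cost  */
    if (r.row != history[0].row)           c += 30; /* row conflict      */
    if (r.is_write != history[0].is_write) c += 8;  /* bus turnaround    */
    if (r.is_write != history[1].is_write) c += 2;  /* older history     */
    return c;
}

int main(void) {
    Req queue[QLEN] = {
        {7, 1}, {3, 0}, {3, 0}, {7, 0}, {3, 1}, {7, 1},
    };
    int done[QLEN] = {0}, total = 0;
    history[0] = history[1] = (Req){3, 0};          /* assumed prior ops */

    for (int n = 0; n < QLEN; n++) {
        int best = -1, best_cost = 0;
        for (int i = 0; i < QLEN; i++) {            /* pick cheapest next */
            if (done[i]) continue;
            int c = cost(queue[i]);
            if (best < 0 || c < best_cost) { best = i; best_cost = c; }
        }
        done[best] = 1;
        total += best_cost;
        history[1] = history[0];                    /* slide the history */
        history[0] = queue[best];
        printf("scheduled row %d %s (cost %2d)\n", queue[best].row,
               queue[best].is_write ? "write" : "read", best_cost);
    }
    printf("total cost: %d cycles "
           "(FIFO order over the same queue costs 242 under this model)\n",
           total);
    return 0;
}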

Journal ArticleDOI
TL;DR: With software's increasing complexity, providing efficient hardware support for software debugging is critical and will allow developers to deterministically replay and debug an application to pinpoint the root cause of a bug.
Abstract: With software's increasing complexity, providing efficient hardware support for software debugging is critical. Hardware support is necessary to observe and capture, with little or no overhead, the exact execution of a program. Providing this ability to developers will allow them to deterministically replay and debug an application to pinpoint the root cause of a bug.

Journal ArticleDOI
TL;DR: This article refutes the claim that chip multiprocessors with thread-level speculation are necessarily too energy inefficient and proposes out-of-order task spawning to exploit more sources of speculative task-level parallelism.
Abstract: Chip multiprocessors with thread-level speculation have become the subject of intense research. This article refutes the claim that such a design is necessarily too energy inefficient. In addition, it proposes out-of-order task spawning to exploit more sources of speculative task-level parallelism.

Journal ArticleDOI
TL;DR: A microarchitecture-based, software-transparent mechanism offers protection against stack-based buffer overflow attacks with moderate hardware cost and negligible performance overhead.
Abstract: Although researchers have proposed several software approaches to preventing buffer overflow attacks, adversaries still extensively exploit this vulnerability. A microarchitecture-based, software-transparent mechanism offers protection against stack-based buffer overflow attacks with moderate hardware cost and negligible performance overhead

Journal ArticleDOI
Pradip Bose1
TL;DR: Three articles in this general issue of IEEE Micro address the challenge of reliable designs of the future.
Abstract: Many electronics experts predicted that component failures (in particular, tube failures) in the pioneering ENIAC machine would be so frequent that the machine would never be useful. But the engineers (system architects) and component manufacturers improved their art over time to improve the system’s availability. Their achievement of remarkably low failure rates should serve as an inspiration to chip- and system-level designers today. Three articles in this general issue of IEEE Micro address the challenge of reliable designs of the future.

Journal ArticleDOI
TL;DR: The accuracy and speed of various sampling startup techniques are compared, introducing touched memory image and memory hierarchy state, to reduce sampled benchmark simulation times from hours to minutes.
Abstract: Sampling techniques dramatically shorten simulation times for industry-standard benchmarks, but establishing the correct architecture and microarchitecture states at the beginning of each sample can be time-consuming. This article compares the accuracy and speed of various sampling startup techniques, introducing touched memory image and memory hierarchy state. Together, these two techniques reduce sampled benchmark simulation times from hours to minutes

Journal ArticleDOI
TL;DR: A software-configurable processor combines a traditional RISC processor with a field-programmable instruction extension unit that lets the system designer tailor the processor to a particular application.
Abstract: A software-configurable processor combines a traditional RISC processor with a field-programmable instruction extension unit that lets the system designer tailor the processor to a particular application. To add application-specific instructions to the processor, the programmer adds a pragma before a C or C++ function declaration, and the compiler then turns the function into a single instruction
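The fragment below illustrates that programming model. The pragma spelling (custom_instruction) and the sad16 kernel are placeholders invented for this sketch rather than the vendor's actual syntax; an ordinary C compiler ignores (or at most warns about) the unknown pragma, so the code still builds and runs as plain software.

/*
 * Illustration of the programming model the abstract describes: mark a
 * C function with a pragma and let the tool chain map it onto the
 * field-programmable instruction extension unit as a single custom
 * instruction.  The pragma name below is a hypothetical placeholder.
 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical marker; ordinary compilers ignore unknown pragmas. */
#pragma custom_instruction
static uint32_t sad16(const uint8_t *a, const uint8_t *b) {
    /* Sum of absolute differences over 16 bytes: a natural candidate to
       collapse into one wide, SIMD-style extension instruction. */
    uint32_t sum = 0;
    for (int i = 0; i < 16; i++)
        sum += (a[i] > b[i]) ? (uint32_t)(a[i] - b[i])
                             : (uint32_t)(b[i] - a[i]);
    return sum;
}

int main(void) {
    uint8_t x[16], y[16];
    for (int i = 0; i < 16; i++) { x[i] = (uint8_t)i; y[i] = (uint8_t)(2 * i); }
    printf("SAD = %u\n", sad16(x, y));   /* 0 + 1 + ... + 15 = 120 */
    return 0;
}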