Author

Jun Nakano

Bio: Jun Nakano is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in the topics of Cache and Very-large-scale integration, has an h-index of 4, and has co-authored 4 publications that have received 541 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, a microarchitecture-aware model for process variation is proposed, including both random and systematic effects, and the model is specified using a small number of highly intuitive parameters.
Abstract: Within-die parameter variation poses a major challenge to high-performance microprocessor design, negatively impacting a processor's frequency and leakage power. Addressing this problem, this paper proposes a microarchitecture-aware model for process variation, including both random and systematic effects. The model is specified using a small number of highly intuitive parameters. Using the variation model, this paper also proposes a framework to model timing errors caused by parameter variation. The model yields the failure rate of microarchitectural blocks as a function of clock frequency and the amount of variation. With the combination of the variation model and the error model, we have VARIUS, a comprehensive model that is capable of producing detailed statistics of timing errors as a function of different process parameters and operating conditions. We propose possible applications of VARIUS to microarchitectural research.
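A minimal sketch of the kind of two-component variation model the abstract describes, assuming a per-cell Vth map built from an uncorrelated random term plus a spatially correlated systematic term; the grid size, sigma values, correlation range, and alpha-power delay parameters below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch (not the authors' code): a VARIUS-style model of within-die
# threshold-voltage (Vth) variation with separate random and systematic parts.
import numpy as np
from scipy.ndimage import gaussian_filter

GRID = 100                 # chip modeled as a GRID x GRID array of cells
VTH_NOM = 0.30             # nominal Vth (V), assumed
SIGMA_TOTAL = 0.03         # total sigma of Vth variation (V), assumed
RAND_FRACTION = 0.5        # split between random and systematic variance
PHI = 0.5                  # correlation range of the systematic component,
                           # as a fraction of chip width (assumed)

rng = np.random.default_rng(0)

# Random component: spatially uncorrelated Gaussian noise per cell.
sigma_rand = SIGMA_TOTAL * np.sqrt(RAND_FRACTION)
vth_rand = rng.normal(0.0, sigma_rand, (GRID, GRID))

# Systematic component: spatially correlated field, approximated here by
# smoothing white noise with a Gaussian kernel and rescaling its sigma.
sigma_sys = SIGMA_TOTAL * np.sqrt(1.0 - RAND_FRACTION)
field = gaussian_filter(rng.normal(0.0, 1.0, (GRID, GRID)), sigma=PHI * GRID)
vth_sys = field / field.std() * sigma_sys

vth = VTH_NOM + vth_rand + vth_sys

# Gate delay via the alpha-power law; a block's frequency is set by its
# slowest cell, so a region of high local Vth drags the whole block down.
VDD, ALPHA = 1.0, 1.3
delay = VDD / (VDD - vth) ** ALPHA
block = delay[:50, :50]                  # one microarchitectural block's cells
freq_loss = 1.0 - delay.mean() / block.max()
print("block frequency loss vs. nominal: %.1f%%" % (100 * freq_loss))
```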

386 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: D-FGBB is introduced, which allows the continuous re-evaluation of the bias voltages to adapt to dynamic conditions and can be synergistically combined with dynamic voltage and frequency scaling (DVFS), creating an effective means to manage power.
Abstract: Parameter variation is detrimental to a processor's frequency and leakage power. One proposed technique to mitigate it is Fine-Grain Body Biasing (FGBB), where different parts of the processor chip are given a voltage bias that changes the speed and leakage properties of their transistors. This technique has been proposed for static application, with the bias voltages being programmed at manufacturing time for worst-case conditions. In this paper, we introduce Dynamic FGBB (D-FGBB), which allows the continuous re-evaluation of the bias voltages to adapt to dynamic conditions. Our results show that D-FGBB is very versatile and effective. Specifically, with the processor working in normal mode at fixed frequency, D-FGBB reduces the leakage power of the chip by an average of 28-42% compared to static FGBB. Alternatively, with the processor working in a high-performance mode, D-FGBB increases the processor frequency by an average of 7-9% compared to static FGBB, or 7-16% compared to no body biasing. Finally, we also show that D-FGBB can be synergistically combined with Dynamic Voltage and Frequency Scaling (DVFS), creating an effective means to manage power.
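As a rough illustration of the dynamic idea (not the authors' implementation), the sketch below re-evaluates a per-block body bias every control interval: blocks that miss timing at the current frequency get more forward bias, and blocks with ample slack drift toward reverse bias to cut leakage. The sensor model, bias range, and step size are invented for the example.

```python
# Hedged sketch of a dynamic fine-grain body-biasing control loop.
from dataclasses import dataclass

VBB_MIN, VBB_MAX, VBB_STEP = -0.5, 0.5, 0.05   # reverse ... forward bias (V)
SLACK_MARGIN = 0.01                            # slack kept as guardband (ns)

@dataclass
class Block:
    name: str
    vbb: float = 0.0        # current body bias: >0 forward (fast, leaky),
                            # <0 reverse (slow, low leakage)

def timing_slack(block: Block, freq_ghz: float, temp_c: float) -> float:
    """Stand-in for a critical-path replica sensor: positive slack means the
    block meets timing with margin at this frequency, temperature and bias."""
    cycle = 1.0 / freq_ghz                                    # ns available
    delay = 0.45 - 0.10 * block.vbb + 0.0008 * (temp_c - 60)  # toy delay model
    return cycle - delay

def control_step(blocks, freq_ghz, temps):
    """One periodic re-evaluation of every block's bias."""
    for b in blocks:
        slack = timing_slack(b, freq_ghz, temps[b.name])
        if slack < 0:                               # failing: add forward bias
            b.vbb = min(VBB_MAX, b.vbb + VBB_STEP)
        elif slack > SLACK_MARGIN:                  # ample margin: move toward
            b.vbb = max(VBB_MIN, b.vbb - VBB_STEP)  # reverse bias, cut leakage

blocks = [Block("alu"), Block("fpu"), Block("l1")]
for _ in range(20):                                 # periodic re-evaluations
    control_step(blocks, freq_ghz=2.0, temps={"alu": 85, "fpu": 70, "l1": 55})
print({b.name: round(b.vbb, 2) for b in blocks})    # hotter blocks settle at a
                                                    # less reverse (faster) bias
```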

133 citations

Journal ArticleDOI
TL;DR: Swich is introduced, an FPGA-based prototype of a new cache-level scheme that keeps two live checkpoints at all times, forming a sliding rollback window that maintains a large minimum and average length.
Abstract: Existing cache-level checkpointing schemes do not continuously support a large rollback window. Immediately after a checkpoint, the number of instructions that the processor can undo falls to zero. To address this problem, we introduce Swich, an FPGA-based prototype of a new cache-level scheme that keeps two live checkpoints at all times, forming a sliding rollback window that maintains a large minimum and average length.
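The sliding-window behavior can be illustrated with a toy model (not the Swich hardware): with a single live checkpoint the undoable window collapses right after every commit, while with two live checkpoints the older interval is retired only when the newer one fills, so at least one full interval is always undoable. The interval length below is an arbitrary assumption.

```python
# Hedged sketch of the two-live-checkpoint sliding rollback window.
INTERVAL = 1000   # instructions per checkpoint interval (assumed)

def min_window(num_live_checkpoints, steps=10_000, warmup=3_000):
    """Smallest number of undoable instructions observed over the run."""
    live = [0]                           # instruction counts of live intervals
    smallest = float("inf")
    for step in range(steps):
        live[-1] += 1                    # retire one more instruction
        if step >= warmup:               # ignore the cold-start transient
            smallest = min(smallest, sum(live))
        if live[-1] == INTERVAL:         # current interval is full
            live.append(0)               # open a new speculative interval
            if len(live) > num_live_checkpoints:
                live.pop(0)              # commit (retire) the oldest interval
    return smallest

print("1 live checkpoint:  min undo window =", min_window(1))  # collapses ~0
print("2 live checkpoints: min undo window =", min_window(2))  # >= 1 interval
```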

35 citations

01 Jan 2006
TL;DR: This dissertation presents ReViveI/O, a scheme for I/O undo and redo that is compatible with mechanisms for hardware-assisted rollback of memory state, and architecture-aware fine-grain body biasing to improve the frequency and leakage power dissipation of processors.
Abstract: As technology feature size continues to shrink, we see two challenging problems in designing computer systems. One is hardware unreliability due to the increasing chance of transient hardware faults caused by high-energy particles. The other is variability in the semiconductor manufacturing process, which eventually impacts the frequency and the leakage power dissipation of a chip. In the first part, we study the problem of handling I/O in memory-based checkpointing systems. The increasing demand for reliable computers has led to proposals for hardware-assisted rollback of memory state. Such an approach promises major reductions in Mean Time To Repair (MTTR). Unfortunately, adoption of such proposals is hindered by the lack of efficient mechanisms for I/O recovery. We present and evaluate ReViveI/O, a scheme for I/O undo and redo that is compatible with mechanisms for hardware-assisted rollback of memory state. We have implemented a Linux-based prototype that shows that low-overhead, low-MTTR recovery of I/O is feasible. For 20-120 ms between checkpoints, a throughput-oriented workload has negligible overhead and recovery time. In the second part, we study architecture-aware fine-grain body biasing to improve the frequency and leakage power dissipation of processors. As VLSI technology continues to scale, parameter variation is about to pose a major challenge to high-performance processor design. In particular, the within-die variation of threshold voltage is directly detrimental to the chip's frequency and leakage power. One proposed technique to address such variation is Fine-Grain Body Biasing (FGBB), where different chip sections are given a certain voltage bias that modifies the threshold voltage. We show that FGBB should be applied in an architecture-aware manner, following the shapes of architectural modules. The reason is that architectural functionality affects the body bias needed, through temperature and the type of critical path. To prove this idea, we develop a model of threshold voltage variation and apply it to simulated batches of chips. We show that architecture-aware FGBB enables 35% of the chips to work at the highest frequency, compared to 18% with conventional FGBB, potentially increasing each chip's value by 50%. It also reduces the leakage of the chips by 40%, compared to 25% with conventional FGBB.
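For the I/O part, the core undo/redo idea can be sketched as follows (a simplification, not the ReViveI/O prototype): output issued between checkpoints is buffered and released to the device only after the covering memory checkpoint commits, and it is discarded on rollback, so the outside world never observes state that a memory rollback would undo.

```python
# Hedged sketch of buffering output I/O under checkpointed execution.
class BufferedOutput:
    def __init__(self, device):
        self.device = device          # real sink, e.g. a socket or disk log
        self.pending = []             # writes issued since the last checkpoint

    def write(self, data):            # called by the application
        self.pending.append(data)     # held back, not yet externally visible

    def commit_checkpoint(self):      # the memory checkpoint has committed
        for data in self.pending:
            self.device.append(data)  # now safe to make the writes visible
        self.pending.clear()

    def rollback(self):               # memory state has been rolled back
        self.pending.clear()          # undo: drop the uncommitted writes

device = []
out = BufferedOutput(device)
out.write(b"hello ")
out.commit_checkpoint()               # "hello " becomes externally visible
out.write(b"oops")                    # not yet visible ...
out.rollback()                        # ... and discarded by the rollback
print(device)                         # [b'hello ']
```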

4 citations


Cited by
Journal ArticleDOI
01 Nov 2009
TL;DR: This white paper synthesizes the motivations, observations and research issues considered determinant by several complementary experts of HPC in applications, programming models, distributed systems and system management.
Abstract: Over the past few years resilience has become a major issue for high-performance computing (HPC) systems, in particular in the perspective of large petascale systems and future exascale systems. These systems will typically gather from half a million to several millions of central processing unit (CPU) cores running up to a billion threads. From the current knowledge and observations of existing large systems, it is anticipated that exascale systems will experience various kinds of faults many times per day. It is also anticipated that the current approach for resilience, which relies on automatic or application-level checkpoint/restart, will not work because the time for checkpointing and restarting will exceed the mean time to failure of a full system. This set of projections leaves the community of fault tolerance for HPC systems with a difficult challenge: finding new approaches, which are possibly radically disruptive, to run applications until their normal termination, despite the essentially unstable nature of exascale systems. Yet, the community has only five to six years to solve the problem. This white paper synthesizes the motivations, observations and research issues considered determinant by several complementary experts of HPC in applications, programming models, distributed systems and system management.
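The checkpoint/restart argument can be made concrete with a back-of-the-envelope calculation using Young's first-order approximation for the optimal checkpoint interval, tau = sqrt(2*C*M), where C is the checkpoint cost and M the system MTBF; the machine parameters below are assumptions for illustration only, not data from the paper.

```python
# Hedged illustration: useful-work fraction under periodic checkpointing,
# roughly 1 - C/tau - tau/(2M) at Young's optimal interval tau = sqrt(2*C*M).
import math

def efficiency(checkpoint_min, mtbf_min):
    tau = math.sqrt(2 * checkpoint_min * mtbf_min)   # Young's interval
    return tau, 1 - checkpoint_min / tau - tau / (2 * mtbf_min)

for mtbf in (1440, 60, 30):          # system MTBF in minutes: 1 day ... 30 min
    tau, eff = efficiency(checkpoint_min=15, mtbf_min=mtbf)
    print(f"MTBF {mtbf:5d} min -> checkpoint every {tau:6.1f} min, "
          f"useful work ~ {max(eff, 0):.0%}")
# With a 15-minute checkpoint and a 30-minute system MTBF, useful work
# drops to roughly zero, which is the breakdown the abstract anticipates.
```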

387 citations

Journal ArticleDOI
01 Jun 2008
TL;DR: In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED^2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
Abstract: Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both the static power consumed and the maximum frequency supported. In this environment, ignoring variation effects when scheduling applications or when managing power with Dynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for application scheduling and power management. One such power management algorithm, called LinOpt, uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP, maximizing throughput at a given power budget. In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12-17% and reduces the average ED^2 by 30-38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
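A linear program in the same spirit as the one described (though not the paper's exact LinOpt formulation) can be sketched with off-the-shelf tools: per-core power is linearized as p_i = a_i + b_i * f_i and total power is capped by the chip budget, while throughput, approximated as IPC times frequency, is maximized. All coefficients below are made-up numbers.

```python
# Hedged sketch: variation-aware per-core frequency selection as an LP.
import numpy as np
from scipy.optimize import linprog

ipc    = np.array([1.8, 1.2, 0.9, 1.5])  # app throughput per GHz on each core
f_max  = np.array([3.0, 2.6, 2.8, 2.4])  # per-core max frequency (GHz); varies
                                          # across cores due to process variation
a      = np.array([4.0, 6.0, 5.0, 7.0])  # static/leakage power term (W); varies
b      = np.array([5.0, 5.5, 5.2, 6.0])  # dynamic power per GHz (W/GHz)
budget = 60.0                             # chip power budget (W)

# maximize sum(ipc * f)  ==  minimize -sum(ipc * f)
res = linprog(
    c=-ipc,
    A_ub=[b],                             # sum_i (a_i + b_i * f_i) <= budget
    b_ub=[budget - a.sum()],
    bounds=list(zip([1.0] * 4, f_max)),   # each core between 1.0 GHz and f_max
    method="highs",
)
print("per-core frequencies (GHz):", np.round(res.x, 2))
print("throughput (sum IPC*GHz):  ", round(-res.fun, 2))
```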

351 citations

Proceedings ArticleDOI
08 Nov 2008
TL;DR: This paper shows how to hide the effects of aging and how to slow down aging, which will enable a multicore designed for a 7-year service life to run, on average, at a 14-15% higher frequency during its whole service life.
Abstract: Processors progressively age during their service life due to normal workload activity. Such aging results in gradually slower circuits. Anticipating this fact, designers add timing guardbands to processors, so that processors last for a number of years. As a result, aging has important design and cost implications. To address this problem, this paper shows how to hide the effects of aging and how to slow it down. Our framework is called Facelift. It hides aging through aging-driven application scheduling. It slows down aging by applying voltage changes at key times - it uses a non-linear optimization algorithm to carefully balance the impact of voltage changes on the aging rate and on the critical path delays. Moreover, Facelift can gainfully configure the chip for a short service life. Simulation results indicate that Facelift leads to more cost-effective multicores. We can take a multicore designed for a 7-year service life and, by hiding and slowing down aging, enable it to run, on average, at a 14-15% higher frequency during its whole service life. Alternatively, we can design the multicore for a 5 to 7-month service life and still use it for 7 years.
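One plausible form of the aging-driven scheduling that hides aging is sketched below (an illustration, not Facelift's actual policy): applications that need the highest clock frequency are mapped onto the cores that have aged the least and therefore retain the most timing slack. The degradation numbers are invented.

```python
# Hedged sketch of aging-driven application scheduling on a multicore.
def schedule(app_freq_need_ghz, core_degradation_pct):
    """Pair demanding apps with young cores: sort apps by required frequency
    (descending) and cores by accumulated slowdown (ascending), then zip."""
    apps = sorted(app_freq_need_ghz.items(), key=lambda kv: -kv[1])
    cores = sorted(core_degradation_pct.items(), key=lambda kv: kv[1])
    return {app: core for (app, _), (core, _) in zip(apps, cores)}

apps  = {"fft": 2.9, "web": 1.6, "zip": 2.2, "idle": 1.0}   # needed GHz
cores = {"c0": 6.0, "c1": 1.5, "c2": 3.0, "c3": 0.5}        # % slowdown so far
print(schedule(apps, cores))   # fft -> c3 (youngest core), idle -> c0 (oldest)
```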

242 citations

Journal ArticleDOI
TL;DR: The failure rates of HPC systems are reviewed, rollback-recovery techniques which are most often used for long-running applications on HPC clusters are discussed, and a taxonomy is developed for over twenty popular checkpoint/restart solutions.
Abstract: In recent years, High Performance Computing (HPC) systems have been shifting from expensive massively parallel architectures to clusters of commodity PCs to take advantage of cost and performance benefits. Fault tolerance in such systems is a growing concern for long-running applications. In this paper, we briefly review the failure rates of HPC systems and also survey the fault tolerance approaches for HPC systems and the issues with these approaches. Rollback-recovery techniques, which are most often used for long-running applications on HPC clusters, are discussed in detail. Specifically, the feature requirements of rollback-recovery are discussed and a taxonomy is developed for over twenty popular checkpoint/restart solutions. The intent of this paper is to aid researchers in the domain as well as to facilitate development of new checkpointing solutions.

238 citations