scispace - formally typeset
Search or ask a question
Author

Y.H. Chan

Bio: Y.H. Chan is an academic researcher from IBM. The author has contributed to research in topics: Microprocessor & CMOS. The author has an hindex of 12, co-authored 20 publications receiving 446 citations.

Papers
More filters
Proceedings ArticleDOI
18 Jun 2007
TL;DR: The POWER6trade microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability.
Abstract: The POWER6trade microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability. The 341mm2 700M transistor dual-core microprocessor is fabricated in a 65nm SOI process with 10 levels of low-k copper interconnect. It operates at clock frequencies over 5GHz in high-performance applications, and consumes under 100W in power-sensitive applications.

120 citations

Proceedings ArticleDOI
14 Jun 2007
TL;DR: A fully functional read and half select disturb-free 1.2 Mb SRAM is demonstrated at 1.6+ GHz at IV, 25degC and yield of 90-100%.
Abstract: A fully functional read and half select disturb-free 1.2 Mb SRAM is demonstrated. Measured results show an operating range of 0.4 V to 1.5 V and -25degC to 100degC, speed of 6.6+ GHz at IV, 25degC and yield of 90-100%.

66 citations

Proceedings ArticleDOI
18 Sep 2006
TL;DR: A dual-read 8-way set-associative data cache comprising four 16kB SRAMs and 2 set-prediction macros per P0WER6 core is presented.
Abstract: A dual-read 8-way set-associative data cache comprising four 16kB SRAMs and 2 set-prediction macros per P0WER6 core is presented. The array utilizes a 0.75mum2 butted-junction split-word line 6T cell in 65nm SOI. The design features dual power supplies, unidirectional polysilicon, and hierarchical undamped bit lines for enhanced cell stability and performance

37 citations

Journal ArticleDOI
TL;DR: It is shown that the use of high-Vt and thick oxide cell transistors can improve leakage, read and write-ability without causing significant performance degradation.
Abstract: In this paper we have studied the impacts of floating body effect, device leakage, and gate oxide tunneling leakage on the read and write-ability of a PD/SOI CMOS SRAM cell under Vt, L and W variations in sub-100 nm technology for the first time. The floating body effect is shown to degrade the read stability while improving the write-ability. On the other hand, the gate-to-body tunneling current improves the read stability while degrading the write-ability. It is also shown that the use of high-Vt and thick oxide cell transistors can improve leakage, read and write-ability without causing significant performance degradation. The test-chip is fabricated in sub-90 nm SOI technology to show the effectiveness of high-Vt and thick-oxide devices in improving stability of SRAM cells.

34 citations

Proceedings ArticleDOI
07 Feb 2000
TL;DR: The G6 system as discussed by the authors is a 637 MHz S/390 microprocessor with 6 levels of copper metal plus an additional layer of local interconnect on a 14.6/spl times/14.7 mm/sup 2/ die with 25M transistors.
Abstract: The G6 system is a sixth generation CMOS server for the S/390 line of products featuring a 12+2 SMP size and significant frequency improvements obtained through the use of low-Vt devices and copper interconnects. The microprocessor operates at 760 MHz at the fast end of the process distribution. The system ships at 637 MHz in a 12+2 chilled SMP configuration. Measured system performance on the 12 way is 1600 S/390 MIPs, providing over 50% more performance than the G5. This microprocessor uses CMOS7S technology, which has a 0.2 /spl mu/m process. The chip uses 6 levels of copper metal plus an additional layer of local interconnect on a 14.6/spl times/14.7 mm/sup 2/ die with 25M transistors (7M logic/18M array). The power supply is 1.9 V and the chip power is 33 W at 637 MHz.

27 citations


Cited by
More filters
01 Jan 2010
TL;DR: The TILE64TM processor as mentioned in this paper is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications, with 64 tile processors arranged in an 8x8 array.
Abstract: The TILE64TM processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications. A figure shows a block diagram with 64 tile processors arranged in an 8x8 array. These tiles connect through a scalable 2D mesh network with high-speed I/Os on the periphery. Each general-purpose processor is identical and capable of running SMP Linux.

634 citations

Proceedings ArticleDOI
01 Feb 2008
TL;DR: The TILE64TM processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications.
Abstract: The TILE64TM processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications. A figure shows a block diagram with 64 tile processors arranged in an 8x8 array. These tiles connect through a scalable 2D mesh network with high-speed I/Os on the periphery. Each general-purpose processor is identical and capable of running SMP Linux.

587 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: A range of cache refresh and placement schemes that are sensitive to retention time are proposed, and it is shown that most of the retention time variations can be masked by the microarchitecture when using these schemes.
Abstract: Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and will become critical with continued technology scaling. In this paper, we propose new on-chip memory architectures based on novel 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells. We provide a detailed comparison between 6T and 3T1D designs in the context of a L1 data cache. The effects of physical device variation on a 3T1D cache can be lumped into variation of data retention times. This paper proposes a range of cache refresh and placement schemes that are sensitive to reten- tion time, and we show that most of the retention time variations can be masked by the microarchitecture when using these schemes. We have performed detailed circuit and architectural simulations assuming different degrees of variability in advanced technology nodes, and we show that the resulting memory architecture can tol- erate large process variations with little or even no impact on per- formance when compared to ideal 6T SRAM designs. Furthermore, these designs are robust to memory cell stability issues and can achieve large power savings. These advantages make the new mem- ory architectures a promising choice for on-chip variation-tolerant cache structures required for next generation microprocessors.

359 citations

Journal ArticleDOI
TL;DR: The IBM S/390 G5 microprocessor in IBM's newest CMOS mainframe system provides more than twice the performance of the previous generation, the G4, and offers improved reliability and availability, along with new architectural features such as support for IEEE floating-point arithmetic and a redesigned L2 cache and processor interconnect.
Abstract: The IBM S/390 G5 microprocessor in IBM's newest CMOS mainframe system provides more than twice the performance of the previous generation, the G4. The G5 system offers improved reliability and availability, along with new architectural features such as support for IEEE floating-point arithmetic and a redesigned L2 cache and processor interconnect. The G5 system implements the ESA/390 instruction-set architecture, which is based on and compatible with the original S/360 architecture. Therefore, it has no RISC (reduced-instruction-set computing) concepts and is one of the most complex of all CISC (complex-instruction-set computing) architectures. Designers had to meet a unique set of challenges to achieve the G5's level of performance-for example, achieving a very high frequency given the complexity of the architecture.

340 citations

Journal ArticleDOI
TL;DR: A global clock distribution strategy implemented on several microprocessor chips is described, which consists of buffered, tunable tree networks, with the final trees all driving a common grid.
Abstract: A global clock distribution strategy used on several microprocessor chips is described. The clock network consists of buffered tunable trees or treelike networks, with the final level of trees all driving a single common grid covering most of the chip. This topology combines advantages of both trees and grids. A new tuning method was required to efficiently tune such a large strongly connected interconnect network consisting of up to 6 m of wire and modeled with 50000 resistors, capacitors, and inductors. Variations are described to handle different floor-planning styles. Global clock skew as low as 22 ps on large microprocessor chips has been measured.

311 citations