Author

B. Huott

Bio: B. Huott is an academic researcher from IBM. The author has contributed to research in topics: Cache & POWER6. The author has an h-index of 5 and has co-authored 5 publications receiving 233 citations.
Topics: Cache, POWER6, Microprocessor, Noise, eDRAM
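
As a rough cross-check of the figures in the bio, the h-index can be recomputed from the per-paper citation counts. The sketch below uses the four counts shown in the paper list (120, 83, 14, 11). The fifth count is not shown on this page, so it is inferred here as the remainder of the 233 total, which assumes the total covers exactly five papers.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Citation counts shown in the paper list on this page.
listed = [120, 83, 14, 11]

# Assumption: the bio's total of 233 citations covers exactly 5 papers,
# so the unlisted fifth paper accounts for the remainder.
total_citations = 233
fifth = total_citations - sum(listed)

print(h_index(listed + [fifth]))  # prints 5
```

Under that assumption the fifth paper has 5 citations, which is just enough for all five papers to count toward an h-index of 5.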

Papers
Proceedings ArticleDOI
18 Jun 2007
TL;DR: The POWER6™ microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability.
Abstract: The POWER6™ microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability. The 341mm² 700M transistor dual-core microprocessor is fabricated in a 65nm SOI process with 10 levels of low-k copper interconnect. It operates at clock frequencies over 5GHz in high-performance applications, and consumes under 100W in power-sensitive applications.

120 citations

Proceedings ArticleDOI
Norman Karl James1, Phillip J. Restle1, Joshua Friedrich1, B. Huott1, Bradley McCredie1 
18 Jun 2007
TL;DR: The noise measurements and simulation both show that the shorted core power grid design has less noise and a higher maximum frequency than the split core power supply design.
Abstract: The POWER6™ is a dual-core microprocessor fabricated in a 65nm SOI process with 10 levels of low-k copper interconnects. Chips with split- and connected-core power supplies are fabricated, modeled, and tested, showing both the advantages and disadvantages of each. On-chip noise measurements are compared to simulation. The noise measurements and simulation both show that the shorted core power grid design has less noise and a higher maximum frequency.

83 citations

Proceedings ArticleDOI
01 Feb 2018
TL;DR: The IBM Z microprocessor in the z14 system has been redesigned to improve performance, system capacity, and security over the previous z13 system, and is designed in Global Foundries 14nm high performance SOI FinFET technology with 17 layers of copper interconnect.
Abstract: The IBM Z microprocessor in the z14 system has been redesigned to improve performance, system capacity, and security [1] over the previous z13 system [2]. The system contains up to 24 central processor (CP) and 4 system controller (SC) chips. Each CP, shown in die photo A (Fig. 2.2.7), operates at 5.2GHz and comprises 10 cores, 2 PCIe Gen3 interfaces, an IO bus controller (GX), 128MB of L3 embedded DRAM (eDRAM) cache, X-BUS interfaces connecting to 2 other CP chips and one SC chip, and a redundant array of independent memory (RAIM) interface. Each core on the CP chip has 4MB of eDRAM L2 Data cache and 2MB of eDRAM L2 Instruction cache, with 128KB SRAM Instruction and 128KB SRAM Data L1 caches. Each SC, shown in die photo B (Fig. 2.2.7), operates at 2.6GHz and has 672MB of L4 eDRAM cache, X-BUS interfaces connecting to CP chips in the drawer, and A-BUS interfaces connecting SCs on the other drawers. Both chips are 696mm² and are designed in Global Foundries 14nm high performance (14HP) SOI FinFET technology with 17 layers of copper interconnect [3]. The CP contains 6.1B transistors, while the SC contains 9.7B transistors. The total IO bandwidths of the CP and SC are 2.9Tb/s and 5.5Tb/s, respectively.

14 citations
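
The per-chip figures in the z14 abstract above can be rolled up into system-level totals. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted in the abstract. It assumes the maximum configuration (24 CP and 4 SC chips) with all 10 cores per CP counted, and the variable names are illustrative rather than IBM terminology.

```python
# Figures quoted in the z14 abstract (maximum configuration assumed).
CP_CHIPS = 24            # central processor chips, "up to 24"
SC_CHIPS = 4             # system controller chips
CORES_PER_CP = 10

L2_DATA_MB_PER_CORE = 4  # eDRAM L2 data cache per core
L2_INSTR_MB_PER_CORE = 2 # eDRAM L2 instruction cache per core
L3_MB_PER_CP = 128       # eDRAM L3 per CP chip
L4_MB_PER_SC = 672       # eDRAM L4 per SC chip

CP_TRANSISTORS_BILLION = 6.1
SC_TRANSISTORS_BILLION = 9.7

# L2 capacity on one CP chip: every core has its own data and instruction L2.
l2_mb_per_cp = CORES_PER_CP * (L2_DATA_MB_PER_CORE + L2_INSTR_MB_PER_CORE)

# System-wide shared cache capacity.
l3_mb_total = CP_CHIPS * L3_MB_PER_CP
l4_mb_total = SC_CHIPS * L4_MB_PER_SC

# System-wide transistor count, in billions.
transistors_billion = (CP_CHIPS * CP_TRANSISTORS_BILLION
                       + SC_CHIPS * SC_TRANSISTORS_BILLION)

print(f"L2 per CP chip: {l2_mb_per_cp} MB")
print(f"L3 across system: {l3_mb_total} MB")
print(f"L4 across system: {l4_mb_total} MB")
print(f"Transistors across system: {transistors_billion:.1f} billion")
```

With these inputs, each CP chip carries 60MB of L2, and a fully populated system has 3072MB of L3 and 2688MB of L4, on roughly 185 billion transistors.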

Proceedings ArticleDOI
08 Nov 2005
TL;DR: An advanced optical diagnostic technique used for diagnosing the IBM z990 eServer microprocessor, equipped with the high quantum efficiency superconducting single-photon detector (SSPD), shows a unique diagnostic capability for optically probing the internal nodes of a chip.
Abstract: In this paper, we describe an advanced optical diagnostic technique used for diagnosing the IBM z990 eServer microprocessor (Slegel et al., 2004). Time-to-market pressure demands quick diagnostic turnaround time and high diagnostic resolution, while the ever-increasing design complexity, density, cycle time, and shrinking technologies make diagnostics dramatically more difficult. Although design-for-test (DFT) and design-for-diagnostics (DFD) features are implemented in the latest microprocessors to help ease the diagnostic effort, they may still not be sufficient to diagnose certain fails. The well-known picosecond imaging circuit analysis (PICA) (Kash and Tsang, 1997) tool, equipped with the high quantum efficiency superconducting single-photon detector (SSPD), shows a unique diagnostic capability for optically probing the internal nodes of a chip. Several hard-to-diagnose examples will be used to demonstrate the unique capabilities of this technique.

11 citations


Cited by
01 Jan 2010
TL;DR: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia, with 64 tile processors arranged in an 8x8 array.
Abstract: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications. A figure shows a block diagram with 64 tile processors arranged in an 8x8 array. These tiles connect through a scalable 2D mesh network with high-speed I/Os on the periphery. Each general-purpose processor is identical and capable of running SMP Linux.

634 citations

Proceedings ArticleDOI
01 Feb 2008
TL;DR: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications.
Abstract: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications. A figure shows a block diagram with 64 tile processors arranged in an 8x8 array. These tiles connect through a scalable 2D mesh network with high-speed I/Os on the periphery. Each general-purpose processor is identical and capable of running SMP Linux.

587 citations

Journal ArticleDOI
TL;DR: 3D technology from IBM is highlighted, including demonstration test vehicles used to develop ground rules, collect data, and evaluate reliability, and examples of 3D emerging industry product applications that could create marketable systems are provided.
Abstract: Three-dimensional (3D) silicon integration of active devices with through-silicon vias (TSVs), thinned silicon, and silicon-to-silicon fine-pitch interconnections offers many product benefits. Advantages of these emerging 3D silicon integration technologies can include the following: power efficiency, performance enhancements, significant product miniaturization, cost reduction, and modular design for improved time to market. IBM research activities are aimed at providing design rules, structures, and processes that make 3D technology manufacturable for chips used in actual products on the basis of data from test-vehicle (i.e., prototype) design, fabrication, and characterization demonstrations. Three-dimensional integration can be applied to a wide range of interconnection densities (<10/cm² to 10⁸/cm²), requiring new architectures for product optimization and multiple options for fabrication. Demonstration test structures, which are designed, fabricated, and characterized, are used to generate experimental data, establish models and design guidelines, and help define processes for future product consideration. This paper 1) reviews technology integration from a historical perspective, 2) describes industry-wide progress in 3D technology with examples of TSV and silicon-silicon interconnection advancement over the last 10 years, 3) highlights 3D technology from IBM, including demonstration test vehicles used to develop ground rules, collect data, and evaluate reliability, and 4) provides examples of 3D emerging industry product applications that could create marketable systems.

461 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: A range of cache refresh and placement schemes that are sensitive to retention time are proposed, and it is shown that most of the retention time variations can be masked by the microarchitecture when using these schemes.
Abstract: Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and will become critical with continued technology scaling. In this paper, we propose new on-chip memory architectures based on novel 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells. We provide a detailed comparison between 6T and 3T1D designs in the context of an L1 data cache. The effects of physical device variation on a 3T1D cache can be lumped into variation of data retention times. This paper proposes a range of cache refresh and placement schemes that are sensitive to retention time, and we show that most of the retention time variations can be masked by the microarchitecture when using these schemes. We have performed detailed circuit and architectural simulations assuming different degrees of variability in advanced technology nodes, and we show that the resulting memory architecture can tolerate large process variations with little or even no impact on performance when compared to ideal 6T SRAM designs. Furthermore, these designs are robust to memory cell stability issues and can achieve large power savings. These advantages make the new memory architectures a promising choice for on-chip variation-tolerant cache structures required for next generation microprocessors.

359 citations

Journal ArticleDOI
09 Jun 2012
TL;DR: This work introduces a methodology for designing scalable and efficient scale-out server processors based on a metric of performance-density, and facilitates the design of optimal multi-core configurations, called pods.
Abstract: Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency when instructions are fetched. Performance on scale-out workloads is maximized through a modestly-sized last-level cache that captures the instruction footprint at the lowest possible access latency. In this work, we introduce a methodology for designing scalable and efficient scale-out server processors. Based on a metric of performance-density, we facilitate the design of optimal multi-core configurations, called pods. Each pod is a complete server that tightly couples a number of cores to a small last-level cache using a fast interconnect. Replicating the pod to fill the die area yields processors which have optimal performance density, leading to maximum per-chip throughput. Moreover, as each pod is a stand-alone server, scale-out processors avoid the expense of global (i.e., inter-pod) interconnect and coherence. These features synergistically maximize throughput, lower design complexity, and improve technology scalability. In 20nm technology, scale-out chips improve throughput by 5x-6.5x over conventional and by 1.6x-1.9x over emerging tiled organizations.

185 citations