Journal ArticleDOI

A 14 nm 1.1 Mb Embedded DRAM Macro With 1 ns Access

TL;DR: A 1.1 Mb embedded DRAM (eDRAM) macro for next-generation IBM SOI processors uses 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell; a gated-feedback sense amplifier enables a high voltage gain of a power-gated inverter at mid-level input voltage.
Abstract: A 1.1 Mb embedded DRAM (eDRAM) macro for next-generation IBM SOI processors employs 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell. A gated-feedback sense amplifier enables a high voltage gain of a power-gated inverter at mid-level input voltage, while supporting 66 cells per local bit-line. A dynamic-and-gate-thin-oxide word-line driver that tracks standard logic process variation improves the eDRAM array performance with reduced area. The 1.1 Mb macro, composed of 8 × 2 72 Kb subarrays, is organized with a center interface block architecture, allowing 1 ns access latency and 1 ns bank interleaving operation using two banks, each having a 2 ns random access cycle. 5 GHz operation has been demonstrated in a system prototype, which includes 6 instances of 1.1 Mb eDRAM macros, integrated with an array-built-in-self-test engine, a phase-locked loop (PLL), and word-line high and word-line low voltage generators. The advantage of the 14 nm FinFET array over the 22 nm array was confirmed using direct tester control of the 1.1 Mb eDRAM macros integrated in a 16 Mb inline monitor.
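The capacity and interleaving figures in the abstract can be checked with a short calculation. The following is a minimal sketch, not the authors' model: it assumes binary units (1 Mb = 1024 Kb) and a simple scheduler in which a bank stays busy for its full 2 ns random access cycle. The function names are hypothetical.

```python
# Assumption: 1 Mb = 1024 Kb. The abstract does not state the unit convention.
SUBARRAY_KB = 72
SUBARRAY_COUNT = 8 * 2  # "8 x 2 72 Kb subarrays"


def macro_capacity_mb(subarrays=SUBARRAY_COUNT, subarray_kb=SUBARRAY_KB):
    """Total macro capacity in Mb under the binary-unit assumption."""
    return subarrays * subarray_kb / 1024


def access_start_times(bank_sequence, bank_cycle_ns=2.0, issue_interval_ns=1.0):
    """Earliest start time (ns) of each access in a sequence of bank indices.

    A new access can be issued every `issue_interval_ns`, but a bank cannot
    start another access until its own `bank_cycle_ns` has elapsed.
    """
    bank_free_at = {}
    next_issue = 0.0
    starts = []
    for bank in bank_sequence:
        start = max(next_issue, bank_free_at.get(bank, 0.0))
        starts.append(start)
        bank_free_at[bank] = start + bank_cycle_ns
        next_issue = start + issue_interval_ns
    return starts


if __name__ == "__main__":
    print(macro_capacity_mb())               # 1.125, quoted as 1.1 Mb
    print(access_start_times([0, 1, 0, 1]))  # alternating banks
    print(access_start_times([0, 0, 0]))     # same bank every time
```

Under these assumptions, alternating between the two banks sustains one access per 1 ns, while repeated accesses to a single bank fall back to the 2 ns random access cycle.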
Citations
Journal ArticleDOI
TL;DR: 3D DRAMs including DDR3, wide I/O mobile DRAM, and more recently, the hybrid-memory cube (HMC) and high-bandwidth memory (HBM) targeted for high-performance computing systems are reviewed.
Abstract: This paper describes orthogonal scaling of dynamic-random-access-memories (DRAMs) using through-silicon-vias (TSVs). We review 3D DRAMs including DDR3, wide I/O mobile DRAM (WIDE I/O), and more recently, the hybrid-memory cube (HMC) and high-bandwidth memory (HBM) targeted for high-performance computing systems. We then cover embedded 3D DRAM for high-performance cache memories, reviewing an early cache prototype employing face-to-face 3D stacking which confirmed negligible performance and retention degradation using 32 nm server and ASIC embedded DRAM macros. A second cache system prototype based on POWER7 was developed to confirm feasibility of stacking μP and high density cache memory, with > 2 GHz operation. For test and assembly, a micro-electro-mechanical-system (MEMS) probe-card with an integrated active silicon chip realized a 50 μm pitch micro-probing at-speed-active-test for known-good-die (KGD) sorting. Finally, oxide wafer bonding with Cu TSV demonstrated wafer-scale 3D integration, with TSV diameters as small as 1 μm. The paper concludes with comments on the challenges for future 3D DRAMs.

14 citations


Cites methods from "A 14 nm 1.1 Mb Embedded DRAM Macro ..."

  • ...Embedded DRAM (eDRAM) [6]–[8], employing a one-transistor and one-capacitor cell, provides 2....

    [...]

Proceedings ArticleDOI
20 Oct 2018
TL;DR: This work presents CABLE, a novel CAche-Based Link Encoder that enables point-to-point link compression between coherent caches, re-purposing the data already stored in the caches as a massive and scalable dictionary for data compression.
Abstract: Off-chip bandwidth is a scarce resource in modern processors, and it is expected to become even more limited on a per-core basis as we move into the era of high-throughput and massively-parallel computation. One promising approach to overcome limited bandwidth is off-chip link compression. Unfortunately, previously proposed latency-driven compression schemes are not a good fit for latency-tolerant manycore systems, and they often do not have the dictionary capacity to accommodate more than a few concurrent threads. In this work, we present CABLE, a novel CAche-Based Link Encoder that enables point-to-point link compression between coherent caches, re-purposing the data already stored in the caches as a massive and scalable dictionary for data compression. We show the broad applicability of CABLE by applying it to two critical off-chip links: (1) the memory link interface to off-chip memory, and (2) the cache-coherent link between processors in a multi-chip system. We have implemented CABLE's search pipeline hardware in Verilog using the OpenPiton framework to show its feasibility. Evaluating with SPEC2006, we find that CABLE increases effective off-chip bandwidth by 7.2x and system throughput by 3.78x on average, 83% and 258% better than CPACK, respectively.

5 citations


Cites background or methods from "A 14 nm 1.1 Mb Embedded DRAM Macro ..."

  • ...A data array access without tag check typically takes around 1 ns [41, 42] for eDRAMs....

    [...]

  • ...Latency-wise, SRAMs usually take one cycle to access, while eDRAMs take between 1ns [41] and 3ns [42]....

    [...]
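To compare the nanosecond eDRAM latencies quoted in the citation contexts above with a one-cycle SRAM access, the latencies can be converted to clock cycles. This is a hedged sketch: the 5 GHz clock is borrowed from the system prototype in the cited abstract and is only an assumption here, since the citing paper's clock frequency is not given in this excerpt. The helper name is hypothetical.

```python
import math


def latency_in_cycles(latency_ns, clock_ghz):
    """Number of whole clock cycles needed to cover a latency given in ns.

    Cycles = latency (ns) * frequency (GHz), rounded up to a whole cycle.
    The product is rounded to 9 decimals first to absorb floating-point noise.
    """
    if latency_ns < 0 or clock_ghz <= 0:
        raise ValueError("latency must be >= 0 and clock must be > 0")
    return math.ceil(round(latency_ns * clock_ghz, 9))


if __name__ == "__main__":
    ASSUMED_CLOCK_GHZ = 5.0  # assumption: taken from the 5 GHz prototype
    for ns in (1.0, 3.0):
        print(ns, "ns ->", latency_in_cycles(ns, ASSUMED_CLOCK_GHZ), "cycles")
```

At the assumed 5 GHz, the quoted 1 ns to 3 ns range corresponds to 5 to 15 cycles; a different clock would scale these counts proportionally.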

Journal ArticleDOI
TL;DR: The IBM z15 system improves upon the prior-generation z14 design within the same chip footprint and technology node, while featuring the addition of two cores, 33%/100%/43% additional L2/L3/L4 cache, as well as additional core features and on-chip accelerators.
Abstract: The IBM z15 system improves upon the prior-generation z14 design within the same chip footprint and technology node, while featuring the addition of two cores, 33%/100%/43% additional L2/L3/L4 cache, as well as additional core features and on-chip accelerators. The largest 5-drawer system configuration includes 20 central processor (CP) chips, five system controller (SC) chips, and 40 TB of memory. With ~200 cores across all CP chips operating with 99.99999% uptime at 5.2 GHz, z15 achieves a 25% increase in system capacity and a 14% single thread performance improvement over the z14 system. In this article, we describe the key design factors and system/characterization refinement that enabled these results, including the novel 2-Mb embedded dynamic random access memory (eDRAM) cell, a new voltage droop monitor, a more comprehensive power reduction infrastructure to reduce power-limited yield, results on reliability-limited versus power-limited yield, and a characterization effort for exploring even higher frequencies, with our first reported 6-GHz values achieved in the lab at customer temperatures and voltages.

3 citations


Cites background or methods from "A 14 nm 1.1 Mb Embedded DRAM Macro ..."

  • ...A key innovation enabling a scaled design without a scaled technology was the development of the 2-Mb eDRAM macro, which doubled the density of the z14 eDRAM solution [15]....

    [...]

  • ...in the z14 generation’s 1-Mb macro [15] from Fig....

    [...]

Book ChapterDOI
01 Jan 2018
TL;DR: This chapter is dedicated to comprehensively survey representative embedded flash-memory technologies from the memory-cell level to the system level, and the basic circuit-design techniques required in embedded flash hard macros under different design constraints from stand-alone flash memories.
Abstract: This chapter comprehensively surveys representative embedded flash-memory technologies from the memory-cell level to the system level. First, various types of embedded flash-memory cells are briefly overviewed in terms of cell structure, operation principle, and features such as characteristics and reliability. Then the basic circuit-design techniques required in embedded flash hard macros, whose design constraints differ from those of stand-alone flash memories, are presented. In addition, system-level design, which plays an important role in function enhancement to meet a wide range of requirements, is also covered. Finally, future prospects of eFlash-memory technologies are briefly summarized.

2 citations

References
Journal ArticleDOI
TL;DR: For both low-power and high-performance applications, DGCMOS-FinFET offers a most promising direction for continued progress in VLSI.
Abstract: Double-gate devices will enable the continuation of CMOS scaling after conventional scaling has stalled. DGCMOS/FinFET technology offers a tactical solution to the gate dielectric barrier and a strategic path for silicon scaling to the point where only atomic fluctuations halt further progress. The conventional nature of the processes required to fabricate these structures has enabled rapid experimental progress in just a few years. Fully integrated CMOS circuits have been demonstrated in a 180 nm foundry-compatible process, and methods for mapping conventional, planar CMOS product designs to FinFET have been developed. For both low-power and high-performance applications, DGCMOS-FinFET offers a most promising direction for continued progress in VLSI.

413 citations

Journal ArticleDOI
TL;DR: The seventh-generation POWER chip features a balanced multicore design, eDRAM technology, and SMT4, with greater than 4x performance in the same power envelope as the previous generation.
Abstract: The Power7 is IBM's first eight-core processor, with each core capable of four-way simultaneous-multithreading operation. Its key architectural features include an advanced memory hierarchy with three levels of on-chip cache; embedded-DRAM devices used in the highest level of the cache; and a new memory interface. This balanced multicore design scales from 1 to 32 sockets in commercial and scientific environments.

259 citations


"A 14 nm 1.1 Mb Embedded DRAM Macro ..." refers methods in this paper

  • ...IBM introduced trench capacitor eDRAM into its high performance microprocessors beginning with 45nm and Power 7 [1] to provide a higher density cache without chip crossings....

    [...]

Proceedings ArticleDOI
C-H. Lin, Brian J. Greene, Shreesh Narasimha, J. Cai, A. Bryant, Carl J. Radens, Vijay Narayanan, Barry Linder, Herbert L. Ho, A. Aiyar, E. Alptekin, J-J. An, Michael V. Aquilino, Ruqiang Bao, V. Basker, Nicolas Breil, MaryJane Brodsky, William Y. Chang, Clevenger Leigh Anne H, Dureseti Chidambarrao, Cathryn Christiansen, D. Conklin, C. DeWan, H. Dong, L. Economikos, Bernard A. Engel, Sunfei Fang, D. Ferrer, A. Friedman, Allen H. Gabor, Fernando Guarin, Ximeng Guan, M. Hasanuzzaman, J. Hong, D. Hoyos, Basanth Jagannathan, S. Jain, S.-J. Jeng, J. Johnson, B. Kannan, Y. Ke, Babar A. Khan, Byeong Y. Kim, Siyuranga O. Koswatta, Amit Kumar, T. Kwon, Unoh Kwon, L. Lanzerotti, H-K Lee, W-H. Lee, A. Levesque, Wai-kin Li, Zhengwen Li, Wei Liu, S. Mahajan, Kevin McStay, Hasan M. Nayfeh, W. Nicoll, G. Northrop, A. Ogino, Chengwen Pei, S. Polvino, Ravikumar Ramachandran, Z. Ren, Robert R. Robison, Saraf Iqbal Rashid, Viraj Y. Sardesai, S. Saudari, Dominic J. Schepis, Christopher D. Sheraw, Shariq Siddiqui, Liyang Song, Kenneth J. Stein, C. Tran, Henry K. Utomo, Reinaldo A. Vega, Geng Wang, Han Wang, W. Wang, X. Wang, D. Wehelle-Gamage, E. Woodard, Yongan Xu, Y. Yang, N. Zhan, Kai Zhao, C. Zhu, K. Boyd, E. Engbrecht, K. Henson, E. Kaste, Siddarth A. Krishnan, Edward P. Maciejewski, Huiling Shang, Noah Zamdmer, R. Divakaruni, J. Rice, Scott R. Stiffler, Paul D. Agnello
01 Dec 2014
TL;DR: In this article, the authors present a fully integrated 14nm CMOS technology featuring fin-FET architecture on an SOI substrate for a diverse set of SoC applications including HP server microprocessors and LP ASICs.
Abstract: We present a fully integrated 14 nm CMOS technology featuring finFET architecture on an SOI substrate for a diverse set of SoC applications including HP server microprocessors and LP ASICs. This SOI finFET architecture is integrated with a 4th generation deep trench embedded DRAM to provide an ultra-dense (0.0174 μm²) memory solution for industry leading ‘scale-out’ processor design. A broad range of Vts is enabled on chip through a unique dual workfunction process applied to both NFETs and PFETs. This enables simultaneous optimization of both lowVt (HP) and HiVt (LP) devices without reliance on problematic approaches like heavy doping or Lgate modulation to create Vt differentiation. The SOI finFET's excellent subthreshold behavior allows gate length scaling to the sub 20 nm regime and superior low Vdd operation. This leads to a substantial (>35%) performance gain for Vdd ∼0.8 V compared to the HP 22 nm planar predecessor technology. At the same time, the exceptional FE/BE reliability enables high Vdd (>1.1 V) operation essential to the high single thread performance for processors intended for ‘scale-up’ enterprise systems. A hierarchical BEOL with 15 levels of copper interconnect delivers both high performance wire-ability as well as effective power supply and clock distribution for very large >600 mm² SoCs.

137 citations


"A 14 nm 1.1 Mb Embedded DRAM Macro ..." refers methods in this paper

  • ...1(a), utilizes Replacement Metal Gate (RMG) SOI FinFET devices, resulting in a 33% shrink from the 22 nm bit cell [16], [17]....

    [...]

  • ...The pass transistor/access device of the cell is a 3.5 nm thick oxide, fully depleted (FD) FinFET with an undoped channel....

    [...]

  • ...This 22 nm design style has been successfully migrated into a 14 nm FinFET eDRAM [20] learning vehicle, complete with an ABIST engine, word-line charge pumps (VPP & VWL), and pad-cage interface circuitry as a system prototype....

    [...]

  • ...Hence, FD SOI FinFETs can achieve high write back current without increasing the LBL capacitance, which is a key advantage to previous eDRAM technologies employing planar pass transistors....

    [...]

  • ...By nature, SOI fully depleted FinFET devices, have low overall junction capacitance as the only junctions created in the SOI are in the horizontal plane of the device, i.e., between the body of the device and diffusion....

    [...]
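The 33% shrink quoted in the citation contexts above can be related to the 0.0174 μm² cell area with a one-line calculation. This is an illustrative sketch that assumes "33% shrink" means the 14 nm cell area is 33% smaller than the 22 nm cell area; the excerpt does not state the 22 nm area directly, so the result is only an implied value. The function name is hypothetical.

```python
CELL_AREA_14NM_UM2 = 0.0174  # from the abstract
SHRINK_FRACTION = 0.33       # from the citation context


def implied_previous_cell_area(current_area_um2, shrink_fraction):
    """Cell area before the shrink, assuming new = old * (1 - shrink)."""
    if not 0.0 <= shrink_fraction < 1.0:
        raise ValueError("shrink_fraction must be in [0, 1)")
    return current_area_um2 / (1.0 - shrink_fraction)


if __name__ == "__main__":
    area_22nm = implied_previous_cell_area(CELL_AREA_14NM_UM2, SHRINK_FRACTION)
    print(round(area_22nm, 4))  # about 0.026 (um^2) under this assumption
```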

Proceedings ArticleDOI
06 Mar 2014
TL;DR: This paper presents 14nm FinFET-based 128Mb 6T SRAM chips featuring low-VMIN with newly developed assist techniques, and presents peripheral-assist techniques required to overcome the bitcell challenges to high yield.
Abstract: With the explosive growth of battery-operated portable devices, the demand for low power and small size has been increasing for system-on-a-chip (SoC). The FinFET is considered as one of the most promising technologies for future low-power mobile applications because of its good scaling ability, high on-current, better SCE and subthreshold slope, and small leakage current [1]. As a key approach for low-power, supply-voltage (VDD) scaling has been widely used in SoC design. However, SRAM is the limiting factor of voltage-scaling, since all SRAM functions of read, write, and hold-stability are highly influenced by increased variations at low VDD, resulting in lower yield. In addition, the width-quantization property of FinFET device reduces the design window for transistor sizing, and increases the failure probability due to the un-optimized bitcell sizing [1]. In order to overcome the bitcell challenges to high yield, peripheral-assist techniques are required. In this paper, we present 14nm FinFET-based 128Mb 6T SRAM chips featuring low-VMIN with newly developed assist techniques.

113 citations

Proceedings ArticleDOI
01 Dec 2012
TL;DR: A hierarchical BEOL with 15 levels of copper interconnect including self-aligned via processing delivers high performance with exceptional reliability in SOI CMOS 22nm technology.
Abstract: We present a fully-integrated SOI CMOS 22nm technology for a diverse array of high-performance applications including server microprocessors, memory controllers and ASICs. A pre-doped substrate enables scaling of this third generation of SOI deep-trench-based embedded DRAM for a dense high-performance memory hierarchy. Dual-Embedded stressor technology including SiGe and Si:C for improved carrier mobility in both PMOS and NMOS FETs is presented for the first time. A hierarchical BEOL with 15 levels of copper interconnect including self-aligned via processing delivers high performance with exceptional reliability.

92 citations
