Journal Article

A 45 nm SOI Embedded DRAM Macro for the POWER™ Processor 32 MByte On-Chip L3 Cache

TL;DR: A 1.35 ns random-access, 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor; a programmable BL voltage generator lets the embedded DRAM operate reliably without constraining the microprocessor voltage supply windows.
Abstract: A 1.35 ns random-access, 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The macro employs a 6-transistor micro sense-amplifier architecture with an extended precharge scheme to enhance the sensing margin for product quality. A detailed study shows a 67% bit-line power reduction with only 1.7% area overhead, while improving the read-zero margin by more than 500 ps. The array voltage window is improved by the programmable BL voltage generator, allowing the embedded DRAM to operate reliably without constraining the microprocessor voltage supply windows. The 2.5 nm gate-oxide transistor cell with deep-trench capacitor is accessed by the 1.7 V wordline high voltage (VPP) with a WL low voltage (VWL), and both are generated internally within the microprocessor. This results in a 32 MB on-chip L3 cache for 8 cores in a 567 mm² POWER7™ die.
Citations
Journal Article
TL;DR: The studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material in order to overcome the poor write performance scaling from 22 nm node onwards.
Abstract: This paper explores the scalability of in-plane and perpendicular MTJ based STT-MRAMs from 65 nm to 8 nm while taking into consideration realistic variability effects. We focus on the read and write performance of an STT-MRAM based cache rather than the obvious advantages such as the denser bit-cell and zero static power. An accurate MTJ macromodel capturing key MTJ properties was adopted for efficient Monte Carlo simulations. For the simulation of access devices and peripheral circuitry, ITRS projected transistor parameters were utilized and calibrated using the MASTAR tool that has been widely used in industry. 6T SRAM and STT-MRAM arrays were implemented with aggressive assist schemes to mimic industrial memory designs. A constant JC0·RA/VDD scaling scenario was used, which to first order gives the optimal balance between read and write margins of STT-MRAMs. The thermal stability factor ensuring a 10 year retention time was obtained by adjusting the free layer thickness as well as assuming improvement in the crystalline anisotropy. Our studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material in order to overcome the poor write performance scaling from 22 nm node onwards.

322 citations

Proceedings Article
09 Mar 2015
TL;DR: DESTINY is presented, a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM), and has been validated against industrial cache prototypes.
Abstract: The continuous drive for performance has pushed researchers to explore novel memory technologies (e.g. non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) in the design of caches. However, a comprehensive tool which models both conventional and emerging memory technologies for both 2D and 3D designs has been lacking. We present DESTINY, a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM). DESTINY facilitates design-space exploration across several dimensions, such as optimizing for a target (e.g. latency or area) for a given memory technology, or choosing the suitable memory technology or fabrication method (i.e. 2D vs. 3D) for a desired optimization target. DESTINY has been validated against industrial cache prototypes. We believe that DESTINY will drive architecture and system-level studies and will be useful for researchers and designers.

142 citations


Cites methods from "A 45 nm SOI Embedded DRAM Macro for..."

  • ...DESTINY framework utilizes the 2D circuit-level model of NVSim, which was extended to model 2D eDRAM and 3D design of SRAM, eDRAM and monolithic NVMs....

  • ...Since NVSim does not model banks, we only compare against the smallest cache size....

  • ...NVSim provides an incomplete eDRAM model which has also not been validated against any prototype....

  • ...DESTINY utilizes the 2D circuit-level modeling framework of NVSim for SRAM and NVMs....

  • ...[2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), p. 1545] B. 2D and 3D eDRAM Validation. As stated before, the eDRAM model in NVSim is incomplete and has not been validated....

Journal Article
01 Jan 2015
TL;DR: An overview of the history and the current status of the various spintronic devices being pursued by the research community is provided, and how spin-based components are integrated into a computing system and the advantages that result are described.
Abstract: As the end draws near for Moore's law, the search for low-power alternatives to complementary metal–oxide–semiconductor (CMOS) technology is intensifying. Among the various post-CMOS candidates, spintronic devices have gained special attention for their potential to overcome the power and performance limitations of CMOS. In particular, all spin logic (ASL) technology, which performs Boolean operations and transfers the output in the spin domain, has been proposed for enabling new capabilities—such as high density, low device count, and nonvolatility—that were previously impossible with CMOS technology. In this paper, first we provide an overview of the history and the current status of the various spintronic devices being pursued by the research community. Then, we describe how spin-based components are integrated into a computing system and the advantages that result. We use a hypothetical spintronic-based Intel Core i7 as a test vehicle to compare the system-level power requirements of ASL- and CMOS-based systems, taking into consideration the unique demands of spin-based interconnects. We conclude with a brief analysis of current limitations and future directions of spintronic research.

135 citations


Cites background from "A 45 nm SOI Embedded DRAM Macro for..."

  • ...The urgent need for low-power alternatives has led to a flurry of research activity on novel post-CMOS device technologies [8], [9]....

Proceedings Article
01 Dec 2011
TL;DR: In this paper, node-agnostic Cu TSVs integrated with high-K/metal gate and embedded DRAM were used in functional 3D modules. Thermal cycling and stress results show no degradation of TSV or BEOL structures, and device and functional data indicate that there is no significant impact from TSV processing and/or proximity.
Abstract: Node-agnostic Cu TSVs integrated with high-K/metal gate and embedded DRAM were used in functional 3D modules. Thermal cycling and stress results show no degradation of TSV or BEOL structures, and device and functional data indicate that there is no significant impact from TSV processing and/or proximity.

108 citations

Journal Article
TL;DR: Circuit techniques for enhancing the retention time and random cycle of logic-compatible embedded DRAMs (eDRAMs) are presented and a half-swing write bit-line (WBL) scheme is adopted to improve the WBL speed and reduce its power dissipation during write-back operation.
Abstract: Circuit techniques for enhancing the retention time and random cycle of logic-compatible embedded DRAMs (eDRAMs) are presented. An asymmetric 2T gain cell utilizes the gate and junction leakages of a PMOS write device to maintain a high data '1' voltage level which enables fast read access using an NMOS read device. A current-mode sense amplifier (C-S/A) featuring a cross-coupled PMOS latch and pseudo-PMOS diode pairs is proposed to overcome the innate problem of small read bit-line (RBL) voltage swing in 2T eDRAMs with improved voltage headroom and better impedance matching under process-voltage-temperature (PVT) variations. A half-swing write bit-line (WBL) scheme is adopted to improve the WBL speed by 33% and reduce its power dissipation by 25% during write-back operation with no effect on retention time. A stepped write word-line (WWL) driver reduces the current drawn from the boosted high and low supplies by 67%. A 192 kb eDRAM test chip with 512 cells-per-BL implemented in a 65 nm low-power (LP) CMOS process shows a random cycle frequency and latency of 667 MHz and 1.65 ns, respectively, at 1.1 V and 85 °C. The measured refresh period at a 99.9% bit yield condition was 110 μs, which is comparable to that of recently published 1T1C eDRAM designs.

99 citations


Cites result from "A 45 nm SOI Embedded DRAM Macro for..."

  • ...Therefore, the static power dissipation of gain cell eDRAM including both leakage power and refresh power components can be smaller than that of an SRAM and similar to that of a 1T1C eDRAM [6], [9]....

  • ...embedded DRAM (eDRAM) technology [4], [9]....

References
Journal Article
29 May 2009
TL;DR: This paper describes a 2.3-billion-transistor, 8-core, 16-thread, 64-bit Xeon® EX processor with a 24 MB shared L3 cache implemented in a 45 nm nine-metal process; core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die.
Abstract: This paper describes a 2.3-billion-transistor, 8-core, 16-thread, 64-bit Xeon® EX processor with a 24 MB shared L3 cache implemented in a 45 nm nine-metal process. Multiple clock and voltage domains are used to reduce power consumption. Long channel devices and cache sleep mode are used to minimize leakage. Core and cache recovery improve manufacturing yields and enable multiple product flavors from the same silicon die. The disabled blocks are both clock and power gated to minimize their power consumption. Idle power is reduced by shutting off the unterminated I/O links and shedding phases in the voltage regulator to improve the power conversion efficiency.

126 citations

Proceedings Article
18 Jun 2007
TL;DR: The POWER6™ microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability.
Abstract: The POWER6™ microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability. The 341 mm², 700M-transistor dual-core microprocessor is fabricated in a 65 nm SOI process with 10 levels of low-k copper interconnect. It operates at clock frequencies over 5 GHz in high-performance applications, and consumes under 100 W in power-sensitive applications.

120 citations

Proceedings Article
18 Mar 2010
TL;DR: POWER7™, the next processor of the POWER™ family, integrates eight quad-threaded cores with two memory controllers and high-speed system links on a 567 mm² die in 45 nm CMOS SOI technology, with a 32 MB embedded DRAM L3 built from deep-trench capacitors.
Abstract: The next processor of the POWER™ family, called POWER7™, is introduced. Eight quad-threaded cores are integrated together with two memory controllers and high-speed system links on a 567 mm² die, employing 1.2B transistors in 45 nm CMOS SOI technology [4]. High on-chip performance, and therefore bandwidth, is achieved using 11 layers of low-k copper wiring and devices with enhanced dual-stress liners. The technology features deep trench (DT) capacitors that are used to build the 32 MB embedded DRAM L3 based on a 0.067 µm² DRAM cell. DT capacitors are also used to reduce on-chip voltage-island supply noise. Focusing on speed, the dual-supply ripple-domino SRAM concept follows the schemes described elsewhere.

79 citations


"A 45 nm SOI Embedded DRAM Macro for..." refers methods in this paper

  • ...7 ns random cycle embedded DRAM macro [9] developed for the POWER7™ processor [10] in 45 nm SOI CMOS technology....

Journal Article
S.S. Iyer, John E. Barth, Paul C. Parries, James P. Norum, J. P. Rice, L. R. Logan, D. Hoyniak
TL;DR: The salient features of this 130-nm complementary metal oxide semiconductor technology, including the IBM unique embedded dynamic random access memory (DRAM) technology, are outlined.
Abstract: The Blue Gene®/L chip is a technological tour de force that embodies the system-on-a-chip concept in its entirety. This paper outlines the salient features of this 130-nm complementary metal oxide semiconductor (CMOS) technology, including the IBM unique embedded dynamic random access memory (DRAM) technology. Crucial to the execution of Blue Gene/L is the simultaneous instantiation of multiple PowerPC® cores, high-performance static random access memory (SRAM), DRAM, and several other logic design blocks on a single-platform technology. The IBM embedded DRAM platform allows this seamless integration without compromising performance, reliability, or yield. We discuss the process architecture, the key parameters of the logic components used in the processor cores and other logic design blocks, the SRAM features used in the L2 cache, and the embedded DRAM that forms the L3 cache. We also discuss the evolution of embedded DRAM technology into a higher-performance space in the 90-nm and 65-nm nodes and the potential for dynamic memory to improve overall memory subsystem performance.

77 citations

Proceedings Article
01 Jan 2008
TL;DR: In this article, the authors describe a 500 MHz random cycle Silicon on Insulator (SOI) embedded DRAM macro which features a three-transistor micro sense amplifier, realizing significant performance gains over traditional array design methods.
Abstract: As microprocessors enter the highly multi-core/multi-threaded era, higher density, lower latency embedded memory will be required to meet cache design needs. This paper describes a 500 MHz random cycle Silicon on Insulator (SOI) embedded DRAM macro which features a three-transistor micro sense amplifier, realizing significant performance gains over traditional array design methods. To address the realities of process integration, we describe the features and issues associated with integrating this DRAM into SOI technology, including deep trench processing and floating body effects. After a brief description of the macro architecture, details are provided on the three-transistor micro sense amplifier scheme, which is key to achieving a high transfer ratio with minimal area overhead. The paper concludes with hardware results and a summary.

62 citations