Abraham Mathews

Bio: Abraham Mathews is an academic researcher at IBM who has contributed to research on eDRAM and DRAM. The author has an h-index of 8 and has co-authored 18 publications that have received 266 citations.

Topics: eDRAM, DRAM, Logic gate, Clock signal, Synchronous circuit, ...

Papers

Journal Article

A 45 nm SOI Embedded DRAM Macro for the POWER™ Processor 32 MByte On-Chip L3 Cache

John E. Barth¹, Donald W. Plass¹, Erik A. Nelson¹, Chorng-Lii Hwang¹, Gregory J. Fredeman¹, Michael A. Sperling¹, Abraham Mathews¹, T. Kirihata¹, William Robert Reohr¹, K Nair¹, Nianzheng Cao¹

IBM¹

01 Jan 2011-IEEE Journal of Solid-state Circuits

TL;DR: A 1.35 ns random-access, 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. A programmable bit-line voltage generator lets the embedded DRAM operate reliably without constraining the microprocessor voltage supply windows.

Abstract: A 1.35 ns random-access and 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The macro employs a 6-transistor micro sense-amplifier architecture with an extended precharge scheme to enhance the sensing margin for product quality. The detailed study shows a 67% bit-line power reduction with only 1.7% area overhead, while improving the read-zero margin by more than 500 ps. The array voltage window is improved by the programmable BL voltage generator, allowing the embedded DRAM to operate reliably without constraining the microprocessor voltage supply windows. The 2.5 nm gate oxide transistor cell with deep-trench capacitor is accessed by the 1.7 V wordline high voltage (VPP) with V WL low voltage (VWL), and both are generated internally within the microprocessor. This results in a 32 MB on-chip L3 cache for 8 cores in a 567 mm² POWER7™ die.

63 citations

Patent

Peak power reduction methods in distributed charge pump systems

Fadi H. Gebara¹, Jente B. Kuang¹, Abraham Mathews¹

IBM¹

05 May 2011

TL;DR: A distributed charge pump system uses a delay element and frequency dividers to generate out-of-phase pump clock signals that drive different charge pumps, offsetting the peak-current clock edges of each charge pump and thereby reducing overall peak power.

Abstract: A distributed charge pump system uses a delay element and frequency dividers to generate out-of-phase pump clock signals that drive different charge pumps, to offset peak current clock edges for each charge pump and thereby reduce overall peak power. Clock signal division and phase offset may be extended to multiple levels for further smoothing of the pump clock signal transitions. A dual frequency divider may be used which receives the clock signal and its complement, and generates two divided signals that are 90° out of phase. In an illustrative embodiment, the clock generator comprises a variable-frequency clock source, and a voltage regulator senses an output voltage of the charge pumps, generates a reference voltage based on a currently selected frequency of the variable-frequency clock source, and temporarily disables the charge pumps (by turning off local pump clocks) when the output voltage is greater than the reference voltage.

40 citations

Proceedings Article

A 45nm SOI embedded DRAM macro for POWER7™ 32MB on-chip L3 cache

John E. Barth¹, Don Plass¹, Erik A. Nelson¹, Charlie Hwang¹, Gregory J. Fredeman¹, Michael A. Sperling¹, Abraham Mathews¹, William Robert Reohr¹, Kavita Nair¹, Nianzheng Cao¹

IBM¹

18 Mar 2010

TL;DR: This high-performance DRAM macro is used to construct a large 32MB L3 cache on-chip, eliminating delay, area and power from the off-chip interface while improving system performance and reducing cost, power and soft error vulnerability.

Abstract: Logic-based embedded DRAM has matured into a wide range of ASIC applications, SRAM replacements [1] and off-chip caches for microprocessors [2]. While embedded DRAM has been leveraged in supercomputers such as IBM's BlueGene/L [3], its use has been limited to moderate-performance bulk logic technologies. Although prototypes have been demonstrated [4], DRAM has yet to be embedded on a high-performance microprocessor. This paper discloses an SOI DRAM macro implemented on-chip with the IBM POWER7™ high-performance microprocessor [5], and introduces enhancements to the micro sense amp (µSA) architecture [6]. This high-performance DRAM macro is used to construct a large 32MB L3 cache on-chip, eliminating delay, area and power from the off-chip interface, simultaneously improving system performance, reducing cost, power and soft error vulnerability. Figure 19.1.1a shows an SEM of the 45nm SOI DRAM Device and Deep Trench (DT) capacitor [7]. DT offers 25x more capacitance than planar structures and was also utilized to reduce on-chip voltage island supply noise.

39 citations

Patent

Switched-Capacitor Charge Pumps

Fadi H. Gebara¹, Jente B. Kuang¹, Abraham Mathews¹

IBM¹

12 May 2010

TL;DR: A switched-capacitor charge pump comprises a two-phase charging circuit, cross-coupled transistors connected to output nodes of the switched capacitors, and a pump output connected to source terminals of the cross-coupled transistors.

Abstract: A switched-capacitor charge pump comprises a two-phase charging circuit, cross-coupled transistors connected to output nodes of the switched capacitors, and a pump output connected to source terminals of the cross-coupled transistors. The charge pump has side transistors for boosting charge transfer, and gating logic of the side transistors includes level shifters which control connections to the pump output or a reference voltage. Negative and positive charge pump embodiments are provided. The charging circuit advantageously utilizes non-overlapping wide and narrow clock signals to generate multiple gating signals. The pump clock circuit preferably provides independent, programmable adjustment of the widths of the wide and narrow clock signals. An override mode can be provided using clamping circuits which shunt the pump output to the second nodes of the switched capacitors.

33 citations

Journal Article

A 1 MB Cache Subsystem Prototype With 1.8 ns Embedded DRAMs in 45 nm SOI CMOS

Peter Juergen Klim¹, John E. Barth¹, William Robert Reohr¹, David Dick¹, Gregory J. Fredeman¹, Gary Koch¹, Hien Minh Le¹, A. Khargonekar¹, Pamela Wilcox¹, John W. Golz¹, Jente B. Kuang¹, Abraham Mathews¹, Jethro C. Law¹, Trong V. Luong¹, Hung C. Ngo¹, R. Freese², Hillery C. Hunter¹, Erik A. Nelson¹, Paul C. Parries¹, Toshiaki Kirihata¹, Subramanian S. Iyer¹

IBM¹, Advanced Micro Devices²

24 Mar 2009-IEEE Journal of Solid-state Circuits

TL;DR: A single-voltage-supply, 1 MB cache subsystem prototype is described that integrates 2 GHz embedded DRAM (eDRAM) macros with on-chip word-line voltage supply generation, a 4 Kb one-time-programmable read-only memory (OTPROM) for redundancy and repair control, on-chip OTPROM programming voltage generation, and clock generation and distribution.

Abstract: We describe a single-voltage-supply, 1 MB cache subsystem prototype that integrates 2 GHz embedded DRAM (eDRAM) macros with on-chip word-line voltage supply generation, a 4 Kb one-time-programmable read-only memory (OTPROM) for redundancy and repair control, on-chip OTPROM programming voltage generation, clock generation and distribution, array built-in self-test circuitry (ABIST), user logic and pervasive logic. The eDRAM employs a programmable pipeline, achieving 1.8 ns latency, and features concurrent refresh capability.

25 citations

Cited by

Journal Article

A Scaling Roadmap and Performance Evaluation of In-Plane and Perpendicular MTJ Based STT-MRAMs for High-Density Cache Memory

Ki-Chul Chun¹, Hui Zhao¹, Jonathan Harms¹, Tae-Hyoung Kim², Jian-Ping Wang¹, Chris H. Kim¹

University of Minnesota¹, Nanyang Technological University²

01 Jan 2013-IEEE Journal of Solid-state Circuits

TL;DR: Studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from the 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material to overcome the poor write performance scaling from the 22 nm node onwards.

Abstract: This paper explores the scalability of in-plane and perpendicular MTJ based STT-MRAMs from 65 nm to 8 nm while taking into consideration realistic variability effects. We focus on the read and write performances of an STT-MRAM based cache rather than the obvious advantages such as the denser bit-cell and zero static power. An accurate MTJ macromodel capturing key MTJ properties was adopted for efficient Monte Carlo simulations. For the simulation of access devices and peripheral circuitries, ITRS projected transistor parameters were utilized and calibrated using the MASTAR tool that has been widely used in industry. 6T SRAM and STT-MRAM arrays were implemented with aggressive assist schemes to mimic industrial memory designs. A constant JC0·RA/VDD scaling scenario was used, which to first order gives the optimal balance between read and write margins of STT-MRAMs. The thermal stability factor ensuring a 10 year retention time was obtained by adjusting the free layer thickness as well as assuming improvement in the crystalline anisotropy. Our studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from the 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material in order to overcome the poor write performance scaling from the 22 nm node onwards.

322 citations

Proceedings Article

DRISA: a DRAM-based Reconfigurable In-Situ Accelerator

Shuangchen Li¹, Dimin Niu², Krishna T. Malladi², Hongzhong Zheng², Bob Brennan², Yuan Xie¹

University of California, Santa Barbara¹, Samsung²

14 Oct 2017

TL;DR: DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, is proposed to provide both powerful computing capability and large memory capacity/bandwidth, addressing the memory wall problem in traditional von Neumann architecture.

Abstract: Data movement between the processing units and the memory in traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory), have been studied. However, the first one has strong computing capability but limited memory capacity/bandwidth, whereas the second one is the exact opposite. To address the challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions with the combination of the functionally complete Boolean logic operations and the proposed hierarchical internal data movement designs. We further optimize DRISA to achieve high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA can achieve 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations.

CCS Concepts: Hardware → Dynamic memory; Computer systems organization → Reconfigurable computing; Neural networks.

315 citations

Proceedings Article

DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches

Matt Poremba¹, Sparsh Mittal², Dong Li², Jeffrey S. Vetter³, Yuan Xie⁴

Pennsylvania State University¹, Oak Ridge National Laboratory², Georgia Institute of Technology³, University of California, Santa Barbara⁴

09 Mar 2015

TL;DR: DESTINY is a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM); it has been validated against industrial cache prototypes.

Abstract: The continuous drive for performance has pushed researchers to explore novel memory technologies (e.g., non-volatile memory) and novel fabrication approaches (e.g., 3D stacking) in the design of caches. However, a comprehensive tool which models both conventional and emerging memory technologies for both 2D and 3D designs has been lacking. We present DESTINY, a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM). DESTINY facilitates design-space exploration across several dimensions, such as optimizing for a target (e.g., latency or area) for a given memory technology, or choosing the suitable memory technology or fabrication method (i.e., 2D vs. 3D) for a desired optimization target. DESTINY has been validated against industrial cache prototypes. We believe that DESTINY will drive architecture and system-level studies and will be useful for researchers and designers.

142 citations

Journal Article

Spin-Based Computing: Device Concepts, Current Status, and a Case Study on a High-Performance Microprocessor

Jongyeon Kim¹, Ayan Paul¹, Paul A. Crowell¹, Steven J. Koester¹, Sachin S. Sapatnekar¹, Jian-Ping Wang¹, Chris H. Kim¹

University of Minnesota¹

01 Jan 2015

TL;DR: This paper provides an overview of the history and current status of the various spintronic devices being pursued by the research community, and describes how spin-based components are integrated into a computing system and the advantages that result.

Abstract: As the end draws near for Moore's law, the search for low-power alternatives to complementary metal–oxide–semiconductor (CMOS) technology is intensifying. Among the various post-CMOS candidates, spintronic devices have gained special attention for their potential to overcome the power and performance limitations of CMOS. In particular, all-spin logic (ASL) technology, which performs Boolean operations and transfers the output in the spin domain, has been proposed for enabling new capabilities, such as high density, low device count, and nonvolatility, that were previously impossible with CMOS technology. In this paper, we first provide an overview of the history and the current status of the various spintronic devices being pursued by the research community. Then, we describe how spin-based components are integrated into a computing system and the advantages that result. We use a hypothetical spintronic-based Intel Core i7 as a test vehicle to compare the system-level power requirements of ASL- and CMOS-based systems, taking into consideration the unique demands of spin-based interconnects. We conclude with a brief analysis of current limitations and future directions of spintronic research.

135 citations

Patent

Semiconductor memory device

Tomoharu Tanaka¹, Hiroshi Nakamura¹, Toru Tanzawa¹

Toshiba¹

12 Mar 2008

TL;DR: A first voltage is applied to the gate of a transistor connected between the bitline and the read circuit while the bitline is precharged, and a different, second voltage is applied to that gate when the read circuit senses a change in the bitline voltage.

Abstract: A semiconductor memory device comprises memory cells, a bitline connected to the memory cells, a read circuit including a precharge circuit, and a first transistor connected between the bitline and the read circuit, wherein a first voltage is applied to a gate of the first transistor when the precharge circuit precharges the bitline, and a second voltage which is different from the first voltage is applied to the gate of the first transistor when the read circuit senses a change in a voltage of the bitline.

119 citations
