Author

Gregory J. Fredeman

Bio: Gregory J. Fredeman is an academic researcher from IBM. The author has contributed to research in topics: DRAM & eDRAM. The author has an h-index of 14 and has co-authored 35 publications receiving 665 citations. Previous affiliations of Gregory J. Fredeman include GlobalFoundries.

Topics: DRAM, eDRAM, Cache, CPU cache, Sense amplifier

Papers

Proceedings Article•DOI•

A Compact eFUSE Programmable Array Memory for SOI CMOS

John M. Safran¹, Alan J. Leslie¹, Gregory J. Fredeman¹, Chandrasekharan Kothandaraman¹, Alberto Cestero¹, Xiang Chen¹, R. Rajeevakumar¹, Deok-kee Kim¹, Yan Zun Li¹, Dan Moy¹, Norman Robson¹, T. Kirihata¹, S. S. Iyer¹

IBM¹

14 Jun 2007

TL;DR: A compact eFUSE programmable array memory configured as a 4 Kb one-time programmable ROM (OTPROM) is presented, demonstrating a >10X density increase over traditional VLSI fuse circuits.

Abstract: Demonstrating a >10X density increase over traditional VLSI fuse circuits, a compact eFUSE programmable array memory configured as a 4 Kb one-time programmable ROM (OTPROM) is presented using a 6.2 μm² NiSiₓ silicide electromigration 1T1R cell in 65 nm SOI CMOS. A 20 μs programming time at 1.5 V is achieved by asymmetrical scaling of the fuse and a shared differential sensing scheme. Having zero process cost adder, eFUSE is fully compatible with standard VLSI manufacturing.

92 citations

Journal Article•DOI•

A 45 nm SOI Embedded DRAM Macro for the POWER™ Processor 32 MByte On-Chip L3 Cache

John E. Barth¹, Donald W. Plass¹, Erik A. Nelson¹, Chorng-Lii Hwang¹, Gregory J. Fredeman¹, Michael A. Sperling¹, Abraham Mathews¹, T. Kirihata¹, William Robert Reohr¹, K Nair¹, Nianzheng Caon¹

IBM¹

01 Jan 2011-IEEE Journal of Solid-state Circuits

TL;DR: A 1.35 ns random-access, 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor, allowing the embedded DRAM to operate reliably without constraining the microprocessor voltage supply windows.

Abstract: A 1.35 ns random-access, 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The macro employs a six-transistor micro sense-amplifier architecture with an extended precharge scheme to enhance the sensing margin for product quality. The detailed study shows a 67% bit-line power reduction with only 1.7% area overhead, while improving the read-zero margin by more than 500 ps. The array voltage window is improved by the programmable BL voltage generator, allowing the embedded DRAM to operate reliably without constraining the microprocessor voltage supply windows. The 2.5 nm gate-oxide transistor cell with deep-trench capacitor is accessed by the 1.7 V wordline high voltage (VPP) and a wordline low voltage (VWL), both generated internally within the microprocessor. This results in a 32 MB on-chip L3 cache for 8 cores in a 567 mm² POWER7™ die.

63 citations

Proceedings Article•

A 500 MHz Random Cycle, 1.5 ns Latency, SOI Embedded DRAM Macro Featuring a Three-Transistor Micro Sense Amplifier

John E. Barth, William Robert Reohr, Paul C. Parries, Gregory J. Fredeman, John W. Golz, Stanley E. Schuster, Richard E. Matick, Hillery C. Hunter, Charles C. Tanner, Joseph Harig, Hoki Kim, Babar A. Khan, John Griesemer, Robert P. Havreluk, Kenji Yanagisawa, Toshiaki Kirihata, Subramanian S. Iyer

01 Jan 2008

TL;DR: In this article, the authors describe a 500MHz random cycle Silicon on Insulator (SOI) embedded DRAM macro which features a three-transistor micro sense amplifier, realizing significant performance gains over traditional array design methods.

Abstract: As microprocessors enter the highly multi-core/multi-threaded era, higher density, lower latency embedded memory will be required to meet cache design needs. This paper describes a 500 MHz random cycle Silicon on Insulator (SOI) embedded DRAM macro which features a three-transistor micro sense amplifier, realizing significant performance gains over traditional array design methods. To address the realities of process integration, we describe the features and issues associated with integrating this DRAM into SOI technology, including deep trench processing and floating body effects. After a brief description of the macro architecture, details are provided on the three-transistor micro sense amplifier scheme, which is key to achieving a high transfer ratio with minimal area overhead. The paper concludes with hardware results and a summary.

62 citations

Journal Article•DOI•

An 800-MHz embedded DRAM with a concurrent refresh mode

T. Kirihata¹, Paul C. Parries¹, David R. Hanson¹, Hoki Kim¹, John W. Golz¹, Gregory J. Fredeman¹, R. Rajeevakumar¹, J. Griesemer¹, Norman Robson¹, Alberto Cestero¹, Babar A. Khan¹, Geng Wang¹, M. Wordeman¹, Subramanian S. Iyer¹

IBM¹

13 Sep 2004

TL;DR: An 800-MHz embedded DRAM macro employs a memory cell utilizing a device from the 90-nm high-performance technology menu, a 2.2-nm gate-oxide 1.5 V I/O device, and a concurrent refresh mode that improves the memory utilization to over 99% for a 64 μs data retention time.

Abstract: An 800-MHz embedded DRAM macro employs a memory cell utilizing a device from the 90-nm high-performance technology menu: a 2.2-nm gate-oxide 1.5 V I/O device. A concurrent refresh mode is designed to improve the memory utilization to over 99% for a 64 μs data retention time. A concurrent refresh scheduler utilizes up-count and down-count registers to identify at least one array to be refreshed at every clock cycle, emulating a classical distributed refresh mode. A command multiplier employs low-frequency phased clock signals to generate the clock, commands, and addresses at rates up to 4× that of the tester frequency. The macro integrates masked redundancy allocation logic during at-speed multibank test. The hardware results show a 312-MHz random access frequency and an 800-MHz multibank frequency at 1.2 V.

59 citations

Proceedings Article•DOI•

A 500MHz Random Cycle 1.5ns-Latency, SOI Embedded DRAM Macro Featuring a 3T Micro Sense Amplifier

John E. Barth¹, William Robert Reohr¹, Paul C. Parries¹, Gregory J. Fredeman¹, John W. Golz¹, Stanley E. Schuster¹, Richard E. Matick¹, Hillery C. Hunter¹, C. Tanner¹, J. Harig¹, Hyun-Chul Kim¹, Babar A. Khan¹, J. Griesemer¹, R.P. Havreluk¹, K. Yanagisawa¹, T. Kirihata¹, S. S. Iyer¹

IBM¹

18 Jun 2007

TL;DR: A prototype SOI embedded DRAM macro is developed for high-performance microprocessors and introduces a performance-enhancing 3T micro sense amplifier architecture (μSA); measurements confirm a 1.5 ns random access time with a 1 V supply at 85 °C and low-voltage operation with a 600 mV supply.

Abstract: A prototype SOI embedded DRAM macro is developed for high-performance microprocessors and introduces a performance-enhancing 3T micro sense amplifier architecture (μSA). The macro was characterized via a test chip fabricated in a 65 nm SOI deep-trench DRAM process. Measurements confirm a 1.5 ns random access time with a 1 V supply at 85 °C and low-voltage operation with a 600 mV supply.

59 citations

Cited by

Journal Article•DOI•

A Scaling Roadmap and Performance Evaluation of In-Plane and Perpendicular MTJ Based STT-MRAMs for High-Density Cache Memory

Ki-Chul Chun¹, Hui Zhao¹, Jonathan Harms¹, Tae-Hyoung Kim², Jian-Ping Wang¹, Chris H. Kim¹

University of Minnesota¹, Nanyang Technological University²

01 Jan 2013-IEEE Journal of Solid-state Circuits

TL;DR: The studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material in order to overcome the poor write performance scaling from 22 nm node onwards.

Abstract: This paper explores the scalability of in-plane and perpendicular MTJ based STT-MRAMs from 65 nm to 8 nm while taking into consideration realistic variability effects. We focus on the read and write performances of a STT-MRAM based cache rather than the obvious advantages such as the denser bit-cell and zero static power. An accurate MTJ macromodel capturing key MTJ properties was adopted for efficient Monte Carlo simulations. For the simulation of access devices and peripheral circuitries, ITRS projected transistor parameters were utilized and calibrated using the MASTAR tool that has been widely used in industry. 6T SRAM and STT-MRAM arrays were implemented with aggressive assist schemes to mimic industrial memory designs. A constant JC0·RA/VDD scaling scenario was used which to the first order gives the optimal balance between read and write margins of STT-MRAMs. The thermal stability factor ensuring a 10 year retention time was obtained by adjusting the free layer thickness as well as assuming improvement in the crystalline anisotropy. Our studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from 15 nm node, while its perpendicular counterpart requires further innovations in MTJ material in order to overcome the poor write performance scaling from 22 nm node onwards.

322 citations

Proceedings Article•DOI•

DRISA: a DRAM-based Reconfigurable In-Situ Accelerator

Shuangchen Li¹, Niu Dimin², Malladi Krishna T², Zheng Hongzhong², Bob Brennan², Yuan Xie¹

University of California, Santa Barbara¹, Samsung²

14 Oct 2017

TL;DR: DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, is proposed to provide both powerful computing capability and large memory capacity/bandwidth to address the memory wall problem in traditional von Neumann architecture.

Abstract: Data movement between the processing units and the memory in traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory), have been studied. However, the first one has strong computing capability but limited memory capacity/bandwidth, whereas the second one is the exact opposite. To address the challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions with the combination of the functionally complete Boolean logic operations and the proposed hierarchical internal data movement designs. We further optimize DRISA to achieve high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA can achieve 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations. CCS Concepts: Hardware → Dynamic memory; Computer systems organization → Reconfigurable computing; Neural networks.
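
The abstract's point that a functionally complete operation such as NOR is enough to compute other functions can be illustrated in software. The sketch below is not DRISA's design; it only shows, on plain Python integers standing in for row words, how NOT, OR, AND and XOR can be composed from NOR alone. The function names and the 8-bit default `width` are assumptions made for illustration.

```python
def nor(a: int, b: int, width: int = 8) -> int:
    """Bitwise NOR of two words, masked to `width` bits."""
    mask = (1 << width) - 1
    return ~(a | b) & mask


def not_(a: int, width: int = 8) -> int:
    # NOT a == a NOR a
    return nor(a, a, width)


def or_(a: int, b: int, width: int = 8) -> int:
    # a OR b == NOT (a NOR b)
    return not_(nor(a, b, width), width)


def and_(a: int, b: int, width: int = 8) -> int:
    # De Morgan: a AND b == (NOT a) NOR (NOT b)
    return nor(not_(a, width), not_(b, width), width)


def xor_(a: int, b: int, width: int = 8) -> int:
    # a XOR b == (a OR b) AND NOT (a AND b)
    return and_(or_(a, b, width), not_(and_(a, b, width), width), width)


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    print(bin(and_(a, b)), bin(or_(a, b)), bin(xor_(a, b)))
```

Every derived operation here costs several NOR steps, which is one reason an in-memory design would care about activation latency and internal data movement, as the abstract notes.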

315 citations

Proceedings Article•DOI•

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs

Mrinmoy Ghosh¹, Hsien-Hsin S. Lee¹

Georgia Institute of Technology¹

01 Dec 2007

TL;DR: The basic concept behind the scheme is that a DRAM row that was recently read or written to by the processor does not need to be refreshed again by the periodic refresh operation, thereby eliminating excessive refreshes and the energy dissipated.

Abstract: DRAMs require periodic refresh for preserving data stored in them. The refresh interval for DRAMs depends on the vendor and the design technology they use. For each refresh in a DRAM row, the stored information in each cell is read out and then written back to itself as each DRAM bit read is self-destructive. The refresh process is inevitable for maintaining data correctness, unfortunately, at the expense of power and bandwidth overhead. The future trend to integrate layers of 3D die-stacked DRAMs on top of a processor further exacerbates the situation as accesses to these DRAMs will be more frequent and hiding refresh cycles in the available slack becomes increasingly difficult. Moreover, due to the implication of temperature increase, the refresh interval of 3D die-stacked DRAMs will become shorter than those of conventional ones. This paper proposes an innovative scheme to alleviate the energy consumed in DRAMs. By employing a time-out counter for each memory row of a DRAM module, all the unnecessary periodic refresh operations can be eliminated. The basic concept behind our scheme is that a DRAM row that was recently read or written to by the processor (or other devices that share the same DRAM) does not need to be refreshed again by the periodic refresh operation, thereby eliminating excessive refreshes and the energy dissipated. Based on this concept, we propose a low-cost technique in the memory controller for DRAM power reduction. The simulation results show that our technique can reduce up to 86% of all refresh operations and 59.3% on the average for a 2GB DRAM. This in turn results in a 52.6% energy savings for refresh operations. The overall energy saving in the DRAM is up to 25.7% with an average of 12.13% obtained for SPLASH-2, SPECint2000, and Biobench benchmark programs simulated on a 2GB DRAM. For a 64MB 3D DRAM, the energy saving is up to 21% and 9.37% on an average when the refresh rate is 64 ms. For a faster 32ms refresh rate the maximum and average savings are 12% and 6.8% respectively.
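
The per-row time-out counter idea described in this abstract can be sketched as a toy model. This is a simplified illustration under stated assumptions, not the paper's memory-controller design: time is counted in abstract ticks, every counter starts at the full retention period, and the class name `SmartRefreshSketch` and its methods are hypothetical.

```python
class SmartRefreshSketch:
    """Toy model: one time-out counter per DRAM row (illustrative only)."""

    def __init__(self, num_rows: int, retention_ticks: int) -> None:
        self.retention_ticks = retention_ticks
        # Assumption: all rows start freshly charged.
        self.counters = [retention_ticks] * num_rows
        self.refreshes = 0

    def access(self, row: int) -> None:
        # A read or write restores the row's charge, so restart its time-out.
        self.counters[row] = self.retention_ticks

    def tick(self) -> None:
        # Advance time by one tick; refresh only rows whose time-out expired.
        for row in range(len(self.counters)):
            self.counters[row] -= 1
            if self.counters[row] == 0:
                self.refreshes += 1
                self.counters[row] = self.retention_ticks


def run_demo() -> int:
    # 4 rows, retention of 4 ticks, row 0 is accessed every 2 ticks.
    dram = SmartRefreshSketch(num_rows=4, retention_ticks=4)
    for t in range(8):
        if t % 2 == 0:
            dram.access(0)
        dram.tick()
    return dram.refreshes


if __name__ == "__main__":
    print(run_demo())
```

In this demo a purely periodic policy would refresh each of the 4 rows twice over 8 ticks (8 refreshes), while the sketch skips the frequently accessed row and performs 6.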

305 citations

Journal Article•DOI•

Power7: IBM's Next-Generation Server Processor

Ron Kalla¹, Balaram Sinharoy¹, William J. Starke¹, Michael Stephen Floyd¹

IBM¹

01 Mar 2010-IEEE Micro

TL;DR: Power7, the seventh-generation Power chip, has a balanced multicore design with eDRAM technology and SMT4, delivering greater than 4× the performance in the same power envelope as the previous generation.

Abstract: The Power7 is IBM's first eight-core processor, with each core capable of four-way simultaneous-multithreading operation. Its key architectural features include an advanced memory hierarchy with three levels of on-chip cache; embedded-DRAM devices used in the highest level of the cache; and a new memory interface. This balanced multicore design scales from 1 to 32 sockets in commercial and scientific environments.

259 citations

Proceedings Article•DOI•

Reducing cache power with low-cost, multi-bit error-correcting codes

Christopher B. Wilkerson¹, Alaa R. Alameldeen¹, Zeshan A. Chishti¹, Wei Wu¹, Dinesh Somasekhar¹, Shih-Lien Lu¹

Intel¹

19 Jun 2010

TL;DR: The significant impact of variations on refresh time and cache power consumption for large eDRAM caches is shown and Hi-ECC, a technique that incorporates multi-bit error-correcting codes to significantly reduce refresh rate, is proposed.

Abstract: Technology advancements have enabled the integration of large on-die embedded DRAM (eDRAM) caches. eDRAM is significantly denser than traditional SRAMs, but must be periodically refreshed to retain data. Like SRAM, eDRAM is susceptible to device variations, which play a role in determining refresh time for eDRAM cells. Refresh power potentially represents a large fraction of overall system power, particularly during low-power states when the CPU is idle. Future designs need to reduce cache power without incurring the high cost of flushing cache data when entering low-power states. In this paper, we show the significant impact of variations on refresh time and cache power consumption for large eDRAM caches. We propose Hi-ECC, a technique that incorporates multi-bit error-correcting codes to significantly reduce refresh rate. Multi-bit error-correcting codes usually have a complex decoder design and high storage cost. Hi-ECC avoids the decoder complexity by using strong ECC codes to identify and disable sections of the cache with multi-bit failures, while providing efficient single-bit error correction for the common case. Hi-ECC includes additional optimizations that allow us to amortize the storage cost of the code over large data words, providing the benefit of multi-bit correction at same storage cost as a single-bit error-correcting (SECDED) code (2% overhead). Our proposal achieves a 93% reduction in refresh power vs. a baseline eDRAM cache without error correcting capability, and a 66% reduction in refresh power vs. a system using SECDED codes.

231 citations
