Author

Kenneth J. Reyer

Bio: Kenneth J. Reyer is an academic researcher from IBM. The author has contributed to research in topics: Clock signal & Sense amplifier. The author has an h-index of 6 and has co-authored 18 publications receiving 91 citations.

Papers
Patent
09 Feb 2005
TL;DR: In this article, an apparatus for implementing multiple memory column redundancy within an individual memory array includes a plurality of memory array elements internally partitioned into at least a pair of subcolumn elements.
Abstract: An apparatus for implementing multiple memory column redundancy within an individual memory array includes a plurality of memory array elements internally partitioned into at least a pair of subcolumn elements. At least one spare memory element is configured at a size corresponding to one of the subcolumn elements. An input redundancy multiplexing stage and an output redundancy multiplexing stage are configured for steering around one or more defective memory array elements, and an input bit decoding stage and an output bit decoding stage are configured for implementing an additional, external multiplexing stage with respect to the input redundancy multiplexing stage and the output redundancy multiplexing stage.
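As a rough behavioral illustration of the steering idea (a sketch only, not the patented circuitry; all names are hypothetical), the Python model below splits each array element into subcolumns, adds one spare of subcolumn size, and routes data around a marked-defective subcolumn through an input mux and an output mux:

```python
# Behavioral sketch (hypothetical names, not the patented circuit):
# steer around one defective subcolumn using one spare of subcolumn
# size, via an input redundancy mux and an output redundancy mux.

def lane_map(n, defect):
    """Logical subcolumn i -> physical slot; lanes at or past the
    defect shift one slot toward the spare (physical slot n)."""
    if defect is None:
        return list(range(n))
    return [i if i < defect else i + 1 for i in range(n)]

def write_lanes(bits, defect):
    phys = [None] * (len(bits) + 1)        # one extra slot: the spare
    for logical, p in enumerate(lane_map(len(bits), defect)):
        phys[p] = bits[logical]            # input redundancy mux
    return phys

def read_lanes(phys, n, defect):
    return [phys[p] for p in lane_map(n, defect)]  # output redundancy mux

data = [1, 0, 1, 1]                        # four logical subcolumns
phys = write_lanes(data, defect=1)         # subcolumn 1 marked defective
assert read_lanes(phys, len(data), defect=1) == data
```

Because the spare is sized to a subcolumn rather than a full column, a single spare element can repair a defect in either half of an array element.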

18 citations

Journal ArticleDOI
TL;DR: A 1.1 Mb embedded DRAM macro (eDRAM) for next-generation IBM SOI processors employs 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell; a gated-feedback sense amplifier enables a high voltage gain of a power-gated inverter at mid-level input voltage.
Abstract: A 1.1 Mb embedded DRAM macro (eDRAM), for next-generation IBM SOI processors, employs 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell. A gated-feedback sense amplifier enables a high voltage gain of a power-gated inverter at mid-level input voltage, while supporting 66 cells per local bit-line. A dynamic-and-gate-thin-oxide word-line driver that tracks standard logic process variation improves the eDRAM array performance with reduced area. The 1.1 Mb macro, composed of 8 × 2 72 Kb subarrays, is organized with a center interface block architecture, allowing 1 ns access latency and 1 ns bank interleaving operation using two banks, each having a 2 ns random access cycle. 5 GHz operation has been demonstrated in a system prototype, which includes 6 instances of 1.1 Mb eDRAM macros, integrated with an array built-in self-test engine, phase-locked loop (PLL), and word-line high and word-line low voltage generators. The advantage of the 14 nm FinFET array over the 22 nm array was confirmed using direct tester control of the 1.1 Mb eDRAM macros integrated in a 16 Mb inline monitor.

18 citations

Patent
14 Jun 2007
TL;DR: In this paper, write control circuitry and a control method are provided for a memory array configured with multiple memory subarrays, in which each subarray write controller selectively enables a local write control signal to its associated memory subarray.
Abstract: Write control circuitry and a control method are provided for a memory array configured with multiple memory subarrays. The write control circuitry includes multiple subarray write controllers associated with the multiple memory subarrays, each subarray write controller selectively enabling a local write control signal to its associated memory subarray. The selective enabling is responsive to a received subarray select signal, wherein only one subarray select signal is active at a time. At least some subarray write controllers are powered at least in part via a switched power node, wherein powering of the switched power node is distributively implemented among the subarray write controllers. In one example, the distributively implemented powering of the switched power node is accomplished via multiple inverters distributed among the subarray write controllers, each inverter having an output coupled to the switched power node and an input coupled to receive a global write enable signal.
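A minimal Python sketch of the distributed-powering idea (hypothetical names and logic-level behavior only; the real design is transistor-level): each controller contributes an inverter drive to the shared switched power node, and a one-hot subarray select gates the local write control:

```python
# Behavioral sketch (hypothetical, logic-level only): the switched
# power node is driven in parallel by one inverter per subarray write
# controller, each fed by the global write enable.

def switched_power_node(global_write_enable, n_controllers):
    # every controller's inverter drives the same shared node, so the
    # node logically carries the complement of the global write enable
    drives = [not global_write_enable for _ in range(n_controllers)]
    assert all(d == drives[0] for d in drives)
    return drives[0]

def local_write_controls(global_write_enable, subarray_select):
    assert sum(subarray_select) <= 1       # one-hot: one subarray at a time
    node = switched_power_node(global_write_enable, len(subarray_select))
    # a controller asserts its local write only when powered and selected
    return [(not node) and sel for sel in subarray_select]

print(local_write_controls(True, [False, True, False, False]))
# [False, True, False, False]: only the selected subarray is written
```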

10 citations

Patent
09 Feb 2005
TL;DR: In this paper, a method and apparatus for implementing ABIST data compression and serialization for memory built-in self-test of SRAM with redundancy is presented, which includes providing detection signals asserted for one failing data out, two failing data outs, and greater than two failing data outs.
Abstract: A method and apparatus are provided for implementing ABIST data compression and serialization for memory built-in self-test of SRAM with redundancy. The method includes providing detection signals asserted for one failing data out, two failing data outs, and greater than two failing data outs. The method also includes individually encoding the failing bit position of each failing data out with a corresponding binary representation value. The method further includes serializing the results of the detection signals and the encoding, and transmitting the serialized results to a redundancy support register function on a single fail bus.
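The compression scheme can be sketched behaviorally. The Python below (field widths and ordering are assumptions for illustration; the abstract does not specify them) derives the three detection signals, binary-encodes up to two failing bit positions, and serializes everything into one bit stream for the single fail bus:

```python
# Behavioral sketch (field widths and order are assumptions): compress
# per-data-out fail flags into detection signals plus binary-encoded
# failing bit positions, then serialize onto a single fail bus.

def compress_fails(fail_flags):
    fails = [i for i, f in enumerate(fail_flags) if f]
    detect = {
        "one_fail": len(fails) == 1,
        "two_fails": len(fails) == 2,
        "many_fails": len(fails) > 2,
    }
    width = max(1, (len(fail_flags) - 1).bit_length())
    positions = [format(i, f"0{width}b") for i in fails[:2]]
    return detect, positions

def serialize(detect, positions):
    flag_bits = "".join("1" if v else "0" for v in detect.values())
    return flag_bits + "".join(positions)  # one stream for one fail bus

detect, pos = compress_fails([0, 0, 1, 0, 0, 1, 0, 0])
print(serialize(detect, pos))              # "010" + "010" + "101"
```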

9 citations

Patent
31 Oct 2013
TL;DR: In this paper, a single-ended input sense amplifier uses a pass device to couple the input local bit-line to a global bit-line evaluation node; the sense amplifier also includes a pair of cross-coupled inverters, the first of which has an input coupled directly to the global bit-line evaluation node.
Abstract: A single-ended input sense amplifier uses a pass device to couple the input local bit-line to a global bit-line evaluation node. The sense amplifier also includes a pair of cross-coupled inverters, the first of which has an input coupled directly to the global bit-line evaluation node. The output of the second inverter is selectively coupled to the global bit-line evaluation node in response to a control signal, so that when the pass device is active, the local bit-line charges or discharges the global bit-line evaluation node without being substantially affected by the state of the second inverter's output. When the control signal is in the other state, the cross-coupled inverters form a latch. An internal output control circuit of the second inverter interrupts the feedback provided by the second inverter in response to the control signal.
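A digital-level Python sketch of the two phases (an idealized model, not the analog circuit): while the control signal interrupts the second inverter's feedback, the node simply follows the local bit-line through the pass device; once the control flips, the inverter pair latches the sampled value:

```python
# Digital-level sketch (idealized; the real design is analog): sample
# while feedback is interrupted, then latch when the control flips.

class SenseAmp:
    def __init__(self):
        self.node = 0                      # global bit-line evaluation node

    def sample(self, local_bit_line):
        # pass device on, second inverter's feedback interrupted:
        # the node follows the local bit-line unopposed
        self.node = local_bit_line
        return self.node

    def latch(self):
        # control flipped: the cross-coupled pair closes into a latch
        inv1 = 1 - self.node               # first inverter inverts the node
        inv2 = 1 - inv1                    # second inverter drives node back
        self.node = inv2                   # stable: the value is held
        return self.node

sa = SenseAmp()
sa.sample(local_bit_line=1)
assert sa.latch() == 1                     # sampled value is retained
```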

7 citations


Cited by
Proceedings ArticleDOI
14 Oct 2017
TL;DR: DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, is proposed to provide both powerful computing capability and large memory capacity/bandwidth to address the memory wall problem in traditional von Neumann architecture.
Abstract: Data movement between the processing units and the memory in traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches have been studied: the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory). However, the first has strong computing capability but limited memory capacity/bandwidth, whereas the second is exactly the opposite. To address this challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions by combining the functionally complete Boolean logic operations with the proposed hierarchical internal data movement designs. We further optimize DRISA to achieve high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA can achieve 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations.
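To see why bitline-level NOR is sufficient, the short Python sketch below (illustrative only, operating on integers standing in for DRAM rows) composes NOT, OR, and AND from the single NOR primitive:

```python
# Illustrative only (not DRISA's circuits): NOR is functionally
# complete, so NOT, OR, and AND all reduce to the bitline primitive.

MASK = 0xFF                                # 8-bit rows for the demo

def nor(a, b):
    return ~(a | b) & MASK                 # the in-situ primitive

def not_(a):
    return nor(a, a)

def or_(a, b):
    return not_(nor(a, b))

def and_(a, b):
    return nor(not_(a), not_(b))

a, b = 0b10101010, 0b11001100
assert and_(a, b) == a & b
assert or_(a, b) == a | b
assert not_(a) == ~a & MASK
```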

315 citations

Patent
12 Mar 2008
TL;DR: In this paper, a first voltage is applied to the gate of a transistor between the bitline and the read circuit while the bitline is precharged, and a different voltage is applied to that gate when the read circuit senses a change in the bitline voltage.
Abstract: A semiconductor memory device comprises memory cells, a bitline connected to the memory cells, a read circuit including a precharge circuit, and a first transistor connected between the bitline and the read circuit, wherein a first voltage is applied to a gate of the first transistor when the precharge circuit precharges the bitline, and a second voltage which is different from the first voltage is applied to the gate of the first transistor when the read circuit senses a change in a voltage of the bitline.
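A behavioral Python sketch of the two-phase gate biasing (all voltage values below are illustrative assumptions, not taken from the patent):

```python
# Behavioral sketch; all voltage values are assumptions for
# illustration, not from the patent.

VDD = 1.0
GATE_V = {"precharge": 1.2,  # assumed: device fully on for fast precharge
          "sense": 0.8}      # assumed: different bias while sensing

def read_bit(cell_pulls_low):
    assert GATE_V["precharge"] != GATE_V["sense"]  # the two-voltage scheme
    bitline = VDD                                  # precharge phase
    if cell_pulls_low:                             # sense phase
        bitline -= 0.3                             # cell discharges the bitline
    return 0 if bitline < VDD else 1

assert read_bit(True) == 0 and read_bit(False) == 1
```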

119 citations

Patent
27 Jan 2011
TL;DR: In this paper, a row address register stores a row address, a row decoder asserts a signal on the row's wordline after a column address is received, and a column decoder selectively activates a portion of the row based on the column address and the asserted wordline signal.
Abstract: A disclosed example apparatus includes a row address register (412) to store a row address corresponding to a row (608) in a memory array (602). The example apparatus also includes a row decoder (604) coupled to the row address register to assert a signal on a wordline (704) of the row after the memory receives a column address. In addition, the example apparatus includes a column decoder (606) to selectively activate a portion of the row based on the column address and the signal asserted on the wordline.
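A behavioral Python sketch of the access sequence (segment sizes are hypothetical): the row address is registered first, and the wordline is asserted only after the column address arrives, over just the row segment the column decoder selects:

```python
# Behavioral sketch (segment sizes are hypothetical): register the row
# address, then activate only one segment of the row once the column
# address is known.

COLS_PER_SEGMENT = 16

class PartialActivator:
    def __init__(self):
        self.row_address_register = None   # the register (412) in the abstract

    def post_row(self, row):
        self.row_address_register = row    # stored; wordline not yet asserted

    def access(self, col):
        segment = col // COLS_PER_SEGMENT  # column decoder picks the segment
        # wordline asserted only now, over a single segment of the row
        return self.row_address_register, segment

p = PartialActivator()
p.post_row(0x2A)
print(p.access(37))                        # (42, 2): row 42, segment 2 only
```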

66 citations

Proceedings ArticleDOI
13 Feb 2021
TL;DR: In this paper, the intrinsic charge-sharing operation during a dynamic memory access is used to perform analog CIM computations by reconfiguring existing eDRAM columns as charge-domain circuits, greatly minimizing peripheral circuit area and power overhead.
Abstract: The unprecedented growth in deep neural network (DNN) size has led to massive amounts of data movement from off-chip memory to on-chip processing cores in modern machine learning (ML) accelerators. Compute-in-memory (CIM) designs performing analog DNN computations within a memory array, along with peripheral mixed-signal circuits, are being explored to mitigate this memory-wall bottleneck of memory latency and energy overhead. Embedded dynamic random-access memory (eDRAM) [1], [2], which integrates the 1T1C (T = transistor, C = capacitor) DRAM bitcell monolithically along with high-performance logic transistors and interconnects, can enable custom CIM designs. It offers the densest embedded bitcell, low pJ/bit access energy, a low soft error rate, high endurance, high performance, and high bandwidth: all desired attributes for ML accelerators. In addition, the intrinsic charge-sharing operation during a dynamic memory access can be used effectively to perform analog CIM computations by reconfiguring existing eDRAM columns as charge-domain circuits, thus greatly minimizing peripheral circuit area and power overhead. Configuring a part of eDRAM as a CIM engine (for data conversion, DNN computations, and weight storage) and retaining the remaining part as a regular memory (for inputs, gradients during training, and non-CIM workload data) can help meet the layer/kernel-dependent variable storage needs during a DNN inference/training step. Thus, the high cost/bit of eDRAM can be amortized by repurposing part of the existing large-capacity, level-4 eDRAM caches [7] in high-end microprocessors into large-scale CIM engines.
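The charge-sharing primitive is easy to model in the ideal case. In the Python sketch below (an idealized model, not the paper's circuit), shorting equal cell capacitors onto one line settles at the mean of their voltages, so the resulting level encodes the popcount of the stored bits, the core of an analog accumulate:

```python
# Idealized model (not the paper's circuit): equal capacitors shorted
# together settle at the mean of their initial voltages, so a column
# dump encodes the popcount of the stored bits as an analog level.

VDD = 1.0

def charge_share(cell_bits, c_cell=1.0):
    charges = [b * VDD * c_cell for b in cell_bits]   # Q = C * V per cell
    return sum(charges) / (c_cell * len(cell_bits))   # V = Q_total / C_total

bits = [1, 0, 1, 1, 0, 1, 0, 1]
v = charge_share(bits)
print(v, round(v * len(bits) / VDD))       # 0.625 and 5: the popcount
```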

58 citations

Patent
Huilong Zhu
28 Feb 2008
TL;DR: In this article, the annular gate electrode and non-annular contact via are formed sequentially in a self-aligned fashion while using a single sacrificial mandrel layer.
Abstract: A semiconductor structure and a method for fabricating the semiconductor structure include or provide a field effect device that includes a spacer shaped contact via. The spacer shaped contact via preferably comprises a spacer shaped annular contact via that is located surrounding and separated from an annular spacer shaped gate electrode at the center of which may be located a non-annular and non-spacer shaped second contact via. The annular gate electrode as well as the annular contact via and the non-annular contact via may be formed sequentially in a self-aligned fashion while using a single sacrificial mandrel layer.

58 citations