Author

Toshiaki Kirihata

Bio: Toshiaki Kirihata is an academic researcher from GlobalFoundries. The author has contributed to research in topics: Transistor & Sense amplifier. The author has an h-index of 9 and has co-authored 29 publications receiving 229 citations.

Papers
Patent
31 May 2001
TL;DR: In this paper, a single bitline direct sensing architecture employs a 4 transistor sense amplifier circuit located in each memory array, wherein the transistors function to selectively transfer data bits from either a true bitline or a complement bitline of the bitline pair to a data line.
Abstract: A single bitline direct sensing architecture employs a 4 transistor sense amplifier circuit located in each memory array, wherein the transistors function to selectively transfer data bits from either a true bitline or a complement bitline of the bitline pair to a data line. The data line is preferably arranged over a plurality of memory arrays. The data line may or may not be shared for the read and write operations. One current source is additionally used to precharge the data lines in a read mode, performing the function of a digital sensing scheme by detecting a resistance ratio between the current source and the transistor driven by the bitline for the corresponding array. A simple inverter may be used for detecting a level of the data line determined by the resistance ratio. The bitline pair is sensed in a single-ended fashion, eliminating the need for a cross-coupled pair of CMOS devices, and thus reducing the required layout area. By accessing the bitline pair individually, two sets of control signals for the pre-charge, EQ0 and EQ1, are developed to allow for bitline shielding in the array. This technique greatly reduces bitline-to-bitline coupling noise, which is a concern for high speed, low cycle time memory applications. The simplicity of this single bitline architecture allows all data bits in a memory array to be transferred to the corresponding data lines, resulting in an ultimate bandwidth. The read and write data lines may be arranged with a pitch exactly the same as the bitline pitch. This makes it possible to transfer all of the data bits to the corresponding read data lines in a first memory array, while receiving all of the data bits from the corresponding write data lines in a second memory array.

28 citations

Patent
Hoki Kim, Toshiaki Kirihata, David R. Hanson, Gregory J. Fredeman, John Golz
15 Jul 2003
TL;DR: In a DRAM that includes a plurality of memory banks, each bank has a pair of separate flag bit registers that are shifted up and down, respectively.
Abstract: In a DRAM that includes a plurality of memory banks, each bank has a pair of separate flag bit registers, which are shifted up and down, respectively. A comparator for each bank provides a comparator output. An arbiter for each bank is connected to receive a flag bit up signal and a flag bit down signal from the flag bit registers for that bank and the comparator output from the comparator for that bank. The arbiters are connected to receive a conflict in signal and to provide a conflict out signal. The pair of flag bit registers represents a refresh status of each bank and designates memory banks or arrays that are ready for a refresh operation.

22 citations

Patent
25 Mar 2014
TL;DR: In this article, memory cells having a charge-trap behavior are arranged in a NOR-type memory array, allowing physically unclonable fuse (PUF) generation using non-programmed memory cells, while stringing non-volatile bits in programmed memory cells.
Abstract: A method for identifying an unclonable chip uses hardware intrinsic keys and authentication responses employing intrinsic parameters of memory cells invariant and unique to the unclonable chip, wherein intrinsic parameters that characterize the chip can extend over its lifetime. The memory cells having a charge-trap behavior are arranged in a NOR-type memory array, allowing physically unclonable fuse (PUF) generation using non-programmed memory cells, while stringing non-volatile bits in programmed memory cells. The non-volatile memory cell bits are used for error-correction code (ECC) for the generated PUF. The invention can further include a public identification using non-volatile bits, allowing handshaking authentication using a computer with a dynamic challenge.

22 citations

Journal Article
TL;DR: A 1.1 Mb embedded DRAM macro (eDRAM), for next-generation IBM SOI processors, employs 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell and a gated-feedback sense amplifier that enables a high voltage gain of a power-gated inverter at mid-level input voltage.
Abstract: A 1.1 Mb embedded DRAM macro (eDRAM), for next-generation IBM SOI processors, employs 14 nm FinFET logic technology with a 0.0174 μm² deep-trench capacitor cell. A gated-feedback sense amplifier enables a high voltage gain of a power-gated inverter at mid-level input voltage, while supporting 66 cells per local bit-line. A dynamic-and-gate-thin-oxide word-line driver that tracks standard logic process variation improves the eDRAM array performance with reduced area. The 1.1 Mb macro, composed of 8 × 2 72 Kb subarrays, is organized with a center interface block architecture, allowing 1 ns access latency and 1 ns bank interleaving operation using two banks, each having a 2 ns random access cycle. 5 GHz operation has been demonstrated in a system prototype, which includes 6 instances of 1.1 Mb eDRAM macros, integrated with an array-built-in-self-test engine, phase-locked loop (PLL), and word-line high and word-line low voltage generators. The advantage of the 14 nm FinFET array over the 22 nm array was confirmed using direct tester control of the 1.1 Mb eDRAM macros integrated in a 16 Mb inline monitor.
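The capacity and interleaving figures in this abstract can be checked with a little arithmetic. The sketch below is only a sanity check, assuming 1 Mb = 1024 Kb and that two banks alternate evenly; the variable names are illustrative and do not come from the paper.

```python
# Capacity: 8 x 2 subarrays of 72 Kb each (figures taken from the abstract).
SUBARRAY_KB = 72
SUBARRAY_COUNT = 8 * 2

total_kb = SUBARRAY_KB * SUBARRAY_COUNT   # total macro capacity in Kb
total_mb = total_kb / 1024                # assumption: 1 Mb = 1024 Kb

# Interleaving: two banks, each with a 2 ns random access cycle.
BANK_COUNT = 2
BANK_CYCLE_NS = 2.0

# Assumption: accesses alternate evenly between the banks, so a new
# access can start every (bank cycle / number of banks).
interleaved_cycle_ns = BANK_CYCLE_NS / BANK_COUNT

print(f"{total_kb} Kb = {total_mb} Mb, interleaved cycle = {interleaved_cycle_ns} ns")
```

Under these assumptions the macro holds 1152 Kb, which rounds to the quoted 1.1 Mb, and two interleaved 2 ns banks give the quoted 1 ns operation.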

18 citations

Patent
03 Jul 2001
TL;DR: In this article, an integrated redundancy eDRAM architecture for an embedded DRAM macro system having a wide data bandwidth and wide internal bus width is disclosed, which provides column and row redundancy for defective columns and rows of the eDRAM system.
Abstract: An integrated redundancy eDRAM architecture system for an embedded DRAM macro system having a wide data bandwidth and wide internal bus width is disclosed which provides column and row redundancy for defective columns and rows of the eDRAM macro system. Internally generated column and row addresses of defective columns and rows of each micro-cell block are stored in a memory device, such as a fuse bank, during an eDRAM macro test mode in order for the information to be quickly retrieved during each cycle of eDRAM operation to provide an SRAM-like operation. A column steering circuit steers column redundant elements to replace defective column elements. Redundancy information is either supplied from a SRAM fuse data storage device or from a TAG memory device depending on whether a read or write operation, respectively, is being performed. The integrated redundancy eDRAM architecture system enables data to be sent and received to and from the eDRAM macro system without adding any extra delay to the data flow, thereby protecting data flow pattern integrity.

17 citations


Cited by
Proceedings Article
14 Oct 2017
TL;DR: DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, is proposed to provide both powerful computing capability and large memory capacity/bandwidth to address the memory wall problem in traditional von Neumann architecture.
Abstract: Data movement between the processing units and the memory in traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory), have been studied. However, the first one has strong computing capability but limited memory capacity/bandwidth, whereas the second one is the exact opposite. To address the challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions with the combination of the functionally complete Boolean logic operations and the proposed hierarchical internal data movement designs. We further optimize DRISA to achieve high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA can achieve 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations. CCS Concepts: Hardware → Dynamic memory; Computer systems organization → Reconfigurable computing; Neural networks.
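The abstract relies on NOR being functionally complete, meaning any Boolean function can be built from NOR alone. The sketch below illustrates only that logical property on 8-bit integers; it is not a model of DRISA's circuits, and the 8-bit width and the function names are assumptions chosen for illustration.

```python
MASK = 0xFF  # assumed 8-bit word, standing in for a row of bitlines


def nor(a: int, b: int) -> int:
    """Bitwise NOR, the only primitive used below."""
    return ~(a | b) & MASK


def not_(a: int) -> int:
    # NOT x = x NOR x
    return nor(a, a)


def or_(a: int, b: int) -> int:
    # x OR y = NOT (x NOR y)
    return not_(nor(a, b))


def and_(a: int, b: int) -> int:
    # x AND y = (NOT x) NOR (NOT y), by De Morgan's law
    return nor(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    # x XOR y = (x NOR y) NOR (x AND y)
    return nor(nor(a, b), and_(a, b))


a, b = 0b11001010, 0b10100110
print(f"{and_(a, b):08b} {or_(a, b):08b} {xor_(a, b):08b}")
```

Each derived operation costs several NOR steps, which is one reason the abstract pairs the logic primitive with parallel row activation and internal data movement.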

315 citations

Patent
Curt L. Cotner, Roger Lee Miller
30 Sep 2008
TL;DR: In this article, the authors propose access control at the row level in a relational database table, where a user's security label is encoded with security information concerning the user, and when a user requests access to a row, a security mechanism compares the user security information with the security information in the row.
Abstract: Access control methods provide multilevel and mandatory access control for a database management system. The access control techniques provide access control at the row level in a relational database table. The database table contains a security label column within which is recorded a security label that is defined within a hierarchical security scheme. A user's security label is encoded with security information concerning the user. When a user requests access to a row, a security mechanism compares the user's security information with the security information in the row. If the user's security dominates the row's security, the user is given access to the row.
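The row-level check described here can be sketched as a filter over rows. This is a minimal illustration, assuming a simple linear hierarchy of levels; the level names, the `dominates` helper, and the table layout are hypothetical and are not taken from the patent.

```python
# Hypothetical linear hierarchy: a larger number means more sensitive.
LEVELS = {"PUBLIC": 0, "CONFIDENTIAL": 1, "SECRET": 2}


def dominates(user_label: str, row_label: str) -> bool:
    """True when the user's level is at least as high as the row's level."""
    return LEVELS[user_label] >= LEVELS[row_label]


def visible_rows(user_label: str, table: list) -> list:
    """Return only the rows whose security label the user dominates."""
    return [row for row in table if dominates(user_label, row["label"])]


# Each row carries its label in a dedicated column, as in the abstract.
table = [
    {"id": 1, "label": "PUBLIC"},
    {"id": 2, "label": "SECRET"},
    {"id": 3, "label": "CONFIDENTIAL"},
]

print([row["id"] for row in visible_rows("CONFIDENTIAL", table)])
```

A real hierarchical scheme may add categories or compartments on top of levels; the sketch keeps only the dominance comparison that the abstract states.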

138 citations

Patent
Dean A. Klein
16 Feb 2011
TL;DR: In this article, the memory cells that are unable to retain data bits are identified by a modified sense amplifier and a refresh counter in the DRAM generates refresh row addresses that are used to refresh rows of memory cells.
Abstract: A DRAM includes a register storing subsets of row addresses corresponding to rows containing at least one memory cell that is unable to store a data bit during a normal refresh cycle. Each subset includes all but the most significant bit of a corresponding row address. A refresh counter in the DRAM generates refresh row addresses that are used to refresh rows of memory cells. The refresh row addresses are compared to the subsets of row addresses that are stored in the register. In the event of a match, the row of memory cells corresponding to the matching subset of bits is refreshed. The number of refreshes occurring each refresh cycle will depend upon the number of bits in the subset that are omitted from the row address. The memory cells that are unable to retain data bits are identified by a modified sense amplifier.
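The partial-address matching can be simulated to see why omitting bits raises the refresh rate of a weak row. The sketch below is a behavioral illustration only, assuming a 4-bit row address and that a match triggers one extra refresh of the stored weak row; `ROW_BITS`, `refresh_cycle`, and the data structures are hypothetical.

```python
from collections import Counter, defaultdict

ROW_BITS = 4  # assumed address width, giving 16 rows


def refresh_cycle(weak_rows, omitted_bits=1):
    """Count refreshes per row over one full pass of the refresh counter."""
    # Store each weak row under its address with the top bits dropped.
    mask = (1 << (ROW_BITS - omitted_bits)) - 1
    register = defaultdict(set)
    for row in weak_rows:
        register[row & mask].add(row)

    refreshes = Counter()
    for counter_addr in range(1 << ROW_BITS):
        refreshes[counter_addr] += 1  # normal refresh of the counted row
        # Compare the low bits of the counter against the stored subsets.
        for weak in register.get(counter_addr & mask, ()):
            if weak != counter_addr:
                refreshes[weak] += 1  # extra refresh triggered by the match
    return refreshes


print(refresh_cycle({5}, omitted_bits=1)[5], refresh_cycle({5}, omitted_bits=2)[5])
```

In this model a weak row stored with k bits omitted is refreshed 2^k times per pass, which matches the abstract's statement that the number of refreshes depends on how many bits are left out.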

135 citations

Journal Article
TL;DR: This paper shows that with the apparent slowing down of semiconductor scaling and the advent of the Internet of Things, there is a focus on heterogeneous integration and system-level scaling, and proposes ways in which this transformation can evolve to provide a significant value at the system level while providing a significantly lower barrier to entry compared with a chip-based SoC approach.
Abstract: Moore’s law has so far relied on the aggressive scaling of CMOS silicon minimum features of over 1000× for over four decades, and recently, on the adoption of innovative features, such as Cu interconnects, low-k dielectrics for interconnects, strained channels, and high-k materials for gate dielectrics, resulting in a better power performance, cost per function, and density every generation. This has spawned a vibrant system-on-chip (SoC) approach, where progressively more function has been integrated on a single die. The integration of multiple dies on packages and boards has, however, scaled only modestly by a factor of three to five times. In this paper, we show that with the apparent slowing down of semiconductor scaling and the advent of the Internet of Things, there is a focus on heterogeneous integration and system-level scaling. Packaging is undergoing a transformation that focuses on overall system performance and cost rather than on individual components. We propose ways in which this transformation can evolve to provide a significant value at the system level while providing a significantly lower barrier to entry compared with a chip-based SoC approach that is currently used. This transformation is already under way with 3-D stacking of dies and will evolve to make heterogeneous integration the backbone of sustaining Moore’s law in the years ahead.

108 citations

Journal Article
TL;DR: The latest progress in the area of micro/nanoscale 3D assembly, covering the various classes of methods through rolling, folding, curving, and buckling assembly, is discussed, focusing on the design concepts, principles, and applications of different methods, followed by an outlook on the remaining challenges and open opportunities.
Abstract: The miniaturization of electronics has been an important topic of study for several decades. The established roadmaps following Moore's Law have encountered bottlenecks in recent years, as planar processing techniques are already close to their physical limits. To bypass some of the intrinsic challenges of planar technologies, more and more efforts have been devoted to the development of 3D electronics, through either direct 3D fabrication or indirect 3D assembly. Recent research efforts into direct 3D fabrication have focused on the development of 3D transistor technologies and 3D heterogeneous integration schemes, but these technologies are typically constrained by the accessible range of sophisticated 3D geometries and the complexity of the fabrication processes. As an alternative route, 3D assembly methods make full use of mature planar technologies to form predefined 2D precursor structures in the desired materials and sizes, which are then transformed into targeted 3D mesostructures by mechanical deformation. The latest progress in the area of micro/nanoscale 3D assembly, covering the various classes of methods through rolling, folding, curving, and buckling assembly, is discussed, focusing on the design concepts, principles, and applications of different methods, followed by an outlook on the remaining challenges and open opportunities.

94 citations