Author

Kai Ni

Bio: Kai Ni is an academic researcher at the Rochester Institute of Technology whose work spans computer science and non-volatile memory. He has an h-index of 19 and has co-authored 99 publications receiving 1,571 citations. His previous affiliations include Katholieke Universiteit Leuven and the University of Notre Dame.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
01 Dec 2017
TL;DR: A transient Preisach model is developed that accurately predicts minor-loop trajectories and remnant polarization charge of FeFET synapses for arbitrary pulse width, voltage, and history, and benchmarking reveals a 10³ to 10⁶ acceleration in online learning latency over multi-state RRAM-based analog synapses.
Abstract: The memory requirements of at-scale deep neural networks (DNN) dictate that synaptic weight values be stored and updated in off-chip memory such as DRAM, limiting energy efficiency and training time. Monolithic cross-bar / pseudo cross-bar arrays with analog non-volatile memories capable of storing and updating weights on-chip offer the possibility of accelerating DNN training. Here, we harness the dynamics of voltage-controlled partial polarization switching in ferroelectric FETs (FeFET) to demonstrate such an analog synapse. We develop a transient Preisach model that accurately predicts minor-loop trajectories and remnant polarization charge (Pr) for arbitrary pulse width, voltage, and history. We experimentally demonstrate a 5-bit FeFET synapse with symmetric potentiation and depression characteristics, and a 45x tunable range in conductance with a 75 ns update pulse. A circuit macro-model is used to evaluate and benchmark the on-chip learning performance (area, latency, energy, accuracy) of the FeFET synaptic core, revealing a 10³ to 10⁶ acceleration in online learning latency over multi-state RRAM-based analog synapses.
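The Preisach picture behind this model can be sketched in a few lines: the ferroelectric is treated as many independent relay operators (hysterons) whose switched fraction depends on the full voltage history, which is what produces minor loops and history-dependent remanent charge. A minimal sketch, with a uniform hysteron grid whose thresholds are illustrative assumptions rather than the paper's calibrated parameters:

```python
import numpy as np

def preisach_response(voltage_sequence, n_grid=8):
    """Toy scalar Preisach hysteresis model (illustrative, not the
    calibrated transient model from the paper). The ferroelectric is a
    collection of ideal relays, each with an 'up' switching threshold
    alpha and a 'down' threshold beta (alpha > beta). The normalized
    remanent polarization is the mean relay state after the whole
    voltage history has been applied."""
    grid = np.linspace(-1.0, 1.0, n_grid)
    pairs = [(a, b) for a in grid for b in grid if a > b]
    alphas = np.array([p[0] for p in pairs])
    betas = np.array([p[1] for p in pairs])
    state = -np.ones_like(alphas)                   # start fully 'down'
    for v in voltage_sequence:
        state = np.where(v >= alphas, 1.0, state)   # relays switch up
        state = np.where(v <= betas, -1.0, state)   # relays switch down
    return float(state.mean())                      # normalized P_r in [-1, 1]

# History dependence: the same final voltage (0 V) leaves a different
# remanent polarization depending on the preceding pulse amplitude.
p_strong = preisach_response([1.0, 0.0])
p_weak = preisach_response([0.3, 0.0])
```

Even this toy version reproduces the qualitative behavior the paper exploits for analog synapses: partial switching accumulates with pulse history, so intermediate remanent states can be programmed deterministically.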

367 citations

Journal ArticleDOI
01 Aug 2018
TL;DR: This Perspective argues that electronics is poised to enter a new era of scaling – hyper-scaling – driven by advances in beyond-Boltzmann transistors, embedded non-volatile memories, monolithic three-dimensional integration and heterogeneous integration techniques.
Abstract: In the past five decades, the semiconductor industry has gone through two distinct eras of scaling: the geometric (or classical) scaling era and the equivalent (or effective) scaling era. As transistor and memory features approach 10 nanometres, it is apparent that room for further scaling in the horizontal direction is running out. In addition, the rise of data abundant computing is exacerbating the interconnect bottleneck that exists in conventional computing architecture between the compute cores and the memory blocks. Here we argue that electronics is poised to enter a new, third era of scaling — hyper-scaling — in which resources are added when needed to meet the demands of data abundant workloads. This era will be driven by advances in beyond-Boltzmann transistors, embedded non-volatile memories, monolithic three-dimensional integration and heterogeneous integration techniques.

343 citations

Journal ArticleDOI
TL;DR: In this paper, the critical design criteria of Hf0.5Zr0.5O2 (HZO)-based ferroelectric field effect transistors (FeFET) for nonvolatile memory application were established.
Abstract: We fabricate, characterize, and establish the critical design criteria of Hf0.5Zr0.5O2 (HZO)-based ferroelectric field effect transistors (FeFET) for nonvolatile memory application. We quantify the V_TH shift from electron (hole) trapping in the vicinity of the ferroelectric (FE)/interlayer (IL) interface, induced by the erase (program) pulse, and the V_TH shift from polarization switching, to determine the true memory window (MW). The devices exhibit extrapolated retention up to 10 years at 85 °C and endurance up to 5 × 10⁶ cycles, limited by IL breakdown. Endurance up to 10¹² cycles of partial polarization switching is shown in a metal–FE–metal capacitor, in the absence of an IL. A comprehensive metal–FE–insulator–semiconductor FeFET model is developed to quantify the electric field distribution in the gate stack, and an IL design guideline is established to markedly enhance the MW, retention characteristics, and cycling endurance.
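The memory-window accounting described above is simple arithmetic: polarization switching separates the program and erase threshold voltages, while trapping near the FE/IL interface shifts V_TH in the opposite direction and eats into that separation. A sketch with hypothetical voltage numbers (the real values come from the measured devices):

```python
def memory_window(dv_polarization, dv_trap_erase, dv_trap_program):
    """Net FeFET memory window (volts): the V_TH separation produced by
    polarization switching, reduced by the opposing V_TH shifts from
    electron/hole trapping near the FE/IL interface."""
    return dv_polarization - (dv_trap_erase + dv_trap_program)

# Hypothetical numbers chosen only for illustration
mw = memory_window(2.0, 0.4, 0.3)  # -> 1.3 V of true memory window
```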

247 citations

Journal ArticleDOI
01 Nov 2019
TL;DR: It is shown that ternary content-addressable memories (TCAMs) can be used as attentional memories, in which the distance between a query vector and each stored entry is computed within the memory itself, thus avoiding data transfer.
Abstract: Deep neural networks are efficient at learning from large sets of labelled data, but struggle to adapt to previously unseen data. In pursuit of generalized artificial intelligence, one approach is to augment neural networks with an attentional memory so that they can draw on already learnt knowledge patterns and adapt to new but similar tasks. In current implementations of such memory augmented neural networks (MANNs), the content of a network’s memory is typically transferred from the memory to the compute unit (a central processing unit or graphics processing unit) to calculate similarity or distance norms. The processing unit hardware incurs substantial energy and latency penalties associated with transferring the data from the memory and updating the data at random memory addresses. Here, we show that ternary content-addressable memories (TCAMs) can be used as attentional memories, in which the distance between a query vector and each stored entry is computed within the memory itself, thus avoiding data transfer. Our compact and energy-efficient TCAM cell is based on two ferroelectric field-effect transistors. We evaluate the performance of our ferroelectric TCAM array prototype for one- and few-shot learning applications. When compared with a MANN where cosine distance calculations are performed on a graphics processing unit, the ferroelectric TCAM approach provides a 60-fold reduction in energy and 2,700-fold reduction in latency for a single memory search operation.
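The in-memory search idea is easy to state functionally: every stored ternary row compares itself against the query in parallel, wildcard ("don't care") bits never mismatch, and the row with the smallest Hamming distance wins. A toy sketch of that behavior (not the two-FeFET cell or its analog match-line circuit; the entry values and the wildcard encoding are assumptions for illustration):

```python
import numpy as np

WILDCARD = -1  # ternary 'don't care' state (encoding assumed for this sketch)

def tcam_search(entries, query):
    """Functional model of a TCAM used as an attentional memory: each
    stored row counts its mismatching bits against the query (wildcards
    never mismatch), and the row with the smallest Hamming distance is
    the best match. In hardware this happens in parallel inside the
    array, so the stored entries never leave the memory."""
    entries = np.asarray(entries)
    query = np.asarray(query)
    mismatches = (entries != query) & (entries != WILDCARD)
    distances = mismatches.sum(axis=1)  # Hamming distance per row
    return int(distances.argmin()), distances

entries = [
    [1, 0, 1, 1],
    [1, -1, 0, 1],   # second bit is 'don't care'
    [0, 0, 0, 0],
]
best, dists = tcam_search(entries, [1, 1, 0, 1])
```

The energy and latency savings reported in the paper come precisely from computing `distances` inside the array rather than shipping every entry to a GPU for a cosine-distance calculation.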

190 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, the authors developed a compact model of ferroelectric field effect transistors (FeFET) for memory applications, enabling their exploration at the circuit and architecture level.
Abstract: In this work we develop a compact model of ferroelectric field-effect transistors (FeFET) for memory applications, enabling their exploration at the circuit and architecture level. In contrast to Landau-Khalatnikov (L-K) based approaches, the presented model is founded on the combination of a nucleation-dominated multi-domain Preisach theory of ferroelectric switching with a conventional transistor model. The model successfully reproduces the evolution of the FeFET memory window as a function of the program and erase conditions (amplitude, pulse width, and history). To calibrate the model, we fabricated 10 nm thick Hf0.4Zr0.6O2 (HZO) MFM capacitors and FeFETs and characterized the polarization switching dynamics. Our results highlight the importance of accounting for the switching history, minor-loop trajectory, and coupled time-voltage response of the ferroelectric to quantitatively reproduce the measured FeFET characteristics.
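The coupling that ties the ferroelectric model to the transistor model can be illustrated with a one-line estimate: the remanent polarization charge set by the pulse history appears across the ferroelectric layer's capacitance as a threshold-voltage shift. A back-of-the-envelope sketch, with thickness and permittivity values that are illustrative assumptions, not the calibrated device parameters:

```python
def vth_shift(p_r, t_fe, eps_fe):
    """Rough threshold-voltage shift from remanent polarization in a
    metal-ferroelectric-insulator-semiconductor stack: the polarization
    charge density p_r (C/m^2) dropped across the ferroelectric layer's
    capacitance per unit area, C_fe = eps_fe / t_fe (F/m^2)."""
    c_fe = eps_fe / t_fe
    return p_r / c_fe  # volts

# Illustrative values: 20 uC/cm^2 remanent polarization, 10 nm HZO,
# relative permittivity ~30 (assumed, not taken from the paper)
shift = vth_shift(0.2, 10e-9, 30 * 8.85e-12)
```

In the real device the shift is moderated by the interlayer and channel capacitances, which is exactly why a full compact model, rather than this estimate, is needed for circuit-level exploration.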

120 citations


Cited by
Journal ArticleDOI
TL;DR: This Review provides an overview of memory devices for in-memory computing and the key computational primitives they enable, as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning and stochastic computing.
Abstract: Traditional von Neumann computing systems involve separate processing and memory units. However, data movement is costly in terms of time and energy, and this problem is aggravated by the recent explosive growth in highly data-centric applications related to artificial intelligence. This calls for a radical departure from the traditional systems, and one such non-von Neumann computational approach is in-memory computing, whereby certain computational tasks are performed in place in the memory itself by exploiting the physical attributes of the memory devices. Both charge-based and resistance-based memory devices are being explored for in-memory computing. In this Review, we provide a broad overview of the key computational primitives enabled by these memory devices as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning and stochastic computing.

841 citations

Journal ArticleDOI
23 Jan 2018
TL;DR: This comprehensive review summarizes the state of the art, challenges, and prospects of neuro-inspired computing with emerging nonvolatile memory devices, and presents a device-circuit-algorithm codesign methodology to evaluate the impact of nonideal device effects on system-level performance.
Abstract: This comprehensive review summarizes the state of the art, challenges, and prospects of neuro-inspired computing with emerging nonvolatile memory devices. First, we discuss the demand for developing neuro-inspired architecture beyond today’s von Neumann architecture. Second, we summarize the various approaches to designing the neuromorphic hardware (digital versus analog, spiking versus nonspiking, online training versus offline training) and discuss why emerging nonvolatile memory is attractive for implementing the synapses in the neural network. Then, we discuss the desired device characteristics of the synaptic devices (e.g., multilevel states, weight update nonlinearity/asymmetry, variation/noise), and survey a few representative material systems and device prototypes reported in the literature that show analog conductance tuning. These candidates include phase change memory, resistive memory, ferroelectric memory, floating-gate transistors, etc. Next, we introduce the crossbar array architecture to accelerate the weighted sum and weight update operations that are commonly used in neuro-inspired machine learning algorithms, and review recent progress in array-level experimental demonstrations for pattern recognition tasks. In addition, we discuss the peripheral neuron circuit design issues and present a device-circuit-algorithm codesign methodology to evaluate the impact of nonideal device effects on the system-level performance (e.g., learning accuracy). Finally, we give an outlook on the customization of the learning algorithms for efficient hardware implementation.
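The two crossbar operations the review centers on, the weighted sum and the weight update, are straightforward to sketch: column currents implement an analog vector-matrix multiply, and a nonideal (nonlinear, saturating) conductance update is one of the device effects that degrades learning accuracy. A minimal sketch in which the update rule and its parameters are illustrative assumptions, not a measured device model:

```python
import numpy as np

def crossbar_weighted_sum(conductances, input_voltages):
    """Idealized crossbar weighted sum: the output current on each
    column is the sum of (input voltage x device conductance) down that
    column, i.e. Ohm's law plus Kirchhoff's current law performing an
    analog vector-matrix multiply in one step."""
    return np.asarray(input_voltages) @ np.asarray(conductances)

def nonlinear_update(g, pulses, g_max=1.0, nonlinearity=5.0):
    """Toy nonlinear weight update: each potentiation pulse moves the
    conductance toward g_max with a saturating step, mimicking the
    nonideal analog synapse behavior the review discusses. The form and
    parameters are assumptions for illustration."""
    for _ in range(pulses):
        g = g + (g_max - g) * (1.0 - np.exp(-1.0 / nonlinearity))
    return g

# Example: 2x2 conductance matrix (siemens) driven by two input voltages
currents = crossbar_weighted_sum([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
```

Because the per-pulse step shrinks as the conductance approaches `g_max`, identical pulses produce unequal weight changes, which is exactly the nonlinearity/asymmetry problem the codesign methodology quantifies at the system level.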

730 citations

Journal ArticleDOI
TL;DR: This Review surveys the four physical mechanisms behind resistive switching materials (RSMs), which enable novel in-memory information processing that may resolve the von Neumann bottleneck, and examines the device requirements for systems based on RSMs.
Abstract: The rapid increase in information in the big-data era calls for changes to information-processing paradigms, which, in turn, demand new circuit-building blocks to overcome the decreasing cost-effectiveness of transistor scaling and the intrinsic inefficiency of using transistors in non-von Neumann computing architectures. Accordingly, resistive switching materials (RSMs) based on different physical principles have emerged for memories that could enable energy-efficient and area-efficient in-memory computing. In this Review, we survey the four physical mechanisms that lead to such resistive switching: redox reactions, phase transitions, spin-polarized tunnelling and ferroelectric polarization. We discuss how these mechanisms equip RSMs with desirable properties for representation capability, switching speed and energy, reliability and device density. These properties are the key enablers of processing-in-memory platforms, with applications ranging from neuromorphic computing and general-purpose memcomputing to cybersecurity. Finally, we examine the device requirements for such systems based on RSMs and provide suggestions to address challenges in materials engineering, device optimization, system integration and algorithm design.

564 citations

Journal ArticleDOI
23 Apr 2020-Nature
TL;DR: This work shifts the search for the fundamental limits of ferroelectricity to simpler transition-metal oxide systems—that is, from perovskite-derived complex oxides to fluorite-structure binary oxides—in which ‘reverse’ size effects counterintuitively stabilize polar symmetry in the ultrathin regime.
Abstract: Ultrathin ferroelectric materials could potentially enable low-power logic and nonvolatile memories [1,2]. As ferroelectric materials are made thinner, however, the ferroelectricity is usually suppressed. Size effects in ferroelectrics have been thoroughly investigated in perovskite oxides—the archetypal ferroelectric system [3]. Perovskites, however, have so far proved unsuitable for thickness scaling and integration with modern semiconductor processes [4]. Here we report ferroelectricity in ultrathin doped hafnium oxide (HfO2), a fluorite-structure oxide grown by atomic layer deposition on silicon. We demonstrate the persistence of inversion symmetry breaking and spontaneous, switchable polarization down to a thickness of one nanometre. Our results indicate not only the absence of a ferroelectric critical thickness but also enhanced polar distortions as film thickness is reduced, unlike in perovskite ferroelectrics. This approach to enhancing ferroelectricity in ultrathin layers could provide a route towards polarization-driven memories and ferroelectric-based advanced transistors.

431 citations

Journal ArticleDOI
TL;DR: A comprehensive review of emerging artificial neuromorphic devices and their applications is offered, showing that anion/cation-migration-based memristive devices, phase-change synapses, and spintronic synapses are relatively mature and possess excellent stability as memory devices, yet still suffer from challenges in weight-update linearity and symmetry.
Abstract: The rapid development of information technology has led to urgent requirements for high efficiency and ultralow power consumption. In the past few decades, neuromorphic computing has drawn extensive attention due to its promising capability in processing massive data with extremely low power consumption. Here, we offer a comprehensive review of emerging artificial neuromorphic devices and their applications. In light of the underlying physical processes, we classify the devices into nine major categories and discuss their respective strengths and weaknesses. We show that anion/cation-migration-based memristive devices and phase-change and spintronic synapses are relatively mature and possess excellent stability as memory devices, yet they still suffer from challenges in weight-update linearity and symmetry. Meanwhile, the recently developed electrolyte-gated synaptic transistors have demonstrated outstanding energy efficiency, linearity, and symmetry, but their stability and scalability still need to be optimized. Other emerging synaptic structures, such as ferroelectric, metal–insulator-transition-based, photonic, and purely electronic devices, also have limitations in some aspects, leaving a need for further development of high-performance synaptic devices. Additional efforts are also needed to enhance the functionality of artificial neurons while maintaining a relatively low cost in area and power, and it will be important to explore the intrinsic neuronal stochasticity in computing and to optimize the neurons' driving capability. Finally, by looking into the correlations between the operation mechanisms, material systems, device structures, and performance, we provide clues to future material selections, device designs, and integrations for artificial synapses and neurons.

373 citations