Author

Michael Niemier

Bio: Michael Niemier is an academic researcher from the University of Notre Dame. The author has contributed to research in topics: Logic gate & Nanomagnet. The author has an h-index of 30 and has co-authored 194 publications receiving 3,449 citations. Previous affiliations of Michael Niemier include the Georgia Institute of Technology & University of California, Berkeley.


Papers
Journal ArticleDOI
01 Apr 2018
TL;DR: In situations where edge inference is required, there are increasing gaps between the computational complexity and energy efficiency demanded by the continued scaling of deep neural networks and the hardware capacity actually available with current CMOS technology scaling.
Abstract: Deep neural networks offer considerable potential across a range of applications, from advanced manufacturing to autonomous cars. A clear trend in deep neural networks is the exponential growth of network size and the associated increases in computational complexity and memory consumption. However, the performance and energy efficiency of edge inference, in which the inference (the application of a trained network to new data) is performed locally on embedded platforms that have limited area and power budget, is bounded by technology scaling. Here we analyse recent data and show that there are increasing gaps between the computational complexity and energy efficiency required by data scientists and the hardware capacity made available by hardware architects. We then discuss various architecture and algorithm innovations that could help to bridge the gaps. This Perspective highlights the existence of gaps between the computational complexity and energy efficiency required for the continued scaling of deep neural networks and the hardware capacity actually available with current CMOS technology scaling, in situations where edge inference is required; it then discusses various architecture and algorithm innovations that could help to bridge these gaps.

354 citations
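The gap the Perspective describes is between the multiply-accumulate (MAC) and memory demands of growing networks and what edge hardware can supply. A minimal sketch, with a hypothetical three-layer network of my own (not from the paper), of how those two quantities are tallied for convolution layers:

```python
# Illustrative cost model: MAC count and parameter count of conv layers,
# the two quantities whose growth the Perspective contrasts with the
# hardware capacity available for edge inference.

def conv_cost(in_ch, out_ch, k, out_h, out_w):
    """MACs and parameters for one k x k convolution layer (stride 1)."""
    params = in_ch * out_ch * k * k           # one weight per (in, out, ky, kx)
    macs = params * out_h * out_w             # every weight fires at each output pixel
    return macs, params

# Hypothetical 3-layer network on a 32x32 RGB input, same-padded.
layers = [(3, 16, 3, 32, 32), (16, 32, 3, 32, 32), (32, 64, 3, 32, 32)]
total_macs = total_params = 0
for spec in layers:
    m, p = conv_cost(*spec)
    total_macs += m
    total_params += p

print(f"MACs: {total_macs:,}, parameters: {total_params:,}")
```

Doubling channel widths roughly quadruples both totals, which is the scaling pressure the paper's data quantify.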

Journal ArticleDOI
01 Nov 2019
TL;DR: It is shown that ternary content-addressable memories (TCAMs) can be used as attentional memories, in which the distance between a query vector and each stored entry is computed within the memory itself, thus avoiding data transfer.
Abstract: Deep neural networks are efficient at learning from large sets of labelled data, but struggle to adapt to previously unseen data. In pursuit of generalized artificial intelligence, one approach is to augment neural networks with an attentional memory so that they can draw on already learnt knowledge patterns and adapt to new but similar tasks. In current implementations of such memory augmented neural networks (MANNs), the content of a network’s memory is typically transferred from the memory to the compute unit (a central processing unit or graphics processing unit) to calculate similarity or distance norms. The processing unit hardware incurs substantial energy and latency penalties associated with transferring the data from the memory and updating the data at random memory addresses. Here, we show that ternary content-addressable memories (TCAMs) can be used as attentional memories, in which the distance between a query vector and each stored entry is computed within the memory itself, thus avoiding data transfer. Our compact and energy-efficient TCAM cell is based on two ferroelectric field-effect transistors. We evaluate the performance of our ferroelectric TCAM array prototype for one- and few-shot learning applications. When compared with a MANN where cosine distance calculations are performed on a graphics processing unit, the ferroelectric TCAM approach provides a 60-fold reduction in energy and 2,700-fold reduction in latency for a single memory search operation. A compact ternary content-addressable memory cell, which is based on two ferroelectric field-effect transistors, can provide memory augmented neural networks with improved energy and latency performance compared with traditional approaches based on graphics processing units.

190 citations
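The core idea above is that a ternary content-addressable memory returns the stored entry closest to a query without moving data to a processor. A software sketch of that behavior (function names, the `*` wildcard encoding, and mismatch counting as the distance are my illustrative assumptions, not the paper's circuit):

```python
# Illustrative model of a TCAM used as an attentional memory: each stored
# entry is a ternary pattern over {'0', '1', '*'} ('*' = don't care), and
# the distance to a binary query is the number of mismatching specified
# bits -- evaluated for every entry "in parallel" inside the array.

WILDCARD = "*"

def tcam_distance(entry, query):
    """Count mismatches between a ternary entry and a binary query."""
    return sum(1 for e, q in zip(entry, query)
               if e != WILDCARD and e != q)

def tcam_search(entries, query):
    """Return the index of the nearest stored entry (the best match)."""
    distances = [tcam_distance(e, query) for e in entries]
    return min(range(len(entries)), key=distances.__getitem__)

memory = ["10*1", "0011", "111*"]
print(tcam_search(memory, "1011"))  # entry "10*1" matches with distance 0
```

In the hardware version, the mismatch count manifests as a match-line discharge, so the search energy is paid once per query instead of once per transferred entry.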

Journal ArticleDOI
TL;DR: Progress toward complete and reliable NML systems is reviewed, including the fundamental characteristics a device must possess if it is to be used in a digital system.
Abstract: Quoting the International Technology Roadmap for Semiconductors (ITRS) 2009 Emerging Research Devices section, 'Nanomagnetic logic (NML) has potential advantages relative to CMOS of being non-volatile, dense, low-power, and radiation-hard. Such magnetic elements are compatible with MRAM technology, which can provide input–output interfaces. Compatibility with MRAM also promises a natural integration of memory and logic. Nanomagnetic logic also appears to be scalable to the ultimate limit of using individual atomic spins.' This article reviews progress toward complete and reliable NML systems. More specifically, we (i) review experimental progress toward fundamental characteristics a device must possess if it is to be used in a digital system, (ii) consider how the NML design space may impact the system-level energy (especially when considering the clock needed to drive a computation), (iii) explain—using both the NML design space and a discussion of clocking as context—how reliable circuit operation may be achieved, (iv) highlight experimental efforts regarding CMOS friendly clock structures for NML systems, (v) explain how electrical I/O could be achieved, and (vi) conclude with a brief discussion of suitable architectures for this technology. Throughout the article, we attempt to identify important areas for future work.

182 citations
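The logic primitive underlying NML (and QCA generally) is the three-input majority vote: a magnet settles to the orientation held by the majority of its neighbors, and fixing one input turns the gate into AND or OR. A minimal functional sketch of that primitive (pure software, ignoring clocking and magnetization physics):

```python
# Three-input majority vote, the native gate of nanomagnetic logic.
# Fixing one input to a constant bias yields AND (bias 0) or OR (bias 1),
# which is how NML composes arbitrary Boolean logic.

def majority(a, b, c):
    """Return the value held by at least two of the three inputs."""
    return 1 if a + b + c >= 2 else 0

def nml_and(a, b):
    return majority(a, b, 0)  # bias input pinned to 0

def nml_or(a, b):
    return majority(a, b, 1)  # bias input pinned to 1

print(nml_and(1, 0), nml_or(1, 0))  # prints "0 1"
```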

Journal ArticleDOI
TL;DR: In this article, the first demonstration of deterministically placed quantum-dot cellular automata (QCA) devices is presented, where devices are controlled by on-chip local fields.
Abstract: We report local control of nanomagnets that can be arranged to perform computation in a cellular automata-like architecture. This letter represents the first demonstration of deterministically placed quantum-dot cellular automata (QCA) devices (of any implementation), where devices are controlled by on-chip local fields.

158 citations

Journal ArticleDOI
TL;DR: The design of dataflow components for a simple microprocessor being designed exclusively in QCA is discussed, along with problems associated with initial designs and enumerated solutions to these problems.
Abstract: SUMMARY Quantum cellular automata (QCA) are currently being investigated as an alternative to CMOS VLSI. While some simple logical circuits and devices have been studied, little if any work has been done in considering the architecture for systems of QCA devices. This work discusses the progress of one of the first such efforts. Namely, the design of dataflow components for a simple microprocessor being designed exclusively in QCA is discussed. Problems associated with initial designs and enumerated solutions to these problems (usually stemming from floorplanning techniques) are explained. Finally, areas of future research direction for circuit design in QCA are presented. Copyright © 2001 John Wiley & Sons, Ltd.

129 citations


Cited by

Journal ArticleDOI
01 Jun 2018
TL;DR: This Review Article examines the development of in-memory computing using resistive switching devices, where the two-terminal structure of the devices, their resistive switching properties, and direct data processing in the memory can enable area- and energy-efficient computation.
Abstract: Modern computers are based on the von Neumann architecture in which computation and storage are physically separated: data are fetched from the memory unit, shuttled to the processing unit (where computation takes place) and then shuttled back to the memory unit to be stored. The rate at which data can be transferred between the processing unit and the memory unit represents a fundamental limitation of modern computers, known as the memory wall. In-memory computing is an approach that attempts to address this issue by designing systems that compute within the memory, thus eliminating the energy-intensive and time-consuming data movement that plagues current designs. Here we review the development of in-memory computing using resistive switching devices, where the two-terminal structure of the devices, their resistive switching properties, and direct data processing in the memory can enable area- and energy-efficient computation. We examine the different digital, analogue, and stochastic computing schemes that have been proposed, and explore the microscopic physical mechanisms involved. Finally, we discuss the challenges in-memory computing faces, including the required scaling characteristics, in delivering next-generation computing. This Review Article examines the development of in-memory computing using resistive switching devices.

1,193 citations
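The in-memory computation the Review describes is most often an analog matrix-vector multiply: conductances encode the matrix, row voltages encode the input, and each column current is the Kirchhoff sum of Ohm's-law products. An idealized sketch of that mapping (device non-idealities, wire resistance, and ADC quantization are all ignored):

```python
# Ideal resistive-crossbar matrix-vector multiply: conductances G[i][j]
# store the matrix, voltages V[i] drive the rows, and each column current
# I_j = sum_i G[i][j] * V[i] -- Ohm's law performs the multiplications,
# the column wiring performs the additions.

def crossbar_mvm(conductances, voltages):
    """Column currents of an ideal crossbar."""
    cols = len(conductances[0])
    return [sum(g_row[j] * v for g_row, v in zip(conductances, voltages))
            for j in range(cols)]

G = [[1.0, 0.5],
     [0.0, 2.0]]   # conductances (arbitrary units)
V = [0.2, 0.1]     # row input voltages

print(crossbar_mvm(G, V))  # column currents, approximately [0.2, 0.3]
```

The whole multiply happens in one read cycle, which is why eliminating the memory-to-processor shuttle pays off for workloads dominated by such products.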

01 Jan 2016
Design of Analog CMOS Integrated Circuits (textbook)

1,038 citations

Journal ArticleDOI
TL;DR: This Review provides an overview of memory devices and the key computational primitives enabled by these memory devices as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning and stochastic computing.
Abstract: Traditional von Neumann computing systems involve separate processing and memory units. However, data movement is costly in terms of time and energy and this problem is aggravated by the recent explosive growth in highly data-centric applications related to artificial intelligence. This calls for a radical departure from the traditional systems and one such non-von Neumann computational approach is in-memory computing. Hereby certain computational tasks are performed in place in the memory itself by exploiting the physical attributes of the memory devices. Both charge-based and resistance-based memory devices are being explored for in-memory computing. In this Review, we provide a broad overview of the key computational primitives enabled by these memory devices as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning and stochastic computing. This Review provides an overview of memory devices and the key computational primitives for in-memory computing, and examines the possibilities of applying this computing approach to a wide range of applications.

841 citations
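Among the computational primitives the Review surveys is stochastic computing, where a value in [0, 1] is encoded as the fraction of 1s in a random bitstream and a single AND gate multiplies two such values. A hedged software sketch of that primitive (stream length, seeding, and function names are my own choices):

```python
# Stochastic-computing multiply: encode p and q as independent random
# bitstreams whose density of 1s equals the value, AND them bitwise, and
# read the product p*q back as the density of 1s in the result.

import random

def to_stream(p, n, rng):
    """Random bitstream of length n with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_multiply(p, q, n=100_000, seed=0):
    rng = random.Random(seed)
    a = to_stream(p, n, rng)
    b = to_stream(q, n, rng)
    anded = [x & y for x, y in zip(a, b)]   # one AND gate per bit position
    return sum(anded) / n                   # estimates p * q

est = stochastic_multiply(0.5, 0.4)
print(round(est, 2))  # close to 0.20
```

Accuracy grows only as the square root of the stream length, which is the usual trade-off: extremely cheap logic per bit in exchange for long streams.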