
Showing papers in "arXiv: Emerging Technologies in 2018"


Journal ArticleDOI
TL;DR: In this paper, the authors propose a novel "simultaneous logic in-memory" (SLIM) methodology that implements both memory and logic operations simultaneously on the same bitcell in a non-destructive manner, without losing the previously stored memory state.
Abstract: Von Neumann architecture based computers isolate/physically separate computation and storage units, i.e., data is shuttled between the computation unit (processor) and the memory unit to realize logic/arithmetic and storage functions. This to-and-fro movement of data leads to a fundamental limitation of modern computers, known as the memory wall. Logic in-Memory (LIM) approaches aim to address this bottleneck by computing inside the memory units and thereby eliminating the energy-intensive and time-consuming data movement. However, most LIM approaches reported in the literature are not truly "simultaneous", as during LIM operation the bitcell can be used only as a Memory cell or only as a Logic cell. The bitcell is not capable of storing both the Memory/Logic outputs simultaneously. Here, we propose a novel 'Simultaneous Logic in-Memory' (SLIM) methodology that allows both Memory and Logic operations to be implemented simultaneously on the same bitcell in a non-destructive manner without losing the previously stored Memory state. Through extensive experiments we demonstrate the SLIM methodology using non-filamentary bilayer analog OxRAM devices with NMOS transistors (2T-1R bitcell). A detailed programming scheme, array-level implementation and controller architecture are also proposed. Furthermore, to study the impact of introducing a SLIM array in the memory hierarchy, a simple image processing application (edge detection) is also investigated. It has been estimated that by performing all computations inside the SLIM array, the total Energy Delay Product (EDP) reduces by ~40x in comparison to a modern-day computer. The EDP saving owing to the reduction in data transfer between CPU and memory is observed to be ~780x.

384 citations
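
The Energy Delay Product quoted in the SLIM abstract above is the product of the energy consumed and the time taken, so a reduction factor is the ratio of two such products. The following is a minimal sketch of that arithmetic only; the function names and all numbers are hypothetical placeholders and are not taken from the paper.

```python
def energy_delay_product(energy: float, delay: float) -> float:
    """Return EDP = energy * delay (lower is better)."""
    return energy * delay


def edp_reduction(baseline: tuple, candidate: tuple) -> float:
    """Return how many times smaller the candidate EDP is than the baseline EDP.

    Each argument is an (energy, delay) pair in consistent units.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


if __name__ == "__main__":
    # Hypothetical values in arbitrary units, NOT measurements from the paper.
    baseline = (6.0, 3.0)   # (energy, delay) of a reference system
    candidate = (3.0, 1.0)  # (energy, delay) of an in-memory design
    print(edp_reduction(baseline, candidate))
```

Because EDP multiplies the two quantities, halving the energy and cutting the delay to a third compound into a sixfold reduction in this made-up example.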


Posted Content
TL;DR: This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical principles optional.
Abstract: As quantum computers become available to the general public, the need has arisen to train a cohort of quantum programmers, many of whom have been developing classical computer programs for most of their careers. While currently available quantum computers have fewer than 100 qubits, quantum computing hardware is widely expected to grow in terms of qubit count, quality, and connectivity. This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical principles optional. We give an introduction to quantum computing algorithms and their implementation on real quantum hardware. We survey 20 different quantum algorithms, attempting to describe each in a succinct and self-contained fashion. We show how these algorithms can be implemented on IBM's quantum computer, and in each case, we discuss the results of the implementation with respect to differences between the simulator and the actual hardware runs. This article introduces computer scientists, physicists, and engineers to quantum algorithms and provides a blueprint for their implementations.

173 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that neurons built with nanoscale vanadium dioxide active memristors possess all three classes of excitability and most of the known biological neuronal dynamics, and are intrinsically stochastic.
Abstract: Neuromorphic networks of artificial neurons and synapses can solve computationally hard problems with energy efficiencies unattainable for von Neumann architectures. For image processing, silicon neuromorphic processors outperform graphics processing units (GPUs) in energy efficiency by a large margin, but they deliver much lower chip-scale throughput. The performance-efficiency dilemma for silicon processors may not be overcome by Moore's law scaling of complementary metal-oxide-semiconductor (CMOS) field-effect transistors. Scalable and biomimetic active memristor neurons and passive memristor synapses form a self-sufficient basis for a transistorless neural network. However, previous demonstrations of memristor neurons only showed simple integrate-and-fire (I&F) behaviors and did not reveal the rich dynamics and computational complexity of biological neurons. Here we show that neurons built with nanoscale vanadium dioxide active memristors possess all three classes of excitability and most of the known biological neuronal dynamics, and are intrinsically stochastic. With the favorable size and power scaling, there is a path toward an all-memristor neuromorphic cortical computer.

144 citations


Journal ArticleDOI
TL;DR: Yang et al. demonstrate that LSTM can be implemented with a memristor crossbar, which has a small circuit footprint to store a large number of parameters and in-memory computing capability that circumvents the von Neumann bottleneck.
Abstract: Recent breakthroughs in recurrent deep neural networks with long short-term memory (LSTM) units have led to major advances in artificial intelligence. State-of-the-art LSTM models with significantly increased complexity and a large number of parameters, however, have a bottleneck in computing power resulting from limited memory capacity and data communication bandwidth. Here we demonstrate experimentally that LSTM can be implemented with a memristor crossbar, which has a small circuit footprint to store a large number of parameters and in-memory computing capability that circumvents the 'von Neumann bottleneck'. We illustrate the capability of our system by solving real-world problems in regression and classification, which shows that memristor LSTM is a promising low-power and low-latency hardware platform for edge inference.

132 citations


Journal ArticleDOI
TL;DR: In this article, a new type of photonic accelerator based on coherent detection is presented, which can be operated at high (GHz) speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components.
Abstract: Recent success in deep neural networks has generated strong interest in hardware accelerators to improve speed and energy consumption. This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large ($N \gtrsim 10^6$) networks and can be operated at high (GHz) speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components. In contrast to previous approaches, both weights and inputs are optically encoded so that the network can be reprogrammed and trained on the fly. Simulations of the network using models for digit- and image-classification reveal a "standard quantum limit" for optical neural networks, set by photodetector shot noise. This bound, which can be as low as 50 zJ/MAC, suggests performance below the thermodynamic (Landauer) limit for digital irreversible computation is theoretically possible in this device. The proposed accelerator can implement both fully-connected and convolutional networks. We also present a scheme for back-propagation and training that can be performed in the same hardware. This architecture will enable a new class of ultra-low-energy processors for deep learning.

79 citations


Book ChapterDOI
TL;DR: This article reviews the recent progress in integrated neuromorphic photonics, provides an overview of neuromorphic computing, discusses the associated technology (microelectronic and photonic) platforms and compares their metric performance, and provides an in-depth description of photonic neurons and a candidate interconnection architecture.
Abstract: In an age overrun with information, the ability to process reams of data has become crucial. The demand for data will continue to grow as smart gadgets multiply and become increasingly integrated into our daily lives. Next-generation industries in artificial intelligence services and high-performance computing are so far supported by microelectronic platforms. These data-intensive enterprises rely on continual improvements in hardware. Their prospects are running up against a stark reality: conventional one-size-fits-all solutions offered by digital electronics can no longer satisfy this need, as Moore's law (exponential hardware scaling), interconnection density, and the von Neumann architecture reach their limits. With its superior speed and reconfigurability, analog photonics can provide some relief to these problems; however, complex applications of analog photonics have remained largely unexplored due to the absence of a robust photonic integration industry. Recently, the landscape for commercially manufacturable photonic chips has been changing rapidly and now promises to achieve economies of scale previously enjoyed solely by microelectronics. The scientific community has set out to build bridges between the domains of photonic device physics and neural networks, giving rise to the field of neuromorphic photonics. This article reviews the recent progress in integrated neuromorphic photonics. We provide an overview of neuromorphic computing, discuss the associated technology (microelectronic and photonic) platforms and compare their metric performance. We discuss photonic neural network approaches and challenges for integrated neuromorphic photonic processors while providing an in-depth description of photonic neurons and a candidate interconnection architecture. We conclude with a future outlook of neuro-inspired photonic processing.

64 citations


Journal ArticleDOI
TL;DR: A reception model consisting of a set of pure loss queuing systems is proposed, which can be used in rate control algorithms to determine the optimal release rate of molecules in drug delivery applications.
Abstract: This paper considers the scenario of a targeted drug delivery system, which consists of deploying a number of biological nanomachines close to a biological target (e.g. a tumor), able to deliver drug molecules in the diseased area. Suitably located transmitters are designed to release a continuous flow of drug molecules in the surrounding environment, where they diffuse and reach the target. These molecules are received when they chemically react with compliant receptors deployed on the receiver surface. In these conditions, if the release rate is relatively high and the drug absorption time is significant, congestion may happen, essentially at the receiver site. This phenomenon limits the drug absorption rate and makes the signal transmission ineffective, with an undesired diffusion of drug molecules elsewhere in the body. The original contribution of this paper consists of a theoretical analysis of the causes of congestion in diffusion-based molecular communications. For this purpose, a reception model consisting of a set of pure loss queuing systems is proposed. The proposed model exhibits an excellent agreement with the results of a simulation campaign made by using the Biological and Nano-Scale communication simulator version 2 (BiNS2), a well-known simulator for molecular communications, whose reliability has been assessed through in-vitro experiments. The obtained results can be used in rate control algorithms to determine the optimal release rate of molecules in drug delivery applications.

59 citations
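
The drug delivery abstract above models reception as a set of pure loss queuing systems, in which an arrival that finds every server busy is lost rather than queued. A minimal sketch of the textbook blocking probability for one such system (the Erlang B formula for an M/M/c/c queue) follows. Mapping receptors to servers and molecule arrivals to customers is an assumption made here for illustration; the paper's actual model and parameters are not reproduced, and the function name is hypothetical.

```python
def erlang_b(offered_load: float, servers: int) -> float:
    """Blocking probability of a pure loss (M/M/c/c) system.

    offered_load: arrival rate multiplied by mean service time (in Erlangs).
    servers: number of servers; here assumed to stand for free receptors.
    Uses the numerically stable recurrence
        B(0) = 1,  B(k) = A * B(k-1) / (k + A * B(k-1)).
    """
    if offered_load < 0 or servers < 0:
        raise ValueError("offered_load and servers must be non-negative")
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Hypothetical loads: blocking grows as the release rate (load) increases.
    for load in (0.5, 2.0, 8.0):
        print(load, erlang_b(load, servers=4))
```

In this reading, a rate control algorithm would lower the release rate until the blocking probability falls below a chosen target, since blocked molecules diffuse elsewhere instead of being absorbed.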


Posted Content
TL;DR: In this article, the authors characterize the errors and the loss of DNA molecules in the DNA data storage channel, both qualitatively and quantitatively, to help guide the design of future DNA data storage systems.
Abstract: Owing to its longevity and enormous information density, DNA, the molecule encoding biological information, has emerged as a promising archival storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules that are stored in an unordered way, and can only be read by sampling from this DNA pool. Moreover, imperfections in writing (synthesis), reading (sequencing), storage, and handling of the DNA, in particular amplification via PCR, lead to a loss of DNA molecules and induce errors within the molecules. In order to design DNA storage systems, a qualitative and quantitative understanding of the errors and the loss of molecules is crucial. In this paper, we characterize those error probabilities by analyzing data from our own experiments as well as from experiments of two different groups. We find that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences. The aim of our study is to help guide the design of future DNA data storage systems by providing a quantitative and qualitative understanding of the DNA data storage channel.

53 citations


Journal ArticleDOI
TL;DR: This work proposes a purely photonic operation of an Integrate-and-Fire Spiking neuron, based on the phase change dynamics of Ge2Sb2Te5 (GST) embedded on top of a microring resonator, which alleviates the energy constraints of PCMs in electrical domain.
Abstract: The rapid growth of brain-inspired computing coupled with the inefficiencies in the CMOS implementations of neuromorphic systems has led to intense exploration of efficient hardware implementations of the functional units of the brain, namely, neurons and synapses. However, efforts have largely been invested in implementations in the electrical domain with potential limitations of switching speed, packing density of large integrated systems and interconnect losses. As an alternative, neuromorphic engineering in the photonic domain has recently gained attention. In this work, we demonstrate a purely photonic operation of an Integrate-and-Fire Spiking neuron, based on the phase change dynamics of Ge$_2$Sb$_2$Te$_5$ (GST) embedded on top of a microring resonator, which alleviates the energy constraints of PCMs in the electrical domain. We also show that such a neuron can be potentially integrated with on-chip synapses into an all-Photonic Spiking Neural network inferencing framework which promises to be ultrafast and can potentially offer a large operating bandwidth.

51 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss how to employ one such property, memory (time non-locality), in a novel physics-based approach to computation, and focus on digital memcomputing machines (DMMs) that are scalable.
Abstract: It is well known that physical phenomena may be of great help in computing some difficult problems efficiently. A typical example is prime factorization that may be solved in polynomial time by exploiting quantum entanglement on a quantum computer. There are, however, other types of (non-quantum) physical properties that one may leverage to compute efficiently a wide range of hard problems. In this perspective we discuss how to employ one such property, memory (time non-locality), in a novel physics-based approach to computation: Memcomputing. In particular, we focus on digital memcomputing machines (DMMs) that are scalable. DMMs can be realized with non-linear dynamical systems with memory. The latter property allows the realization of a new type of Boolean logic, one that is self-organizing. Self-organizing logic gates are "terminal-agnostic", namely they do not distinguish between input and output terminals. When appropriately assembled to represent a given combinatorial/optimization problem, the corresponding self-organizing circuit converges to the equilibrium points that express the solutions of the problem at hand. In doing so, DMMs take advantage of the long-range order that develops during the transient dynamics. This collective dynamical behavior, reminiscent of a phase transition, or even the "edge of chaos", is mediated by families of classical trajectories (instantons) that connect critical points of increasing stability in the system's phase space. The topological character of the solution search renders DMMs robust against noise and structural disorder. Since DMMs are non-quantum systems described by ordinary differential equations, not only can they be built in hardware with available technology, they can also be simulated efficiently on modern classical computers. As an example, we will show the polynomial-time solution of the subset-sum problem for the worst...

51 citations


Posted Content
TL;DR: In this article, an in-vessel molecular communication testbed using magnetic nanoparticles dispersed in an aqueous suspension is presented, where an electronic pump for injection via a Y-connector provides a background flow for signal propagation.
Abstract: Simple and easy to implement testbeds are needed to further advance molecular communication research. To this end, this paper presents an in-vessel molecular communication testbed using magnetic nanoparticles dispersed in an aqueous suspension as they are also used for drug targeting in biotechnology. The transmitter is realized by an electronic pump for injection via a Y-connector. A second pump provides a background flow for signal propagation. For signal reception, we employ a susceptometer, an electronic device including a coil, where the magnetic particles move through and generate an electrical signal. We present experimental results for the transmission of a binary sequence and the system response following a single injection. For this flow-driven particle transport, we propose a simple parameterized mathematical model for evaluating the system response.

Posted Content
TL;DR: A photonics circuit architecture which could consume a fraction of energy per inference compared with state of the art electronics is proposed.
Abstract: Convolutional Neural Networks (CNNs) are a class of Artificial Neural Networks (ANNs) that employ the method of convolving input images with filter-kernels for object recognition and classification purposes. In this paper, we propose a photonics circuit architecture which could consume a fraction of energy per inference compared with state of the art electronics.

Journal ArticleDOI
TL;DR: In this article, the authors presented ODIN, a 0.086-mm$^2$ 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28nm FDSOI CMOS achieving a minimum energy per synaptic operation (SOP) of 12.7pJ.
Abstract: Shifting computing architectures from von Neumann to event-based spiking neural networks (SNNs) uncovers new opportunities for low-power processing of sensory data in applications such as vision or sensorimotor control. Exploring roads toward cognitive SNNs requires the design of compact, low-power and versatile experimentation platforms with the key requirement of online learning in order to adapt and learn new features in uncontrolled environments. However, embedding online learning in SNNs is currently hindered by high incurred complexity and area overheads. In this work, we present ODIN, a 0.086-mm$^2$ 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28nm FDSOI CMOS achieving a minimum energy per synaptic operation (SOP) of 12.7pJ. It leverages an efficient implementation of the spike-driven synaptic plasticity (SDSP) learning rule for high-density embedded online learning with only 0.68$\mu$m$^2$ per 4-bit synapse. Neurons can be independently configured as a standard leaky integrate-and-fire (LIF) model or as a custom phenomenological model that emulates the 20 Izhikevich behaviors found in biological spiking neurons. Using a single presentation of 6k 16$\times$16 MNIST training images to a single-layer fully-connected 10-neuron network with on-chip SDSP-based learning, ODIN achieves a classification accuracy of 84.5% while consuming only 15nJ/inference at 0.55V using rank order coding. ODIN thus enables further developments toward cognitive neuromorphic devices for low-power, adaptive and low-cost processing.
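
The ODIN abstract above mentions the standard leaky integrate-and-fire (LIF) neuron model. A minimal discrete-time sketch of a generic textbook LIF neuron follows: the membrane potential leaks, integrates input, and emits a spike and resets when it reaches a threshold. This is not ODIN's digital implementation; the function name, parameters and default values are assumptions chosen for illustration.

```python
def simulate_lif(inputs, leak=0.9, threshold=1.0, v_reset=0.0):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    inputs: sequence of input values, one per time step.
    leak: multiplicative decay applied to the membrane potential each step.
    threshold: potential at or above which the neuron fires.
    v_reset: potential after a spike (also the initial potential).
    Returns a list with 1 where a spike occurred and 0 elsewhere.
    """
    v = v_reset
    spikes = []
    for x in inputs:
        v = leak * v + x          # leak, then integrate the new input
        if v >= threshold:
            spikes.append(1)      # fire
            v = v_reset           # reset after the spike
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Hypothetical constant drive; with leak=1.0 the neuron is a pure integrator.
    print(simulate_lif([0.5, 0.5, 0.5, 0.5], leak=1.0))
```

With a leak factor below 1, weak inputs decay away before they can accumulate to the threshold, which is what makes the neuron respond to input rate rather than to total input alone.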

Posted Content
TL;DR: A lumped bio-physical model of HBC is developed, supported by experimental validations that provide insight into some of the key discrepancies found in previous studies, and capacitive voltage mode termination can improve the low frequency loss by up to 50 dB, which helps broadband communication significantly.
Abstract: Human Body Communication (HBC) has emerged as an alternative to radio wave communication for connecting low-power, miniaturized wearable and implantable devices in, on and around the human body, using the human body as the communication channel. Previous studies characterizing the human body channel have reported widely varying channel responses, much of which has been attributed to variation in measurement setup. This calls for the development of a unifying bio-physical model of HBC supported by in-depth analysis and an understanding of the effect of excitation and termination modality on HBC measurements. This paper characterizes the human body channel up to 1 MHz to evaluate it as a medium for broadband communication. A lumped bio-physical model of HBC is developed, supported by experimental validations, that provides insight into some of the key discrepancies found in previous studies. Voltage loss measurements are carried out both with an oscilloscope and a miniaturized wearable prototype to capture the effects of non-common ground. Results show that the channel loss is strongly dependent on the termination impedance at the receiver end, with up to 4 dB variation in average loss for different terminations in an oscilloscope and an additional 9 dB channel loss with the wearable prototype compared to an oscilloscope measurement. The measured channel response with capacitive termination reduces low-frequency loss and allows a flat-band transfer function down to 13 kHz, establishing the human body as a broadband communication channel. Analysis of the measured results and the simulation model shows that (1) high-impedance and (2) capacitive termination should be used at the receiver end for accurate voltage-mode loss measurements of the HBC channel at low frequencies.

Posted Content
TL;DR: This article presents RxNN, a fast and accurate simulation framework for evaluating large-scale DNNs on resistive crossbar systems, implemented by extending the Caffe machine learning framework; RxNN also enables fast model-in-the-loop retraining of DNNs to partially mitigate the accuracy degradation.
Abstract: Resistive crossbars designed with non-volatile memory devices have emerged as promising building blocks for Deep Neural Network (DNN) hardware, due to their ability to compactly and efficiently realize vector-matrix multiplication (VMM), the dominant computational kernel in DNNs. However, a key challenge with resistive crossbars is that they suffer from a range of device and circuit level non-idealities such as interconnect parasitics, peripheral circuits, sneak paths, and process variations. These non-idealities can lead to errors in VMMs, eventually degrading the DNN's accuracy. It is therefore critical to study the impact of crossbar non-idealities on the accuracy of large-scale DNNs. However, this is challenging because existing device and circuit models are too slow to use in application-level evaluations. We present RxNN, a fast and accurate simulation framework to evaluate large-scale DNNs on resistive crossbar systems. RxNN splits and maps the computations involved in each DNN layer into crossbar operations, and evaluates them using a Fast Crossbar Model (FCM) that accurately captures the errors arising due to crossbar non-idealities while being four-to-five orders of magnitude faster than circuit simulation. FCM models a crossbar-based VMM operation using three stages - non-linear models for the input and output peripheral circuits (DACs and ADCs), and an equivalent non-ideal conductance matrix for the core crossbar array. We implement RxNN by extending the Caffe machine learning framework and use it to evaluate a suite of six large-scale DNNs developed for the ImageNet Challenge. Our experiments reveal that resistive crossbar non-idealities can lead to significant accuracy degradations (9.6%-32%) for these large-scale DNNs. To the best of our knowledge, this work is the first quantitative evaluation of the accuracy of large-scale DNNs on resistive crossbar based hardware.
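
The RxNN abstract above describes a crossbar vector-matrix multiplication as three stages: an input peripheral (DAC), the core conductance array, and an output peripheral (ADC). A minimal sketch of that three-stage structure follows. It substitutes simple uniform quantizers for the paper's non-linear peripheral models and takes the non-ideal conductance matrix as a caller-supplied input, so every function name, parameter and default here is an assumption rather than RxNN's API.

```python
def quantize(value: float, levels: int, full_scale: float) -> float:
    """Clip value to [0, full_scale] and snap it to one of `levels` uniform steps."""
    clipped = min(max(value, 0.0), full_scale)
    step = full_scale / (levels - 1)
    return round(clipped / step) * step


def crossbar_vmm(inputs, conductance, dac_levels=16, adc_levels=256,
                 v_max=1.0, i_max=4.0):
    """Three-stage crossbar VMM sketch: DAC -> conductance array -> ADC.

    inputs: list of row input values in [0, v_max].
    conductance: matrix (list of rows); conductance[i][j] couples row i to column j.
                 Pass a perturbed matrix to mimic crossbar non-idealities.
    Returns the quantized column currents.
    """
    # Stage 1: the DAC turns each digital input into a quantized row voltage.
    volts = [quantize(x, dac_levels, v_max) for x in inputs]
    # Stage 2: each column current is the sum of voltage * conductance
    # (Ohm's law per device, Kirchhoff's current law per column).
    n_cols = len(conductance[0])
    currents = [
        sum(volts[i] * conductance[i][j] for i in range(len(volts)))
        for j in range(n_cols)
    ]
    # Stage 3: the ADC quantizes each column current.
    return [quantize(c, adc_levels, i_max) for c in currents]


if __name__ == "__main__":
    # Hypothetical 2x2 array with coarse converters for readability.
    g = [[1.0, 2.0], [3.0, 4.0]]
    print(crossbar_vmm([1.0, 0.0], g, dac_levels=2, adc_levels=5))
```

Keeping the stages separate means a caller can swap in a different converter model or a different conductance matrix without touching the multiply step.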

Journal ArticleDOI
TL;DR: Analog backpropagation learning circuits are proposed for various memristive learning architectures, such as deep neural network, binary neural network, multiple neural network, hierarchical temporal memory, and long short-term memory.
Abstract: The on-chip implementation of learning algorithms would speed up the training of neural networks in crossbar arrays. The circuit-level design and implementation of the backpropagation algorithm using gradient descent for neural network architectures is an open problem. In this paper, we propose analog backpropagation learning circuits for various memristive learning architectures, such as Deep Neural Network (DNN), Binary Neural Network (BNN), Multiple Neural Network (MNN), Hierarchical Temporal Memory (HTM) and Long Short-Term Memory (LSTM). The circuit design and verification are done using TSMC 180nm CMOS process models and TiO2-based memristor models. The application-level validations of the system are done using the XOR problem and the MNIST character and Yale face image databases.

Journal ArticleDOI
TL;DR: In this article, the optical properties of phase-change materials (PCMs) are utilized to enable energy-efficient hardware implementations of neuromorphic systems which emulate the functional units of the brain.
Abstract: Spiking Neural Networks (SNNs) offer an event-driven and more biologically realistic alternative to standard Artificial Neural Networks based on analog information processing. This can potentially enable energy-efficient hardware implementations of neuromorphic systems which emulate the functional units of the brain, namely, neurons and synapses. Recent demonstrations of ultra-fast photonic computing devices based on phase-change materials (PCMs) show promise of addressing limitations of electrically driven neuromorphic systems. However, scaling these standalone computing devices to a parallel in-memory computing primitive is a challenge. In this work, we utilize the optical properties of the PCM, Ge$_2$Sb$_2$Te$_5$ (GST), to propose a Photonic Spiking Neural Network computing primitive, comprising a non-volatile synaptic array integrated seamlessly with previously explored 'integrate-and-fire' neurons. The proposed design realizes an 'in-memory' computing platform that leverages the inherent parallelism of wavelength-division-multiplexing (WDM). We show that the proposed computing platform can be used to emulate an SNN inferencing engine for image classification tasks. The proposed design not only bridges the gap between isolated computing devices and parallel large-scale implementation, but also paves the way for ultra-fast computing and localized on-chip learning.

Posted Content
TL;DR: This work shows that the standard 8-transistor (8T) digital SRAM array can be configured as an analog-like in-memory multi-bit dot-product engine (DPE): by applying appropriate analog voltages to the read ports of the 8T SRAM array and sensing the output current, an approximate analog-digital DPE can be implemented.
Abstract: Large scale digital computing almost exclusively relies on the von-Neumann architecture which comprises separate units for storage and computations. The energy expensive transfer of data from the memory units to the computing cores results in the well-known von-Neumann bottleneck. Various approaches aimed towards bypassing the von-Neumann bottleneck are being extensively explored in the literature. Emerging non-volatile memristive technologies have been shown to be very efficient in computing analog dot products in an in-situ fashion. The memristive analog computation of the dot product results in much faster operation as opposed to digital vector in-memory bit-wise Boolean computations. However, challenges with respect to large scale manufacturing coupled with the limited endurance of memristors have hindered rapid commercialization of memristor-based computing solutions. In this work, we show that the standard 8 transistor (8T) digital SRAM array can be configured as an analog-like in-memory multi-bit dot product engine. By applying appropriate analog voltages to the read-ports of the 8T SRAM array, and sensing the output current, an approximate analog-digital dot-product engine can be implemented. We present two different configurations for enabling multi-bit dot product computations in the 8T SRAM cell array, without modifying the standard bit-cell structure. Since our proposal preserves the standard 8T-SRAM array structure, it can be used as a storage element with standard read-write instructions, and also as an on-demand analog-like dot product accelerator.

Journal ArticleDOI
TL;DR: In this article, a taxonomy of potential applications that can rely on a specific class of nanoscale communication techniques, commonly referred to as molecular communications, is presented. Although most earlier proposals show how devices can communicate at the nanoscale, they leave specific applications of these new technologies in the background.
Abstract: In recent years, progress in nanotechnology has established the foundations for implementing nanomachines capable of carrying out simple but significant tasks. Under this stimulus, researchers have been proposing various solutions for realizing nanoscale communications, considering both electromagnetic and biological communications. Their aim is to extend the capabilities of nanodevices, so as to enable the execution of more complex tasks by means of mutual coordination, achievable through communications. However, although most of these proposals show how devices can communicate at the nanoscale, they leave specific applications of these new technologies in the background. Thus, this paper gives an overview of the actual and potential applications that can rely on a specific class of such communication techniques, commonly referred to as molecular communications. In particular, we focus on health-related applications. This decision is due to the rapidly increasing interest of research communities and companies in minimally invasive, biocompatible, and targeted health-care solutions. Molecular communication techniques have the potential to become the main technology for implementing advanced medical solutions. Hence, in this paper we provide a taxonomy of potential applications, illustrate them in some detail, along with the existing open challenges for them to be actually deployed, and draw future perspectives.

Journal ArticleDOI
TL;DR: The results demonstrate that dual-polarization NFT can work in practice and enable an increased spectral efficiency in NFT-based communication systems, which are currently based on single polarization channels.
Abstract: New services and applications are causing an exponential increase in internet traffic. In a few years, current fiber optic communication system infrastructure will not be able to meet this demand because fiber nonlinearity dramatically limits the information transmission rate. Eigenvalue communication could potentially overcome these limitations. It relies on a mathematical technique called "nonlinear Fourier transform (NFT)" to exploit the "hidden" linearity of the nonlinear Schrödinger equation as the master model for signal propagation in an optical fiber. We present here the theoretical tools describing the NFT for the Manakov system and report on experimental transmission results for dual polarization in fiber optic eigenvalue communications. A transmission of up to 373.5 km with bit error rate less than the hard-decision forward error correction threshold has been achieved. Our results demonstrate that dual-polarization NFT can work in practice and enable an increased spectral efficiency in NFT-based communication systems, which are currently based on single polarization channels.

Posted Content
TL;DR: How deep binary networks can be accelerated in modified von Neumann machines by enabling binary convolutions within the static random access memory (SRAM) arrays is demonstrated.
Abstract: Deep neural networks are a biologically-inspired class of algorithms that have recently demonstrated state-of-the-art accuracies in large-scale classification and recognition tasks. Indeed, a major landmark that enables efficient hardware accelerators for deep networks is the recent work from the machine learning community demonstrating aggressively scaled deep binary networks with state-of-the-art accuracies. In this paper, we demonstrate how deep binary networks can be accelerated in modified von-Neumann machines by enabling binary convolutions within the SRAM array. In general, binary convolutions consist of a bit-wise XNOR followed by a population count (popcount). We present a charge-sharing XNOR and popcount operation in 10-transistor SRAM cells. We have employed multiple circuit techniques, including dual read-wordlines (Dual-RWL) along with a dual-stage ADC that overcomes the inaccuracies of a low-precision ADC, to achieve a fairly accurate popcount. In addition, a key highlight of the present work is that we propose sectioning the SRAM array by adding switches onto the read-bitlines, thereby achieving improved parallelism. This is beneficial for deep networks, where the kernel size grows and the kernel must be stored in multiple sub-banks. As such, one needs to evaluate the partial popcount from multiple sub-banks and sum them up to obtain the final popcount. For n sections per sub-array, we can perform n convolutions within one particular sub-bank, thereby improving overall system throughput as well as energy efficiency. Our results at the array level show that the energy consumption and delay per operation were 1.914 pJ and 45 ns, respectively. Moreover, an energy improvement of 2.5x and a performance improvement of 4x were achieved by using the proposed sectioned SRAM, compared to a non-sectioned SRAM design.
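
As a software-level illustration only (not the charge-sharing circuit described in the abstract), the XNOR-then-popcount formulation of a binary dot product, and the summing of partial popcounts from several sections, can be sketched in Python. The function names, the bit packing (bit = 1 encodes +1, bit = 0 encodes -1), and the `section_bits` parameter are assumptions made for this sketch.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 when values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount(a, b, n):
    """Count the bit positions (out of n) where a and b agree."""
    mask = (1 << n) - 1
    return bin(~(a ^ b) & mask).count("1")


def binary_dot(a, b, n):
    """Dot product of two packed +1/-1 vectors: matches minus mismatches."""
    return 2 * xnor_popcount(a, b, n) - n


def sectioned_popcount(a, b, n, section_bits):
    """Split the vectors into sections and sum the partial popcounts
    (hypothetical software analogue of summing over sub-banks)."""
    total = 0
    for start in range(0, n, section_bits):
        width = min(section_bits, n - start)
        sec_mask = (1 << width) - 1
        total += xnor_popcount((a >> start) & sec_mask, (b >> start) & sec_mask, width)
    return total
```

For two length-8 vectors, `binary_dot(pack_bits(x), pack_bits(w), 8)` equals the ordinary dot product `sum(xi * wi)`, and `sectioned_popcount` returns the same count as a single full-width `xnor_popcount` regardless of the section size.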

Journal ArticleDOI
TL;DR: In this article, a spiking neural network architecture that supports the use of memristive devices as synaptic elements is presented, together with mixed-signal analog-digital interfacing circuits that mitigate the effect of variability in the devices' conductance values and exploit the variability in their switching threshold to implement stochastic learning.
Abstract: Memristive devices represent a promising technology for building neuromorphic electronic systems. In addition to their compactness and non-volatility features, they are characterized by computationally relevant physical properties, such as state-dependence, non-linear conductance changes, and intrinsic variability in both their switching threshold and conductance values, that make them ideal devices for emulating the bio-physics of real synapses. In this paper we present a spiking neural network architecture that supports the use of memristive devices as synaptic elements, and propose mixed-signal analog-digital interfacing circuits which mitigate the effect of variability in their conductance values and exploit their variability in the switching threshold, for implementing stochastic learning. The effect of device variability is mitigated by using pairs of memristive devices configured in a complementary push-pull mechanism and interfaced to a current-mode normalizer circuit. The stochastic learning mechanism is obtained by mapping the desired change in synaptic weight into a corresponding switching probability that is derived from the intrinsic stochastic behavior of memristive devices. We demonstrate the features of the CMOS circuits and apply the architecture proposed to a standard neural network hand-written digit classification benchmark based on the MNIST data-set. We evaluate the performance of the approach proposed on this benchmark using behavioral-level spiking neural network simulation, showing both the effect of the reduction in conductance variability produced by the current-mode normalizer circuit, and the increase in performance as a function of the number of memristive devices used in each synapse.
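
The idea of mapping a desired weight change into a switching probability can be illustrated with a small Python sketch. This is not the circuit or the exact rule from the paper: it assumes a hypothetical device whose weight moves by a fixed increment `step` when it switches, and it chooses the switching probability as `|delta_w| / step` so that the expected change equals the desired change.

```python
import random


def switching_probability(delta_w, step):
    """Probability of switching so that the expected change is delta_w
    (assumed rule: |delta_w| / step, clipped to 1)."""
    return min(1.0, abs(delta_w) / step)


def stochastic_update(weight, delta_w, step, rng):
    """Apply one stochastic update: the device either switches by a full
    step in the direction of delta_w, or stays unchanged."""
    if rng.random() < switching_probability(delta_w, step):
        weight += step if delta_w > 0 else -step
    return weight


def average_change(delta_w, step, trials, seed=0):
    """Average weight change over many independent trials starting from 0."""
    rng = random.Random(seed)
    return sum(stochastic_update(0.0, delta_w, step, rng) for _ in range(trials)) / trials
```

Averaged over many trials, `average_change(0.02, 0.1, 20000)` comes out close to 0.02 even though each individual update is either 0 or 0.1, which is the property a stochastic learning rule of this kind relies on.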

Journal ArticleDOI
TL;DR: The MNIST dataset is leveraged to investigate the energy and accuracy tradeoffs of seven distinct network topologies in SPICE using the 14nm HP-FinFET technology library with the nominal voltage of 0.8V, in which an MRAM-based neuron is used as the activation function.
Abstract: Magnetoresistive random access memory (MRAM) technologies with thermally unstable nanomagnets are leveraged to develop an intrinsic stochastic neuron as a building block for restricted Boltzmann machines (RBMs) to form deep belief networks (DBNs). The embedded MRAM-based neuron is modeled using precise physics equations. The simulation results exhibit the desired sigmoidal relation between the input voltages and probability of the output state. A probabilistic inference network simulator (PIN-Sim) is developed to realize a circuit-level model of an RBM utilizing resistive crossbar arrays along with differential amplifiers to implement the positive and negative weight values. The PIN-Sim is composed of five main blocks to train a DBN, evaluate its accuracy, and measure its power consumption. The MNIST dataset is leveraged to investigate the energy and accuracy tradeoffs of seven distinct network topologies in SPICE using the 14nm HP-FinFET technology library with the nominal voltage of 0.8V, in which an MRAM-based neuron is used as the activation function. The software and hardware level simulations indicate that a $784\times200\times10$ topology can achieve less than 5% error rates with $\sim400 pJ$ energy consumption. The error rates can be reduced to 2.5% by using a $784\times500\times500\times500\times10$ DBN at the cost of $\sim10\times$ higher energy consumption and significant area overhead. Finally, the effects of specific hardware-level parameters on power dissipation and accuracy tradeoffs are identified via the developed PIN-Sim framework.

Journal ArticleDOI
TL;DR: In this paper, a path balancing technology mapping algorithm for dc-biased Single Flux Quantum (SFQ) circuits is presented, which is a new algorithm for generating a mapping solution for a given Boolean network such that the average logic level difference among fanin gates of each gate in the network is minimized.
Abstract: This paper presents a path balancing technology mapping algorithm, which is a new algorithm for generating a mapping solution for a given Boolean network such that the average logic level difference among fanin gates of each gate in the network is minimized. Path balancing technology mapping is required in dc-biased Single Flux Quantum (SFQ) circuits for guaranteeing the correct operation, and it is beneficial in CMOS circuits to reduce the hazard issues. We present a dynamic programming based algorithm for path balancing technology mapping which generates optimal solutions for dc-biased SFQ (e.g. Rapid SFQ or RSFQ) circuits with tree structure and acts as an effective heuristic for circuits with general Directed Acyclic Graph (DAG) structure. Experimental results show that our path balancing technology mapper reduces the balancing overhead by up to 2.7 times and with an average of 21% compared to the state-of-the-art academic technology mappers.

Journal ArticleDOI
TL;DR: The results demonstrate that ONN is capable of classifying 512 visual patterns into a set of classes with a maximum number of elements up to fourteen, and allows for designing multilevel output cascades of neural networks with high net data throughput.
Abstract: The current study uses a novel method of multilevel neurons and high order synchronization effects, described by a family of special metrics, for pattern recognition in an oscillatory neural network (ONN). The output oscillator (neuron) of the network has multilevel variations in its synchronization value with the reference oscillator, and allows classification of an input pattern into a set of classes. The ONN model is implemented on thermally-coupled vanadium dioxide oscillators. The ONN is trained by the simulated annealing algorithm for selection of the network parameters. The results demonstrate that the ONN is capable of classifying 512 visual patterns (3 × 3 cell arrays, distributed by symmetry into 102 classes) into a set of classes with a maximum number of elements up to fourteen. The classification capability of the network depends on the interior noise level and synchronization effectiveness parameter. The model allows for designing multilevel output cascades of neural networks with high net data throughput. The presented method can be applied in ONNs with various coupling mechanisms and oscillator topology.

Journal ArticleDOI
TL;DR: In this article, the hardware implementation of a neuromorphic system is presented, which is composed of a Leaky Integrate-and-Fire with Latency (LIFL) neuron and a Spike-Timing Dependent Plasticity (STDP) synapse.
Abstract: In this paper, the hardware implementation of a neuromorphic system is presented. This system is composed of a Leaky Integrate-and-Fire with Latency (LIFL) neuron and a Spike-Timing Dependent Plasticity (STDP) synapse. The LIFL neuron model can encode more information than the common Integrate-and-Fire models typically considered for neuromorphic implementations. In our system, the LIFL neuron is implemented using CMOS circuits, while a memristor is used to implement the STDP synapse. A description of the entire circuit is provided. Finally, the capabilities of the proposed architecture have been evaluated by simulating a motif composed of three neurons and two synapses. The simulation results confirm the validity of the proposed system and its suitability for the design of more complex spiking neural networks.

Journal ArticleDOI
TL;DR: A new model for mechanical computing is demonstrated that requires only two basic parts, links and rotary joints, which suffice to create all the combinatorial and sequential logic required for a Turing-complete computational system.
Abstract: A new paradigm for mechanical computing is demonstrated that requires only two basic parts, links and rotary joints. These basic parts are combined into two main higher level structures, locks and balances, and suffice to create all necessary combinatorial and sequential logic required for a Turing-complete computational system. While working systems have yet to be implemented using this new paradigm, the mechanical simplicity of the systems described may lend itself better to, e.g., microfabrication, than previous mechanical computing designs. Additionally, simulations indicate that if molecular-scale implementations could be realized, they would be far more energy-efficient than conventional electronic computers.

Journal ArticleDOI
TL;DR: In this paper, the concept of a probabilistic or p-bit, intermediate between the standard bits of digital electronics and the emerging q-bits of quantum computing, is introduced and demonstrated.
Abstract: We introduce the concept of a probabilistic or p-bit, intermediate between the standard bits of digital electronics and the emerging q-bits of quantum computing. We show that low barrier magnets (LBMs) provide a natural physical representation for p-bits and can be built either from perpendicular magnets (PMA) designed to be close to the in-plane transition or from circular in-plane magnets (IMA). Magnetic tunnel junctions (MTJs) built using LBMs as free layers can be combined with standard NMOS transistors to provide three-terminal building blocks for large scale probabilistic circuits that can be designed to perform useful functions. Interestingly, this three-terminal unit looks just like the 1T/MTJ device used in embedded MRAM technology, with only one difference: the use of an LBM for the MTJ free layer. We hope that the concept of p-bits and p-circuits will help open up new application spaces for this emerging technology. However, a p-bit need not involve an MTJ; any fluctuating resistor could be combined with a transistor to implement it, while completely digital implementations using conventional CMOS technology are also possible. The p-bit also provides a conceptual bridge between two active but disjoint fields of research, namely stochastic machine learning and quantum computing. First, there are the applications that are based on the similarity of a p-bit to the binary stochastic neuron (BSN), a well-known concept in machine learning. Three-terminal p-bits could provide an efficient hardware accelerator for the BSN. Second, there are the applications that are based on the p-bit being like a poor man's q-bit. Initial demonstrations based on full SPICE simulations show that several optimization problems including quantum annealing are amenable to p-bit implementations which can be scaled up at room temperature using existing technology.
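
The similarity between a p-bit and a binary stochastic neuron can be illustrated with a purely behavioral Python sketch. It assumes the commonly used form in which the output is +1 with probability (1 + tanh(I)) / 2 and -1 otherwise; the abstract does not state this equation, and the names `p_bit` and `mean_output` are hypothetical. No device physics (MTJ, LBM, transistor) is modeled.

```python
import math
import random


def p_bit(input_signal, rng):
    """Return +1 or -1; the input biases the outcome through tanh
    (assumed behavioral model, not a device simulation)."""
    r = rng.uniform(-1.0, 1.0)
    return 1 if math.tanh(input_signal) > r else -1


def mean_output(input_signal, samples, seed=0):
    """Time-averaged output of the p-bit for a fixed input."""
    rng = random.Random(seed)
    return sum(p_bit(input_signal, rng) for _ in range(samples)) / samples
```

With zero input the output fluctuates evenly between the two states, so `mean_output(0.0, 20000)` is close to 0. A large positive or negative input pins the output near +1 or -1, and intermediate inputs give an average that follows tanh(I).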

Journal ArticleDOI
TL;DR: In this article, the ideal memristor is shown to be an unphysical active device, and a vacancy transport model reveals that a physically realizable memristor is a nonlinear composition of two resistors with active hysteresis.
Abstract: The memory resistor, abbreviated memristor, was a harmless postulate in 1971. In the decade since 2008, a device claiming to be the missing memristor has been on the prowl, seeking recognition as a fundamental circuit element, sometimes wanting electronics textbooks to be rewritten, always promising remarkable digital, analog and neuromorphic computing possibilities. A systematic discussion about the fundamental nature of the device is almost universally absent. This report investigates the assertion that the memristor is a fundamental passive circuit element, from the perspective that electrical engineering is the science of charge management. With a periodic table of fundamental elements, we demonstrate that there can only be three fundamental passive circuit elements. The ideal memristor is shown to be an unphysical active device. A vacancy transport model further reveals that a physically realizable memristor is a nonlinear composition of two resistors with active hysteresis.

Posted Content
TL;DR: In this paper, the authors proposed an energy-efficient spatial modulation based molecular communication (SM-MC) scheme, in which a transmitted symbol is composed of two parts, i.e., a space derived symbol and a concentration derived symbol.
Abstract: In this paper, we propose an energy-efficient spatial modulation based molecular communication (SM-MC) scheme, in which a transmitted symbol is composed of two parts, i.e., a space derived symbol and a concentration derived symbol. The space symbol is transmitted by embedding the information into the index of a single activated transmitter nanomachine. The concentration symbol is drawn according to the conventional concentration shift keying (CSK) constellation. Benefiting from a single active transmitter during each symbol transmission period, SM-MC can avoid the inter-link interference problem existing in the current multiple-input multiple-output (MIMO) MC schemes, which in turn enables low-complexity symbol detection and performance improvement. Specifically, in our low-complexity scheme, the space symbol is first detected by energy comparison, and then the concentration symbol is detected by equal gain combining assisted CSK demodulation. We analyze the symbol error rate (SER) of the SM-MC and its special case, namely the space shift keying based MC (SSK-MC), where only the space symbol is transmitted and no CSK modulation is invoked. Finally, the analytical results are validated by computer simulations, and our studies demonstrate that both the SM-MC and SSK-MC are capable of achieving better SER performance than the conventional MIMO-MC and single-input single-output MC (SISO-MC) when the same symbol rate is assumed.