Proceedings ArticleDOI

Enabling scientific computing on memristive accelerators

02 Jun 2018-pp 367-382
TL;DR: This paper presents the first proposal to enable scientific computing on memristive crossbars. Three techniques are explored (reducing overheads by exploiting exponent range locality, early termination of fixed-point computation, and static operation scheduling) that together enable a fixed-point memristive accelerator to perform high-precision floating point without the exorbitant cost of naive floating-point emulation on fixed-point hardware.
Abstract: Linear algebra is ubiquitous across virtually every field of science and engineering, from climate modeling to macroeconomics. This ubiquity makes linear algebra a prime candidate for hardware acceleration, which can improve both the run time and the energy efficiency of a wide range of scientific applications. Recent work on memristive hardware accelerators shows significant potential to speed up matrix-vector multiplication (MVM), a critical linear algebra kernel at the heart of neural network inference tasks. Regrettably, the proposed hardware is constrained to a narrow range of workloads: although the eight- to 16-bit computations afforded by memristive MVM accelerators are acceptable for machine learning, they are insufficient for scientific computing where high-precision floating point is the norm. This paper presents the first proposal to enable scientific computing on memristive crossbars. Three techniques are explored---reducing overheads by exploiting exponent range locality, early termination of fixed-point computation, and static operation scheduling---that together enable a fixed-point memristive accelerator to perform high-precision floating point without the exorbitant cost of naive floating-point emulation on fixed-point hardware. A heterogeneous collection of crossbars with varying sizes is proposed to efficiently handle sparse matrices, and an algorithm for mapping the dense subblocks of a sparse matrix to an appropriate set of crossbars is investigated. The accelerator can be combined with existing GPU-based systems to handle datasets that cannot be efficiently handled by the memristive accelerator alone. The proposed optimizations permit the memristive MVM concept to be applied to a wide range of problem domains, respectively improving the execution time and energy dissipation of sparse linear solvers by 10.3x and 10.9x over a purely GPU-based system.
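The exponent-locality idea can be made concrete with a small sketch. The snippet below is illustrative only, not the paper's actual algorithm or hardware model: it emulates a floating-point matrix-vector product with integer-only arithmetic by factoring out a shared block exponent, which is the kind of trick that lets fixed-point crossbar hardware approximate floating-point MVM. The function name and bit widths are assumptions.

```python
# Illustrative sketch (not the paper's exact algorithm): emulating a
# floating-point MVM on fixed-point arithmetic by factoring out a shared
# (block) exponent, the intuition behind exploiting exponent range locality.
import numpy as np

def block_fixed_point_mvm(A, x, frac_bits=14):
    """Approximate y = A @ x using integer arithmetic only."""
    # Shared exponents chosen so the largest magnitude fits the fixed-point range.
    a_exp = np.ceil(np.log2(np.abs(A).max()))
    x_exp = np.ceil(np.log2(np.abs(x).max()))
    scale_a = 2.0 ** (frac_bits - a_exp)
    scale_x = 2.0 ** (frac_bits - x_exp)

    A_fx = np.round(A * scale_a).astype(np.int64)   # what a crossbar would store
    x_fx = np.round(x * scale_x).astype(np.int64)   # what the DACs would apply

    y_fx = A_fx @ x_fx                               # integer MVM (the crossbar's job)
    return y_fx / (scale_a * scale_x)                # restore the shared exponents

A = np.random.randn(64, 64)
x = np.random.randn(64)
print(np.max(np.abs(block_fixed_point_mvm(A, x) - A @ x)))  # small residual error
```

In a memristive setting the integer product would map onto the crossbar while the shared-exponent bookkeeping stays in the digital periphery; the early-termination and scheduling optimizations from the paper are not modeled here.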
Citations
Journal ArticleDOI
TL;DR: This Review provides an overview of memory devices and the key computational primitives enabled by these memory devices as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning and stochastic computing.
Abstract: Traditional von Neumann computing systems involve separate processing and memory units. However, data movement is costly in terms of time and energy, and this problem is aggravated by the recent explosive growth in highly data-centric applications related to artificial intelligence. This calls for a radical departure from traditional systems, and one such non-von Neumann computational approach is in-memory computing. In this approach, certain computational tasks are performed in place in the memory itself by exploiting the physical attributes of the memory devices. Both charge-based and resistance-based memory devices are being explored for in-memory computing. In this Review, we provide a broad overview of the key computational primitives enabled by these memory devices as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning and stochastic computing. This Review provides an overview of memory devices and the key computational primitives for in-memory computing, and examines the possibilities of applying this computing approach to a wide range of applications.
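As a point of reference for the crossbar-based primitive the review covers, here is a minimal software model of an ideal analog matrix-vector multiply: conductances encode the matrix, applied voltages encode the vector, and column currents give the result via Ohm's and Kirchhoff's laws. All names and values are illustrative, and device non-idealities are ignored.

```python
# Minimal sketch of the analog crossbar MVM primitive: with conductances G
# programmed into the array and voltages V applied to the rows, the column
# currents are I = G^T V by Ohm's law and Kirchhoff's current law.
import numpy as np

def crossbar_mvm(G, V):
    """Ideal analog crossbar: column currents for row voltages V."""
    return G.T @ V   # I_j = sum_i G[i, j] * V[i]

G = np.random.uniform(1e-6, 1e-4, size=(128, 128))   # device conductances (S)
V = np.random.uniform(0.0, 0.2, size=128)             # read voltages (V)
I = crossbar_mvm(G, V)                                 # column currents (A)
```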

841 citations

Proceedings ArticleDOI
22 Jun 2019
TL;DR: FloatPIM is proposed, a fully-digital scalable PIM architecture that accelerates CNN in both training and testing phases and natively supports floating-point representation, thus enabling accurate CNN training.
Abstract: Processing In-Memory (PIM) has shown a great potential to accelerate inference tasks of Convolutional Neural Network (CNN). However, existing PIM architectures do not support high precision computation, e.g., in floating point precision, which is essential for training accurate CNN models. In addition, most of the existing PIM approaches require analog/mixed-signal circuits, which do not scale, exploiting insufficiently reliable multi-bit Non-Volatile Memory (NVM). In this paper, we propose FloatPIM, a fully-digital scalable PIM architecture that accelerates CNN in both training and testing phases. FloatPIM natively supports floating-point representation, thus enabling accurate CNN training. FloatPIM also enables fast communication between neighboring memory blocks to reduce internal data movement of the PIM architecture. We evaluate the efficiency of FloatPIM on ImageNet dataset using popular large-scale neural networks. Our evaluation shows that FloatPIM supporting floating point precision can achieve up to 5.1% higher classification accuracy as compared to existing PIM architectures with limited fixed-point precision. FloatPIM training is on average 303.2× and 48.6× (4.3× and 15.8×) faster and more energy efficient as compared to GTX 1080 GPU (PipeLayer [1] PIM accelerator). For testing, FloatPIM also provides 324.8× and 297.9× (6.3× and 21.6×) speedup and energy efficiency as compared to GPU (ISAAC [2] PIM accelerator) respectively.
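A toy comparison (not FloatPIM's design) of why limited fixed-point precision is problematic for training: quantizing the operands of a dot product to roughly 8-bit fixed point introduces far more error than float32, which is the gap that native floating-point support is meant to close. The quantizer and bit split below are assumptions.

```python
# Toy illustration: 8-bit-style fixed-point quantization of a dot product
# loses accuracy relative to float32 (reference computed in float64).
import numpy as np

def quantize_fixed(x, frac_bits=6):
    step = 2.0 ** -frac_bits
    return np.clip(np.round(x / step) * step, -2.0, 2.0 - step)  # ~8-bit signed

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32) * 0.05   # small "weights"
a = rng.standard_normal(4096).astype(np.float32)          # "activations"

exact = np.dot(w.astype(np.float64), a.astype(np.float64))
fp32 = np.dot(w, a)
fx8 = np.dot(quantize_fixed(w), quantize_fixed(a))
print(abs(fp32 - exact), abs(fx8 - exact))   # fixed-point error is far larger
```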

190 citations


Cites background from "Enabling scientific computing on me..."

  • ...Work in [23] exploited the conventional analog-based memristive accelerator to support floating point operations....


  • ...For example, recent work in [23] designed an analog-based memristive accelerator to support floating point operations....


Journal ArticleDOI
01 Jul 2018
TL;DR: A memristor-based hardware and software system that uses a tantalum oxide memristor crossbar can be used to solve static and time-evolving partial differential equations at high precision, and to simulate an argon plasma reactor.
Abstract: Memristive devices have been extensively studied for data-intensive tasks such as artificial neural networks. These types of computing tasks are considered to be ‘soft’ as they can tolerate low computing precision without suffering from performance degradation. However, ‘hard’ computing tasks, which require high precision and accurate solutions, dominate many applications and are difficult to implement with memristors because the devices normally offer low native precision and suffer from high device variability. Here we report a complete memristor-based hardware and software system that can perform high-precision computing tasks, making memristor-based in-memory computing approaches attractive for general high-performance computing environments. We experimentally implement a numerical partial differential equation solver using a tantalum oxide memristor crossbar system, which we use to solve static and time-evolving problems. We also illustrate the practical capabilities of our memristive hardware by using it to simulate an argon plasma reactor. A memristor-based hardware and software system that uses a tantalum oxide memristor crossbar can be used to solve static and time-evolving partial differential equations at high precision, and to simulate an argon plasma reactor.
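To make the "hard computing" workload concrete, here is a hedged sketch of a finite-difference Poisson solve driven by repeated matrix-vector products, the operation an in-memory crossbar would accelerate. It is illustrative and not the paper's experimental setup (which used a tantalum oxide crossbar and included an argon plasma reactor simulation); the grid size and iteration count are assumptions.

```python
# Hedged sketch: solve the 1D Poisson problem -u'' = f by finite differences
# with Jacobi iteration, where each sweep is dominated by the MVM a crossbar
# could perform.
import numpy as np

n = 32
h = 1.0 / (n + 1)
f = np.ones(n)                                        # right-hand side samples
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2            # 1D Laplacian stencil

u = np.zeros(n)
D_inv = 1.0 / np.diag(A)
for _ in range(6000):                                  # Jacobi: u <- u + D^-1 (f - A u)
    r = f - A @ u                                      # the MVM in-memory hardware accelerates
    if np.linalg.norm(r) < 1e-8:
        break
    u += D_inv * r
print(np.linalg.norm(f - A @ u))                       # small residual after convergence
```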

171 citations

Journal ArticleDOI
03 Jun 2022-Science
TL;DR: This article reviews the use of memristive devices in data encryption, data security, and radio-frequency switches for mobile communications.
Abstract: Memristive devices, which combine a resistor with memory functions such that voltage pulses can change their resistance (and hence their memory state) in a nonvolatile manner, are beginning to be implemented in integrated circuits for memory applications. However, memristive devices could have applications in many other technologies, such as non-von Neumann in-memory computing in crossbar arrays, random number generation for data security, and radio-frequency switches for mobile communications. Progress toward the integration of memristive devices in commercial solid-state electronic circuits and other potential applications will depend on performance and reliability challenges that still need to be addressed, as described here.

Description: Putting memristors to work. Memristors, which are resistors that change conductivity and act as memories, are not only being used in commercial computing but have several application areas in computing and communications. Lanza et al. review how devices such as phase-change memories, resistive random-access memories, and magnetoresistive random-access memories are being integrated into silicon electronics. Memristors also are finding use in artificial intelligence when integrated in three-dimensional crossbar arrays for low-power, non-von Neumann architectures. Other applications include random-number generation for data encryption and radio-frequency switches for mobile communications. A review explains how resistors with memory functions are being integrated into electronics and new computer architectures.

BACKGROUND: Memristive devices exhibit an electrical resistance that can be adjusted to two or more nonvolatile levels by applying electrical stresses. The core of the most advanced memristive devices is a metal/insulator/metal nanocell made of phase-change, metal-oxide, magnetic, or ferroelectric materials, which is often placed in series with other circuit elements (resistor, selector, transistor) to enhance their performance in array configurations (i.e., avoid damage during state transition, minimize intercell disturbance). The memristive effect was discovered in 1969 and the first commercial product appeared in 2006, consisting of a 4-megabit nonvolatile memory based on magnetic materials. In the past few years, the switching endurance, data retention time, energy consumption, switching time, integration density, and price of memristive nonvolatile memories have been remarkably improved (depending on the materials used, values up to ~10^15 cycles, >10 years, ~0.1 pJ, ~10 ns, 256 gigabits per die, and ≤$0.30 per gigabit have been achieved).

ADVANCES: As of 2021, memristive memories are being used as standalone memory and are also embedded in application-specific integrated circuits for the Internet of Things (smart watches and glasses, medical equipment, computers), and their market value exceeds $621 million. Recent studies have shown that memristive devices may also be exploited for advanced computation, data security, and mobile communication. Advanced computation refers to the hardware implementation of artificial neural networks by exploiting memristive attributes such as progressive conductance increase and decrease, vector-matrix multiplication (in crossbar arrays), and spike timing-dependent plasticity; state-of-the-art developments have achieved >10 trillion operations per second per watt. Data encryption can be realized by exploiting the stochasticity inherent in the memristive effect, which manifests as random fluctuations (within a given range) of the switching voltages/times and state currents. For example, true random number generators and physical unclonable functions produce random codes by exposing a population of memristive devices to an electrical stress at 50% switching probability (it is impossible to predict which devices will switch because that depends on their atomic structure). Mobile communication can also benefit from memristive devices because they could be employed as 5G and terahertz switches with low energy consumption owing to the nonvolatile nature of the resistive states; the current commercial technology is based on silicon transistors, but they are volatile and consume power both during switching and when idle. State-of-the-art developments have achieved cutoff frequencies of >100 THz with excellent insertion loss and isolation.

OUTLOOK: Consolidating memristive memories in the market and creating new commercial memristive technologies requires further enhancement of their performance, integration density, and cost, which may be achieved via materials and structure engineering. Market forecasts expect the memristive memory market to grow to ~$5.6 billion by 2026, which will represent ~2% of the nearly $280 billion memory market. Phase-change and metal-oxide memristive memories should improve switching endurance and reduce energy consumption and variability, and the magnetic ones should offer improved integration density. Ferroelectric memristive memories still suffer from low switching endurance, which is hindering commercialization. The figures of merit of memristive devices for advanced computation depend highly on the application, but maximizing endurance, retention, and conductance range while minimizing temporal conductance fluctuations are general goals. Memristive devices for data encryption and mobile communication require higher switching endurance, and prototypes based on two-dimensional materials are being investigated. Part of Science's coverage of the 75th anniversary of the discovery of the transistor.

Figure caption: Fundamental memristive effects and their applications. Memristive devices, in which electrical resistance can be adjusted to two or more nonvolatile levels, can be fabricated using different materials (top row). This allows adjusting their performance to fulfill the requirements of different technologies. Memristive memories are a reality, and important progress is being achieved in advanced computation, security systems, and mobile communication (bottom row).
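A minimal software model of the true-random-number-generator idea described above: each device is stressed at a level giving roughly 50% switching probability, and the post-pulse state of each device yields one bit. This is only a simulation stand-in for physical stochasticity; the function and parameters are assumptions.

```python
# Hedged simulation of the memristive TRNG idea: one bit per device from
# whether it switched under a stress pulse tuned to ~50% switching probability.
import numpy as np

rng = np.random.default_rng()            # stands in for device-level stochasticity

def trng_bits(n_devices=256, p_switch=0.5):
    """One bit per device: did the device switch under the 50% stress pulse?"""
    return (rng.random(n_devices) < p_switch).astype(np.uint8)

bits = trng_bits()
print(bits[:16], bits.mean())             # mean should hover near 0.5
```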

118 citations

Journal ArticleDOI
TL;DR: This work explores and consolidates the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlights the key design trade-offs underlying these techniques.
Abstract: Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.

92 citations

References
Book
01 Apr 2003
TL;DR: This book presents iterative methods for sparse linear systems, including Krylov subspace methods, preconditioning techniques, multigrid and domain decomposition methods, and methods related to the normal equations.
Abstract: Contents: Preface; 1. Background in linear algebra; 2. Discretization of partial differential equations; 3. Sparse matrices; 4. Basic iterative methods; 5. Projection methods; 6. Krylov subspace methods, Part I; 7. Krylov subspace methods, Part II; 8. Methods related to the normal equations; 9. Preconditioned iterations; 10. Preconditioning techniques; 11. Parallel implementations; 12. Parallel preconditioners; 13. Multigrid methods; 14. Domain decomposition methods; Bibliography; Index.
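For readers skimming the chapter list, the "methods related to the normal equations" (Chapter 8) refer to solving the least-squares formulation through its equivalent square symmetric system; as a reminder:

```latex
% The normal equations: the least-squares problem and its equivalent
% symmetric linear system.
\min_{x} \|Ax - b\|_2^2
\quad\Longleftrightarrow\quad
A^{\mathsf T} A\, x = A^{\mathsf T} b .
```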

13,484 citations


"Enabling scientific computing on me..." refers background or methods in this paper

  • ...We focus on the latter since they are the primary methods used today [19]....


  • ...representing a mesh of points that approximate the original model [19]....


Journal ArticleDOI
TL;DR: An iterative method for solving linear systems, which has the property of minimizing at every step the norm of the residual vector over a Krylov subspace.
Abstract: We present an iterative method for solving linear systems, which has the property of minimizing at every step the norm of the residual vector over a Krylov subspace. The algorithm is derived from t...
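The minimization property stated in the abstract can be written out explicitly: at step m, GMRES picks the iterate from the shifted Krylov subspace that minimizes the Euclidean norm of the residual.

```latex
% The residual-minimization property of GMRES at step m.
x_m \;=\; \arg\min_{x \,\in\, x_0 + \mathcal{K}_m(A,\, r_0)} \| b - A x \|_2,
\qquad
\mathcal{K}_m(A, r_0) = \operatorname{span}\{ r_0,\, A r_0,\, \dots,\, A^{m-1} r_0 \}.
```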

10,907 citations


"Enabling scientific computing on me..." refers methods in this paper

  • ...There are many Krylov subspace solvers with different behaviors depending on the matrix structure, including conjugate gradient (CG) for symmetric positive definite matrices (SPD) [21], as well as BiConjugate Gradient (BiCG), Stabilized BiCG (BiCG-STAB) [22], and Generalized Minimal Residual [23] for non-SPD matrices....


Journal ArticleDOI
TL;DR: An iterative algorithm is given for solving a system Ax=k of n linear equations in n unknowns and it is shown that this method is a special case of a very general method which also includes Gaussian elimination.
Abstract: An iterative algorithm is given for solving a system Ax=k of n linear equations in n unknowns. The solution is given in n steps. It is shown that this method is a special case of a very general method which also includes Gaussian elimination. These general algorithms are essentially algorithms for finding an n dimensional ellipsoid. Connections are made with the theory of orthogonal polynomials and continued fractions.
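For concreteness, here is a minimal textbook conjugate-gradient implementation for an SPD system Ax = b (the n-step termination property mentioned in the abstract holds in exact arithmetic). This is a generic sketch, not code from the cited works, and the test matrix is synthetic.

```python
# Minimal conjugate-gradient sketch for an SPD system Ax = b.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x                    # initial residual
    p = r.copy()                     # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)    # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # conjugate direction update
        rs_old = rs_new
    return x

# Small synthetic SPD test system.
M = np.random.randn(50, 50)
A = M @ M.T + 50 * np.eye(50)
b = np.random.randn(50)
print(np.linalg.norm(A @ conjugate_gradient(A, b) - b))   # tiny residual
```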

7,598 citations


"Enabling scientific computing on me..." refers methods in this paper

  • ...We evaluate the proposed accelerator on 20 matrices from the SuiteSparse matrix collection [14] using CG for symmetric positive definite matrices (SPD) and BiCG-STAB for non-SPD matrices....


  • ...There are many Krylov subspace solvers with different behaviors depending on the matrix structure, including conjugate gradient (CG) for symmetric positive definite matrices (SPD) [21], as well as BiConjugate Gradient (BiCG), Stabilized BiCG (BiCG-STAB) [22], and Generalized Minimal Residual [23] for non-SPD matrices....


Journal ArticleDOI
TL;DR: Numerical experiments indicate that the new variant of Bi-CG, named Bi-CGSTAB, is often much more efficient than CG-S, which can exhibit irregular convergence in which rounding errors sometimes cause severe cancellation effects in the solution.
Abstract: Recently the Conjugate Gradients-Squared (CG-S) method has been proposed as an attractive variant of the Bi-Conjugate Gradients (Bi-CG) method. However, it has been observed that CG-S may lead to a rather irregular convergence behaviour, so that in some cases rounding errors can even result in severe cancellation effects in the solution. In this paper, another variant of Bi-CG is proposed which does not seem to suffer from these negative effects. Numerical experiments indicate also that the new variant, named Bi-CGSTAB, is often much more efficient than CG-S.
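In practice Bi-CGSTAB is available off the shelf; below is a hedged usage sketch with SciPy's sparse solver on a small non-symmetric system, the kind of non-SPD case the citing paper pairs it with. The matrix, size, and values are illustrative.

```python
# Usage sketch: Bi-CGSTAB via SciPy on a small non-symmetric tridiagonal system.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

n = 200
A = sp.diags([-1.0, 2.5, -1.2], offsets=[-1, 0, 1],
             shape=(n, n), format="csr")   # non-symmetric, diagonally dominant
b = np.ones(n)

x, info = bicgstab(A, b)                    # info == 0 indicates convergence
print(info, np.linalg.norm(A @ x - b))      # small residual
```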

4,722 citations


"Enabling scientific computing on me..." refers methods in this paper

  • ...There are many Krylov subspace solvers with different behaviors depending on the matrix structure, including conjugate gradient (CG) for symmetric positive definite matrices (SPD) [21], as well as BiConjugate Gradient (BiCG), Stabilized BiCG (BiCG-STAB) [22], and Generalized Minimal Residual [23] for non-SPD matrices....


Journal ArticleDOI
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described, and a new multilevel coarsening scheme is proposed to facilitate graph visualization of the matrices.
Abstract: We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications. The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms. It allows for robust and repeatable experiments: robust because performance results with artificially generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats. Its matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs). We provide software for accessing and managing the Collection, from MATLAB™, Mathematica™, Fortran, and C, as well as an online search capability. Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.
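A brief sketch of using a matrix from the collection in Python: SuiteSparse matrices are distributed in Matrix Market (.mtx) format, which SciPy reads directly. The file name below is only a placeholder for a matrix downloaded from the collection.

```python
# Load a downloaded SuiteSparse matrix in Matrix Market format.
import scipy.io

A = scipy.io.mmread("bcsstk13.mtx").tocsr()   # placeholder path; CSR for fast MVM
print(A.shape, A.nnz)                          # dimensions and stored nonzeros
```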

3,456 citations


"Enabling scientific computing on me..." refers background or methods in this paper

  • ...The proposed system is evaluated on two highperformance iterative solvers with a set of 20 input matrices from the SuiteSparse collection [14], representing problem domains within computational fluid dynamics, structural analysis, circuit analysis, and seven others....


  • ...We evaluate the proposed accelerator on 20 matrices from the SuiteSparse matrix collection [14] using CG for symmetric positive definite matrices (SPD) and BiCG-STAB for non-SPD matrices....


  • ...The matrices generated by scientific applications are typically sparse [14]....
