Journal ArticleDOI

On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices

TLDR
This paper co-optimizes algorithm, architecture, circuit, and device for real-time, energy-efficient on-chip hardware acceleration of sparse coding, and shows that a 65 nm implementation of the CMOS ASIC and PARCA schemes accelerates sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU.
Abstract
Many recent advances in sparse coding have led to its wide adoption in signal processing, pattern classification, and object recognition applications. Even with state-of-the-art algorithms and CPU/GPU hardware platforms, solving a sparse coding problem still requires expensive computation, making real-time large-scale learning very challenging. In this paper, we co-optimize algorithm, architecture, circuit, and device for real-time, energy-efficient on-chip hardware acceleration of sparse coding. The principle of hardware acceleration is to exploit the properties of the learning algorithms, which involve many parallelizable data fetches and matrix/vector multiplications and additions. Today's von Neumann architecture, however, is not suitable for such parallelization, because the separation of memory and the computing unit makes sequential operation inevitable. This principle drives both the selection of algorithms and the design evolution from CPU to CMOS application-specific integrated circuits (ASIC) to the parallel architecture with resistive crosspoint array (PARCA) that we propose. The CMOS ASIC scheme implements sparse coding with SRAM dictionaries and all-digital circuits, while PARCA employs resistive-RAM dictionaries with special read and write circuits. We show that a 65 nm implementation of the CMOS ASIC and PARCA schemes accelerates sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU. Simulated power for both hardware schemes lies in the milliwatt range, making them viable for portable single-chip learning applications.
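To make the computational bottleneck concrete, the sketch below shows a plain ISTA-style sparse coding loop in NumPy. It is an illustrative stand-in, not the exact on-chip algorithm of the paper; the dictionary matrix-vector products inside the loop are the parallel operations that the CMOS ASIC and PARCA schemes are designed to accelerate.

```python
# Minimal ISTA-style sparse coding sketch (illustrative only; the paper's
# on-chip algorithm and parameters may differ). Solves
#   min_z 0.5*||x - D z||^2 + lam*||z||_1
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=100):
    """D: (m, k) dictionary, x: (m,) input signal. Returns sparse code z."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # The matrix-vector products below dominate the runtime; these are
        # the operations a crosspoint array or parallel ASIC can accelerate.
        grad = D.T @ (D @ z - x)
        z = z - grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
    x = D @ (rng.standard_normal(256) * (rng.random(256) < 0.05))
    z = ista_sparse_code(D, x)
    print("nonzeros:", np.count_nonzero(np.abs(z) > 1e-6))
```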


Citations
Journal ArticleDOI

Memory devices and applications for in-memory computing

TL;DR: This review provides an overview of memory devices and the key computational primitives they enable, as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning, and stochastic computing.
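The central computational primitive behind such in-memory computing is an analog matrix-vector multiply performed directly in the memory array: conductances encode the matrix, input voltages drive the rows, and column currents sum the products. The following idealized NumPy model is a behavioral sketch only; it ignores device noise, wire resistance, and ADC quantization, and the conductance and voltage values are hypothetical.

```python
# Behavioral sketch of an analog in-memory matrix-vector multiply
# (idealized: no device noise, wire resistance, or ADC quantization).
import numpy as np

def crossbar_mvm(G, v):
    """G: (rows, cols) conductances in siemens; v: (rows,) input voltages.
    Each column current is I_j = sum_i G[i, j] * v[i] (Ohm's and Kirchhoff's
    laws), so the full product is computed in one parallel read step."""
    return G.T @ v   # column currents, shape (cols,)

# Example: map a small weight matrix onto conductances and "read" it.
rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, size=(4, 3))      # nonnegative weights for simplicity
g_max = 100e-6                              # hypothetical max device conductance
G = W * g_max
v = np.array([0.2, 0.1, 0.0, 0.3])          # input voltages (V)
print("column currents (A):", crossbar_mvm(G, v))
```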
Journal ArticleDOI

Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations.

TL;DR: A concept of resistive processing unit (RPU) devices is proposed that can potentially accelerate DNN training by orders of magnitude while using much less power, enabling Big Data problems with trillions of parameters that are impossible to address today.
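The speedup in the RPU concept comes from performing all three backpropagation cycles of a fully connected layer in place on the crosspoint array. The sketch below is a minimal, idealized NumPy rendering of those three cycles (forward read, transpose read, rank-1 outer-product update), not the stochastic-pulse implementation described in the paper; the error signal is a placeholder.

```python
# Sketch of the three backprop cycles that an RPU crosspoint array performs
# in place on a single fully connected layer (idealized, no device effects).
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((128, 64)) * 0.1    # weights stored as conductances
x = rng.standard_normal(64)                 # layer input
lr = 0.01

# 1) Forward cycle: parallel matrix-vector multiply on the array.
y = W @ x
# 2) Backward cycle: transpose read of the same array.
delta = rng.standard_normal(128)            # hypothetical upstream error signal
dx = W.T @ delta
# 3) Update cycle: rank-1 outer-product update applied in parallel,
#    without reading the weights out of the array.
W -= lr * np.outer(delta, x)
```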
Journal ArticleDOI

Recent advances in convolutional neural network acceleration

TL;DR: In this paper, the authors present a taxonomy of CNN acceleration methods at three levels, i.e., structure level, algorithm level, and implementation level, covering CNN architecture compression, algorithm optimization, and hardware-based improvement.
Journal ArticleDOI

Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices.

TL;DR: In this article, the authors extend the concept of resistive processing unit (RPU) devices to convolutional neural networks (CNNs) and show how to map the CNN layers to fully connected RPU arrays such that the parallelism of the hardware can be fully utilized in all three cycles of the backpropagation algorithm.
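Mapping a convolutional layer onto a fully connected crosspoint array amounts to unrolling input patches into columns so that the convolution becomes a single matrix multiply (the standard im2col transformation). The sketch below illustrates that mapping; it is not necessarily the authors' exact scheme, and the tensor sizes are arbitrary.

```python
# im2col sketch: unroll a convolution into one matrix multiply so it can run
# on a fully connected crosspoint array (illustrative; the paper's exact
# mapping may differ).
import numpy as np

def im2col(x, k):
    """x: (C, H, W) input; k: kernel size. Returns (C*k*k, H_out*W_out)."""
    C, H, W = x.shape
    H_out, W_out = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, H_out * W_out))
    idx = 0
    for i in range(H_out):
        for j in range(W_out):
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()
            idx += 1
    return cols

rng = np.random.default_rng(3)
x = rng.standard_normal((3, 8, 8))            # input feature map
kernels = rng.standard_normal((16, 3, 5, 5))  # 16 output channels, 5x5 kernels
Wmat = kernels.reshape(16, -1)                # (16, 75): rows of the array
cols = im2col(x, 5)                           # (75, 16): unrolled patches
out = (Wmat @ cols).reshape(16, 4, 4)         # convolution as one matmul
```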
References
Journal ArticleDOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
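As a minimal usage sketch: scikit-learn's SVC estimator is built on LIBSVM, so the following exercises the same solver, including the probability estimates discussed in the reference. The dataset and parameters here are illustrative only.

```python
# Minimal usage sketch of LIBSVM via scikit-learn's SVC wrapper
# (illustrative dataset and hyperparameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(C=1.0, kernel="rbf", probability=True)  # probability estimates via LIBSVM
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```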
Journal ArticleDOI

A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems

TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) is presented that preserves the computational simplicity of ISTA but has a global rate of convergence proven to be significantly better, both theoretically and practically.
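A minimal NumPy sketch of the FISTA iteration for the l1-regularized least-squares problem is given below; it differs from the plain ISTA loop shown earlier only by the momentum step, and the step size is taken from the spectral norm of the dictionary (illustrative, not tuned).

```python
# Minimal FISTA sketch for min_z 0.5*||x - D z||^2 + lam*||z||_1
# (illustrative; step size from the spectral norm of D).
import numpy as np

def fista(D, x, lam=0.1, n_iter=100):
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    y = z.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)
        z_new = y - grad / L
        z_new = np.sign(z_new) * np.maximum(np.abs(z_new) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z_new + ((t - 1.0) / t_new) * (z_new - z)   # momentum step
        z, t = z_new, t_new
    return z
```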
Journal ArticleDOI

K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation

TL;DR: A novel algorithm for adapting dictionaries to achieve sparse signal representations is presented: the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and updating the dictionary atoms to better fit the data.
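The alternation described above can be sketched compactly in NumPy. The version below substitutes a simple thresholded least-squares step for the usual OMP sparse coding stage, so it is illustrative rather than a faithful K-SVD implementation.

```python
# Compact K-SVD-style sketch (illustrative; production code would use OMP for
# the sparse coding step and handle unused atoms more carefully).
import numpy as np

def ksvd(X, k, n_nonzero=5, n_iter=10, seed=4):
    """X: (m, n) training signals as columns; k: number of dictionary atoms."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        # Sparse coding step: keep the n_nonzero largest-correlation atoms per
        # signal and solve a small least-squares problem on that support.
        Z = np.zeros((k, n))
        for j in range(n):
            support = np.argsort(-np.abs(D.T @ X[:, j]))[:n_nonzero]
            Z[support, j], *_ = np.linalg.lstsq(D[:, support], X[:, j], rcond=None)
        # Dictionary update step: refit each atom (and its coefficients) with a
        # rank-1 SVD of the residual restricted to signals that use the atom.
        for a in range(k):
            users = np.nonzero(Z[a, :])[0]
            if users.size == 0:
                continue
            E = X[:, users] - D @ Z[:, users] + np.outer(D[:, a], Z[a, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, a] = U[:, 0]
            Z[a, users] = S[0] * Vt[0, :]
    return D, Z
```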