Ling Li

Researcher at Chinese Academy of Sciences

Publications - 58
Citations - 3898

Ling Li is an academic researcher at the Chinese Academy of Sciences. The author has contributed to research in the topics of Computer science and Artificial neural network, has an h-index of 11, and has co-authored 45 publications receiving 3049 citations.

Papers
Proceedings ArticleDOI

DaDianNao: A Machine-Learning Supercomputer

TL;DR: This article introduces a custom multi-chip machine-learning architecture and shows that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and to reduce energy consumption by 150.31x on average for a 64-chip system.
Proceedings ArticleDOI

ShiDianNao: shifting vision processing closer to the sensor

TL;DR: This paper proposes an accelerator that is 60x more energy-efficient than the previous state-of-the-art neural network accelerator. Designed down to the layout at 65 nm, it has a modest footprint, consumes only 320 mW, and is still about 30x faster than high-end GPUs.
Proceedings ArticleDOI

Cambricon-X: an accelerator for sparse neural networks

TL;DR: A novel accelerator, Cambricon-X, is proposed to exploit the sparsity and irregularity of NN models for increased efficiency. Experimental results show that the accelerator achieves, on average, a 7.23x speedup and 6.43x energy saving over the state-of-the-art NN accelerator.
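
To make the idea concrete, here is a minimal NumPy sketch of the software-level principle behind exploiting sparsity: store only the nonzero weights of a fully connected layer together with the indices of the inputs they touch, then gather just those inputs during the forward pass. This is an illustration under assumed data layouts (a CSR-like row format and roughly 75% zero weights), not Cambricon-X's actual indexing hardware; the function names are made up for the example.

import numpy as np

def dense_to_sparse(weights):
    # Keep only the nonzero weights of each output neuron, together with
    # the indices of the input neurons they connect to (a CSR-like layout).
    values, indices = [], []
    for row in weights:
        nz = np.nonzero(row)[0]
        indices.append(nz)
        values.append(row[nz])
    return values, indices

def sparse_fc_forward(values, indices, x):
    # Compute y = W @ x while touching only the nonzero weights: for each
    # output neuron, gather just the inputs selected by its index list and
    # multiply-accumulate, skipping the work a dense loop wastes on zeros.
    y = np.empty(len(values))
    for i, (vals, idx) in enumerate(zip(values, indices)):
        y[i] = vals @ x[idx]
    return y

# Example: a small weight matrix with roughly 75% of its weights zeroed out.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)) * (rng.random((4, 8)) > 0.75)
x = rng.standard_normal(8)
vals, idx = dense_to_sparse(W)
assert np.allclose(sparse_fc_forward(vals, idx, x), W @ x)

The fewer nonzero weights a row has, the fewer multiply-accumulates and input fetches it costs, which is, at a high level, the kind of saving a sparsity-aware accelerator targets.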
Proceedings ArticleDOI

Cambricon-S: addressing irregularity in sparse neural networks through a cooperative software/hardware approach

TL;DR: A software-based coarse-grained pruning technique, together with local quantization, significantly reduces the size of indexes and improves the network compression ratio. A hardware accelerator is then designed to efficiently handle the remaining irregularity of sparse synapses and neurons.
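
As a rough illustration of the two software techniques named in the summary, the sketch below prunes weights in whole blocks, so that one index entry covers an entire block rather than a single weight, and then quantizes each surviving block with its own scale. The block size, keep ratio, and bit width are assumptions chosen for the example, and the code is a NumPy sketch of the concept rather than the paper's actual pruning or quantization algorithm.

import numpy as np

def coarse_grained_prune(W, block=(2, 2), keep_ratio=0.5):
    # Rank whole blocks of weights by their L1 norm and zero out the weakest
    # ones; one index entry per surviving block replaces one index per
    # surviving weight, which is what shrinks index storage.
    bh, bw = block
    rows, cols = W.shape[0] // bh, W.shape[1] // bw
    blocks = W.reshape(rows, bh, cols, bw).swapaxes(1, 2)  # (rows, cols, bh, bw)
    norms = np.abs(blocks).sum(axis=(2, 3))
    k = int(rows * cols * keep_ratio)
    thresh = np.sort(norms, axis=None)[::-1][k - 1]
    mask = norms >= thresh                                 # coarse-grained index
    pruned = (blocks * mask[:, :, None, None]).swapaxes(1, 2).reshape(W.shape)
    return pruned, mask

def local_quantize(W, mask, block=(2, 2), bits=4):
    # Quantize each surviving block with its own scale (local quantization),
    # so a few bits per weight suffice within the block's narrow value range.
    bh, bw = block
    rows, cols = mask.shape
    blocks = W.reshape(rows, bh, cols, bw).swapaxes(1, 2).copy()
    levels = 2 ** (bits - 1) - 1
    for i in range(rows):
        for j in range(cols):
            if mask[i, j]:
                scale = np.abs(blocks[i, j]).max() / levels
                blocks[i, j] = np.round(blocks[i, j] / scale) * scale
    return blocks.swapaxes(1, 2).reshape(W.shape)

# Example: prune half of the 2x2 blocks of a 4x4 weight matrix, then quantize.
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
W_pruned, mask = coarse_grained_prune(W)
W_quantized = local_quantize(W_pruned, mask)

Because an entire block is either kept or dropped, the surviving weights are far more regular than with element-wise pruning, which is the intuition behind pairing coarse-grained pruning with a simpler hardware indexing scheme.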
Journal ArticleDOI

DaDianNao: A Neural Network Supercomputer

TL;DR: A custom multi-chip machine-learning architecture is introduced, containing a combination of custom storage and computational units, with electrical and optical inter-chip interconnects considered separately. It is shown that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 656.63x over a GPU and to reduce energy consumption by 184.05x on average for a 64-chip system.