scispace - formally typeset

Sehoon Kim

Researcher at University of California, Berkeley

Publications -  15
Citations -  277

Sehoon Kim is an academic researcher at the University of California, Berkeley. He has contributed to research in the areas of computer science and quantization (signal processing). He has an h-index of 3 and has co-authored 7 publications receiving 58 citations.

Papers
Posted Content

A Survey of Quantization Methods for Efficient Neural Network Inference

TL;DR: The authors present a survey of approaches to quantizing the numerical values in deep neural network computations, covering the advantages and disadvantages of current methods and providing a snapshot of current research in quantization for neural networks.
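To illustrate the survey's central object, here is a minimal sketch of asymmetric uniform quantization, the most common scheme covered in such surveys. The helper names and the 8-bit, per-tensor setting are illustrative assumptions, not the paper's specific method:

```python
import numpy as np

def quantize(x, num_bits=8):
    # Asymmetric uniform quantization: map the float range [min, max]
    # onto the integer range [0, 2^bits - 1] via a scale and zero-point.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original floats.
    return scale * (q - zero_point)
```

The round-trip error of this scheme is bounded by the scale (one quantization step), which is why wider dynamic ranges or fewer bits cost more accuracy.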
Proceedings Article

I-BERT: Integer-only BERT Quantization

TL;DR: This work proposes a novel integer-only quantization scheme for Transformer-based models that quantizes the entire inference process, and demonstrates how to approximate nonlinear operations in Transformer architectures, e.g., GELU, Softmax, and Layer Normalization, with lightweight integer computations.
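A simplified sketch of the GELU approximation idea follows. Float arithmetic is used here for readability (the actual I-BERT kernels run in pure integer arithmetic), and the polynomial constants follow the second-order erf approximation reported in the paper; treat the exact values as an assumption:

```python
import math

# Second-order polynomial approximation of erf, per I-BERT:
# erf(x) ~= sign(x) * [a * (clip(|x|, max=-b) + b)^2 + 1]
A, B = -0.2888, -1.769

def _erf_approx(x: float) -> float:
    sign = (x > 0) - (x < 0)
    x_clipped = min(abs(x), -B)  # clip |x| at -b = 1.769
    return sign * (A * (x_clipped + B) ** 2 + 1.0)

def i_gelu(x: float) -> float:
    # GELU(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + _erf_approx(x / math.sqrt(2.0)))
```

Because the approximation is a fixed low-degree polynomial with clipping, it maps directly onto integer multiply-add hardware, which is the point of the scheme.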
Proceedings ArticleDOI

A Fast Post-Training Pruning Framework for Transformers

TL;DR: A fast post-training pruning framework for Transformers that prunes Transformers in less than 3 minutes on a single GPU, which is over two orders of magnitude faster than existing pruning approaches that retrain.
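The core mechanic of structured pruning without retraining can be sketched as follows. This is a generic top-k mask over attention heads with hypothetical helper names; the framework itself uses a more sophisticated importance metric and latency-aware search, not shown here:

```python
import numpy as np

def prune_heads(head_scores, keep_ratio=0.5):
    # Rank attention heads by an importance score and build a binary
    # mask keeping only the top fraction; masked heads are removed
    # from the network without any retraining.
    num_keep = max(1, int(len(head_scores) * keep_ratio))
    order = np.argsort(head_scores)[::-1]  # highest score first
    mask = np.zeros(len(head_scores), dtype=bool)
    mask[order[:num_keep]] = True
    return mask
```

Since building such a mask requires only a forward-pass statistic per head rather than gradient-based retraining, the whole procedure can run in minutes on a single GPU.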
Proceedings ArticleDOI

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

TL;DR: The paper proposes Squeezeformer, which consistently outperforms state-of-the-art ASR models under the same training schemes; it adopts a Temporal U-Net structure with an efficient depth-wise downsampling layer to sub-sample the input signal.
Posted Content

I-BERT: Integer-only BERT Quantization

TL;DR: I-BERT proposes a quantization scheme for Transformer-based models that performs the entire inference with integer-only arithmetic, based on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization.