Sehoon Kim
Researcher at University of California, Berkeley
Publications: 15
Citations: 277
Sehoon Kim is an academic researcher at the University of California, Berkeley. His research focuses on computer science and quantization (signal processing). He has an h-index of 3 and has co-authored 7 publications receiving 58 citations.
Papers
Posted Content
A Survey of Quantization Methods for Efficient Neural Network Inference
TL;DR: The authors survey approaches to quantizing the numerical values in deep neural network computations, covering the advantages and disadvantages of current methods and providing a snapshot of ongoing research in quantization for neural networks.
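The core operation surveyed here is mapping floating-point tensors onto a small integer grid. As a minimal illustration (not taken from the paper itself, which covers many variants), a uniform symmetric quantizer with a single per-tensor scale might look like:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Uniform symmetric quantization: map floats to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integers back to (approximate) floats."""
    return q.astype(np.float32) * scale

x = np.array([0.1, -1.2, 0.5, 2.0], dtype=np.float32)
q, s = quantize(x)
x_hat = dequantize(q, s)   # reconstruction error is at most one step size
```

The survey's design space (per-channel vs. per-tensor scales, symmetric vs. asymmetric ranges, non-uniform grids) generalizes this single-scale sketch.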
Proceedings Article
I-BERT: Integer-only BERT Quantization
TL;DR: This work proposes a novel integer-only quantization scheme for Transformer-based models that quantizes the entire inference process, and demonstrates how to approximate nonlinear operations in Transformer architectures, e.g., GELU, Softmax, and Layer Normalization, with lightweight integer computations.
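"Integer-only" means that even the rescaling between layers avoids floating point at inference time. A minimal sketch of that idea for a linear layer, with hypothetical scales and a fixed-point requantization multiplier (the paper's actual contribution is the integer approximation of the nonlinear operations, which is more involved):

```python
import numpy as np

def int_linear(x_q, w_q, s_x, s_w, s_out):
    """Integer-only linear layer: int8 inputs/weights, int32 accumulation,
    then requantization back to int8 via a fixed-point multiply-and-shift."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)   # exact integer matmul
    # real-valued requantization factor (s_x * s_w / s_out), folded offline
    # into a Q16 fixed-point constant so inference stays integer-only
    m_fixed = int(round((s_x * s_w / s_out) * 2 ** 16))
    out = (acc * m_fixed) >> 16                          # integer mul + shift
    return np.clip(out, -128, 127).astype(np.int8)
```

Here `s_x`, `s_w`, and `s_out` are quantization scales computed ahead of time; only integers flow through the forward pass.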
Proceedings Article
A Fast Post-Training Pruning Framework for Transformers
TL;DR: A fast post-training pruning framework for Transformers that prunes Transformers in less than 3 minutes on a single GPU, which is over two orders of magnitude faster than existing pruning approaches that retrain.
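The framework avoids retraining by scoring structured components (e.g., attention heads) and keeping only the most important ones. As an illustrative sketch only (the paper uses a more sophisticated importance measure and mask search, not shown here), pruning heads to a target sparsity from precomputed scores might look like:

```python
import numpy as np

def prune_heads_by_score(head_scores, sparsity):
    """Keep the top-(1 - sparsity) fraction of attention heads by an
    importance score; return a binary mask (1 = keep, 0 = prune)."""
    n = len(head_scores)
    n_keep = max(1, int(round(n * (1.0 - sparsity))))
    keep = np.argsort(head_scores)[-n_keep:]   # indices of top-scoring heads
    mask = np.zeros(n, dtype=np.int8)
    mask[keep] = 1
    return mask
```

Because the mask is derived from scores on a small calibration set rather than from gradient updates, the whole procedure runs in minutes rather than the hours that retraining-based pruning requires.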
Proceedings Article
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer +7 more
TL;DR: The Squeezeformer model is proposed, which incorporates the Temporal U-Net structure with an efficient depth-wise downsampling layer to sub-sample the input signal, and consistently outperforms state-of-the-art ASR models under the same training schemes.
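The downsampling step reduces the sequence length cheaply because each channel is filtered independently. A minimal numpy sketch of a strided depth-wise 1-D convolution (an illustration of the mechanism, not Squeezeformer's exact layer, which combines it with further projections):

```python
import numpy as np

def depthwise_downsample(x, kernel, stride=2):
    """Depth-wise 1-D convolution with stride: each channel gets its own
    filter, and stride=2 roughly halves the time dimension.
    x: (T, C) input sequence; kernel: (K, C) per-channel filters."""
    T, C = x.shape
    K = kernel.shape[0]
    out_len = (T - K) // stride + 1
    out = np.zeros((out_len, C), dtype=x.dtype)
    for t in range(out_len):
        window = x[t * stride : t * stride + K]      # (K, C) slice in time
        out[t] = np.sum(window * kernel, axis=0)     # per-channel filtering
    return out
```

Compared with a full convolution, the depth-wise form needs only K parameters per channel instead of K*C, which is what makes sub-sampling the input signal inexpensive.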
Posted Content
I-BERT: Integer-only BERT Quantization
TL;DR: I-BERT proposes a quantization scheme for Transformer-based models that performs the entire inference with integer-only arithmetic, based on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization.