Jocelyn Huang
Researcher at Nvidia
Publications - 9
Citations - 419
Jocelyn Huang is an academic researcher at Nvidia whose work covers topics including computer science and acoustic modeling. The author has an h-index of 4 and has co-authored 6 publications receiving 190 citations.
Papers
Proceedings ArticleDOI
Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang +8 more
TL;DR: A new end-to-end neural acoustic model for automatic speech recognition that achieves near state-of-the-art accuracy on LibriSpeech and Wall Street Journal, while having fewer parameters than all competing models.
Posted Content
NeMo: a toolkit for building AI applications using Neural Modules.
Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan Cohen +13 more
TL;DR: NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition that provides built-in support for distributed training and mixed precision on latest NVIDIA GPUs.
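The TL;DR above describes building AI applications by composing reusable neural modules. As a minimal sketch of that composition idea only — this is not NeMo's actual API, and the module names below are hypothetical stand-ins — chaining modules might look like:

```python
from typing import Callable, List


def compose(modules: List[Callable]) -> Callable:
    """Chain modules so each module's output feeds the next one's input."""
    def pipeline(x):
        for module in modules:
            x = module(x)
        return x
    return pipeline


# Hypothetical stand-ins for an audio featurizer, encoder, and decoder.
featurize = lambda samples: [v * 2 for v in samples]
encode = lambda feats: sum(feats)
decode = lambda code: f"tokens:{code}"

# Compose the three modules into a single callable "application".
asr = compose([featurize, encode, decode])
```

The point of the sketch is reusability: any module with a compatible input/output contract can be swapped in without touching the rest of the pipeline.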
Posted Content
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.
Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang +8 more
TL;DR: In this paper, an end-to-end neural acoustic model for automatic speech recognition is proposed. The model is composed of multiple blocks with residual connections between them; each block consists of one or more modules with 1D time-channel separable convolutional layers.
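A 1D time-channel separable convolution, as described in the TL;DR above, factors an ordinary 1D convolution into a depthwise step (one temporal filter per channel) followed by a pointwise 1x1 step that mixes channels. The NumPy sketch below is illustrative only; shapes and names are assumptions, not the paper's implementation:

```python
import numpy as np


def time_channel_separable_conv1d(x, depthwise_k, pointwise_w):
    """Illustrative 1D time-channel separable convolution.

    x:            (channels, time) input features
    depthwise_k:  (channels, kernel) one temporal filter per channel
    pointwise_w:  (out_channels, channels) 1x1 conv that mixes channels
    """
    c, t = x.shape
    _, k = depthwise_k.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # "same" padding along time
    # Depthwise step: convolve each channel with its own kernel over time
    # (kernel reversed so np.convolve performs cross-correlation).
    dw = np.stack([
        np.convolve(xp[i], depthwise_k[i][::-1], mode="valid")
        for i in range(c)
    ])[:, :t]
    # Pointwise step: a 1x1 convolution is just a matrix multiply
    # across the channel dimension at every time step.
    return pointwise_w @ dw
```

The parameter savings the TL;DR alludes to come from this factorization: a regular conv needs `out_channels * channels * kernel` weights, while the separable version needs only `channels * kernel + out_channels * channels`.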
Posted Content
Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition
Jocelyn Huang, Oleksii Kuchaiev, Patrick K. O'Neill, Vitaly Lavrukhin, Jason Li, Adriana B. Flores, Georg Kucsko, Boris Ginsburg +7 more
TL;DR: This paper demonstrates the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks and shows that in all three cases, transfer learning from a good base model has higher accuracy than a model trained from scratch.
Proceedings ArticleDOI
Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition
Luo Jian, Jianzong Wang, Cheng Ning, Edward Xiao, Xiao Jing, Georg Kucsko, Patrick K. O'Neill, Jagadeesh Balam, Slyne Deng, Adriana B. Flores, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Jason Li +14 more
TL;DR: This paper demonstrates the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks using end-to-end models trained with CTC loss, and indicates that, for fine-tuning, larger pre-trained models are better than small pre-trained models, even if the dataset for fine-tuning is small.