
Jocelyn Huang

Researcher at Nvidia

Publications - 9
Citations - 419

Jocelyn Huang is an academic researcher from Nvidia. The author has contributed to research in topics: Computer science & Acoustic model. The author has an h-index of 4 and has co-authored 6 publications receiving 190 citations.

Papers
Proceedings Article

Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

TL;DR: A new end-to-end neural acoustic model for automatic speech recognition achieves near state-of-the-art accuracy on LibriSpeech and Wall Street Journal while having fewer parameters than all competing models.
Posted Content

NeMo: a toolkit for building AI applications using Neural Modules.

TL;DR: NeMo (Neural Modules) is a Python, framework-agnostic toolkit for creating AI applications through reusability, abstraction, and composition; it provides built-in support for distributed training and mixed precision on the latest NVIDIA GPUs.
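
The composition idea in this TL;DR can be illustrated with a small sketch. The following is a hypothetical example in plain PyTorch, not NeMo's actual API: two reusable modules, an acoustic encoder and a CTC decoder, chained into an ASR pipeline. All class names, shapes, and sizes are invented for illustration.

```python
# Hypothetical sketch of the "neural module" idea: typed, reusable building
# blocks composed into an application. Plain PyTorch, NOT NeMo's real API.
import torch
import torch.nn as nn


class AudioEncoder(nn.Module):
    """Hypothetical reusable module: maps audio features to an encoding."""

    def __init__(self, feat_dim: int = 64, hidden_dim: int = 256):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim, time) -> (batch, hidden_dim, time)
        return torch.relu(self.conv(feats))


class CTCDecoder(nn.Module):
    """Hypothetical reusable module: maps encodings to per-frame token logits."""

    def __init__(self, hidden_dim: int = 256, vocab_size: int = 29):
        super().__init__()
        self.proj = nn.Conv1d(hidden_dim, vocab_size, kernel_size=1)

    def forward(self, encoded: torch.Tensor) -> torch.Tensor:
        return self.proj(encoded)


# Composition: an ASR pipeline is modules chained together, so the encoder
# could be reused unchanged in a different pipeline (e.g. speech classification).
encoder, decoder = AudioEncoder(), CTCDecoder()
dummy_feats = torch.randn(8, 64, 100)        # (batch, features, time)
logits = decoder(encoder(dummy_feats))       # (batch, vocab, time)
print(logits.shape)
```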
Posted Content

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.

TL;DR: In this paper, an end-to-end neural acoustic model for automatic speech recognition is proposed, which is composed of multiple blocks with residual connections between them, each block consists of one or more modules with 1D time-channel separable convolutional layers.
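
A minimal sketch of the building block described in this TL;DR, assuming PyTorch: a depthwise convolution over time followed by a pointwise (1x1) convolution over channels, grouped into a block with a residual connection. Channel counts, kernel width, and the number of modules per block are illustrative, not the paper's exact configuration.

```python
# Sketch of a 1D time-channel separable convolution and a residual block of
# such modules. Sizes are illustrative, not QuartzNet's published configuration.
import torch
import torch.nn as nn


class TimeChannelSeparableConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 33):
        super().__init__()
        # Depthwise: each channel is convolved over time independently.
        self.depthwise = nn.Conv1d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.bn(self.pointwise(self.depthwise(x))))


class SeparableConvBlock(nn.Module):
    """One or more separable-conv modules wrapped with a residual connection."""

    def __init__(self, channels: int, num_modules: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            *[TimeChannelSeparableConv1d(channels) for _ in range(num_modules)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + x  # residual connection around the block


block = SeparableConvBlock(channels=256)
out = block(torch.randn(4, 256, 200))   # (batch, channels, time)
print(out.shape)                        # torch.Size([4, 256, 200])
```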
Posted Content

Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition

TL;DR: This paper demonstrates the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks and shows that, in all three settings (cross-language transfer, continuous learning, and domain adaptation), transfer learning from a good base model yields higher accuracy than training a model from scratch.
Proceedings Article

Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition

TL;DR: This paper demonstrates the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks using end-to-end models trained with CTC loss, and indicates that, for fine-tuning, larger pre-trained models are better than smaller pre-trained models, even if the dataset used for fine-tuning is small.
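
A minimal sketch, assuming PyTorch, of the fine-tuning recipe these transfer-learning papers describe: load a pretrained acoustic encoder, replace only the output layer to match the target language's alphabet, and fine-tune with CTC loss. The checkpoint path, vocabulary size, and encoder structure below are placeholders, not the models used in the papers.

```python
# Sketch of cross-language fine-tuning with CTC loss. The checkpoint path,
# vocabulary size, and encoder architecture are placeholders for illustration.
import torch
import torch.nn as nn

PRETRAINED_CKPT = "base_english_encoder.pt"   # placeholder checkpoint path
NEW_VOCAB_SIZE = 35                           # e.g. target-language characters + CTC blank

# Hypothetical encoder with the same structure as the pretrained base model.
encoder = nn.Sequential(
    nn.Conv1d(64, 256, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(),
)
state = torch.load(PRETRAINED_CKPT)           # start from the good base model
encoder.load_state_dict(state)

# New output layer sized for the target language, trained from scratch.
output_layer = nn.Conv1d(256, NEW_VOCAB_SIZE, kernel_size=1)

ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(output_layer.parameters()), lr=1e-4
)


def training_step(feats, feat_lens, targets, target_lens):
    # feats: (batch, 64, time); targets: concatenated label index sequences
    logits = output_layer(encoder(feats))                 # (batch, vocab, time)
    log_probs = logits.permute(2, 0, 1).log_softmax(-1)   # (time, batch, vocab) for CTC
    loss = ctc_loss(log_probs, targets, feat_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```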