scispace - formally typeset

Gang Li

Researcher at Microsoft

Publications - 21
Citations - 4677

Gang Li is an academic researcher at Microsoft. He has contributed to research on topics including word error rate and artificial neural networks. He has an h-index of 16 and has co-authored 21 publications receiving 4295 citations.

Papers
Proceedings ArticleDOI

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.

TL;DR: This work shows empirically that in SGD training of deep neural networks, gradients can be quantized aggressively, to just one bit per value, with no or nearly no loss of accuracy, provided the quantization error is carried forward across minibatches (error feedback). Combining this finding with AdaGrad yields data-parallel distributed SGD.
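The error-feedback idea summarized above can be sketched in a few lines. This is a minimal illustration of 1-bit quantization with a carried-forward residual, not the authors' implementation; the function name and the mean-magnitude reconstruction scale are assumptions for the sketch.

```python
import numpy as np

def one_bit_sgd_step(grad, error):
    """Quantize a gradient to one bit per value with error feedback (sketch).

    Hypothetical helper, not the paper's code: the residual left over
    from quantizing the previous minibatch's gradient is added back
    before quantizing the current one, so quantization error is
    carried forward across minibatches rather than lost.
    """
    corrected = grad + error               # add back last step's residual
    # Encode each value as +s or -s; s is chosen so the reconstruction
    # preserves the mean magnitude of the corrected gradient.
    scale = np.mean(np.abs(corrected))
    quantized = np.where(corrected >= 0, scale, -scale)
    new_error = corrected - quantized      # residual carried to the next minibatch
    return quantized, new_error
```

Note that `quantized + new_error` always equals the corrected gradient exactly, which is why no information is permanently discarded.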
Proceedings Article

Conversational Speech Transcription Using Context-Dependent Deep Neural Networks.

TL;DR: Context-Dependent Deep-Neural-Network HMMs (CD-DNN-HMMs) combine classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and deep-belief-network pre-training, and greatly outperform conventional CD-GMM (Gaussian mixture model) HMMs.
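In the hybrid setup described above, the DNN replaces the GMM as the emission model: it predicts posteriors over context-dependent states (senones), which are converted to scaled likelihoods for HMM decoding. A minimal sketch of that standard conversion, with a hypothetical function name:

```python
import numpy as np

def scaled_likelihoods(dnn_posteriors, senone_priors):
    """Convert DNN senone posteriors into scaled likelihoods (sketch).

    Standard hybrid DNN/HMM trick, not code from the paper: by Bayes'
    rule, p(x | senone) is proportional to p(senone | x) / p(senone),
    so dividing the network's posteriors by the senone priors gives
    quantities the HMM decoder can use in place of GMM likelihoods.
    """
    return np.asarray(dnn_posteriors) / np.asarray(senone_priors)
```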
Proceedings ArticleDOI

Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription

TL;DR: This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Proceedings ArticleDOI

KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition

TL;DR: Experiments demonstrate that the proposed adaptation technique provides 2%-30% relative error reduction over already very strong speaker-independent CD-DNN-HMM systems, across different adaptation sets and under both supervised and unsupervised adaptation setups.
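The KL-divergence regularization summarized above is commonly realized by interpolating the training targets: adding a KL term between the adapted model's outputs and the speaker-independent (SI) model's outputs is equivalent to training toward a mix of the hard label and the SI posterior. A sketch under that standard formulation, with a hypothetical function name and an illustrative regularization weight:

```python
import numpy as np

def kl_regularized_targets(hard_labels, si_posteriors, rho=0.5):
    """Interpolated targets for KL-regularized DNN adaptation (sketch).

    Assumption based on the technique's usual formulation, not the
    paper's code: rho is the regularization weight. rho = 0 recovers
    plain adaptation on the hard labels; rho = 1 keeps the adapted
    model's outputs pinned to the speaker-independent model's.
    """
    hard = np.asarray(hard_labels, dtype=float)
    si = np.asarray(si_posteriors, dtype=float)
    return (1.0 - rho) * hard + rho * si
```

Because both inputs are probability distributions, the interpolated target still sums to one, so it can be used directly with the usual cross-entropy training criterion.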