Gang Li
Researcher at Microsoft
Publications - 21
Citations - 4677
Gang Li is an academic researcher at Microsoft. His research focuses on topics including word error rate and artificial neural networks. He has an h-index of 16 and has co-authored 21 publications receiving 4,295 citations.
Papers
Proceedings ArticleDOI
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.
TL;DR: This work shows empirically that in SGD training of deep neural networks, gradients can be quantized aggressively, to just one bit per value, at little or no loss of accuracy, provided the quantization error is carried forward across minibatches (error feedback). Combining this finding with AdaGrad, the authors implement data-parallel distributed SGD.
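The key idea, quantizing each gradient value to one bit while carrying the quantization residual into the next minibatch, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the single mean-absolute-value reconstruction scale and the function name are assumptions for clarity.

```python
import numpy as np

def one_bit_quantize(grad, error):
    """Quantize a gradient tensor to 1 bit per value with error feedback.

    grad:  the current minibatch gradient
    error: quantization residual carried over from the previous minibatch
    Returns (quantized gradient, new residual to carry forward).
    """
    corrected = grad + error                 # add residual before quantizing
    scale = np.mean(np.abs(corrected))       # one reconstruction value (assumed scheme)
    bits = np.sign(corrected)
    bits[bits == 0] = 1.0                    # break ties so every value is +/-1
    quantized = bits * scale                 # 1 bit per value + one shared scale
    new_error = corrected - quantized        # residual fed into the next minibatch
    return quantized, new_error
```

By construction the running sum of quantized gradients plus the residual equals the running sum of true gradients, which is why the quantization error does not accumulate over training.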
Proceedings Article
Conversational Speech Transcription Using Context-Dependent Deep Neural Networks.
Frank Seide, Gang Li, Dong Yu +2 more
TL;DR: Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and deep-belief-network pre-training to greatly outperform conventional CD-GMM (Gaussian mixture model) HMMs.
Proceedings ArticleDOI
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
TL;DR: This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Proceedings ArticleDOI
KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
TL;DR: Experiments demonstrate that the proposed adaptation technique provides 2%-30% relative error reduction over already very strong speaker-independent CD-DNN-HMM systems, using adaptation sets of different sizes under both supervised and unsupervised adaptation setups.
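The KL-divergence regularization in this line of work amounts to interpolating the adaptation targets between the hard labels and the posteriors of the unadapted speaker-independent (SI) model, so the adapted model is penalized for drifting too far from the SI distribution. A minimal sketch of that target interpolation, with the weight name `rho` and function names assumed for illustration:

```python
import numpy as np

def kl_regularized_targets(hard_labels, si_posteriors, rho):
    """Interpolated training targets for KL-regularized adaptation.

    hard_labels:   one-hot targets from the adaptation data
    si_posteriors: posteriors of the unadapted speaker-independent model
    rho:           regularization weight; rho=0 is unregularized adaptation,
                   rho=1 keeps the SI model's outputs unchanged (assumption)
    """
    return (1.0 - rho) * hard_labels + rho * si_posteriors

def cross_entropy(targets, log_probs):
    """Standard cross-entropy against the (soft) interpolated targets."""
    return -np.sum(targets * log_probs, axis=-1).mean()
```

Training the adapted network with cross-entropy against these soft targets is equivalent to the original cross-entropy plus a KL term toward the SI model, which is what makes small adaptation sets usable without catastrophic overfitting.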