Proceedings ArticleDOI
Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
Ossama Abdel-Hamid, Hui Jiang
pp. 7942–7946
TLDR
A new fast speaker adaptation method for the hybrid NN/HMM speech recognition model that can achieve over 10% relative reduction in phone error rate using only seven utterances for adaptation.
Abstract
In this paper, we propose a new fast speaker adaptation method for the hybrid NN/HMM speech recognition model. The adaptation method depends on joint learning of a large generic adaptation neural network shared by all speakers as well as multiple small speaker codes (one per speaker). The joint training method uses all training data along with speaker labels to update the adaptation NN weights and the speaker codes based on the standard back-propagation algorithm. In this way, the learned adaptation NN is capable of transforming each speaker's features into a generic speaker-independent feature space when a small speaker code is given. Adaptation to a new speaker can be done simply by learning a new speaker code using the same back-propagation algorithm, without changing any NN weights. In this method, a separate speaker code is learned for each speaker while the large adaptation NN is learned from the whole training set. The main advantage of this method is that the speaker codes are very small. As a result, it is possible to conduct very fast adaptation of the hybrid NN/HMM model for each speaker based on only a small amount of adaptation data (i.e., just a few utterances). Experimental results on TIMIT have shown that it can achieve over 10% relative reduction in phone error rate by using only seven utterances for adaptation.
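The abstract's two-phase recipe (joint training of a shared adaptation network plus per-speaker codes, then code-only back-propagation for a new speaker) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single linear adaptation layer, the tiny dimensions, the toy "speaker = additive bias" data, and the learning rate are all illustrative assumptions; the paper uses a deep network on top of an NN/HMM acoustic model.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of
# speaker-code adaptation: a shared linear "adaptation network" W maps
# [features; speaker_code] to speaker-normalized features. Training
# updates both W and every speaker's code; adapting to a NEW speaker
# updates only that speaker's small code vector, with W frozen.
import numpy as np

rng = np.random.default_rng(0)
FEAT, CODE, LR = 8, 2, 0.05  # feature dim, speaker-code dim, step size

# Shared adaptation network: maps a (FEAT + CODE)-dim input to FEAT dims.
W = rng.normal(scale=0.1, size=(FEAT, FEAT + CODE))

def step(x, code, target, update_weights=True):
    """One squared-error gradient step; freeze W in adaptation mode."""
    global W
    inp = np.concatenate([x, code])
    err = W @ inp - target            # residual of the normalized feature
    grad_inp = W.T @ err              # gradient w.r.t. the stacked input
    if update_weights:
        W -= LR * np.outer(err, inp)  # update the shared network
    code -= LR * grad_inp[FEAT:]      # update only the speaker-code part
    return 0.5 * float(err @ err)

# Toy data: each "speaker" shifts a canonical feature vector by a bias,
# and the target is to recover the canonical (speaker-independent) vector.
canonical = rng.normal(size=FEAT)
speakers = [rng.normal(scale=0.5, size=FEAT) for _ in range(3)]
codes = [np.zeros(CODE) for _ in speakers]

# Phase 1 -- joint training: shared W and all speaker codes together.
for _ in range(500):
    for bias, code in zip(speakers, codes):
        step(canonical + bias, code, canonical, update_weights=True)

# Phase 2 -- fast adaptation: an unseen speaker gets a fresh code; W is frozen.
new_bias = rng.normal(scale=0.5, size=FEAT)
new_code = np.zeros(CODE)
losses = [step(canonical + new_bias, new_code, canonical, update_weights=False)
          for _ in range(200)]
print(f"adaptation loss: first {losses[0]:.4f}, last {losses[-1]:.4f}")
```

Because only the CODE-dimensional vector is trained per speaker, very little adaptation data is needed, which mirrors the paper's "few utterances" claim; the shared network does the heavy lifting once at training time.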
Citations
Journal ArticleDOI
A regression approach to speech enhancement based on deep neural networks
TL;DR: The proposed DNN approach can effectively suppress highly nonstationary noise, which is difficult to handle in general, and deals well with noisy speech recorded in real-world scenarios without producing the annoying musical artifacts commonly observed with conventional enhancement methods.
Proceedings ArticleDOI
Speaker adaptation of neural network acoustic models using i-vectors
TL;DR: This work proposes to adapt deep neural network acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network, in parallel with the regular acoustic features for ASR; the approach is comparable in performance to DNNs trained on speaker-adapted features, with the advantage that only one decoding pass is needed.
Journal ArticleDOI
Speech Recognition Using Deep Neural Networks: A Systematic Review
TL;DR: Provides a thorough examination of the studies on deep learning for speech applications conducted since 2006, when deep learning first emerged as a new area of machine learning.
Proceedings ArticleDOI
Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
Pawel Swietojanski, Steve Renals
TL;DR: This paper proposes a simple yet effective model-based neural network speaker adaptation technique that learns speaker-specific hidden unit contributions given adaptation data, without requiring any form of speaker-adaptive training, or labelled adaptation data.
Proceedings ArticleDOI
Improving DNN speaker independence with I-vector inputs
TL;DR: Modifications of the basic algorithm are developed that yield significant reductions in word error rate (WER), and the algorithms are shown to combine well with speaker adaptation by back-propagation, resulting in a 9% relative WER reduction.
References
Journal ArticleDOI
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Journal ArticleDOI
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
TL;DR: An important feature of the method is that arbitrary adaptation data can be used (no special enrolment sentences are needed) and that adaptation performance improves as more data is used.
Journal ArticleDOI
Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
Jean-Luc Gauvain, Chin-Hui Lee
TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
Journal ArticleDOI
Acoustic Modeling Using Deep Belief Networks
TL;DR: It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters.
Journal ArticleDOI
Maximum likelihood linear transformations for HMM-based speech recognition
TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.