Proceedings Article
Noise and speaker compensation in the Log filter bank domain
Vikas Joshi, Raghavendra Bilgi, Srinivasan Umesh, Luz García, Carmen Benitez +4 more
- pp 4709-4712
TLDR
The elegance of the proposed approach is that, given the speech data, the authors directly obtain MFCC features that are robust to noise and speaker variations; the approach shows a significant relative improvement over the baseline on the Aurora-4 task.
Abstract:
In this paper, we propose a method to compensate for noise and speaker variability directly in the Log filter-bank (FB) domain, so that the resulting MFCC features are robust to noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the Log FB domain; speaker normalization is also done in the Log FB domain, using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the Log FB domain using a canonical GMM, which avoids the two-pass approach needed with an HMM. Further, this selection can be implemented efficiently using sufficient statistics obtained from the GMM and the FB-VTLN matrices. The GMM-based warp-factor selection can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
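The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: the standard per-channel mismatch function y = x + h + log(1 + exp(n - x - h)) with no phase term, diagonal covariances, a warp-selection score that adds a log-determinant (Jacobian) term for each linear warp matrix, and direct scoring of frames rather than the sufficient-statistics shortcut. All function and parameter names (`vts_compensate`, `gmm_loglik`, `select_warp_factor`, `warp_matrices`) are hypothetical.

```python
import numpy as np


def vts_compensate(mu_x, var_x, mu_n, var_n, mu_h):
    """First-order VTS update of one diagonal Gaussian in the log FB domain.

    Assumed mismatch function per channel: y = x + h + log(1 + exp(n - x - h)),
    where x is clean speech, n additive noise, and h the channel.
    """
    d = mu_n - mu_x - mu_h
    # log(1 + exp(d)) evaluated stably.
    mu_y = mu_x + mu_h + np.logaddexp(0.0, d)
    # g = dy/dx at the expansion point; dy/dn = 1 - g.
    g = 1.0 / (1.0 + np.exp(d))
    var_y = g ** 2 * var_x + (1.0 - g) ** 2 * var_n
    return mu_y, var_y


def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of frames (T x D) under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]  # T x M x D
    log_comp = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_comp = log_comp + np.log(weights)  # T x M
    # Log-sum-exp over mixture components.
    m = log_comp.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(per_frame.sum())


def select_warp_factor(log_fb, warp_matrices, weights, means, variances):
    """Pick the warp factor whose matrix maximizes the GMM likelihood.

    warp_matrices maps each candidate warp factor to a D x D matrix applied
    to log FB frames (hypothetical inputs; the paper's FB-VTLN matrices are
    not reproduced here).
    """
    best_alpha, best_score = None, -np.inf
    for alpha, A in warp_matrices.items():
        warped = log_fb @ A.T
        # Jacobian term for a linear feature transform (an assumption here).
        _, logdet = np.linalg.slogdet(A)
        score = gmm_loglik(warped, weights, means, variances) + len(log_fb) * logdet
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

In this sketch a single pass over the candidate warp matrices is enough because the GMM needs no transcription, which is the property the abstract contrasts with the two-pass HMM procedure.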
Citations
Journal Article
An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion
Ji Ming, Danny Crookes +1 more
TL;DR: This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality.
Proceedings Article
Speech Enhancement from Additive Noise and Channel Distortion - a Corpus-Based Approach
Ji Ming, Danny Crookes +1 more
TL;DR: This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise) based on finding longest matching segments (LMS) from a corpus of clean, wideband speech.
References
Proceedings Article
A vector Taylor series approach for environment-independent speech recognition
TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.
Journal Article
A frequency warping approach to speaker normalization
L. Lee, Richard Rose +1 more
TL;DR: An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.
Journal Article
Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC
TL;DR: The performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation, and it is shown that the approximations involved do not lead to any performance degradation.
Proceedings Article
Implementing frequency-warping and VTLN through linear transformation of conventional MFCC.
TL;DR: The proposed method exploits the bandlimited interpolation idea (in the frequency domain) to do the necessary frequency warping and yields exact results as long as the cepstral coefficients are quefrency limited.
Proceedings Article
A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics.
TL;DR: This paper develops a computationally efficient approach for warp factor estimation in Vocal Tract Length Normalization (VTLN) that has recognition performance that is comparable to conventional VTLN and yet is computationally more efficient.