Proceedings ArticleDOI

Noise and speaker compensation in the Log filter bank domain

25 Mar 2012-pp 4709-4712

TL;DR: The elegance of the proposed approach is that, given the speech data, the authors directly obtain MFCC features that are robust to noise and speaker variations; the approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.

Abstract: In this paper, we propose a method to compensate for noise and speaker variability directly in the Log filter-bank (FB) domain, so that MFCC features are robust to noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the Log FB domain, and speaker normalization is also done in the Log FB domain using linear Vocal Tract Length Normalization (VTLN) matrices. For VTLN, optimal selection of the warp factor is done in the Log FB domain using a canonical GMM model, avoiding the two-pass approach needed by an HMM model. Further, this can be efficiently implemented using sufficient statistics obtained from the GMM and the FB-VTLN matrices. The warp-factor selection using the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
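As a minimal numeric sketch of the mismatch being compensated (channel count and values are illustrative, not from the paper): in the linear filter-bank domain, noisy energy is the sum of clean and noise energies, so per Log FB channel y = x + log(1 + exp(n - x)).

```python
import numpy as np

# Toy sketch of the Log FB mismatch function (all names and sizes are
# illustrative): linear-domain addition of clean and noise energies
# becomes y = x + log(1 + exp(n - x)) in the log domain, per channel.

def noisy_log_fb(x, n):
    return x + np.log1p(np.exp(n - x))

num_channels = 24
x = np.zeros(num_channels)           # clean Log FB energies
n = np.full(num_channels, -2.0)      # noise Log FB energies (below the speech)
y = noisy_log_fb(x, n)               # equals log(exp(x) + exp(n)) per channel
```

Additive noise can only raise the log energy, which is why compensation amounts to undoing this nonlinear shift.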



Citations
Journal ArticleDOI
TL;DR: This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality.
Abstract: This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.
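The core LMS search can be illustrated with a hedged toy sketch: slide the noisy sequence over a clean corpus and keep the longest run of consecutive frames within a distance tolerance. The real method operates on speech feature vectors with noise and channel models; the names and 1-D "frames" below are made up for illustration.

```python
# Toy longest-matching-segment (LMS) search over scalar "frames";
# function name and tolerance are illustrative, not from the paper.

def longest_matching_segment(noisy, corpus, tol=0.5):
    best = (0, 0, 0)                  # (length, start_in_noisy, start_in_corpus)
    for i in range(len(noisy)):
        for j in range(len(corpus)):
            L = 0
            while (i + L < len(noisy) and j + L < len(corpus)
                   and abs(noisy[i + L] - corpus[j + L]) <= tol):
                L += 1
            if L > best[0]:
                best = (L, i, j)
    return best

corpus = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noisy = [9.0, 2.1, 3.1, 3.9, 9.0]     # middle frames roughly match corpus[2:5]
length, ni, ci = longest_matching_segment(noisy, corpus)
```

A production system would use vector distances and prune the quadratic search, but the matched clean segments are then the basis for the enhanced estimate.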

8 citations

Proceedings ArticleDOI
14 Sep 2014
TL;DR: This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise) based on finding longest matching segments (LMS) from a corpus of clean, wideband speech.
Abstract: This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise). The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise. Third, we present an iterative algorithm for improved speech estimates. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.

3 citations


References
Proceedings ArticleDOI
07 May 1996
TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.
Abstract: In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous attempts at solving analytically the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously-recorded or "stereo" recordings of clean and degraded speech. In this work we introduce the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors, or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case HMM statistics are modified using the VTS formulation. Both approaches use only the actual speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database. Artificial white Gaussian noise is added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.
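The VTS idea can be sketched as a first-order Taylor expansion of the log-domain mismatch function y = x + log(1 + exp(n - x)); the expansion points and vectors below are toy values, not from the paper.

```python
import numpy as np

# First-order VTS linearization of the log-domain mismatch (toy values):
# dy/dx = sigmoid(x0 - n0) and dy/dn = 1 - dy/dx, elementwise (diagonal
# Jacobians), so the expansion stays cheap to evaluate.

def mismatch(x, n):
    return x + np.log1p(np.exp(n - x))

def vts_first_order(x, n, x0, n0):
    G = 1.0 / (1.0 + np.exp(n0 - x0))        # diagonal Jacobian w.r.t. x
    return mismatch(x0, n0) + G * (x - x0) + (1.0 - G) * (n - n0)

x0 = np.array([0.0, 1.0]); n0 = np.array([-1.0, -1.0])
x = x0 + 0.01; n = n0 - 0.02                 # small perturbation
approx = vts_first_order(x, n, x0, n0)
exact = mismatch(x, n)
err = np.max(np.abs(approx - exact))         # second-order small near x0, n0
```

Because the expansion is linear in x and n, Gaussian statistics of clean speech and noise map to Gaussian statistics of noisy speech, which is what makes HMM/feature compensation tractable.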

477 citations

Journal ArticleDOI
TL;DR: An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.
Abstract: In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low complexity, maximum likelihood based frequency warping procedures have been applied to speaker normalization for a telephone based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results have shown that frequency warping is consistently able to reduce word error rate by 20% even for very short utterances.
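Filterbank-based frequency warping of the kind described here can be sketched as a piecewise-linear remapping of the filterbank center frequencies; the break point, band edge, and function names below are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

# Hedged sketch of piecewise-linear VTLN frequency warping (parameters
# are illustrative): linear with slope alpha up to a break frequency,
# then a second linear segment chosen so the band edge maps to itself.

def warp_frequency(f, alpha, f_max=8000.0, f_break=0.875):
    fb = f_break * f_max
    f = np.asarray(f, dtype=float)
    lo = alpha * f                                         # main linear warp
    hi = alpha * fb + (f_max - alpha * fb) * (f - fb) / (f_max - fb)
    return np.where(f <= fb, lo, hi)

centers = np.linspace(100.0, 7900.0, 24)   # nominal FB center frequencies
warped = warp_frequency(centers, alpha=1.1)
```

In practice the warped centers are plugged back into the mel filterbank, so normalization costs nothing beyond recomputing the filters for each candidate warp factor.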

322 citations


"Noise and speaker compensation in t..." refers background in this paper

  • ...Both noise compensation and speaker normalization are done in the feature domain....


Journal ArticleDOI
TL;DR: The performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation, and it is shown that the approximations involved do not lead to any performance degradation.
Abstract: Vocal tract length normalization (VTLN) for standard filterbank-based Mel frequency cepstral coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion. A linear transform (LT) equivalent for frequency warping (FW) would enable more efficient MLS estimation. We recently proposed a novel LT to perform FW for VTLN and model adaptation with standard MFCC features. In this paper, we present the mathematical derivation of the LT and give a compact formula to calculate it for any FW function. We also show that our LT is closely related to different LTs previously proposed for FW with cepstral features, and these LTs for FW are all shown to be numerically almost identical for the sine-log all-pass transform (SLAPT) warping functions. Our formula for the transformation matrix is, however, computationally simpler and, unlike other previous LT approaches to VTLN with MFCC features, no modification of the standard MFCC feature extraction scheme is required. In VTLN and speaker adaptive modeling (SAM) experiments with the DARPA resource management (RM1) database, the performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation. This demonstrates that the approximations involved do not lead to any performance degradation. Performance comparable to front end VTLN was also obtained with LT adaptation of HMM means in the back end, combined with mean bias and variance adaptation according to the maximum likelihood linear regression (MLLR) framework. The FW methods performed significantly better than standard MLLR for very limited adaptation data (1 utterance), and were equally effective with unsupervised parameter estimation. We also performed speaker adaptive training (SAT) with feature space LT denoted CLTFW. 
Global CLTFW SAT gave results comparable to SAM and VTLN. By estimating multiple CLTFW transforms using a regression tree, and including an additive bias, we obtained significantly improved results compared to VTLN, with increasing adaptation data.

46 citations


"Noise and speaker compensation in t..." refers methods in this paper

  • ...VTS model for noisy cepstra is given by cy = cx + D·log(1 + e^(D^-1 (cn - cx)))  (9), where cy is the noisy cepstra, cx is the clean cepstra, cn is the noise vector due to additive noise at the input, and D is the DCT matrix....

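The cepstral-domain VTS relation quoted above, cy = cx + D·log(1 + e^(D^-1 (cn - cx))), can be checked numerically under a simplifying assumption: a full square orthonormal DCT, so that D^-1 = D^T (truncated MFCCs would need a pseudo-inverse). Sizes and values are illustrative.

```python
import numpy as np

# Numeric check of the quoted VTS cepstral model using a full square,
# orthonormal DCT-II matrix (so inv(D) == D.T); sizes are illustrative.

def dct_matrix(M):
    k = np.arange(M)[:, None]
    m = np.arange(M)[None, :]
    D = np.sqrt(2.0 / M) * np.cos(np.pi * (m + 0.5) * k / M)
    D[0, :] /= np.sqrt(2.0)
    return D                          # orthonormal DCT-II

M = 24
D = dct_matrix(M)
fx = np.random.default_rng(0).normal(size=M)   # clean Log FB energies
fn = fx - 2.0                                  # noise Log FB energies
cx, cn = D @ fx, D @ fn                        # clean / noise cepstra

cy = cx + D @ np.log1p(np.exp(D.T @ (cn - cx)))   # the quoted VTS model
fy = fx + np.log1p(np.exp(fn - fx))               # same mismatch in Log FB domain
```

With an orthonormal D the cepstral formula is exactly the DCT of the Log FB mismatch, which is why compensation can be done in either domain.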

Proceedings ArticleDOI
04 Sep 2005
TL;DR: The proposed method exploits the bandlimited interpolation idea (in the frequency domain) to do the necessary frequency warping and yields exact results as long as the cepstral coefficients are quefrency-limited.

Abstract: In this paper, we show that frequency-warping (including VTLN) can be implemented through linear transformation of conventional MFCC. Unlike the Pitz-Ney [1] continuous domain approach, we directly determine the relation between frequency-warping and the linear transformation in the discrete domain. The advantage of such an approach is that it can be applied to any frequency-warping and is not limited to cases where an analytical closed-form solution can be found. The proposed method exploits the bandlimited interpolation idea (in the frequency domain) to do the necessary frequency-warping and yields exact results as long as the cepstral coefficients are quefrency-limited. This idea of quefrency-limitedness shows the importance of the filter-bank smoothing of the spectra, which has been ignored in [1, 2]. Furthermore, unlike [1], since we operate in the discrete domain, we can also apply the usual discrete-cosine transform (i.e. DCT-II) on the logarithm of the filter-bank output to get conventional MFCC features. Therefore, using our proposed method, we can linearly transform conventional MFCC cepstra to do VTLN and we do not require any recomputation of the warped features. We provide experimental results in support of this approach.
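The linear-transform view can be sketched as follows, under illustrative assumptions (small sizes, a plain linear warp of the bin index): for a quefrency-limited cepstrum, the log spectrum can be resampled at warped, possibly fractional, bin positions via the inverse-DCT basis, so warping becomes c_warped = (D @ W) c.

```python
import numpy as np

# Hedged sketch of frequency warping as a linear transform of cepstra:
# W evaluates the inverse-DCT basis at warped bin positions, so the
# warped cepstra are (D @ W) @ c. Sizes and the warp are toy choices.

def dct_matrix(M):
    k = np.arange(M)[:, None]
    m = np.arange(M)[None, :]
    D = np.sqrt(2.0 / M) * np.cos(np.pi * (m + 0.5) * k / M)
    D[0, :] /= np.sqrt(2.0)
    return D                                  # orthonormal: D @ D.T == I

def idct_basis_at(points, M):
    # inverse-DCT basis evaluated at (possibly fractional) bin positions
    k = np.arange(M)[None, :]
    s = np.full(M, np.sqrt(2.0 / M)); s[0] = np.sqrt(1.0 / M)
    p = np.asarray(points, dtype=float)[:, None]
    return s * np.cos(np.pi * (p + 0.5) * k / M)

M = 20
D = dct_matrix(M)
identity_positions = np.arange(M, dtype=float)
T_identity = D @ idct_basis_at(identity_positions, M)   # should be I

alpha = 1.05                                  # toy linear warp of the bin index
warped_positions = np.clip(alpha * identity_positions, 0.0, M - 1.0)
T_warp = D @ idct_basis_at(warped_positions, M)         # c_warped = T_warp @ c
```

The identity warp recovering the identity matrix is the sanity check that the band-limited interpolation is exact for quefrency-limited cepstra.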

35 citations


"Noise and speaker compensation in t..." refers background or methods in this paper

  • ...5 are obtained for low pass (LP) filtered and high pass (HP) filtered cepstral coefficients at different SNR levels....


  • ...VTS model for noisy cepstra is given by cy = cx + D·log(1 + e^(D^-1 (cn - cx)))  (9), where cy is the noisy cepstra, cx is the clean cepstra, cn is the noise vector due to additive noise at the input, and D is the DCT matrix....


  • ...The basic idea of this analysis is that, for any good "noise compensating" method, the histogram of "cleaned" features should match that of the original speech features....


Proceedings Article
01 Jan 2008
TL;DR: This paper develops a computationally efficient approach for warp factor estimation in Vocal Tract Length Normalization (VTLN) that has recognition performance that is comparable to conventional VTLN and yet is computationally more efficient.
Abstract: In this paper, we develop a computationally efficient approach for warp factor estimation in Vocal Tract Length Normalization (VTLN). Recently we have shown that warped features can be obtained by a linear transformation of the unwarped features. Using the warp matrices we show that warp factor estimation can be efficiently performed in an EM framework. This can be done by collecting Sufficient Statistics by aligning the unwarped utterances only once. The likelihood of warped features, which are necessary for warp factor estimation, are computed by appropriately modifying the sufficient statistics using the warp matrices. We show using OGI, TIDIGITS and RM task that this approach has recognition performance that is comparable to conventional VTLN and yet is computationally more efficient.
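The selection step itself can be sketched in a hedged way: score each candidate warp matrix A_alpha by the log-likelihood of the warped features under a canonical GMM and keep the argmax. The GMM, the features, and the (diagonal) warp matrices below are toy stand-ins, not the paper's models; the efficiency gain from sufficient statistics is not reproduced here.

```python
import numpy as np

# Toy GMM-based warp-factor selection: everything below (features, GMM,
# diagonal "warp matrices") is an illustrative stand-in.

def gmm_loglik(X, means, weights):
    # diagonal unit-variance GMM, log-likelihood summed over all frames
    diff = X[:, None, :] - means[None, :, :]               # (T, K, d)
    log_comp = -0.5 * np.sum(diff ** 2, axis=2) + np.log(weights)
    log_comp -= 0.5 * X.shape[1] * np.log(2.0 * np.pi)
    m = log_comp.max(axis=1)
    return float(np.sum(m + np.log(np.sum(np.exp(log_comp - m[:, None]), axis=1))))

rng = np.random.default_rng(0)
d, T, K = 4, 100, 3
means = 5.0 * rng.normal(size=(K, d))          # well-separated components
weights = np.full(K, 1.0 / K)
X = means[rng.integers(K, size=T)]             # frames drawn at the means

alphas = [0.9, 1.0, 1.1]
warps = {a: a * np.eye(d) for a in alphas}     # toy stand-ins for A_alpha
scores = {a: gmm_loglik(X @ warps[a].T, means, weights) for a in alphas}
best = max(scores, key=scores.get)             # identity warp should win here
```

The paper's contribution is to evaluate these per-alpha likelihoods from sufficient statistics collected in a single alignment pass, rather than re-scoring the data for every candidate as this naive sketch does.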

31 citations


"Noise and speaker compensation in t..." refers methods in this paper

  • ...• VTLN Warping: Warping is done in the Log FB domain by multiplying the VTS-compensated Log FB coefficients (Fvts) with warping matrices (A0....
