Noise and speaker compensation in the Log filter bank domain
TL;DR: Given the speech data, the proposed approach directly yields MFCC features that are robust to noise and speaker variations, and shows a significant relative improvement over the baseline on the Aurora-4 task.
Abstract: In this paper, we propose a method to compensate for noise and speaker variability directly in the Log filter-bank (FB) domain, so that the resulting MFCC features are robust to noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the Log FB domain; speaker normalization is also done in the Log FB domain, using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the Log FB domain using a canonical GMM, avoiding the two-pass approach needed with an HMM. Further, this selection can be implemented efficiently using sufficient statistics obtained from the GMM and the FB-VTLN matrices. Warp-factor selection using the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
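The warp-factor selection described in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM and a small dictionary of candidate warping matrices; the names `gmm_loglik`, `select_warp`, and `warp_matrices` are hypothetical and do not come from the paper. It scores each candidate by brute force rather than through the sufficient statistics the abstract mentions, and it omits any Jacobian term, so it should be read as an illustration of the idea only.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of the rows of X under a diagonal-covariance GMM.

    X: (T, B) frames; weights: (K,); means, variances: (K, B).
    """
    diff = X[:, None, :] - means[None, :, :]                    # (T, K, B)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                       # (T, K)
    # Log-sum-exp over components, stabilised by subtracting the row maximum.
    m = log_comp.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))
    return float(np.sum(per_frame))

def select_warp(F, warp_matrices, weights, means, variances):
    """Pick the warp factor whose matrix maximises the GMM log-likelihood.

    F: (T, B) Log FB frames; warp_matrices: dict mapping a warp-factor label
    to a (B, B) matrix (hypothetical stand-ins for FB-VTLN matrices).
    """
    scores = {alpha: gmm_loglik(F @ A.T, weights, means, variances)
              for alpha, A in warp_matrices.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because only a GMM is evaluated, a single pass over the candidate warp factors is enough; no first-pass transcription is required, which is the advantage over HMM-based selection that the abstract points to.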
References
"Noise and speaker compensation in t..." refers background in this paper
...Both noise compensation and speaker normalization are done in the feature domain....
46 citations
"Noise and speaker compensation in t..." refers methods in this paper
...The VTS model for noisy cepstra is given by $c_y = c_x + D \log\left(1 + e^{D^{-1}(c_n - c_x)}\right)$ (9), where $c_y$ is the noisy cepstra, $c_x$ is the clean cepstra, $c_n$ is the noise vector due to additive noise at the input, and $D$ is the DCT matrix....
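The relation in Eq. (9) can be checked numerically with a short sketch. It assumes a square, orthonormal DCT-II matrix so that the inverse of D exists (in practice a truncated DCT would need a pseudo-inverse); the function names `dct_matrix` and `vts_noisy_cepstra` are illustrative and not taken from the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (its inverse is its transpose)."""
    k = np.arange(n)[:, None]   # cepstral index
    m = np.arange(n)[None, :]   # filter-bank index
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return D

def vts_noisy_cepstra(c_x, c_n, D):
    """Eq. (9): c_y = c_x + D log(1 + exp(D^{-1} (c_n - c_x)))."""
    D_inv = np.linalg.inv(D)
    # log1p(exp(z)) evaluates log(1 + exp(z)) element-wise.
    return c_x + D @ np.log1p(np.exp(D_inv @ (c_n - c_x)))
```

In the Log FB domain the same model reads y = x + log(1 + exp(n - x)), which is log(exp(x) + exp(n)); multiplying by D gives Eq. (9). This suggests why compensation can be carried out in the Log FB domain and the DCT applied afterwards.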
35 citations
"Noise and speaker compensation in t..." refers background or methods in this paper
...5 are obtained for low pass (LP) filtered and high pass (HP) filtered cepstral coefficients at different SNR levels....
...The VTS model for noisy cepstra is given by $c_y = c_x + D \log\left(1 + e^{D^{-1}(c_n - c_x)}\right)$ (9), where $c_y$ is the noisy cepstra, $c_x$ is the clean cepstra, $c_n$ is the noise vector due to additive noise at the input, and $D$ is the DCT matrix....
...The basic idea of this analysis is that, for any good "noise compensating" method, the histogram of the "cleaned" features should match that of the original speech features....
31 citations
"Noise and speaker compensation in t..." refers methods in this paper
...• VTLN warping: Warping is done in the Log FB domain by multiplying the VTS-compensated Log FB coefficients (Fvts) with warping matrices (A0....