Proceedings Article

Noise and speaker compensation in the Log filter bank domain

TLDR
The elegance of the proposed approach is that, given the speech data, the authors directly obtain MFCC features that are robust to noise and speaker variations; the approach shows a significant relative improvement over the baseline on the Aurora-4 task.
Abstract
In this paper, we propose a method to compensate for noise and speaker variability directly in the log filter-bank (FB) domain, so that the resulting MFCC features are robust to both noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the log FB domain; speaker normalization is also done in the log FB domain, using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the log FB domain using a canonical GMM, avoiding the two-pass approach needed with an HMM. Further, this selection can be implemented efficiently using sufficient statistics obtained from the GMM and the FB-VTLN matrices. Warp-factor selection with the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
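
As an illustration of the noise-compensation step, the standard first-order VTS relation in the log FB domain can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the usual mismatch function y = x + h + log(1 + exp(n - x - h)) with the phase term ignored, diagonal covariances, and a deterministic channel term. The function name `vts_compensate_logfb` and all variable names are hypothetical.

```python
import numpy as np

def vts_compensate_logfb(mu_x, var_x, mu_n, var_n, mu_h):
    """First-order VTS update of one diagonal Gaussian in the log-FB domain.

    Assumed mismatch function (phase term ignored):
        y = x + h + log(1 + exp(n - x - h))
    mu_x, var_x : clean-speech mean and variance per filter-bank channel
    mu_n, var_n : additive-noise mean and variance per channel
    mu_h        : channel (convolutional) offset per channel, treated as fixed
    """
    d = mu_n - mu_x - mu_h
    # log(1 + exp(d)), computed stably for large |d|
    log1p_exp = np.logaddexp(0.0, d)
    mu_y = mu_x + mu_h + log1p_exp
    # Per-channel Jacobian dy/dx = 1 / (1 + exp(d)); dy/dn = 1 - dy/dx
    g = np.exp(-log1p_exp)
    # Linearised variance: speech and noise contributions add
    var_y = g**2 * var_x + (1.0 - g)**2 * var_n
    return mu_y, var_y, g

# Channel 0: noise as strong as speech; channel 1: noise negligible.
mu_y, var_y, g = vts_compensate_logfb(
    mu_x=np.array([2.0, 5.0]), var_x=np.array([1.0, 1.0]),
    mu_n=np.array([2.0, -50.0]), var_n=np.array([0.5, 0.5]),
    mu_h=np.zeros(2),
)
```

When the noise is negligible, the Jacobian approaches 1 and the clean statistics pass through unchanged; when noise and speech are equally strong, the mean shifts by log 2 and both variances are weighted equally. Statistics compensated in this way could then be mapped to the cepstral domain with a DCT matrix, which is the step the abstract refers to when it says MFCC features are obtained directly.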

Citations
Journal Article

An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion

TL;DR: This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality.
Proceedings Article

Speech Enhancement from Additive Noise and Channel Distortion - a Corpus-Based Approach

Ji Ming, +1 more
TL;DR: This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise) based on finding longest matching segments (LMS) from a corpus of clean, wideband speech.
References
Proceedings Article

A vector Taylor series approach for environment-independent speech recognition

TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.
Journal Article

A frequency warping approach to speaker normalization

TL;DR: An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.
Journal Article

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

TL;DR: The performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation, and it is shown that the approximations involved do not lead to any performance degradation.
Proceedings Article

Implementing frequency-warping and VTLN through linear transformation of conventional MFCC.

TL;DR: The proposed method exploits the bandlimited interpolation idea (in the frequency domain) to do the necessary frequency warping and yields exact results as long as the cepstral coefficients are quefrency limited.
Proceedings Article

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics.

TL;DR: This paper develops a computationally efficient approach for warp factor estimation in Vocal Tract Length Normalization (VTLN) that has recognition performance that is comparable to conventional VTLN and yet is computationally more efficient.