Proceedings Article

Noise and speaker compensation in the Log filter bank domain

TLDR
Given the speech data, the proposed approach directly yields MFCC features that are robust to noise and speaker variations, and it shows a significant relative improvement over the baseline on the Aurora-4 task.
Abstract
In this paper, we propose a method to compensate for noise and speaker variability directly in the log filter-bank (FB) domain, so that the resulting MFCC features are robust to both noise and speaker variations. For noise compensation, we use the Vector Taylor Series (VTS) approach in the log FB domain; speaker normalization is also performed in the log FB domain, using linear vocal tract length normalization (VTLN) matrices. For VTLN, the optimal warp factor is selected in the log FB domain using a canonical GMM, which avoids the two-pass approach required with an HMM. Further, this selection can be implemented efficiently using sufficient statistics obtained from the GMM together with the FB-VTLN matrices. Warp-factor selection with the GMM can also be done in the cepstral domain by applying DCT matrices, without the usual approximations associated with conventional linear VTLN. The elegance of the proposed approach is that, given the speech data, we directly obtain MFCC features that are robust to noise and speaker variations. The proposed approach shows a significant relative improvement of 31% over the baseline on the Aurora-4 task.
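
The pipeline described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It shows three pieces: the standard log-spectral mismatch function that VTS-style compensation linearizes (the Taylor expansion itself is not shown), a grid search over warp factors scored by a diagonal-covariance GMM in the log FB domain, and a DCT that maps the selected log FB frames to cepstra. All names here (`log_fb_mismatch`, `toy_warp_matrix`, `select_warp`, `dct_matrix`) are hypothetical. The warp matrix is a simple interpolation stand-in for the paper's FB-VTLN matrices, and the Jacobian term and the sufficient-statistics speed-up are omitted for brevity.

```python
import numpy as np

def log_fb_mismatch(x, n, h=0.0):
    """Noisy log FB energy y = log(exp(x + h) + exp(n)).

    x: clean speech, n: additive noise, h: channel (all log FB values).
    This is the common phase-insensitive mismatch model; VTS linearizes it.
    """
    return np.logaddexp(x + h, n)

def dct_matrix(n_ceps, n_filt):
    """Orthonormal DCT-II matrix of shape (n_ceps, n_filt)."""
    n = np.arange(n_filt)
    k = np.arange(n_ceps)[:, None]
    C = np.sqrt(2.0 / n_filt) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filt))
    C[0] /= np.sqrt(2.0)
    return C

def toy_warp_matrix(alpha, n_filt):
    """Assumed stand-in for an FB-VTLN matrix: linear interpolation along a
    channel axis scaled by alpha (alpha = 1.0 gives the identity)."""
    src = np.clip(alpha * np.arange(n_filt), 0, n_filt - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n_filt - 1)
    w = src - lo
    A = np.zeros((n_filt, n_filt))
    A[np.arange(n_filt), lo] += 1.0 - w
    A[np.arange(n_filt), hi] += w
    return A

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (T, D) under a diagonal GMM."""
    diff = X[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances, axis=2))
    m = log_comp.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def select_warp(log_fb, alphas, gmm):
    """Single-pass grid search: pick the warp factor whose transformed frames
    score highest under the GMM (Jacobian term omitted in this sketch)."""
    alphas = list(alphas)
    n_filt = log_fb.shape[1]
    warped = [log_fb @ toy_warp_matrix(a, n_filt).T for a in alphas]
    scores = [gmm_loglik(W, *gmm) for W in warped]
    best = int(np.argmax(scores))
    return alphas[best], warped[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_filt, n_ceps = 23, 13
    mean = np.sin(0.4 * np.arange(n_filt))[None, :]
    gmm = (np.array([1.0]), mean, np.full((1, n_filt), 0.1))  # toy 1-component GMM
    frames = mean + 0.05 * rng.standard_normal((50, n_filt))
    alpha, warped = select_warp(frames, np.linspace(0.88, 1.12, 13), gmm)
    mfcc = warped @ dct_matrix(n_ceps, n_filt).T  # cepstra from log FB frames
    print(alpha, mfcc.shape)
```

Because the warp is a linear map on log FB vectors and the DCT is also linear, the same selection could be run on cepstra by composing the two matrices, which is the sense in which the abstract says no further approximation is needed.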

