Journal Article
Accurate compensation in the log-spectral domain for noisy speech recognition
TL;DR: Experimental results for digit recognition in the car show that the proposed technique significantly outperforms the baseline and first-order VTS; the compensation algorithm is also found to be more accurate and faster than an approximate numerical integration technique.
Abstract:
This paper presents a new algorithm for noise compensation in the log-spectral domain. We first note that, under a Gaussian mixture assumption, a compensation algorithm in the log-spectral domain is completely defined by three parameters for each Gaussian component: the noisy speech mean, the noisy speech variance, and the covariance of clean and noisy speech. Starting from a well-known mismatch function, we propose two new approximations that allow analytical expressions to be derived for these parameters, and hence develop a new noise compensation algorithm in the log-spectral domain. In addition to the theoretical derivations, we discuss implementation issues of the proposed method and analyze its computational complexity. Experimental results for digit recognition in the car reveal that the proposed technique significantly outperforms the baseline and first-order VTS. For example, at a 10 dB signal-to-noise ratio, the baseline, first-order VTS, and the proposed method achieve recognition accuracies of 82.6%, 85.5%, and 90.1%, respectively. The superiority of the proposed method over VTS can be attributed to the accuracy of the employed approximations. The compensation algorithm is also found to be more accurate and faster than an approximate numerical integration technique.
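The three per-component parameters named in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract gives neither the mismatch function nor the two approximations. The code assumes the commonly used log-spectral mismatch y = log(exp(x) + exp(n)) for a single scalar bin, with independent Gaussian clean speech x and noise n. It then estimates the noisy mean, noisy variance, and clean/noisy covariance in two ways: by Monte Carlo sampling and by a first-order VTS linearization. All function names are hypothetical.

```python
import math
import random


def mismatch(x, n):
    """Assumed mismatch function: y = log(exp(x) + exp(n)), written stably for moderate n - x."""
    return x + math.log1p(math.exp(n - x))


def monte_carlo_params(mu_x, var_x, mu_n, var_n, samples=200_000, seed=0):
    """Sample-based estimates of E[y], Var[y], Cov[x, y] for one Gaussian component."""
    rng = random.Random(seed)
    sd_x, sd_n = math.sqrt(var_x), math.sqrt(var_n)
    sum_x = sum_y = sum_yy = sum_xy = 0.0
    for _ in range(samples):
        x = rng.gauss(mu_x, sd_x)  # clean speech log-spectrum
        n = rng.gauss(mu_n, sd_n)  # noise log-spectrum, independent of x
        y = mismatch(x, n)
        sum_x += x
        sum_y += y
        sum_yy += y * y
        sum_xy += x * y
    mean_x = sum_x / samples
    mean_y = sum_y / samples
    var_y = sum_yy / samples - mean_y ** 2
    cov_xy = sum_xy / samples - mean_x * mean_y
    return mean_y, var_y, cov_xy


def vts_first_order(mu_x, var_x, mu_n, var_n):
    """First-order Taylor expansion of the assumed mismatch around (mu_x, mu_n)."""
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))  # dy/dx at the expansion point; dy/dn = 1 - g
    mean_y = mismatch(mu_x, mu_n)
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n
    cov_xy = g * var_x
    return mean_y, var_y, cov_xy


if __name__ == "__main__":
    # Illustrative values only (not taken from the paper).
    args = (2.0, 1.0, 1.0, 0.5)
    print("Monte Carlo     :", monte_carlo_params(*args))
    print("First-order VTS :", vts_first_order(*args))
```

Because the assumed mismatch function is convex, the linearized mean underestimates the sampled mean, and the gap grows with the speech and noise variances. This is the kind of approximation error that a more accurate analytical compensation would aim to reduce.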
Citations
Journal Article
Normalization of the Speech Modulation Spectra for Robust Speech Recognition
TL;DR: A temporal structure normalization (TSN) filter is proposed that reduces noise effects by normalizing the modulation spectra to reference spectra; it delivers competitive results when compared to other state-of-the-art temporal filters.
Proceedings Article
Stereo-Based Stochastic Mapping for Robust Speech Recognition
TL;DR: A stochastic mapping technique for robust speech recognition that uses stereo data based on constructing a Gaussian mixture model for the joint distribution of the clean and noisy features and using this distribution to predict the clean speech during testing.
Journal Article
Stereo-Based Stochastic Mapping for Robust Speech Recognition
TL;DR: A stochastic mapping technique for robust speech recognition that uses stereo data based on constructing a Gaussian mixture model for the joint distribution of the clean and noisy features and using this distribution to predict the clean speech during testing.
Journal Article
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition
TL;DR: By improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low to medium mismatched testing cases with no language model constraints.
Dissertation
Robust speech features and acoustic models for speech recognition
TL;DR: This thesis examines techniques to improve the robustness of automatic speech recognition (ASR) systems against noise distortions, and proposes to normalize the temporal structure of both training and testing speech features to reduce the feature-model mismatch.
References
Book
Probability, random variables, and stochastic processes
TL;DR: This book discusses the meaning of probability, the axioms of probability, repeated trials, and the concept of a random variable.
Book
Random variables and stochastic processes
Journal Article
Speech recognition in noisy environments: a survey
TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.
Journal Article
A useful theorem for nonlinear devices having Gaussian inputs
TL;DR: Application is made to the interesting special cases of conventional cross-correlation and autocorrelation functions, and Bussgang's theorem is easily proved.