Journal Article

Accurate compensation in the log-spectral domain for noisy speech recognition

Mohamed Afify
18 Apr 2005, Vol. 13, Iss. 3, pp. 388-398
TLDR
Experimental results for digit recognition in the car show that the proposed technique significantly outperforms both the baseline and first-order VTS, and that the compensation algorithm is more accurate and faster than an approximate numerical integration technique.
Abstract
This paper presents a new algorithm for noise compensation in the log-spectral domain. We first note that, under a Gaussian mixture assumption, a compensation algorithm in the log-spectral domain is completely defined by three parameters for each Gaussian component: the noisy speech mean, the noisy speech variance, and the covariance of clean and noisy speech. Starting from a well-known mismatch function, we propose two new approximations that allow analytical expressions to be derived for these parameters, and hence develop a new noise compensation algorithm in the log-spectral domain. In addition to the theoretical derivations, we discuss implementation issues of the proposed method and analyze its computational complexity. Experimental results for digit recognition in the car show that the proposed technique significantly outperforms both the baseline and first-order VTS. For example, at a 10 dB signal-to-noise ratio, the baseline, first-order VTS, and the proposed method achieve recognition accuracies of 82.6%, 85.5%, and 90.1%, respectively. The superiority of the proposed method over VTS can be attributed to the accuracy of the employed approximations. The compensation algorithm is also found to be more accurate and faster than an approximate numerical integration technique.
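To make the three per-component parameters concrete, the following is a minimal Python sketch, not the paper's method. It assumes the commonly used log-spectral mismatch function y = log(exp(x) + exp(n)) for a single frequency bin, with clean speech x and noise n drawn from independent scalar Gaussians, and it estimates the noisy speech mean, the noisy speech variance, and the clean/noisy covariance by Monte Carlo sampling rather than by the analytical approximations the abstract refers to. The function names (`mismatch`, `compensation_params`, `mmse_clean_estimate`) are hypothetical, and the linear clean-speech estimate is the standard jointly Gaussian conditional mean, included only to show how the three parameters would be used.

```python
import math
import random


def mismatch(x, n):
    """Assumed mismatch function y = log(exp(x) + exp(n)), computed stably."""
    m = max(x, n)
    return m + math.log(math.exp(x - m) + math.exp(n - m))


def compensation_params(mu_x, var_x, mu_n, var_n, num_samples=20000, seed=0):
    """Monte Carlo estimate of (noisy mean, noisy variance, clean/noisy covariance).

    x ~ N(mu_x, var_x) is clean log-spectral speech for one Gaussian component,
    n ~ N(mu_n, var_n) is log-spectral noise, assumed independent of x.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mu_x, math.sqrt(var_x)) for _ in range(num_samples)]
    ns = [rng.gauss(mu_n, math.sqrt(var_n)) for _ in range(num_samples)]
    ys = [mismatch(x, n) for x, n in zip(xs, ns)]

    mean_x = sum(xs) / num_samples
    mean_y = sum(ys) / num_samples
    var_y = sum((y - mean_y) ** 2 for y in ys) / num_samples
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / num_samples
    return mean_y, var_y, cov_xy


def mmse_clean_estimate(y, mu_x, mu_y, var_y, cov_xy):
    """Linear estimate of clean speech given noisy y (jointly Gaussian assumption)."""
    return mu_x + (cov_xy / var_y) * (y - mu_y)


if __name__ == "__main__":
    # Illustrative values only: speech and noise at comparable levels.
    mu_y, var_y, cov_xy = compensation_params(mu_x=0.0, var_x=1.0, mu_n=0.0, var_n=1.0)
    print("noisy mean:", mu_y, "noisy variance:", var_y, "covariance:", cov_xy)
    print("clean estimate at y = 1.0:", mmse_clean_estimate(1.0, 0.0, mu_y, var_y, cov_xy))
```

When the noise level is far below the speech level, y is approximately x, so the estimated noisy mean and variance approach the clean values and the covariance approaches the clean variance. Sampling of this kind is slow, which is one reason closed-form expressions for these parameters are attractive.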


Citations
Journal Article

Normalization of the Speech Modulation Spectra for Robust Speech Recognition

TL;DR: A temporal structure normalization (TSN) filter is proposed that reduces noise effects by normalizing the modulation spectra to reference spectra; it delivers competitive results compared with other state-of-the-art temporal filters.
Proceedings Article

Stereo-Based Stochastic Mapping for Robust Speech Recognition

TL;DR: A stochastic mapping technique for robust speech recognition that uses stereo data based on constructing a Gaussian mixture model for the joint distribution of the clean and noisy features and using this distribution to predict the clean speech during testing.
Journal Article

Stereo-Based Stochastic Mapping for Robust Speech Recognition

TL;DR: A stochastic mapping technique for robust speech recognition that uses stereo data based on constructing a Gaussian mixture model for the joint distribution of the clean and noisy features and using this distribution to predict the clean speech during testing.
Journal Article

A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition

TL;DR: By improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low to medium mismatched testing cases with no language model constraints.
Dissertation

Robust speech features and acoustic models for speech recognition

Xiong Xiao
TL;DR: This thesis examines techniques to improve the robustness of automatic speech recognition (ASR) systems against noise distortions, and proposes to normalize the temporal structure of both training and testing speech features to reduce the feature-model mismatch.