Proceedings Article

Improved cepstral mean and variance normalization using Bayesian framework

TLDR
This work proposes using posterior estimates of the mean and variance in CMVN instead of the maximum likelihood estimates. The approach is shown to preserve discriminable information without an increase in computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications.
Abstract
Cepstral Mean and Variance Normalization (CMVN) is a computationally efficient normalization technique for noise-robust speech recognition. The performance of CMVN is known to degrade for short utterances, both because there is insufficient data for parameter estimation and because discriminable information is lost when all utterances are forced to have zero mean and unit variance. In this work, we propose to use posterior estimates of the mean and variance in CMVN, instead of the maximum likelihood estimates. This Bayesian approach, in addition to providing a robust estimate of the parameters, is also shown to preserve discriminable information without an increase in computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications. The relative word error rate (WER) reductions of this approach with respect to Cepstral Mean Normalization, CMVN, and Histogram Equalization are (i) 40.1%, 27%, and 4.3% with the Aurora2 database for all utterances, (ii) 25.7%, 38.6%, and 30.4% with the Aurora2 database for short utterances, and (iii) 18.7%, 12.6%, and 2.5% with the Aurora4 database.
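
The contrast the abstract draws between maximum likelihood and posterior estimates can be illustrated with a minimal sketch. The abstract does not give the paper's prior or update equations, so everything specific here is an assumption: the pseudo-count `tau`, the prior statistics `prior_mean` and `prior_var` (imagined as computed from training data), and the shrinkage formulas, which simply pool the utterance statistics with `tau` pseudo-frames drawn from the prior. It is not the authors' formulation, only one common MAP-style way to obtain the behaviour described.

```python
from math import sqrt


def ml_stats(frames):
    """Per-dimension maximum likelihood mean and variance of one utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)
    ]
    return means, variances


def cmvn(frames, eps=1e-8):
    """Conventional CMVN: each utterance is forced to zero mean, unit variance."""
    means, variances = ml_stats(frames)
    return [
        [(x - m) / sqrt(v + eps) for x, m, v in zip(f, means, variances)]
        for f in frames
    ]


def posterior_cmvn(frames, prior_mean, prior_var, tau=50.0, eps=1e-8):
    """CMVN with shrunken (MAP-style) statistics.

    ASSUMPTION: the utterance statistics are pooled with `tau` pseudo-frames
    that carry the prior mean and variance. `tau`, `prior_mean` and
    `prior_var` are hypothetical names, not taken from the paper.
    """
    n = len(frames)
    means, variances = ml_stats(frames)
    w = n / (n + tau)  # weight on the utterance; small for short utterances
    post_means = [w * m + (1 - w) * pm for m, pm in zip(means, prior_mean)]
    # Pooled variance: weighted variances plus a term for the mean mismatch.
    post_vars = [
        w * v + (1 - w) * pv + w * (1 - w) * (m - pm) ** 2
        for m, v, pm, pv in zip(means, variances, prior_mean, prior_var)
    ]
    return [
        [(x - m) / sqrt(v + eps) for x, m, v in zip(f, post_means, post_vars)]
        for f in frames
    ]


if __name__ == "__main__":
    # A three-frame, two-coefficient "utterance" with made-up values.
    utterance = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    print(cmvn(utterance))
    print(posterior_cmvn(utterance, [0.0, 0.0], [1.0, 1.0], tau=3.0))
```

With `tau = 0` the sketch reduces to conventional CMVN. With `tau > 0` a short utterance is pulled toward the prior statistics, so its normalized features are no longer forced to exactly zero mean and unit variance, and the extra cost over CMVN is a handful of arithmetic operations per dimension.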


Citations
Journal Article

Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data.

TL;DR: A novel technique is proposed for noise robustness by augmenting noise in training data and achieves much better results than existing state-of-the-art techniques, thus setting a new benchmark.
Proceedings Article

Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification

TL;DR: A new combination of features and normalization methods is investigated for robust biometric speaker identification and maximum, mean and weighted sum fusions of model scores are used to enhance the Speaker Identification Accuracy (SIA).
Journal Article

Robust acoustic bird recognition for habitat monitoring with wireless sensor networks

TL;DR: Experimental results for the identification of 36 bird species from Tonga lake demonstrate that the proposed TRD–GTECC feature is highly effective and performs satisfactorily compared to popular front-ends considered in this study.
Journal Article

AHW-BGOA-DNN: a novel deep learning model for epileptic seizure detection

TL;DR: A novel Deep Learning model for epileptic seizure detection which hybridizes Adaptive Haar Wavelet-based Binary Grasshopper Optimization Algorithm and Deep Neural Network (AHW-BGOA-DNN) is presented, which is found to be reliable and accurate over the existing state-of-the-art techniques.
Proceedings Article

Incorporating uncertainty as a Quality Measure in I-Vector Based Language Recognition.

TL;DR: A new quality measure based on the i-vector posterior covariance is proposed and incorporated into the recognition process to improve recognition accuracy.