Improved cepstral mean and variance normalization using Bayesian framework

doi:10.1109/ASRU.2013.6707722

Proceedings ArticleDOI

N. Vishnu Prasad, +1 more

- pp 156-161

TLDR

This work proposes using posterior estimates of the mean and variance in CMVN instead of the maximum likelihood estimates. The approach is shown to preserve discriminable information without increasing computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications.

Abstract:

Cepstral Mean and Variance Normalization (CMVN) is a computationally efficient normalization technique for noise-robust speech recognition. The performance of CMVN is known to degrade for short utterances, both because there is insufficient data for parameter estimation and because discriminable information is lost when all utterances are forced to have zero mean and unit variance. In this work, we propose to use posterior estimates of mean and variance in CMVN, instead of the maximum likelihood estimates. This Bayesian approach, in addition to providing a robust estimate of parameters, is also shown to preserve discriminable information without increasing computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications. The relative WER reductions of this approach over Cepstral Mean Normalization, CMVN and Histogram Equalization are (i) 40.1%, 27% and 4.3% on the Aurora2 database for all utterances, (ii) 25.7%, 38.6% and 30.4% on the Aurora2 database for short utterances, and (iii) 18.7%, 12.6% and 2.5% on the Aurora4 database.
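
The idea of replacing maximum likelihood statistics with posterior estimates can be illustrated with a minimal sketch. The abstract does not give the paper's prior or its exact update formulas, so the code below is an assumption: it uses a familiar MAP-style interpolation between per-utterance statistics and prior statistics (for example, estimated from training data), controlled by a hypothetical prior weight `tau`. The function name `posterior_cmvn` and all parameter names are illustrative only.

```python
from math import sqrt
from typing import List, Sequence


def posterior_cmvn(
    frames: Sequence[Sequence[float]],
    prior_mean: Sequence[float],
    prior_var: Sequence[float],
    tau: float = 50.0,
) -> List[List[float]]:
    """Normalize cepstral frames with prior-smoothed mean and variance.

    Assumption: `tau` acts as a pseudo-frame count for the prior.
    tau = 0 reduces to conventional per-utterance CMVN.
    """
    n = len(frames)
    if n == 0:
        return []
    alpha = n / (n + tau)  # weight given to the utterance statistics
    dims = len(prior_mean)
    means, stds = [], []
    for d in range(dims):
        column = [frame[d] for frame in frames]
        ml_mean = sum(column) / n                   # ML first moment
        ml_second = sum(x * x for x in column) / n  # ML second moment
        # Interpolate first and second moments between utterance and prior.
        mean = alpha * ml_mean + (1.0 - alpha) * prior_mean[d]
        second = alpha * ml_second + (1.0 - alpha) * (
            prior_var[d] + prior_mean[d] ** 2
        )
        var = max(second - mean * mean, 1e-10)  # floor guards against round-off
        means.append(mean)
        stds.append(sqrt(var))
    return [
        [(frame[d] - means[d]) / stds[d] for d in range(dims)]
        for frame in frames
    ]


# A three-frame "short utterance" with two cepstral coefficients per frame.
short_utterance = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
normalized = posterior_cmvn(short_utterance, [0.0, 0.0], [4.0, 4.0], tau=50.0)
```

With few frames, `alpha` is small and the estimates stay close to the prior, so the normalized utterance is not forced to exactly zero mean and unit variance. As the number of frames grows, `alpha` approaches 1 and the result approaches conventional CMVN. The cost is the same single pass over the utterance that ML estimation needs.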

Citations

Journal ArticleDOI

Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data.

Ayesha Pervaiz, +7 more

- 19 Apr 2020 -

Sensors

TL;DR: A novel technique is proposed for noise robustness by augmenting noise in training data and achieves much better results than existing state-of-the-art techniques, thus setting a new benchmark.

Proceedings ArticleDOI

Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification

Musab T. S. Al-Kaltakchi, +3 more

TL;DR: A new combination of features and normalization methods is investigated for robust biometric speaker identification and maximum, mean and weighted sum fusions of model scores are used to enhance the Speaker Identification Accuracy (SIA).

Journal ArticleDOI

Robust acoustic bird recognition for habitat monitoring with wireless sensor networks

Amira Boulmaiz, +3 more

- 27 Jul 2016 -

International Journal of Speech Technolo...

TL;DR: Experimental results for the identification of 36 bird species from Tonga lake demonstrate that the proposed TRD–GTECC feature is highly effective and performs satisfactorily compared to popular front-ends considered in this study.

Journal ArticleDOI

AHW-BGOA-DNN: a novel deep learning model for epileptic seizure detection

H. Anila Glory, +5 more

- 01 Jun 2021 -

Neural Computing and Applications

TL;DR: A novel Deep Learning model for epileptic seizure detection which hybridizes Adaptive Haar Wavelet-based Binary Grasshopper Optimization Algorithm and Deep Neural Network (AHW-BGOA-DNN) is presented, which is found to be reliable and accurate over the existing state-of-the-art techniques.

Proceedings ArticleDOI

Incorporating uncertainty as a Quality Measure in I-Vector Based Language Recognition.

Amir Hossein Poorjam, +3 more

TL;DR: A new quality measure based on the i-vector posterior covariance is proposed and incorporated into the recognition process to improve recognition accuracy.

References

Proceedings Article

The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions

David Pearce, +1 more

TL;DR: A database designed to evaluate the performance of speech recognition algorithms in noisy conditions and recognition results are presented for the first standard DSR feature extraction scheme that is based on a cepstral analysis.

Cepstrum analysis technique for automatic speaker verification

S. Furui

TL;DR: New techniques for automatic speaker verification using telephone speech based on a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance using a new time warping method using a dynamic programming technique.

Journal ArticleDOI

Cepstral analysis technique for automatic speaker verification

Sadaoki Furui

- 01 Apr 1981 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this paper, a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance are extracted by means of LPC analysis successively throughout an utterance to form time functions, and frequency response distortions introduced by transmission systems are removed.

Journal ArticleDOI

Speech recognition in noisy environments: a survey

Yifan Gong

- 01 Apr 1995 -

Speech Communication

TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

Proceedings ArticleDOI

A vector Taylor series approach for environment-independent speech recognition

Pedro J. Moreno, +2 more

TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.

Related Papers (5)

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds, +2 more

- 01 Jan 2000 -

Digital Signal Processing

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

- 01 Aug 1980 -

IEEE Transactions on Acoustics, Speech, ...

Cepstral domain segmental feature vector normalization for noise robust speech recognition

Olli Viikki, +1 more

- 01 Aug 1998 -

Speech Communication

Front-End Factor Analysis for Speaker Verification

Najim Dehak, +4 more

- 01 May 2011 -

IEEE Transactions on Audio, Speech, and ...

The Kaldi Speech Recognition Toolkit