Proceedings Article
Improved cepstral mean and variance normalization using Bayesian framework
N. Vishnu Prasad, Srinivasan Umesh +1 more
- pp 156-161
TLDR
This work proposes using posterior estimates of the mean and variance in CMVN instead of the maximum likelihood estimates. The approach is shown to preserve discriminable information without increasing computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications.
Abstract:
Cepstral Mean and Variance Normalization (CMVN) is a computationally efficient normalization technique for noise-robust speech recognition. The performance of CMVN is known to degrade for short utterances, both because there is too little data for reliable parameter estimation and because discriminable information is lost when every utterance is forced to have zero mean and unit variance. In this work, we propose to use posterior estimates of the mean and variance in CMVN, instead of the maximum likelihood estimates. This Bayesian approach, in addition to providing robust parameter estimates, is also shown to preserve discriminable information without increasing computational cost, making it particularly relevant for Interactive Voice Response (IVR)-based applications. The relative WER reductions of this approach with respect to Cepstral Mean Normalization, CMVN and Histogram Equalization are (i) 40.1%, 27% and 4.3% on the Aurora2 database for all utterances, (ii) 25.7%, 38.6% and 30.4% on the Aurora2 database for short utterances, and (iii) 18.7%, 12.6% and 2.5% on the Aurora4 database.
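The contrast between maximum likelihood and posterior estimates can be illustrated with a minimal sketch. The abstract does not give the paper's actual posterior formulas, so the code below is an assumption rather than the authors' method: it uses a generic conjugate-prior-style interpolation between the per-utterance statistics and a prior. The names `prior_mean`, `prior_var` and the prior weight `tau` are hypothetical. The prior statistics are assumed to come from training data.

```python
from math import sqrt


def ml_stats(frames):
    """Per-dimension maximum likelihood mean and variance of one utterance.

    `frames` is a list of cepstral feature vectors (lists of floats).
    """
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dims)]
    return mean, var


def shrunk_stats(frames, prior_mean, prior_var, tau):
    """Interpolate utterance statistics with a prior (illustrative assumption).

    The weight on the utterance statistics is n / (n + tau), so short
    utterances lean on the prior and long ones approach the ML estimates.
    This is NOT claimed to be the paper's exact posterior estimate.
    """
    n = len(frames)
    ml_mean, ml_var = ml_stats(frames)
    w = n / (n + tau)
    mean = [w * m + (1.0 - w) * p for m, p in zip(ml_mean, prior_mean)]
    var = [w * v + (1.0 - w) * p for v, p in zip(ml_var, prior_var)]
    return mean, var


def normalize(frames, mean, var, floor=1e-8):
    """Apply mean and variance normalization with the given statistics."""
    dims = len(mean)
    return [
        [(f[d] - mean[d]) / sqrt(max(var[d], floor)) for d in range(dims)]
        for f in frames
    ]


# Usage: a two-frame "utterance" with two cepstral dimensions.
utterance = [[1.0, 2.0], [3.0, 6.0]]
cmvn = normalize(utterance, *ml_stats(utterance))
prior_based = normalize(
    utterance, *shrunk_stats(utterance, [0.0, 0.0], [1.0, 1.0], tau=2.0)
)
```

With `tau = 0` the sketch reduces to standard CMVN. With `tau > 0` the normalized features of a short utterance are no longer forced to exactly zero mean and unit variance, which is the behavior the abstract associates with preserving discriminable information. The extra cost is a few multiply-adds per dimension.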