Proceedings Article
Static and dynamic spectral features: their noise robustness and optimal weights for ASR
Chen Yang, F.K. Soong, Tan Lee +2 more
- Vol. 1, pp 241-244
TLDR
The dynamic cepstrum is found to be more robust to additive noise than its static counterpart, and a simple yet effective strategy is proposed: exponentially weighting the likelihoods contributed by the static and dynamic features during decoding.
Abstract
In this paper, we investigate the relative noise robustness of dynamic and static spectral features using two speaker-independent continuous digit databases, in English (Aurora2) and Cantonese (CUDigit). It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart. The results are consistent across different types of noise and under various SNRs. Optimal exponential weights for exploiting the unequal noise robustness of the two features are discriminatively trained on a development set. When tested under various noise conditions, the optimal weights yielded relative word error rate reductions of 36.6% and 41.9% for Aurora2 and CUDigit, respectively. The proposed weighting is attractive for many ASR applications in noise because it requires (1) no noise estimation for feature compensation; (2) no adaptation of clean HMMs to a noisy environment; and (3) only a trivial change in the decoding process, namely weighting the log likelihoods of the static and dynamic components separately.
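The separate weighting of static and dynamic log likelihoods described in the abstract can be illustrated with a minimal sketch. It assumes a single diagonal-covariance Gaussian per HMM state, so the state log likelihood splits into a static part and a dynamic part; the abstract does not specify how mixtures are handled, nor how the weights are trained. The function names, the `n_static` split index, and the weights `w_static` and `w_dynamic` are hypothetical and chosen only for illustration.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_stream_loglik(x, mean, var, n_static, w_static=1.0, w_dynamic=1.0):
    """Weighted state score (illustrative assumption, not the paper's code).

    The first n_static dimensions are treated as static features and the
    rest as dynamic (delta) features. Raising each stream likelihood to an
    exponent is the same as scaling its log likelihood by that weight.
    """
    ll_static = diag_gauss_loglik(x[:n_static], mean[:n_static], var[:n_static])
    ll_dynamic = diag_gauss_loglik(x[n_static:], mean[n_static:], var[n_static:])
    return w_static * ll_static + w_dynamic * ll_dynamic
```

With both weights set to 1.0 the score reduces to the ordinary log likelihood, so the only change to a decoder under this assumption is the per-stream scaling. Lowering `w_static` relative to `w_dynamic` reduces the influence of the static stream, which the paper reports to be the less noise-robust one.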
Citations
Journal Article
Transforming Binary Uncertainties for Robust Speech Recognition
TL;DR: This work proposes a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain, which is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition.
Journal Article
Low bias histogram-based estimation of mutual information for feature selection
TL;DR: By canceling the first order bias, the estimation avoids the bias accumulation problem that affects classical methods and, on a synthetic feature selection problem, only the proposed method results in the exact number of features to be chosen in the Gaussian case when compared to four other approaches.
Dissertation
A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments
TL;DR: This study compares several computational precedence models and their impact on the performance of a baseline separation algorithm; the baseline's own precedence model was replaced by each of the other models in turn during the investigation.
Journal Article
On the Effects of Filterbank Design and Energy Computation on Robust Speech Recognition
TL;DR: Experimental results show that selecting the appropriate filterbank and energy computation scheme can lead to significant error rate reduction over both MFCC and perceptual linear prediction (PLP) features for a variety of speech recognition tasks.
Book
Automatic Evaluation of Tracheoesophageal Substitute Voices
TL;DR: This thesis examined how automatic methods can be used to provide an objective means of evaluating substitute voices and found that the correlation between human and automatic ratings was as good as the agreement among the human rater group.
References
Journal Article
Suppression of acoustic noise in speech using spectral subtraction
TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Journal Article
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
TL;DR: An important feature of the method is that arbitrary adaptation data can be used—no special enrolment sentences are needed and that as more data is used the adaptation performance improves.
Journal Article
RASTA processing of speech
Hynek Hermansky, Nelson Morgan +1 more
TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
Proceedings Article
The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
David Pearce, Hans-Günter Hirsch +1 more
TL;DR: A database designed to evaluate the performance of speech recognition algorithms in noisy conditions and recognition results are presented for the first standard DSR feature extraction scheme that is based on a cepstral analysis.
Journal Article
Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification
TL;DR: The cepstrum was found to be the most effective, providing an identification accuracy of 70% for speech 50 msec in duration, which increased to more than 98% for a duration of 0.5 sec.