Proceedings Article

Static and dynamic spectral features: their noise robustness and optimal weights for ASR

TLDR
It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart, and a simple yet effective strategy of exponentially weighting the likelihoods that are contributed by the static and dynamic features during the decoding process is proposed.
Abstract
In this paper, we investigate the relative noise robustness of dynamic and static spectral features, using two speaker-independent continuous digit databases in English (Aurora2) and Cantonese (CUDigit). It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart. The results are consistent across different types of noise and under various SNRs. Optimal exponential weights for exploiting the unequal noise robustness of the two features are discriminatively trained on a development set. When tested under various noise conditions, the optimal weights yielded relative word error rate reductions of 36.6% and 41.9% for Aurora2 and CUDigit, respectively. The proposed weighting is attractive for many ASR applications in noise because it requires: (1) no noise estimation for feature compensation; (2) no adaptation of clean HMMs to a noisy environment; and (3) only a trivial change in the decoding process, namely weighting the log likelihoods of the static and dynamic components separately.
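
As an illustration of point (3), the following minimal Python sketch shows what weighting the log likelihoods of static and dynamic components separately could look like for a single state. It is not the authors' implementation. It assumes one diagonal-covariance Gaussian per state (real systems typically use mixtures), a feature vector whose first `n_static` entries are the static cepstra and whose remaining entries are the dynamic ones, and hypothetical weight values; the function names and all numbers are invented for the example.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log likelihood of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_stream_loglik(x, mean, var, n_static, w_static=1.0, w_dynamic=1.0):
    """Weighted sum of static and dynamic log likelihoods.

    Assumption: x[:n_static] holds the static features and x[n_static:]
    the dynamic (delta) features. An exponential weight on a likelihood
    becomes a multiplicative weight on its log likelihood.
    """
    ll_static = diag_gauss_loglik(x[:n_static], mean[:n_static], var[:n_static])
    ll_dynamic = diag_gauss_loglik(x[n_static:], mean[n_static:], var[n_static:])
    return w_static * ll_static + w_dynamic * ll_dynamic


if __name__ == "__main__":
    # Toy 4-dimensional frame: 2 static + 2 dynamic coefficients (made-up values).
    x = [0.5, -1.0, 0.2, 0.1]
    mean = [0.0, 0.0, 0.0, 0.0]
    var = [1.0, 2.0, 0.5, 0.5]

    # Hypothetical weights that trust the dynamic stream more than the static one.
    score = weighted_stream_loglik(x, mean, var, n_static=2,
                                   w_static=0.6, w_dynamic=1.4)
    print(score)
```

With both weights set to 1.0 the score reduces to the ordinary diagonal-Gaussian log likelihood, so in this sketch the only change to a decoder is how the per-state score is assembled.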

