Proceedings Article

A linear predictive front-end processor for speech recognition in noisy environments

Yariv Ephraim, +2 more
- Vol. 12, pp 1324-1327
Abstract
We investigate the performance of a recent algorithm for linear predictive (LP) modeling of speech signals degraded by uncorrelated additive noise, used as a front-end processor in a speech recognition system. The system is speaker dependent and recognizes isolated words using dynamic time warping. The LP model for the clean speech is estimated through appropriate composite modeling of the noisy speech, by minimizing the Itakura-Saito distortion measure between the sample spectrum of the noisy speech and the power spectral density of the composite model. This approach results in a "filtering-modeling" scheme in which the filter for the noisy speech and the LP model for the clean speech are alternately optimized. The proposed system was tested using the 26-word English alphabet, the ten English digits, and the three command words "stop," "error," and "repeat," which were contaminated by additive white noise at signal-to-noise ratios (SNRs) of 5-20 dB. By replacing the standard LP analysis with the proposed algorithm, while training on the clean speech and testing on the noisy speech, we achieve an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 10 dB.
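As a rough illustration of the distortion measure named in the abstract, the sketch below evaluates a discretized Itakura-Saito distortion between two power spectra. It is not the authors' algorithm: the alternating filter/model optimization is not reproduced, and the composite model used here (an all-pole LP spectrum plus a flat white-noise floor) is an assumption based only on the abstract's description of additive white noise. The function names `lp_psd`, `composite_psd`, and `itakura_saito` are hypothetical.

```python
import cmath
import math


def lp_psd(a, gain, n_freq):
    """All-pole PSD gain / |A(e^{jw})|^2 on n_freq uniform frequencies in [0, 2*pi).

    a = [1, a1, ..., ap] are the prediction-error filter coefficients.
    """
    psd = []
    for k in range(n_freq):
        w = 2.0 * math.pi * k / n_freq
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        psd.append(gain / abs(A) ** 2)
    return psd


def composite_psd(a, gain, noise_var, n_freq):
    """Assumed composite model: clean-speech LP spectrum plus a flat noise PSD."""
    return [s + noise_var for s in lp_psd(a, gain, n_freq)]


def itakura_saito(p, q):
    """Frequency-averaged Itakura-Saito distortion: mean of p/q - ln(p/q) - 1."""
    total = 0.0
    for pk, qk in zip(p, q):
        r = pk / qk
        total += r - math.log(r) - 1.0
    return total / len(p)


# Illustrative values only; a real front end would use the periodogram of a
# noisy speech frame in place of this synthetic "sample spectrum".
a = [1.0, -0.9]
sample_spectrum = composite_psd(a, 1.0, 0.5, 64)
matched = itakura_saito(sample_spectrum, composite_psd(a, 1.0, 0.5, 64))
mismatched = itakura_saito(sample_spectrum, lp_psd(a, 1.0, 64))
```

Under these assumptions the distortion is zero when the composite model matches the spectrum exactly and positive when the noise floor is ignored, which is the quantity a filtering-modeling scheme of this kind would drive down.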

