A linear predictive front-end processor for speech recognition in noisy environments

doi:10.1109/ICASSP.1987.1169458

Proceedings Article

Vol. 12, pp. 1324-1327

TLDR

This work investigates the performance of a recent algorithm for linear predictive (LP) modeling of speech signals, which have been degraded by uncorrelated additive noise, as a front-end processor in a speech recognition system.

Abstract:

We investigate the performance of a recent algorithm for linear predictive (LP) modeling of speech signals that have been degraded by uncorrelated additive noise, used as a front-end processor in a speech recognition system. The system is speaker dependent and recognizes isolated words using dynamic time warping. The LP model for the clean speech is estimated through appropriate composite modeling of the noisy speech. This is done by minimizing the Itakura-Saito distortion measure between the sample spectrum of the noisy speech and the power spectral density of the composite model. This approach results in a "filtering-modeling" scheme in which the filter for the noisy speech and the LP model for the clean speech are alternately optimized. The proposed system was tested using the 26 letters of the English alphabet, the ten English digits, and the three command words "stop," "error," and "repeat," which were contaminated by additive white noise at signal-to-noise ratios (SNRs) of 5-20 dB. By replacing the standard LP analysis with the proposed algorithm, while training on the clean speech and testing on the noisy speech, we achieve an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 10 dB.
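The objective named in the abstract can be illustrated with a small numerical sketch. It is not the paper's algorithm: the alternating "filtering-modeling" optimization is not reproduced, and the composite model is assumed here to be an all-pole LP spectrum plus a flat white-noise floor. The synthetic AR(2) frame and the helper names (`lp_power_spectrum`, `itakura_saito`, `add_white_noise`) are hypothetical and chosen only for illustration. The sketch evaluates the Itakura-Saito distortion between the sample spectrum of a noisy frame and two candidate model spectra, with and without the noise term.

```python
import numpy as np

def lp_power_spectrum(a, gain, n_fft):
    """PSD of an all-pole model, gain / |A(e^{jw})|^2, with a = [1, a1, ..., ap]."""
    A = np.fft.rfft(a, n_fft)
    return gain / np.abs(A) ** 2

def itakura_saito(p, q):
    """Discrete Itakura-Saito distortion: mean over frequency of p/q - log(p/q) - 1."""
    r = p / q
    return float(np.mean(r - np.log(r) - 1.0))

def add_white_noise(x, snr_db, rng):
    """Add white Gaussian noise so that the nominal SNR equals snr_db."""
    noise_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

rng = np.random.default_rng(0)
n = 512

# Assumption: a synthetic stable AR(2) process stands in for a clean speech frame.
a_true = np.array([1.0, -1.3, 0.8])
e = rng.normal(size=n)
clean = np.zeros(n)
for t in range(n):
    past1 = clean[t - 1] if t >= 1 else 0.0
    past2 = clean[t - 2] if t >= 2 else 0.0
    clean[t] = e[t] - a_true[1] * past1 - a_true[2] * past2

snr_db = 10.0
noisy = add_white_noise(clean, snr_db, rng)

# Sample spectrum (periodogram) of the noisy frame.
periodogram = np.abs(np.fft.rfft(noisy, n)) ** 2 / n

# Candidate model spectra: LP model alone, and LP model plus a white-noise floor.
noise_var = np.mean(clean ** 2) / 10 ** (snr_db / 10)
lp_only = lp_power_spectrum(a_true, 1.0, n)
composite = lp_only + noise_var

d_lp_only = itakura_saito(periodogram, lp_only)
d_composite = itakura_saito(periodogram, composite)
print(f"IS distortion, LP model only:   {d_lp_only:.3f}")
print(f"IS distortion, composite model: {d_composite:.3f}")
```

In this toy setting the composite spectrum should fit the noisy periodogram more closely than the clean LP spectrum alone, which is the motivation for fitting a composite model to the noisy speech rather than applying standard LP analysis to it directly.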

Citations

Journal Article

Speech recognition in noisy environments: a survey

Yifan Gong

- 01 Apr 1995 -

Speech Communication

TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

Journal Article

Speech recognition in adverse environments

Biing-Hwang Juang

- 01 Jul 1991 -

Computer Speech & Language

TL;DR: This paper reviews several promising methods that were proposed in the past few years to deal with the problem of automatic speech recognition in an adverse environment, discussing methods or algorithms in six categories: signal enhancement preprocessing; special transducer arrangements; noise masking; stress compensation; robust distortion measures; and novel speech representations.

Journal Article

A family of distortion measures based upon projection operation for robust speech recognition

D. Mansour, +1 more

- 01 Nov 1989 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: It is found that the orientation (or direction) of the cepstral vector is less susceptible to noise perturbation than the vector norm, and a family of distortion measures based on the projection between two cepstral vectors is proposed, which have the same computational efficiency as the band-pass cepstral distortion measure.

Journal Article

The short-time modified coherence representation and noisy speech recognition

D. Mansour, +1 more

- 01 Jun 1989 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: Initial implementation of the SMC in a speaker-dependent isolated word recognizer shows an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 13 dB, as compared to the LPC recognizer.

Journal Article

A minimax classification approach with application to robust speech recognition

Neri Merhav, +1 more

- 01 Jan 1993 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: A generalized likelihood ratio test is developed and shown to be optimal in the sense of achieving the highest asymptotic exponential rate of decay of the error probability for the worst-case mismatch situation.

References

Book

Linear Prediction of Speech

John E. Markel, +1 more

TL;DR: Speech Analysis and Synthesis Models: Basic Physical Principles, Speech Synthesis Structures, and Considerations in Choice of Analysis.

Journal Article

On the use of bandpass liftering in speech recognition

Biing-Hwang Juang, +2 more

- 01 Jul 1987 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: This paper has found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer.

Journal Article

Isolated and Connected Word Recognition--Theory and Selected Applications

Lawrence R. Rabiner, +1 more

- 01 May 1981 -

IEEE Transactions on Communications

TL;DR: This paper discusses word recognition as a classical pattern-recognition problem and shows how some fundamental concepts of signal processing, information theory, and computer science can be combined to give us the capability of robust recognition of isolated words and simple connected word sequences.

Journal Article

A modified K-means clustering algorithm for use in isolated word recognition

Jay G. Wilpon, +1 more

- 01 Jun 1985 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: A clustering algorithm based on a standard K-means approach which requires no user parameter specification is presented and experimental data show that this new algorithm performs as well or better than the previously used clustering techniques when tested as part of a speaker-independent isolated word recognition system.

Proceedings Article

On the use of bandpass liftering in speech recognition

Biing-Hwang Juang, +2 more

TL;DR: It is found that measurements of speech spectral envelopes are prone to statistical variations due to window position fluctuations, excitation interference, measurement noise, etc. and may possess spurious characteristics because of analysis model constraints and that a statistical model can be established to predict the variances of the cepstral coefficient measurements.
