Open Access Proceedings Article

A harmonic-model-based front end for robust speech recognition

TLDR
A new robustness algorithm is presented which exploits properties inherent to the speech signal itself to denoise the recognition features and achieves significant improvements in recognition accuracy on the Aurora 2 task.
Abstract
Speech recognition accuracy degrades significantly when the speech has been corrupted by noise, especially when the system has been trained on clean speech. Many compensation algorithms have been developed which require reliable online noise estimates or a priori knowledge of the noise. In situations where such estimates or knowledge is difficult to obtain, these methods fail. We present a new robustness algorithm which avoids these problems by making no assumptions about the corrupting noise. Instead, we exploit properties inherent to the speech signal itself to denoise the recognition features. In this method, speech is decomposed into harmonic and noise-like components, which are then processed independently and recombined. By processing noise-corrupted speech in this manner we achieve significant improvements in recognition accuracy on the Aurora 2 task.
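The abstract does not spell out the front end in detail; below is a minimal illustrative sketch of the general idea, assuming a per-frame pitch estimate f0 is available and using a hypothetical noise_gain weight for recombination rather than the paper's actual processing.

```python
# Minimal sketch of a harmonic/noise front end: least-squares fit of harmonics
# of a known pitch, residual treated as the noise-like part, and a hypothetical
# noise_gain weight for recombination. Not the paper's actual processing.
import numpy as np

def harmonic_noise_split(frame, f0, fs):
    """Split one voiced frame into a harmonic part and a noise-like residual."""
    n = np.arange(len(frame))
    k = np.arange(1, int(fs / (2 * f0)))                  # harmonic numbers below Nyquist
    phases = 2 * np.pi * np.outer(n, k) * f0 / fs          # (samples, harmonics)
    basis = np.hstack([np.cos(phases), np.sin(phases)])    # real harmonic basis
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    harmonic = basis @ coeffs
    return harmonic, frame - harmonic

def denoise_frame(frame, f0, fs, noise_gain=0.3):
    """Hypothetical recombination: keep the harmonic part, attenuate the rest."""
    harmonic, noise_like = harmonic_noise_split(frame, f0, fs)
    return harmonic + noise_gain * noise_like

# Toy usage: a noisy 200 Hz voiced frame at 8 kHz.
fs, f0 = 8000, 200.0
t = np.arange(200) / fs
frame = np.sin(2 * np.pi * f0 * t) + 0.3 * np.random.randn(len(t))
cleaned = denoise_frame(frame, f0, fs)
```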



Citations
Proceedings Article

The CHiME corpus: a resource and a challenge for computational hearing in multisource environments.

TL;DR: CHiME, a new corpus designed for noise-robust speech processing research, includes around 40 hours of background recordings from a head and torso simulator positioned in a domestic setting and a comprehensive set of binaural impulse responses collected in the same environment.
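As an illustration of how such a corpus can be used, the sketch below builds a reverberant noisy mixture by convolving clean speech with a binaural room impulse response and adding background noise at a target SNR; the function and variable roles are placeholders, not the CHiME tooling.

```python
# Illustrative CHiME-style mixing: convolve clean speech with one channel of a
# binaural room impulse response and add background noise at a target SNR.
# Function names and variable roles are placeholders, not the corpus's tooling.
import numpy as np
from scipy.signal import fftconvolve

def mix_at_snr(clean, brir_channel, background, snr_db):
    """Return a reverberant noisy mixture at the requested SNR (background must be long enough)."""
    reverberant = fftconvolve(clean, brir_channel)[: len(clean)]
    noise = background[: len(reverberant)]
    gain = np.sqrt((reverberant ** 2).sum() / ((noise ** 2).sum() * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# e.g. mixture = mix_at_snr(clean_utt, brir_left, living_room_background, snr_db=0)
```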
Journal ArticleDOI

Transforming Binary Uncertainties for Robust Speech Recognition

TL;DR: This work proposes a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain, which is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition.
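A hedged sketch of the general idea follows, training one regression tree per cepstral coefficient; the training pairs are random placeholders, and the paper's actual features, targets, and tree configuration are not reproduced.

```python
# Sketch of a spectral-to-cepstral uncertainty mapping learned with regression
# trees; the training pairs below are random placeholders standing in for
# per-frame variances measured in the two domains.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_frames, n_bins, n_ceps = 5000, 23, 13
spectral_var = rng.gamma(2.0, 1.0, size=(n_frames, n_bins))   # placeholder spectral-domain variances
cepstral_var = rng.gamma(2.0, 1.0, size=(n_frames, n_ceps))   # placeholder cepstral-domain targets

# One tree per cepstral coefficient approximates the nonlinear mapping.
trees = [DecisionTreeRegressor(max_depth=8).fit(spectral_var, cepstral_var[:, j])
         for j in range(n_ceps)]

def transform_uncertainty(frame_spectral_var):
    """Map one frame's spectral-domain variances to cepstral-domain variances."""
    x = np.asarray(frame_spectral_var).reshape(1, -1)
    return np.array([t.predict(x)[0] for t in trees])
```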
Journal ArticleDOI

SpEx: Multi-Scale Time Domain Speaker Extraction Network

TL;DR: SpEx, a time-domain speaker extraction network, converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.
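A minimal sketch of multi-scale time-domain encoding with short, middle, and long analysis windows is given below; the filter weights are random placeholders, whereas SpEx learns them jointly with the extraction network.

```python
# Minimal sketch of multi-scale time-domain encoding: the raw waveform is framed
# with short, middle and long windows and projected onto filterbanks (random
# placeholders here; SpEx learns these filters jointly with the extractor).
import numpy as np

def encode(signal, win_len, n_filters, hop):
    """Frame the waveform and apply a ReLU filterbank projection per frame."""
    frames = np.stack([signal[i:i + win_len]
                       for i in range(0, len(signal) - win_len + 1, hop)])
    filters = np.random.randn(win_len, n_filters) / np.sqrt(win_len)
    return np.maximum(frames @ filters, 0.0)                 # (frames, n_filters)

fs = 8000
mixture = np.random.randn(fs)                                # 1 s of placeholder audio
hop = 20                                                      # shared hop so the scales align
scales = [encode(mixture, win_len=w, n_filters=256, hop=hop) for w in (20, 80, 160)]
n = min(len(s) for s in scales)
embedding = np.concatenate([s[:n] for s in scales], axis=1)   # multi-scale embedding coefficients
```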
Journal ArticleDOI

A schema-based model for phonemic restoration

TL;DR: This work presents a schema-based model for phonemic restoration that employs a missing data speech recognition system to decode speech based on intact portions and activates word templates corresponding to the words containing the masked phonemes.
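The missing-data idea behind this model can be illustrated with a toy sketch: word templates are scored only over time-frequency cells marked reliable, so a masked phoneme does not penalise the matching word. Templates and mask below are synthetic placeholders, not the paper's recognizer.

```python
# Toy illustration of the missing-data idea: templates are compared only over
# reliable time-frequency cells, so the masked phoneme region does not count
# against the matching word. Templates and mask are synthetic placeholders.
import numpy as np

def masked_distance(observation, template, reliable):
    """Mean squared distance computed over reliable cells only (lower is better)."""
    return ((observation - template) ** 2)[reliable].mean()

rng = np.random.default_rng(1)
templates = {"word_a": rng.random((30, 20)), "word_b": rng.random((30, 20))}
observation = templates["word_a"].copy()
reliable = np.ones(observation.shape, dtype=bool)
observation[10:15, :] = 5.0        # frames obliterated by a masking noise
reliable[10:15, :] = False         # ...and marked unreliable in the mask
best_word = min(templates, key=lambda w: masked_distance(observation, templates[w], reliable))
```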
Proceedings ArticleDOI

Robust speech recognition using cepstral domain missing data techniques and noisy masks

TL;DR: A recognizer based on the recently described cepstral-domain MDT approach is described, using missing data masks computed from the noisy signal; it exploits a novel decision criterion that integrates harmonicity with the signal-to-noise ratio and makes minimal assumptions about the noise.
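A sketch of a mask of this kind is shown below, combining an estimated local SNR with a harmonicity cue; the noise-floor estimate, harmonicity handling, and thresholds are illustrative assumptions rather than the paper's exact criterion.

```python
# Sketch of a missing-data mask combining an estimated local SNR with a
# harmonicity cue. The noise-floor estimate, harmonicity handling and thresholds
# are illustrative assumptions, not the paper's exact decision criterion.
import numpy as np

def reliability_mask(power_spec, f0_bins, snr_thresh_db=0.0, harm_tol=1):
    """power_spec: (frames, bins) noisy power spectrogram; f0_bins: pitch bin per frame (0 = unvoiced)."""
    noise_floor = np.percentile(power_spec, 20, axis=0, keepdims=True)   # crude stationary-noise estimate
    snr_db = 10 * np.log10(power_spec / np.maximum(noise_floor, 1e-12))
    mask = snr_db > snr_thresh_db                                        # SNR-based reliability
    n_frames, n_bins = power_spec.shape
    for t, b0 in enumerate(f0_bins):
        if b0 <= 0:
            continue                                                     # unvoiced: keep SNR decision
        for h in range(int(b0), n_bins, int(b0)):                        # bins near pitch harmonics
            mask[t, max(h - harm_tol, 0):min(h + harm_tol + 1, n_bins)] = True
    return mask
```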
References
Journal ArticleDOI

Suppression of acoustic noise in speech using spectral subtraction

TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
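A minimal magnitude spectral-subtraction sketch follows, assuming the leading frames are speech-free so they can serve as the noise estimate; the over-subtraction factor alpha and the spectral floor are common practical choices, not values from the paper.

```python
# Minimal magnitude spectral-subtraction sketch. The leading frames are assumed
# speech-free and provide the noise estimate; alpha (over-subtraction) and the
# spectral floor are common practical choices, not values from the paper.
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128, noise_frames=10, alpha=2.0, floor=0.01):
    window = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len + 1, hop)
    spectra = np.array([np.fft.rfft(noisy[i:i + frame_len] * window) for i in starts])
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)                        # noise magnitude estimate
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)       # subtract, keep a floor
    out = np.zeros(len(noisy))
    for idx, i in enumerate(starts):                                   # overlap-add resynthesis
        out[i:i + frame_len] += window * np.fft.irfft(clean_mag[idx] * np.exp(1j * phase[idx]),
                                                      n=frame_len)
    return out
```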
Proceedings Article

The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions

TL;DR: A database designed to evaluate the performance of speech recognition algorithms in noisy conditions and recognition results are presented for the first standard DSR feature extraction scheme that is based on a cepstral analysis.
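For context, a sketch of a cepstral front end of the kind evaluated in this framework is given below; it uses librosa's MFCC routine rather than the ETSI DSR standard front end, and the parameter values are common defaults, not the standard's settings.

```python
# Sketch of a cepstral front end of the kind evaluated in the Aurora framework.
# This uses librosa's MFCC routine, not the ETSI DSR standard front end; the
# parameter values are common defaults rather than the standard's settings.
import numpy as np
import librosa

def extract_features(waveform, fs=8000):
    mfcc = librosa.feature.mfcc(y=waveform, sr=fs, n_mfcc=13,
                                n_fft=256, hop_length=80)       # ~10 ms hop at 8 kHz
    delta = librosa.feature.delta(mfcc)                         # first-order dynamics
    delta2 = librosa.feature.delta(mfcc, order=2)               # second-order dynamics
    return np.vstack([mfcc, delta, delta2]).T                   # (frames, 39)
```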
Journal ArticleDOI

Robust continuous speech recognition using parallel model combination

TL;DR: After training on clean speech data, the performance of the recognizer was found to be severely degraded when noise was added to the speech signal at between 10 and 18 dB, but using PMC the performance was restored to a level comparable with that obtained when training directly in the noise corrupted environment.
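A sketch of PMC's log-normal approximation follows: cepstral-domain Gaussians for clean speech and noise are mapped to the linear spectral domain, added, and mapped back. Static parameters only, unit noise gain, and illustrative dimensions are assumed.

```python
# Sketch of PMC's log-normal approximation: cepstral-domain Gaussians for clean
# speech and noise are mapped to the linear spectral domain, added, and mapped
# back. Static parameters only, unit noise gain, illustrative dimensions.
import numpy as np

def dct_matrix(n_ceps, n_chan):
    k, j = np.arange(n_ceps)[:, None], np.arange(n_chan)[None, :]
    return np.sqrt(2.0 / n_chan) * np.cos(np.pi * k * (j + 0.5) / n_chan)

def pmc_combine(mu_c_speech, cov_c_speech, mu_c_noise, cov_c_noise, n_chan=24):
    C = dct_matrix(len(mu_c_speech), n_chan)                # cepstra <- log filterbank
    Cinv = np.linalg.pinv(C)

    def to_linear(mu_c, cov_c):
        mu_l, cov_l = Cinv @ mu_c, Cinv @ cov_c @ Cinv.T    # log-spectral domain
        mu = np.exp(mu_l + np.diag(cov_l) / 2)              # log-normal mean
        return mu, np.outer(mu, mu) * (np.exp(cov_l) - 1.0) # log-normal covariance

    def to_cepstral(mu, cov):
        cov_l = np.log(cov / np.outer(mu, mu) + 1.0)
        mu_l = np.log(mu) - np.diag(cov_l) / 2
        return C @ mu_l, C @ cov_l @ C.T

    mu_s, cov_s = to_linear(mu_c_speech, cov_c_speech)
    mu_n, cov_n = to_linear(mu_c_noise, cov_c_noise)
    return to_cepstral(mu_s + mu_n, cov_s + cov_n)          # corrupted-speech Gaussian
```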
Journal ArticleDOI

An iterative algorithm for decomposition of speech signals into periodic and aperiodic components

TL;DR: The algorithm is demonstrated on a synthetic speech signal made of a mixture of periodic and aperiodic components, and its applicability to natural speech is also shown.
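A much-simplified stand-in for such an iterative decomposition is sketched below: magnitude bins near harmonics of the pitch are repeatedly refilled by smoothing from neighbouring non-harmonic bins, and the periodic part is taken as the remainder; this is not the paper's time/frequency iteration.

```python
# Much-simplified stand-in for an iterative periodic/aperiodic decomposition:
# magnitude bins near harmonics of f0 are repeatedly refilled by smoothing from
# neighbouring non-harmonic bins; the periodic part is the remainder.
import numpy as np

def decompose(frame, f0, fs, n_iter=10, tol_hz=None):
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    tol_hz = tol_hz or fs / len(frame)                                   # about one bin
    near_harmonic = np.abs(freqs - f0 * np.round(freqs / f0)) < tol_hz
    aperiodic_mag = np.abs(spec).copy()
    for _ in range(n_iter):                                              # iterative refill
        smoothed = np.convolve(aperiodic_mag, np.ones(5) / 5.0, mode="same")
        aperiodic_mag[near_harmonic] = smoothed[near_harmonic]
    aperiodic = np.fft.irfft(aperiodic_mag * np.exp(1j * np.angle(spec)), n=len(frame))
    return frame - aperiodic, aperiodic                                  # (periodic, aperiodic)
```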
Proceedings ArticleDOI

HNM: a simple, efficient harmonic+noise model for speech

TL;DR: The pitch-synchronous analysis technique makes use of a coarse estimate of the pitch and simultaneously calculates the various parameters of the model and refines the pitch estimate, yielding more natural resyntheses.
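A sketch of the pitch-synchronous idea: around a coarse pitch estimate, the candidate whose harmonic least-squares fit leaves the smallest residual is selected, yielding the refined pitch and the harmonic amplitudes/phases together; the search range and resolution are illustrative choices.

```python
# Sketch of pitch-synchronous HNM-style analysis: around a coarse pitch estimate,
# pick the candidate f0 whose harmonic least-squares fit leaves the smallest
# residual, obtaining refined pitch and harmonic amplitudes/phases together.
# Search range and resolution are illustrative choices.
import numpy as np

def fit_harmonics(frame, f0, fs):
    """Least-squares harmonic fit; returns (cos/sin coefficients, residual energy)."""
    n = np.arange(len(frame))
    k = np.arange(1, int(fs / (2 * f0)))
    phases = 2 * np.pi * np.outer(n, k) * f0 / fs
    basis = np.hstack([np.cos(phases), np.sin(phases)])
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    residual = frame - basis @ coeffs
    return coeffs, float(residual @ residual)

def refine_pitch(frame, coarse_f0, fs, span_hz=10.0, step_hz=0.25):
    candidates = np.arange(coarse_f0 - span_hz, coarse_f0 + span_hz + step_hz, step_hz)
    errors = [fit_harmonics(frame, c, fs)[1] for c in candidates]
    best_f0 = float(candidates[int(np.argmin(errors))])
    return best_f0, fit_harmonics(frame, best_f0, fs)[0]
```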