Open Access Proceedings Article

A harmonic-model-based front end for robust speech recognition

TLDR
A new robustness algorithm is presented which exploits properties inherent to the speech signal itself to denoise the recognition features and achieves significant improvements in recognition accuracy on the Aurora 2 task.
Abstract
Speech recognition accuracy degrades significantly when the speech has been corrupted by noise, especially when the system has been trained on clean speech. Many compensation algorithms have been developed which require reliable online noise estimates or a priori knowledge of the noise. In situations where such estimates or knowledge is difficult to obtain, these methods fail. We present a new robustness algorithm which avoids these problems by making no assumptions about the corrupting noise. Instead, we exploit properties inherent to the speech signal itself to denoise the recognition features. In this method, speech is decomposed into harmonic and noise-like components, which are then processed independently and recombined. By processing noise-corrupted speech in this manner we achieve significant improvements in recognition accuracy on the Aurora 2 task.
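The abstract does not spell out the front end in detail; below is a minimal illustrative sketch of the general idea, assuming a per-frame pitch estimate f0 is available and using a hypothetical noise_gain weight for recombination rather than the paper's actual processing.

```python
# Minimal sketch of a harmonic/noise front end: least-squares fit of harmonics
# of a known pitch, residual treated as the noise-like part, and a hypothetical
# noise_gain weight for recombination. Not the paper's actual processing.
import numpy as np

def harmonic_noise_split(frame, f0, fs):
    """Split one voiced frame into a harmonic part and a noise-like residual."""
    n = np.arange(len(frame))
    k = np.arange(1, int(fs / (2 * f0)))                  # harmonic numbers below Nyquist
    phases = 2 * np.pi * np.outer(n, k) * f0 / fs          # (samples, harmonics)
    basis = np.hstack([np.cos(phases), np.sin(phases)])    # real harmonic basis
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    harmonic = basis @ coeffs
    return harmonic, frame - harmonic

def denoise_frame(frame, f0, fs, noise_gain=0.3):
    """Hypothetical recombination: keep the harmonic part, attenuate the rest."""
    harmonic, noise_like = harmonic_noise_split(frame, f0, fs)
    return harmonic + noise_gain * noise_like

# Toy usage: a noisy 200 Hz voiced frame at 8 kHz.
fs, f0 = 8000, 200.0
t = np.arange(200) / fs
frame = np.sin(2 * np.pi * f0 * t) + 0.3 * np.random.randn(len(t))
cleaned = denoise_frame(frame, f0, fs)
```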



Citations
Proceedings Article

The CHiME corpus: a resource and a challenge for computational hearing in multisource environments.

TL;DR: CHiME, a new corpus designed for noise-robust speech processing research, includes around 40 hours of background recordings from a head and torso simulator positioned in a domestic setting and a comprehensive set of binaural impulse responses collected in the same environment.
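As an illustration of how such a corpus can be used, the sketch below builds a reverberant noisy mixture by convolving clean speech with a binaural room impulse response and adding background noise at a target SNR; the function and variable roles are placeholders, not the CHiME tooling.

```python
# Illustrative CHiME-style mixing: convolve clean speech with one channel of a
# binaural room impulse response and add background noise at a target SNR.
# Function names and variable roles are placeholders, not the corpus's tooling.
import numpy as np
from scipy.signal import fftconvolve

def mix_at_snr(clean, brir_channel, background, snr_db):
    """Return a reverberant noisy mixture at the requested SNR (background must be long enough)."""
    reverberant = fftconvolve(clean, brir_channel)[: len(clean)]
    noise = background[: len(reverberant)]
    gain = np.sqrt((reverberant ** 2).sum() / ((noise ** 2).sum() * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# e.g. mixture = mix_at_snr(clean_utt, brir_left, living_room_background, snr_db=0)
```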
Journal ArticleDOI

Transforming Binary Uncertainties for Robust Speech Recognition

TL;DR: This work proposes a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain, which is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition.
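A hedged sketch of the general idea follows, training one regression tree per cepstral coefficient; the training pairs are random placeholders, and the paper's actual features, targets, and tree configuration are not reproduced.

```python
# Sketch of a spectral-to-cepstral uncertainty mapping learned with regression
# trees; the training pairs below are random placeholders standing in for
# per-frame variances measured in the two domains.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_frames, n_bins, n_ceps = 5000, 23, 13
spectral_var = rng.gamma(2.0, 1.0, size=(n_frames, n_bins))   # placeholder spectral-domain variances
cepstral_var = rng.gamma(2.0, 1.0, size=(n_frames, n_ceps))   # placeholder cepstral-domain targets

# One tree per cepstral coefficient approximates the nonlinear mapping.
trees = [DecisionTreeRegressor(max_depth=8).fit(spectral_var, cepstral_var[:, j])
         for j in range(n_ceps)]

def transform_uncertainty(frame_spectral_var):
    """Map one frame's spectral-domain variances to cepstral-domain variances."""
    x = np.asarray(frame_spectral_var).reshape(1, -1)
    return np.array([t.predict(x)[0] for t in trees])
```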
Journal ArticleDOI

SpEx: Multi-Scale Time Domain Speaker Extraction Network

TL;DR: SpEx, a time-domain speaker extraction network, converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.
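A minimal sketch of multi-scale time-domain encoding with short, middle, and long analysis windows is given below; the filter weights are random placeholders, whereas SpEx learns them jointly with the extraction network.

```python
# Minimal sketch of multi-scale time-domain encoding: the raw waveform is framed
# with short, middle and long windows and projected onto filterbanks (random
# placeholders here; SpEx learns these filters jointly with the extractor).
import numpy as np

def encode(signal, win_len, n_filters, hop):
    """Frame the waveform and apply a ReLU filterbank projection per frame."""
    frames = np.stack([signal[i:i + win_len]
                       for i in range(0, len(signal) - win_len + 1, hop)])
    filters = np.random.randn(win_len, n_filters) / np.sqrt(win_len)
    return np.maximum(frames @ filters, 0.0)                 # (frames, n_filters)

fs = 8000
mixture = np.random.randn(fs)                                # 1 s of placeholder audio
hop = 20                                                      # shared hop so the scales align
scales = [encode(mixture, win_len=w, n_filters=256, hop=hop) for w in (20, 80, 160)]
n = min(len(s) for s in scales)
embedding = np.concatenate([s[:n] for s in scales], axis=1)   # multi-scale embedding coefficients
```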
Journal ArticleDOI

A schema-based model for phonemic restoration

TL;DR: This work presents a schema-based model for phonemic restoration that employs a missing data speech recognition system to decode speech based on intact portions and activates word templates corresponding to the words containing the masked phonemes.
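The missing-data idea behind this model can be illustrated with a toy sketch: word templates are scored only over time-frequency cells marked reliable, so a masked phoneme does not penalise the matching word. Templates and mask below are synthetic placeholders, not the paper's recognizer.

```python
# Toy illustration of the missing-data idea: templates are compared only over
# reliable time-frequency cells, so the masked phoneme region does not count
# against the matching word. Templates and mask are synthetic placeholders.
import numpy as np

def masked_distance(observation, template, reliable):
    """Mean squared distance computed over reliable cells only (lower is better)."""
    return ((observation - template) ** 2)[reliable].mean()

rng = np.random.default_rng(1)
templates = {"word_a": rng.random((30, 20)), "word_b": rng.random((30, 20))}
observation = templates["word_a"].copy()
reliable = np.ones(observation.shape, dtype=bool)
observation[10:15, :] = 5.0        # frames obliterated by a masking noise
reliable[10:15, :] = False         # ...and marked unreliable in the mask
best_word = min(templates, key=lambda w: masked_distance(observation, templates[w], reliable))
```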
Proceedings ArticleDOI

Robust speech recognition using cepstral domain missing data techniques and noisy masks

TL;DR: A recognizer based on the recently described cepstral-domain MDT approach is described, using missing data masks computed from the noisy signal; it exploits a novel decision criterion that integrates harmonicity with the signal-to-noise ratio and makes minimal assumptions about the noise.
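A sketch of a mask of this kind is shown below, combining an estimated local SNR with a harmonicity cue; the noise-floor estimate, harmonicity handling, and thresholds are illustrative assumptions rather than the paper's exact criterion.

```python
# Sketch of a missing-data mask combining an estimated local SNR with a
# harmonicity cue. The noise-floor estimate, harmonicity handling and thresholds
# are illustrative assumptions, not the paper's exact decision criterion.
import numpy as np

def reliability_mask(power_spec, f0_bins, snr_thresh_db=0.0, harm_tol=1):
    """power_spec: (frames, bins) noisy power spectrogram; f0_bins: pitch bin per frame (0 = unvoiced)."""
    noise_floor = np.percentile(power_spec, 20, axis=0, keepdims=True)   # crude stationary-noise estimate
    snr_db = 10 * np.log10(power_spec / np.maximum(noise_floor, 1e-12))
    mask = snr_db > snr_thresh_db                                        # SNR-based reliability
    n_frames, n_bins = power_spec.shape
    for t, b0 in enumerate(f0_bins):
        if b0 <= 0:
            continue                                                     # unvoiced: keep SNR decision
        for h in range(int(b0), n_bins, int(b0)):                        # bins near pitch harmonics
            mask[t, max(h - harm_tol, 0):min(h + harm_tol + 1, n_bins)] = True
    return mask
```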
References
Journal ArticleDOI

Suppression of acoustic noise in speech using spectral subtraction

TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
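A minimal magnitude spectral-subtraction sketch follows, assuming the leading frames are speech-free so they can serve as the noise estimate; the over-subtraction factor alpha and the spectral floor are common practical choices, not values from the paper.

```python
# Minimal magnitude spectral-subtraction sketch. The leading frames are assumed
# speech-free and provide the noise estimate; alpha (over-subtraction) and the
# spectral floor are common practical choices, not values from the paper.
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128, noise_frames=10, alpha=2.0, floor=0.01):
    window = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len + 1, hop)
    spectra = np.array([np.fft.rfft(noisy[i:i + frame_len] * window) for i in starts])
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)                        # noise magnitude estimate
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)       # subtract, keep a floor
    out = np.zeros(len(noisy))
    for idx, i in enumerate(starts):                                   # overlap-add resynthesis
        out[i:i + frame_len] += window * np.fft.irfft(clean_mag[idx] * np.exp(1j * phase[idx]),
                                                      n=frame_len)
    return out
```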
Proceedings Article

The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions

TL;DR: A database designed to evaluate the performance of speech recognition algorithms in noisy conditions and recognition results are presented for the first standard DSR feature extraction scheme that is based on a cepstral analysis.
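For context, a sketch of a cepstral front end of the kind evaluated in this framework is given below; it uses librosa's MFCC routine rather than the ETSI DSR standard front end, and the parameter values are common defaults, not the standard's settings.

```python
# Sketch of a cepstral front end of the kind evaluated in the Aurora framework.
# This uses librosa's MFCC routine, not the ETSI DSR standard front end; the
# parameter values are common defaults rather than the standard's settings.
import numpy as np
import librosa

def extract_features(waveform, fs=8000):
    mfcc = librosa.feature.mfcc(y=waveform, sr=fs, n_mfcc=13,
                                n_fft=256, hop_length=80)       # ~10 ms hop at 8 kHz
    delta = librosa.feature.delta(mfcc)                         # first-order dynamics
    delta2 = librosa.feature.delta(mfcc, order=2)               # second-order dynamics
    return np.vstack([mfcc, delta, delta2]).T                   # (frames, 39)
```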
Journal ArticleDOI

Robust continuous speech recognition using parallel model combination

TL;DR: After training on clean speech data, the performance of the recognizer was found to be severely degraded when noise was added to the speech signal at between 10 and 18 dB, but using PMC the performance was restored to a level comparable with that obtained when training directly in the noise corrupted environment.
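A sketch of PMC's log-normal approximation follows: cepstral-domain Gaussians for clean speech and noise are mapped to the linear spectral domain, added, and mapped back. Static parameters only, unit noise gain, and illustrative dimensions are assumed.

```python
# Sketch of PMC's log-normal approximation: cepstral-domain Gaussians for clean
# speech and noise are mapped to the linear spectral domain, added, and mapped
# back. Static parameters only, unit noise gain, illustrative dimensions.
import numpy as np

def dct_matrix(n_ceps, n_chan):
    k, j = np.arange(n_ceps)[:, None], np.arange(n_chan)[None, :]
    return np.sqrt(2.0 / n_chan) * np.cos(np.pi * k * (j + 0.5) / n_chan)

def pmc_combine(mu_c_speech, cov_c_speech, mu_c_noise, cov_c_noise, n_chan=24):
    C = dct_matrix(len(mu_c_speech), n_chan)                # cepstra <- log filterbank
    Cinv = np.linalg.pinv(C)

    def to_linear(mu_c, cov_c):
        mu_l, cov_l = Cinv @ mu_c, Cinv @ cov_c @ Cinv.T    # log-spectral domain
        mu = np.exp(mu_l + np.diag(cov_l) / 2)              # log-normal mean
        return mu, np.outer(mu, mu) * (np.exp(cov_l) - 1.0) # log-normal covariance

    def to_cepstral(mu, cov):
        cov_l = np.log(cov / np.outer(mu, mu) + 1.0)
        mu_l = np.log(mu) - np.diag(cov_l) / 2
        return C @ mu_l, C @ cov_l @ C.T

    mu_s, cov_s = to_linear(mu_c_speech, cov_c_speech)
    mu_n, cov_n = to_linear(mu_c_noise, cov_c_noise)
    return to_cepstral(mu_s + mu_n, cov_s + cov_n)          # corrupted-speech Gaussian
```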
Journal ArticleDOI

An iterative algorithm for decomposition of speech signals into periodic and aperiodic components

TL;DR: The algorithm is demonstrated on a synthetic speech signal made of a mixture of periodic and aperiodic components, and its applicability to natural speech is also shown.
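A much-simplified stand-in for such an iterative decomposition is sketched below: magnitude bins near harmonics of the pitch are repeatedly refilled by smoothing from neighbouring non-harmonic bins, and the periodic part is taken as the remainder; this is not the paper's time/frequency iteration.

```python
# Much-simplified stand-in for an iterative periodic/aperiodic decomposition:
# magnitude bins near harmonics of f0 are repeatedly refilled by smoothing from
# neighbouring non-harmonic bins; the periodic part is the remainder.
import numpy as np

def decompose(frame, f0, fs, n_iter=10, tol_hz=None):
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    tol_hz = tol_hz or fs / len(frame)                                   # about one bin
    near_harmonic = np.abs(freqs - f0 * np.round(freqs / f0)) < tol_hz
    aperiodic_mag = np.abs(spec).copy()
    for _ in range(n_iter):                                              # iterative refill
        smoothed = np.convolve(aperiodic_mag, np.ones(5) / 5.0, mode="same")
        aperiodic_mag[near_harmonic] = smoothed[near_harmonic]
    aperiodic = np.fft.irfft(aperiodic_mag * np.exp(1j * np.angle(spec)), n=len(frame))
    return frame - aperiodic, aperiodic                                  # (periodic, aperiodic)
```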
Proceedings ArticleDOI

HNM: a simple, efficient harmonic+noise model for speech

TL;DR: The pitch-synchronous analysis technique makes use of a coarse estimate of the pitch and simultaneously calculates the various parameters of the model and refines the pitch estimate, yielding more natural resyntheses.
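A sketch of the pitch-synchronous idea: around a coarse pitch estimate, the candidate whose harmonic least-squares fit leaves the smallest residual is selected, yielding the refined pitch and the harmonic amplitudes/phases together; the search range and resolution are illustrative choices.

```python
# Sketch of pitch-synchronous HNM-style analysis: around a coarse pitch estimate,
# pick the candidate f0 whose harmonic least-squares fit leaves the smallest
# residual, obtaining refined pitch and harmonic amplitudes/phases together.
# Search range and resolution are illustrative choices.
import numpy as np

def fit_harmonics(frame, f0, fs):
    """Least-squares harmonic fit; returns (cos/sin coefficients, residual energy)."""
    n = np.arange(len(frame))
    k = np.arange(1, int(fs / (2 * f0)))
    phases = 2 * np.pi * np.outer(n, k) * f0 / fs
    basis = np.hstack([np.cos(phases), np.sin(phases)])
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    residual = frame - basis @ coeffs
    return coeffs, float(residual @ residual)

def refine_pitch(frame, coarse_f0, fs, span_hz=10.0, step_hz=0.25):
    candidates = np.arange(coarse_f0 - span_hz, coarse_f0 + span_hz + step_hz, step_hz)
    errors = [fit_harmonics(frame, c, fs)[1] for c in candidates]
    best_f0 = float(candidates[int(np.argmin(errors))])
    return best_f0, fit_harmonics(frame, best_f0, fs)[0]
```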