Open Access · Journal Article · DOI

Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising

TL;DR
This paper performs dereverberation and denoising using supervised learning with a deep neural network and defines the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech.
Abstract
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
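The complex ideal ratio mask described above can be sketched numerically: by definition, a mask that recovers the direct-speech STFT when applied (by pointwise complex multiplication) to the STFT of reverberant and noisy speech is the complex division of the two spectrograms, written out in real and imaginary parts. The NumPy sketch below is a minimal illustration of that identity; the function names and the `eps` stabilizer are assumptions for this example, not details from the paper.

```python
import numpy as np

def complex_ideal_ratio_mask(Y, S, eps=1e-8):
    """Complex ideal ratio mask M such that M * Y ~= S.

    Y: STFT of reverberant-noisy speech; S: STFT of direct speech
    (complex arrays of the same shape). Equivalent to the complex
    division S / Y, expanded into real and imaginary components.
    """
    denom = Y.real ** 2 + Y.imag ** 2 + eps  # |Y|^2, stabilized
    m_real = (Y.real * S.real + Y.imag * S.imag) / denom
    m_imag = (Y.real * S.imag - Y.imag * S.real) / denom
    return m_real + 1j * m_imag

def apply_mask(M, Y):
    """Pointwise complex multiplication enhances magnitude and phase jointly."""
    return M * Y
```

In the supervised setting the deep network is trained to estimate this mask from features of the mixture; at test time the estimated mask replaces the ideal one above.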



Citations
Journal Article · DOI

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

TL;DR: A review of recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems.
Journal Article · DOI

Speaker-Independent Speech Separation With Deep Attractor Network

TL;DR: In this article, a neural network is used to project the time-frequency representation of the mixture signal into a high-dimensional embedding space and a reference point (attractor) is created to represent each speaker.
Proceedings Article · DOI

Speech Denoising with Deep Feature Losses.

TL;DR: In this article, a fully-convolutional context aggregation network using a deep feature loss is proposed to denoise speech signals by processing the raw waveform directly, which achieves state-of-the-art performance in objective speech quality metrics and in large-scale perceptual experiments with human listeners.
Proceedings Article · DOI

End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

TL;DR: In this paper, the authors proposed an end-to-end approach for single-channel speaker-independent multi-speaker speech separation, where time-frequency (T-F) masking, the short-time Fourier transform (STFT), and its inverse are represented as layers within a deep network.
References
Journal Article · DOI

Binaural classification for reverberant speech segregation using deep neural networks

TL;DR: Evaluations and comparisons show that DNN-based binaural classification produces superior segregation performance in a variety of multisource and reverberant conditions.
Journal Article · DOI

Theory of Speech Masking by Reverberation

TL;DR: In this article, a general statistical theory for the masking effect of reverberation on the intelligibility of words is developed for a series of discrete pulses distributed statistically over a 30-db range in sound pressure level in a given frequency band.
Proceedings Article · DOI

Recognizing reverberant speech with RASTA-PLP

TL;DR: The authors' experimental variant on RASTA processing provides a statistically significant improvement in performance on reverberant speech, with a best word error rate of 64.1%.
Proceedings Article · DOI

A deep neural network for time-domain signal reconstruction

TL;DR: A new deep network is proposed that directly reconstructs the time-domain clean signal through an inverse fast Fourier transform layer and significantly outperforms a recent non-negative matrix factorization based separation system in both objective speech intelligibility and quality.
Journal Article · DOI

A Supervised Learning Approach to Monaural Segregation of Reverberant Speech

TL;DR: A supervised learning approach to monaural segregation of reverberant voiced speech is proposed, which learns to map from a set of pitch-based auditory features to a grouping cue encoding the posterior probability of a time-frequency (T-F) unit being target dominant given observed features.