Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising
TLDR
This paper performs dereverberation and denoising using supervised learning with a deep neural network, and defines the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech.
Abstract
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach significantly improves objective speech quality and intelligibility. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
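To illustrate the idea in the abstract: the complex ideal ratio mask (cIRM) is defined in the STFT domain so that element-wise complex multiplication of the mask with the reverberant-noisy spectrogram yields the direct-speech spectrogram, enhancing magnitude and phase jointly. The following is a minimal NumPy sketch of that definition, not the paper's implementation; the function names and the small stabilizing constant `eps` are our own additions.

```python
import numpy as np

def complex_ideal_ratio_mask(Y, S, eps=1e-8):
    """Compute the cIRM M such that S ≈ M * Y (element-wise complex product).

    Y : complex STFT of reverberant-noisy speech
    S : complex STFT of the direct (target) speech
    eps : small constant (our addition) to avoid division by zero
    """
    # Complex division S / Y written out in real and imaginary parts.
    denom = Y.real**2 + Y.imag**2 + eps
    m_real = (Y.real * S.real + Y.imag * S.imag) / denom
    m_imag = (Y.real * S.imag - Y.imag * S.real) / denom
    return m_real + 1j * m_imag

def apply_mask(M, Y):
    # Enhancement step: element-wise complex multiplication recovers
    # both the magnitude and the phase of the target speech.
    return M * Y
```

In the supervised setting described by the paper, a deep neural network is trained to estimate this mask from features of the noisy-reverberant input; at test time the estimated mask is applied to the mixture STFT and the result is inverted back to a waveform.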
Citations
Journal ArticleDOI
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Zixing Zhang, Jürgen T. Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller +5 more
TL;DR: A review of recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems.
Journal ArticleDOI
Speaker-Independent Speech Separation With Deep Attractor Network
Yi Luo, Zhuo Chen, Nima Mesgarani +2 more
TL;DR: In this article, a neural network is used to project the time-frequency representation of the mixture signal into a high-dimensional embedding space and a reference point (attractor) is created to represent each speaker.
Proceedings ArticleDOI
Speech Denoising with Deep Feature Losses
TL;DR: In this article, a fully-convolutional context aggregation network using a deep feature loss is proposed to denoise speech signals by processing the raw waveform directly, which achieves state-of-the-art performance in objective speech quality metrics and in large-scale perceptual experiments with human listeners.
Proceedings ArticleDOI
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
TL;DR: In this paper, the authors proposed an end-to-end approach for single-channel speaker-independent multi-speaker speech separation, where time-frequency (T-F) masking, the short-time Fourier transform (STFT), and its inverse are represented as layers within a deep network.
References
Proceedings ArticleDOI
Deep Recurrent De-Noising Auto-Encoder and blind de-reverberation for reverberated speech recognition
TL;DR: Results on the 2014 REVERB Challenge development set indicate that the DAE front-end provides complementary performance gains to multi-condition training, feature transformations, and model adaptation.
Proceedings ArticleDOI
Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms
TL;DR: This paper presents a blind dereverberation method designed to recover the subband envelope of an original speech signal from its reverberant version, formulated as a blind deconvolution problem with non-negative constraints, regularized by the sparse nature of speech spectrograms.
Proceedings ArticleDOI
A Supervised Learning Approach to Monaural Segregation of Reverberant Speech
Zhaozhang Jin, DeLiang Wang +1 more
TL;DR: A supervised learning approach to monaural segregation of reverberant voiced speech is proposed, which learns to map from a set of pitch-based auditory features to a grouping cue encoding the posterior probability of a time-frequency (T-F) unit being target dominant given observed features.
Journal ArticleDOI
Dynamic Precedence Effect Modeling for Source Separation in Reverberant Environments
TL;DR: This study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions and concludes that adaptation is necessary and can yield significant gains in separation performance.
Journal ArticleDOI
Pitch-based monaural segregation of reverberant speech
Nicoleta Roman, DeLiang Wang +1 more
TL;DR: This work proposes a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method, and shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions.