Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising

doi:10.1109/TASLP.2017.2696307

Open Access, Journal Article

Donald S. Williamson, +1 more

- 01 Jul 2017 -

IEEE Transactions on Audio, Speech, and ...

- Vol. 25, Iss: 7, pp 1492-1501

TLDR

This paper performs dereverberation and denoising through supervised learning with a deep neural network, and defines the complex ideal ratio mask so that applying it to reverberant and noisy speech yields the direct speech.

Abstract:

In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
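The masking idea in the abstract can be illustrated with a minimal sketch. It assumes the mask is the complex ratio of the direct-speech STFT to the reverberant-noisy STFT, so that multiplying the mixture by the mask gives back the direct speech. The function names, the `eps` stabilizer, and the toy arrays are illustrative assumptions, not the authors' implementation; in the paper the mask is estimated by a deep neural network rather than computed from the known direct speech.

```python
import numpy as np


def complex_ideal_ratio_mask(direct_stft, mixture_stft, eps=1e-8):
    """Complex mask M with M * mixture_stft ~= direct_stft (hypothetical helper).

    Both inputs are complex STFT arrays of the same shape. `eps` is an
    assumed stabilizer that avoids division by zero in silent bins.
    """
    y_r, y_i = mixture_stft.real, mixture_stft.imag
    s_r, s_i = direct_stft.real, direct_stft.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    # Real and imaginary parts of S / Y, written out component by component.
    m_r = (y_r * s_r + y_i * s_i) / denom
    m_i = (y_r * s_i - y_i * s_r) / denom
    return m_r + 1j * m_i


def apply_mask(mask, mixture_stft):
    """Complex multiplication changes both magnitude and phase."""
    return mask * mixture_stft


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (5, 4)  # toy (frequency bins, frames) grid
    direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    # Stand-in for reverberation plus noise added to the direct speech.
    interference = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    mixture = direct + interference

    mask = complex_ideal_ratio_mask(direct, mixture)
    recovered = apply_mask(mask, mixture)
    print("max reconstruction error:", np.max(np.abs(recovered - direct)))
```

Because the mask has a real and an imaginary component, applying it can correct the phase of the mixture as well as its magnitude, which a real-valued ratio mask cannot do.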

Citations

Proceedings Article

Data-driven Design of Perfect Reconstruction Filterbank for DNN-based Sound Source Enhancement

Daiki Takeuchi, +4 more

TL;DR: This paper treats the warped filterbank frame (WFBF) as a perfect reconstruction filterbank (PRFB). The frequency characteristic of the learned WFBF fell between those of the STFT and the wavelet transform, and its effectiveness was confirmed by comparison with a standard STFT-based DNN whose input features are compressed onto the mel scale.

Journal Article

A new Genetic Algorithm based fusion scheme in monaural CASA system to improve the performance of the speech

S. Shoba, +1 more

- 01 Jan 2020 -

Journal of Ambient Intelligence and Huma...

TL;DR: This work proposes a method for obtaining a T–F binary mask from unvoiced speech segments and evaluates the proposed GA-based fusion scheme using quality and intelligibility measures.

Proceedings Article

Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement.

Meng Ge, +5 more

TL;DR: The proposed end-to-end, environment-dependent, attention-driven approach integrates an attention mechanism into a bidirectional long short-term memory network to capture the weighted dynamic context between consecutive frames, and it outperforms existing methods on the REVERB challenge.

Proceedings Article

End-to-End Sound Source Enhancement Using Deep Neural Network in the Modified Discrete Cosine Transform Domain

Yuma Koizumi, +4 more

TL;DR: This paper presents end-to-end deep neural network (DNN)-based source enhancement built on time-frequency (T-F) mask processing in the modified discrete cosine transform (MDCT) domain.

Proceedings Article

A consolidated view of loss functions for supervised deep learning-based speech enhancement

Sebastian Braun, +1 more

TL;DR: This article investigates a wide variety of spectral loss functions for a recurrent neural network architecture suited to online frame-by-frame processing and finds that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced.
