Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising
TLDR
This paper performs dereverberation and denoising using supervised learning with a deep neural network and defines the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech.
Abstract:
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
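The abstract does not spell out the mask formula, so the following is only a minimal sketch under a stated assumption: the complex ideal ratio mask is taken to be the element-wise complex quotient of the direct-speech STFT over the reverberant, noisy STFT, so that multiplying the mask by the noisy STFT recovers the direct speech. The function names, the `eps` stabilizer, and the random stand-in spectrograms are hypothetical and chosen for illustration only; in the paper's setting a deep neural network would estimate the mask rather than compute it from known direct speech.

```python
import numpy as np


def complex_ideal_ratio_mask(Y, S, eps=1e-8):
    """Return a complex mask M with M * Y ~= S for every time-frequency bin.

    Y: complex STFT of reverberant, noisy speech (assumed input).
    S: complex STFT of direct speech (assumed target).
    eps: small constant (an assumption) that avoids division by zero.
    """
    Yr, Yi = Y.real, Y.imag
    Sr, Si = S.real, S.imag
    denom = Yr ** 2 + Yi ** 2 + eps
    # Real and imaginary parts of S / Y, written out explicitly.
    Mr = (Yr * Sr + Yi * Si) / denom
    Mi = (Yr * Si - Yi * Sr) / denom
    return Mr + 1j * Mi


def apply_mask(M, Y):
    """Complex multiplication adjusts both magnitude and phase of Y."""
    return M * Y


# Random stand-ins for STFT matrices (frequency bins x frames); not real speech.
rng = np.random.default_rng(0)
shape = (8, 6)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
Y = S + interference

M = complex_ideal_ratio_mask(Y, S)
S_hat = apply_mask(M, Y)
max_error = np.max(np.abs(S_hat - S))
```

Because the mask is complex-valued, applying it changes the phase of each bin as well as its magnitude, which is what distinguishes it from a real-valued ratio mask.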
Citations
Proceedings Article
Deep Convolutional Neural Network-Based Inverse Filtering Approach for Speech De-Reverberation
TL;DR: A spectral-domain inverse filtering approach for single-channel speech de-reverberation using a deep convolutional neural network (CNN) is proposed. However, the proposed method is not suitable for high-frequency speech, where the room impulse response filter is longer than the short-time Fourier transform (STFT) analysis window.
Proceedings Article
Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition.
Shashi Kumar, Shakti P. Rath +1 more
TL;DR: A more generalized loss based on non-zero mean and heteroscedastic co-variance distribution for the residual variables is proposed for ASR systems trained on clean speech and outperforms the conventional DA and MSE loss by a large margin.
Journal Article
Dereverberation of autoregressive envelopes for far-field speech recognition
TL;DR: A neural model for speech dereverberation using the long-term sub-band envelopes of speech is developed; it estimates an envelope gain that, when applied to reverberant signals, suppresses the late reflection components in the far-field signal.
Journal Article
Speech Enhancement by Multiple Propagation through the Same Neural Network
Tomasz Grzywalski, Szymon Drgas +1 more
TL;DR: This work extends previous efforts and demonstrates how multi-forward-pass speech enhancement can be successfully applied to other architectures, namely ResBLSTM and Transformer-Net. The results show that performing speech enhancement up to five times still improves speech intelligibility, but the gain becomes smaller with each iteration.
Posted Content
Phase reconstruction based on recurrent phase unwrapping with deep neural networks.
TL;DR: This paper proposes a recurrent phase unwrapping (RPU) method that estimates phase derivatives instead of the phase itself, which avoids the sensitivity problem; the phase is then recursively estimated from the estimated derivatives.
References
Proceedings Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
TL;DR: Adaptive subgradient methods dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which makes it possible to find needles in haystacks in the form of very predictive but rarely seen features.
Journal Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Journal Article
Multitask Learning
TL;DR: Multitask Learning (MTL) is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.
Journal Article
Image method for efficiently simulating small‐room acoustics
Jont B. Allen, David A. Berkley +1 more
TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.
Journal Article
Perceptual linear predictive (PLP) analysis of speech
TL;DR: Presents a new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.