Open Access Journal Article

Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising

TLDR
This paper performs dereverberation and denoising using supervised learning with a deep neural network and defines the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech.
Abstract
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
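The mask definition in the abstract can be illustrated with a minimal NumPy sketch. It assumes the mask is applied by complex multiplication in each time-frequency unit of a short-time Fourier transform, so that mask times mixture gives the direct speech. The function names, the `eps` regularizer, and the random arrays standing in for real STFTs are all illustrative assumptions, not the authors' code. The sketch computes the ideal (oracle) mask only; in the paper a deep neural network estimates the mask.

```python
import numpy as np

def complex_ideal_ratio_mask(mixture_stft, direct_stft, eps=1e-8):
    """Return a complex mask M such that M * mixture_stft ~= direct_stft.

    This is complex division S / Y written out in real and imaginary parts.
    eps is a small constant (an assumption here) that avoids division by zero.
    """
    y_r, y_i = mixture_stft.real, mixture_stft.imag
    s_r, s_i = direct_stft.real, direct_stft.imag
    denom = y_r**2 + y_i**2 + eps
    mask_real = (y_r * s_r + y_i * s_i) / denom
    mask_imag = (y_r * s_i - y_i * s_r) / denom
    return mask_real + 1j * mask_imag

def apply_mask(mask, mixture_stft):
    """Apply the mask by element-wise complex multiplication."""
    return mask * mixture_stft

# Toy stand-ins for STFTs (frequency bins x frames); real use would compute
# these from direct speech and from reverberant, noisy speech.
rng = np.random.default_rng(0)
shape = (4, 5)
direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = direct + interference

mask = complex_ideal_ratio_mask(mixture, direct)
recovered = apply_mask(mask, mixture)
```

Because the mask is complex-valued, multiplying by it changes both the magnitude and the phase of each time-frequency unit, which is why the abstract speaks of enhancing both.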


Citations
Journal Article

gpuRIR: A Python Library for Room Impulse Response Simulation with GPU Acceleration

TL;DR: In this article, the authors present a new implementation that dramatically improves the computation speed of the Image Source Method (ISM) by using Graphics Processing Units (GPUs) to parallelize both the simulation of multiple RIRs and the computation of the images inside each RIR.
Proceedings Article

Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation.

TL;DR: Data-augmented training combined with a novel loss function improves signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ) over the best published result on the CHiME-2 medium vocabulary data set when using a CNN+BLSTM network.
Proceedings Article

Late Reverberation Suppression Using Recurrent Neural Networks with Long Short-Term Memory

TL;DR: A supervised speech dereverberation algorithm models late reverberation using a recurrent neural network (RNN) with long short-term memory (LSTM), taking advantage of LSTM's ability to capture a long history; late reverberation can be effectively removed by the proposed approach.
Posted Content

A consolidated view of loss functions for supervised deep learning-based speech enhancement

TL;DR: This work investigates a wide variety of spectral loss functions for a recurrent neural network architecture suited to online frame-by-frame processing and reveals that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced.
Proceedings Article

A Method of Improved CNN Traffic Classification

TL;DR: Experimental results show that, compared with the traditional classification method, the proposed CNN traffic classification method improves accuracy and reduces classification time.
References
Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.

TL;DR: Adaptive subgradient methods as discussed by the authors dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which allows us to find needles in haystacks in the form of very predictive but rarely seen features.
Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Journal Article

Multitask Learning

TL;DR: Multi-task Learning (MTL) as mentioned in this paper is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.
Journal Article

Image method for efficiently simulating small‐room acoustics

TL;DR: This paper discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of the input signal.
Journal Article

Perceptual linear predictive (PLP) analysis of speech

TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.