Open Access · Proceedings Article · DOI

Speech Denoising with Deep Feature Losses.

TL;DR: In this article, a fully-convolutional context aggregation network using a deep feature loss is proposed to denoise speech signals by processing the raw waveform directly, which achieves state-of-the-art performance in objective speech quality metrics and in large-scale perceptual experiments with human listeners.
Abstract
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly. Given input audio containing speech corrupted by an additive background signal, the system aims to produce a processed signal that contains only the speech content. Recent approaches have shown promising results using various deep network architectures. In this paper, we propose to train a fully-convolutional context aggregation network using a deep feature loss. That loss is based on comparing the internal feature activations in a different network, trained for acoustic environment detection and domestic audio tagging. Our approach outperforms the state-of-the-art in objective speech quality metrics and in large-scale perceptual experiments with human listeners. It also outperforms an identical network trained using traditional regression losses. The advantage of the new approach is particularly pronounced for the hardest data with the most intrusive background noise, for which denoising is most needed and most challenging.
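The core idea of the deep feature loss — comparing denoised and clean signals through the internal activations of a fixed audio classification network — can be sketched in plain Python. The `feature_layers` interface and the weighted L1 layer distance below are illustrative assumptions, not the authors' exact implementation:

```python
def deep_feature_loss(clean, denoised, feature_layers, weights):
    """Weighted sum of L1 distances between internal activations.

    `feature_layers` is a list of callables, each mapping the previous
    layer's output to the next layer's activations (a hypothetical
    stand-in for the fixed audio-tagging network's layers).
    """
    loss = 0.0
    x, y = clean, denoised
    for w, layer in zip(weights, feature_layers):
        x, y = layer(x), layer(y)
        loss += w * sum(abs(a - b) for a, b in zip(x, y)) / len(x)
    return loss
```

In the paper, the fixed network is trained for acoustic environment detection and domestic audio tagging; in this sketch any feature extractor with the same callable interface could be plugged in.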


Citations
Proceedings Article

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.

TL;DR: In this article, the authors propose MetricGAN, a novel approach that optimizes the generator with respect to one or multiple evaluation metrics; the target metric scores to be optimized can also be arbitrarily specified by users.
Proceedings ArticleDOI

Real Time Speech Enhancement in the Waveform Domain.

TL;DR: Empirical evidence shows that the proposed causal speech enhancement model, based on an encoder-decoder architecture with skip-connections, is capable of removing various kinds of background noise including stationary and non-stationary noises, as well as room reverb.
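Causality — each output sample depending only on past and present inputs — is what makes such a model usable in real time. A minimal sketch of a causal 1-D convolution via left padding (an illustrative building block, not the paper's architecture):

```python
def causal_conv1d(x, kernel):
    """1-D convolution that is causal: output[i] depends only on x[:i+1].

    Causality is achieved by left-padding the input with zeros;
    the output has the same length as the input.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(x))]
```

Because no future samples are read, the same operation can run sample-by-sample on a live stream.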
Journal ArticleDOI

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

TL;DR: This paper proposes a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, which has the ability to handle detailed phase patterns and to utilize harmonic patterns, and outperforms previous methods by a large margin on four metrics.
Posted Content

Phase-aware Speech Enhancement with Deep Complex U-Net

TL;DR: This paper proposes a novel loss function, the weighted source-to-distortion ratio (wSDR) loss, designed to directly correlate with a quantitative evaluation measure; the proposed model achieves state-of-the-art performance in all metrics.
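The wSDR loss balances a negative cosine similarity between the clean and estimated speech against the same term for the residual noise, weighted by the clean signal's share of the mixture energy. A minimal sketch (the small epsilon terms are illustrative numerical-stability assumptions):

```python
import math

def neg_cos_sim(a, b):
    """Negative cosine similarity between two signals; -1 is a perfect match."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return -dot / (na * nb + 1e-8)

def wsdr_loss(mix, clean, est):
    """Weighted SDR loss: speech term plus noise term, weighted by
    the clean signal's share of the total energy."""
    noise = [m - c for m, c in zip(mix, clean)]
    est_noise = [m - e for m, e in zip(mix, est)]
    e_clean = sum(c * c for c in clean)
    e_noise = sum(n * n for n in noise)
    alpha = e_clean / (e_clean + e_noise + 1e-8)
    return (alpha * neg_cos_sim(clean, est)
            + (1 - alpha) * neg_cos_sim(noise, est_noise))
```

A perfect estimate drives both cosine terms to -1, so the loss is bounded below by -1.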
Proceedings ArticleDOI

Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement

TL;DR: This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement, and proposes two novel mean-squared-error-based learning objectives.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
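A single Adam update, following the paper's defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8); this is a plain-Python sketch for parameters stored as lists of floats:

```python
def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. `m` and `v` are the running first and second
    moment estimates; `t` is the 1-based step count used for bias
    correction."""
    m = [b1 * mi + (1 - b1) * g for mi, g in zip(m, grad)]
    v = [b2 * vi + (1 - b2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]          # bias-corrected mean
    v_hat = [vi / (1 - b2 ** t) for vi in v]          # bias-corrected variance
    theta = [p - lr * mh / (vh ** 0.5 + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v
```

Minimizing f(x) = x² from x = 1 with this update drives x toward zero within a few hundred steps.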
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
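At its core, Batch Normalization standardizes each activation across the batch and then applies a learned scale and shift. A minimal sketch for a single feature (inference-time running statistics and per-channel parameters are omitted):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations to zero mean and unit
    variance, then apply learned scale (gamma) and shift (beta)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
```

With gamma = 1 and beta = 0 the output has (approximately) zero mean and unit variance regardless of the input's scale, which is what reduces the internal covariate shift the title refers to.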
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Posted Content

Multi-Scale Context Aggregation by Dilated Convolutions

TL;DR: In this article, a new convolutional network module is proposed to aggregate multi-scale contextual information without losing resolution, and the architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage.
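A dilated convolution spaces its kernel taps `dilation` samples apart, so stacking layers with dilations 1, 2, 4, … grows the receptive field exponentially with depth while keeping the per-layer cost fixed. A minimal 1-D sketch with 'valid' padding (illustrative only):

```python
def dilated_conv1d(x, kernel, dilation):
    """1-D dilated convolution with 'valid' padding: kernel taps are
    spaced `dilation` samples apart, so a kernel of size k covers a
    span of (k - 1) * dilation + 1 input samples."""
    k = len(kernel)
    span = (k - 1) * dilation
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span)]
```

With kernel size 3, a stack of n layers with dilations 1, 2, 4, …, 2^(n-1) reaches a receptive field of 2^(n+1) - 1 samples, which is what lets the denoising network aggregate long-range context from a raw waveform.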