SEGAN: Speech Enhancement Generative Adversarial Network

doi:10.21437/INTERSPEECH.2017-1428

Open Access, Proceedings Article

Santiago Pascual, +2 more

- pp 3642-3646

TLDR

This work proposes the use of generative adversarial networks for speech enhancement. The model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions, so that model parameters are shared across them.

Abstract:

Current speech enhancement techniques operate in the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm its effectiveness. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance.

Citations

Journal Article

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Yi Luo, +1 more

- 20 Sep 2018 -

arXiv: Sound

TL;DR: Proposes a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.

Journal Article

Supervised Speech Separation Based on Deep Learning: An Overview

DeLiang Wang, +1 more

- 01 Oct 2018 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A comprehensive overview of deep learning-based supervised speech separation can be found in this paper, where three main components of supervised separation are discussed: learning machines, training targets, and acoustic features.

Journal Article

A Survey on Deep Learning: Algorithms, Techniques, and Applications

Samira Pouyanfar, +8 more

- 18 Sep 2018 -

ACM Computing Surveys

TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing is presented, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications.

Posted Content

The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches.

Md. Zahangir Alom, +8 more

- 03 Mar 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This report presents a brief survey on the development of DL approaches, including Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL).

Journal Article

Deep Learning for Audio Signal Processing

Hendrik Purwins, +5 more

- 01 Apr 2019 -

IEEE Journal of Selected Topics in Signa...

TL;DR: Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting networks won 1st place in the ILSVRC 2015 classification task.

Journal Article

Generative Adversarial Nets

Ian Goodfellow, +7 more

TL;DR: Proposes a new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
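
As a minimal sketch of the two-player setup described in this entry, the losses below follow the standard binary cross-entropy formulation of the adversarial game. The function names `discriminator_loss` and `generator_loss`, the `EPS` constant, and the choice of the non-saturating generator loss are illustrative assumptions, not code from any of the listed papers. The inputs are plain lists of probabilities that D assigns to real and generated samples.

```python
import math

EPS = 1e-12  # hypothetical constant guarding against log(0)


def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss for D.

    d_real: probabilities D assigns to samples from the training data.
    d_fake: probabilities D assigns to samples produced by G.
    The loss falls as d_real approaches 1 and d_fake approaches 0.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss for G: it falls as D rates G's samples as real."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # Illustrative probabilities only; no real model is involved.
    print(discriminator_loss([0.9, 0.8], [0.1, 0.3]))
    print(generator_loss([0.1, 0.3]))
```

In training, the two losses would be minimized in alternation, each with respect to its own model's parameters.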

Posted Content

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 06 Feb 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
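
A minimal sketch of the activation named in this entry: PReLU passes positive inputs unchanged and scales the rest by a learnable slope. The function name `prelu` and the slope value used in the demo are illustrative assumptions; in a trained network the slope is learned together with the other parameters.

```python
def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, slope `a` otherwise.

    With a = 0 this reduces to the ordinary ReLU; with a small fixed `a`
    it behaves like Leaky ReLU. In PReLU, `a` is updated during training.
    """
    return x if x > 0 else a * x


if __name__ == "__main__":
    a = 0.25  # illustrative value for the learnable slope
    print([prelu(v, a) for v in (-2.0, -0.5, 0.0, 1.5)])
```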

Posted Content

Image-to-Image Translation with Conditional Adversarial Networks

Phillip Isola, +3 more

- 21 Nov 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems; they can be used to synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.

Posted Content

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Martín Abadi, +39 more

- 01 Jan 2015 -

arXiv: Distributed, Parallel, and Cluste...

TL;DR: Describes the TensorFlow interface and an implementation of that interface built at Google, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
