Open Access
Posted Content

Noise Adaptive Speech Enhancement using Domain Adversarial Training.

TLDR
In this article, the authors proposed a noise adaptive speech enhancement (SE) system, which employs a domain adversarial training (DAT) approach to tackle the issue of a noise type mismatch between the training and testing conditions.
Abstract
In this study, we propose a novel noise adaptive speech enhancement (SE) system, which employs a domain adversarial training (DAT) approach to tackle the issue of a noise type mismatch between the training and testing conditions. Such a mismatch is a critical problem in deep-learning-based SE systems, and a large mismatch may cause serious degradation of the SE performance. Because a well-trained SE system is generally used to handle various unseen noise types, a noise type mismatch commonly occurs in real-world scenarios. The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model. During adaptation, the DAT approach encourages the encoder to produce noise-invariant features based on the information from the discriminator model and consequently increases the robustness of the enhancement model to unseen noise types. Herein, we regard stationary noises as the source domain (with the ground truth of clean speech) and non-stationary noises as the target domain (without the ground truth). We evaluated the proposed system on TIMIT sentences. The experimental results show that the proposed noise adaptive SE system provides significant improvements in PESQ (19.0%), SSNR (39.3%), and STOI (27.0%) over the SE system without adaptation.
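The core of the DAT approach described above is typically realized with a gradient reversal layer (GRL): an identity map in the forward pass whose backward pass flips (and scales) the gradient coming from the domain discriminator, so the encoder is pushed to make features *harder* to classify by domain. The following is a minimal numpy sketch of this mechanism, not the paper's implementation; the class name, the scaling factor `lam`, and the toy gradient values are illustrative assumptions.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""
    def __init__(self, lam=0.5):
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged to the domain discriminator.
        return x

    def backward(self, grad_out):
        # Flip the sign so the encoder is updated to CONFUSE the discriminator,
        # i.e. to produce domain- (noise-type-) invariant features.
        return -self.lam * grad_out

# Toy illustration: encoder features z feed both the enhancement decoder
# and, through the GRL, the domain discriminator.
grl = GradientReversal(lam=0.5)
z = np.array([1.0, 2.0])
grad_from_enhancer = np.array([0.1, -0.2])       # improves enhancement quality
grad_from_discriminator = np.array([0.3, 0.4])   # would make features domain-separable

# The encoder's update combines both signals; the reversed discriminator
# gradient drives the features toward noise invariance.
total_grad = grad_from_enhancer + grl.backward(grad_from_discriminator)
```

In practice the GRL is inserted between the encoder and the discriminator inside an autodiff framework, so the reversal happens automatically during backpropagation; the scale `lam` trades off enhancement accuracy against domain invariance.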


Citations
Journal ArticleDOI

Domain Adversarial for Acoustic Emotion Recognition

TL;DR: It is shown that exploiting unlabeled data consistently leads to better emotion recognition performance across all emotional dimensions, and the effect of adversarial training on the feature representations across the proposed deep learning architecture is visualized.
Posted Content

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

TL;DR: A novel MetricGAN approach is proposed, which aims to optimize the generator with respect to one or multiple evaluation metrics that cannot be fully optimized by Lp or conventional adversarial losses.
Proceedings ArticleDOI

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers

TL;DR: An environment adaptation approach that improves deep speech enhancement models by minimizing the Kullback-Leibler divergence between posterior probabilities produced by a multi-condition senone classifier fed with noisy speech features, thereby transferring an existing deep neural network (DNN) speech enhancer to specific noisy environments without the noisy/clean paired target waveforms needed in conventional DNN-based spectral regression.
Journal ArticleDOI

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing

TL;DR: In this article, a self-supervised method for training speech enhancement models without the need for a single isolated in-domain speech or noise waveform is presented. However, this method is not suited to unsupervised domain adaptation.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Posted Content

Adam: A Method for Stochastic Optimization

TL;DR: In this article, the authors introduce a method for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Journal ArticleDOI

Suppression of acoustic noise in speech using spectral subtraction

TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
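The classic spectral subtraction scheme summarized above works by estimating the noise magnitude spectrum from noise-only frames, subtracting it from each noisy frame's magnitude spectrum, flooring the result, and resynthesizing with the noisy phase. The following is a minimal numpy sketch of this idea, not the referenced algorithm itself; the frame shapes, the spectral floor value, and the toy tone-plus-noise signal are illustrative assumptions.

```python
import numpy as np

def spectral_subtraction(noisy_frames, noise_frames, floor=0.01):
    """Basic magnitude spectral subtraction with a spectral floor.

    noisy_frames, noise_frames: 2-D arrays of shape (n_frames, frame_len).
    Returns time-domain enhanced frames of the same shape as noisy_frames.
    """
    # Estimate the noise magnitude spectrum by averaging over noise-only frames.
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)
    spec = np.fft.rfft(noisy_frames, axis=1)
    mag = np.abs(spec) - noise_mag                 # subtract the noise estimate
    mag = np.maximum(mag, floor * noise_mag)       # floor to limit musical noise
    enhanced = mag * np.exp(1j * np.angle(spec))   # reuse the noisy phase
    return np.fft.irfft(enhanced, n=noisy_frames.shape[1], axis=1)

# Toy check: a pure tone corrupted by white noise in every frame.
rng = np.random.default_rng(0)
t = np.arange(256) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.standard_normal((8, 256))
noisy = clean + noise                              # 8 noisy frames of the same tone
out = spectral_subtraction(noisy, noise)
```

Because only the magnitude is modified while the noisy phase is kept, the method is cheap enough to run as a pre-processor, which is exactly the use case the TL;DR above describes.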