DNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements

doi:10.1109/ICASSP.2017.7952122

Proceedings Article

- pp 81-85

TLDR

It was confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations was increased and finally outperformed the conventional method.

Abstract:

We investigated whether a deep neural network (DNN)-based source enhancement function can be self-optimized by reinforcement learning (RL). A DNN is a powerful tool for describing the relationship between two sets of variables and can be useful for designing source enhancement functions. Training the DNN on a huge amount of training data improves the sound quality of the output signals. However, collecting a huge amount of training data is often difficult in practice. To use limited training data efficiently, we focus on the “self-optimization” of a DNN-based source enhancement function by means of RL, which is commonly utilized in the development of game-playing computers. As the reward for RL, we use quantitative metrics that reflect human perceptual scores, e.g., perceptual evaluation methods for audio source separation (PEASS). To investigate whether RL-based source enhancement improves sound quality, subjective tests were conducted. It was confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations increased and finally outperformed the conventional method.
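
The idea of using a quality metric as an RL reward can be illustrated with a minimal sketch. This is not the authors' method: it assumes a toy per-frequency gain in place of a DNN, a score-function (REINFORCE-style) gradient estimate, and a hypothetical stub `perceptual_score_stub` standing in for a black-box metric such as PEASS. All function and parameter names are illustrative.

```python
import numpy as np


def sigmoid(x):
    """Map unconstrained parameters to gains in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def perceptual_score_stub(enhanced, clean):
    """Hypothetical stand-in for a black-box quality metric (e.g. PEASS).

    Assumption: higher is better. Here it is simply the negative mean
    squared error; a real perceptual metric would be called instead.
    """
    return -float(np.mean((enhanced - clean) ** 2))


def train_mask_policy(noisy, clean, iterations=300, samples=16,
                      sigma=0.1, lr=1.0, seed=0):
    """Optimize a per-frequency gain using only scalar reward values.

    noisy, clean: magnitude spectrograms of shape (frames, bins).
    The policy is a Gaussian over gain parameters, N(theta, sigma^2 I).
    The reward is treated as a black box: no gradient of the metric
    itself is ever used.
    """
    rng = np.random.default_rng(seed)
    n_bins = noisy.shape[1]
    theta = np.zeros(n_bins)  # sigmoid(0) = 0.5 gain in every bin
    history = []
    for _ in range(iterations):
        # Explore: sample perturbed parameters from the policy.
        eps = rng.standard_normal((samples, n_bins))
        rewards = np.empty(samples)
        for k in range(samples):
            gain = sigmoid(theta + sigma * eps[k])
            rewards[k] = perceptual_score_stub(gain * noisy, clean)
        # The mean reward serves as a baseline to reduce variance.
        baseline = rewards.mean()
        # Score-function estimate: grad log N(theta + sigma*eps) = eps / sigma.
        grad = ((rewards - baseline)[:, None] * eps).mean(axis=0) / sigma
        theta += lr * grad  # gradient ascent on the expected reward
        history.append(baseline)
    return theta, history
```

Calling `train_mask_policy(noisy, clean)` on a pair of equally shaped spectrograms returns the learned parameters and the mean reward per iteration. Swapping the stub for a real metric would leave the update rule unchanged, since only scalar scores are consumed.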

Citations

Journal Article

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Szu-Wei Fu, +4 more

- 01 Sep 2018 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: In this paper, an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) was proposed to reduce the gap between the model optimization and the evaluation criterion.

Proceedings Article

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.

Szu-Wei Fu, +3 more

TL;DR: In this article, the authors propose MetricGAN, a novel approach that optimizes the generator with respect to one or multiple evaluation metrics; the metric scores of the generated data can also be arbitrarily specified by users.

Proceedings Article

Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention

Yuma Koizumi, +4 more

TL;DR: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features. It extracts a speaker representation for adaptation directly from the test utterance, uses multi-task learning of speech enhancement and speaker identification, and takes the output of the final hidden layer of the speaker identification branch as an auxiliary feature.

Journal Article

DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation

Qiquan Zhang, +4 more

- 14 Apr 2020 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: The proposed noise PSD tracker, called DeepMMSE, makes no assumptions about the characteristics of the noise or the speech, exhibits no tracking delay, and produces an accurate estimate that requires no bias correction. When employed in a speech enhancement framework, it is able to outperform state-of-the-art noise PSD trackers, as well as multiple deep learning approaches to speech enhancement.

Proceedings Article

Perceptually Guided Speech Enhancement Using Deep Neural Networks

Yan Zhao, +3 more

TL;DR: This paper proposes a new deep neural network-based enhancement approach that incorporates a speech perception model into the loss function, using the short-time objective intelligibility metric in the loss in addition to the mean squared error.

References

Journal Article

Learning representations by back-propagating errors

David E. Rumelhart, +2 more

- 01 Jan 1988 -

Nature

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.

Journal Article

Speech enhancement using a minimum mean square error short-time spectral amplitude estimator

Yariv Ephraim, +1 more

- 01 Jan 1984 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: This paper derives a minimum mean-square error STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables, which results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise.

Proceedings Article

Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs

Antony William Rix, +3 more

TL;DR: A new model has been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay, known as perceptual evaluation of speech quality (PESQ).
