Proceedings Article

DNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements

Abstract
We investigated whether a deep neural network (DNN)-based source enhancement function can be self-optimized by reinforcement learning (RL). A DNN is a powerful tool for describing the relationship between two sets of variables and can be useful for designing source enhancement functions. Training the DNN on a huge amount of training data improves the sound quality of the output signals. However, collecting a huge amount of training data is often difficult in practice. To use limited training data efficiently, we focus on the “self-optimization” of a DNN-based source enhancement function by RL, which is commonly utilized in the development of game-playing computers. As the reward for RL, we use quantitative metrics that reflect human perceptual scores, e.g., perceptual evaluation methods for audio source separation (PEASS). To investigate whether the sound quality is improved by RL-based source enhancement, subjective tests were conducted. It was confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations was increased and finally outperformed the conventional method.
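The idea of using a scalar quality score as an RL reward can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the network, the action space, or the update rule. The sketch assumes a hypothetical action space of three fixed gains, a REINFORCE-style policy-gradient update with a running-mean baseline (chosen here only for illustration), and a stand-in reward `toy_reward` (negative mean squared error against the clean signal) in place of a perceptual metric such as PEASS, which is not computed here.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(enhanced, clean):
    """Stand-in for a perceptual score (NOT PEASS or PESQ): negative MSE."""
    return -sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)


def train(num_iters=1500, lr=0.5, frame_len=128, seed=0):
    """Learn which hypothetical gain maximizes the black-box reward."""
    rng = random.Random(seed)
    gains = [0.1, 0.5, 0.9]        # hypothetical candidate enhancement actions
    logits = [0.0] * len(gains)    # policy parameters (one logit per action)
    baseline = 0.0                 # running mean reward, reduces variance

    for _ in range(num_iters):
        # Toy data: unit-variance "clean" signal plus unit-variance noise.
        clean = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
        noisy = [c + rng.gauss(0.0, 1.0) for c in clean]

        # Sample an action from the current policy and apply it.
        probs = softmax(logits)
        action = rng.choices(range(len(gains)), weights=probs)[0]
        enhanced = [gains[action] * x for x in noisy]

        # The reward is treated as a black box: only its value is used.
        reward = toy_reward(enhanced, clean)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)

        # REINFORCE: d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            logits[k] += lr * advantage * (indicator - probs[k])

    return gains, softmax(logits)


if __name__ == "__main__":
    gains, probs = train()
    for g, p in zip(gains, probs):
        print(f"gain={g:.1f}  policy probability={p:.3f}")
```

Because the update uses only the reward value and never its gradient, the same loop structure would in principle accept any non-differentiable quality score in place of `toy_reward`. In this toy setting the signal and noise have equal variance, so the expected squared error is smallest for the gain of 0.5.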


Citations
Journal Article

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

TL;DR: In this paper, an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) was proposed to reduce the gap between the model optimization and the evaluation criterion.
Proceedings Article

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.

TL;DR: In this article, the authors proposed a novel MetricGAN approach that optimizes the generator with respect to one or multiple evaluation metrics, so that the metric scores of the generated data can also be arbitrarily specified by users.
Proceedings Article

Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention

TL;DR: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features. It extracts a speaker representation for adaptation directly from the test utterance, applies multi-task learning of speech enhancement and speaker identification, and uses the output of the final hidden layer of the speaker identification branch as an auxiliary feature.
Journal Article

DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation

TL;DR: The proposed noise PSD tracker, called DeepMMSE, makes no assumptions about the characteristics of the noise or the speech, exhibits no tracking delay, and produces an accurate estimate that requires no bias correction. When employed in a speech enhancement framework, it is able to outperform state-of-the-art noise PSD trackers, as well as multiple deep learning approaches to speech enhancement.
Proceedings Article

Perceptually Guided Speech Enhancement Using Deep Neural Networks

TL;DR: This paper proposes a new deep neural network-based enhancement approach that incorporates a speech perception model into the loss function, using the short-time objective intelligibility metric in the loss in addition to the mean squared error.
References
Journal Article

Learning representations by back-propagating errors

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Journal Article

Mastering the game of Go with deep neural networks and tree search

TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Journal Article

Speech enhancement using a minimum mean square error short-time spectral amplitude estimator

TL;DR: This paper derives a minimum mean-square error STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables, which results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise.
Proceedings Article

Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs

TL;DR: A new model has been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay, known as perceptual evaluation of speech quality (PESQ).