Proceedings Article
DNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements
Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda
- pp 81-85
TLDR
Subjective tests confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations increased and finally outperformed the conventional method.
Abstract:
We investigated whether a deep neural network (DNN)-based source enhancement function can be self-optimized by reinforcement learning (RL). A DNN is a powerful tool for describing the relationship between two sets of variables and can be useful for source enhancement function design. Training the DNN on a huge amount of training data improves the sound quality of the output signals. However, collecting a huge amount of training data is often difficult in practice. To use limited training data efficiently, we focus on the "self-optimization" of a DNN-based source enhancement function by RL, a technique commonly used in the development of game-playing computers. As the reward for RL, we use quantitative metrics that reflect a human's perceptual score, e.g., perceptual evaluation methods for audio source separation (PEASS). To investigate whether the sound quality is improved by RL-based source enhancement, subjective tests were conducted. It was confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations was increased and finally outperformed the conventional method.
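The idea of optimizing an enhancement function against a score that cannot be differentiated can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a toy per-bin gain in place of a DNN, additive magnitude spectra, and a hypothetical `black_box_score` (negative mean squared error against the clean signal) standing in for a perceptual metric such as PEASS. The update is a simple antithetic Gaussian-perturbation estimate of the reward gradient, one common way to do policy-gradient-style learning when only a scalar reward is available. All function names and parameters are illustrative.

```python
import numpy as np


def make_toy_data(frames=64, bins=8, seed=0):
    """Build toy magnitude spectra: low bins are speech-dominated, high bins noise-dominated."""
    rng = np.random.default_rng(seed)
    speech_profile = np.linspace(1.0, 0.0, bins)
    clean = np.abs(rng.standard_normal((frames, bins))) * speech_profile
    noise = np.abs(rng.standard_normal((frames, bins))) * (1.0 - speech_profile)
    return clean + noise, clean  # (noisy, clean); additivity is an assumption


def black_box_score(enhanced, clean):
    """Hypothetical stand-in for a perceptual metric (higher is better).

    The learner only ever sees the returned scalar, never a gradient.
    """
    return -float(np.mean((enhanced - clean) ** 2))


def enhance(noisy, theta):
    """Toy enhancement function: one gain in (0, 1) per frequency bin."""
    gain = 1.0 / (1.0 + np.exp(-theta))
    return gain * noisy


def train(noisy, clean, iterations=500, sigma=0.1, lr=1.0, seed=1):
    """Improve theta using only scalar rewards from black_box_score."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(noisy.shape[-1])  # every gain starts at 0.5
    history = []
    for _ in range(iterations):
        history.append(black_box_score(enhance(noisy, theta), clean))
        # Explore: perturb the parameters in two opposite random directions.
        eps = rng.standard_normal(theta.shape)
        r_plus = black_box_score(enhance(noisy, theta + sigma * eps), clean)
        r_minus = black_box_score(enhance(noisy, theta - sigma * eps), clean)
        # Move toward the perturbation that earned the higher reward.
        theta = theta + lr * (r_plus - r_minus) / (2.0 * sigma) * eps
    history.append(black_box_score(enhance(noisy, theta), clean))
    return theta, history


if __name__ == "__main__":
    noisy, clean = make_toy_data()
    theta, history = train(noisy, clean)
    print("score before:", history[0])
    print("score after: ", history[-1])
    print("learned gains:", np.round(1.0 / (1.0 + np.exp(-theta)), 2))
```

In this toy setting the learned gains should rise in the speech-dominated bins and fall in the noise-dominated ones, mirroring the abstract's observation that quality improves with the number of iterations. A real system would replace the gain vector with DNN parameters and the stand-in score with an actual perceptual metric.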
Citations
Journal Article
End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks
TL;DR: In this paper, an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) was proposed to reduce the gap between the model optimization and the evaluation criterion.
Proceedings Article
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement.
TL;DR: In this article, the authors propose MetricGAN, a novel approach that aims to optimize the generator with respect to one or multiple evaluation metrics, and in which the metric scores of the generated data can also be arbitrarily specified by users.
Proceedings Article
Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention
TL;DR: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; it extracts a speaker representation for adaptation directly from the test utterance, trains speech enhancement and speaker identification jointly via multi-task learning, and uses the output of the final hidden layer of the speaker identification branch as an auxiliary feature.
Journal Article
DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation
TL;DR: The proposed noise PSD tracker, called DeepMMSE, makes no assumptions about the characteristics of the noise or the speech, exhibits no tracking delay, and produces an accurate estimate that requires no bias correction. When employed in a speech enhancement framework, it is able to outperform state-of-the-art noise PSD trackers, as well as multiple deep learning approaches to speech enhancement.
Proceedings Article
Perceptually Guided Speech Enhancement Using Deep Neural Networks
TL;DR: This paper proposes a new deep neural network-based enhancement approach that incorporates a speech perception model into the loss function, using the short-time objective intelligibility metric in the loss in addition to the mean squared error.
References
Journal Article
Learning representations by back-propagating errors
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Journal Article
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Journal Article
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Journal Article
Speech enhancement using a minimum mean square error short-time spectral amplitude estimator
TL;DR: This paper derives a minimum mean-square error STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables, which results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise.
Proceedings Article
Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs
TL;DR: A new model has been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay, known as perceptual evaluation of speech quality (PESQ).