Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks
TLDR
The proposed method generates more natural spectral parameters and $F_0$ than the conventional minimum generation error training algorithm regardless of its hyperparameter settings, and a Wasserstein GAN minimizing the Earth-Mover's distance works best in terms of improving synthetic speech quality.

Abstract
A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network techniques can be applied to artificially synthesize speech waveforms, the quality of synthetic speech is low compared with that of natural speech. One of the issues causing this quality degradation is an oversmoothing effect often observed in the generated speech parameters. A GAN introduced in this paper consists of two neural networks: a discriminator trained to distinguish natural from generated samples, and a generator trained to deceive the discriminator. In the proposed framework, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. Since the objective of the GAN is to minimize the divergence (i.e., the distribution difference) between natural and generated speech parameters, the proposed method effectively alleviates the oversmoothing effect on the generated speech parameters. We evaluated its effectiveness for text-to-speech and voice conversion, and found that the proposed method generates more natural spectral parameters and $F_0$ than the conventional minimum generation error training algorithm regardless of its hyperparameter settings. Furthermore, we investigated the effect of the divergence of various GANs, and found that a Wasserstein GAN minimizing the Earth-Mover's distance works best in terms of improving synthetic speech quality.
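The training objective described in the abstract, a weighted sum of the conventional minimum generation error (MGE) loss and an adversarial loss for deceiving the discriminator, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the weight `omega`, and the log-form adversarial term are assumptions made for the sketch.

```python
import numpy as np

def acoustic_model_loss(y_nat, y_gen, d_gen, omega=1.0):
    """Sketch of the proposed objective: MGE loss plus a weighted
    adversarial term. `y_nat`/`y_gen` are natural and generated speech
    parameters; `d_gen` holds discriminator outputs D(y_gen) in (0, 1),
    where larger values mean "judged natural"."""
    l_mge = np.mean((y_nat - y_gen) ** 2)      # minimum generation error term
    l_adv = -np.mean(np.log(d_gen + 1e-12))    # reward deceiving the discriminator
    return l_mge + omega * l_adv
```

Setting `omega = 0` recovers plain MGE training; a positive weight additionally pushes the generated parameter distribution toward the natural one, which is how the method counteracts oversmoothing.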
Citations
Journal ArticleDOI
A Survey of Deep Learning: Platforms, Applications and Emerging Research Trends
William G. Hatcher, Wei Yu, +1 more
TL;DR: A thorough investigation of deep learning in its applications and mechanisms is sought, as a categorical collection of state of the art in deep learning research, to provide a broad reference for those seeking a primer on deep learning and its various implementations, platforms, algorithms, and uses in a variety of smart-world systems.
Posted Content
A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications
TL;DR: This paper provides a review of various GAN methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences of these methods.
Proceedings ArticleDOI
StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks
TL;DR: StarGAN-VC uses a variant of a generative adversarial network (GAN) called StarGAN to learn many-to-many mappings across different attribute domains using a single generator.
Proceedings ArticleDOI
WaveNet Vocoder with Limited Training Data for Voice Conversion.
TL;DR: Experimental results show that WaveNet vocoders built using the proposed method outperform the conventional STRAIGHT vocoder, and the system achieves an average naturalness MOS of 4.13 in VCC 2018, the highest among all submitted systems.
Journal ArticleDOI
Speech analysis for health: Current state-of-the-art and the increasing impact of deep learning
TL;DR: Current state-of-the-art approaches with speech-based health detection are reviewed, placing a particular focus on the impact of deep learning within this domain.
References
Journal ArticleDOI
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, +7 more
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Journal ArticleDOI
Reducing the Dimensionality of Data with Neural Networks
TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.
Proceedings ArticleDOI
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Proceedings Article
Algorithms for Non-negative Matrix Factorization
Daniel D. Lee, H. Sebastian Seung, +1 more
TL;DR: Two different multiplicative algorithms for non-negative matrix factorization are analyzed and one algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence.
Proceedings Article
Deep Sparse Rectifier Neural Networks
TL;DR: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.