Open Access Journal Article (DOI)

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

TL;DR
The proposed method generates more natural spectral parameters and $F_0$ than the conventional minimum generation error training algorithm regardless of its hyperparameter settings, and a Wasserstein GAN minimizing the Earth-Mover's distance works best for improving synthetic speech quality.
Abstract
A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network techniques can artificially synthesize speech waveforms, the synthetic speech quality is low compared with that of natural speech. One issue causing this quality degradation is the oversmoothing effect often observed in generated speech parameters. The GAN introduced in this paper consists of two neural networks: a discriminator that distinguishes natural from generated samples, and a generator that deceives the discriminator. In the proposed framework, the discriminator is trained to distinguish natural from generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation error (MGE) loss and an adversarial loss for deceiving the discriminator. Since the objective of the GAN is to minimize the divergence (i.e., the distribution difference) between the natural and generated speech parameters, the proposed method effectively alleviates the oversmoothing of the generated speech parameters. We evaluated the effectiveness of the method for text-to-speech and voice conversion, and found that it generates more natural spectral parameters and $F_0$ than the conventional MGE training algorithm regardless of its hyperparameter settings. Furthermore, we investigated the effect of the divergences minimized by various GANs, and found that a Wasserstein GAN minimizing the Earth-Mover's distance works best in terms of improving synthetic speech quality.
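The training scheme described in the abstract reduces to a two-step loop per batch: update the discriminator, then update the acoustic model against the weighted sum of losses. Below is a minimal PyTorch-style sketch of that loop; the network shapes, the MSE stand-in for the MGE loss, and the weight `w_adv` are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

ling_dim, feat_dim = 300, 60   # assumed input/output dimensionalities

generator = nn.Sequential(      # acoustic model: linguistic -> speech params
    nn.Linear(ling_dim, 256), nn.ReLU(),
    nn.Linear(256, feat_dim))
discriminator = nn.Sequential(  # natural-vs-generated classifier
    nn.Linear(feat_dim, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
mge = nn.MSELoss()   # mean squared error stands in for the MGE loss here
w_adv = 0.1          # illustrative weight on the adversarial term

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(ling_feats, natural_params):
    batch = natural_params.size(0)
    real, fake = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Discriminator: distinguish natural from generated speech parameters.
    generated = generator(ling_feats).detach()
    d_loss = bce(discriminator(natural_params), real) + \
             bce(discriminator(generated), fake)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Acoustic model: weighted sum of the generation loss and an
    #    adversarial loss that rewards deceiving the discriminator.
    generated = generator(ling_feats)
    g_loss = mge(generated, natural_params) + \
             w_adv * bce(discriminator(generated), real)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```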


Citations
Journal Article (DOI)

A Survey of Deep Learning: Platforms, Applications and Emerging Research Trends

TL;DR: A thorough investigation of deep learning applications and mechanisms, compiled as a categorical collection of state-of-the-art research, to provide a broad reference for those seeking a primer on deep learning and its various implementations, platforms, algorithms, and uses in smart-world systems.
Posted Content

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

TL;DR: This paper provides a review of various GAN methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences of these methods.
Proceedings Article (DOI)

StarGAN-VC: Non-Parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks

TL;DR: StarGAN-VC uses a variant of the generative adversarial network (GAN) called StarGAN to learn many-to-many mappings across different attribute domains using a single generator.
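The single-generator idea can be sketched concisely: one network conditioned on a target-domain code covers every conversion pair, instead of training one network per pair. This is a minimal PyTorch-style sketch; the feature dimensionality, domain count, and dense layers are illustrative stand-ins for StarGAN-VC's actual convolutional architecture.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Single generator for all source->target domain pairs, conditioned
    on a one-hot target-domain code c (shapes are assumptions)."""
    def __init__(self, feat_dim=36, n_domains=4, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + n_domains, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim))

    def forward(self, x, c):
        # Concatenating features with the target-domain code lets one
        # network learn many-to-many mappings rather than one per pair.
        return self.net(torch.cat([x, c], dim=-1))
```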
Proceedings Article (DOI)

WaveNet Vocoder with Limited Training Data for Voice Conversion

TL;DR: Experimental results show that WaveNet vocoders built using the proposed method outperform the conventional STRAIGHT vocoder, and the system achieves an average naturalness MOS of 4.13 in VCC 2018, the highest among all submitted systems.
Journal Article (DOI)

Speech analysis for health: Current state-of-the-art and the increasing impact of deep learning

TL;DR: Current state-of-the-art approaches to speech-based health detection are reviewed, with a particular focus on the impact of deep learning within this domain.
References
Journal Article (DOI)

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
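The adversarial process summarized above is usually written as the following two-player minimax game, in standard notation with data distribution $p_{\text{data}}$ and prior noise distribution $p_z$:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$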
Journal Article (DOI)

Reducing the Dimensionality of Data with Neural Networks

TL;DR: In this article, an effective way of initializing weights is described that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool for reducing the dimensionality of data.
Proceedings Article (DOI)

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Proceedings Article

Algorithms for Non-negative Matrix Factorization

TL;DR: Two different multiplicative algorithms for non-negative matrix factorization are analyzed; one can be shown to minimize the conventional least squares error, while the other minimizes the generalized Kullback-Leibler divergence.
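As a concrete illustration, here is the least-squares variant of these multiplicative updates in a minimal NumPy sketch; the random initialization and fixed iteration count are simplifying assumptions in place of proper convergence checks.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # Update H: H <- H * (W^T V) / (W^T W H)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update W: W <- W * (V H^T) / (W H H^T)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because both updates are elementwise multiplications by non-negative ratios, the factors stay non-negative throughout without any explicit projection step.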
Proceedings Article

Deep Sparse Rectifier Neural Networks

TL;DR: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
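For reference, the rectifier activation in question is simply

$$f(x) = \max(0, x),$$

whose hard kink at zero is the non-linearity and non-differentiability the summary refers to.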