Journal Article
INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora
TL;DR: This paper proposes a new iterative alignment method that pairs phonetically equivalent acoustic vectors from nonparallel utterances of different speakers, even under cross-lingual conditions, without requiring any phonetic or linguistic information.
Abstract:
Most existing voice conversion systems, particularly those based on Gaussian mixture models, require a set of paired acoustic vectors from the source and target speakers to learn their corresponding transformation function. Aligning phonetically equivalent source and target vectors is not problematic when the training corpus is parallel, that is, when both speakers utter the same training sentences. However, in some practical situations, such as cross-lingual voice conversion, parallel utterances cannot be obtained. Aiming to increase the versatility of current voice conversion systems, this paper proposes a new iterative alignment method that pairs phonetically equivalent acoustic vectors from nonparallel utterances of different speakers, even under cross-lingual conditions. The method builds on existing voice conversion techniques and does not require any phonetic or linguistic information. Subjective evaluation experiments show that the performance of the resulting voice conversion system is very similar to that of an equivalent system trained on a parallel corpus.
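The abstract does not spell out the alignment procedure, so the following is only a minimal sketch of an iterative loop of the kind it describes: pair each vector with its nearest neighbour in the other speaker's set, fit a conversion function on those pairs, convert the source-side vectors, and repeat until they stop moving. The per-dimension affine map (standing in for a real conversion function, such as the GMM-based ones the abstract mentions), the squared Euclidean distance, and the names `inca_align`, `pair_vectors`, and `fit_affine` are illustrative assumptions, not the paper's implementation.

```python
"""Toy INCA-style alignment: alternate nearest-neighbour pairing and a conversion fit.

Hypothetical simplifications (not taken from the paper): squared Euclidean
nearest-neighbour search and a per-dimension affine map y_d = a_d * x_d + b_d
standing in for a full (e.g. GMM-based) conversion function.
"""
from typing import List, Sequence, Tuple

Vectors = List[List[float]]
Pairs = List[Tuple[int, int]]


def nearest(query: Sequence[float], pool: Sequence[Sequence[float]]) -> int:
    """Index of the vector in `pool` closest to `query` (squared Euclidean distance)."""
    return min(range(len(pool)),
               key=lambda j: sum((q - p) ** 2 for q, p in zip(query, pool[j])))


def pair_vectors(aux: Vectors, target: Vectors) -> Pairs:
    """Bidirectional pairing: every aux vector takes its nearest target vector
    and every target vector takes its nearest aux vector."""
    pairs = {(i, nearest(a, target)) for i, a in enumerate(aux)}
    pairs |= {(nearest(t, aux), j) for j, t in enumerate(target)}
    return sorted(pairs)


def fit_affine(aux: Vectors, target: Vectors, pairs: Pairs) -> List[Tuple[float, float]]:
    """Least-squares (slope, offset) per dimension, fitted on the current pairs."""
    params = []
    for d in range(len(aux[0])):
        xs = [aux[i][d] for i, _ in pairs]
        ys = [target[j][d] for _, j in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = cov / var if var > 0 else 1.0  # degenerate dimension: pure shift
        params.append((slope, my - slope * mx))
    return params


def apply_affine(vectors: Vectors, params: List[Tuple[float, float]]) -> Vectors:
    """Apply the per-dimension (slope, offset) map to every vector."""
    return [[a * v + b for v, (a, b) in zip(vec, params)] for vec in vectors]


def inca_align(source: Vectors, target: Vectors,
               max_iter: int = 20, tol: float = 1e-6) -> Pairs:
    """Return (source_index, target_index) pairs for a pseudo-parallel training set.

    The auxiliary vectors start as copies of the source vectors and are moved
    towards the target speaker by the fitted map until they stop changing.
    """
    aux = [list(v) for v in source]
    pairs = pair_vectors(aux, target)
    for _ in range(max_iter):
        new_aux = apply_affine(aux, fit_affine(aux, target, pairs))
        shift = max(abs(u - w) for old, new in zip(aux, new_aux)
                    for u, w in zip(old, new))
        aux = new_aux
        pairs = pair_vectors(aux, target)  # re-pair using the converted vectors
        if shift < tol:
            break
    return pairs


if __name__ == "__main__":
    # Target frames are an affine distortion of the source frames, listed in a
    # different order, so the two sequences are not time-aligned.
    source = [[0.0, 0.0], [10.0, 5.0], [20.0, 10.0], [30.0, 15.0]]
    target = [[24.0, 8.0], [2.0, -1.0], [35.0, 12.5], [13.0, 3.5]]
    print(inca_align(source, target))  # [(0, 1), (1, 3), (2, 0), (3, 2)]
```

Because the auxiliary vectors keep the source indices, the returned pairs can be fed to any conventional parallel-corpus training routine. This toy version can settle on a poor alignment when the two speakers' vectors start far apart, so it illustrates the structure of the loop rather than its behaviour on real spectral features.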
Citations
Proceedings Article
Phonetic posteriorgrams for many-to-one voice conversion without parallel data training
TL;DR: This paper proposes a novel approach to voice conversion with non-parallel training data to bridge between speakers by means of Phonetic PosteriorGrams obtained from a speaker-independent automatic speech recognition system.
Proceedings Article
Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks
TL;DR: In this article, a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) is proposed for voice conversion from unaligned speech corpora.
Proceedings Article
Voice conversion from non-parallel corpora using variational auto-encoder
TL;DR: In this article, a variational auto-encoder-decoder framework for spectral conversion with unaligned corpora is proposed; it does not require parallel corpora or phonetic alignments to train the spectral conversion system.
Journal Article
An overview of voice conversion systems
TL;DR: Provides an overview of real-world applications of VC systems, extensively studies existing systems proposed in the literature, and discusses remaining challenges.
Journal Article
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning
TL;DR: This article provides a comprehensive overview of the state of the art in voice conversion techniques and their performance evaluation methods, from statistical approaches to deep learning, and discusses their promise and limitations.
References
Book
Fundamentals of speech recognition
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Proceedings Article
Voice conversion through vector quantization
TL;DR: The authors propose a new voice conversion technique through vector quantization and spectrum mapping which makes it possible to precisely control voice individuality.
Journal Article
Voice transformation using PSOLA technique
TL;DR: A new system for voice conversion is described that combines a PSOLA (Pitch Synchronous Overlap and Add)-derived synthesizer and a module for spectral transformation, which produces a satisfyingly natural “transformed” voice.
High-resolution voice transformation
Jan P. Santen, Alexander Kain, et al.
TL;DR: Proposes a new type of speech corpus, consisting of naturally time-aligned sentences, that is especially suited to voice transformation (VT) research and development; it yields high-quality speech samples that differ only in their segmental properties, which are the focus of transformation.