INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora

doi:10.1109/TASL.2009.2038669


Daniel Erro and two co-authors

IEEE Transactions on Audio, Speech, and ..., Vol. 18, Issue 5, pp. 944-953. Published 1 July 2010.


TLDR

This paper proposes a new iterative alignment method that pairs phonetically equivalent acoustic vectors drawn from nonparallel utterances of different speakers, even under cross-lingual conditions, without requiring any phonetic or linguistic information.

Abstract:

Most existing voice conversion systems, particularly those based on Gaussian mixture models, require a set of paired acoustic vectors from the source and target speakers to learn their corresponding transformation function. Aligning phonetically equivalent source and target vectors is not problematic when the training corpus is parallel, that is, when both speakers utter the same training sentences. However, in some practical situations, such as cross-lingual voice conversion, parallel utterances cannot be obtained. To make current voice conversion systems more versatile, this paper proposes a new iterative alignment method that pairs phonetically equivalent acoustic vectors from nonparallel utterances of different speakers, even under cross-lingual conditions. The method builds on existing voice conversion techniques and does not require any phonetic or linguistic information. Subjective evaluation experiments show that the resulting voice conversion system performs very similarly to an equivalent system trained on a parallel corpus.
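The general idea of an iterative alignment built on a conversion function can be illustrated with a minimal Python sketch. It assumes each speaker's utterances have already been reduced to lists of fixed-length acoustic feature vectors. The sketch alternates two steps until the transformed source vectors stop moving: (1) nearest-neighbour pairing in both directions (transformed source to target, and target to transformed source), and (2) refitting a conversion function on those pairs and reapplying it to the original source vectors. All function names, the tolerance, and the stopping rule are hypothetical choices made for illustration and do not reproduce the paper's exact procedure. In particular, a per-dimension affine least-squares mapping is swapped in for the GMM-based conversion function the abstract refers to, so that the example needs only the standard library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[int, int]  # (source index, target index)


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _nearest(v: Sequence[float], pool: Sequence[Sequence[float]]) -> int:
    """Index of the vector in `pool` closest to `v` (first index wins ties)."""
    return min(range(len(pool)), key=lambda j: _sq_dist(v, pool[j]))


def nearest_neighbour_pairs(aux_src: Sequence[Sequence[float]],
                            tgt: Sequence[Sequence[float]]) -> List[Pair]:
    """Pair each transformed source vector with its nearest target vector,
    and each target vector with its nearest transformed source vector."""
    forward = [(k, _nearest(x, tgt)) for k, x in enumerate(aux_src)]
    backward = [(_nearest(y, aux_src), j) for j, y in enumerate(tgt)]
    return forward + backward


def fit_diagonal_affine(src: Sequence[Sequence[float]],
                        tgt: Sequence[Sequence[float]],
                        pairs: Sequence[Pair]) -> Tuple[Vector, Vector]:
    """Least-squares fit of y_d = a_d * x_d + b_d, independently per dimension.
    Hypothetical stand-in for a GMM-based conversion function."""
    scale: Vector = []
    offset: Vector = []
    for d in range(len(src[0])):
        xs = [src[k][d] for k, _ in pairs]
        ys = [tgt[j][d] for _, j in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var > 1e-12 else 1.0  # degenerate case: pure shift
        scale.append(a)
        offset.append(my - a * mx)
    return scale, offset


def convert(scale: Vector, offset: Vector,
            vecs: Sequence[Sequence[float]]) -> List[Vector]:
    """Apply the per-dimension affine mapping to every vector."""
    return [[a * v + b for a, b, v in zip(scale, offset, vec)] for vec in vecs]


def inca_align(src: Sequence[Sequence[float]],
               tgt: Sequence[Sequence[float]],
               max_iter: int = 50,
               tol: float = 1e-8) -> Tuple[List[Pair], Tuple[Vector, Vector]]:
    """Alternate nearest-neighbour pairing and conversion refitting until the
    largest squared shift of any transformed source vector drops below tol."""
    if not src or not tgt or max_iter < 1:
        raise ValueError("need non-empty inputs and max_iter >= 1")
    aux = [list(v) for v in src]  # start from the untransformed source
    for _ in range(max_iter):
        pairs = nearest_neighbour_pairs(aux, tgt)
        scale, offset = fit_diagonal_affine(src, tgt, pairs)
        new_aux = convert(scale, offset, src)
        shift = max(_sq_dist(u, v) for u, v in zip(aux, new_aux))
        aux = new_aux
        if shift < tol:
            break
    return pairs, (scale, offset)


if __name__ == "__main__":
    # Toy data: the "target speaker" is the source shifted by (+0.5, -0.5),
    # listed in a different order, so frames are unaligned by index.
    src = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
    tgt = [[10.5, 9.5], [0.5, -0.5], [10.5, -0.5], [0.5, 9.5]]
    pairs, (scale, offset) = inca_align(src, tgt)
    print(sorted(set(pairs)), scale, offset)
    # prints: [(0, 1), (1, 2), (2, 3), (3, 0)] [1.0, 1.0] [0.5, -0.5]
```

In this sketch the pairs are recomputed after every refit, so pairings that are wrong under the initial speaker mismatch can change once the transformed source vectors move closer to the target vectors. The final pairs can then be used to train a conventional conversion function as if they had come from a parallel corpus. Convergence to correct pairings is not guaranteed for arbitrary data, and a real system would use a richer conversion model than a per-dimension affine map.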
