Can voice conversion be used to reduce non-native accents?

doi:10.1109/ICASSP.2014.6855134

Proceedings Article

pp. 7879-7883

TL;DR: A modification of the conventional voice conversion (VC) training process is proposed that allows it to perform as an accent conversion (AC) transform: source and target vectors are paired not by their ordering within a parallel corpus but by their linguistic similarity.

Abstract:

Voice conversion (VC) techniques aim to transform utterances from a source speaker so that they sound as if they had been produced by a target speaker. This includes not only organic properties (e.g., voice quality) but also linguistic cues (e.g., regional accent) of the target speaker. For this reason, VC is generally ill-suited for accent conversion (AC), where the goal is to capture the voice quality of the target speaker but the regional accent of the source speaker. In this paper, we propose a modification of the conventional VC training process that allows it to perform as an AC transform. The approach pairs source and target vectors based not on their ordering within a parallel corpus, as is commonly done in VC, but on their linguistic similarity. We validate the AC approach on a corpus containing native-accented and Spanish-accented utterances, and compare it against conventional VC through a series of perceptual listening tests. We also analyze the extent to which phonological differences between the two languages (Spanish and American English) help predict the relative performance of the two methods.
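The difference between the two pairing strategies in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which features define "linguistic similarity", so the `src_ling`/`tgt_ling` descriptors (for example, phonetic posteriors or speaker-normalized spectral features) and the Euclidean distance are assumptions, and all function names are hypothetical. `dtw_pairs` shows conventional VC pairing, where two parallel utterances are aligned by dynamic time warping on acoustic features; `linguistic_pairs` instead pairs each source frame with the most linguistically similar target frame anywhere in the target data.

```python
import numpy as np


def dtw_pairs(src, tgt):
    """Conventional VC pairing (sketch): align two parallel utterances frame by
    frame with dynamic time warping (DTW) on acoustic features."""
    n, m = len(src), len(tgt)
    # Euclidean distance between every source frame and every target frame.
    cost = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    # Accumulated cost with an infinite border so every path starts at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last frame pair to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def linguistic_pairs(src_ling, tgt_ling):
    """AC-style pairing (hypothetical sketch): for each source frame, pick the
    target frame whose linguistic descriptor is closest, regardless of which
    utterance or position it came from."""
    dist = np.linalg.norm(src_ling[:, None, :] - tgt_ling[None, :, :], axis=-1)
    return [(i, int(j)) for i, j in enumerate(np.argmin(dist, axis=1))]


def joint_vectors(src_ac, tgt_ac, pairs):
    """Stack paired source/target acoustic frames into joint training vectors."""
    return np.array([np.concatenate([src_ac[i], tgt_ac[j]]) for i, j in pairs])
```

Either list of index pairs would then select the joint source/target acoustic vectors used to train a conventional mapping, such as a joint-density GMM. Only the pairing step changes, which matches the abstract's framing of AC as a modification of VC training rather than a new conversion model.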

Citations

Proceedings Article

L2-ARCTIC: A Non-Native English Speech Corpus

Guanlong Zhao, +6 more

TL;DR: L2-ARCTIC is introduced, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection, and is publicly accessible at https://psi.tamu.edu/l2-arctic-corpus/.

Proceedings Article

Accent Conversion Using Phonetic Posteriorgrams

Guanlong Zhao, +4 more

TL;DR: An approach is proposed that matches frames between the two speakers based on their phonetic (rather than acoustic) similarity; it improves ratings of acoustic quality and native accent while retaining the voice identity of the non-native speaker.

Proceedings Article

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams.

Guanlong Zhao, +2 more

TL;DR: This work presents a framework for foreign accent conversion (FAC) that eliminates the need for conventional vocoders, and therefore the need to use the native speaker’s excitation; it produces speech that sounds clearer, more natural, and more similar to the non-native speaker than a baseline system.

Posted Content

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Hiroyuki Miyoshi, +3 more

10 Apr 2017

arXiv: Sound

TL;DR: Sequence-to-sequence learning of context posterior probabilities is proposed to convert the speaker individuality contained in those probabilities, such as phonetic properties and speaking rate, which is difficult to convert when the source posterior probabilities are used directly to predict target speech parameters.

Journal Article

NAUTILUS: A Versatile Voice Cloning System

Hieu-Thi Luong, +1 more

30 Oct 2020

IEEE Transactions on Audio, Speech, and ...

TL;DR: A novel speech synthesis system, called NAUTILUS, is proposed that can generate speech with a target voice either from a text input or from a reference utterance of an arbitrary source speaker.

References

Journal Article

Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones

Eric Moulines, +1 more

01 Dec 1990

Speech Communication

TL;DR: Several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on the concatenation of acoustic units are reviewed within a common framework, the pitch-synchronous overlap-add (PSOLA) approach.

Journal Article

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Tomoki Toda, +2 more

01 Nov 2007

IEEE Transactions on Audio, Speech, and ...

TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.
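The joint-density GMM mapping summarized above can be illustrated with its simpler frame-by-frame minimum mean-square error (MMSE) form. This sketch is not the maximum-likelihood trajectory method of the cited article, which additionally exploits dynamic features; it assumes the GMM parameters (`weights`, `means`, `covs`, fitted on stacked [source; target] vectors whose first `dx` dimensions are the source features) are already available, and the function names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))


def gmm_mmse_convert(x, weights, means, covs, dx):
    """Convert one source frame x with a joint-density GMM (frame-wise MMSE).

    y_hat = sum_m p(m | x) * (mu_y^m + S_yx^m (S_xx^m)^-1 (x - mu_x^m))
    """
    # Component posteriors p(m | x) from the source marginal of each Gaussian.
    logp = np.array([
        np.log(w) + gaussian_logpdf(x, mu[:dx], S[:dx, :dx])
        for w, mu, S in zip(weights, means, covs)
    ])
    post = np.exp(logp - logp.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component conditional means E[y | x, m].
    y = np.zeros(means.shape[1] - dx)
    for p, mu, S in zip(post, means, covs):
        mx, my = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        y += p * (my + Syx @ np.linalg.solve(Sxx, x - mx))
    return y
```

With a single component, the mapping reduces to linear regression of target on source features; with several, each component contributes its own regression weighted by how well it explains the source frame.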

Proceedings Article

Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory.

Takashi Muramatsu, +4 more

Published in: Interspeech 2008, the 9th Annual Conference of the International Speech Communication Association, September 22-26, 2008, Brisbane, Australia.

Proceedings Article

Voice conversion through vector quantization

Masanobu Abe, +3 more

TL;DR: The authors propose a new voice conversion technique through vector quantization and spectrum mapping which makes it possible to precisely control voice individuality.

Journal Article

Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model

Tomoki Toda, +2 more

01 Mar 2008

Speech Communication

TL;DR: Experimental results demonstrate that maximum-likelihood estimation (MLE)-based mapping with dynamic features can significantly improve mapping performance compared with minimum mean-square error (MMSE)-based mapping, in both articulatory-to-acoustic mapping and inversion mapping.

Related Papers (5)

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Tomoki Toda, +2 more

01 Nov 2007

IEEE Transactions on Audio, Speech, and ...

Phonetic posteriorgrams for many-to-one voice conversion without parallel data training

Lifa Sun, +4 more

Librispeech: An ASR corpus based on public domain audio books

Vassil Panayotov, +3 more

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

Hideki Kawahara, +2 more

01 Apr 1999

Speech Communication

Tacotron: Towards End-to-End Speech Synthesis