Proceedings Article

Can voice conversion be used to reduce non-native accents?

TLDR
A modification of the conventional voice-conversion (VC) training process is proposed that allows VC to perform as an accent-conversion (AC) transform: source and target vectors are paired based not on their ordering within a parallel corpus but on their linguistic similarity.
Abstract
Voice-conversion (VC) techniques aim to transform utterances from a source speaker to sound as if they had been produced by a target speaker. This includes not only organic properties (e.g., voice quality) but also linguistic cues (e.g., regional accent) of the target speaker. For this reason, VC is generally ill-suited for accent-conversion (AC) purposes, where the goal is to capture the voice quality of the target speaker but the regional accent of the source speaker. In this paper, we propose a modification of the conventional training process for VC that allows it to perform as an AC transform. The approach consists of pairing source and target vectors based not on their ordering within a parallel corpus, as is commonly done in VC, but on their linguistic similarity. We validate the AC approach on a corpus containing native-accented and Spanish-accented utterances, and compare it against conventional VC through a series of perceptual listening tests. We also analyze the extent to which phonological differences between the two languages (Spanish and American English) help predict the relative performance of the two methods.
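The abstract contrasts two ways of building training pairs for a conversion model. The sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. It assumes each frame carries an acoustic vector (what a conversion model would learn to map) and a separate, speaker-independent linguistic descriptor used only for matching, for example a phonetic posteriorgram (the representation used by one of the papers listed under Citations). Matching by Euclidean nearest neighbour is our assumption, since the abstract does not specify the similarity measure, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]


def pair_by_order(source: Sequence[Vector], target: Sequence[Vector]) -> List[Pair]:
    """Conventional VC pairing (hypothetical helper): frame i of the source is
    matched with frame i of the target, assuming both sequences were already
    time-aligned, e.g. by dynamic time warping over a parallel corpus."""
    if len(source) != len(target):
        raise ValueError("ordering-based pairing requires equal-length, time-aligned sequences")
    return [(list(s), list(t)) for s, t in zip(source, target)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pair_by_linguistic_similarity(
    source_acoustic: Sequence[Vector],
    source_linguistic: Sequence[Vector],
    target_acoustic: Sequence[Vector],
    target_linguistic: Sequence[Vector],
) -> List[Pair]:
    """AC-style pairing (hypothetical helper, assumed nearest-neighbour rule):
    each source frame is paired with the target frame whose linguistic
    descriptor is closest, regardless of where that frame sits in the corpus."""
    if len(source_acoustic) != len(source_linguistic):
        raise ValueError("source acoustic and linguistic sequences differ in length")
    if not target_acoustic or len(target_acoustic) != len(target_linguistic):
        raise ValueError("target sequences must be non-empty and equal in length")
    pairs: List[Pair] = []
    for s_ac, s_ling in zip(source_acoustic, source_linguistic):
        # Index of the linguistically closest target frame (first one wins ties).
        best = min(
            range(len(target_linguistic)),
            key=lambda j: squared_distance(s_ling, target_linguistic[j]),
        )
        pairs.append((list(s_ac), list(target_acoustic[best])))
    return pairs


if __name__ == "__main__":
    # Toy data: the target says the same three "sounds" in a different order.
    src_ac, src_ling = [[1.0], [2.0], [3.0]], [[0.0], [1.0], [2.0]]
    tgt_ac, tgt_ling = [[10.0], [20.0], [30.0]], [[2.0], [0.0], [1.0]]
    print(pair_by_order(src_ac, tgt_ac))
    # -> [([1.0], [10.0]), ([2.0], [20.0]), ([3.0], [30.0])]
    print(pair_by_linguistic_similarity(src_ac, src_ling, tgt_ac, tgt_ling))
    # -> [([1.0], [20.0]), ([2.0], [30.0]), ([3.0], [10.0])]
```

In the toy example, ordering-based pairing links frames that happen to share a position, while similarity-based pairing links frames that carry the same linguistic content. Either list of pairs would then be used to train a conventional VC mapping (for instance a joint-density GMM), which is not shown here.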


Citations
Proceedings Article

L2-ARCTIC: A Non-Native English Speech Corpus

TL;DR: L2-ARCTIC is introduced, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection, and is publicly accessible at https://psi.tamu.edu/l2-arctic-corpus/.
Proceedings Article

Accent Conversion Using Phonetic Posteriorgrams

TL;DR: An approach that matches frames between the two speakers based on their phonetic (rather than acoustic) similarity and improves the ratings of acoustic quality and native accent while retaining the voice identity of the non-native speaker is proposed.
Proceedings Article

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams

TL;DR: This work presents a framework for foreign accent conversion (FAC) that eliminates the need for conventional vocoders, and therefore the need to use the native speaker’s excitation, and produces speech that sounds clearer, more natural, and more similar to the non-native speaker than a baseline system.
Posted Content

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

TL;DR: In this article, sequence-to-sequence learning of context posterior probabilities is proposed to convert the speaker individuality (such as phonetic properties and speaking rate) contained in the posterior probabilities; the source posterior probabilities are used directly to predict the target speech parameters.
Journal Article

NAUTILUS: A Versatile Voice Cloning System

TL;DR: In this paper, a novel speech synthesis system, called NAUTILUS, is proposed that can generate speech with a target voice either from a text input or a reference utterance of an arbitrary source speaker.
References
Journal Article

Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones

TL;DR: Several recently proposed algorithms for improving the voice quality of text-to-speech synthesis based on the concatenation of acoustic units are reviewed in a common framework built on the pitch-synchronous overlap-add (PSOLA) approach.
Journal Article

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.
Proceedings Article

Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory

TL;DR: Published at the 9th Annual Conference of the International Speech Communication Association (Interspeech 2008), held September 22-26, 2008, in Brisbane, Australia.
Proceedings Article

Voice conversion through vector quantization

TL;DR: The authors propose a new voice conversion technique through vector quantization and spectrum mapping which makes it possible to precisely control voice individuality.
Journal Article

Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model

TL;DR: Experimental results demonstrate that maximum-likelihood-estimation (MLE)-based mapping with dynamic features can significantly improve mapping performance compared with minimum-mean-square-error (MMSE)-based mapping, in both the articulatory-to-acoustic mapping and the inversion mapping.