Voice transformation using PSOLA technique

doi:10.1016/0167-6393(92)90012-V

Journal Article

H. Valbret, +2 more

Vol. 11, Iss. 2, pp. 175-187

TLDR

A new system for voice conversion is described that combines a PSOLA (Pitch Synchronous Overlap and Add)-derived synthesizer and a module for spectral transformation, which produces a satisfyingly natural “transformed” voice.

Abstract:

This contribution describes a new system for voice conversion. The proposed architecture combines a PSOLA (Pitch Synchronous Overlap and Add)-derived synthesizer and a module for spectral transformation. The synthesizer, based on the classical source-filter decomposition, allows prosodic and spectral transformations to be performed independently. Prosodic modifications are applied to the excitation signal using the TD-PSOLA scheme; converted speech is then synthesized using the transformed spectral parameters. Two approaches to deriving spectral transformations, both borrowed from the speech-recognition domain, are compared: Linear Multivariate Regression (LMR) and Dynamic Frequency Warping (DFW). Vector quantization is carried out as a preliminary stage to make the spectral transformations dependent on the acoustical realization of sounds. A formal listening test shows that the synthesizer produces a satisfyingly natural “transformed” voice. LMR nevertheless allows slightly better conversion than DFW. Still, there is room for improvement in the spectral transformation stage.
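The abstract names the ingredients of the LMR approach (a vector-quantization stage followed by a class-dependent linear transformation of spectral vectors) without giving details. The sketch below is one possible reading, not the authors' implementation. It assumes that source and target frames are already time-aligned, that classes are assigned by nearest codeword in the source space, and that each class gets an affine least-squares map. The function names `nearest_codeword`, `fit_lmr`, and `convert` are hypothetical.

```python
import numpy as np


def nearest_codeword(x, codebook):
    """Return the index of the codebook entry closest to x (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))


def fit_lmr(source, target, codebook):
    """Fit one least-squares linear map per vector-quantization class.

    source, target: (n_frames, dim) arrays of time-aligned spectral vectors
    (alignment is assumed to have been done beforehand).
    Returns {class_index: P} with P of shape (dim + 1, dim); the last row of P
    is a bias term, which is an assumption of this sketch.
    """
    labels = np.array([nearest_codeword(x, codebook) for x in source])
    transforms = {}
    for k in np.unique(labels):
        X = source[labels == k]
        Y = target[labels == k]
        # Append a constant column so the fitted map is affine.
        X1 = np.hstack([X, np.ones((len(X), 1))])
        # Minimize ||X1 @ P - Y|| in the least-squares sense.
        P, *_ = np.linalg.lstsq(X1, Y, rcond=None)
        transforms[int(k)] = P
    return transforms


def convert(x, codebook, transforms):
    """Map one source vector with the transform of its nearest codeword.

    Raises KeyError if that class received no training frames.
    """
    k = nearest_codeword(x, codebook)
    return np.append(x, 1.0) @ transforms[k]
```

In use, `fit_lmr` would be called once on paired training frames, and `convert` would then be applied frame by frame to the spectral parameters before synthesis. Fitting a separate map per class is what makes the transformation depend on the acoustical realization of the sound, at the cost of needing enough training frames in every class.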

Citations

Emotional speech synthesis based on DNN and PAD emotional state model

A method for simultaneously extract the fundamental frequency of a speech signal and segment it

A voice conversion method mapping segmented frames with linear multivariate regression

Hilbert phase methods for glottal activity detection

A Study about the Users's Preferred Playing Speeds on Categorized Video Content using WSOLA method

References

An Algorithm for Vector Quantizer Design

Linear Prediction of Speech

Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones

Voice conversion through vector quantization

Speaker recognition

Related Papers (5)

Voice conversion through vector quantization

Continuous probabilistic transform for voice conversion

Spectral voice conversion for text-to-speech synthesis

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Transformation of formants for voice conversion using artificial neural networks