Journal Article

Voice transformation using PSOLA technique

H. Valbret, +2 more
- Vol. 11, Iss: 2, pp 175-187
TLDR
A new system for voice conversion is described that combines a PSOLA (Pitch Synchronous Overlap and Add)-derived synthesizer and a module for spectral transformation, which produces a satisfyingly natural “transformed” voice.
Abstract
In this contribution, a new system for voice conversion is described. The proposed architecture combines a PSOLA (Pitch Synchronous Overlap and Add)-derived synthesizer and a module for spectral transformation. The synthesizer, based on the classical source-filter decomposition, allows prosodic and spectral transformations to be performed independently. Prosodic modifications are applied to the excitation signal using the TD-PSOLA scheme; converted speech is then synthesized using the transformed spectral parameters. Two different approaches to deriving spectral transformations, borrowed from the speech-recognition domain, are compared: Linear Multivariate Regression (LMR) and Dynamic Frequency Warping (DFW). Vector quantization is carried out as a preliminary stage to make the spectral transformations dependent on the acoustical realization of sounds. A formal listening test shows that the synthesizer produces a satisfyingly natural “transformed” voice. LMR proves to allow a slightly better conversion than DFW. Still, there is room for improvement in the spectral transformation stage.
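
To make the idea behind Dynamic Frequency Warping more concrete, the following is a minimal sketch of a dynamic-programming alignment along the frequency axis. It is an illustration under stated assumptions, not the procedure used in the paper: the function names `dfw_path` and `warp_spectrum`, the squared-difference local cost, the three-step path constraint, and the toy spectra are all hypothetical choices made for this example.

```python
def dfw_path(src, tgt):
    """Find a monotone warping path between two log-magnitude spectra.

    Returns a list of (i, j) pairs aligning source bin i with target bin j,
    chosen to minimize the accumulated squared difference (assumed cost).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    cost[0][0] = (src[0] - tgt[0]) ** 2
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best, arg = inf, None
            # Allowed predecessors: diagonal, vertical, horizontal steps.
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, arg = cost[pi][pj], (pi, pj)
            cost[i][j] = best + (src[i] - tgt[j]) ** 2
            back[i][j] = arg
    # Backtrack from the last bin pair to the first.
    path = [(n - 1, m - 1)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return path


def warp_spectrum(src, path, m):
    """Map src onto the target frequency axis by averaging aligned bins."""
    sums = [0.0] * m
    counts = [0] * m
    for i, j in path:
        sums[j] += src[i]
        counts[j] += 1
    # Every target bin appears in a monotone path, so counts are nonzero.
    return [s / c for s, c in zip(sums, counts)]


# Toy spectra (hypothetical): one spectral peak that sits two bins higher
# in the target than in the source.
src = [0, 0, 5, 0, 0, 0, 0, 0]
tgt = [0, 0, 0, 0, 5, 0, 0, 0]
path = dfw_path(src, tgt)
warped = warp_spectrum(src, path, len(tgt))
print(path)
print(warped)
```

In this toy case the path pairs the source peak with the target peak, so the warped source spectrum matches the target. A warping of this kind only moves spectral features along the frequency axis; it does not change their amplitudes.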


Citations
Proceedings Article

Novel Amplitude Scaling method for bilinear frequency Warping-based Voice Conversion

TL;DR: A novel AS technique which linearly transfers the amplitude of the frequency-warped spectrum using the knowledge of a Gaussian Mixture Model (GMM)-based converted spectrum without adding any spurious peaks is proposed.
Proceedings Article

The sigma algorithm for estimation of reference-quality glottal closure instants from Electroglottograph signals

TL;DR: A new method for EGG-based GCI estimation named SIGMA, which is based upon the stationary wavelet transform, peak detection with a group delay function and Gaussian Mixture Modelling for discrimination between true and false GCI candidates is described.
Proceedings Article

Dynamic model selection for spectral voice conversion.

TL;DR: A new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models with increasing complexity including a mixture of Gaussian and probabilistic principal component analyzers are considered during the conversion of a source speech signal into a target speech signal.
Proceedings Article

Voice Morphing that improves TTS quality using an optimal dynamic frequency warping-and-weighting transform

TL;DR: This paper presents a dynamic programming algorithm that simultaneously estimates the Optimal Frequency Warping and Weighting transform (ODFWW) and therefore needs no preprocessing step and fine-tuning while source/target-speaker data are matched using the Matching-Minimization algorithm.
Journal Article

Structured Sparse Spectral Transforms and Structural Measures for Voice Conversion

TL;DR: This work investigates a structured sparse spectral transform method for voice conversion (VC) to perform frequency warping and spectral shaping simultaneously on high-dimensional (D) STRAIGHT spectra and shows that embedding the ROS in spectral transforms offers flexibility in tradeoffs between spectral distortion and structure preservation.
References
Journal Article

An Algorithm for Vector Quantizer Design

TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Book

Linear Prediction of Speech

John E. Markel, +1 more
TL;DR: Speech Analysis and Synthesis Models: Basic Physical Principles, Speech Synthesis Structures, and Considerations in Choice of Analysis.
Journal Article

Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones

TL;DR: Several recently proposed algorithms, all based on the pitch-synchronous overlap-add approach, are reviewed in a common framework; they aim to improve the voice quality of text-to-speech synthesis based on the concatenation of acoustical units.
Proceedings Article

Voice conversion through vector quantization

TL;DR: The authors propose a new voice conversion technique through vector quantization and spectrum mapping which makes it possible to precisely control voice individuality.