Implementation of VTLN for statistical speech synthesis

Open Access

Lakshmi Saheer, +3 more

- pp 224-229

TLDR

The EM formulation helps to embed the feature normalization in the HMM training and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.

Abstract:

Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and that it gives additive improvements on top of CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The EM formulation embeds the feature normalization in the HMM training, which makes warping-factor estimation more efficient and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
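
As a rough illustration of what estimating a warping factor per state cluster can look like, the sketch below is a toy example and not the paper's implementation. It assumes a first-order all-pass (bilinear) frequency warp, scalar frequency-like features, a single Gaussian per state cluster, and a plain grid search over the warping factor; none of these choices is stated in the abstract, and all function names (bilinear_warp, log_jacobian, auxiliary, estimate_alpha) are hypothetical. The score being maximized is an occupancy-weighted log-likelihood of the warped features plus the log-Jacobian of the warp, which is the general shape of an EM auxiliary function for a feature transform.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Warp normalized frequencies (radians) with a first-order all-pass.

    alpha = 0 leaves the axis unchanged; warping with -alpha inverts
    a warp with alpha.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )


def log_jacobian(omega, alpha):
    """Log of d(warped)/d(omega) for the bilinear warp (requires |alpha| < 1)."""
    omega = np.asarray(omega, dtype=float)
    return np.log((1.0 - alpha**2) / (1.0 + alpha**2 - 2.0 * alpha * np.cos(omega)))


def auxiliary(x, gamma, mu, var, alpha):
    """Occupancy-weighted log-likelihood of warped features plus log-Jacobian.

    x     : scalar features (assumed to be frequencies in radians)
    gamma : per-frame occupancy of one state cluster (E-step output)
    mu, var : mean and variance of that cluster's Gaussian (assumed single)
    """
    y = bilinear_warp(x, alpha)
    log_gauss = -0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)
    return float(np.sum(gamma * (log_gauss + log_jacobian(x, alpha))))


def estimate_alpha(x, gamma, mu, var, grid=None):
    """Pick the warping factor on a grid that maximizes the auxiliary score."""
    if grid is None:
        grid = np.linspace(-0.2, 0.2, 81)  # assumed search range
    scores = [auxiliary(x, gamma, mu, var, a) for a in grid]
    return float(grid[int(np.argmax(scores))])
```

Under these assumptions, calling estimate_alpha once per state cluster, each time with that cluster's occupancies and Gaussian parameters, yields a separate warping factor per cluster for the same speaker. A continuous one-dimensional optimizer could replace the grid search.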

Citations

Journal Article

Current trends in multilingual speech processing

Hervé Bourlard, +12 more

- 22 Nov 2011 -

Sadhana-academy Proceedings in Engineeri...

TL;DR: Recent work at Idiap Research Institute in the domain of multilingual speech processing is described and some insights into emerging challenges for the research community are provided.

Journal Article

Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

Lakshmi Saheer, +2 more

- 01 Sep 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis.

Journal Article

VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC

D. R. Sanand, +1 more

- 01 Jul 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A method to analytically obtain a linear-transformation on the conventional Mel frequency cepstral coefficients (MFCC) features that corresponds to conventional vocal tract length normalization (VTLN)-warped MFCC features, thereby simplifying the VTLN processing.

Proceedings Article

Voice Morphing that improves TTS quality using an optimal dynamic frequency warping-and-weighting transform

Yannis Agiomyrgiannakis, +1 more

TL;DR: This paper presents a dynamic programming algorithm that simultaneously estimates the Optimal Frequency Warping and Weighting transform (ODFWW) and therefore needs no preprocessing step and fine-tuning while source/target-speaker data are matched using the Matching-Minimization algorithm.

Proceedings Article

Combining vocal tract length normalization with hierarchial linear transformations

Lakshmi Saheer, +3 more

TL;DR: A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented and experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.

References

Michael Pitz, +1 more

- 15 Aug 2005 -

IEEE Transactions on Speech and Audio Pr...

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

Sankaran Panchapagesan, +1 more

- 01 Jan 2009 -

Computer Speech & Language

References

Numerical recipes in C

Maximum likelihood linear transformations for HMM-based speech recognition

A maximum-likelihood approach to stochastic matching for robust speech recognition

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

A frequency warping approach to speaker normalization

Related Papers (5)

Mel-generalized cepstral analysis - a unified approach to speech spectral estimation.

Statistical Parametric Speech Synthesis

A frequency warping approach to speaker normalization

Vocal tract normalization equals linear transformation in cepstral space

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC