Open Access

Implementation of VTLN for statistical speech synthesis

TLDR
The EM formulation helps to embed the feature normalization in the HMM training and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
Abstract
Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and that it gives improvements additive to those of CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The EM formulation embeds the feature normalization in the HMM training, which makes warping-factor estimation more efficient and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
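
To make the idea of likelihood-based warping-factor estimation concrete, here is a minimal sketch and not the paper's implementation. It assumes a toy scalar feature per frame (a normalized angular frequency), a bilinear all-pass warping function, fixed occupancy posteriors as an E-step would supply, and a plain grid search in place of the paper's EM optimization. It also leaves out the Jacobian term that a full likelihood would include. The names `bilinear_warp`, `gaussian_loglik` and `estimate_alpha` are hypothetical.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency (0..pi) with an all-pass factor alpha."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def gaussian_loglik(x, mean, var):
    """Log-density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def estimate_alpha(frames, posteriors, means, variances, grid):
    """Return the alpha on `grid` with the highest posterior-weighted log-likelihood.

    frames:     one toy scalar feature (radians) per frame
    posteriors: posteriors[t][m] is the occupancy of component m at frame t,
                held fixed here (assumed to come from an E-step)
    """
    best_alpha, best_q = None, -math.inf
    for alpha in grid:
        q = 0.0
        for x, gammas in zip(frames, posteriors):
            y = bilinear_warp(x, alpha)  # normalize the frame with this alpha
            for g, mu, var in zip(gammas, means, variances):
                q += g * gaussian_loglik(y, mu, var)
        if q > best_q:
            best_alpha, best_q = alpha, q
    return best_alpha


# Toy usage: frames are model means distorted by the inverse warp (-0.1),
# so the normalizing factor recovered by the search should be close to 0.1.
means = [0.6, 1.2, 2.0]
variances = [0.01, 0.01, 0.01]
frames = [bilinear_warp(m, -0.1) for m in means]
posteriors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
grid = [i / 100 for i in range(-20, 21)]
print(estimate_alpha(frames, posteriors, means, variances, grid))
```

Under the same assumptions, running `estimate_alpha` separately on the frames and posteriors of each state cluster would yield one warping factor per cluster, which is the kind of multiple-factor use the abstract describes.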


Citations
Journal ArticleDOI

Current trends in multilingual speech processing

TL;DR: Recent work at Idiap Research Institute in the domain of multilingual speech processing is described and some insights into emerging challenges for the research community are provided.
Journal ArticleDOI

Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

TL;DR: This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis.
Journal ArticleDOI

VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC

TL;DR: A method to analytically obtain a linear-transformation on the conventional Mel frequency cepstral coefficients (MFCC) features that corresponds to conventional vocal tract length normalization (VTLN)-warped MFCC features, thereby simplifying the VTLN processing.
Proceedings ArticleDOI

Voice Morphing that improves TTS quality using an optimal dynamic frequency warping-and-weighting transform

TL;DR: This paper presents a dynamic programming algorithm that simultaneously estimates the Optimal Dynamic Frequency Warping and Weighting (ODFWW) transform and therefore needs no preprocessing step or fine-tuning, while source- and target-speaker data are matched using the Matching-Minimization algorithm.
Proceedings ArticleDOI

Combining vocal tract length normalization with hierarchial linear transformations

TL;DR: A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented and experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.
References

Numerical recipes in C

Journal ArticleDOI

Maximum likelihood linear transformations for HMM-based speech recognition

TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.
Journal ArticleDOI

A maximum-likelihood approach to stochastic matching for robust speech recognition

TL;DR: A maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance and a given set of speech models so as to reduce the recognition performance degradation caused by distortions in the test utterances and/or the model set.
Journal ArticleDOI

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

TL;DR: A new adaptation algorithm is proposed called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms.
Journal ArticleDOI

A frequency warping approach to speaker normalization

TL;DR: An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.