Open Access
Implementation of VTLN for statistical speech synthesis
Lakshmi Saheer, John Dines, Philip N. Garner, Hui Liang
pp. 224-229
TL;DR: The EM formulation helps to embed the feature normalization in the HMM training and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
Abstract:
Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and that it gives additive improvements to CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The EM formulation embeds the feature normalization in the HMM training, which makes warping factor estimation more efficient and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
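To make the idea of a warping factor concrete, the following is a minimal sketch of the simpler grid-search estimation that an EM formulation aims to improve on. It is not the paper's method. It assumes a bilinear (all-pass) warping function, a single diagonal Gaussian as the model, and it ignores the Jacobian term of the feature transform; the function names `bilinear_warp`, `warp_spectrum`, and `estimate_alpha` are hypothetical and chosen for illustration only.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Bilinear (all-pass) warping of frequencies omega in [0, pi].

    Assumption: |alpha| < 1, so the mapping is monotonic and fixes 0 and pi.
    """
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))


def warp_spectrum(log_spec, alpha):
    """Resample a log-spectrum so that bin omega takes the value at warp(omega)."""
    omega = np.linspace(0.0, np.pi, len(log_spec))
    return np.interp(bilinear_warp(omega, alpha), omega, log_spec)


def estimate_alpha(log_spec, mean, var, grid):
    """Pick the warping factor on the grid with the highest Gaussian log-likelihood.

    Simplification: the Jacobian of the warping transform is not included.
    """
    def loglik(alpha):
        x = warp_spectrum(log_spec, alpha)
        return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

    return float(max(grid, key=loglik))


# Usage: a synthetic "speaker" spectrum is the reference warped by -0.05,
# so the factor that maps it back onto the reference should be close to +0.05.
omega = np.linspace(0.0, np.pi, 257)
reference = np.exp(-0.5 * ((omega - 1.0) / 0.2) ** 2)
observed = warp_spectrum(reference, -0.05)
alpha_hat = estimate_alpha(observed, reference, np.ones_like(reference),
                           np.linspace(-0.1, 0.1, 21))
```

A grid search like this evaluates every candidate factor separately and yields one factor per speaker. The abstract's point is that an EM formulation instead estimates the factor inside HMM training and can assign different factors to different state clusters.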
Citations
Journal Article
Current trends in multilingual speech processing
Hervé Bourlard, John Dines, Mathew Magimai-Doss, Philip N. Garner, David Imseng, Petr Motlicek, Hui Liang, Lakshmi Saheer, Fabio Valente
TL;DR: Recent work at Idiap Research Institute in the domain of multilingual speech processing is described and some insights into emerging challenges for the research community are provided.
Journal Article
Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis
TL;DR: This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis.
Journal Article
VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC
D. R. Sanand, Srinivasan Umesh
TL;DR: A method to analytically obtain a linear-transformation on the conventional Mel frequency cepstral coefficients (MFCC) features that corresponds to conventional vocal tract length normalization (VTLN)-warped MFCC features, thereby simplifying the VTLN processing.
Proceedings Article
Voice Morphing that improves TTS quality using an optimal dynamic frequency warping-and-weighting transform
TL;DR: This paper presents a dynamic programming algorithm that simultaneously estimates the Optimal Dynamic Frequency Warping and Weighting transform (ODFWW) and therefore needs no preprocessing step or fine-tuning, while source/target-speaker data are matched using the Matching-Minimization algorithm.
Proceedings Article
Combining vocal tract length normalization with hierarchial linear transformations
TL;DR: A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented and experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.
References
Numerical recipes in C
Journal Article
Maximum likelihood linear transformations for HMM-based speech recognition
TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.
Journal Article
A maximum-likelihood approach to stochastic matching for robust speech recognition
Ananth Sankar, Chin-Hui Lee
TL;DR: A maximum-likelihood (ML) stochastic matching approach that decreases the acoustic mismatch between a test utterance and a given set of speech models, so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set.
Journal Article
Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm
TL;DR: A new adaptation algorithm is proposed called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms.
Journal Article
A frequency warping approach to speaker normalization
L. Lee, Richard Rose
TL;DR: An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.