VTLN-based voice conversion
Introduction
- Vocal tract length normalization [1] tries to compensate for the effect of speaker dependent vocal tract lengths by warping the frequency axis of the amplitude spectrum.
- Subsequently, in Section 3, the authors apply this training procedure to conventional warping functions depending on only one parameter.
- All these considerations are discussed in Section 4.
- For unvoiced signal parts, pseudo periods are used.
- This class mapping is the basis for arbitrary statistical voice conversion parameter training.
2.1. Statistical Voice Conversion Parameter Training
- Let $X_1^I = X_1, \ldots, X_I$ be the spectra belonging to source class $k_S$ and $Y_1^J$ those of the mapped class $\hat{k}_T(k_S)$. The authors generally estimate the parameter vector $\vartheta$ by minimizing the sum of the Euclidean distances between all target class spectra and transformed source class spectra.
- Here, the authors use the spectral conversion function $F_{\vartheta'}$, which depends on the parameter vector $\vartheta'$:
  $\vartheta = \arg\min_{\vartheta'} \sum_{i=1}^{I} \sum_{j=1}^{J} \int |Y_j(\omega) - F_{\vartheta'}(X_i, \omega)|^2 \, d\omega$ (1)
- In conjunction with a suitable smoothing technique, the authors can often neglect the variety of the classes' observation spectra by introducing a mean approximation, without an essential effect on the voice conversion parameters.
- In speech recognition, several VTLN warping functions have been proposed whose parameters are usually limited to one variable, the warping factor $\alpha$.
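The training idea above can be sketched for a one-parameter warping function. This is a minimal illustration, not the authors' implementation: the power-law warp `power_warp`, the convention that the converted spectrum reads the source spectrum at the warped frequency, the grid search, and all function names are assumptions chosen for the example. It uses the mean approximation, so a single mean spectrum stands in for each class.

```python
import numpy as np

def power_warp(omega, alpha):
    """Assumed one-parameter warp: maps [0, pi] onto [0, pi], fixing both ends."""
    return np.pi * (omega / np.pi) ** alpha

def warp_spectrum(spectrum, alpha):
    """Assumed conversion F_alpha(X, omega) = X(power_warp(omega, alpha))."""
    omega = np.linspace(0.0, np.pi, len(spectrum))
    return np.interp(power_warp(omega, alpha), omega, spectrum)

def estimate_alpha(source_mean, target_mean, candidates):
    """Grid search for the alpha with the smallest squared spectral distance."""
    costs = [np.sum((target_mean - warp_spectrum(source_mean, a)) ** 2)
             for a in candidates]
    return candidates[int(np.argmin(costs))]

# Toy data: the "target" is the source mean spectrum warped with alpha = 1.2.
omega = np.linspace(0.0, np.pi, 257)
source = np.exp(-((omega - 1.0) ** 2) / 0.05)
target = warp_spectrum(source, 1.2)
grid = np.round(np.arange(0.5, 2.01, 0.05), 2)
alpha_hat = estimate_alpha(source, target, grid)
```

A grid search is used here only because the parameter is a single scalar; any scalar minimizer could take its place.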
4. Warping Functions with Several Parameters
4.1. Piece-Wise Linear Warping with Several Segments
- One of the adverse properties of the conventional one-parameter warping functions is that the whole frequency axis is always warped in the same direction, either to lower or to higher frequencies.
- These functions are not able to model spectral conversions where certain parts of the axis move to higher frequencies, and other parts to lower frequencies, or vice versa.
- Such functions would require at least one inflection point and would cross the $\tilde{\omega} = \omega$ diagonal.
- In applying the VTLN technique to voice conversion, the authors want to use more exact models than in speech recognition, i.e. warping functions with several parameters, to better describe the individual characteristics of the speakers' vocal tracts.
- The corresponding $\tilde{\omega}_s$ are the parameters of the warping function.
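A piece-wise linear warping function with several segments can be sketched as linear interpolation between knots. The sketch below assumes equally spaced segment boundaries on the source axis and fixed end points at 0 and $\pi$; the function name `piecewise_linear_warp` and the example values are hypothetical. It shows that such a function can lie above the diagonal in one region and below it in another, which a one-parameter warp cannot do.

```python
import numpy as np

def piecewise_linear_warp(omega, warped_knots):
    """Warp omega in [0, pi]; warped_knots are the free parameters (interior knots)."""
    n = len(warped_knots)
    # Assumption: segment boundaries are equally spaced on the source axis.
    knots = np.linspace(0.0, np.pi, n + 2)
    # End points 0 and pi are held fixed; only interior knots move.
    targets = np.concatenate(([0.0], warped_knots, [np.pi]))
    return np.interp(omega, knots, targets)

omega = np.linspace(0.0, np.pi, 9)
# Three interior knots: the first is moved up, the last is moved down.
params = [np.pi / 4 + 0.2, np.pi / 2, 3 * np.pi / 4 - 0.2]
warped = piecewise_linear_warp(omega, params)
diff = warped - omega  # positive below pi/2, negative above: crosses the diagonal
```

The parameters must stay in increasing order for the warp to remain monotone; the sketch does not enforce this.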
5.1. Iterative Integrating Smoothing
- The basis of the voice conversion technique described in this paper is the automatic class segmentation and mapping described in Section 2.
- To keep the class-dependent voice conversion parameters from jumping at the class boundaries, causing distinctly au[...] constant function over time representing the mean pa[...]
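The summary does not spell out the smoothing procedure, so the following is only a generic sketch of the underlying idea: a piece-wise constant parameter track (one value per class, repeated over that class's frames) is smoothed by repeatedly averaging each value with its neighbours. The moving-average kernel, the iteration count, and the name `iterative_smooth` are assumptions, not the authors' method.

```python
import numpy as np

def iterative_smooth(track, iterations=10, half_width=2):
    """Repeatedly replace each value by the mean of its local neighbourhood."""
    smoothed = np.asarray(track, dtype=float).copy()
    width = 2 * half_width + 1
    kernel = np.ones(width) / width
    for _ in range(iterations):
        # Edge padding keeps the track length constant across iterations.
        padded = np.pad(smoothed, half_width, mode="edge")
        smoothed = np.convolve(padded, kernel, mode="valid")
    return smoothed

# Toy track: three classes of 20 frames each with different warping factors.
track = np.repeat([0.9, 1.2, 1.0], 20)
smooth = iterative_smooth(track)
```

More iterations or a wider window give a smoother track at the cost of blurring the class-specific values.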
5.2. Deviation Penalty
- From Figures 1 and 2, the authors note that for certain classes the obtained parameter values deviate strongly from the mean.
- In Table 1, the authors show results for warping functions with one parameter (cf. Section 3).
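One way to picture a deviation penalty is as an extra cost term that grows as a parameter moves away from a reference value. The summary does not give the penalty's form, so the quadratic term, the weight, and the name `penalised_choice` below are assumptions made purely for illustration.

```python
import numpy as np

def penalised_choice(costs, candidates, reference, weight):
    """Pick the candidate minimising cost + weight * (candidate - reference)^2."""
    candidates = np.asarray(candidates, dtype=float)
    total = np.asarray(costs, dtype=float) + weight * (candidates - reference) ** 2
    return candidates[int(np.argmin(total))]

# Toy example: the spectral cost alone slightly favours an extreme value.
candidates = np.array([0.8, 1.0, 1.2, 1.6])
costs = np.array([3.0, 2.0, 1.9, 1.8])
unpenalised = penalised_choice(costs, candidates, reference=1.0, weight=0.0)
penalised = penalised_choice(costs, candidates, reference=1.0, weight=5.0)
```

With a zero weight the extreme candidate wins on spectral cost alone; with the penalty, the choice moves back toward the reference value.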
Citations
Cites methods from "VTLN-based voice conversion"
...Some traditional methods, such as VTLN [7] and GMM [8], are conducted on parallel data (i.e. the same content spoken by different speakers)....
[...]
Cites background or methods from "VTLN-based voice conversion"
...To respond to such concern, frequency warping methods are proposed [55, 87–90]....
[...]
...Different from the nonparametric frequency warping methods, several parametric frequency warping techniques were studied in [87] within the framework of vocal tract length normalization (VTLN)....
[...]
...A large number of statistical parametric and frequency warping approaches have attempted to achieve a robust spectral mapping [8,12,34,44,45,55,59,61,68,87–89,103] for voice conversion applications....
[...]
Cites background from "VTLN-based voice conversion"
...Year | Author | Transformation technique | Signal representation model
1986 | Shikano [231] | VQ and Codebook | LPC
1988 | Abe [1] | Codebook | Various spectral parameters
1990 | Abe [2] | Codebook and HMM | LPC
1991 | Valbret [284] | Linear regression and DFW | LPC
1995 | Childers [32] | Glottal Source Modeling | LF glottal and LPC
1995 | Narendranath [169] | ANN | Formant frequencies
1996 | Rinscheid [209] | Topological Features Map | LPC
1996 | Verhelst [287] | VQ and Codebook | LPC
1996 | Lee [130] | ANN | LPC cepstrum and LPC residual
1997 | Kim [118] | HMM, VQ and Codebook | LPC and cepstrum
1998 | Arriola [80] | LRE | LPC coefficients and residual
1998 | Arslan [8] | HMM and STASC | LSF
1998 | Kain [107; 108] | GMM | Bark and LSF
1998 | Stylianou [257] | GMM | HNM, Bark and MFCC
1999 | Arslan [6] | STASC | LSF
2001 | Kain [109] | GMM | LPC
2001 | Zhang [314] | MLLR | MFCC and LSF
2001 | Lopez [135] | VQ and ANN | LPC coefficients and residual
2001 | Mashimo [149] | GMM | MFCC and STRAIGHT
2001 | Toda [270] | GMM and DFW | STRAIGHT
2002 | Türk [275] | STASC | DWT
2002 | Watanabe [292] | ANN | Spectral envelope
2003 | Sündermann* [245; 246; 247] | DFW | VTLN
2003 | Kumar* [125] | GMM | MFCC
2003 | Türk [276] | Codebook | LSF
2003 | Ye and Young [298] | GMM and PWLT | MEL
2003 | Orphanidou [182] | Codebook and GTM | LPC
2003 | Rentzos [206] | HMM | MFCC, LPC and LF glottal
2004 | Rentzos [207] | HMM | MFCC, LPC and LF glottal
2004 | Duxans [47] | GMM, HMM and decision tree | LSF
2004 | Orphanidou [183] | RBFNN | DWT
2004 | Ye and Young [300] | GMM and Codebook | LSF
2004 | Wilde [294] | PPCA | LSF
2004 | Pribilova [195] | Non-linear Frequency Scale Mapping | HNM
2004 | Pfitzinger [189] | Weighted Linear Interpolation | LPC coefficients and residual
2005 | Toda [268] | Maximum Likelihood | MFCC and STRAIGHT
2005 | Zhang [310] | GMM | LPC residual, Bark and LSF
2005 | Kang [111] | GMM and Codebook | STRAIGHT and LSF
2006 | Nurminen [175] | GMM | Bark and LSF
2006 | Nurminen [176] | GMM and K-means | LSF
2006 | Duxans* [48] | GMM, HMM and CART | HNM, LPC and LSF
2006 | Ye and Young [301] | GMM | LPC and LSF
2006 | Sündermann* [244] | Unit Selection | VTLN
2006 | Rao [203] | Linear transformation | LPC
2006 | Shuang* [233] | Frequency Warping | Formant frequencies
2007 | Dutoit [45] | Frame Selection | MFCC and LPC
2007 | Erro* [49; 50] | WFW | HNM and LSF
2007 | Fujii [60] | Unit Selection | LPC and MFCC
2007 | Guido [79] | ANN | DWT
2007 | Hanzlicek [83] | GMM | LSF
2008 | Hanzlicek [84] | Warping Functions | LSF
2008 | Shuang [234] | Frequency Warping | Formant frequencies
2008 | Yue [305] | GMM and HMM | LPC and LSF
2008 | Zhang* [311; 313] | Codebook | STRAIGHT and LSF
2008 | Desai [41; 42] | ANN | Cepstral coefficients
2008 | Pozo [194] | GMM | LF glottal and LPC
2009 | Popa [193] | SVD and Asymmetric Bilinear Model | LSF
2009 | Uriz* [282] | Frequency Warping and Frame Selection | LSF
2009 | Uriz* [283] | K-histograms and Frame Selection | LSF
2009 | Zhang* [312] | Codebook | STRAIGHT and LSF
2009 | Yutani [306] | MSD, GMM and HMM | MFCC and F0
2010 | Desai [41] | ANN | Cepstral coefficients
2010 | Godoy* [72] | DFW and GMM | Cepstral coefficients
2010 | Helander [89] | Linear regression and GMM | Cepstral coefficients
2010 | Lanchantin [127] | DMS | LSF
2012 | Zorilă [318] | DFW and GMM | HNM and LPC
2012 | Song [249] | SVR and GMM | STRAIGHT...
[...]
...This approach has been applied in statistical speech synthesis to obtain speaker-independent speech utterances, which can be readapted to a specific speaker using a small set of data, and it has been used in voice conversion [246; 247; 248] to transform the spectrum of a given acoustic class from one speaker to another....
[...]
...As seen in the previous section, pitch-synchronous segments [50; 109; 246; 247] assume that the segment size is adjusted so that it contains an integer number of periods of a quasi-periodic speech signal....
[...]
...As early as 2003, Sündermann [245; 246; 247] reintroduces a technique proposed by Kamm et al....
[...]