VTLN-based voice conversion
Summary
Introduction
- Vocal tract length normalization (VTLN) [1] tries to compensate for the effect of speaker-dependent vocal tract lengths by warping the frequency axis of the amplitude spectrum.
- Subsequently, in Section 3, the authors apply this training procedure to conventional warping functions depending on only one parameter.
- All these considerations are discussed in Section 4.
- For unvoiced signal parts, pseudo periods are used.
- This class mapping is the basis for an arbitrary statistical voice conversion parameter training.
2.1. Statistical Voice Conversion Parameter Training
- Let $X_1^I = X_1, \ldots, X_I$ be the spectra belonging to source class $k_S$ and $Y_1^J$ those of the mapped class $\hat{k}_T(k_S)$. The authors generally estimate the parameter vector $\vartheta$ by minimizing the sum of the Euclidean distances between all target class spectra and transformed source class spectra.
- Here, the authors utilize the spectral conversion function $F_{\vartheta'}$ depending on the parameter vector $\vartheta'$:
  $$\vartheta = \arg\min_{\vartheta'} \sum_{i=1}^{I} \sum_{j=1}^{J} \int \left| Y_j(\omega) - F_{\vartheta'}(X_i, \omega) \right|^2 \, d\omega \qquad (1)$$
- In conjunction with a suitable smoothing technique, the authors can often neglect the variety of the classes' observation spectra by introducing a mean approximation, without an essential effect on the voice conversion parameters.
- In speech recognition, several VTLN warping functions have been proposed whose parameters are usually limited to one variable, the warping factor α.
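The estimation criterion of Section 2.1, combined with a one-parameter warping function, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the piece-wise linear warp with a fixed breakpoint `omega0`, the convention that the converted spectrum reads the source spectrum at the warped frequency, the grid search over candidate values of α, and all function names are hypothetical choices made for this example. The integral over ω is approximated by a sum over uniformly spaced spectrum samples.

```python
import math


def warp_piecewise_linear(omega, alpha, omega0=0.75 * math.pi):
    """One-parameter piece-wise linear warp of omega in [0, pi].

    Below the (assumed) breakpoint omega0 the axis is scaled by alpha;
    above it a second linear segment is chosen so that pi maps to pi.
    Requires alpha * omega0 < pi to stay monotonic.
    """
    if omega <= omega0:
        return alpha * omega
    slope = (math.pi - alpha * omega0) / (math.pi - omega0)
    return alpha * omega0 + slope * (omega - omega0)


def sample_at(spectrum, omega):
    """Linearly interpolate a spectrum sampled uniformly on [0, pi]."""
    n = len(spectrum)
    pos = min(max(omega, 0.0), math.pi) / math.pi * (n - 1)
    lo = min(int(pos), n - 2)
    frac = pos - lo
    return (1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1]


def convert(spectrum, alpha):
    """Assumed convention: converted(omega) = source(warp(omega))."""
    n = len(spectrum)
    return [
        sample_at(spectrum, warp_piecewise_linear(k * math.pi / (n - 1), alpha))
        for k in range(n)
    ]


def estimate_alpha(sources, targets, candidates):
    """Grid search for the alpha that minimizes the summed squared
    distance between every target spectrum and every converted source."""
    def cost(alpha):
        total = 0.0
        for x in sources:
            fx = convert(x, alpha)
            for y in targets:
                total += sum((yk - fk) ** 2 for yk, fk in zip(y, fx))
        return total

    return min(candidates, key=cost)


# Toy data: a single spectral peak, and a target produced by a known warp.
n_bins = 64
source = [
    math.exp(-(((k * math.pi / (n_bins - 1)) - 1.0) / 0.2) ** 2)
    for k in range(n_bins)
]
candidates = [0.8 + 0.05 * i for i in range(9)]  # roughly 0.8 to 1.2
target = convert(source, candidates[6])
best_alpha = estimate_alpha([source], [target], candidates)
```

On this toy data the search recovers the warping factor that generated the target. A grid search is used only because it is easy to verify; any one-dimensional minimizer could take its place.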
4. Warping Functions with Several Parameters
4.1. Piece-Wise Linear Warping with Several Segments
- One of the adverse properties of the conventional warping functions with one parameter is that the whole frequency axis is always warped in the same direction, either to lower or to higher frequencies.
- These functions are not able to model spectral conversions where certain parts of the axis move to higher frequencies, and other parts to lower frequencies, or vice versa.
- Such functions would require at least one inflection point and would cross the ω̃ = ω diagonal.
- Applying the VTLN technique to voice conversion, the authors want to use more exact models than in speech recognition, i.e., warping functions with several parameters, for a better description of the individual characteristics of the speakers' vocal tracts.
- The corresponding ω̃s are the parameters of the warping function.
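A piece-wise linear warping function with several segments can be sketched as below. The sketch assumes equidistant breakpoints on the source axis and fixed end points at 0 and π; neither assumption is confirmed by the summary above, and the function name `warp_multi_segment` is hypothetical. The warped frequencies assigned to the interior breakpoints play the role of the parameters.

```python
import math


def warp_multi_segment(omega, warped_knots):
    """Piece-wise linear warp with len(warped_knots) + 1 segments.

    warped_knots[s - 1] is the warped frequency assigned to the source
    frequency s * pi / (S + 1), where S = len(warped_knots). The end
    points 0 and pi are kept fixed (an assumption of this sketch).
    """
    ys = [0.0] + list(warped_knots) + [math.pi]
    if any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("warped knots must be strictly increasing in (0, pi)")
    segments = len(ys) - 1
    step = math.pi / segments
    omega = min(max(omega, 0.0), math.pi)
    s = min(int(omega / step), segments - 1)  # index of the active segment
    frac = (omega - s * step) / step          # position inside that segment
    return ys[s] + frac * (ys[s + 1] - ys[s])


# Two parameters: low frequencies move up, high frequencies move down,
# so the curve crosses the diagonal, which a single factor cannot do.
knots = [0.4 * math.pi, 0.6 * math.pi]
low = warp_multi_segment(math.pi / 4, knots)
mid = warp_multi_segment(math.pi / 2, knots)
high = warp_multi_segment(3 * math.pi / 4, knots)
```

With these two knots, `low` lies above the diagonal, `high` lies below it, and `mid` falls on it, which is the behavior a one-parameter function cannot reproduce.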
5.1. Iterative Integrating Smoothing
- The basis of the voice conversion technique described in this paper is the automatic class segmentation and mapping described in Section 2.
- To prevent the class-dependent voice conversion parameters from jumping at the class boundaries, causing distinctly au [...] constant function over time representing the mean pa [...]
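The effect of smoothing a class-dependent parameter track can be illustrated with a simple stand-in. The summary does not spell out the authors' iterative integrating smoothing, so the sketch below uses a repeated moving average instead; it is not the paper's procedure, and the function name, window radius, and number of passes are hypothetical.

```python
def smooth_track(values, passes=3, radius=2):
    """Repeated moving average over a per-frame parameter track.

    Stand-in smoother, not the paper's method. The window shrinks at
    the edges so the track keeps its original length.
    """
    track = [float(v) for v in values]
    for _ in range(passes):
        smoothed = []
        for t in range(len(track)):
            lo = max(0, t - radius)
            hi = min(len(track), t + radius + 1)
            window = track[lo:hi]
            smoothed.append(sum(window) / len(window))
        track = smoothed
    return track


# A warping factor that jumps from 0.9 to 1.1 at a class boundary.
raw = [0.9] * 10 + [1.1] * 10
smooth = smooth_track(raw)
largest_step = max(abs(b - a) for a, b in zip(smooth, smooth[1:]))
```

The abrupt jump of 0.2 at the boundary is spread over several frames, while frames far from the boundary keep their class value.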
5.2. Deviation Penalty
- Viewing Figures 1 and 2, the authors note that for certain classes the obtained parameter values deviate strongly from the mean.
- In Table 1, the authors show results for warping functions with one parameter (cf. Section 3).