Q2. What is the purpose of this paper?
In this paper, the authors train a neural network to learn a transformation function that maps the speaker-dependent parameters extracted from the source speaker's speech to those of the target speaker.
Q3. How many transitions do formants have in continuous speech?
In continuous speech, the vocal tract changes its shape continuously, so the extracted formants have many transitions.
Q4. What was used to excite the formant synthesizer for voiced frames?
Fant’s model (Fant, 1986) was used to excite the formant synthesizer for voiced frames, and random noise was used for unvoiced frames.
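The voiced/unvoiced excitation scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain impulse train stands in for Fant's glottal model (which is not reproduced here), and the sampling rate, pitch, formant frequencies, bandwidths, and all function names are hypothetical choices, not values from the paper.

```python
import math
import random

FS = 8000  # assumed sampling rate in Hz (not given in the source)

def excitation(n, voiced, f0=100.0, rng=None):
    """Return n excitation samples.

    Voiced: impulse train at pitch f0 (a crude stand-in for a glottal
    pulse model such as Fant's). Unvoiced: white Gaussian noise.
    """
    if voiced:
        period = int(round(FS / f0))
        return [1.0 if i % period == 0 else 0.0 for i in range(n)]
    rng = rng or random.Random(0)
    return [rng.gauss(0.0, 0.1) for _ in range(n)]

def resonator(x, freq, bw):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bw / FS)
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / FS)
    a = 1.0 - b - c  # unity gain at 0 Hz
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        out = a * v + b * y1 + c * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def synthesize_frame(formants, bandwidths, n, voiced):
    """Pass the excitation through one resonator per formant, in cascade."""
    signal = excitation(n, voiced)
    for f, bw in zip(formants, bandwidths):
        signal = resonator(signal, f, bw)
    return signal

# Illustrative formant frequencies and bandwidths in Hz (assumed values).
voiced_frame = synthesize_frame([730, 1090, 2440], [60, 90, 150], 160, True)
unvoiced_frame = synthesize_frame([730, 1090, 2440], [60, 90, 150], 160, False)
```

Swapping the impulse train for a proper glottal pulse shape would change only `excitation`; the resonator cascade stays the same.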
Q5. What is the way to train a neural network?
The first three formants from these two corresponding steady voiced regions are used as a pair of input and output formant vectors to a neural network.
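As a rough illustration of training on such formant pairs, the sketch below fits a small feedforward network that maps a source vector (F1, F2, F3) to a target vector. Everything specific here is an assumption made for the example: the network size, the learning rate, the normalization constant, and the formant values, which are made-up placeholders and not data from the paper.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    # Small random weights; the extra last column of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(net, x):
    w_hid, w_out = net
    h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    y = [sum(w * v for w, v in zip(row, h + [1.0])) for row in w_out]
    return h, y

def mse(net, pairs):
    total = 0.0
    for x, t in pairs:
        _, y = forward(net, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(pairs)

def train(net, pairs, epochs=2000, lr=0.05):
    w_hid, w_out = net
    for _ in range(epochs):
        for x, t in pairs:
            h, y = forward(net, x)
            # Error at the linear output units.
            d_out = [yi - ti for yi, ti in zip(y, t)]
            # Backpropagate through the tanh hidden units.
            d_hid = [(1.0 - hj * hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
                     for j, hj in enumerate(h)]
            for k, row in enumerate(w_out):
                for j, v in enumerate(h + [1.0]):
                    row[j] -= lr * d_out[k] * v
            for j, row in enumerate(w_hid):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= lr * d_hid[j] * v
    return net

SCALE = 4000.0  # assumed normalization constant in Hz
# Placeholder formant triples (Hz) for four steady voiced regions; illustrative only.
SOURCE = [[730, 1090, 2440], [270, 2290, 3010], [300, 870, 2240], [530, 1840, 2480]]
TARGET = [[850, 1220, 2810], [310, 2790, 3310], [370, 950, 2670], [610, 2330, 2990]]
PAIRS = [([f / SCALE for f in s], [f / SCALE for f in t]) for s, t in zip(SOURCE, TARGET)]

rng = random.Random(0)
net = (init_layer(3, 4, rng), init_layer(4, 3, rng))  # 3 inputs, 4 hidden, 3 outputs
before = mse(net, PAIRS)
train(net, PAIRS)
after = mse(net, PAIRS)
# Transformed formants (Hz) for the first source vector.
predicted = [round(v * SCALE) for v in forward(net, PAIRS[0][0])[1]]
```

Comparing `before` and `after` shows whether the mapping error on the training pairs went down; `predicted` is the network's estimate of the target formants for the first source vector.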
Q6. What is the method for transforming the vocal tract parameters?
Prosodic modifications were incorporated in the excitation signal using the PSOLA (Pitch Synchronous Overlap Add) technique, and speech was synthesized using the transformed spectral parameters.
Q7. What are the characteristics of the source speaker?
In the present study, the suprasegmental features of the source speaker are retained, while the transformed vocal tract parameters are used for synthesis.
Q8. What are the two problems to be addressed in the development of a voice transformation system?
They are (1) identification of speaker characteristics, or acquisition of speaker-dependent knowledge, in the analysis phase and (2) incorporation of the speaker-specific knowledge during synthesis in the transformation phase.