What contributions have the authors mentioned in the paper "Transformation of formants for voice conversion using artificial neural networks"?

In this paper the authors propose a scheme for developing a voice conversion system that converts the speech signal uttered by a source speaker to a speech signal having the voice characteristics of the target speaker. The scheme consists of a formant analysis phase, followed by a learning phase in which the implicit formant transformation is captured by a neural network.

What is the purpose of this paper?

In this paper the authors train a neural network to learn a transformation function that maps the speaker-dependent parameters extracted from the source speaker's speech onto those of the target speaker.

How many transitions do formants have in continuous speech?

The paper gives no count. It notes only that in continuous speech the vocal tract changes its shape continuously, so the extracted formants have many transitions.

What was used to excite the formant synthesizer for voiced frames?

Fant’s model (Fant, 1986) was used to excite the formant synthesizer for voiced frames, and random noise was used for unvoiced frames.
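
The voiced/unvoiced excitation scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's synthesizer: a plain impulse train stands in for Fant's glottal model, and the sampling rate, pitch, formant bandwidths, frame length, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def excitation(n, voiced, fs=8000, f0=120.0, seed=0):
    """Return n samples of excitation for one frame.

    Voiced: an impulse train at roughly f0 Hz (a simple stand-in,
    NOT Fant's glottal model). Unvoiced: white Gaussian noise.
    """
    if voiced:
        e = np.zeros(n)
        period = int(round(fs / f0))  # pitch period in samples
        e[::period] = 1.0
        return e
    return np.random.default_rng(seed).normal(0.0, 1.0, n)

def resonator(x, freq, bw, fs=8000):
    """Second-order digital resonator centred at freq with bandwidth bw (Hz)."""
    r = np.exp(-np.pi * bw / fs)
    a1 = 2.0 * r * np.cos(2.0 * np.pi * freq / fs)
    a2 = -r * r
    gain = 1.0 - a1 - a2  # normalises the gain to 1 at 0 Hz
    y = np.zeros(len(x))
    for i in range(len(x)):
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y[i] = gain * x[i] + a1 * y1 + a2 * y2
    return y

def synthesize_frame(formants, voiced, n=400, fs=8000,
                     bandwidths=(60.0, 90.0, 120.0)):
    """Pass the excitation through a cascade of formant resonators."""
    s = excitation(n, voiced, fs)
    for f, bw in zip(formants, bandwidths):
        s = resonator(s, f, bw, fs)
    return s

# Example: one voiced and one unvoiced frame with the same (assumed) formants.
voiced_frame = synthesize_frame((500.0, 1500.0, 2500.0), voiced=True)
unvoiced_frame = synthesize_frame((500.0, 1500.0, 2500.0), voiced=False)
```

A cascade of resonators is one common way to build a formant synthesizer; the summary does not say which structure the authors used.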

How is the neural network trained?

The first three formants from two corresponding steady voiced regions, one from the source speaker and one from the target speaker, are used as a pair of input and output formant vectors for training the neural network.
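
As a rough illustration of training on such formant pairs, the sketch below fits a small one-hidden-layer network to map source (F1, F2, F3) vectors to target vectors. Everything specific here is an assumption for the example: the data are synthetic (the target is a fixed scaling of the source), and the hidden-layer size, learning rate, epoch count, normalisation step, and the name `train_formant_mapper` are hypothetical rather than taken from the paper.

```python
import numpy as np

def train_formant_mapper(src, tgt, hidden=8, epochs=5000, lr=0.1, seed=0):
    """Fit a one-hidden-layer tanh network from source to target formants.

    src, tgt: arrays of shape (n_pairs, 3) holding F1, F2, F3 in Hz.
    Returns a function mapping source formants (Hz) to predicted target formants (Hz).
    """
    rng = np.random.default_rng(seed)
    # Normalise inputs and outputs (an assumption of this sketch).
    mu_x, sd_x = src.mean(0), src.std(0)
    mu_y, sd_y = tgt.mean(0), tgt.std(0)
    x = (src - mu_x) / sd_x
    y = (tgt - mu_y) / sd_y

    w1 = rng.normal(0.0, 0.5, (3, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 3))
    b2 = np.zeros(3)

    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(x @ w1 + b1)
        out = h @ w2 + b2
        # Backward pass for the mean squared error.
        err = out - y
        g_w2 = h.T @ err / len(x)
        g_b2 = err.mean(0)
        g_h = (err @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_h / len(x)
        g_b1 = g_h.mean(0)
        # Plain full-batch gradient descent step.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def convert(f):
        h = np.tanh(((f - mu_x) / sd_x) @ w1 + b1)
        return (h @ w2 + b2) * sd_y + mu_y

    return convert

# Synthetic formant pairs: the "target speaker" is a fixed scaling of the source.
rng = np.random.default_rng(1)
low = np.array([300.0, 900.0, 2300.0])
high = np.array([800.0, 2300.0, 3200.0])
src = rng.uniform(low, high, size=(200, 3))
tgt = src * np.array([1.15, 1.10, 1.05])

convert = train_formant_mapper(src, tgt)
rmse_before = np.sqrt(np.mean((src - tgt) ** 2))
rmse_after = np.sqrt(np.mean((convert(src) - tgt) ** 2))
print(f"RMSE without conversion: {rmse_before:.1f} Hz, with conversion: {rmse_after:.1f} Hz")
```

In practice the pairs would come from formant analysis of matched steady voiced regions rather than from a synthetic scaling.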

What is the method for transforming the vocal tract parameters?

Prosodic modifications were incorporated in the excitation signal using the PSOLA (Pitch Synchronous Overlap Add) technique, and speech was synthesized using the transformed spectral parameters.

What are the characteristics of the source speaker?

In the present study, the suprasegmental features of the source speaker are retained, while the transformed vocal tract parameters are used for synthesis.

What are the two problems to be addressed in the development of a voice conversion system?

They are (1) identification of speaker characteristics, or acquisition of speaker-dependent knowledge, in the analysis phase and (2) incorporation of the speaker-specific knowledge during synthesis in the transformation phase.

(Open Access) Transformation of formants for voice conversion using artificial neural networks (1995) | M. Narendranath

Journal Article

Analysis and Synthesis of Formant Spaces of British, Australian, and American Accents

Qin Yan, +3 more

- 01 Feb 2007 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A comparative analysis of the formant spaces of three major accents of English (British Received Pronunciation, General American, and Broad Australian) indicates that these accents are partly conveyed by differences in the formants of vowels.

Posted Content

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Berrak Sisman, +3 more

- 09 Aug 2020 -

arXiv: Audio and Speech Processing

TL;DR: This paper gives a comprehensive overview of state-of-the-art voice conversion techniques and their performance evaluation methods, from statistical approaches to deep learning, and discusses their promise and limitations.

Proceedings Article

Auditory-Based Wavelet Packet Filterbank for Speech Recognition Using Neural Network

R Gandhiraj, +1 more

TL;DR: Two quantitative models of signal processing in the auditory system, (i) the Gamma Tone Filter Bank (GTFB) and (ii) the Wavelet Packet (WP), are described as front ends for robust speech recognition.

Proceedings Article

Voice Conversion by Prosody and Vocal Tract Modification

K. Sreenivasa Rao, +1 more

TL;DR: The proposed methods modify the shape of the vocal tract system and the characteristics of the prosody according to the desired requirement by manipulating instants of significant excitation from the linear prediction residual of the speech signals.

Posted Content

Introduction to Voice Presentation Attack Detection and Recent Advances

Sahidullah, +6 more

- 04 Jan 2019 -

arXiv: Sound

TL;DR: In the last few years significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker recognition (ASV), including the development of new speech corpora, standard evaluation protocols, and advancements in front-end feature extraction and back-end classifiers.

Related Papers (5)

Continuous probabilistic transform for voice conversion

Voice conversion through vector quantization

Voice transformation using PSOLA technique

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Spectral voice conversion for text-to-speech synthesis
