Journal Article

Nonuniform speaker normalization using affine transformation

TLDR
A well-motivated nonuniform speaker normalization model that affinely relates the formant frequencies of speakers enunciating the same sound is proposed, and the corresponding universal-warping function required for normalization is shown to have the same parametric form as the mel scale formula.
Abstract
In this paper, a well-motivated nonuniform speaker normalization model that affinely relates the formant frequencies of speakers enunciating the same sound is proposed. Using the proposed affine model, the corresponding universal-warping function that is required for normalization is shown to have the same parametric form as the mel scale formula. The parameters of this universal-warping function are estimated from vowel formant data and are shown to be close to those of the commonly used formula for the mel scale. This shows an interesting connection between nonuniform speaker normalization and the psychoacoustics-based mel scale. In addition, the affine model fits the vowel formant data better than commonly used ad hoc normalization models. This work is motivated by a desire to improve the performance of speaker-independent speech recognition systems, where speaker normalization is conventionally done by assuming a linear-scaling relationship between the spectra of speakers. The proposed affine relation is extended to describe the relationship between the spectra of speakers enunciating the same sound. On a telephone-based connected digit recognition task, the proposed model provides improved recognition performance over the linear-scaling model.
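The link between an affine formant relation and a mel-form warping can be illustrated with a small sketch. It assumes one particular affine parametrization, (f' + c) = alpha * (f + c), which is not taken from the abstract, and it uses the commonly cited mel constants 2595 and 700 rather than the parameters estimated in the paper. The function names, the value of alpha, and the formant values are hypothetical and chosen only for illustration. Under these assumptions, a warping of the form k * log10(1 + f/c) turns the affine relation into a constant shift, whereas plain linear scaling does not.

```python
import math

def warp(f_hz, c=700.0, k=2595.0):
    """Mel-form warping k * log10(1 + f/c); defaults are the common mel constants."""
    return k * math.log10(1.0 + f_hz / c)

def affine_map(f_hz, alpha, c=700.0):
    """Assumed affine relation between speakers: (f' + c) = alpha * (f + c)."""
    return alpha * (f_hz + c) - c

def linear_map(f_hz, alpha):
    """Conventional linear-scaling relation: f' = alpha * f."""
    return alpha * f_hz

# Illustrative formant frequencies in Hz (not data from the paper).
formants_a = [730.0, 1090.0, 2440.0]
alpha = 1.15  # hypothetical speaker-dependent factor

# Shift in the warped domain for each formant under the affine relation.
affine_shifts = [warp(affine_map(f, alpha)) - warp(f) for f in formants_a]

# Shift in the warped domain for each formant under linear scaling.
linear_shifts = [warp(linear_map(f, alpha)) - warp(f) for f in formants_a]

# Under the assumed affine relation the shift equals k * log10(alpha) everywhere.
expected_shift = 2595.0 * math.log10(alpha)

if __name__ == "__main__":
    print("affine shifts:", [round(s, 3) for s in affine_shifts])
    print("linear shifts:", [round(s, 3) for s in linear_shifts])
    print("k*log10(alpha):", round(expected_shift, 3))
```

In this sketch the affine shifts are identical across formants, so the speaker difference reduces to a single translation in the warped domain, while the linear-scaling shifts vary with frequency when measured on the same warped axis.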


Citations
Journal Article

Studies on inter-speaker variability in speech and its application in automatic speech recognition

TL;DR: The universal-warping approach to speaker normalization is described which unifies many of the vowel normalization approaches and also shows the relation between speech production, perception and auditory processing.
Proceedings Article

Non-linear frequency warping for VTLN using subglottal resonances and the third formant frequency

TL;DR: This paper proposes a non-linear frequency warping scheme for VTLN based on mapping the subglottal resonances and the third formant frequency of a given utterance to those of a reference speaker.
Posted Content

A Bayesian Approach to Estimation of Speaker Normalization Parameters.

TL;DR: Experimental results on recognition of vowels and Hindi phrases from a medium vocabulary indicate that the Bayesian method improves the performance by a considerable margin.
Proceedings Article

A Bayesian approach to speaker normalization using vowel formant frequency

TL;DR: A novel Bayesian approach to estimate speaker normalization parameters is proposed and an affine model is used, which captures the variation in length of the vocal tract more effectively than the linear model used in literature.
Proceedings Article

Characterizing speaker variability using spectral envelopes of vowel sounds.

TL;DR: Using dynamic programming, this paper finds mapping relations between smoothed spectral envelopes of speakers enunciating the same sound and shows that these relations are not linear but have a consistent non-uniform behavior.