Nonuniform speaker normalization using affine transformation

doi:10.1121/1.2951597

Journal Article

S. V. Bharath Kumar, +1 more

18 Sep 2008

Journal of the Acoustical Society of Ame...

Vol. 124, Iss. 3, pp. 1727-1738

TLDR

A well-motivated nonuniform speaker normalization model that affinely relates the formant frequencies of speakers enunciating the same sound is proposed and the corresponding universal-warping function that is required for normalization is shown to have the same parametric form as the mel scale formula.

Abstract:

In this paper, a well-motivated nonuniform speaker normalization model that affinely relates the formant frequencies of speakers enunciating the same sound is proposed. Using the proposed affine model, the corresponding universal-warping function that is required for normalization is shown to have the same parametric form as the mel scale formula. The parameters of this universal-warping function are estimated from the vowel formant data and are shown to be close to those of the commonly used formula for the mel scale. This shows an interesting connection between nonuniform speaker normalization and the psychoacoustics-based mel scale. In addition, the affine model fits the vowel formant data better than commonly used ad hoc normalization models. This work is motivated by a desire to improve the performance of speaker-independent speech recognition systems, where speaker normalization is conventionally done by assuming a linear-scaling relationship between spectra of speakers. The proposed affine relation is extended to describe the relationship between spectra of speakers enunciating the same sound. On a telephone-based connected digit recognition task, the proposed model provides improved recognition performance over the linear-scaling model.
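The link between an affine formant relation and a mel-like warping can be illustrated with a short numerical sketch. It is not taken from the paper: it assumes the affine relation can be written as f_A + c = alpha * (f_B + c), with one constant c shared by all speakers, so that the warping log(1 + f/c) turns the speaker difference into a constant shift of log(alpha). The names `warp` and `affine_map`, the formant values, and alpha = 1.15 are all hypothetical, and c is set to the 700 Hz of the commonly used mel formula purely for illustration.

```python
import math

MEL_C = 700.0  # Hz; constant in the commonly used mel formula

def mel(f_hz):
    """Commonly used mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / MEL_C)

def warp(f_hz, c=MEL_C):
    """Warping with the mel parametric form, log(1 + f / c)."""
    return math.log(1.0 + f_hz / c)

def affine_map(f_hz, alpha, c=MEL_C):
    """Assumed affine speaker relation: f_a + c = alpha * (f_b + c)."""
    return alpha * f_hz + (alpha - 1.0) * c

# Illustrative formant values in Hz (not data from the paper).
formants_b = [500.0, 1500.0, 2500.0]
alpha = 1.15  # hypothetical speaker-dependent factor

formants_a = [affine_map(f, alpha) for f in formants_b]

# In the warped domain, every formant pair differs by the same shift.
shifts = [warp(fa) - warp(fb) for fa, fb in zip(formants_a, formants_b)]
```

Under these assumptions every entry of `shifts` equals log(alpha), regardless of frequency. Setting c to a very large value makes the affine map approach pure scaling, which is the linear-scaling relationship the abstract describes as the conventional assumption.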
