Journal Article

Vocal tract normalization equals linear transformation in cepstral space

Michael Pitz, +1 more
15 Aug 2005, Vol. 13, Iss. 5, pp. 930-944
TLDR
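In this paper, VTN is shown to be a linear transformation in the cepstral domain, which makes it possible to compute the Jacobian determinant of the transformation matrix. The matrix is calculated analytically for three typical warping functions, and the resulting matrices are shown to be diagonally dominant, so they can be approximated by quindiagonal matrices.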
Abstract
Vocal tract normalization (VTN) is a widely used speaker normalization technique that reduces the effect of differing human vocal tract lengths and improves the recognition accuracy of automatic speech recognition systems. We show that VTN amounts to a linear transformation in the cepstral domain; until now, VTN and linear cepstral transformations have been treated as independent approaches to speaker normalization. This result lets us compute the Jacobian determinant of the transformation matrix, which in turn allows the probability distributions used in speaker normalization for automatic speech recognition to be normalized. We show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). Consequently, we can explain previous experimental results in which the improvements obtained by VTN and subsequent MLLR were not additive in some cases. For three typical warping functions, the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and can thus be approximated by quindiagonal matrices.
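
The central claim of the abstract can be illustrated numerically. The sketch below is not the paper's analytic derivation. It assumes the log spectrum is written as a cosine series, S(w) = sum_k c_k cos(k w), and that warping replaces w by g_alpha(w). Under that assumption the warped cepstrum is A(alpha) c, with A[n, k] = (s_n / pi) * integral over [0, pi] of cos(n w) cos(k g_alpha(w)) dw, where s_0 = 1 and s_n = 2 otherwise. The function names, the piecewise-linear warp with break point 0.875*pi, the matrix size, and the midpoint-rule integration are all illustrative assumptions.

```python
import numpy as np


def piecewise_linear_warp(omega, alpha, omega0=0.875 * np.pi):
    """Assumed warping function: slope alpha up to omega0, then a straight
    line to (pi, pi) so that the band edge maps onto itself."""
    upper = alpha * omega0 + (np.pi - alpha * omega0) / (np.pi - omega0) * (omega - omega0)
    return np.where(omega <= omega0, alpha * omega, upper)


def vtn_matrix(alpha, n_ceps=16, n_grid=4096, warp=piecewise_linear_warp):
    """Approximate A[n, k] = (s_n / pi) * int_0^pi cos(n w) cos(k g(w)) dw
    with the midpoint rule on n_grid points (s_0 = 1, s_n = 2 for n >= 1)."""
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid   # midpoint grid on [0, pi]
    idx = np.arange(n_ceps)
    basis = np.cos(np.outer(idx, omega))                 # cos(n w)
    warped = np.cos(np.outer(idx, warp(omega, alpha)))   # cos(k g(w))
    scale = np.where(idx == 0, 1.0, 2.0) / n_grid        # s_n / pi times step pi / n_grid
    return scale[:, None] * (basis @ warped.T)


def band_approx(A, half_width=2):
    """Zero every entry further than half_width from the main diagonal
    (half_width=2 gives a quindiagonal, i.e. five-band, matrix)."""
    idx = np.arange(A.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= half_width
    return np.where(mask, A, 0.0)


if __name__ == "__main__":
    A = vtn_matrix(alpha=0.9)

    # Warping a cepstral vector is a single matrix-vector product.
    rng = np.random.default_rng(0)
    c = rng.normal(size=A.shape[0])
    c_warped = A @ c

    # Log-magnitude of the Jacobian determinant of the truncated matrix.
    sign, logabsdet = np.linalg.slogdet(A)

    # Relative Frobenius error of the quindiagonal approximation.
    rel_err = np.linalg.norm(A - band_approx(A)) / np.linalg.norm(A)
    print("log|det A| =", logabsdet, " band-approximation error =", rel_err)
```

With alpha = 1 the warp is the identity and the matrix reduces to the identity matrix, which is a quick sanity check. Because the matrix is truncated to a finite number of cepstral coefficients, the determinant and the band-approximation error printed here are approximations that depend on the assumed matrix size and warping function.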