Author

Luís Felipe Uebel

Bio: Luís Felipe Uebel is an academic researcher. The author has contributed to research on the topic of vocal tract. The author has an h-index of 1 and has co-authored 1 publication receiving 70 citations.
Topics: Vocal tract

Papers
Proceedings Article
16 Sep 1999
TL;DR: It was found that if multiple iterations of constrained MLLR are used, there is no additional advantage to also using VTLN, and that, as previously reported, the effects of VTLN and unconstrained MLLR are largely additive.
Abstract: This paper investigates several different methods for performing vocal tract length normalisation (VTLN) which are either completely linear or piece-wise linear. Furthermore, the combination of VTLN with either standard unconstrained maximum likelihood linear regression (MLLR) or constrained MLLR is considered. Results on the Switchboard corpus show that there is little difference in performance between the different forms of VTLN and that, as previously reported, the effects of VTLN and unconstrained MLLR are largely additive. However, it was found that if multiple iterations of constrained MLLR are used there is no additional advantage to also using VTLN.

78 citations
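
Illustration: the abstract above refers to piece-wise linear VTLN without giving a formula. The following is a minimal sketch of one common two-segment form, not necessarily the variant used in the paper. The function name, the `cutoff_ratio` default of 0.875, and the 4000 Hz Nyquist frequency in the usage lines are assumptions made for illustration only.

```python
def piecewise_linear_warp(f, alpha, f_nyquist, cutoff_ratio=0.875):
    """Warp frequency f (Hz) by factor alpha using two linear segments.

    Hypothetical helper: below the breakpoint the frequency is scaled by
    alpha; above it, a second segment is chosen so that the Nyquist
    frequency maps onto itself and the warped axis keeps its bandwidth.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= f <= f_nyquist:
        raise ValueError("f must lie in [0, f_nyquist]")

    # Assumed breakpoint rule: shrink the breakpoint for alpha > 1 so that
    # alpha * f0 never exceeds the Nyquist frequency.
    f0 = cutoff_ratio * f_nyquist * min(1.0, 1.0 / alpha)

    if f <= f0:
        return alpha * f

    # Second segment joins (f0, alpha * f0) to (f_nyquist, f_nyquist).
    slope = (f_nyquist - alpha * f0) / (f_nyquist - f0)
    return alpha * f0 + slope * (f - f0)


# Usage: warp a few frequencies for an assumed 8 kHz sampling rate.
for freq in (0.0, 1000.0, 3800.0, 4000.0):
    print(freq, piecewise_linear_warp(freq, alpha=1.1, f_nyquist=4000.0))
```

A purely linear warp corresponds to using only the first segment, in which case frequencies near the band edge are either pushed past the Nyquist frequency or leave part of the band unused; the second segment is one way to avoid that.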


Cited by
Journal ArticleDOI
TL;DR: In this paper, the Jacobian determinant of the transformation matrix is computed analytically for three typical warping functions, and it is shown that the matrices are diagonally dominant and thus can be approximated by quindiagonal matrices.
Abstract: Vocal tract normalization (VTN) is a widely used speaker normalization technique which reduces the effect of different lengths of the human vocal tract and results in an improved recognition accuracy of automatic speech recognition systems. We show that VTN results in a linear transformation in the cepstral domain; the two have so far been considered independent approaches to speaker normalization. We are now able to compute the Jacobian determinant of the transformation matrix, which allows the normalization of the probability distributions used in speaker normalization for automatic speech recognition. We show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). Consequently, we can explain previous experimental results showing that improvements obtained by VTN and subsequent MLLR are not additive in some cases. For three typical warping functions the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and thus can be approximated by quindiagonal matrices.

217 citations

30 Aug 2001
TL;DR: This paper reviews some popular speaker adaptation schemes that can be applied to continuous density hidden Markov models. These fall into three families: MAP adaptation; linear transforms of model parameters such as maximum likelihood linear regression; and speaker clustering/speaker space methods such as eigenvoices.
Abstract: This paper reviews some popular speaker adaptation schemes that can be applied to continuous density hidden Markov models. These fall into three families based on MAP adaptation; linear transforms of model parameters such as maximum likelihood linear regression; and speaker clustering/speaker space methods such as eigenvoices. The strengths and weaknesses of each adaptation family are discussed along with extensions that have been proposed to improve the basic schemes, which result in a number of hybrid approaches. A number of general extensions are discussed which include methods for improved unsupervised adaptation and discriminative adaptation. There is also a brief discussion of speaker normalisation and the relationship to model-based adaptation. The paper also includes a brief discussion of other factors that directly interact with speaker adaptation of HMMs, such as adaptation to the acoustic environment and speaker-specific pronunciation dictionaries.

165 citations

Proceedings ArticleDOI
30 Nov 2003
TL;DR: After applying several conventional VTLN warping functions, the conventional piece-wise linear function is extended to several segments, allowing a more detailed warping of the source spectrum.
Abstract: In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As cross-language voice conversion aims at the transformation of a source speaker's voice into that of a target speaker using a different language, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the conventional piece-wise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on cross-language voice conversion are performed on three corpora of two languages and both speaker genders.

93 citations

Proceedings ArticleDOI
14 Dec 2003
TL;DR: After applying several conventional VTLN warping functions, the piecewise linear function is extended to several segments, allowing a more detailed warping of the source spectrum.
Abstract: In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As voice conversion aims at the transformation of a source speaker's voice into that of a target speaker, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the piecewise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on voice conversion are performed on three corpora of two languages and both speaker genders.

71 citations

Journal ArticleDOI
01 Aug 2000
TL;DR: The authors review the state of the art in core technology, large vocabulary continuous speech recognition, and highlight issues in moving toward applications, discussing system efficiency, portability across languages and tasks, and enhancing the system output by adding tags and nonlinguistic information.
Abstract: The past decade (1990-2000) has witnessed substantial advances in speech recognition technology, which when combined with the increase in computational power and storage capacity has resulted in a variety of commercial products already or soon to be on the market. The authors review the state of the art in core technology, large vocabulary continuous speech recognition, with a view toward highlighting recent advances. We then highlight issues in moving toward applications, discussing system efficiency, portability across languages and tasks, and enhancing the system output by adding tags and nonlinguistic information. Current performance in speech recognition and outstanding challenges for three classes of applications (dictation, audio indexation, and spoken language dialogue systems) are discussed.

64 citations