Journal Article

Rapid speaker adaptation in eigenvoice space

TL;DR
A new model-based speaker adaptation algorithm, the eigenvoice approach, is presented; it constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data.
Abstract
This paper describes a new model-based speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These "eigenvoice" basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26% relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work.
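To make the constraint concrete, here is a minimal NumPy sketch of the two stages described above: an offline PCA over supervectors built from reference-speaker models, and an online step that expresses the adapted model as the mean voice plus a weighted combination of a few eigenvoices. All names, shapes, and data are illustrative, and the weights are obtained here by simple least-squares projection rather than by the maximum-likelihood estimation used in the paper.

```python
import numpy as np

# Illustrative sizes: R reference speakers, each summarized by a "supervector"
# of concatenated HMM Gaussian means of dimension D; K eigenvoices are kept.
rng = np.random.default_rng(0)
R, D, K = 20, 300, 5

# Stand-in for trained speaker-dependent models.
ref_supervectors = rng.normal(size=(R, D))

# Offline step: PCA over the reference supervectors.
mean_voice = ref_supervectors.mean(axis=0)
centered = ref_supervectors - mean_voice
# SVD of the centered data yields orthonormal directions of inter-speaker variation.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
eigenvoices = vt[:K]                                         # (K, D) orthonormal basis

# Online step: the adapted model is constrained to the eigenvoice space.
# Stand-in for a crude supervector estimate from a few letters of adaptation data;
# the paper instead estimates the weights by maximum likelihood.
adaptation_estimate = rng.normal(size=D)
weights = eigenvoices @ (adaptation_estimate - mean_voice)   # (K,) free parameters
adapted_supervector = mean_voice + weights @ eigenvoices     # full adapted model
print(weights)
```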


Citations
Journal Article

Statistical Parametric Speech Synthesis

TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.
Journal Article

Speaker Recognition by Machines and Humans: A tutorial review

TL;DR: This tutorial review concludes with a comparative study of human versus machine speaker recognition, emphasizing prominent speaker-modeling techniques that have emerged in the last decade for automatic systems.
Journal Article

An overview of noise-robust automatic speech recognition

TL;DR: A thorough overview of modern noise-robust techniques for ASR developed over the past 30 years is provided, emphasizing methods that have proven successful and that are likely to sustain or expand their applicability in the future.
Journal Article

Eigenvoice modeling with sparse training data

TL;DR: This work derives an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and shows how it can be regarded as a new method of eigenvoice estimation.
Journal Article

Speech Synthesis Based on Hidden Markov Models

TL;DR: This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech.
References
Book

Principal Component Analysis

TL;DR: This book covers graphical representation of data using Principal Component Analysis (PCA), PCA for time series and other non-independent data, and generalizations and adaptations of principal component analysis.
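Since the eigenvoice basis above is built with exactly this technique, a small self-contained sketch of the basic PCA computation may help: center the data, eigendecompose the sample covariance, and project onto the leading components. The data and dimensions below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))                 # 100 observations, 6 variables

# Center, then eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # returned in ascending order

# Keep the directions with the largest variance and project the data onto them.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]            # top-2 principal directions
scores = Xc @ components                      # (100, 2) low-dimensional representation
print(scores.shape)
```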
Journal Article

Eigenfaces for recognition

TL;DR: A near-real-time computer system is described that can locate and track a subject's head and then recognize the person by comparing characteristics of the face to those of known individuals; the system is easy to implement using a neural network architecture.
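The comparison to known individuals is usually done by projecting images into a low-dimensional "face space" and taking the nearest stored signature. The following is a sketch of that general eigenface recipe on synthetic data, not the authors' exact system; the gallery, probe, and dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, dim, k = 10, 64 * 64, 8
gallery = rng.normal(size=(n_people, dim))    # one flattened image per known person

# Build the face space by PCA over the gallery.
mean_face = gallery.mean(axis=0)
_, _, vt = np.linalg.svd(gallery - mean_face, full_matrices=False)
eigenfaces = vt[:k]                                     # (k, dim)
gallery_weights = (gallery - mean_face) @ eigenfaces.T  # each person's signature

# Recognize a probe image by nearest neighbour in eigenface space.
probe = gallery[3] + 0.1 * rng.normal(size=dim)         # a noisy view of person 3
probe_weights = (probe - mean_face) @ eigenfaces.T
identity = int(np.argmin(np.linalg.norm(gallery_weights - probe_weights, axis=1)))
print(identity)                                         # expected: 3
```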
Journal Article

Application of the Karhunen-Loeve procedure for the characterization of human faces

TL;DR: The use of natural symmetries (mirror images) in a well-defined family of patterns (human faces) is discussed within the framework of the Karhunen-Loeve expansion, which results in an extension of the data and imposes even and odd symmetry on the eigenfunctions of the covariance matrix.
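The symmetrization step is easy to reproduce: reflect each image left-to-right and run the Karhunen-Loeve/PCA computation on the doubled ensemble, which yields eigenpictures that are even or odd under reflection. The "images" below are random stand-ins used only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)
N, H, W = 50, 8, 8
faces = rng.normal(size=(N, H, W))            # stand-in face images

# Extend the data with mirror images before the Karhunen-Loeve / PCA step.
mirrored = faces[:, :, ::-1]
ensemble = np.concatenate([faces, mirrored]).reshape(2 * N, H * W)

mean_face = ensemble.mean(axis=0)
_, _, vt = np.linalg.svd(ensemble - mean_face, full_matrices=False)
eigenpictures = vt[:10]                       # leading eigenpictures of the symmetrized set
print(eigenpictures.shape)
```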
Journal Article

Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models

TL;DR: An important feature of the method is that arbitrary adaptation data can be used: no special enrolment sentences are needed, and adaptation performance improves as more data is used.
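As a contrast with the eigenvoice constraint, MLLR adapts the speaker-independent Gaussian means with a shared affine transform, mu_adapted = A mu + b, whose parameters are estimated from whatever adaptation data is available. The sketch below fits such a shared transform by ordinary least squares on synthetic mean/target pairs; the actual method estimates it by maximum likelihood from occupancy-weighted statistics, and may use several transforms tied to regression classes.

```python
import numpy as np

rng = np.random.default_rng(4)
G, d = 40, 13                                          # Gaussians, feature dimension
si_means = rng.normal(size=(G, d))                     # speaker-independent means

# Synthetic "targets": the speaker-dependent means we would like to reach,
# generated here from a hidden affine transform plus noise.
A_true = np.eye(d) + 0.1 * rng.normal(size=(d, d))
target_means = si_means @ A_true.T + 0.3 + 0.02 * rng.normal(size=(G, d))

# One shared transform for all Gaussians: mu_adapted = W @ [1, mu].
extended = np.hstack([np.ones((G, 1)), si_means])      # (G, d+1)
W, *_ = np.linalg.lstsq(extended, target_means, rcond=None)   # (d+1, d)

adapted_means = extended @ W
print(np.abs(adapted_means - target_means).mean())     # small residual
```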
Journal Article

Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains

TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
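For comparison with the approaches above, MAP adaptation interpolates each speaker-independent mean with the statistics gathered from the adaptation data for that Gaussian, with a prior weight tau controlling how quickly the estimate moves away from the prior. Below is a minimal sketch of the mean update, the form most often quoted for Gaussian means; the data and the tau value are illustrative.

```python
import numpy as np

def map_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP update of one Gaussian mean.

    occupancies are the posterior (state/mixture occupation) probabilities
    of each frame for this Gaussian; tau weights the prior mean.
    """
    gamma = occupancies.sum()
    weighted_sum = (occupancies[:, None] * frames).sum(axis=0)
    return (tau * prior_mean + weighted_sum) / (tau + gamma)

rng = np.random.default_rng(5)
prior = np.zeros(13)                                   # speaker-independent mean
frames = rng.normal(loc=0.5, size=(30, 13))            # a little adaptation data
occ = np.ones(30)                                      # assume hard alignment
print(map_mean(prior, frames, occ))                    # pulled partway toward the data
```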