Text-to-speech synthesis with arbitrary speaker's voice from average voice

Proceedings Article (Open Access)

Text-to-speech synthesis with arbitrary speaker's voice from average voice

Vol. 1, pp. 345-348

TL;DR

It is demonstrated that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features, and that synthetic speech generated from models adapted using only four sentences is very close to that from speaker-dependent models trained on a large amount of speech data.
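Part of why a few sentences can be enough is that MLLR, the framework named in the abstract, ties one affine transform of the Gaussian means across many models, so only a small number of parameters must be estimated from the adaptation data. The sketch below is a minimal pure-Python illustration, assuming diagonal covariances, a single global transform, and state occupancies gamma[m][t] that have already been computed (e.g. by forward-backward). The function names are hypothetical and this is not the paper's implementation.

```python
"""Minimal sketch of MLLR mean adaptation with diagonal covariances.

Assumptions (not from the paper): one global transform W shared by all
Gaussians, precomputed occupancies gamma[m][t], plain Python lists.
"""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def estimate_mllr(means: Matrix, variances: Matrix, frames: Matrix, gamma: Matrix) -> Matrix:
    """Return W (D x (D+1)) such that the adapted mean is W @ [1, mu]."""
    dim = len(means[0])
    w = []
    for i in range(dim):  # diagonal covariances: each row of W is solved separately
        g = [[0.0] * (dim + 1) for _ in range(dim + 1)]
        k = [0.0] * (dim + 1)
        for m, mu in enumerate(means):
            xi = [1.0] + list(mu)  # extended mean vector [1, mu]
            for t, o in enumerate(frames):
                weight = gamma[m][t] / variances[m][i]
                for p in range(dim + 1):
                    k[p] += weight * o[i] * xi[p]
                    for q in range(dim + 1):
                        g[p][q] += weight * xi[p] * xi[q]
        w.append(solve(g, k))  # row i: w_i = G_i^{-1} k_i
    return w


def adapt_mean(w: Matrix, mu: Vector) -> Vector:
    """Apply the shared transform to any mean, seen or unseen in the data."""
    xi = [1.0] + list(mu)
    return [sum(row[p] * xi[p] for p in range(len(xi))) for row in w]


if __name__ == "__main__":
    # Toy 1-D example: two average-voice means, hard occupancies, target = 2*mu + 0.5
    w = estimate_mllr([[0.0], [1.0]], [[1.0], [1.0]], [[0.5], [2.5]], [[1.0, 0.0], [0.0, 1.0]])
    print(w)  # [[0.5, 2.0]]
    print(adapt_mean(w, [3.0]))  # [6.5], a mean with no adaptation data
```

In the toy example the mean 3.0 never appears in the adaptation data, yet the shared transform still moves it to 6.5. With very little data the matrix G can become singular; practical systems typically control this by sharing transforms through a regression class tree, which this sketch omits.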

Abstract:

This paper describes a technique for synthesizing speech with any desired voice. The technique is based on an HMM-based text-to-speech (TTS) system and the MLLR adaptation algorithm. To generate speech of an arbitrarily given target speaker, speaker-independent speech units, i.e., average voice models, are adapted to the target speaker using the MLLR framework. In addition to spectrum and pitch adaptation, we derive an algorithm for adapting state duration. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from models adapted using only four sentences is very close to that from speaker-dependent models trained on a large amount of speech data.
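This page does not reproduce the paper's duration-adaptation derivation. As a hypothetical simplification only, the sketch below adapts Gaussian state-duration means with one shared transform m_hat = alpha * m + beta, treating per-state durations observed in the adaptation data (for example from a forced alignment) as known. Under that assumption, the maximum-likelihood alpha and beta are the closed-form solution of a variance-weighted least-squares fit. State names, function names, and the numbers are illustrative.

```python
"""Minimal sketch: adapting Gaussian state-duration means with a shared
linear transform m_hat = alpha * m + beta.

Hypothetical simplification (not the paper's derivation): observed state
durations are treated as hard counts and duration variances are unchanged.
"""
from typing import Dict, List, Tuple


def estimate_duration_transform(
    means: Dict[str, float],
    variances: Dict[str, float],
    observed: Dict[str, List[int]],
) -> Tuple[float, float]:
    """Maximum-likelihood (alpha, beta) for Gaussian duration pdfs."""
    s_mm = s_m = s_1 = s_md = s_d = 0.0
    for state, durations in observed.items():
        m, inv_var = means[state], 1.0 / variances[state]
        for d in durations:  # each observation weighted by 1 / variance
            s_mm += inv_var * m * m
            s_m += inv_var * m
            s_1 += inv_var
            s_md += inv_var * m * d
            s_d += inv_var * d
    # Normal equations [[s_mm, s_m], [s_m, s_1]] [alpha, beta] = [s_md, s_d]
    det = s_mm * s_1 - s_m * s_m
    if abs(det) < 1e-12:
        raise ValueError("need observations from states with different means")
    alpha = (s_md * s_1 - s_m * s_d) / det
    beta = (s_mm * s_d - s_m * s_md) / det
    return alpha, beta


def adapt_durations(means: Dict[str, float], alpha: float, beta: float) -> Dict[str, float]:
    """Apply the shared transform to every state, including unobserved ones."""
    return {state: alpha * m + beta for state, m in means.items()}


if __name__ == "__main__":
    avg_means = {"s2": 3.0, "s3": 5.0, "s4": 4.0}  # toy average-voice means (frames)
    avg_vars = {"s2": 1.0, "s3": 1.0, "s4": 1.0}
    target = {"s2": [5], "s3": [9]}  # s4 is never observed
    alpha, beta = estimate_duration_transform(avg_means, avg_vars, target)
    print(alpha, beta)  # 2.0 -1.0
    print(adapt_durations(avg_means, alpha, beta))  # {'s2': 5.0, 's3': 9.0, 's4': 7.0}
```

As with the mean transform, tying two parameters across all states is what lets a short adaptation set change the durations of states it never observed.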

Citations

Patent

Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis

Andrew Aaron, +6 more

07 Apr 2005

Journal of the Acoustical Society of Ame...

TL;DR: A method, apparatus, and computer program product for generating an audible speech word that corresponds to text are presented. The method includes providing a text word and, in response to it, processing pre-recorded speech segments derived from a plurality of speakers to selectively concatenate speech segments based on at least one cost function.

Proceedings Article

Eigenvoices for HMM-based speech synthesis

Kengo Shichiri, +6 more

TL;DR: This paper proposes an eigenvoice technique for speech synthesis and applies it to an HMM-based speech synthesis system in which spectrum and F0 are modeled by HMMs and synthetic speech is generated from the HMMs themselves.

Journal Article

New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer

Javier Latorre, +2 more

01 Oct 2006

Speech Communication

TL;DR: The HMM-based polyglot synthesis method performs better than methods based on phone mapping for both adaptation and synthesis, and it can be used to create synthesizers for languages where no speech resources are available.

Proceedings Article

Modeling of various speaking styles and emotions for HMM-based speech synthesis.

Junichi Yamagishi, +3 more

TL;DR: This paper presents an approach to realizing various emotional expressions and speaking styles in synthetic speech using HMM-based speech synthesis, and shows two methods for modeling speaking styles and emotions.

Average-Voice-Based Speech Synthesis

Junichi Yamagishi

TL;DR: This thesis describes a novel speech synthesis framework that incorporates "speaker adaptive training" into the parameter estimation procedure of the average voice model to reduce the influence of speaker dependence, and proposes an HSMM-based model adaptation algorithm that simultaneously transforms both state output and state duration distributions.

References

Journal Article

Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models

C. J. Leggetter, +1 more

01 Apr 1995

Computer Speech & Language

TL;DR: An important feature of the method is that arbitrary adaptation data can be used (no special enrolment sentences are needed) and that adaptation performance improves as more data is used.

Journal Article

Continuous probabilistic transform for voice conversion

Yannis Stylianou, +2 more

01 Mar 1998

IEEE Transactions on Speech and Audio Pr...

TL;DR: A new methodology is designed for representing the relationship between two sets of spectral envelopes, and the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.

Proceedings Article

Tree-based state tying for high accuracy acoustic modelling

Steve Young, +2 more

TL;DR: This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.

Proceedings Article

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

Takayoshi Yoshimura, +4 more

TL;DR: An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.

Proceedings Article

Spectral voice conversion for text-to-speech synthesis

Alexander Kain, +1 more

TL;DR: A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented and is found to perform more reliably for small training sets than a previous approach.