Proceedings Article (Open Access)

Text-to-speech synthesis with arbitrary speaker's voice from average voice

TLDR
It is demonstrated that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features, and that synthetic speech generated from models adapted with only four sentences is very close to that from speaker-dependent models trained on a large amount of speech data.
Abstract
This paper describes a technique for synthesizing speech with any desired voice. The technique is based on an HMM-based text-to-speech (TTS) system and the MLLR (maximum likelihood linear regression) adaptation algorithm. To generate the speech of an arbitrarily given target speaker, speaker-independent speech units, i.e., average voice models, are adapted to the target speaker within the MLLR framework. In addition to spectrum and pitch adaptation, we derive an algorithm for adapting state duration. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from models adapted with only four sentences is very close to that from speaker-dependent models trained on a large amount of speech data.
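To make the MLLR step concrete, here is a minimal sketch of standard MLLR mean adaptation, not the authors' implementation. It assumes a single global regression class, diagonal covariances, and frame occupancies already computed (e.g. by forward-backward alignment against the average voice model). Each adapted mean is \(\hat{\mu}_r = W\xi_r\) with \(\xi_r = [1, \mu_r^\top]^\top\), and each row of \(W\) has a closed-form solution. The function names `estimate_mllr_mean_transform` and `adapt_means` are hypothetical, NumPy is an assumed dependency, and the paper's regression-class trees, pitch modeling, and state-duration adaptation are not shown.

```python
import numpy as np


def estimate_mllr_mean_transform(means, variances, gamma, obs):
    """Estimate one global MLLR mean transform W of shape (D, D+1).

    Hypothetical helper (a sketch, not the paper's code).
    means:     (R, D) Gaussian means of the average voice model
    variances: (R, D) diagonal covariances of those Gaussians
    gamma:     (T, R) occupancy of Gaussian r at frame t (assumed given)
    obs:       (T, D) adaptation frames from the target speaker
    """
    R, D = means.shape
    # Extended mean vectors xi_r = [1, mu_r]; the leading 1 carries the bias term.
    xi = np.hstack([np.ones((R, 1)), means])      # (R, D+1)
    occ = gamma.sum(axis=0)                        # (R,) total occupancy per Gaussian
    first_order = gamma.T @ obs                    # (R, D) sum_t gamma_r(t) * o_t
    W = np.zeros((D, D + 1))
    for i in range(D):
        inv_var = 1.0 / variances[:, i]            # (R,) 1 / sigma_{r,i}^2
        # G_i = sum_r occ_r / sigma_{r,i}^2 * xi_r xi_r^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # k_i = sum_r (sum_t gamma_r(t) o_t(i)) / sigma_{r,i}^2 * xi_r
        k = (first_order[:, i] * inv_var) @ xi
        # Row i of W maximizes the auxiliary likelihood: G_i w_i = k_i
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(means, W):
    """Apply mu_hat_r = W xi_r to every Gaussian mean (hypothetical helper)."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T


# Synthetic check: the "target speaker" is an exact affine map of the average voice.
rng = np.random.default_rng(0)
R, D, T = 6, 2, 60
means = rng.normal(size=(R, D))
variances = rng.uniform(0.5, 2.0, size=(R, D))
A = np.array([[1.2, 0.1], [0.0, 0.8]])
b = np.array([0.5, -0.3])
labels = np.arange(T) % R            # hard frame-to-Gaussian alignment
gamma = np.eye(R)[labels]            # (T, R) one-hot occupancies
obs = means[labels] @ A.T + b        # noiseless adaptation frames
W = estimate_mllr_mean_transform(means, variances, gamma, obs)
adapted = adapt_means(means, W)
```

Because the synthetic frames are noiseless and there are more Gaussians than unknowns per row, the estimated `W` recovers the bias `b` and matrix `A` exactly. Sharing one transform across many Gaussians is what lets a few adaptation sentences move even Gaussians that never appear in the adaptation data.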


Citations
Patent

Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis

TL;DR: A method, apparatus and computer program product are presented for generating an audible speech word that corresponds to text. Given a text word, the method processes pre-recorded speech segments derived from a plurality of speakers and selectively concatenates them based on at least one cost function.
Proceedings Article

Eigenvoices for HMM-based speech synthesis

TL;DR: This paper proposes an eigenvoice technique for speech synthesis and applies it to an HMM-based speech synthesis system in which spectrum and F0 are modeled by HMMs and synthetic speech is generated from the HMMs themselves.
Journal Article

New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer

TL;DR: The performance obtained with the HMM-based polyglot synthesis method is better than that of methods based on phone mapping for both adaptation and synthesis, and can be used to create synthesizers for languages where no speech resources are available.
Proceedings Article

Modeling of various speaking styles and emotions for HMM-based speech synthesis

TL;DR: This paper presents an approach to realizing various emotional expressions and speaking styles in synthetic speech using HMM-based speech synthesis, and shows two methods for modeling speaking styles and emotions.

Average-Voice-Based Speech Synthesis

TL;DR: This thesis describes a novel speech synthesis framework that incorporates "speaker adaptive training" into the parameter estimation procedure of the average voice model to reduce the influence of speaker dependence, and proposes an HSMM-based model adaptation algorithm that simultaneously transforms both the state output and state duration distributions.
References
Journal Article

Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models

TL;DR: An important feature of the method is that arbitrary adaptation data can be used, so no special enrolment sentences are needed, and adaptation performance improves as more data is used.
Journal Article

Continuous probabilistic transform for voice conversion

TL;DR: A new methodology is designed for representing the relationship between two sets of spectral envelopes, and the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.
Proceedings Article

Tree-based state tying for high accuracy acoustic modelling

TL;DR: This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Proceedings Article

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

TL;DR: An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.
Proceedings Article

Spectral voice conversion for text-to-speech synthesis

TL;DR: A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented and is found to perform more reliably for small training sets than a previous approach.