Improved Average-Voice-based Speech Synthesis Using Gender-Mixed Modeling and a Parameter Generation Algorithm Considering GV

Open Access

Junichi Yamagishi, +6 more

- pp 125-130

Abstract:

To build a speech synthesis system that can produce diverse voices, we have been developing a speaker-independent approach to HMM-based speech synthesis in which statistical average voice models are adapted to a target speaker using a small amount of speech data. In this paper, we incorporate STRAIGHT, a high-quality speech vocoding method, and a parameter generation algorithm that considers global variance (GV) into the system to improve the quality of synthetic speech. Furthermore, we introduce a feature-space speaker adaptive training algorithm and a gender-mixed modeling technique to further normalize the average voice model. We build an English text-to-speech system using these techniques and report its performance.
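As a rough illustration of the global variance idea mentioned in the abstract, the sketch below computes the per-dimension variance of a parameter trajectory over one utterance and shows that smoothing the trajectory lowers it. This is not the authors' algorithm: the moving average is only a stand-in for the over-smoothing seen in statistically generated trajectories, and the function names and toy data are assumptions made for illustration.

```python
from statistics import pvariance


def global_variance(trajectory):
    """Return the per-dimension variance over all frames of one utterance."""
    dims = len(trajectory[0])
    return [pvariance([frame[d] for frame in trajectory]) for d in range(dims)]


def moving_average(trajectory, width=3):
    """Smooth each dimension with a centered moving average.

    Hypothetical stand-in for over-smoothing; windows are truncated at the edges.
    """
    half = width // 2
    dims = len(trajectory[0])
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        smoothed.append([sum(f[d] for f in window) / len(window) for d in range(dims)])
    return smoothed


# Toy two-dimensional trajectory (made-up values, six frames).
natural = [[0.0, 1.0], [2.0, -1.0], [0.0, 1.0], [2.0, -1.0], [0.0, 1.0], [2.0, -1.0]]
smoothed = moving_average(natural)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
print("GV of original trajectory:", gv_natural)
print("GV of smoothed trajectory:", gv_smoothed)
```

In this toy case the smoothed trajectory has a smaller variance in every dimension. A generation algorithm that considers GV is meant to counteract this kind of variance reduction.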

Citations

Journal Article

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Junichi Yamagishi, +4 more

- 01 Jan 2009 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A new adaptation algorithm is proposed called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms.

Speaker-Independent HMM-based Speech Synthesis System: HTS-2007 System for the Blizzard Challenge 2007

Junichi Yamagishi, +3 more

TL;DR: This paper describes an HMM-based speech synthesis system developed by the HTS working group for the Blizzard Challenge 2007, and incorporates new features in the conventional system which underpin a speaker-independent approach: speaker adaptation techniques; adaptive training for HSMMs; and full covariance modeling using the CSMAPLR transforms.

Journal Article

Personalized Spectral and Prosody Conversion Using Frame-Based Codeword Distribution and Adaptive CRF

Yi-Chin Huang, +2 more

- 01 Jan 2013 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: The experimental results showed that the proposed voice conversion method, based on distribution-based alignment and prosodic word boundary detection, can improve the speech quality and speaker similarity of the converted speech.

Proceedings Article

Performance evaluation of the speaker-independent HMM-based speech synthesis system “HTS 2007” for the Blizzard Challenge 2007

Junichi Yamagishi, +4 more

TL;DR: This paper describes a speaker-independent/adaptive HMM-based speech synthesis system developed for the Blizzard Challenge 2007, which employs speaker adaptation, feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in previous systems.

Speaker-Independent HMM-based Speech Synthesis System

Junichi Yamagishi, +3 more

References

Journal Article

Maximum likelihood linear transformations for HMM-based speech recognition

Mark J. F. Gales

- 01 Apr 1998 -

Computer Speech & Language

TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.

Journal Article

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

Hideki Kawahara, +2 more

- 01 Apr 1999 -

Speech Communication

TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

Proceedings Article

A compact model for speaker-adaptive training

Tasos Anastasakos, +3 more

TL;DR: A novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition that jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models.

Journal Article

A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

Tomoki Toda, +1 more

- 01 May 2007 -

The IEICE transactions on information an...

TL;DR: This article proposes a parameter generation algorithm for HMM-based speech synthesis. Trajectories generated by statistical processing are often excessively smoothed, and this over-smoothing usually causes muffled sounds.

An HMM-based speech synthesis system applied to English

Keiichi Tokuda, +2 more

TL;DR: This paper describes an HMM-based speech synthesis system (HTS), in which the speech waveform is generated from the HMMs themselves, and applies it to English speech synthesis using the general speech synthesis architecture of Festival.

Related Papers (5)

Nonlinear Speech Modeling and Applications: advanced Lectures and Revised Selected Papers

Improving voice quality of HMM-based speech synthesis using voice conversion method

A hybrid text-to-speech based on sub-band approach

Towards Directly Modeling Raw Speech Signal for Speaker Verification Using CNNS

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech