A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

doi:10.1093/IETISY/E90-D.5.816

Journal Article

Tomoki Toda, +1 more

- 01 May 2007 -

The IEICE transactions on information an...

- Vol. 90, Iss: 5, pp 816-824

TLDR

The authors propose a parameter generation algorithm for HMM-based speech synthesis that reduces the over-smoothing of conventionally generated parameter trajectories, a side effect of the statistical processing that usually causes muffled sounds, by additionally maximizing a likelihood for the global variance (GV) of the generated trajectory.

Abstract:

This paper describes a novel parameter generation algorithm for HMM-based speech synthesis. The conventional algorithm generates a trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of static and dynamic features, under an explicit constraint between those two features. The generated trajectory is often excessively smoothed as a result of the statistical processing, and using over-smoothed speech parameters usually causes muffled sounds. To alleviate this over-smoothing effect, we propose a generation algorithm that considers not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for the global variance (GV) of the generated trajectory. The latter likelihood acts as a penalty on over-smoothing, i.e., on a reduction of the GV of the generated trajectory. A perceptual evaluation demonstrates that the proposed algorithm considerably improves the naturalness of synthetic speech.
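The abstract states the objective only in words. The following is a minimal sketch for a single feature dimension, not the authors' implementation. It assumes a fixed state sequence (so per-frame means and variances are given), diagonal covariances, an interleaved [static, delta] observation with delta window 0.5 * (c[t+1] - c[t-1]) and clamped edges, and a hypothetical weight w between the two terms. It maximizes w * log N(Wc; mu, U) + log N(v(c); mu_v, sigma_v^2), where v(c) is the mean squared deviation of c from its time average, using plain gradient ascent started from the conventional solution. The paper's own optimizer and weighting may differ.

```python
import numpy as np


def window_matrix(T):
    """Map a static trajectory c (T,) to interleaved [static, delta] features (2T,).

    The delta window 0.5 * (c[t+1] - c[t-1]) with clamped edges is an assumption.
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                       # static row
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5  # delta row, next frame
        W[2 * t + 1, max(t - 1, 0)] -= 0.5      # delta row, previous frame
    return W


def global_variance(c):
    """GV of a 1-D trajectory: mean squared deviation from its time average."""
    return float(np.mean((c - c.mean()) ** 2))


def generate_with_gv(mean, var, gv_mean, gv_var, w=1.0, step=1e-3, iters=2000):
    """Maximize w*log N(Wc; mu, U) + log N(v(c); gv_mean, gv_var) by gradient ascent.

    mean, var: arrays of shape (T, 2) holding per-frame [static, delta] means
    and diagonal variances for a fixed state sequence.
    w, step and iters are hypothetical settings, not values from the paper.
    """
    T = mean.shape[0]
    W = window_matrix(T)
    WtUinv = W.T * (1.0 / var.reshape(-1))  # W^T U^{-1} for diagonal U
    A = WtUinv @ W
    b = WtUinv @ mean.reshape(-1)
    c = np.linalg.solve(A, b)  # conventional solution: HMM likelihood only
    for _ in range(iters):
        grad_hmm = b - A @ c  # gradient of log N(Wc; mu, U) with respect to c
        v = global_variance(c)
        # dv/dc = (2/T) * (c - mean(c)); chain rule through the GV Gaussian
        grad_gv = -(v - gv_mean) / gv_var * (2.0 / T) * (c - c.mean())
        c = c + step * (w * grad_hmm + grad_gv)
    return c
```

Because the starting point already maximizes the HMM term alone, any increase in the combined objective must come from the GV term, so the GV of the returned trajectory moves toward gv_mean. For example, static means alternating 0, 1, 0, 1 with tight delta variances give a conventional trajectory whose GV falls below the GV of the means; the GV term then pushes it back up.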

Citations

Posted Content

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, +8 more

- 12 Sep 2016 -

arXiv: Sound

TL;DR: This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones. It can be trained efficiently on data with tens of thousands of samples per second of audio, and it can also be employed as a discriminative model, returning promising results for phoneme recognition.

Journal Article

Statistical Parametric Speech Synthesis

Alan W. Black, +2 more

TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.

Journal Article

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Tomoki Toda, +2 more

- 01 Nov 2007 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.

Proceedings Article

Statistical parametric speech synthesis using deep neural networks

Heiga Zen, +2 more

TL;DR: This paper examines an alternative scheme based on a deep neural network (DNN), in which the relationship between input texts and their acoustic realizations is modeled by a DNN; experimental results show that the DNN-based systems outperformed HMM-based systems with similar numbers of parameters.

References

Journal Article

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

Hideki Kawahara, +2 more

- 01 Apr 1999 -

Speech Communication

TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

Proceedings Article

Unit selection in a concatenative speech synthesis system using a large speech database

Andrew Hunt, +1 more

TL;DR: In this paper, a state transition network is proposed to select and concatenate phonemes from a large speech database to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information.
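The TL;DR describes selection as a search through a network of candidate units. One common way to search such a network is Viterbi dynamic programming over a target cost (how well a unit matches its target) plus a concatenation cost (how well adjacent units join). The sketch below assumes that formulation. The (phone, pitch) unit representation and both cost functions are hypothetical toys, not the paper's features or weights.

```python
from typing import Any, Callable, List, Sequence, Tuple


def select_units(
    targets: Sequence[Any],
    candidates: Sequence[Sequence[Any]],
    target_cost: Callable[[Any, Any], float],
    join_cost: Callable[[Any, Any], float],
) -> Tuple[List[Any], float]:
    """Viterbi search for the unit sequence with the lowest total cost.

    candidates[i] lists the database units that may realize targets[i].
    """
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            # cheapest way to reach u from any candidate of the previous target
            costs = [best[i - 1][k] + join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(targets[i], u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # trace back from the cheapest candidate for the last target
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(len(targets))], total
```

With a heavier join weight, the search can accept a worse target match for one unit in exchange for a smoother join with its neighbour, which is the trade-off the two costs encode.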

Proceedings Article

Speech parameter generation algorithms for HMM-based speech synthesis

Keiichi Tokuda, +4 more

TL;DR: Derives a speech parameter generation algorithm for HMM-based speech synthesis in which the speech parameter sequence is generated from HMMs whose observation vectors consist of a spectral parameter vector and its dynamic feature vectors.
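For a fixed state sequence and diagonal covariances, this kind of generation reduces to the linear system (W^T U^-1 W) c = W^T U^-1 mu, where W stacks the static and delta windows, U holds the per-frame variances and mu the per-frame means. A minimal sketch for one feature dimension follows. The delta window 0.5 * (c[t+1] - c[t-1]) with clamped edges and the (T, 2) input layout are assumptions for illustration, and the sketch covers only the fixed-state-sequence case.

```python
import numpy as np


def mlpg(mean, var):
    """Maximum-likelihood trajectory for one feature dimension.

    mean, var: arrays of shape (T, 2) with per-frame [static, delta] means
    and diagonal variances along a fixed state sequence. Returns c (T,)
    solving (W^T U^-1 W) c = W^T U^-1 mu.
    """
    T = mean.shape[0]
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0  # static row
        # delta row: 0.5 * (c[t+1] - c[t-1]); clamping at the edges is an assumption
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5
        W[2 * t + 1, max(t - 1, 0)] -= 0.5
    WtUinv = W.T * (1.0 / var.reshape(-1))  # W^T U^{-1} for diagonal U
    return np.linalg.solve(WtUinv @ W, WtUinv @ mean.reshape(-1))
```

Making the delta variances very large reduces the result to the static means, while tight delta variances yield a trajectory smoother than the means. That reduced variance is the over-smoothing the GV-based algorithm in the main paper sets out to counter.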

Journal Article

Review of text‐to‐speech conversion for English

Dennis H. Klatt

- 01 Sep 1987 -

Journal of the Acoustical Society of Ame...

TL;DR: This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis.

Proceedings Article

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

Takayoshi Yoshimura, +4 more

TL;DR: Describes an HMM-based speech synthesis system in which spectrum, pitch, and state duration are modeled simultaneously in a unified HMM framework.

Related Papers (5)
