A hybrid text-to-speech based on sub-band approach

doi:10.1109/APSIPA.2014.7041575

Proceedings Article

Takuma Inoue, +2 more

- pp 1-4

TLDR

This paper proposes a sub-band speech synthesis approach to develop high-quality Text-to-Speech (TTS) that combines the inherent benefits of both waveform-based speech synthesis and HMM-based speech synthesis.

Abstract:

This paper proposes a sub-band speech synthesis approach to develop high-quality Text-to-Speech (TTS). Hidden Markov Model (HMM)-based speech synthesis is used for the low-frequency band, and waveform-based speech synthesis is used for the high-frequency band. Both methods are widely known to perform well, and each has its own benefits and shortcomings. One motivation is to apply the right speech synthesis method in the right frequency band. Experimental results show that the proposed approach is smoother than waveform-based speech synthesis and clearer than HMM-based speech synthesis. Consequently, the proposed approach combines the inherent benefits of both waveform-based speech synthesis and HMM-based speech synthesis.
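The band assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the crossover frequency or the filter design, so everything here is an assumption: the function names, the 4 kHz crossover, the 16 kHz sampling rate, and the first-order low-pass filter standing in for whatever filter bank the authors actually use. The sketch keeps the low band of an HMM-synthesized waveform and the complementary high band of a waveform-based (unit selection) output, then sums them.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order IIR low-pass filter (a stand-in for a real filter bank)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = 0.0
    out = []
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def combine_subbands(hmm_wave, unit_wave, cutoff_hz=4000.0, sample_rate=16000):
    """Low band from hmm_wave plus high band from unit_wave.

    cutoff_hz and sample_rate are hypothetical values, not taken from the paper.
    """
    n = min(len(hmm_wave), len(unit_wave))  # truncate to the shorter input
    low = one_pole_lowpass(hmm_wave[:n], cutoff_hz, sample_rate)
    unit_low = one_pole_lowpass(unit_wave[:n], cutoff_hz, sample_rate)
    # Complementary high band: the input minus its own low-pass output.
    high = [u - l for u, l in zip(unit_wave[:n], unit_low)]
    return [l + h for l, h in zip(low, high)]
```

Because the high band is formed as the complement of the same low-pass filter, feeding the same waveform to both inputs returns that waveform unchanged (up to rounding), which is a convenient sanity check for the split. A practical system would need sharper filters and time alignment between the two synthesizers, neither of which this sketch attempts.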

Citations

Proceedings Article

Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum.

Tadashi Inai, +5 more

TL;DR: A sub-band speech synthesis approach to develop a high-quality Text-to-Speech (TTS) system, using a sample-based spectrum selected from a phoneme database so that it is the most similar to the spectrum generated by HMM-based speech synthesis.

Proceedings Article

Multi-stream spectral representation for statistical parametric speech synthesis

Kayoko Yanagisawa, +2 more

TL;DR: An approach in which the high frequency spectrum is modelled separately from the low frequency spectrum, which makes samples synthesised using the proposed approach sound less muffled and more natural.

Posted Content

A Fully Time-domain Neural Model for Subband-based Speech Synthesizer.

Azam Rabiee, +1 more

- 12 Oct 2018 -

arXiv: Audio and Speech Processing

TL;DR: A fully time-domain neural model for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end and shows quality comparable to the fullband one with a lighter network architecture for each subband.

References

Journal Article

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

Hideki Kawahara, +2 more

- 01 Apr 1999 -

Speech Communication

TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

Journal Article

Statistical Parametric Speech Synthesis

Alan W. Black, +2 more

TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.

Proceedings Article

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

Takayoshi Yoshimura, +4 more

TL;DR: An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.

Journal Article

A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

Tomoki Toda, +1 more

- 01 May 2007 -

The IEICE transactions on information an...

TL;DR: The authors propose a parameter generation algorithm for an HMM-based speech synthesis technique. The generated trajectory is often excessively smoothed due to the statistical processing, and this over-smoothing effect usually causes muffled sounds.

Proceedings Article