Proceedings Article
A hybrid text-to-speech based on sub-band approach
Takuma Inoue,Sunao Hara,Masanobu Abe +2 more
- pp 1-4
TL;DR: This paper proposes a sub-band speech synthesis approach to develop high-quality Text-to-Speech (TTS) that combines the inherent benefits from both waveform-based speech synthesis and HMM-based speech synthesis.
Abstract:
This paper proposes a sub-band speech synthesis approach to develop high-quality Text-to-Speech (TTS). Hidden Markov Model (HMM)-based speech synthesis is used for the low-frequency band, and waveform-based speech synthesis is used for the high-frequency band. Both methods are widely known to perform well, and each has benefits and shortcomings from different points of view. One motivation is to apply the right speech synthesis method to the right frequency band. Experimental results show that the proposed approach is smoother than waveform-based speech synthesis and clearer than HMM-based speech synthesis. Consequently, the proposed approach combines the inherent benefits of both waveform-based and HMM-based speech synthesis.
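The band-splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the crossover frequency, the filter design, or how the two synthesizer outputs are aligned, so the function names, the cutoff, the sample rate, and the windowed-sinc filters below are all assumptions chosen for illustration. The sketch takes the low band from one waveform (standing in for the HMM-based output) and the high band from another (standing in for the waveform-based output), using a complementary filter pair so that the two bands sum back to a pure delay.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps, normalised to unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a centre tap")
    fc = cutoff_hz / sample_rate  # normalised cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def complementary_highpass(lp):
    """High-pass taps chosen so that low-pass + high-pass is a pure delay."""
    hp = [-t for t in lp]
    hp[(len(lp) - 1) // 2] += 1.0
    return hp


def fir_filter(signal, taps):
    """Causal FIR filtering by direct convolution; output length equals input length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out


def combine_subbands(hmm_wave, unit_wave, cutoff_hz, sample_rate, num_taps=101):
    """Sum the low band of hmm_wave and the high band of unit_wave.

    Assumption: both waveforms are already time-aligned and share a sample rate.
    The cutoff is a hypothetical parameter; the abstract does not state one.
    """
    lp = lowpass_taps(cutoff_hz, sample_rate, num_taps)
    hp = complementary_highpass(lp)
    low = fir_filter(hmm_wave, lp)
    high = fir_filter(unit_wave, hp)
    return [a + b for a, b in zip(low, high)]
```

Because the two filters are complementary, feeding the same waveform to both inputs returns that waveform delayed by `(num_taps - 1) // 2` samples, which is a convenient sanity check before mixing outputs from two different synthesizers.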
Citations
Proceedings Article
Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum.
TL;DR: A sub-band speech synthesis approach for a high-quality Text-to-Speech (TTS) system, in which a sample-based spectrum is selected from a phoneme database so that it is the most similar to the spectrum generated by HMM-based speech synthesis.
Proceedings Article
Multi-stream spectral representation for statistical parametric speech synthesis
TL;DR: An approach in which the high frequency spectrum is modelled separately from the low frequency spectrum, which makes samples synthesised using the proposed approach sound less muffled and more natural.
Posted Content
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer.
Azam Rabiee,Soo-Young Lee +1 more
TL;DR: A fully time-domain neural model for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end and shows quality comparable to the fullband one with a lighter network architecture for each subband.
References
Journal Article
Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds
TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.
Journal Article
Statistical Parametric Speech Synthesis
TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.
Proceedings Article
Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis
TL;DR: An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.
Journal Article
A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis
Tomoki Toda,Keiichi Tokuda +1 more
TL;DR: This article proposes a parameter generation algorithm for HMM-based speech synthesis. The generated trajectory is often excessively smoothed by the statistical processing, and this over-smoothing usually causes muffled sounds.