Journal Article

A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

TLDR
This article proposes a parameter generation algorithm for HMM-based speech synthesis. It targets the over-smoothing of trajectories generated by the conventional algorithm, an effect of the statistical processing that usually causes muffled sounds.
Abstract
This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm causes considerably large improvements in the naturalness of synthetic speech.
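The two likelihoods described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a one-dimensional static feature, a single delta window, diagonal covariances, and a fixed state sequence, and it uses plain gradient ascent with backtracking as the optimizer. The function names, the weight `omega = 1/(2T)`, and the toy statistics are all hypothetical choices made for illustration.

```python
import numpy as np

def delta_window_matrix(T):
    """Build W (2T x T) so that W @ c stacks static features and their deltas."""
    W = np.zeros((2 * T, T))
    W[:T] = np.eye(T)
    for t in range(T):
        # delta_t = 0.5 * (c[t+1] - c[t-1]), with indices clamped at the edges
        W[T + t, min(t + 1, T - 1)] += 0.5
        W[T + t, max(t - 1, 0)] -= 0.5
    return W

def mlpg(W, mu, prec):
    """Conventional ML trajectory: solve (W^T P W) c = W^T P mu, P diagonal."""
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)

def global_variance(c):
    """Variance of the static trajectory over the whole utterance."""
    return np.mean((c - c.mean()) ** 2)

def mlpg_gv(W, mu, prec, gv_mean, gv_var, n_iter=100):
    """Maximize omega * log P(Wc | HMM) + log P(v(c) | GV model) by gradient ascent."""
    T = W.shape[1]
    omega = 1.0 / (2 * T)  # assumed balancing weight between the two terms

    def objective(c):
        d = W @ c - mu
        log_hmm = -0.5 * np.sum(prec * d * d)
        log_gv = -0.5 * (global_variance(c) - gv_mean) ** 2 / gv_var
        return omega * log_hmm + log_gv

    c = mlpg(W, mu, prec)  # start from the conventional (over-smoothed) solution
    for _ in range(n_iter):
        grad_hmm = W.T @ (prec * (mu - W @ c))
        grad_gv = -(global_variance(c) - gv_mean) / gv_var * (2.0 / T) * (c - c.mean())
        grad = omega * grad_hmm + grad_gv
        # Backtracking: halve the step until the objective actually improves
        step, base = 1.0, objective(c)
        while step > 1e-10 and objective(c + step * grad) <= base:
            step *= 0.5
        if step <= 1e-10:
            break
        c = c + step * grad
    return c

# Toy statistics: four 10-frame states with alternating static means,
# zero delta means, and tight delta variances that force heavy smoothing.
T = 40
W = delta_window_matrix(T)
mu = np.concatenate([np.repeat([1.0, -1.0, 1.0, -1.0], 10), np.zeros(T)])
prec = np.concatenate([np.ones(T), 10.0 * np.ones(T)])
c_ml = mlpg(W, mu, prec)
c_gv = mlpg_gv(W, mu, prec, gv_mean=1.0, gv_var=0.01)
print(global_variance(c_ml), global_variance(c_gv))
```

In this toy setup the conventional solution shrinks the trajectory toward its mean, so its global variance falls below the target, and the GV term pulls it back toward the target. This is the penalty on over-smoothing that the abstract describes.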


Citations
Posted Content

WaveNet: A Generative Model for Raw Audio

TL;DR: This paper proposed WaveNet, a deep neural network for generating audio waveforms, which is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones.

WaveNet: A Generative Model for Raw Audio

TL;DR: WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Journal Article

Statistical Parametric Speech Synthesis

TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.
Journal Article

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.
Proceedings Article

Statistical parametric speech synthesis using deep neural networks

TL;DR: This paper examines an alternative scheme based on a deep neural network (DNN): the relationship between input texts and their acoustic realizations is modeled by a DNN, and experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
References
Journal Article

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.
Proceedings Article

Unit selection in a concatenative speech synthesis system using a large speech database

TL;DR: In this paper, a state transition network is proposed to select and concatenate phonemes from a large speech database to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information.
Proceedings Article

Speech parameter generation algorithms for HMM-based speech synthesis

TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.
Journal Article

Review of text‐to‐speech conversion for English

TL;DR: This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis.
Proceedings Article

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

TL;DR: An HMM-based speech synthesis system is described in which spectrum, pitch, and state duration are modeled simultaneously within a unified HMM framework.