scispace - formally typeset
Search or ask a question
Author

T. Fukada

Bio: T. Fukada is an academic researcher from Canon Inc.. The author has contributed to research in topics: Adaptive filter & Stability (learning theory). The author has an hindex of 1, co-authored 1 publications receiving 363 citations.

Papers
More filters
Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors apply the criterion used in the unbiased estimation of log spectrum to the spectral model represented by the mel-cepstral coefficients to solve the nonlinear minimization problem involved in the method and derive an adaptive algorithm whose convergence is guaranteed.
Abstract: The authors describe a mel-cepstral analysis method and its adaptive algorithm. In the proposed method, the authors apply the criterion used in the unbiased estimation of log spectrum to the spectral model represented by the mel-cepstral coefficients. To solve the nonlinear minimization problem involved in the method, they give an iterative algorithm whose convergence is guaranteed. Furthermore, they derive an adaptive algorithm for the mel-cepstral analysis by introducing an instantaneous estimate for gradient of the criterion. The adaptive mel-cepstral analysis system is implemented with an IIR adaptive filter which has an exponential transfer function, and whose stability is guaranteed. The authors also present examples of speech analysis and results of an isolated word recognition experiment. >

374 citations


Cited by
More filters
Journal ArticleDOI
15 Apr 2007
TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.
Abstract: This paper gives a general overview of techniques in statistical parametric speech synthesis. One of the instances of these techniques, called HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable speech synthesis. This paper also contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years. Advantages and disadvantages of statistical parametric synthesis are highlighted as well as identifying where we expect the key developments to appear in the immediate future.

1,270 citations

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.
Abstract: This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter generation algorithm for the case where the state sequence is given. Experimental results show that by using the algorithm, we can reproduce clear formant structure from multi-mixture HMMs as compared with that produced from single-mixture HMMs.

1,071 citations

Proceedings ArticleDOI
26 May 2013
TL;DR: This paper examines an alternative scheme that is based on a deep neural network (DNN), the relationship between input texts and their acoustic realizations is modeled by a DNN, and experimental results show that the DNN- based systems outperformed the HMM-based systems with similar numbers of parameters.
Abstract: Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has a couple of limitations, e.g. decision trees are inefficient to model complex context dependencies. This paper examines an alternative scheme that is based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.

880 citations

Proceedings Article
01 Jan 1999
TL;DR: An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.
Abstract: In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state duration are clustered independently by using a decision-tree based context clustering technique. Synthetic speech is generated by using an speech parameter generation algorithm from HMM and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.

759 citations

01 Jan 2007
TL;DR: This paper describes HTS version 2.0 in detail, as well as future release plans, which include a number of new features which are useful for both speech synthesis researchers and developers.
Abstract: A statistical parametric speech synthesis system based on hidden Markov models (HMMs) has grown in popularity over the last few years. This system simultaneously models spectrum, excitation, and duration of speech using context-dependent HMMs and generates speech waveforms from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named HMM-based speech synthesis system (HTS) to provide a research and development platform for the speech synthesis community. In December 2006, HTS version 2.0 was released. This version includes a number of new features which are useful for both speech synthesis researchers and developers. This paper describes HTS version 2.0 in detail, as well as future release plans.

546 citations