Author

Hisao Kuwabara

Other affiliations: University of Tokyo
Bio: Hisao Kuwabara is an academic researcher from Teikyo University of Science. The author has contributed to research topics including speech synthesis and formants. The author has an h-index of 14 and has co-authored 26 publications receiving 1,428 citations. Previous affiliations of Hisao Kuwabara include the University of Tokyo.

Papers
Proceedings ArticleDOI
11 Apr 1988
TL;DR: The authors propose a new voice conversion technique through vector quantization and spectrum mapping which makes it possible to precisely control voice individuality.
Abstract: The authors propose a new voice conversion technique through vector quantization and spectrum mapping. The basic idea of this technique is to make mapping codebooks which represent the correspondence between different speakers' codebooks. The mapping codebooks for spectrum parameters, power values and pitch frequencies are separately generated using training utterances. This technique makes it possible to precisely control voice individuality. To evaluate the performance of this technique, hearing tests are carried out on two kinds of voice conversions. One is a conversion between male and female speakers, the other is a conversion between male speakers. In the male-to-female conversion experiment, all converted utterances are judged as female, and in the male-to-male conversion, 65% of them are identified as the target speaker.

554 citations
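The mapping-codebook idea described in the abstract above can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: it assumes k-means vector quantization, time-aligned source/target training frames, and function names of my own choosing.

```python
import numpy as np

def train_codebook(frames, k, iters=20, seed=0):
    """Toy k-means vector quantizer: returns a (k, dim) codebook."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest code vector, then re-estimate.
        idx = np.argmin(
            ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(idx == j):
                codebook[j] = frames[idx == j].mean(axis=0)
    return codebook

def build_mapping(src_cb, src_frames, tgt_frames):
    """Mapping codebook: for each source code vector, average the
    time-aligned target frames of the source frames quantized to it."""
    idx = np.argmin(
        ((src_frames[:, None, :] - src_cb[None, :, :]) ** 2).sum(-1), axis=1)
    mapping = np.zeros_like(src_cb)
    for j in range(len(src_cb)):
        if np.any(idx == j):
            mapping[j] = tgt_frames[idx == j].mean(axis=0)
    return mapping

def convert(frames, src_cb, mapping):
    """Quantize each input frame with the source codebook, then emit
    the corresponding entry of the mapping codebook."""
    idx = np.argmin(
        ((frames[:, None, :] - src_cb[None, :, :]) ** 2).sum(-1), axis=1)
    return mapping[idx]
```

In the paper, separate mapping codebooks are built for spectrum parameters, power, and pitch; the sketch above shows the mechanism for a single feature stream.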

Journal ArticleDOI
TL;DR: A large-scale Japanese speech database has been described and has been used to develop algorithms in speech recognition and synthesis studies and to find acoustic, phonetic and linguistic evidence that will serve as basic data for speech technologies.

282 citations

Journal ArticleDOI
TL;DR: A survey of non-parametric methods for mapping spectral segmental characteristics between speakers, introducing the different types of spectral mapping methods that have evolved alongside the speaker adaptation techniques developed in speech recognition research.

138 citations

Journal ArticleDOI
TL;DR: In this article, a new voice conversion technique through vector quantization and spectrum mapping is proposed, which is based on mapping codebooks which represent the correspondence between different speakers' codebooks.
Abstract: A new voice conversion technique through vector quantization and spectrum mapping is proposed. This technique is based on mapping codebooks which represent the correspondence between different speakers' codebooks. The mapping codebooks for spectrum parameters, power values, and pitch frequencies are separately generated using training utterances. This technique makes it possible to precisely control voice individuality. The performance of this technique is confirmed by spectrum distortion and pitch frequency difference. To evaluate the overall performance of this technique, listening tests are carried out on two kinds of voice conversions: one between male and female speakers, the other between male speakers. In the male-to-female conversion experiment, all converted utterances are judged as female, and in the male-to-male conversion, 57% of them are identified as the target speaker.

116 citations

Proceedings Article
01 Jan 1990

63 citations


Cited by
Journal ArticleDOI
TL;DR: A new methodology for representing the relationship between two sets of spectral envelopes; the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.
Abstract: Voice conversion, as considered in this paper, is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). Our contribution includes the design of a new methodology for representing the relationship between two sets of spectral envelopes. The proposed method is based on the use of a Gaussian mixture model of the source speaker spectral envelopes. The conversion itself is represented by a continuous parametric function which takes into account the probabilistic classification provided by the mixture model. The parameters of the conversion function are estimated by least squares optimization on the training data. This conversion method is implemented in the context of the HNM (harmonic+noise model) system, which allows high-quality modifications of speech signals. Compared to earlier methods based on vector quantization, the proposed conversion scheme results in a much better match between the converted envelopes and the target envelopes. Evaluation by objective tests and formal listening tests shows that the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.

1,109 citations
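The continuous, posterior-weighted conversion function described above can be sketched as follows. This is a heavily simplified illustration, not the paper's method: it assumes 1-D features, a GMM with fixed (hand-set) parameters, and a bias-only conversion term per component, with the biases found by least squares as in the paper.

```python
import numpy as np

def posteriors(x, means, stds, weights):
    """P(i|x) for a 1-D Gaussian mixture with given parameters."""
    lik = weights * np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) / stds
    return lik / lik.sum(axis=1, keepdims=True)

def fit_bias(x, y, means, stds, weights):
    """Least-squares estimate of per-component biases b_i for the
    (bias-only) conversion function F(x) = x + sum_i P(i|x) * b_i."""
    p = posteriors(x, means, stds, weights)
    b, *_ = np.linalg.lstsq(p, y - x, rcond=None)
    return b

def convert(x, b, means, stds, weights):
    """Apply the continuous conversion: posterior-weighted biases."""
    p = posteriors(x, means, stds, weights)
    return x + (p * b).sum(axis=1)
```

Unlike hard vector quantization, the posteriors vary smoothly with x, so the converted envelope changes continuously rather than jumping between codebook entries, which is the key quality advantage the paper reports.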

Journal ArticleDOI
TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.
Abstract: In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, the deterioration of speech quality is caused by some problems: 1) appropriate spectral movements are not always caused by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.

914 citations

Proceedings ArticleDOI
22 Sep 2008
TL;DR: Proceedings of INTERSPEECH 2008, the 9th Annual Conference of the International Speech Communication Association, held September 22-26, 2008, in Brisbane, Australia.
Abstract: INTERSPEECH2008: 9th Annual Conference of the International Speech Communication Association, September 22-26, 2008, Brisbane, Australia.

796 citations

Proceedings ArticleDOI
12 May 1998
TL;DR: A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented and is found to perform more reliably for small training sets than a previous approach.
Abstract: A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker's average pitch. To study effects of the amount of training on performance, data sets of varying sizes are created by automatically selecting subsets of all available diphones by a vector quantization method. In an objective evaluation, the proposed method is found to perform more reliably for small training sets than a previous approach. In perceptual tests, it was shown that nearly optimal spectral conversion performance was achieved, even with a small amount of training data. However, speech quality improved with increases in the training set size.

692 citations
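The joint-density idea above can be sketched with a single joint Gaussian (a 1-component stand-in for the GMM, so this is illustrative only): fit a Gaussian to stacked source/target feature pairs, then convert with the conditional mean E[y|x], which is exactly a linear transformation of x. All names below are my own.

```python
import numpy as np

def fit_joint_gaussian(x, y):
    """Fit one joint Gaussian to stacked (source, target) feature pairs;
    return the blocks needed for the conditional mean."""
    z = np.column_stack([x, y])
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    d = x.shape[1]
    return mu[:d], mu[d:], cov[:d, :d], cov[d:, :d]

def convert(x, mu_x, mu_y, cov_xx, cov_yx):
    """Conditional mean E[y|x] = mu_y + Cov_yx Cov_xx^-1 (x - mu_x):
    a (locally) linear spectral transformation."""
    A = cov_yx @ np.linalg.inv(cov_xx)
    return mu_y + (x - mu_x) @ A.T
```

In the GMM case, each mixture component contributes such a linear transform, blended by the component posteriors, which gives the "locally linear" mapping the abstract refers to.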

Journal ArticleDOI
TL;DR: The experiences of researchers at MIT in the collection of two large speech databases, TIMIT and VOYAGER, are described; the two databases have somewhat complementary objectives.

570 citations