
Showing papers on "Speaker diarisation published in 1995"


Journal ArticleDOI
TL;DR: The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.
Abstract: This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech, and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49-speaker conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5 second clean speech utterances and 80.8% accuracy using 15 second telephone speech utterances with a 49 speaker population, and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.

3,134 citations
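
The modeling recipe above maps directly onto modern tooling. Below is a minimal sketch of per-speaker GMM training and maximum-likelihood identification, assuming MFCC-style feature matrices are already extracted; the component count, diagonal covariances, and the reg_covar floor (standing in for the paper's variance limiting) are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of GMM-based text-independent speaker identification
# (hedged: data shapes and hyperparameters are illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_per_speaker, n_components=32):
    """features_per_speaker: dict mapping speaker id -> (n_frames, n_ceps) array."""
    models = {}
    for spk, feats in features_per_speaker.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              reg_covar=1e-3)  # plays the role of variance limiting
        gmm.fit(feats)
        models[spk] = gmm
    return models

def identify(models, test_feats):
    """Pick the speaker whose GMM gives the highest average frame log-likelihood."""
    scores = {spk: gmm.score(test_feats) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```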


Journal ArticleDOI
TL;DR: A scheme for developing a voice conversion system that converts the speech signal uttered by a source speaker to a speech signal having the voice characteristics of the target speaker using formants and a formant vocoder is proposed.

207 citations


Proceedings ArticleDOI
09 May 1995
TL;DR: The data augmentation technique is based on the metamorphic algorithm first proposed in Bellegarda et al.
Abstract: Speaker adaptation typically involves customizing some existing (reference) models in order to account for the characteristics of a new speaker. This work considers the slightly different paradigm of customizing some reference data for the purpose of populating the new speaker's space, and then using the resulting (augmented) data to derive the customized models. The data augmentation technique is based on the metamorphic algorithm first proposed in Bellegarda et al. [1992], assuming that a relatively modest amount of data (100 sentences) is available from each new speaker. This constraint requires that reference speakers be selected with some care. The performance of this method is illustrated on a portion of the Wall Street Journal task.

165 citations


Proceedings Article
01 Jan 1995
TL;DR: This paper explores supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model / multilayer perceptron version of SRI's DECIPHER™ speech recognition system.
Abstract: In speaker-independent, large-vocabulary continuous speech recognition systems, recognition accuracy varies considerably from speaker to speaker, and performance may be significantly degraded for outlier speakers such as nonnative talkers. In this paper, we explore supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model / multilayer perceptron version of SRI's DECIPHER™ speech recognition system. Normalization is implemented through an additional transformation network that preprocesses the cepstral input to the MLP. Adaptation is accomplished through incremental retraining of the MLP weights on adaptation data. Our approach combines both adaptation and normalization in a single, consistent manner, works with limited adaptation data, and is text-independent. We show significant improvement in recognition accuracy.

95 citations
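
The two mechanisms described, a transformation network on the cepstral input versus incremental retraining of the recognizer's own weights, can be sketched with a toy PyTorch model. This is an illustration of the idea only: the layer sizes, 13-dimensional cepstra, 40 phone classes, and optimizer settings are assumptions, and the real system is SRI's HMM/MLP hybrid, not this toy.

```python
# Toy sketch of normalization vs. adaptation (hedged: all sizes are assumptions).
import torch
import torch.nn as nn

torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(13, 128), nn.Sigmoid(), nn.Linear(128, 40))

# Normalization: train only a small transformation network that preprocesses
# the cepstral input; the reference MLP stays frozen.
transform = nn.Linear(13, 13)
for p in mlp.parameters():
    p.requires_grad = False

opt = torch.optim.SGD(transform.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def normalization_step(cepstra, phone_targets):
    """cepstra: (N, 13) float tensor; phone_targets: (N,) long tensor."""
    opt.zero_grad()
    loss = loss_fn(mlp(transform(cepstra)), phone_targets)
    loss.backward()  # gradient reaches only the transform, not the frozen MLP
    opt.step()
    return loss.item()

# Adaptation, by contrast, would re-enable the MLP's gradients and retrain
# its weights incrementally on the same supervised adaptation data.
```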


Proceedings ArticleDOI
09 May 1995
TL;DR: In this article, a modular system for flexible human-computer interaction via speech is presented, which integrates acoustic and visual information (automatic lip-reading) improving overall recognition, especially in noisy environments.
Abstract: We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading) improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. Finally, the speaker's face is automatically acquired and followed by the face tracker sub-system. Integration of the three functions results in the first bi-modal speech recognizer allowing the speaker reasonable freedom of movement within a possibly noisy room while continuing to communicate with the computer via voice. Compared to audio-alone recognition, the combined system achieves a 20 to 50 percent error rate reduction for various signal/noise conditions.

94 citations


PatentDOI
TL;DR: In this article, a neural network is trained to transform distant-talking cepstrum coefficients, derived from a microphone array receiving speech from a distant speaker, into a form substantially similar to the close-talking coefficients that would be derived from a microphone close to the speaker, providing robust hands-free speech and speaker recognition in adverse practical environments with existing speech and speaker recognition systems that have been trained on close-talking speech.
Abstract: A neural network is trained to transform distant-talking cepstrum coefficients, derived from a microphone array receiving speech from a speaker distant therefrom, into a form substantially similar to close-talking cepstrum coefficients that would be derived from a microphone close to the speaker, for providing robust hands-free speech and speaker recognition in adverse practical environments with existing speech and speaker recognition systems which have been trained on close-talking speech.

81 citations
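
A hedged sketch of the patent's core mapping, with scikit-learn's MLPRegressor standing in for the trained neural network; the patent does not specify this architecture, and the array names and sizes are illustrative.

```python
# Regression net mapping distant-talking cepstra toward close-talking cepstra
# (a sketch of the idea, not the patent's actual network).
from sklearn.neural_network import MLPRegressor

def train_mapper(distant_ceps, close_ceps):
    """Both: (n_frames, n_ceps) arrays from time-aligned distant/close takes."""
    net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
    net.fit(distant_ceps, close_ceps)
    return net

# At recognition time, net.predict(distant_ceps) feeds an existing recognizer
# that was trained on close-talking speech, unchanged.
```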


Journal ArticleDOI
TL;DR: A speech spectrum transformation method that interpolates multiple speakers' spectral patterns, using a multi-functional representation with Radial Basis Function networks, to generate new spectrum patterns close to those of the target speaker.

63 citations


Journal ArticleDOI
TL;DR: A discriminative training approach is used which takes into account the models of other competing speakers and formulates the optimization criterion such that speaker separation is enhanced and speaker recognition error rate on the training data is directly minimized.
Abstract: The use of discriminative training to construct hidden Markov models of speakers for verification and identification is studied. As opposed to conventional maximum likelihood training, which estimates a speaker's model based only on the training utterances from the same speaker, a discriminative training approach is used which takes into account the models of other competing speakers and formulates the optimization criterion such that speaker separation is enhanced and speaker recognition error rate on the training data is directly minimized. The optimization solution is obtained with a probabilistic descent algorithm. For all experiments, an isolated digit database consisting of 100 speakers is used. For speaker identification, the resulting discriminative speaker models reduce the identification error rate by more than 25% over the results obtained with the conventional training algorithm. A new normalized score function is proposed which makes the verification formulation consistent with the minimum error training objective. When combining the proposed verification score function with discriminative training, an average equal error rate of 0.8% is achieved using only one-digit test utterances. This represents an error rate reduction of over 80% from an average equal error rate of 6.1% when using the conventional algorithm for training and the unnormalized score function for testing.

63 citations
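
The training objective the abstract invokes is, in the standard minimum-classification-error formulation of Juang and Katagiri (the paper's exact definitions may differ), a smoothed error count built from a misclassification measure:

```latex
% Misclassification measure for speaker i (g_j = log-likelihood under
% speaker j's HMM), and the sigmoid loss minimized by probabilistic
% (gradient) descent; \eta and \gamma are smoothing constants.
d_i(x) = -g_i(x;\lambda_i)
       + \frac{1}{\eta}\log\!\Big[\frac{1}{M-1}\sum_{j\neq i} e^{\eta\, g_j(x;\lambda_j)}\Big],
\qquad
\ell_i(x) = \frac{1}{1+e^{-\gamma\, d_i(x)}}.
```

Driving the loss down pushes the target speaker's score away from its strongest competitors, which is exactly the "speaker separation" the abstract describes.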


Proceedings ArticleDOI
09 May 1995
TL;DR: This paper investigates features that are based on amplitude and frequency modulations of speech formants, high resolution measurement of fundamental frequency and location of "secondary pulses", measured using a high-resolution energy operator.
Abstract: The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants, high resolution measurement of fundamental frequency and location of "secondary pulses", measured using a high-resolution energy operator. When these features are added to traditional features using an existing SID system with a 168 speaker telephone speech database, SID performance improved by as much as 4% for male speakers and 8.2% for female speakers.

57 citations
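
The "high-resolution energy operator" here is plausibly of the Teager-Kaiser family commonly used for AM-FM analysis of formants (an assumption; the paper may use a variant). The discrete operator itself is a few lines of NumPy:

```python
# Discrete Teager-Kaiser energy operator, the usual tool behind AM-FM
# formant analysis (assumed here to be representative of the paper's
# "high-resolution energy operator").
import numpy as np

def teager(x):
    """Psi[n] = x[n]^2 - x[n-1]*x[n+1]; tracks instantaneous AM-FM energy."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi
```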


PatentDOI
TL;DR: In this paper, a system and method for adaptation of a speaker independent speech recognition system for use by a particular user is presented, where a test speaker's acoustic characterization is compared with acoustic characterization data generated for a plurality of training speakers.
Abstract: A system and method for adaptation of a speaker independent speech recognition system for use by a particular user. The system and method gather acoustic characterization data from a test speaker and compare the data with acoustic characterization data generated for a plurality of training speakers. A match score is computed between the test speaker's acoustic characterization for a particular acoustic subspace and each training speaker's acoustic characterization for the same acoustic subspace. The training speakers are ranked for the subspace according to their scores and a new acoustic model is generated for the test speaker based upon the test speaker's acoustic characterization data and the acoustic characterization data of the closest matching training speakers. The process is repeated for each acoustic subspace.

56 citations
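
A hedged sketch of the subspace-ranking idea: score each training speaker against the test speaker per acoustic subspace, keep the closest ones, and pool their data for the new model. The scoring function and per-subspace statistics below are stand-ins; the patent does not pin these down.

```python
# Rank training speakers by closeness to the test speaker in one subspace
# (illustrative: a negative Euclidean distance stands in for the match score).
import numpy as np

def rank_training_speakers(test_stats, train_stats, k=5):
    """test_stats: (n_dims,) summary vector for one acoustic subspace;
    train_stats: dict speaker -> (n_dims,) vector for the same subspace."""
    scores = {spk: -np.linalg.norm(test_stats - v)  # higher = closer
              for spk, v in train_stats.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Repeat per subspace, then build the test speaker's model from their own
# data plus the data of the selected nearest training speakers.
```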


Patent
29 Sep 1995
TL;DR: In this article, a technique for improving speech recognition in low-cost, speech-interactive devices is proposed, which calls for implementing a speaker-specific word enrollment and detection unit in parallel with a word detection unit, permitting comprehension of spoken commands or messages via binary questions when no recognizable words are found.
Abstract: A technique for improving speech recognition in low-cost, speech-interactive devices. This technique calls for implementing a speaker-specific word enrollment and detection unit in parallel with a word detection unit, permitting comprehension of spoken commands or messages via binary questions when no recognizable words are found. Preferably, specific-speaker detection will be based on the speaker's own personal list of words or expressions. Other facets include complementing non-specific pre-registered word characteristic information with individual, speaker-specific verbal characteristics, improving recognition in cases where the speaker has unusual speech mannerisms or an accent, and response alteration, in which speaker-specific registration functions are leveraged to provide access and permit changes to a predefined response table according to user needs and tastes.

PatentDOI
TL;DR: In this article, a speaker recognition method and system which applies adaptive component weighting to each frame of speech for attenuating non-vocal tract components and normalizing speech components is presented.
Abstract: The present invention relates to a speaker recognition method and system which applies adaptive component weighting to each frame of speech, attenuating non-vocal-tract components and normalizing speech components. A linear predictive all-pole model is used to form a new transfer function having a moving average component. A normalized spectrum, defined to have improved characteristics for the speech components, is determined from the new transfer function. From these improved speech components, improved speaker recognition over a channel is obtained.

Proceedings Article
01 Jan 1995
TL;DR: Experimental application of the speaker recognition method based on hidden Markov model composition to text-independent speaker identification and verification in various kinds of noisy environments demonstrated considerable improvement in speaker recognition for speech utterances of male speakers.
Abstract: This paper investigates a speaker recognition method that is robust against background noise. In noisy environments, one important issue is how to create a model for each speaker so as to compensate for noise. The method described here is based on hidden Markov model (HMM) composition, which combines a speaker HMM and a noise-source HMM into a noise-added speaker HMM with a particular signal-to-noise ratio (SNR). Since it is difficult to measure the SNR of input speech with non-stationary noise exactly, this method creates several noise-added speaker HMMs with various SNRs. The HMM that has the highest likelihood value for the input speech is selected, and a speaker decision is made using this likelihood value. Experimental application of this method to text-independent speaker identification and verification in various kinds of noisy environments demonstrated considerable improvement in speaker recognition for speech utterances of male speakers.
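
The multi-SNR selection step can be illustrated with a simplified analogue. In the sketch below, GMMs stand in for HMMs, and noise is mixed into training audio rather than composed in model space, so this only demonstrates the idea of scoring against several noise-added models and keeping the best, not the paper's HMM composition itself.

```python
# Simplified analogue of multi-SNR noise-added speaker modeling.
import numpy as np
from sklearn.mixture import GaussianMixture

def add_noise(speech, noise, snr_db):
    """Scale noise so that speech + noise has the requested SNR in dB."""
    noise = np.resize(noise, speech.shape)
    gain = np.sqrt(np.mean(speech ** 2) /
                   (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def noise_added_models(clean_speech, noise, extract_feats, snrs=(0, 10, 20)):
    """extract_feats is a hypothetical frame-feature extractor (e.g. cepstra)."""
    return {snr: GaussianMixture(n_components=8, covariance_type="diag")
                 .fit(extract_feats(add_noise(clean_speech, noise, snr)))
            for snr in snrs}

def speaker_score(models, test_feats):
    # Select whichever SNR-matched model explains the input best, and use
    # that likelihood for the speaker decision, as the paper describes.
    return max(m.score(test_feats) for m in models.values())
```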

Journal ArticleDOI
TL;DR: A technique is presented for adapting all the speech models to a new speaker's voice when the speaker has provided only an incomplete set of the vocabulary, based upon using the training set to obtain estimates of correlations between sounds.


Proceedings ArticleDOI
09 May 1995
TL;DR: A new system is presented for text-dependent speaker verification that uses data fusion concepts to combine the results of distortion-based and discriminant-based classifiers, and yields an equal error rate of two percent for this task, which is better than the individual performance of either classifier.
Abstract: A new system is presented for text-dependent speaker verification. The system uses data fusion concepts to combine the results of distortion-based and discriminant-based classifiers. Hence, both intraspeaker and interspeaker information are utilized in the final decision. The distortion- and discriminant-based classifiers are based on dynamic time warping (DTW) and the neural tree network (NTN), respectively. The system is evaluated with several hundred two-word utterances collected over a telephone channel. The combined classifier yields an equal error rate of two percent for this task, which is better than the individual performance of either classifier.
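
A hedged sketch of the fusion step: a DTW distortion (lower is better) and an NTN discriminant score (higher is better) are combined into one accept/reject decision. The fusion rule and constants below are illustrative; the paper does not specify them here.

```python
# Combine a distortion score and a discriminant score for verification
# (illustrative linear fusion; weights/threshold come from development data).
def fused_decision(dtw_dist, ntn_score, w=0.5, threshold=0.0):
    # Flip the distortion's sign so both scores point the same way, then
    # take a convex combination (scores assumed pre-normalized, e.g. to
    # zero mean / unit variance on a development set).
    combined = w * (-dtw_dist) + (1.0 - w) * ntn_score
    return combined > threshold  # True = accept the claimed speaker
```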



Proceedings Article
01 Jan 1995
TL;DR: The results tend to show that the speaker-dependent information captured by long-term second-order statistics is consistently common to all phonetic classes, and that the homogeneity of the test material may improve the quality of the estimates.
Abstract: Second-order statistical methods show very good results for automatic speaker identification in controlled recording conditions [2]. These approaches are generally used on the entire speech material available. In this paper, we study the influence of the content of the test speech material on the performance of such methods, i.e. under a more analytical approach [3]. The goal is to investigate the kind of information which is used by these methods, and where it is located in the speech signal. Liquids and glides together, vowels, and more particularly nasal vowels and nasal consonants, are found to be particularly speaker specific: test utterances of 1 second, composed in majority of acoustic material from one of these classes, provide better speaker identification results than phonetically balanced test utterances, even though the training is done, in both cases, with 15 seconds of phonetically balanced speech. Nevertheless, results with other phoneme classes are never dramatically poor. These results tend to show that the speaker-dependent information captured by long-term second-order statistics is consistently common to all phonetic classes, and that the homogeneity of the test material may improve the quality of the estimates.

Journal Article
TL;DR: In this paper, five approaches that can be used to control and simplify the speech recognition task are examined: isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions.
Abstract: Five approaches that can be used to control and simplify the speech recognition task are examined. They entail the use of isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions. The five components of a speech recognition system are described: a speech capture device, a digital signal processing module, preprocessed signal storage, reference speech patterns, and a pattern-matching algorithm. Current speech recognition systems are reviewed and categorized. Speaker recognition approaches and systems are also discussed.


Proceedings Article
01 Jan 1995
TL;DR: This chapter discusses several techniques for identifying segment transitions in an audio stream and a novel speaker discrimination is described that makes segmentation decisions when a continuously updated model of the current speaker suddenly ceases to sufficiently account for the input data.
Abstract: This chapter discusses several techniques for identifying segment transitions in an audio stream. Gross features are first identified that control more detailed and computationally expensive analysis down stream. The immediate goal of the audio processing is to identify transition points between segments and to do an initial content oriented labeling of the segments. The technique illustrated is a combination of signal processing techniques for feature extraction and intelligent symbolic level processing for decision making. The symbolic processing includes knowledge about characteristics of some of the basic signal types that can be encountered. Pitch is tracked using some basic streaming principles and then used as one cue to speaker transitions. A novel speaker discrimination is also described that makes segmentation decisions when a continuously updated model of the current speaker suddenly ceases to sufficiently account for the input data. Segment transition decisions in audio are based on less temporally localized information than are video transition decisions.
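
The drift-based discrimination described above can be sketched simply: keep a running Gaussian model of the current speaker's features and declare a segment boundary when new frames become too unlikely under it. The thresholds and window sizes below are illustrative assumptions, not the chapter's values.

```python
# Online speaker-change detection via model drift (hedged sketch).
import numpy as np

class SpeakerDrift:
    def __init__(self, thresh=-60.0):
        self.buf, self.thresh = [], thresh

    def step(self, frame):
        """frame: (n_dims,) feature vector; returns True at a boundary."""
        if len(self.buf) > 20:  # enough data to trust the current model
            mu = np.mean(self.buf, axis=0)
            var = np.var(self.buf, axis=0) + 1e-6
            loglik = -0.5 * np.sum(np.log(2 * np.pi * var)
                                   + (frame - mu) ** 2 / var)
            if loglik < self.thresh:   # model no longer accounts for input
                self.buf = [frame]     # start modeling the new speaker
                return True
        self.buf.append(frame)
        return False
```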

PatentDOI
TL;DR: A method for recognizing spoken utterances of a speaker is disclosed, the method comprising the steps of providing a database of labeled speech data and providing a prototype of a Hidden Markov Model (HMM) definition to define the characteristics of the HMM.
Abstract: A method for recognizing spoken utterances of a speaker is disclosed, the method comprising the steps of providing a database of labeled speech data; providing a prototype of a Hidden Markov Model (HMM) definition to define the characteristics of the HMM; and parameterizing speech utterances according to either linear prediction parameters or Mel-scale filter bank parameters. The method further includes selecting a frame period for accommodating the parameters, and generating HMMs and decoding specified speech utterances by having the user utter predefined training speech utterances for each HMM. The method then statistically combines the generated HMMs with the prototype HMM to provide a set of fully trained HMMs for each utterance indicative of the speaker. The trained HMMs are used for recognizing a speaker by computing Laplacian distances via distance table lookup for utterances of the speaker during the selected frame period, and by iteratively decoding node transitions corresponding to the spoken utterances during the selected frame period to determine which predefined utterance is present.


01 Jan 1995
TL;DR: This paper describes recent efforts by the CMU speech group to improve the recognition of speech found in long sections of the broadcast news show Marketplace, and compares the recognition accuracy of the SPHINX-II system for different environmental and speaker conditions.
Abstract: Practical applications of continuous speech recognition in realistic environments place increasing demands for speaker and environment independence. Until recently, this robustness has been measured using evaluation procedures where speaker and environment boundaries are known, with utterances containing complete or nearly complete sentences. This paper describes recent efforts by the CMU speech group to improve the recognition of speech found in long sections of the broadcast news show Marketplace. Most of our effort was concentrated in two areas: the automatic segmentation and classification of environments, and the construction of a suitable lexicon and language model. We review the extensions to SPHINX-II that were necessary to enable it to process continuous broadcast news and we compare the recognition accuracy of the SPHINX-II system for different environmental and speaker conditions.

Proceedings Article
01 Jan 1995
TL;DR: Comparing continuous density hidden Markov models, dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition across incremental amounts of training data shows TD to be superior to TI architecture for speaker recognition, and TD digit performance illustrates 0, 1 and 9 to be good discriminators.
Abstract: This paper evaluates continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition, across incremental amounts of training data. In comparing VQ and CDHMMs for text-independent (TI) speaker recognition, it is shown that VQ performs better than an equivalent CDHMM with one training version, but is outperformed by the CDHMM when trained with ten training versions. In text-dependent (TD) experiments, a comparison of DTW, VQ and CDHMMs shows that DTW outperforms VQ and CDHMMs for sparse amounts of training data, but with more data, the performance of each model is indistinguishable. Further analysis shows TD to be superior to TI architecture for speaker recognition, and TD digit performance illustrates 0, 1 and 9 to be good discriminators.
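
For reference, the DTW distance used in such text-dependent comparisons is the standard dynamic-programming alignment cost; the sketch below is the textbook algorithm, with the paper's particular local constraints and features left unspecified.

```python
# Minimal dynamic time warping distance between two feature sequences.
import numpy as np

def dtw(a, b):
    """a, b: (n_frames, n_dims) arrays; returns the cumulative alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```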

Proceedings ArticleDOI
07 Mar 1995
TL;DR: This paper deals with the problem of unsupervised speaker classification, where no a priori speaker information is available, and proposes an algorithm that accepts multi-speaker dialogue speech data, estimates the number of speakers and assigns each speech segment to its speaker.
Abstract: Speaker recognition and verification has been used in a variety of commercial, forensic and military applications. The classical problem is that of supervised recognition, in which there is sufficient a priori information on the speakers to be identified. In such cases, the recognition system has speaker models, estimated during training sessions. This paper deals with the problem of unsupervised speaker classification, where no a priori speaker information is available. The algorithm accepts multi-speaker dialogue speech data, estimates the number of speakers and assigns each speech segment to its speaker. Preliminary results are described.
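
One common way to realize the unsupervised setting, sketched below with loud hedging since the paper's actual algorithm is not given here, is to cluster per-segment feature summaries and let a distance threshold (tuned on held-out dialogues) determine the number of speakers.

```python
# Unsupervised speaker clustering over pre-cut speech segments (illustrative).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_segments(segment_feats, threshold=25.0):
    """segment_feats: list of (n_frames, n_dims) arrays, one per segment."""
    X = np.stack([f.mean(axis=0) for f in segment_feats])
    clus = AgglomerativeClustering(n_clusters=None,
                                   distance_threshold=threshold)
    labels = clus.fit_predict(X)
    return labels, labels.max() + 1  # speaker label per segment, #speakers
```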

Patent
10 Mar 1995
TL;DR: In this paper, a spectrum mapping processing section 22 quantizes the acoustic feature parameters of the voice of a selected speaker stored in a voice data-base 10 based on the inputted character string to be voice synthesized employing the code book of the speaker.
Abstract: PURPOSE: To allow learning with a small amount of learning data and to perform a tone quality conversion with high precision by generating and outputting voices signals of a target speaker corresponding to a character string based on the acoustic feature parameters of the voice signals of the target speaker. CONSTITUTION: A spectrum mapping processing section 22 quantizes the acoustic feature parameters of the voice of a selected speaker stored in a voice data-base 10 based on the inputted character string to be voice synthesized employing the code book of the speaker. Moreover, based on the corresponding relationship between the speaker's code book and the mapping code book, the acoustic parameters of the voice signals of the speaker corresponding to the character string are generated by the section 22. Furthermore, a voice synthesis section 24 generates and outputs the voice signals of the speaker corresponding to the character string based on the acoustic feature parameters of the voice signals of the speaker generated by the section 22. Therefore, the voices for a voice tone quality conversion are allowed to be different and the voice tone quality conversion from learning voices, Japanese and words to English words is accomplished.