
Showing papers on "Speaker diarisation published in 1992"


Patent
12 Feb 1992
TL;DR: In this article, a speaker voice verification system uses temporal decorrelation linear transformation and includes a collector for receiving speech inputs from an unknown speaker claiming a specific identity, a word-level speech features calculator operable to use a temporal decorrelation linear transformation for generating word-level speech feature vectors from such speech inputs, and word-level speech feature storage for storing word-level feature vectors known to belong to a speaker with the specific identity.
Abstract: A speaker voice verification system uses temporal decorrelation linear transformation and includes a collector for receiving speech inputs from an unknown speaker claiming a specific identity, a word-level speech features calculator operable to use a temporal decorrelation linear transformation for generating word-level speech feature vectors from such speech inputs, word-level speech feature storage for storing word-level speech feature vectors known to belong to a speaker with the specific identity, a word-level vector scorer for generating a similarity score by comparing word-level speech feature vectors received from the unknown speaker with those retrieved from the word-level speech feature storage, and speaker verification decision circuitry for determining, based on the similarity score, whether the unknown speaker's identity is the same as that claimed. The word-level vector scorer further includes concatenation circuitry as well as a word-specific orthogonalizing linear transformer. Other systems and methods are also disclosed.
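The patent stays at block-diagram level. As a rough, hypothetical sketch (not the patented circuitry), the following concatenates a word's frame features, applies a generic decorrelating linear transform, and thresholds a distance-based similarity score; the transform matrix, dimensions, and threshold are placeholder assumptions.

```python
import numpy as np

def word_level_vector(frames: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Concatenate the per-frame features of one word (frames is T x d)
    and apply a decorrelating linear transform, yielding a single
    word-level feature vector."""
    x = frames.reshape(-1)        # "concatenation circuitry"
    return transform @ x          # "orthogonalizing linear transformer"

def verify(test_vec, stored_vecs, threshold):
    """Accept the claimed identity when the test vector lies close
    enough to any stored word-level vector for that identity."""
    score = -min(np.linalg.norm(test_vec - v) for v in stored_vecs)
    return score >= threshold
```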

143 citations



PatentDOI
TL;DR: In this article, a speaker verification system which accepts or rejects the claimed identity of an individual based on analysis and measurements of the speaker's utterances is presented; the utterances are elicited by prompting the individual seeking identification to read test phrases, chosen at random by the verification system, composed of words from a small vocabulary.
Abstract: A speaker verification system which accepts or rejects the claimed identity of an individual based on analysis and measurements of the speaker's utterances. The utterances are elicited by prompting the individual seeking identification to read test phrases chosen at random by the verification system composed of words from a small vocabulary. Nearest-neighbor distances between speech frames derived from such spoken test phrases and speech frames of corresponding vocabulary "words" from previously stored utterances of the speaker seeking identification are computed along with distances between such spoken test phrases and corresponding vocabulary words for a set of reference speakers. The claim for identification is accepted or rejected based on the relationship among such distances and a predetermined threshold value.
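A minimal sketch of the nearest-neighbor scoring idea follows; the exact features, alignment, and decision rule are not given above, so the cohort-comparison rule and `margin` parameter are assumptions.

```python
import numpy as np

def nn_distance(test_frames: np.ndarray, ref_frames: np.ndarray) -> float:
    """Mean distance from each test frame to its nearest neighbor among
    the reference utterance's frames. Both arrays are (T, d)."""
    d2 = ((test_frames[:, None, :] - ref_frames[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1)).mean())

def accept(test, claimant_ref, cohort_refs, margin=0.0):
    """Accept when the claimant's distance beats the best reference
    speaker's distance by at least `margin` (a stand-in decision rule)."""
    d_claim = nn_distance(test, claimant_ref)
    d_cohort = min(nn_distance(test, r) for r in cohort_refs)
    return d_claim + margin < d_cohort
```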

46 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: Using the original method developed by Laforia, a series of text-independent speaker recognition experiments, characterized by long-term multivariate autoregressive modeling, gives first-rate results without using more than one sentence.
Abstract: Two models of the spectral evolution of speech signals, temporal decomposition and multivariate linear prediction, capable of processing some aspects of speech variability are presented. A series of acoustic-phonetic decoding experiments, characterized by the use of spectral targets of the temporal decomposition technique and a speaker-dependent mode, gives good results compared to a reference system (i.e., 70% vs. 60% for the first choice). Using the original method developed by Laforia, a series of text-independent speaker recognition experiments, characterized by long-term multivariate autoregressive modeling, gives first-rate results (i.e., a 98.4% recognition rate for 420 speakers) without using more than one sentence. Taking into account the interpretation of the models, these results show how interesting kinematic models are for obtaining a reduced variability of the speech signal representation.
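The abstract does not spell out the AR formulation. As a first-order sketch (the actual systems use higher-order multivariate AR models of the spectral-vector sequence), a model per speaker can be fit by least squares and test utterances scored by their one-step prediction residual:

```python
import numpy as np

def fit_var1(X: np.ndarray) -> np.ndarray:
    """Least-squares fit of a first-order vector AR model
    x_t ~ A @ x_{t-1} to a sequence of spectral vectors X (T, d)."""
    W, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)   # X[:-1] @ W ~ X[1:]
    return W.T                                           # A = W.T

def residual_score(X: np.ndarray, A: np.ndarray) -> float:
    """Mean squared one-step prediction error of a speaker's model on a
    test sequence; the model with the smallest error picks the speaker."""
    pred = X[:-1] @ A.T
    return float(np.mean(np.sum((X[1:] - pred) ** 2, axis=1)))
```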

30 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: In this article, a text-independent speaker recognition method using predictive neural networks is described, where an ergodic model which allows transitions to any other state is adopted as the speaker model and one predictive neural network is assigned to each state.
Abstract: A text-independent speaker recognition method using predictive neural networks is described. The speech production process is regarded as a nonlinear process, so the speaker individuality in the speech signal also includes nonlinearity. Therefore, the predictive neural network, which is a nonlinear prediction model based on multilayer perceptrons, is expected to be a more suitable model for representing speaker individuality. For text-independent speaker recognition, an ergodic model which allows transitions to any other state is adopted as the speaker model and one predictive neural network is assigned to each state. The proposed method was compared to distortion-based methods, hidden Markov model (HMM)-based methods, and a discriminative neural-network-based method through text-independent speaker recognition experiments on 24 female speakers. The proposed method gave the highest recognition accuracy of 100.0% and the effectiveness of the predictive neural networks for representing speaker individuality was clarified.
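A minimal sketch of a predictive neural network follows, assuming one-frame prediction with a single hidden layer; the paper assigns one trained net per ergodic-model state, whereas this toy class is untrained and standalone.

```python
import numpy as np

rng = np.random.default_rng(0)

class PredictiveNet:
    """One-hidden-layer perceptron that predicts frame x_t from x_{t-1}.
    One such net would be assigned to each state of an ergodic speaker
    model; a single net is shown here for brevity (weights untrained)."""
    def __init__(self, dim: int, hidden: int = 16):
        self.W1 = rng.normal(0.0, 0.1, (hidden, dim))
        self.W2 = rng.normal(0.0, 0.1, (dim, hidden))

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.W2 @ np.tanh(self.W1 @ x)

    def error(self, frames: np.ndarray) -> float:
        """Accumulated prediction error over an utterance; the speaker
        whose nets yield the smallest total error is selected."""
        return float(sum(np.sum((self.predict(frames[t - 1]) - frames[t]) ** 2)
                         for t in range(1, len(frames))))
```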

28 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: An approach to text-independent speaker verification is presented that uses a two-stage classifier: the first stage is a speaker-independent phoneme detector trained to recognize a phoneme that is distinctive from speaker to speaker.
Abstract: Text-independent speaker verification systems typically depend upon averaging over a long utterance to obtain a feature set for classification. However, not all speech is equally suited to the task of speaker verification. An approach to text-independent speaker verification that uses a two-stage classifier is presented. The first stage consists of a speaker-independent phoneme detector trained to recognize a phoneme that is distinctive from speaker to speaker. The second stage is trained to recognize the frames of speech from the target speaker that are admitted by the phoneme detector. A common feature vector based on the linear predictive coding (LPC) cepstrum is projected in different directions for each of these pattern recognition tasks. Results of tests using the described speaker verification system are shown.
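A hypothetical sketch of the two-stage decision follows; the detector and scorer are stand-in callables, and the 0.5 gate and averaging rule are assumptions not taken from the paper.

```python
import numpy as np

def verify(frames, P_det, P_spk, phoneme_detector, speaker_scorer, theta):
    """Two-stage verification: each LPC-cepstrum frame is seen through a
    task-specific linear projection; only frames the phoneme detector
    admits are scored for the target speaker. `phoneme_detector` and
    `speaker_scorer` are stand-in callables returning probabilities."""
    admitted = [f for f in frames if phoneme_detector(P_det @ f) > 0.5]
    if not admitted:
        return False                         # no distinctive phoneme found
    score = np.mean([speaker_scorer(P_spk @ f) for f in admitted])
    return score > theta
```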

26 citations


Proceedings ArticleDOI
Jerome R. Bellegarda, P.V. de Souza, A. Nadas, David Nahamoo, Michael Picheny, Lalit R. Bahl
23 Mar 1992
TL;DR: An adaptation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker is described, which results in a robust speaker adaptation procedure which allows for a drastic reduction in the amount of training data required from the new speaker.
Abstract: In a large vocabulary speech recognition system, it is desirable to make use of previously acquired speech data when encountering new speakers. The authors describe an adaptation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker. This speaker-normalizing mapping is used to transform the previously acquired parameters of the reference speaker onto the space of the new speaker. This results in a robust speaker adaptation procedure which allows for a drastic reduction in the amount of training data required from the new speaker. The performance of this method is illustrated on an isolated utterance speech recognition task with a vocabulary of 20000 words.
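As an illustrative sketch of a piecewise linear mapping (not the authors' exact estimation procedure), the feature space can be partitioned by nearest prototype and one least-squares linear map fitted per region, assuming time-aligned feature pairs are already available:

```python
import numpy as np

def fit_piecewise_map(ref_feats, new_feats, prototypes):
    """Fit one least-squares linear map per region of the reference
    feature space; regions are defined here by nearest prototype, and
    ref_feats/new_feats are assumed already time-aligned (T, d)."""
    region = np.argmin(
        ((ref_feats[:, None, :] - prototypes[None]) ** 2).sum(-1), axis=1)
    maps = {}
    for k in range(len(prototypes)):
        R, N = ref_feats[region == k], new_feats[region == k]
        if len(R) >= R.shape[1]:             # enough frames for a stable fit
            maps[k], *_ = np.linalg.lstsq(R, N, rcond=None)
    return maps

def map_frame(x, prototypes, maps):
    """Transform one reference-speaker frame into the new speaker's space."""
    k = int(np.argmin(((prototypes - x) ** 2).sum(-1)))
    return x @ maps[k] if k in maps else x
```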

24 citations



01 Jan 1992
TL;DR: New perceptually based features were found which, unfortunately, did not outperform traditional speech production features with respect to speaker identification errors; the main contribution is a new information-theoretic shape measure between line spectrum pair (LSP) frequency features.
Abstract: Scope and method of study. This work derives and demonstrates new and powerful features and measures for automatic speaker recognition and compares them with traditional ones. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. Speaker recognition systems can identify a particular person or verify a person's claimed identity. The scope of this study is limited to speech collected from cooperative users in office environments and without adverse microphone or channel impairments. The success of these systems depends directly upon the power of the features and measures used to discriminate among people. The focus of this research is to discover powerful features and measures for speaker verification. After a thorough literature review, concepts were synthesized from such diverse fields as signal processing, information theory, pattern recognition, physiology, and speech production and perception. The most promising innovations were then compared analytically and by computer simulation. Findings and conclusions. New perceptually based features were found which, unfortunately, did not outperform traditional speech production features with respect to speaker identification errors. Powerful new production features and measures for speaker verification were discovered. The main contribution is a new information theoretic shape measure between line spectrum pair (LSP) frequency features. This new measure, the divergence shape, can be interpreted geometrically as the shape of an information theoretic measure called divergence. The LSPs were found to be very effective features in this divergence shape measure. The experimental results show this combination yields 0.05% speaker identification error, which is superior by over an order of magnitude to the performance of any other claim reported in the literature.
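The divergence shape itself is not written out above; the sketch below uses one standard formulation, the covariance-only term of the symmetric Gaussian divergence, estimated directly from two sets of LSP vectors (taking this formulation to be the intended one is an assumption):

```python
import numpy as np

def divergence_shape(X1: np.ndarray, X2: np.ndarray) -> float:
    """Covariance-only ("shape") term of the symmetric Gaussian
    divergence between two sets of LSP feature vectors (rows = frames):
    0.5 * tr[(C1 - C2) (C2^-1 - C1^-1)]."""
    C1 = np.cov(X1, rowvar=False)
    C2 = np.cov(X2, rowvar=False)
    return 0.5 * float(np.trace(
        (C1 - C2) @ (np.linalg.inv(C2) - np.linalg.inv(C1))))
```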

17 citations


Proceedings Article
01 Jan 1992
TL;DR: A new text-independent speaker recognition method is proposed that uses a model of the spectral evolution of the speech signals capable of processing some aspects of the inter-speaker variability: the AR-Vector models.
Abstract: In this paper, a new text-independent speaker recognition method is proposed. This method uses a model of the spectral evolution of the speech signals which is capable of processing some aspects of the inter-speaker variability: the AR-Vector models. Some inter-speaker measures are presented and their advantages and drawbacks are discussed. A training technique to learn discriminant AR-Vector models is proposed. The evaluation of this method is carried out on the TIMIT database, recorded by cooperative speakers without any impostors. A series of text-independent speaker identification experiments is described. There is no specific corpus for the training sentences, and the training corpus is different from the test corpus. Two speech qualities are tested (i.e., good quality and phone quality). The experiments with good speech quality give first-rate results (i.e., an identification rate of 100% for 420 speakers) without using more than two sentences for each test.

17 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The CPAM approach is shown to perform better than a vector quantization based approach in text-independent speaker recognition, and as well as the text-dependent, conventional, continuous mixture HMM approach with significant representation efficiency.
Abstract: A continuous probabilistic acoustic map (CPAM) approach to speaker recognition is investigated. In the CPAM formulation, the speech input of a speaker is parameterized as a mixture of tied, universal probability density functions (PDFs), with either a CPAM model alone for text-independent operation or a CPAM-based hidden Markov model (HMM) for text-dependent operation. A continuously spoken digit database of 20 speakers (10 M, 10 F) is used to evaluate the CPAM approach in both identification and verification performance. The CPAM approach is shown to perform better than a vector quantization based approach in text-independent speaker recognition, and as well as the text-dependent, conventional, continuous mixture HMM approach with significantly greater representation efficiency. In particular, the CPAM-based HMM achieves an identification error rate of 1.7% and a verification equal-error rate of 4.0% with a CPAM of 128 PDFs, while a conventional continuous mixture HMM needs 400 PDFs to achieve corresponding error rates of 1.9% and 4.0% using the same combined cepstral features and three-digit test utterances.
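A minimal sketch of scoring under tied, universal PDFs follows, assuming diagonal-covariance Gaussians; in a CPAM-style model only the mixture weights are speaker-specific, so identification reduces to comparing this log-likelihood across speakers' weight vectors:

```python
import numpy as np

def tied_mixture_loglik(X, means, variances, weights):
    """Log-likelihood of frames X (T, d) under a mixture of tied,
    universal diagonal-covariance Gaussians: means/variances (K, d) are
    shared across speakers, only the weights (K,) are speaker-specific."""
    diff2 = ((X[:, None, :] - means[None]) ** 2 / variances[None]).sum(-1)
    log_comp = (-0.5 * np.log(2 * np.pi * variances).sum(-1)[None]
                - 0.5 * diff2 + np.log(weights)[None])        # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                   # logsumexp
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).sum())
```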

PatentDOI
Sanada Toru, Shinta Kimura
TL;DR: In this article, a speaker-adapted speech recognition system is described comprising a plurality of speakers' acoustic templates for managing correspondence between acoustic features of speech and its content, a converting portion for converting the acoustic features managed by the acoustic templates according to a set parameter, and a learning portion for learning the parameter at which the acoustic features of the acoustic template, as converted by the converting portion, approximately coincide with those of a corresponding speech input for learning.
Abstract: A speaker adapted speech recognition system achieving a high recognition rate for an unknown speaker comprises a plurality of speakers' acoustic templates for managing correspondence between an acoustic feature of the speech and a content of the speech; a converting portion for converting the acoustic feature of the speech managed by the acoustic templates according to a set parameter; a learning portion for learning the parameter at which the acoustic feature of the acoustic template, as converted by the converting portion, is approximately coincident with the acoustic feature of a corresponding speech input for learning, when the speech input for learning is provided; and a selection portion for selecting one or more of the acoustic templates whose converted acoustic features are closest to that of a speech input for selection, by comparing the corresponding acoustic feature of the speech input for selection with the acoustic features converted by the converting portion, when the speech input for selection is provided. An acoustic template for the unknown speaker is created by converting, with the converter, the acoustic features of the acoustic templates of the speakers selected by the selection portion, and the content of the unknown speaker's speech input is recognized using the created acoustic template.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: It was shown that the two-way classifiers can be combined to achieve 100% speaker identification performance for large speaker populations.
Abstract: The N-way speaker identification task is partitioned into N*(N-1)/2 binary-pair classifications. The binary-pair classifications are performed with small neural nets, each trained to make independent binary decisions on small fragments of speech data. Three issues were investigated concerning optimally combining a large number of fragmentary binary decisions into a single N-way decision: (1) incorporating speech energy and phonetic content information to compute an improved probability measure at the individual speech frame level; (2) combining binary frame-level decisions into a binary segment-level decision; and (3) combining the binary segment-level decisions into a single N-way segment level decision. It was shown that the two-way classifiers can be combined to achieve 100% speaker identification performance for large speaker populations.
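The paper investigates several combination rules; the sketch below shows one simple, assumed variant that sums per-pair log-probabilities into speaker-level evidence:

```python
import numpy as np

def nway_decision(pair_probs: dict, n: int) -> int:
    """Combine the N*(N-1)/2 binary-pair outputs into one N-way
    decision. pair_probs[(i, j)] is the segment-level probability that
    the speech came from speaker i rather than j (i < j); summing
    log-probabilities per speaker is one simple combination rule."""
    score = np.zeros(n)
    for (i, j), p in pair_probs.items():
        p = float(np.clip(p, 1e-6, 1.0 - 1e-6))
        score[i] += np.log(p)
        score[j] += np.log(1.0 - p)
    return int(np.argmax(score))
```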

Proceedings ArticleDOI
23 Feb 1992
TL;DR: A speaker-independent normalization network is constructed such that speaker variation effects can be minimized; performance evaluation showed that the speaker-normalized front end reduced the error rate by 15% on the DARPA Resource Management speaker-independent speech recognition task.
Abstract: For speaker-independent speech recognition, speaker variation is one of the major error sources. In this paper, a speaker-independent normalization network is constructed such that speaker variation effects can be minimized. To achieve this goal, multiple speaker clusters are constructed from the speaker-independent training database. A codeword-dependent neural network is associated with each speaker cluster. The cluster that contains the largest number of speakers is designated as the golden cluster. The objective function is to minimize distortions between acoustic data in each cluster and the golden speaker cluster. Performance evaluation showed that the speaker-normalized front end reduced the error rate by 15% for the DARPA Resource Management speaker-independent speech recognition task.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A procedure for text-independent speaker identification in noisy environments where the interfering background signals cannot be characterized using traditional broadband or impulsive noise models is examined.
Abstract: A procedure for text-independent speaker identification in noisy environments where the interfering background signals cannot be characterized using traditional broadband or impulsive noise models is examined. In the procedure, both the speaker and the background processes are modeled using mixtures of Gaussians. Speaker and background models are integrated into a unified statistical framework allowing the decoupling of the underlying speech process from the noise corrupted observations via the expectation-maximization algorithm. Using this formalism, speaker model parameters are estimated in the presence of the background process, and a scoring procedure is implemented for computing the speaker likelihood in the noise corrupted environment. The performance was evaluated using a 16-speaker conversational speech database with both speech babble and white noise background processes.
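As a crude, assumed stand-in for the paper's EM-based decoupling (which estimates speaker parameters jointly with the background process), the sketch below scores each frame under whichever of the speaker or background GMM explains it better:

```python
import numpy as np

def gmm_frame_loglik(X, means, variances, weights):
    """Per-frame log-density under a diagonal-covariance GMM."""
    diff2 = ((X[:, None, :] - means[None]) ** 2 / variances[None]).sum(-1)
    log_comp = (-0.5 * np.log(2 * np.pi * variances).sum(-1)[None]
                - 0.5 * diff2 + np.log(weights)[None])
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))

def score_with_background(X, speaker_gmm, background_gmm):
    """Let each frame be explained by whichever model (speaker or
    background) fits it better -- a rough approximation of decoupling
    the speech process from the noise-corrupted observations."""
    ls = gmm_frame_loglik(X, *speaker_gmm)
    lb = gmm_frame_loglik(X, *background_gmm)
    return float(np.maximum(ls, lb).sum())
```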

Journal ArticleDOI
TL;DR: This paper investigates text-independent speaker verification, which involves determining whether or not a test utterance belongs to a specific reference speaker; the information that must be stored in the reference templates differs from the text-dependent case.


Book ChapterDOI
01 Jan 1992
TL;DR: A large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories is described, and the methods used to provide high word recognition accuracy are discussed, focusing on the techniques adopted to select the set of fundamental speech units and to provide the acoustic models of these sub-word units based on a continuous density HMM (CDHMM) framework.
Abstract: The field of large vocabulary continuous speech recognition has advanced to the point where there are several systems capable of providing greater than 95% word accuracy for speaker independent recognition, of a 1000 word vocabulary, spoken fluently for a task with a perplexity of about 60. There are several factors which account for the high performance achieved by these systems, including the use of effective feature analysis, the use of hidden Markov model (HMM) methodology, the use of context-dependent sub-word units to capture intra-word and inter-word phonemic variations, and the use of corrective training techniques to emphasize differences between acoustically similar words in the vocabulary. In this paper we describe a large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories, and discuss the methods used to provide high word recognition accuracy. In particular we focus our discussion on the techniques adopted to select the set of fundamental speech units and to provide the acoustic models of these sub-word units based on a continuous density HMM (CDHMM) framework. Different modeling approaches, such as a discrete HMM and a tied-mixture HMM, will also be discussed and compared to the CDHMM approach.

Patent
11 Jul 1992
TL;DR: In this article, a speech recognition method adapts to a new, unknown speaker using statistical modelling of word sub-units (hidden Markov model recognition) by transforming characteristic vectors of the new speaker and a reference speaker into a common space.
Abstract: The speech recognition method adapting to a new unknown speaker uses statistical modelling of word sub-units (hidden Markov model recognition). The method is carried out by transformation of characteristic vectors of the new speaker and a reference speaker. Multi-dimensional distribution functions are used in place of quantised character vectors of a reference speaker. The characteristic vectors of the new speaker and a reference speaker are transformed into a common characteristic space. To calculate the necessary transformation matrices, the new speaker repeats some predetermined words in a training phase. USE/ADVANTAGE - Speech recognition via telephone, e.g. for automatic recognition systems in vehicles. Quick recognition for large vocabularies.


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors address the problem of speaker recognition using very short utterances, both for training and for recognition, using a nonlinear vectorial interpolation technique to exploit speaker-specific correlations between two suitably defined parameter vector sequences.
Abstract: The authors address the problem of speaker recognition using very short utterances, both for training and for recognition. The authors propose to exploit speaker-specific correlations between two suitably defined parameter vector sequences. A nonlinear vectorial interpolation technique is used to capture speaker-specific information, through least-square-error minimization. The experiments show the feasibility of recognizing a speaker among a population of about 100 persons using only an utterance of one word both for training and for recognition.

Proceedings Article
30 Nov 1992
TL;DR: A Gender Dependent Neural Network (GDNN) is discussed which can be tuned for each gender while sharing most of the speaker-independent parameters; it uses a classification network to help generate gender-dependent phonetic probabilities for a statistical (HMM) recognition system.
Abstract: We would like to incorporate speaker-dependent consistencies, such as gender, in an otherwise speaker-independent speech recognition system. In this paper we discuss a Gender Dependent Neural Network (GDNN) which can be tuned for each gender, while sharing most of the speaker independent parameters. We use a classification network to help generate gender-dependent phonetic probabilities for a statistical (HMM) recognition system. The gender classification net predicts the gender with high accuracy, 98.3% on a Resource Management test set. However, the integration of the GDNN into our hybrid HMM-neural network recognizer provided an improvement in the recognition score that is not statistically significant on a Resource Management test set.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Using the trained model and a brief, unconstrained sample of a new speaker's voice, the system produces a speaker voice code that can be used to adapt a recognition system to the new speaker without retraining.
Abstract: SVCnet, a system for modeling speaker variability, is presented. Encoder neural networks specialized for each speech sound produce low-dimensionality models of acoustical variation, and these models are further combined into an overall model of voice variability. A training procedure is described which minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a speaker voice code that can be used to adapt a recognition system to the new speaker without retraining. A system which combines SVCnet with a MS-TDNN recognizer is described.


01 Jan 1992
TL;DR: A modification of semi-continuous codebook updating is introduced which allows rapid speaker adaptation, based on the idea that phonetic information already incorporated in a trained model should be used to update the codebook.
Abstract: This paper presents a new approach to speaker adaptation based on semi-continuous hidden Markov models (SCHMM). We introduce a modification of the semi-continuous codebook updating which allows rapid speaker adaptation. The approach is based on the idea that phonetic information already incorporated in a trained model should be used to update the codebook. Thus the different acoustic representation of a new speaker is learned while the connection between codebook entries and model states remains the same. Several experiments were carried out with a small speech sample. It is possible to demonstrate that the new codebook updating performs better than conventional SCHMM codebook updating and that a speech sample comprising about 40 seconds of adaptation speech is enough to achieve 50 percent of the difference in performance between full speaker-dependent training and no adaptation at all.
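A minimal sketch of the codebook updating idea follows, assuming frame-level posteriors `gamma` computed with the existing trained model; only the codebook means move, so the connection between codebook entries and model states is preserved:

```python
import numpy as np

def update_codebook(means, frames, gamma, min_count=1e-3):
    """Re-estimate SCHMM codebook means from adaptation speech.
    gamma[t, k] is the posterior of codebook entry k at frame t,
    computed with the existing trained model, so the association
    between codebook entries and HMM states is left untouched."""
    counts = gamma.sum(axis=0)                        # (K,)
    new_means = (gamma.T @ frames) / np.maximum(counts, min_count)[:, None]
    unseen = counts < min_count                       # keep old means where
    new_means[unseen] = means[unseen]                 # no data was observed
    return new_means
```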



01 Jan 1992
TL;DR: A speaker-independent continuous speech recognition system based on phoneme-level hidden Markov models is described, configured to recognise continuously spoken airborne reconnaissance reports, a task which involves a vocabulary of approximately 500 words.
Abstract: This memorandum describes the development of a speaker-independent continuous speech recognition system based on phoneme-level hidden Markov models. The system is configured to recognise continuously spoken airborne reconnaissance reports, a task which involves a vocabulary of approximately 500 words. On a test set of speech from 80 male subjects, the final system achieves a word accuracy of 74.1% with no explicit syntactic constraints.

Proceedings ArticleDOI
30 Aug 1992
TL;DR: This paper reports on a speaker-independent continuous speech recognition system with speaker adaptation, based on continuous HMMs with mixture Gaussian distributions; several fast algorithms are applied for real-time calculation.
Abstract: Reports on a speaker-independent continuous speech recognition system with speaker adaptation. This system is based on continuous HMMs with mixture Gaussian distributions, and several fast algorithms are applied for real-time calculation. To reduce the number of HMM states, the system uses a mono-state context-dependent model, called the acoustic phonetic segment. The calculation of mixture Gaussian distributions is reduced by varying the number of mixtures dynamically. A fast Viterbi calculation algorithm with duration control is used. The system has been successfully implemented as a man-machine interface for a plant control expert system, achieving a sentence accuracy of 98.7%.