
Showing papers on "Speaker recognition published in 1987"


PatentDOI
TL;DR: In this paper, confusion coefficients between the labels of the label alphabet for initial training and those for adaptation are determined by alignment of adaptation speech with the corresponding initially trained Markov model.

Abstract: For circumstance adaptation, for example speaker adaptation, confusion coefficients between the labels of the label alphabet for initial training and those for adaptation are determined by aligning adaptation speech with the corresponding initially trained Markov model. That is, each piece of adaptation speech is aligned with a corresponding initially trained Markov model by the Viterbi algorithm, and each label in the adaptation speech is mapped onto one of the states of the Markov models. For each adaptation label ID, the parameter values for each initial training label of the states mapped onto the adaptation label in question are accumulated and normalized to generate a confusion coefficient between each initial training label and each adaptation label. The parameter table of each Markov model is then rewritten in terms of the adaptation label alphabet using the confusion coefficients.

204 citations
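The accumulate-and-normalize step described in the abstract above can be sketched as follows. This is only an illustration: the function name, the use of precomputed alignment triples, and the per-pair weights are assumptions, since the patent's exact parameter-accumulation procedure is not spelled out here.

```python
import numpy as np

# Illustrative sketch (not the patent's exact procedure): accumulate
# evidence that initial-training label i co-occurs with adaptation
# label j, given (train_label, adapt_label, weight) triples obtained
# from a Viterbi alignment that is assumed to be precomputed.
def confusion_coefficients(alignment_triples, n_train_labels, n_adapt_labels):
    counts = np.zeros((n_train_labels, n_adapt_labels))
    for train_label, adapt_label, weight in alignment_triples:
        counts[train_label, adapt_label] += weight
    # Normalize per adaptation label, so each column gives a
    # distribution over initial-training labels.
    col_sums = counts.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0.0] = 1.0
    return counts / col_sums
```

Each column of the result is then usable as the confusion coefficients for rewriting a parameter table in terms of the adaptation label alphabet.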


Proceedings ArticleDOI
06 Apr 1987
TL;DR: Three different approaches for automatically segmenting speech into phonetic units are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach.

Abstract: For large vocabulary and continuous speech recognition, the sub-word-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing a large inventory of subword units, automatic segmentation is preferable to manual segmentation, as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach. An evaluation of the performance of the automatic segmentation methods is given.

156 citations


Journal ArticleDOI
TL;DR: This work has shown that the speech material used for speaker recognition can be either text dependent (constrained text) or text independent (free text).
Abstract: Automatic speaker recognition has long been an interesting and challenging problem to speech researchers.1−10 The problem, depending on the nature of the final task, can be classified into two different categories: speaker verification and speaker identification. In a speaker verification task, the recognizer is asked to verify an identity claim made by an unknown speaker and a decision to reject or accept the identity claim is made. In a speaker identification task, the recognizer is asked to decide which out of a population of N speakers is best classified as the unknown speaker. The decision may include a choice of “no classification” (i.e., a choice that the specific speaker is not in a given closed set of speakers). The input speech material used for speaker recognition can be either text dependent (constrained text) or text independent (free text).

138 citations
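The two task categories described above reduce to a simple pair of decision rules; this is a minimal sketch, with the function names and the optional rejection threshold for the "no classification" choice as illustrative assumptions.

```python
import numpy as np

def verify(score, threshold):
    """Speaker verification: accept or reject an identity claim by
    comparing a match score against a threshold (higher = better)."""
    return score >= threshold

def identify(scores, rejection_threshold=None):
    """Closed-set speaker identification over N speakers, with an
    optional 'no classification' outcome as described above."""
    best = int(np.argmax(scores))
    if rejection_threshold is not None and scores[best] < rejection_threshold:
        return None  # the unknown speaker is judged outside the set
    return best
```

Passing a `rejection_threshold` turns the closed-set identifier into the open-set variant the abstract mentions, where the speaker may be declared outside the enrolled population.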


PatentDOI
TL;DR: Speaker verification is performed by computing principal components of a fixed text statement comprising a speaker identification code and a two-word phrase, and principal spectral components of a random word phrase.
Abstract: Speaker verification is performed by computing principal components of a fixed text statement comprising a speaker identification code and a two-word phrase, and principal spectral components of a random word phrase. A multi-phrase strategy is utilized in access control to allow successive verification attempts in a single session, if the speaker fails initial attempts. Based upon a verification attempt, the system produces a verification score which is compared with a threshold value. On successive attempts, the criterion for acceptance is changed, and one of a number of criteria must be satisfied for acceptance in subsequent attempts. A speaker normalization function can also be invoked to modify the verification score of persons enrolled with the system who inherently produce scores which result in denial of access. Accuracy of the verification system is enhanced by updating the reference template which then more accurately symbolizes the person's speech signature.

79 citations
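A toy sketch of the multi-phrase acceptance strategy above: each successive attempt in a session is judged against its own criterion. The per-attempt thresholds and the higher-score-is-better convention are illustrative stand-ins, since the patent's actual criteria are not detailed in the abstract.

```python
def multi_attempt_verify(scores, thresholds):
    """Accept as soon as any attempt's verification score meets that
    attempt's criterion (higher score = better match here); otherwise
    deny access after all attempts are exhausted."""
    for attempt, (score, threshold) in enumerate(zip(scores, thresholds), start=1):
        if score >= threshold:
            return True, attempt  # accepted on this attempt
    return False, len(scores)
```

Relaxing or combining the later thresholds reproduces the idea of a changing criterion for acceptance on subsequent attempts.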


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new algorithm is introduced that transforms hidden Markov models of speech derived from one "prototype" speaker so that they model the speech of a new speaker in the form of a probabilistic spectral mapping.
Abstract: This paper deals with rapid speaker adaptation for speech recognition. We introduce a new algorithm that transforms hidden Markov models of speech derived from one "prototype" speaker so that they model the speech of a new speaker. Speaker normalization is accomplished by a probabilistic spectral mapping from one speaker to another. For a 350-word task with a grammar, and using only 15 seconds of speech for normalization, the recognition accuracy is 97% averaged over 6 speakers. This accuracy would normally require over 5 minutes of speaker-dependent training. We derive the probabilistic spectral transformation of HMMs, describe an algorithm to estimate the transformation, and present recognition results.

76 citations
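The core transformation can be stated compactly: each HMM state's output distribution over the prototype speaker's spectral labels is pushed through a speaker-to-speaker probability matrix. This sketch assumes discrete output distributions and a row-stochastic mapping matrix; the paper's estimation algorithm for that matrix is not reproduced here.

```python
import numpy as np

def transform_output_distributions(b_proto, spectral_map):
    """b_proto:     (n_states, K) output distributions of the
                    prototype speaker's HMM states
    spectral_map:   (K, K') row-stochastic matrix; entry [k, k'] ~
                    P(new speaker's label k' | prototype label k)
    returns:        (n_states, K') transformed distributions,
                    b_new[s, k'] = sum_k b_proto[s, k] * P(k' | k)"""
    return b_proto @ spectral_map
```

Because the mapping matrix is row-stochastic, each transformed state distribution remains a valid probability distribution.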


PatentDOI
TL;DR: In this article, an apparatus operates to identify the speech signal of an unknown speaker as one of a finite number of speakers, each speaker is modeled and recognized with any example of their speech, and the output is a list of scores that measure how similar the input speaker is to each of the speakers whose models are stored in the system.
Abstract: An apparatus operates to identify the speech signal of an unknown speaker as one of a finite number of speakers. Each speaker is modeled and recognized with any example of their speech. The input to the system is analog speech and the output is a list of scores that measure how similar the input speaker is to each of the speakers whose models are stored in the system. The system includes front end processing means which is responsive to the speech signal to provide digitized samples of the speech signal at an output which are stored in a memory. The stored digitized samples are then retrieved and divided into frames. The frames are processed to provide a series of speech parameters indicative of the nature of the speech content in each of the frames. The processor for producing the speech parameters is coupled to either a speaker modeling means, whereby a model for each speaker is provided and consequently stored, or a speaker recognition mode, whereby the speech parameters are again processed with current parameters and compared with the stored parameters during each speech frame. The comparison is accomplished over a predetermined number of frames whereby a favorable comparison is indicative of a known speaker for which a model is stored.

65 citations


Journal ArticleDOI
TL;DR: Several vector quantization approaches to the problem of text-dependent speaker verification are described, in which a source codebook is designed to represent a particular speaker saying a particular utterance; the same utterance is later spoken by a speaker to be verified and is encoded in the source codebook representing the speaker whose identity was claimed.
Abstract: Several vector quantization approaches to the problem of text-dependent speaker verification are described. In each of these approaches, a source codebook is designed to represent a particular speaker saying a particular utterance. Later, this same utterance is spoken by a speaker to be verified and is encoded in the source codebook representing the speaker whose identity was claimed. The speaker is accepted if the verification utterance's quantization distortion is less than a prespecified speaker-specific threshold. The best approach achieved a 0.7 percent false acceptance rate and a 0.6 percent false rejection rate on a speaker population comprising 16 admissible speakers and 111 casual imposters. The approaches are described, and detailed experimental results are presented and discussed.

58 citations
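The accept/reject rule above is simple to state in code. This sketch assumes Euclidean distortion and feature frames supplied as numpy arrays, both assumptions, since the paper's exact features and distance measure are not given in the abstract.

```python
import numpy as np

def vq_distortion(frames, codebook):
    """Average quantization distortion when encoding feature frames
    (n_frames, dim) in a speaker's codebook (n_codewords, dim)."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.min(axis=1).mean()  # nearest-codeword distortion per frame

def vq_verify(frames, claimed_codebook, threshold):
    """Accept the identity claim if the utterance's distortion in the
    claimed speaker's codebook is below that speaker-specific threshold."""
    return vq_distortion(frames, claimed_codebook) < threshold
```

An impostor's frames quantize poorly in the claimed speaker's codebook, raising the distortion above the threshold.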


Journal ArticleDOI
M. Bush1, G. Kopec
TL;DR: A system for speaker-independent connected digit recognition is described in which explicit acoustic-phonetic features and constraints play a significant role and the best configurations of the recognizer achieve string recognition accuracies.
Abstract: A system for speaker-independent connected digit recognition is described in which explicit acoustic-phonetic features and constraints play a significant role. The digit vocabulary is modeled using a finite-state pronunciation network whose branches correspond to meaningful acoustic-phonetic units. Each branch is associated with an acoustic pattern matcher which employs a combination of whole-spectrum and feature-based metrics. The system has been evaluated using 17,000 utterances from the Texas Instruments (TI) multidialect, connected digits database. The best configurations of the recognizer achieve string recognition accuracies of approximately 96 and 97 percent when the length of the input string is unknown and known, respectively, and when different talkers are used for training and testing.

46 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: The problem addressed by this study is the suppression of an undesired talker when two talkers are communicating simultaneously on the same monophonic channel (co-channel speech).
Abstract: The problem addressed by this study is the suppression of an undesired talker when two talkers are communicating simultaneously on the same monophonic channel (co-channel speech). Two different applications are considered: improved intelligibility for human listeners, and improved performance for automatic speech and speaker recognition (ASR) systems. For the human intelligibility problem, the desired talker is the weaker of the two signals, with voice-to-voice power ratios (Power desired / Power interference), or VVRs, as low as -18 dB. For ASR applications, the desired talker is the stronger of the two signals, with VVRs as low as 5 dB. Signal analysis algorithms have been developed which attempt to separate the co-channel spectrum into components due to the two different (stronger and weaker) talkers.

42 citations
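The voice-to-voice ratio used above is just a power ratio expressed in decibels:

```python
import math

def voice_to_voice_ratio_db(power_desired, power_interference):
    """VVR = 10 * log10(P_desired / P_interference), as defined above.
    A VVR of -18 dB means the desired talker is much weaker than the
    interferer; +5 dB means the desired talker is somewhat stronger."""
    return 10.0 * math.log10(power_desired / power_interference)
```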


Journal ArticleDOI
TL;DR: A set of dynamic adaptation procedures for updating expected feature values during recognition using maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis.
Abstract: In this paper, we describe efforts to improve the performance of FEATURE, the Carnegie-Mellon University speaker-independent speech recognition system that classifies isolated letters of the English alphabet, by enabling the system to learn the acoustical characteristics of individual speakers. Even when features are designed to be speaker-independent, it is frequently observed that feature values may vary more from speaker to speaker for a single letter than they vary from letter to letter. In these cases, it is necessary to adjust the system's statistical description of the features of individual speakers to obtain improved recognition performance. This paper describes a set of dynamic adaptation procedures for updating expected feature values during recognition. The algorithm uses maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis. The MAP estimation algorithm makes use of both knowledge of the observations input to the system from an individual speaker and the relative variability of the features' means within and across all speakers. In addition, knowledge of the covariance of the features' mean vectors across the various letters enables the system to adapt its representation of similar-sounding letters after any one of them is presented to the classifier. The use of dynamic speaker adaptation improves classification performance of FEATURE by 49 percent after four presentations of the alphabet, when the system is provided with supervised training indicating which specific utterance had been presented to the classifier from a particular user. Performance can be improved by as much as 31 percent when the system is allowed to adapt passively, in an unsupervised learning mode, without any information from individual users.

41 citations
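The MAP mean update described above has a standard conjugate-Gaussian form; this sketch shows that form under the assumption of known variances. The paper's formulation, which also couples similar-sounding letters through the cross-letter covariance of the mean vectors, is richer than this single-feature version.

```python
def map_mean_update(prior_mean, prior_var, obs_mean, obs_var, n_obs):
    """MAP estimate of a feature's mean for one speaker.

    prior_mean, prior_var: across-speaker mean and variance of the
                           feature's mean (the speaker-independent prior)
    obs_mean, obs_var:     sample mean and per-observation variance of
                           this speaker's n_obs observations
    As n_obs grows, the estimate moves from the prior toward the
    speaker's own sample mean."""
    precision_prior = 1.0 / prior_var
    precision_obs = n_obs / obs_var
    return (precision_prior * prior_mean + precision_obs * obs_mean) / (
        precision_prior + precision_obs
    )
```

With zero observations the estimate is the speaker-independent prior mean, matching the system's behavior before any adaptation data arrives.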


Journal ArticleDOI
TL;DR: A semiautomatic design of a speech recognition system can be done as a planning activity and results in the recognition of connected letters spoken by 100 speakers are presented.
Abstract: This paper shows how a semiautomatic design of a speech recognition system can be done as a planning activity. Recognition performances are used for deciding plan refinement. Inductive learning is performed for setting action preconditions. Experimental results in the recognition of connected letters spoken by 100 speakers are presented.

Journal ArticleDOI
TL;DR: In this paper, a Fourcin laryngograph was used to make recordings of three male speakers and the Lx signals were presented to a group of eight listeners, who performed both an AX discrimination and a speaker identification test.
Abstract: Using a Fourcin laryngograph, Lx recordings of three male speakers were made. After manipulation, the Lx signals were presented to a group of eight listeners, who performed both an AX discrimination and a speaker identification test. The results show that the listeners made use of the three parameters varied in the listening tests, viz. speech rhythm, F0 contour and F0 height. Furthermore, the data suggest that the relevance of these different parameters for speaker recognition is speaker-dependent rather than absolute.

Journal ArticleDOI
Lawrence R. Rabiner1, Jay G. Wilpon1
TL;DR: Algorithms based on both template matching (via dynamic time warping (DTW) procedures) and hidden Markov models (HMMs) have been developed which yield high accuracy on several standard vocabularies, including the 10 digits and the set of 26 letters of the English alphabet.

Journal ArticleDOI
TL;DR: The underlying principles of the AVPS algorithm, its implementation, and laboratory test results are described, and the quality of the decrypted speech is considered very natural, and speaker recognition is retained — a significant advantage over digital vocoders.
Abstract: The Analog Voice Privacy System (AVPS) is a voice scrambler that permutes individual output samples from a subband coder analysis filterbank. The system has 125! possible permutation keys, giving it the cryptanalytical strength of a digital encryption system. However, it retains the good voice-quality characteristics of analog scramblers. The AVPS has been implemented in a real-time hardware prototype designed for evaluation in telephone environments and works with any modular telephone and standard 120V ac electrical power. The unit contains two circuit boards — one for analog and one for digital processing — that each use four digital signal processors. To date, we have successfully tested it over long-distance telephone connections, several analog and digital PBXs and telephone switches, and a channel simulator. The quality of the decrypted speech is considered very natural, and speaker recognition is retained — a significant advantage over digital vocoders. This paper describes the underlying principles of the AVPS algorithm, its implementation, and laboratory test results.
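A toy illustration of keyed sample permutation, the core idea above. The real AVPS permutes subband-coder output samples over a block large enough to give a 125!-sized key space; this miniature uses a seeded shuffle purely for illustration.

```python
import random

def scramble(samples, key):
    """Permute a block of samples under a key-derived permutation;
    returns the scrambled block and the permutation for descrambling."""
    perm = list(range(len(samples)))
    random.Random(key).shuffle(perm)
    return [samples[i] for i in perm], perm

def descramble(scrambled, perm):
    """Invert the permutation, recovering the original sample order."""
    out = [None] * len(scrambled)
    for pos, src in enumerate(perm):
        out[src] = scrambled[pos]
    return out
```

Because scrambling only reorders samples, the signal stays in the analog domain's amplitude range, which is why such schemes keep analog-like voice quality while a wrong key yields an unintelligible ordering.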


Proceedings ArticleDOI
01 Apr 1987
TL;DR: The results of an extensive evaluation of a speaker verification system for access control using a 200 speaker population and over 40,000 impostor attempts, both performed on line, over a 4-month period are presented.
Abstract: The results of an extensive evaluation of a speaker verification system for access control are presented. The system employs an algorithm based on the Principal Spectral Components representation derived from the short term spectrum of the speech signal. This system, designed for access control applications, has been evaluated using a 200 speaker population and a total of over 13,000 true speaker attempts and over 40,000 impostor attempts, both performed on line, over a 4-month period. A true speaker rejection rate of less than 1% and an impostor acceptance rate of less than 0.1% have been obtained.

Journal ArticleDOI
TL;DR: A computer-controlled testing system and a set of standard tests are developed to assess the performance of speech recognition devices sold by Texas Instruments, Votan, Dragon, IBM, Interstate, and NEC, demonstrating several reliable performance differences among these systems.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: The results of several evaluations of the utility of the SRB metric as a substitute for human judgement of the goodness of articulation of a whole word are presented.
Abstract: The Indiana Speech Training Aid project (ISTRA) is evaluating the use of speaker-dependent speech recognition to provide feedback for deaf speakers or normal-hearing misarticulating children, to assist them in improving their speech. Ongoing clinical trials of the ISTRA system have demonstrated effective improvement in speech production. The theoretical approach is first to form templates from a child's current best productions of a word and then to use the score generated by matching new utterances to these templates as feedback to indicate the goodness of articulation. This paper presents the results of several evaluations of the utility of the SRB metric as a substitute for human judgement of the goodness of articulation of a whole word. Also, the confusion matrices resulting from recognition of acoustically similar words are discussed in terms of possible modifications of the algorithms.

Proceedings Article
23 Aug 1987
TL;DR: A paradigm for automatic speech recognition using networks of actions performing variable depth analysis is presented and preliminary results in the recognition of isolated letters and digits are presented.
Abstract: A paradigm for automatic speech recognition using networks of actions performing variable depth analysis is presented. The paradigm produces descriptions of speech properties that are related to speech units through Markov models representing system performance. Preliminary results in the recognition of isolated letters and digits are presented.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: Comparison of performance of the two methods shows that a new speaker's codebook is not necessary to represent the new speaker, and a vector quantization approach to speaker adaptation is evaluated.
Abstract: In view of designing a speaker-independent large vocabulary recognition system, we evaluate a vector quantization approach to speaker adaptation. Only one speaker (the reference speaker) pronounces the application vocabulary. He also pronounces a small vocabulary called the adaptation vocabulary. Each new speaker then merely pronounces the adaptation vocabulary. Two adaptation methods are investigated, establishing a correspondence between the codebooks of these two speakers. This allows us to transform the reference utterances of the reference speaker into suitable references for the new speaker. Method I uses a transposed codebook to represent the new speaker during the recognition process whereas Method II uses a codebook which is obtained by clustering on the new speaker's pronunciation of the adaptation vocabulary. Experiments were carried out on a 20-speaker database (10 male, 10 female). The adaptation vocabulary contains 136 words; the application one has 104 words. The mean recognition error rate without adaptation is 22.3% for inter-speaker experiments; after one of the two methods has been implemented the mean recognition error rate is 10.5%. Comparison of performance of the two methods shows that a new speaker's codebook is not necessary to represent the new speaker.
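Method I's "transposed codebook" can be sketched as follows. The frame-level correspondence between the two speakers' adaptation utterances is assumed to be precomputed (e.g. by DTW) and supplied as frame pairs; that is an assumption on our part, since the abstract only summarizes the correspondence procedure.

```python
import numpy as np

def transpose_codebook(ref_codebook, aligned_pairs):
    """Replace each reference codeword with the average of new-speaker
    frames aligned to reference frames that quantize to that codeword.

    ref_codebook:  (K, dim) reference speaker's codebook
    aligned_pairs: iterable of (ref_frame, new_frame) vector pairs"""
    sums = np.zeros(ref_codebook.shape)
    counts = np.zeros(len(ref_codebook))
    for ref_frame, new_frame in aligned_pairs:
        # Nearest reference codeword for this reference frame.
        k = int(np.argmin(((ref_codebook - ref_frame) ** 2).sum(axis=1)))
        sums[k] += new_frame
        counts[k] += 1
    out = np.array(ref_codebook, dtype=float, copy=True)
    seen = counts > 0
    out[seen] = sums[seen] / counts[seen][:, None]
    return out  # codewords never aligned keep their reference values
```

The transposed codebook then represents the new speaker during recognition, which is why, as the abstract concludes, no codebook trained on the new speaker is strictly necessary.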

01 Jan 1987
TL;DR: This study examines the set of CV, VC, CVC and some CCVC sequences which are non-occurring in monomorphemic words in a 20,000 word lexicon and suggests that many sequences in which the prevocalic and postvocalic consonants are similar, or identical, are excluded.
Abstract: This study examines the set of CV, VC, CVC and some CCVC sequences which are non-occurring in monomorphemic words in a 20,000 word lexicon. A preliminary analysis suggests that many sequences in which the prevocalic and postvocalic consonants are similar, or identical, are excluded. The sequences are discussed in relation to 'reduced forms' characteristic of fast speech, word boundary assimilation, and lexical access.

Book ChapterDOI
01 May 1987
TL;DR: A Speech Recognition Methodology is proposed which is based on the general assumption of ‘fuzzyness’ of both speech-data and knowledge-sources and on other fundamental assumptions which are also the bases of the proposed methodology.
Abstract: In this paper a Speech Recognition Methodology is proposed which is based on the general assumption of ‘fuzzyness’ of both speech-data and knowledge-sources. Besides this general principle, there are other fundamental assumptions which are also the bases of the proposed methodology: ‘Modularity’ in the knowledge organization, ‘Homogeneity’ in the representation of data and knowledge, ‘Passiveness’ of the ‘understanding flow’ (no backtracking or feedback), and ‘Parallelism’ in the recognition activity.



Journal ArticleDOI
TL;DR: A novel isolated-word recognition system for monosyllabic tonal languages is proposed which depends on the energy-time profiles of the utterances at different frequency bands and a mean accuracy of 97-99% was achieved for speaker-dependent recognition over the ten Cantonese digits.
Abstract: A novel isolated-word recognition system for monosyllabic tonal languages is proposed which depends on the energy-time profiles (ETP) of the utterances at different frequency bands. Training procedures, together with the classification strategy, will be discussed. A mean accuracy of 97-99% was achieved for speaker-dependent recognition over the ten Cantonese digits.
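The energy-time-profile idea can be sketched with a plain FFT filterbank. The frame length, band edges, and rectangular window below are all illustrative choices, since the abstract does not specify the paper's actual filterbank.

```python
import numpy as np

def energy_time_profiles(signal, frame_len, bands, sample_rate):
    """Per-frame energy in each frequency band.

    signal:  1-D array of samples
    bands:   list of (lo_hz, hi_hz) band edges
    returns: (n_frames, n_bands) array of band energies"""
    n_frames = len(signal) // frame_len
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    profiles = np.empty((n_frames, len(bands)))
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        for j, (lo, hi) in enumerate(bands):
            profiles[i, j] = spectrum[(freqs >= lo) & (freqs < hi)].sum()
    return profiles
```

For a tonal language, the trajectory of these band energies over time carries both segmental and tone information, which is what the classifier operates on.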

Journal ArticleDOI
TL;DR: Several important areas need substantial clarification or expansion before the reported findings of Koenig, ‘‘Spectrographic voice identification: A forensic survey’’ can be readily accepted.
Abstract: Several important areas need substantial clarification or expansion before the reported findings of Koenig, ‘‘Spectrographic voice identification: A forensic survey’’ [J. Acoust. Soc. Am. 79, 2088–2090 (1986)], can be readily accepted. They are: (1) the method of ‘‘voiceprint’’ analysis used, (2) ‘‘voiceprint’’ examiners’ qualifications, and (3) the means for determining the FBI’s correct identification.

01 Jan 1987
TL;DR: This paper proposes the use of synthetic speech as a means of handling the collection of reference data and speaker normalization in large-vocabulary speech recognition.
Abstract: A major problem in large-vocabulary speech recognition is the collection of reference data and speaker normalization. In this paper we propose the use of synthetic speech as a means of handling this problem. An experimental scheme for such a system will be described.



Proceedings ArticleDOI
M. Codogno1, L. Fissore
01 Apr 1987
TL;DR: Two different approaches are exploited to obtain sets of models in which the state duration is characterized by suitable probability density functions, and two difficult speaker-dependent recognition tasks have been carried out to evaluate them.

Abstract: Classical first-order Hidden Markov Models with continuous probability density functions (HMMCs) seem to be a promising tool for speech modelling, for both isolated word and continuous speech recognition tasks. However, these models have a strong limitation: they capture duration information poorly, and duration is sometimes the most important feature for distinguishing between similar sounds. In this paper two different approaches are exploited to obtain sets of models in which the state duration is characterized by suitable probability density functions. To evaluate the performance of both model sets, two difficult speaker-dependent recognition tasks have been carried out. We have also tested the feasibility of using a limited-size training lexicon for a new speaker and merging these duration models with those obtained from other speakers.