
Showing papers on "Speaker recognition published in 1989"


BookDOI
01 Jan 1989

528 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: A word-spotting system using Gaussian hidden Markov models is presented and it is observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions.
Abstract: A word-spotting system using Gaussian hidden Markov models is presented. Several aspects of this problem are investigated. Specifically, results are reported on the use of various signal processing and feature transformation techniques. The authors have observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions. Due to the open-set nature of the problem, the specific techniques for modeling out-of-vocabulary speech and the choice of scoring metric can have a significant effect on performance.

280 citations
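The frame-level score underlying such Gaussian-HMM word spotting is the log-density of a feature vector under a state's Gaussian. A minimal sketch (diagonal covariance assumed; pure Python, not the paper's implementation):

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian: the per-state,
    per-frame score accumulated when evaluating frames against an HMM."""
    logp = 0.0
    for xi, mu, v in zip(x, mean, var):
        logp += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return logp

# A frame at the state's mean scores higher than one far from it.
mean, var = [0.0, 0.0], [1.0, 1.0]
at_mean = diag_gaussian_logpdf([0.0, 0.0], mean, var)
far = diag_gaussian_logpdf([3.0, 3.0], mean, var)
```

The covariance structure the authors highlight corresponds here to the choice of `var`: diagonal models only need a variance per feature, while full-covariance models would replace the per-dimension sum with a quadratic form.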


Journal ArticleDOI
TL;DR: It is demonstrated that neural networks are able to extract speech information from the visual images and that this information can be used to improve automatic vowel recognition.
Abstract: Results from a series of experiments that use neural networks to process the visual speech signals of a male talker are presented. In these preliminary experiments, the results are limited to static images of vowels. It is demonstrated that these networks are able to extract speech information from the visual images and that this information can be used to improve automatic vowel recognition. The structure of speech and its corresponding acoustic and visual signals are reviewed. The specific data that was used in the experiments along with the network architectures and algorithms are described. The results of integrating the visual and auditory signals for vowel recognition in the presence of acoustic noise are presented.

212 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, an alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described, based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available.
Abstract: An alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described. The goal of this investigation was to train the IBM speech recognition system with only five minutes of speech data from a new speaker instead of the usual 20 minutes without the recognition rate dropping by more than 1-2%. The approach is based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available. It is called a speaker Markov model. It is shown how the parameters of such a model can be derived and how it can be used for transforming the training set of the old speaker in order to use it in addition to the short training set of the new speaker. The adaptation algorithm was tested with 12 speakers. The average recognition rate dropped from 96.4% to 95.2% for a 5000-word vocabulary task. The decoding time increased by a factor of 1.35; this factor is often 3-5 if other adaptation algorithms are used.

180 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors have developed a splitting procedure which initializes each new cluster (statistical model) by splitting off all tokens in the training set which were poorly represented by the current set of models, which gives excellent recognition performance in connected-word tasks.
Abstract: The authors describe an HMM (hidden Markov model) clustering procedure and discuss its application to connected-word systems and to large-vocabulary recognition based on phonelike units. It is shown that the conventional approach of maximizing likelihood is easily implemented but does not work well in practice, as it tends to give improved models of tokens for which the initial model was generally quite good, but does not improve tokens which are poorly represented by the initial model. The authors have developed a splitting procedure which initializes each new cluster (statistical model) by splitting off all tokens in the training set which were poorly represented by the current set of models. This procedure is highly efficient and gives excellent recognition performance in connected-word tasks. In particular, for speaker-independent connected-digit recognition, using two HMM-clustered models, the recognition performance is as good as or better than previous results using 4-6 models/digit obtained from template-based clustering.

108 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors present results of speaker-verification technology development for use over long-distance telephone lines, comparing a template-based dynamic-time-warping algorithm with hidden Markov modeling, and discuss discriminant analysis techniques that improve the discrimination between true speakers and imposters.
Abstract: The authors present the results of speaker-verification technology development for use over long-distance telephone lines. A description is given of two large speech databases that were collected to support the development of new speaker verification algorithms. Also discussed are the results of discriminant analysis techniques which improve the discrimination between true speakers and imposters. A comparison is made of the performance of two speaker-verification algorithms, one using template-based dynamic time warping, and the other, hidden Markov modeling.

108 citations
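The template-based side of the comparison rests on dynamic time warping, which aligns two feature sequences of different lengths by dynamic programming. A minimal sketch on scalar features with absolute difference as the local cost (an illustration, not the paper's system):

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two scalar feature sequences.
    D[i][j] holds the cheapest alignment cost of a[:i] against b[:j]."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow insertion, deletion, or diagonal match moves.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

In a verification system each frame would be a spectral vector rather than a scalar, and the distance to a claimant's reference template would be thresholded to accept or reject.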


Proceedings ArticleDOI
Stephen Cox, John S. Bridle
23 May 1989
TL;DR: A general approach to speaker adaptation in speech recognition is described, in which speaker differences are treated as arising from a parameterized transformation.
Abstract: A general approach to speaker adaptation in speech recognition is described, in which speaker differences are treated as arising from a parameterized transformation. Given some unlabeled data from a particular speaker, a process is described which maximizes the likelihood of this data by estimating the transformation parameters at the same time as refining estimates of the labels. The technique is illustrated using isolated vowel spectra and phonetically motivated linear spectrum transformations and is shown to give significantly better performance than nonadaptive classification.

70 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: A shift-tolerant neural network architecture for phoneme recognition is presented, based on LVQ2, an algorithm which pays close attention to approximating the optimal Bayes decision line in a discrimination task; the results suggest LVQ2 could be the basis for a successful speech recognition system.
Abstract: The authors describe a shift-tolerant neural network architecture for phoneme recognition. The system is based on LVQ2, an algorithm which pays close attention to approximating the optimal Bayes decision line in a discrimination task. Recognition performances in the 98-99% correct range were obtained for LVQ2 networks aimed at speaker-dependent recognition of phonemes in small but ambiguous Japanese phonemic classes. A correct recognition rate of 97.7% was achieved by a single, larger LVQ2 network covering all Japanese consonants. These recognition results are at least as high as those obtained in the time delay neural network system and suggest that LVQ2 could be the basis for a successful speech recognition system.

66 citations
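The LVQ2 rule mentioned above adjusts prototype vectors only when a training sample falls near the boundary between a wrongly classified nearest prototype and a correctly classified runner-up. A sketch of one update step (Euclidean distance; the learning rate and window width are assumed values, not the paper's settings):

```python
def lvq2_update(codebook, labels, x, y, lr=0.1, window=0.3):
    """One LVQ2 step: if the two nearest prototypes straddle the decision
    boundary (one wrong, one correct) and x lies inside the window, push
    the wrong prototype away from x and pull the correct one toward it."""
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) ** 0.5 for c in codebook]
    order = sorted(range(len(codebook)), key=dists.__getitem__)
    i, j = order[0], order[1]          # nearest and second nearest
    di, dj = dists[i], dists[j]
    # Kohonen's window test on the ratio of the two distances.
    in_window = min(di / max(dj, 1e-12), dj / max(di, 1e-12)) > (1 - window) / (1 + window)
    if in_window and labels[i] != y and labels[j] == y:
        for k in range(len(x)):
            codebook[i][k] -= lr * (x[k] - codebook[i][k])  # repel wrong class
            codebook[j][k] += lr * (x[k] - codebook[j][k])  # attract correct class
    return codebook
```

Concentrating updates in this window is what lets LVQ2 approximate the Bayes decision boundary rather than the class means.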


PatentDOI
TL;DR: A method and apparatus for real-time speech recognition, with and without speaker dependency, is presented.
Abstract: A method and apparatus for real time speech recognition with and without speaker dependency which includes the following steps. Converting the speech signals into a series of primitive sound spectrum parameter frames; detecting the beginning and ending of speech according to the primitive sound spectrum parameter frame, to determine the sound spectrum parameter frame series; performing non-linear time domain normalization on the sound spectrum parameter frame series using sound stimuli, to obtain speech characteristic parameter frame series with predefined lengths on the time domain; performing amplitude quantization normalization on the speech characteristic parameter frames; comparing the speech characteristic parameter frame series with the reference samples, to determine the reference sample which most closely matches the speech characteristic parameter frame series; and determining the recognition result according to the most closely matched reference sample.

54 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: A unified framework for creating effective basic models of speech is discussed, and the relative advantages of whole-word, phoneme-like, and acoustic segment units are pointed out based on the results of a series of recognition experiments.
Abstract: The problem of how to select and construct a set of fundamental unit statistical models suitable for speech recognition is addressed. A unified framework is discussed which can be used to accomplish the goal of creating effective basic models of speech. The performances of three types of fundamental units, namely whole word, phoneme-like, and acoustic segment units, in a 1109-word vocabulary speech recognition task are compared. The authors point out the relative advantages of each type of speech unit based on the results of a series of recognition experiments.

47 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors describe a system for speaker-dependent speech recognition based on acoustic subword units that showed results comparable to those of whole-word-based systems.
Abstract: The authors describe a system for speaker-dependent speech recognition based on acoustic subword units. Several strategies for automatic generation of an acoustic lexicon are outlined. Preliminary tests have been performed on a small vocabulary. In these tests, the proposed system showed results comparable to those of whole-word-based systems.

Journal ArticleDOI
TL;DR: An automatic speaker adaptation algorithm for speech recognition is proposed, in which a small amount of training material of unspecified text can be used; adaptation reduces the mean word recognition error rate from 4.9% to 2.9%.
Abstract: The author proposes an automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used. The algorithm is easily applied to vector-quantization (VQ) speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is universally used for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the vectors representing deviation between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. The spectral resolution of the adaptation process is improved accordingly. Results of recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9% to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, the adaptation method is highly effective.
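A one-level, non-hierarchical version of the codebook-shifting idea can be sketched as follows: assign each adaptation frame to its nearest codebook entry, then move each entry to the centroid of the frames it attracted (the paper's method applies this deviation hierarchically over cluster levels; this is a simplified illustration):

```python
def adapt_codebook(codebook, frames):
    """Shift each VQ codebook entry toward the centroid of the adaptation
    frames it attracts; entries with no evidence are left unchanged."""
    assigned = {i: [] for i in range(len(codebook))}
    for f in frames:
        i = min(range(len(codebook)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(f, codebook[k])))
        assigned[i].append(f)
    new_cb = []
    for i, entry in enumerate(codebook):
        if assigned[i]:
            centroid = [sum(col) / len(assigned[i]) for col in zip(*assigned[i])]
            new_cb.append(centroid)
        else:
            new_cb.append(list(entry))
    return new_cb
```

The hierarchical variant in the paper addresses exactly the weakness of this sketch: with few adaptation frames, many entries receive no evidence, so deviations estimated from coarse clusters are propagated down to finer ones.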

Proceedings ArticleDOI
23 May 1989
TL;DR: The author introduces the methodological novelties that allowed for progress along three axes: from isolated-word recognition to continuous speech, from speaker-dependent recognition to speaker-independent, and from small vocabularies to large vocabularies.
Abstract: An overview is given of recent advances in the domain of speech recognition. The author focuses on speech recognition, but also mentions some progress in other areas of speech processing (speaker recognition, speech synthesis, speech analysis and coding) using similar methodologies. The problems related to automatic speech processing are identified, and the initial approaches that have been followed in order to address those problems are described. The author then introduces the methodological novelties that allowed for progress along three axes: from isolated-word recognition to continuous speech, from speaker-dependent recognition to speaker-independent, and from small vocabularies to large vocabularies. Special emphasis centers on the improvements made possible by Markov models and, more recently, by connectionist models, resulting in improved performance for difficult vocabularies or in more robust systems. Some specialized hardware is described, as are efforts aimed at assessing speech-recognition systems.

Proceedings ArticleDOI
15 Oct 1989
TL;DR: A preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary is described, including a technique that uses a general model for the acoustics of any word to recognize the existence of new words.
Abstract: In practical large vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary. The probability of the speaker using words outside the vocabulary can be quite high. For the case when a speaker uses a new word, current systems will always recognize other words within the vocabulary in place of the new word, and the speaker wouldn't know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run using the DARPA 1000-word Resource Management Database for continuous speech recognition. The recognition system used is the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.

Journal ArticleDOI
TL;DR: Although speech recognition applications for disabled people are well within the capacity of available technology, it is primarily a lack of human factors work which is impeding developments in this field.

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors propose a speaker adaptation algorithm which does not depend on speech recognition algorithms and is applied to hidden Markov models and neural networks and evaluated using a database of 216 phonetically balanced words and 5240 important Japanese words uttered by three speakers.
Abstract: The authors propose a speaker adaptation algorithm which does not depend on speech recognition algorithms. The proposed spectral mapping algorithm is based on three ideas: (1) accurate representation of the input vector by separate vector quantization and fuzzy vector quantization, (2) continuous spectral mapping from one speaker to another by fuzzy mapping, and (3) accurate establishment of spectral correspondence based on the fuzzy relationship of the membership function obtained from supervised training. The spectrum dynamic features are also utilized. The algorithm is applied to hidden Markov models (HMMs) and neural networks and evaluated using a database of 216 phonetically balanced words and 5240 important Japanese words uttered by three speakers. The HMM speaker-adapted recognition rate for /b,d,g/ is 79.5%. The average recognition rate for the top-three choices is about 91%. The algorithm was applied to neural networks and resulted in almost the same performance. The algorithm was also applied to voice conversion, and a preference score of 65.6% was obtained.

PatentDOI
Masayuki Sakanishi, Hiroki Yoshida, Takaaki Ishii, Hiroshi Sato, Makoto Hoshino
TL;DR: A speech recognition system that detects a similarity between each speech pattern which has already been registered in the system and a speech pattern newly generated in response to the user's utterance made while the system is in a registration mode.
Abstract: A speech recognition system that detects a similarity between each speech pattern which has already been registered in the system and a speech pattern newly generated in response to the user's utterance made while the system is in a registration mode. The system further provides to the user in the registration mode information representing the detected similarity. The speech recognition system may be incorporated into a telephone apparatus or a radio telephone apparatus, in which a call origination may be automatically made in response to the user's utterance.

Proceedings ArticleDOI
Amano, Aritsuka, Hataoka, Ichikawa
01 Jan 1989
TL;DR: About 80% of the errors occurring in conventional template matching, which the discrimination rules were designed to recover, were in fact recovered, and this confirms the effectiveness of the proposed phoneme recognition method.
Abstract: A rule-based phoneme recognition method is proposed. This method uses neural networks for acoustic feature detection and fuzzy logic for the decision procedure. Rules for phoneme recognition are prepared for each pair of phonemes (pair-discrimination rules). Recognition experiments were performed using Japanese city names uttered by two male speakers. About 80% of the errors occurring in conventional template matching, which the discrimination rules were designed to recover, were in fact recovered (an improvement in recognition rate of 4.0 to 8.0%). This confirms the effectiveness of the proposed method.

Proceedings ArticleDOI
15 Oct 1989
TL;DR: This work discusses refinements of the stochastic segment model, an alternative to hidden Markov models for representation of the acoustic variability of phonemes, and focuses on mechanisms for better modelling time correlation of features across an entire segment.
Abstract: The heart of a speech recognition system is the acoustic model of sub-word units (e.g., phonemes). In this work we discuss refinements of the stochastic segment model, an alternative to hidden Markov models for representation of the acoustic variability of phonemes. We concentrate on mechanisms for better modelling time correlation of features across an entire segment. Results are presented for speaker-independent phoneme classification in continuous speech based on the TIMIT database.

PatentDOI
TL;DR: A speaker verification system receives input speech from a speaker of unknown identity; the speech undergoes linear predictive coding analysis and a transformation that maximizes separability between true speakers and impostors when compared to reference speech parameters which have been similarly transformed.
Abstract: A speaker verification system receives input speech from a speaker of unknown identity. The speech undergoes linear predictive coding (LPC) analysis and transformation to maximize separability between true speakers and impostors when compared to reference speech parameters which have been similarly transformed. The transformation incorporates an "inter-class" covariance matrix of successful impostors within a database.

Proceedings Article
01 Jan 1989
TL;DR: The dynamic-programming algorithm for continuous-speech recognition is modified to produce a list of the top-N sentence hypotheses instead of the usual single sentence, based on a generalization of Bellman's principle of optimality.
Abstract: In this paper, the dynamic-programming algorithm for continuous-speech recognition is modified in order to obtain a top-N sentence-hypotheses list instead of the usual one sentence only. The theoretical basis of this extension is a generalization of Bellman's principle of optimality. Due to the computational complexity of the new algorithm, a sub-optimal variant is proposed, and experimental results within the SPICOS system are presented.
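The top-N idea can be illustrated with a simple best-first enumeration of the n cheapest paths through a small acyclic word lattice (a sketch only; the paper's algorithm is a dynamic-programming generalization, not this search):

```python
import heapq

def n_best_paths(graph, start, goal, n):
    """Enumerate the n lowest-cost start-to-goal paths in an acyclic
    lattice, where graph maps a node to a list of (next_node, cost)."""
    results = []
    heap = [(0.0, start, [start])]
    while heap and len(results) < n:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            results.append((cost, path))  # paths pop in cost order
            continue
        for nxt, c in graph.get(node, []):
            heapq.heappush(heap, (cost + c, nxt, path + [nxt]))
    return results
```

Keeping N hypotheses instead of one lets a later, more expensive knowledge source (e.g. a language model) rescore the list, which is the motivation for such extensions.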

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors have applied connectionist learning procedures to speaker-independent continuous recognition, creating a system which has achieved 97% word accuracy and 91% sentence accuracy in preliminary tests on the TI/NBS connected-digits database.
Abstract: The authors have applied connectionist learning procedures to speaker-independent continuous recognition, creating a system which has achieved 97% word accuracy and 91% sentence accuracy in preliminary tests on the TI/NBS connected-digits database. The system uses a four-layer back-propagation network with recurrent connections to generate and refine hypotheses about the identity of an utterance over successive intervals. The hypotheses generated by the network are used as input to a Markov-chain-based Viterbi recognizer which produces a final identification of the entire utterance.

PatentDOI
TL;DR: A speech recognition method and apparatus take into account a system transfer function between the speaker and the recognition apparatus, updating a signal representing the transfer function on a periodic basis during actual speech recognition.
Abstract: A speech recognition method and apparatus take into account a system transfer function between the speaker and the recognition apparatus. The method and apparatus update a signal representing the transfer function on a periodic basis during actual speech recognition. The transfer function representing signal is updated about every fifty words as determined by the speech recognition apparatus. The method and apparatus generate an initial transfer function representing signal and generate from the speech input, successive input frames which are employed for modifying the value of the current transfer function signal so as to eliminate error and distortion. The error and distortion occur, for example, as a speaker changes the direction of his profile relative to a microphone, as the speaker's voice changes or as other effects occur that alter the spectra of the input speech frames. The method is automatic and does not require the knowledge of the input words or text.
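Periodic updating of a transfer-function estimate is often realized as an exponentially weighted average of incoming frame spectra; a minimal sketch of that idea (the smoothing rate and per-bin list representation are assumptions for illustration, not the patent's specifics):

```python
def update_transfer_estimate(current, frame_spectrum, alpha=0.02):
    """Blend the current per-bin transfer-function estimate with a new
    frame spectrum; small alpha tracks slow channel drift without
    reacting to individual frames."""
    return [(1 - alpha) * h + alpha * f for h, f in zip(current, frame_spectrum)]
```

Dividing (or, in the log domain, subtracting) each incoming frame by such an estimate compensates for microphone position and channel changes of the kind the patent describes.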

Proceedings Article
01 Jan 1989
TL;DR: An algorithm for recognition of connected words has been adapted to an application for mobile radio telephony and several manners of generating feature vectors were evaluated using two databases collected in a small car moving at about 120 km/h.

Proceedings ArticleDOI
01 Jan 1989
TL;DR: Connectionist learning procedures are applied to the task of speaker-independent continuous speech recognition, creating a system which has achieved a recognition rate of 97% correct in preliminary tests on the Texas Instruments/National Bureau of Standards Connected Digits Database.
Abstract: Connectionist learning procedures are applied to the task of speaker-independent continuous speech recognition, creating a system which has achieved a recognition rate of 97% correct in preliminary tests on the Texas Instruments/National Bureau of Standards Connected Digits Database. Two versions of the system were implemented, both of which used four-layer backpropagation networks. One used a static (nonrecurrent) network with a history mechanism, in which the input weights were slaved together, as they are in time-delay neural networks (TDNNs), and the other used a recurrent connection structure similar to that proposed by J.L. Elman (Tech. Rep., Univ. of California, San Diego, April 1988). The final recognition accuracies produced by the two approaches were not significantly different. The networks generated and refined hypotheses about the identity of utterances over successive intervals. The hypotheses generated by the networks were used as input to a Markov-chain-based Viterbi recognizer which produced a final identification of the entire utterance.
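The second stage of such hybrid systems, a Viterbi pass over per-frame network scores, can be sketched as follows (log-domain scores and a dense transition matrix assumed; not the authors' implementation):

```python
def viterbi(obs_scores, trans, init):
    """Best state path given per-frame, per-state log-scores (e.g. network
    outputs), log transition scores trans[p][s], and initial log-scores."""
    n_states = len(init)
    delta = [init[s] + obs_scores[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(obs_scores)):
        new_delta, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: delta[p] + trans[p][s])
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + trans[best_prev][s] + obs_scores[t][s])
        delta, back = new_delta, back + [ptr]
    # Trace the best final state back through the stored pointers.
    state = max(range(n_states), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

The network supplies `obs_scores` frame by frame; the Markov chain's transition structure then enforces temporal consistency on the final utterance identification.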

01 Jan 1989
TL;DR: An alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described, based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available.
Abstract: This paper describes an alternative approach to speaker adaptation for a large vocabulary Hidden Markov Model based speech recognition system. The goal of this investigation was to train the IBM speech recognition system with only 5 minutes of speech data from a new speaker instead of the usual 20 minutes. At the same time, the recognition rate should not drop by more than 1-2%. The approach is based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available. Such a model can be called a "Speaker Markov Model". It is shown how the parameters of such a model can be derived and how it can be used for transforming the training set of the old speaker in order to use it in addition to the short training set of the new speaker. The adaptation algorithm was tested with 12 speakers, including male and female speakers as well as speakers with foreign accents. The average recognition rate dropped only from 96.4% to 95.2% for a 5000 word vocabulary task if the adaptation was used instead of the full training. Most important is that the decoding time increases only by a factor of 1.35, while this factor is often 3-5 if other adaptation algorithms are used.

PatentDOI
TL;DR: In this article, a speech processing apparatus was proposed that enables processor elements (403a to 403r) each comprising at least one nonlinear oscillator circuit (621) to be used as band pass filters by using the entrainment taking place in each of the processor elements.
Abstract: A speech processing apparatus of the present invention enables processor elements (403a to 403r) each comprising at least one nonlinear oscillator circuit (621) to be used as band pass filters by using the entrainment taking place in each of the processor elements, whereby the speech of a particular talker in the speech of a plurality of talkers can be recognized.

Proceedings ArticleDOI
Kammerer, Kupper
01 Jan 1989
TL;DR: Several design strategies for feedforward networks are examined within the scope of pattern classification and a hierarchical structure with pairwise training of two-class models is superior to a single uniform network for speaker-independent word recognition.
Abstract: Several design strategies for feedforward networks are examined within the scope of pattern classification. Single- and two-layer perceptron models are adapted for experiments in isolated-word recognition. Direct (one-step) classification and several hierarchical (two-step) schemes have been considered. For a vocabulary of 20 English words spoken repeatedly by 11 speakers, the word classes are found to be separable by hyperplanes in the chosen feature space. Since for speaker-dependent word recognition the underlying database contains only a small training set, an automatic expansion of the training material improves the generalization properties of the networks. This method accounts for a wide variety of observable temporal structures for each word and gives a better overall estimate of the network parameters, which leads to a recognition rate of 99.5%. For speaker-independent word recognition, a hierarchical structure with pairwise training of two-class models is superior to a single uniform network (98% average recognition rate).

Proceedings ArticleDOI
08 May 1989
TL;DR: An algorithm is presented for adaptation and self-learning of the hidden Markov model (HMM) that makes the HMM-based speech recognition robust, so that well-trained models can be adapted to new speaking conditions or a new speaker.
Abstract: An algorithm is presented for adaptation and self-learning of the hidden Markov model (HMM). It makes the HMM-based speech recognition robust, so that well-trained models can be adapted to new speaking conditions or a new speaker. The self-learning consists of the fact that, during recognition, all test tokens can be used to augment the current model. Both procedures increase the size of the training set. The algorithm was tested on a speaker-dependent speech recognition system for the whole Chinese vocabulary and a speaker-independent system for 0-9 digits. Experiments show that the algorithm is very successful, both for new-speaker adaptation and for variations of speech in a single speaker under various conditions.

Proceedings ArticleDOI
Joseph Picone
23 May 1989
TL;DR: A clustering algorithm is introduced that allows clustering of HMMs (hidden Markov models) directly, and high-performance speaker-independent digit recognition on a studio-quality connected-digit database is demonstrated.
Abstract: A clustering algorithm is introduced that allows clustering of HMMs (hidden Markov models) directly. This clustering algorithm determines the appropriate duration profile for a recognition unit. High-performance speaker-independent digit recognition on a studio-quality connected-digit database is demonstrated using this algorithm.