Showing papers on "Speaker diarisation" published in 1989

Proceedings Article

Speaker adaptation for large vocabulary speech recognition systems using speaker Markov models

Gerhard Rigoll

23 May 1989

TL;DR: In this paper, an alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described, based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available.

Abstract: An alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described. The goal of this investigation was to train the IBM speech recognition system with only five minutes of speech data from a new speaker instead of the usual 20 minutes without the recognition rate dropping by more than 1-2%. The approach is based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available. It is called a speaker Markov model. It is shown how the parameters of such a model can be derived and how it can be used for transforming the training set of the old speaker in order to use it in addition to the short training set of the new speaker. The adaptation algorithm was tested with 12 speakers. The average recognition rate dropped from 96.4% to 95.2% for a 5000-word vocabulary task. The decoding time increased by a factor of 1.35; this factor is often 3-5 if other adaptation algorithms are used.

180 citations

Proceedings Article

Speaker verification over long distance telephone lines

Jayant M. Naik¹, Lorin P. Netsch¹, George R. Doddington¹

Texas Instruments¹

23 May 1989

TL;DR: The authors present the results of speaker-verification technology development for use over long-distance telephone lines, using template-based dynamic time warping and hidden Markov modeling for discriminant analysis techniques which improve the discrimination between true speakers and imposters.

Abstract: The authors present the results of speaker-verification technology development for use over long-distance telephone lines. A description is given of two large speech databases that were collected to support the development of new speaker verification algorithms. Also discussed are the results of discriminant analysis techniques which improve the discrimination between true speakers and imposters. A comparison is made of the performance of two speaker-verification algorithms, one using template-based dynamic time warping, and the other, hidden Markov modeling.

108 citations

Proceedings Article

Unsupervised speaker adaptation by probabilistic spectrum fitting

Stephen Cox¹, John S. Bridle

BT Group¹

23 May 1989

TL;DR: A general approach to speaker adaptation in speech recognition is described, in which speaker differences are treated as arising from a parameterized transformation.

Abstract: A general approach to speaker adaptation in speech recognition is described, in which speaker differences are treated as arising from a parameterized transformation. Given some unlabeled data from a particular speaker, a process is described which maximizes the likelihood of this data by estimating the transformation parameters at the same time as refining estimates of the labels. The technique is illustrated using isolated vowel spectra and phonetically motivated linear spectrum transformations and is shown to give significantly better performance than nonadaptive classification.

70 citations
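
The joint estimation described in the Cox and Bridle abstract (transformation parameters estimated while the labels are refined) can be illustrated with a deliberately simplified sketch. It assumes the speaker difference is a single additive shift of the spectral vectors, whereas the paper uses phonetically motivated linear transformations, and it uses hard label assignments in place of full likelihood weighting. `estimate_shift` and the toy data are hypothetical.

```python
# Simplified sketch (assumption): speaker difference = one additive shift of the
# reference class means; labels come from the nearest shifted mean (hard EM).

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_shift(data, means, iterations=10):
    """Jointly estimate an additive shift and frame labels from unlabeled data."""
    dim = len(means[0])
    shift = [0.0] * dim
    labels = []
    for _ in range(iterations):
        shifted = [[m + s for m, s in zip(centre, shift)] for centre in means]
        # Label step: each frame goes to the nearest shifted class mean
        labels = [min(range(len(shifted)), key=lambda k: sq_dist(x, shifted[k]))
                  for x in data]
        # Parameter step: for fixed labels and equal-variance Gaussians, the
        # likelihood-maximizing shift is the mean residual to the labelled means
        shift = [sum(x[d] - means[k][d] for x, k in zip(data, labels)) / len(data)
                 for d in range(dim)]
    return shift, labels


means = [[0.0, 0.0], [10.0, 10.0]]
frames = [[2.0, 2.0], [2.5, 1.5], [12.0, 12.0], [11.5, 12.5]]
shift, labels = estimate_shift(frames, means)
print(shift, labels)  # [2.0, 2.0] [0, 0, 1, 1]
```

Alternating the two steps never decreases the (hard-assignment) likelihood, which is the same principle as the paper's simultaneous estimation, but a real system would use the richer transformation family the abstract describes.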

Journal Article

Unsupervised speaker adaptation based on hierarchical spectral clustering


Sadaoki Furui

01 Dec 1989-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: An automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used, which reduces the mean word recognition error rate from 4.9 to 2.9%.

Abstract: The author proposes an automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used. The algorithm is easily applied to vector-quantization (VQ) speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is universally used for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the vectors representing deviation between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. The spectral resolution of the adaptation process is improved accordingly. Results of recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9 to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, the adaptation method is highly effective.

41 citations
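
The "small to large numbers of clusters" idea in the Furui abstract can be approximated by a two-level sketch: first one cluster containing everything, then one cluster per codebook entry. This is a simplification (the paper clusters hierarchically over many levels), and `adapt_codebook`, `min_count` and the toy data are hypothetical.

```python
# Two-level simplification (assumption) of hierarchical codebook adaptation:
# level 1 uses one cluster (all data), level 2 one cluster per codebook entry.

def mean(vectors):
    """Centroid of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def nearest(x, codebook):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def adapt_codebook(codebook, frames, min_count=2):
    # Level 1: shift every entry by the deviation between the centroid of the
    # new speaker's frames and the centroid of the codebook
    dev = [f - c for f, c in zip(mean(frames), mean(codebook))]
    adapted = [[c + g for c, g in zip(entry, dev)] for entry in codebook]
    # Level 2: an entry that attracts enough frames is moved by its own
    # deviation vector (entry + (centroid - entry) = centroid); sparsely
    # covered entries keep the coarse level-1 shift
    groups = {}
    for x in frames:
        groups.setdefault(nearest(x, adapted), []).append(x)
    for i, members in groups.items():
        if len(members) >= min_count:
            adapted[i] = mean(members)
    return adapted


codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
frames = [[1.0, 1.0], [1.0, 1.0], [11.0, 1.0], [11.0, 1.0]]
adapted = adapt_codebook(codebook, frames)
```

The third entry receives no frames here, so it is adapted only by the global shift; this is the benefit of a coarse-to-fine scheme when the new speaker's training material is small.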

Proceedings Article

Automatic detection of new words in a large vocabulary continuous speech recognition system

A. Asadi¹, Richard Schwartz, John Makhoul

Northeastern University¹

15 Oct 1989

TL;DR: A preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary, including a technique that uses a general model for the acoustics of any word to recognize the existence of new words.

Abstract: In practical large vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary. The probability of the speaker using words outside the vocabulary can be quite high. When a speaker uses a new word, current systems always recognize other words within the vocabulary in place of the new word, and the speaker does not know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run using the DARPA 1000-word Resource Management Database for continuous speech recognition. The recognition system used is the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.

40 citations
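
The abstract reports detection rate against false alarm rate but does not give the decision rule. A minimal sketch, assuming the rule compares the log-likelihood of the general word model with that of the best in-vocabulary word against a threshold, shows how the two rates are tallied. The function names, the threshold and the scores are hypothetical and do not come from the paper.

```python
# Hypothetical decision rule (assumption): flag a new word when the general
# "any word" model outscores the best in-vocabulary word by more than a threshold.

def detect_new_word(vocab_score, general_score, threshold=0.0):
    return general_score - vocab_score > threshold


def detection_and_false_alarm(trials, threshold=0.0):
    """trials: (best_vocab_score, general_model_score, is_really_new) tuples.
    Returns (detection rate on new words, false-alarm rate on known words)."""
    new = [(v, g) for v, g, is_new in trials if is_new]
    known = [(v, g) for v, g, is_new in trials if not is_new]
    hits = sum(detect_new_word(v, g, threshold) for v, g in new)
    false_alarms = sum(detect_new_word(v, g, threshold) for v, g in known)
    return hits / len(new), false_alarms / len(known)


# Made-up log-likelihoods, purely to show the bookkeeping
trials = [(-50.0, -45.0, True), (-60.0, -62.0, True),
          (-40.0, -48.0, False), (-55.0, -54.0, False)]
print(detection_and_false_alarm(trials))  # (0.5, 0.5)
```

Sweeping the threshold trades detections for false alarms; a reported operating point such as 74% detection at 3.4% false alarms corresponds to one threshold on such a curve.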

Proceedings Article

Speaker adaptation applied to HMM and neural networks


Satoshi Nakamura, K. Shikano

23 May 1989

TL;DR: The authors propose a speaker adaptation algorithm which does not depend on speech recognition algorithms and is applied to hidden Markov models and neural networks and evaluated using a database of 216 phonetically balanced words and 5240 important Japanese words uttered by three speakers.

Abstract: The authors propose a speaker adaptation algorithm which does not depend on speech recognition algorithms. The proposed spectral mapping algorithm is based on three ideas: (1) accurate representation of the input vector by separate vector quantization and fuzzy vector quantization, (2) continuous spectral mapping from one speaker to another by fuzzy mapping, and (3) accurate establishment of spectral correspondence based on the fuzzy relationship of the membership function obtained from supervised training. The spectrum dynamic features are also utilized. The algorithm is applied to hidden Markov models (HMMs) and neural networks and evaluated using a database of 216 phonetically balanced words and 5240 important Japanese words uttered by three speakers. The HMM speaker adapted recognition rate for /b,d,g/ is 79.5%. The average recognition rate for the top-three choices is about 91%. The algorithm was applied to neural networks and resulted in almost the same performance. The algorithm was also applied to voice conversion, and a preference score of 65.6% was obtained.

36 citations
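
Fuzzy vector quantization and fuzzy mapping, as named in the Nakamura and Shikano abstract, can be sketched with fuzzy-c-means style memberships: the input vector belongs partially to every codeword, and the mapped vector is the membership-weighted sum of the target speaker's codewords. This assumes a one-to-one pairing between source and target codewords; the paper establishes the correspondence from supervised training, which is not reproduced here, and the function names are hypothetical.

```python
# Simplified fuzzy mapping (assumption): codeword i of the source speaker is
# paired one-to-one with codeword i of the target speaker.

def fuzzy_memberships(x, codebook, m=2.0, eps=1e-12):
    """Fuzzy-c-means style memberships of x in each codeword (they sum to 1).
    m > 1 is the fuzziness exponent; eps guards against a zero distance."""
    dists = [max(sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, eps)
             for c in codebook]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def fuzzy_map(x, source_codebook, target_codebook, m=2.0):
    """Continuous mapping: membership-weighted sum of the paired target codewords."""
    u = fuzzy_memberships(x, source_codebook, m)
    dim = len(target_codebook[0])
    return [sum(u_i * t[d] for u_i, t in zip(u, target_codebook))
            for d in range(dim)]


source = [[0.0], [2.0]]
target = [[10.0], [20.0]]
print(fuzzy_map([1.0], source, target))  # [15.0]
```

Unlike hard VQ mapping, which would jump between 10.0 and 20.0, the output varies continuously as the input moves between source codewords, which is the point of idea (2) in the abstract.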

Patent

Voice verification circuit for validating the identity of an unknown person

Jayant M. Naik¹, Lorin P. Netsch¹, George R. Doddington¹

Texas Instruments¹

09 May 1989-Journal of the Acoustical Society of America

TL;DR: A speaker verification system receives input speech from a speaker of unknown identity; the speech undergoes linear predictive coding analysis and a transformation that maximizes separability between true speakers and impostors when compared to reference speech parameters that have been similarly transformed.

Abstract: A speaker verification system receives input speech from a speaker of unknown identity. The speech undergoes linear predictive coding (LPC) analysis and a transformation that maximizes separability between true speakers and impostors when compared to reference speech parameters that have been similarly transformed. The transformation incorporates an "inter-class" covariance matrix of successful impostors within a database.

32 citations

Patent

Method and apparatus for automatically updating estimates of undesirable components of the speech signal in a speech recognition system


Vladimir Sejnoha

31 Mar 1989-Journal of the Acoustical Society of America

TL;DR: A speech recognition method and apparatus take into account a system transfer function between the speaker and the recognition apparatus and update a signal representing the transfer function periodically during actual speech recognition.

Abstract: A speech recognition method and apparatus take into account a system transfer function between the speaker and the recognition apparatus. The method and apparatus update a signal representing the transfer function on a periodic basis during actual speech recognition; this signal is updated about every fifty words, as determined by the speech recognition apparatus. The method and apparatus generate an initial signal representing the transfer function and, from the speech input, successive input frames that are used to modify the current transfer-function signal so as to eliminate error and distortion. The error and distortion occur, for example, when a speaker changes the direction of his profile relative to a microphone, when the speaker's voice changes, or when other effects alter the spectra of the input speech frames. The method is automatic and does not require knowledge of the input words or text.

24 citations
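
The periodic, text-independent update in the Sejnoha patent can be illustrated with a sketch of a running channel estimate in the log-spectral domain. It assumes the channel adds a fixed offset to each log-spectral frame and that the long-term average log spectrum of speech is flat (zero), so the block mean of the frames tracks the channel; this is a cepstral-mean-style stand-in, not the patent's own procedure. `ChannelEstimator` and `alpha` are hypothetical.

```python
# Sketch under stated assumptions: frames are log-spectral vectors, the channel
# adds a fixed log-spectral offset, and the long-term speech average is zero,
# so a block mean of the frames tracks the channel.

class ChannelEstimator:
    def __init__(self, initial, alpha=0.5):
        self.estimate = list(initial)  # initial transfer-function estimate
        self.alpha = alpha             # weight of the newest block (hypothetical)
        self._block = []               # frames seen since the last update

    def normalize(self, frame):
        """Remove the current channel estimate from one log-spectral frame."""
        self._block.append(frame)
        return [x - c for x, c in zip(frame, self.estimate)]

    def update(self):
        """Blend the block mean into the estimate; call periodically, e.g.
        after every N recognized words (the patent cites about fifty)."""
        if not self._block:
            return
        n = len(self._block)
        block_mean = [sum(f[d] for f in self._block) / n
                      for d in range(len(self.estimate))]
        self.estimate = [(1 - self.alpha) * c + self.alpha * b
                         for c, b in zip(self.estimate, block_mean)]
        self._block = []


est = ChannelEstimator([0.0, 0.0])
est.normalize([2.0, 4.0])
est.normalize([4.0, 6.0])
est.update()
print(est.estimate)               # [1.5, 2.5]
print(est.normalize([3.0, 5.0]))  # [1.5, 2.5]
```

Blending rather than replacing the estimate lets it follow slow changes, such as a speaker turning away from the microphone, without reacting to every short block.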

Patent

Speaker verification system

Masao Watari¹

NEC¹

22 Aug 1989-Journal of the Acoustical Society of America

TL;DR: In this article, a plurality of control reference patterns similar to the verification reference pattern are determined from among the control reference pattern candidates, and the speaker to be verified is judged as the registered speaker on the basis of first and second dissimilarities.

Abstract: Control reference pattern candidates corresponding to a verification reference pattern of a registered speaker are synthesized by connecting unit speech patterns of a plurality of speakers. A plurality of control reference patterns similar to the verification reference pattern are determined from among the control reference pattern candidates. A first dissimilarity, between an input pattern of a speaker to be verified and the verification reference pattern specified by the registered speaker, and a second dissimilarity, between the input pattern and the control reference patterns specified by the registered speaker, are calculated. The speaker to be verified is judged to be the registered speaker on the basis of the first and second dissimilarities.

21 citations
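
The abstract says the decision uses the first and second dissimilarities but not how they are combined. A minimal sketch, assuming the claim is accepted when the mean dissimilarity to the control (cohort) patterns exceeds the dissimilarity to the claimed speaker's pattern by a threshold, looks like this. Fixed-length vectors and Euclidean distance stand in for real speech patterns (which would typically be compared with DTW), the synthesis of control patterns from unit speech patterns is not modeled, and `verify` is hypothetical.

```python
import math

# Hypothetical decision rule (assumption): accept when the control patterns'
# mean dissimilarity exceeds the claimed speaker's by more than a threshold.

def verify(input_pattern, claimed_ref, control_refs, threshold,
           distance=math.dist):
    d1 = distance(input_pattern, claimed_ref)  # first dissimilarity
    # second dissimilarity: mean over the control reference patterns
    d2 = sum(distance(input_pattern, r) for r in control_refs) / len(control_refs)
    return d2 - d1 > threshold


claimed = [0.0, 1.0]
controls = [[3.0, 0.0], [0.0, 4.0]]
print(verify([0.0, 0.0], claimed, controls, threshold=1.0))  # True
print(verify([0.0, 3.0], claimed, controls, threshold=1.0))  # False
```

Comparing against patterns similar to the registered speaker makes the score relative, so an utterance that is merely "close to everyone" is not accepted.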

Speaker adaptation for large vocabulary speech recognition systems using "speaker

Gerhard Rigoll

01 Jan 1989

TL;DR: An alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described, based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available.

Abstract: This paper describes an alternative approach to speaker adaptation for a large vocabulary Hidden Markov Model based speech recognition system. The goal of this investigation was to train the IBM speech recognition system with only 5 minutes of speech data from a new speaker instead of the usual 20 minutes. At the same time, the recognition rate should not drop by more than 1-2%. The approach is based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available. Such a model can be called a "Speaker Markov Model". It is shown how the parameters of such a model can be derived and how it can be used for transforming the training set of the old speaker in order to use it in addition to the short training set of the new speaker. The adaptation algorithm was tested with 12 speakers, including male and female speakers as well as speakers with a foreign accent. The average recognition rate dropped only from 96.4% to 95.2% for a 5000 word vocabulary task when the adaptation was used instead of the full training. Most importantly, the decoding time increases only by a factor of 1.35, whereas this factor is often 3-5 if other adaptation algorithms are used.

19 citations

Patent

Speech processing apparatus

Koichi Miyamae¹, Satoshi Omata¹

Canon Inc.¹

21 Apr 1989-Journal of the Acoustical Society of America

TL;DR: In this article, a speech processing apparatus was proposed that enables processor elements (403a to 403r) each comprising at least one nonlinear oscillator circuit (621) to be used as band pass filters by using the entrainment taking place in each of the processor elements.


Abstract: A speech processing apparatus of the present invention enables processor elements (403a to 403r) each comprising at least one nonlinear oscillator circuit (621) to be used as band pass filters by using the entrainment taking place in each of the processor elements, whereby the speech of a particular talker in the speech of a plurality of talkers can be recognized.


Proceedings Article

Dynamic adaptation of hidden Markov model for robust speech recognition


Gao Yu-qing, Chen Yong-bin, Wu Bo-Xiu

08 May 1989

TL;DR: An algorithm is presented for adaptation and self-learning of the hidden Markov model (HMM) that makes the HMM-based speech recognition robust, so that well-trained models can be adapted to new speaking conditions or a new speaker.

Abstract: An algorithm is presented for adaptation and self-learning of the hidden Markov model (HMM). It makes HMM-based speech recognition robust, so that well-trained models can be adapted to new speaking conditions or a new speaker. In self-learning, all test tokens encountered during recognition can be used to augment the current model. Both procedures increase the size of the training set. The algorithm was tested on a speaker-dependent speech recognition system for the whole Chinese vocabulary and a speaker-independent system for the digits 0-9. Experiments show that the algorithm is very successful, both for new-speaker adaptation and for variations in a single speaker's speech under various conditions.

Proceedings Article

Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge

Yoshua Bengio¹, Renato De Mori¹, Régis Cardin¹

McGill University¹

01 Jan 1989

TL;DR: This work attempts to combine neural networks with knowledge from speech science to build a speaker independent speech recognition system and combines delays, copies of activations of hidden and output units at the input level, and Back-Propagation for Sequences (BPS), a learning algorithm for networks with local self-loops.


Abstract: We attempt to combine neural networks with knowledge from speech science to build a speaker independent speech recognition system. This knowledge is utilized in designing the preprocessing, input coding, output coding, output supervision and architectural constraints. To handle the temporal aspect of speech we combine delays, copies of activations of hidden and output units at the input level, and Back-Propagation for Sequences (BPS), a learning algorithm for networks with local self-loops. This strategy is demonstrated in several experiments, in particular a nasal discrimination task for which the application of a speech theory hypothesis dramatically improved generalization.


Patent

Speaker recognition system

Masashi Miyakawa¹, Shingo Nishimura, Shigenobu Nonaka¹, Masayuki Umino

Sekisui Kagaku Kogyo K.K.¹

17 Nov 1989

TL;DR: In this article, the mean frequency characteristics and the mean pitch frequency of a voice are used as the input to a neural network for speaker recognition, and the output of the neural network 20 is fed to a decision circuit 30 for identification or matching.

Abstract: PURPOSE: To reduce the deterioration of the recognition rate over time and to make real-time processing easy, by using the mean frequency characteristics and mean pitch frequency of a voice as the input for speaker recognition with a neural network. CONSTITUTION: The input voice is divided in time into equal blocks. The voice waveform is passed through band-pass filters 10 of plural channels to give the frequency characteristics of each constant time interval and, in parallel, through a pitch extraction part to give the pitch frequency of each constant time interval; both are averaged block by block by an averaging circuit 15 and input to the neural network 20. The output of the neural network 20 is then input to a decision circuit 30 for identification or matching. Consequently, the recognition rate deteriorates little over time and real-time processing is easy.

Proceedings Article

Codebooks to optimise speaker recognition

John Mason, John Oglesby, Li-Qun Xu

01 Jan 1989

TL;DR: Experimental results show that 10% or more of speech acts as little more than noise, interfering in the task of speaker recognition, and a classifier is developed here, and it is shown that the spoken digit 'nine' is good, while 'six' is bad.


Abstract: A recent approach to speaker identification is based on personalised codebooks. The algorithm compares incoming test features with a set of N codebooks, one for each valid member of the user population, and the codebook which gives rise to the smallest accumulated distance for the full test feature sequence is assumed to identify the speaker. Results from this inherently text-independent approach have highlighted the performance variations for different test utterances: the spoken digit 'nine' is good, while 'six' is bad. This observation has led to the idea of classifying speech, via a text- and speaker-independent codebook, according to empirical discriminating properties in the recognition task. Such a classifier is developed here, and experimental results show that 10% or more of speech acts as little more than noise, interfering in the task of speaker recognition.
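
The codebook decision rule in this abstract (pick the speaker whose codebook gives the smallest accumulated distance over the whole test sequence) can be sketched in a few lines. The codebooks here are tiny made-up examples rather than trained ones, and the function names are hypothetical; the paper's frame-classification step is not included.

```python
import math

# Minimal sketch of codebook-based (VQ) speaker identification with toy codebooks.

def accumulated_distance(features, codebook):
    """Sum over test frames of the distance to the nearest codeword."""
    return sum(min(math.dist(x, c) for c in codebook) for x in features)


def identify(features, codebooks):
    """codebooks maps speaker name -> list of codewords; pick the smallest total."""
    return min(codebooks,
               key=lambda spk: accumulated_distance(features, codebooks[spk]))


codebooks = {"alice": [[0.0, 0.0], [1.0, 1.0]],
             "bob": [[5.0, 5.0], [6.0, 6.0]]}
print(identify([[0.2, 0.1], [0.9, 1.2]], codebooks))  # alice
```

Because every frame contributes to the sum, poorly discriminating frames add noise to all speakers' totals, which is the motivation for the frame classifier the abstract describes.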


Proceedings Article

A cepstral based speaker recognition system

R. Sethuraman¹, J.N. Gowdy¹

Clemson University¹

26 Mar 1989

TL;DR: Development of a fixed limited vocabulary automatic speaker recognition system based on extraction of cepstral features from single isolated word utterances by various speakers is described.

Abstract: Development of a fixed limited vocabulary automatic speaker recognition system is described. The operation of the system is based on extraction of cepstral features from single isolated word utterances by various speakers. A dynamic time warping algorithm is used in the comparison stage to bring the feature vectors being compared into time alignment. A nearest neighbor rule is used to determine the identity of the speaker.
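
The comparison stage described above (dynamic time warping to align feature sequences, then a nearest neighbor rule over stored templates) can be sketched as follows. It assumes the cepstral vectors have already been extracted; the templates are toy one-dimensional sequences, and the symmetric step pattern with Euclidean local distance is one common DTW variant, not necessarily the one used in the paper.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors
    (symmetric steps, Euclidean local distance)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # vertical step
                                     cost[i][j - 1],      # horizontal step
                                     cost[i - 1][j - 1])  # diagonal step
    return cost[n][m]


def identify_speaker(test, references):
    """Nearest neighbor rule over (speaker, template) pairs."""
    return min(references, key=lambda ref: dtw_distance(test, ref[1]))[0]


refs = [("s1", [[0.0], [1.0], [1.0], [2.0]]), ("s2", [[5.0], [5.0], [5.0]])]
print(identify_speaker([[0.0], [1.0], [2.0]], refs))  # s1
```

The warping absorbs differences in speaking rate: the test utterance is one frame shorter than the "s1" template yet matches it at zero cost.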


Patent

Speech recognition with speaker adaptation by learning

Kazunaga Yoshida¹, Takao Watanabe¹

NEC¹

17 May 1989

TL;DR: In this paper, a speech recognition system is adapted to a particular speaker by converting the reference pattern into a normalized pattern through a learning operation that uses a training pattern produced beforehand by the particular speaker.

Abstract: A speech recognition apparatus of the speaker-adaptation type recognizes an input speech pattern produced by a particular speaker according to a reference pattern produced by a standard speaker. The apparatus is adapted to the particular speaker by converting the reference pattern into a normalized pattern through a learning operation that uses a training pattern produced beforehand by the particular speaker. Alternatively, the apparatus is trained by converting the training pattern with reference to the reference pattern. The apparatus converts the input speech pattern into a normalized speech pattern in real time according to the result of the learning operation and recognizes the normalized speech pattern according to the reference pattern.

Proceedings Article

Large-vocabulary speech recognition with speaker-adapted codebook and HMM parameters

Marco Ferretti, Stefano Scarci

01 Jan 1989

Proceedings Article

Synthetic phoneme prototypes in a connected-word speech recognition system


M. Blomberg

23 May 1989

TL;DR: A recognition system based on a reference library of synthetic phoneme prototypes is described, and speaker-independent recognition results are given for male speakers on isolated words and connected digits.

Abstract: A recognition system based on a reference library of synthetic phoneme prototypes is described. The phoneme templates are specified in terms of formant synthesis parameters. The vocabulary and grammar are described in a finite-state network where each node represents a phoneme. A transition between two phonemes in the net is expanded to a number of new nodes using interpolation on the synthesis parameters or at the spectrum level. For each node, a 16-channel filter bank section is computed from the synthesis parameters. Adaptation to each speaker's individual voice source spectrum is performed during recognition. Auditory forward masking is incorporated. Speaker-independent recognition results are given for male speakers on isolated words and connected digits. Future improvements include coarticulation and reduction rules and speaker adaptation of phoneme parameters. The method could also be used in combination with hidden Markov models to provide reference data in cases not covered by the training material.

Proceedings Article

Text dependent speaker recognition in noise

Janusz Zalewski

01 Jan 1989

TL;DR: The author has developed a method of speaker identification based on representing speakers by LPC-coded vowels, and modified it with a noise canceller to identify speakers under noise conditions.

Abstract: The author has developed a method of speaker identification based on representing speakers by a set of LPC-coded vowels. The decision criterion in the recognition task was the minimum cumulative spectral difference between corresponding test and reference samples. The experiments reported here investigated the ability of the modified method to identify speakers under noise conditions. The modification consisted of using a noise canceller.
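
The abstract does not say which noise canceller was used. As an illustration only, the classic LMS adaptive noise canceller (a swapped-in, well-known technique) assumes a second reference input that picks up noise correlated with the noise in the primary speech channel; the adaptive filter predicts that noise and the prediction error is the cleaned signal. `lms_cancel`, `taps` and `mu` are hypothetical choices.

```python
import math

def lms_cancel(primary, reference, taps=4, mu=0.05):
    """Adaptive noise canceller with the LMS update.
    primary: speech + noise; reference: noise correlated with the primary's noise.
    Returns the error signal, which approximates the cleaned speech."""
    w = [0.0] * taps
    history = [0.0] * taps
    cleaned = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]                    # newest reference sample first
        y = sum(wi * xi for wi, xi in zip(w, history))  # current noise estimate
        e = d - y                                       # error = cleaned sample
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, history)]
        cleaned.append(e)
    return cleaned


# Toy check: the primary channel contains only (scaled) noise, so after
# adaptation the canceller output should approach zero.
noise = [math.sin(0.3 * n) for n in range(2000)]
out = lms_cancel([0.8 * v for v in noise], noise)
print(max(abs(e) for e in out[-100:]))
```

In a speaker identification front end the cleaned output would then feed the LPC analysis; the step size `mu` trades adaptation speed against stability and must stay small relative to the reference signal's power.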


Proceedings Article

Speaker adaptation for recognition systems with a large vocabulary


F. Class, P. Regel, K. Trottler

11 Apr 1989

TL;DR: It is shown that, by means of adaptation procedures based on statistical correlation analysis, error rates as low as those of a speaker-dependent recognition system can be achieved after an extremely short training phase with any new speaker.


Abstract: Algorithms for fast speaker adaptation in a speech-recognition system are described. The techniques aim at transformations of the feature vectors, which have to be optimized with respect to some constraints. The methods transform every feature vector, computed at a 10-ms frame rate, into a speaker-normalized vector. The advantage of adaptation by transforming the feature vectors is that this procedure can be applied no matter which classification scheme is used. It is shown that, by means of adaptation procedures based on statistical correlation analysis, error rates as low as those of a speaker-dependent recognition system can be achieved after an extremely short training phase with any new speaker. The key is that the feature vectors are extended nonlinearly to a polynomial vector of second or higher order. Since the algorithms necessary for calculating the transformation matrices are typical of signal processing, a real-time implementation on digital signal processors appears feasible.
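
The key step named above, extending each feature vector to a second-order polynomial vector and then applying a linear transformation, can be sketched with an ordinary least-squares fit. This assumes the new speaker's frames are already time-aligned with the reference speaker's frames for the same training words, and it uses plain least squares (with a tiny ridge term) rather than the paper's constrained correlation analysis. All function names are hypothetical, and a small Gaussian-elimination solver keeps the sketch dependency-free.

```python
# Sketch (assumptions): frames of the new speaker are already time-aligned to
# frames of the reference speaker, and the mapping is fitted by plain least
# squares on a second-order polynomial extension of each feature vector.

def poly2(x):
    """Polynomial vector of second order: 1, x_i, and x_i * x_j for i <= j."""
    n = len(x)
    return [1.0] + list(x) + [x[i] * x[j] for i in range(n) for j in range(i, n)]


def solve(a, b):
    """Solve the square system a w = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_transform(new_frames, ref_frames, ridge=1e-9):
    """One weight vector per output dimension, from the normal equations
    (P^T P + ridge * I) w = P^T y with P the polynomial-extended frames."""
    P = [poly2(x) for x in new_frames]
    k = len(P[0])
    gram = [[sum(p[i] * p[j] for p in P) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    weights = []
    for d in range(len(ref_frames[0])):
        rhs = [sum(p[i] * y[d] for p, y in zip(P, ref_frames)) for i in range(k)]
        weights.append(solve(gram, rhs))
    return weights


def apply_transform(weights, x):
    """Map one new-speaker frame into the reference speaker's feature space."""
    p = poly2(x)
    return [sum(wi * pi for wi, pi in zip(w, p)) for w in weights]


# Toy 1-D data whose true mapping is quadratic: y = 2 x^2 + 1
new = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
ref = [[2 * x[0] ** 2 + 1] for x in new]
W = fit_transform(new, ref)
print(apply_transform(W, [1.5]))  # approximately [5.5]
```

A purely linear transform could not represent this toy mapping; the polynomial extension is what lets a linear solver capture it, which matches the abstract's point that the nonlinear extension is the key.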


Proceedings Article

Speaker adaptation using multiple reference speakers


Francis Kubala, Richard Schwartz, C. Barry

15 Oct 1989

TL;DR: This paper describes the baseline (single reference speaker) speaker-adaptation system and gives current performance results from a recent formal evaluation of the system, and describes the proposal for adapting from multiple reference speakers.


Abstract: We introduce a new technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large vocabulary continuous speech recognition. In contrast to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single common feature space before pooling it. The normalized and pooled speech can then be treated as if it came from a single reference speaker for training the reference hidden Markov model (HMM). Our usual probabilistic spectrum transformation can be applied to the reference HMM to model a new (target) speaker. In this paper, we describe our baseline (single reference speaker) speaker-adaptation system and give current performance results from a recent formal evaluation of the system. We also describe our proposal for adapting from multiple reference speakers and report on recent preliminary experimental results in support of the proposed technique.


Proceedings Article

Large vocabulary speaker-independent isolated-word speech recognition using hidden Markov models: status report and planned research

José Manuel Pardo, H. Hasan

01 Jan 1989

Proceedings Article

Automatic vocabulary extension for a speaker-adaptive speech recognition system based on CVC units

Peter Fesseler, Heidi Hackbarth, Marianne Kugler, Arnd Böhm

01 Jan 1989

Proceedings Article

Speaker adaptation from limited training in the BBN BYBLOS Speech Recognition system


Francis Kubala, Ming-Whei Feng, John Makhoul, Richard Schwartz

21 Feb 1989

TL;DR: The BBN BYBLOS continuous speech recognition system has been used to develop a method of speaker adaptation from limited training and techniques employed to accomplish this transformation are reviewed and experimental results conducted on the DARPA Resource Management database are presented.


Abstract: The BBN BYBLOS continuous speech recognition system has been used to develop a method of speaker adaptation from limited training. The key step in the method is the estimation of a probabilistic spectral mapping between a prototype speaker, for whom there exists a well-trained speaker-dependent hidden Markov model (HMM), and a target speaker for whom there is only a small amount of training speech available. The mapping defines a set of transformation matrices which are used to modify the parameters of the prototype model. The resulting transformed model is then used as an approximation to a well-trained model for the target speaker. We review the techniques employed to accomplish this transformation and present experimental results conducted on the DARPA Resource Management database.
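
The probabilistic spectral mapping described above can be illustrated for discrete-density HMMs: count how often each prototype-speaker codeword co-occurs with each target-speaker codeword in time-aligned training frames, normalize the counts into a transformation matrix, and push every prototype output distribution through it. This sketch assumes a single matrix for the whole model (the abstract mentions a set of matrices) and already-aligned frame pairs; the function names are hypothetical.

```python
from collections import Counter

# Sketch (assumptions): discrete-density HMMs over VQ codebooks, frame pairs
# already time-aligned, and a single mapping matrix for all states.

def estimate_mapping(aligned_pairs, n_codes, floor=1e-3):
    """T[i][j] ~ P(target speaker's code j | prototype speaker's code i)."""
    counts = Counter(aligned_pairs)
    T = []
    for i in range(n_codes):
        row = [counts[(i, j)] + floor for j in range(n_codes)]  # floor avoids zeros
        total = sum(row)
        T.append([c / total for c in row])
    return T


def transform_output_dist(b, T):
    """b'[j] = sum_i b[i] * T[i][j]: a prototype state's output distribution
    re-expressed over the target speaker's codewords."""
    return [sum(b[i] * T[i][j] for i in range(len(b))) for j in range(len(T[0]))]


pairs = [(0, 1), (0, 1), (1, 0)]
T = estimate_mapping(pairs, n_codes=2, floor=0.0)
print(transform_output_dist([0.7, 0.3], T))  # [0.3, 0.7]
```

With `floor=0.0` a prototype code that never appears in the aligned data would cause a division by zero, which is one reason the default keeps a small floor; smoothing of this kind matters when the target speaker's training speech is limited.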


Proceedings Article

An information theory approach to speaker adaptation

Gerhard Rigoll

01 Jan 1989

S10b.3 Speaker verification over long distance telephone lines

Jayant M. Naik, Lorin P. Netsch, George R. Doddington

01 Jan 1989

TL;DR: Two large speech databases that were collected to support the development of new speaker verification algorithms and the results of discriminant analysis techniques which improve the discrimination between true speakers and impostors are described.


Abstract: In this paper we present the results of speaker verification technology development for use over long distance telephone lines. We describe two large speech databases that were collected to support the development of new speaker verification algorithms. We discuss the results of discriminant analysis techniques which improve the discrimination between true speakers and impostors. We compare the performance of two speaker verification algorithms, one using template based Dynamic Time Warping (DTW) and the other, Hidden Markov Modeling (HMM).
