
Showing papers on "Speaker diarisation published in 1990"


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large-vocabulary continuous-speech recognition is introduced, and the usual probabilistic spectrum transformation can be applied to the reference HMM to model a new speaker.
Abstract: A technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large-vocabulary continuous-speech recognition is introduced. In contrast to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single common feature space before pooling it. The normalized and pooled speech is then treated as if it came from a single reference speaker for training the reference hidden Markov model (HMM). The usual probabilistic spectrum transformation can be applied to the reference HMM to model a new speaker. Preliminary experimental results are reported from applying this approach to over 100 reference speakers from the speaker-independent portion of the DARPA 1000-Word Resource Management Database.
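A minimal sketch of the pre-pooling normalization idea, using per-speaker mean/variance scaling as a stand-in for the paper's probabilistic spectrum transformation (the details here are illustrative, not the authors' implementation):

```python
import numpy as np

def normalize_to_common_space(speaker_feats):
    """Map each reference speaker's features to a shared space before pooling.

    speaker_feats: list of (n_frames_i, dim) arrays, one per reference speaker.
    Per-speaker mean/variance normalization is an assumed simplification.
    """
    pooled = []
    for feats in speaker_feats:
        mu = feats.mean(axis=0)
        sigma = feats.std(axis=0) + 1e-8   # avoid division by zero
        pooled.append((feats - mu) / sigma)
    # Treat the normalized pool as if it came from one reference speaker.
    return np.vstack(pooled)
```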

179 citations


Journal ArticleDOI
TL;DR: The task of speaker verification, a subset of the general problem of speaker recognition, is defined and the feature selection and pattern matching steps of the recognition procedure are examined.
Abstract: The task of speaker verification, a subset of the general problem of speaker recognition, is defined. The feature selection and pattern matching steps of the recognition procedure are examined. Speaker verification system design and performance are discussed, and databases for evaluating them are briefly considered. An example of a speaker verification system is described. An overview of industry research in this area is given.

146 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: An acoustic-class-dependent technique for text-independent speaker identification on very short utterances is described, based on maximum-likelihood estimation of a Gaussian mixture model representation of speaker identity.
Abstract: An acoustic-class-dependent technique for text-independent speaker identification on very short utterances is described. The technique is based on maximum-likelihood estimation of a Gaussian mixture model representation of speaker identity. Gaussian mixtures are noted for their robustness as a parametric model and their ability to form smooth estimates of rather arbitrary underlying densities. Speaker model parameters are estimated using a special case of the iterative expectation-maximization (EM) algorithm, and a number of techniques are investigated for improving model robustness. The system is evaluated using a population of 12 reference speakers from a conversational speech database. It achieves 80% average text-independent speaker identification performance for a 1-s test utterance length.
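The core pipeline maps directly onto a short sketch. Here scikit-learn's GaussianMixture stands in for the paper's EM training; the component count and feature layout are placeholders:

```python
from sklearn.mixture import GaussianMixture

def train_speaker_gmms(enroll_feats, n_components=8):
    """Fit one Gaussian mixture per speaker with the EM algorithm.
    enroll_feats: dict speaker_id -> (n_frames, dim) feature array."""
    return {spk: GaussianMixture(n_components, covariance_type="diag").fit(X)
            for spk, X in enroll_feats.items()}

def identify(test_feats, gmms):
    """Return the speaker whose model maximizes the average frame log-likelihood."""
    return max(gmms, key=lambda spk: gmms[spk].score(test_feats))
```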

122 citations


PatentDOI
TL;DR: In this article, an enrollment process creates a set of speaker-specific enrollment parameters for normalizing analysis parameters including the speaker's pitch, the frequency spectrum of the speech as a function of time, and certain measurements of the speech signal in the time domain.
Abstract: The present invention processes an independent body of speech during an enrollment process and creates a set of speaker-specific enrollment parameters for normalizing analysis parameters including the speaker's pitch, the frequency spectrum of the speech as a function of time, and certain measurements of the speech signal in the time domain. A particular objective of the invention is to make these analysis parameters have the same meaning from speaker to speaker. Thus, after the pre-processing performed by this invention, the parameters would look much the same for the same word independent of speaker. In this manner, variations in the speech signal caused by the physical makeup of a speaker's throat, mouth, lips, teeth, and nasal cavity would be, at least in part, reduced by the pre-processing.
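A hedged sketch of the enrollment/normalization flow described above; the concrete parameter set (median pitch, spectral mean and deviation) is an assumption, since the patent's exact parameters are not given here:

```python
import numpy as np

def enroll(speech_frames, pitch_track):
    """Derive speaker-specific enrollment parameters from an independent
    body of speech (illustrative parameter choices)."""
    return {
        "pitch_ref": np.median(pitch_track),           # speaker's typical pitch
        "spec_mean": speech_frames.mean(axis=0),       # average spectrum shape
        "spec_std":  speech_frames.std(axis=0) + 1e-8,
    }

def normalize(frames, pitch_track, params):
    """Make analysis parameters 'mean the same thing' across speakers."""
    pitch = pitch_track / params["pitch_ref"]          # relative pitch
    spec = (frames - params["spec_mean"]) / params["spec_std"]
    return spec, pitch
```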

92 citations


Journal ArticleDOI
TL;DR: In this article, five approaches that can be used to control and simplify the speech recognition task are examined: isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions.
Abstract: Five approaches that can be used to control and simplify the speech recognition task are examined. They entail the use of isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions. The five components of a speech recognition system are described: a speech capture device, a digital signal processing module, preprocessed signal storage, reference speech patterns, and a pattern-matching algorithm. Current speech recognition systems are reviewed and categorized. Speaker recognition approaches and systems are also discussed.
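Of the five components, the pattern-matching algorithm is the most self-contained. A generic dynamic-time-warping matcher of the kind used in isolated-word recognizers of this era (not any specific system reviewed in the article) looks like this:

```python
import numpy as np

def dtw_distance(ref, test):
    """Dynamic-time-warping distance between two feature sequences."""
    n, m = len(ref), len(test)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - test[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test, templates):
    """Pick the reference word whose stored pattern warps to the input most cheaply."""
    return min(templates, key=lambda w: dtw_distance(templates[w], test))
```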

87 citations


Proceedings Article
John S. Bridle, Stephen Cox
01 Oct 1990
TL;DR: A method of training this network to "tune in" the speaker parameters to a particular speaker is outlined, based on a trick for converting a supervised network to an unsupervised mode; the results indicate an improvement over speaker-independent performance and, for unlabelled data, a performance close to that achieved on labelled data.
Abstract: A particular form of neural network is described, which has terminals for acoustic patterns, class labels and speaker parameters. A method of training this network to "tune in" the speaker parameters to a particular speaker is outlined, based on a trick for converting a supervised network to an unsupervised mode. We describe experiments using this approach in isolated word recognition based on whole-word hidden Markov models. The results indicate an improvement over speaker-independent performance and, for unlabelled data, a performance close to that achieved on labelled data.

65 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: It is shown that different classes of phonemes are not equally effective in discriminating between speakers and that verification performance can be considerably improved by separately classifying speech segments representing each broad phonetic category as belonging to an impostor or as belonging to the true speaker.
Abstract: A text-independent speaker verification system based on an adaptive vocal tract model which emulates the vocal tract of the speaker is described. Each speaker is represented by a set of feature vectors derived from speech segments belonging to different classes of phonemes. Linear predictive hidden Markov modeling and maximum-likelihood Viterbi decoding are applied to a speech utterance to obtain different classes of phonemes pronounced by a speaker. It is shown that different classes of phonemes are not equally effective in discriminating between speakers and that verification performance can be considerably improved by separately classifying speech segments representing each broad phonetic category as belonging to an impostor or as belonging to the true speaker. A weighted linear combination of scores for individual categories can be used as the final verification score. The weights are chosen to reflect the effectiveness of particular classes of phonemes in discriminating between speakers and are adjusted to maximize the verification performance.
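The final scoring step, a weighted linear combination of per-phonetic-class scores compared against a threshold, can be sketched directly; the class names, weights, and threshold below are placeholders, not the paper's tuned values:

```python
def verification_score(class_scores, weights):
    """Weighted linear combination of per-phoneme-class scores."""
    total = sum(weights[c] * class_scores[c] for c in class_scores)
    return total / sum(weights[c] for c in class_scores)

def verify(class_scores, weights, threshold=0.5):
    """Accept the claimed identity if the combined score clears the threshold."""
    return verification_score(class_scores, weights) >= threshold

# Example with hypothetical classes and weights:
scores  = {"vowels": 0.9, "nasals": 0.7, "fricatives": 0.4}
weights = {"vowels": 1.0, "nasals": 0.8, "fricatives": 0.3}
print(verify(scores, weights))
```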

60 citations


PatentDOI
TL;DR: In this paper, a feature extracting part extracts features of an unknown speaker for every segmented block by using time-series acoustic parameters, and a distance calculating part calculates a distance between the features extracted by the feature extracting part and reference features stored in a memory.
Abstract: In a speaker verification system, a detecting part detects a speech section of an input speech signal by using time-series acoustic parameters thereof. A segmentation part calculates individuality information for segmentation by using the time-series acoustic parameters within the speech section, and segments the input speech section into a plurality of blocks based on the individuality information. A feature extracting part extracts features of an unknown speaker for every segmented block by using the time-series acoustic parameters. A distance calculating part calculates a distance between the features of the speaker extracted by the feature extracting part and reference features stored in a memory. A decision part decides whether or not the unknown speaker is the real speaker by comparing the calculated distance with a predetermined threshold value. Segmentation is performed by calculating a primary moment of the spectrum over a block and finding successive values which satisfy a predetermined criterion.
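The segmentation criterion, a primary (first) moment of the spectrum computed over successive frames, can be sketched as follows; the concrete criterion function is an assumption, since the patent does not fix one here:

```python
import numpy as np

def spectral_first_moment(power_spectrum, freqs):
    """Primary (first) moment of the spectrum: the spectral centroid."""
    return (freqs * power_spectrum).sum() / (power_spectrum.sum() + 1e-12)

def segment_blocks(frames, freqs, criterion=lambda m: m > 1000.0):
    """Group successive frames whose first moment satisfies a criterion
    into blocks (the default criterion is a placeholder)."""
    blocks, current = [], []
    for frame in frames:
        if criterion(spectral_first_moment(frame, freqs)):
            current.append(frame)
        elif current:
            blocks.append(np.array(current))
            current = []
    if current:
        blocks.append(np.array(current))
    return blocks
```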

47 citations


Patent
16 Nov 1990
TL;DR: In this paper, various functions associated with some of the words or instructions recognizable by a speaker independent voice recognition device are presented to an operator via one or more menus (200a-200d) so that the operator may select any of several functions by using a limited set of speaker independent commands.
Abstract: Various functions (or portions thereof) are associated with some of the words or instructions recognizable by a speaker independent voice recognition device (128). This association is presented to an operator via one or more menus (200a-200d) so that the operator may select any of several functions by use of a limited set of speaker independent commands.
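The mechanism reduces to a table binding a small, fixed command vocabulary to arbitrary functions; a toy sketch (the command words and functions below are invented for illustration):

```python
# Menu binding a limited speaker-independent vocabulary to functions.
MENU = {
    "one":   ("Redial",        lambda: print("redialing")),
    "two":   ("Volume up",     lambda: print("volume up")),
    "three": ("Battery level", lambda: print("battery ok")),
}

def show_menu():
    """Present the association to the operator."""
    for word, (label, _) in MENU.items():
        print(f"Say '{word}' for {label}")

def on_recognized(word):
    """Dispatch whatever function the current menu associates with the word."""
    if word in MENU:
        MENU[word][1]()
```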

46 citations


Proceedings ArticleDOI
24 Jun 1990
TL;DR: Recent efforts to further improve the performance of the Sphinx system for speaker-independent continuous speech recognition are reported, with incorporation of additional dynamic features, semi-continuous hidden Markov models, and speaker clustering.
Abstract: The paper reports recent efforts to further improve the performance of the Sphinx system for speaker-independent continuous speech recognition. The recognition error rate is significantly reduced with the incorporation of additional dynamic features, semi-continuous hidden Markov models, and speaker clustering. For the June 1990 (RM2) evaluation test set, the error rates of our current system are 4.3% and 19.9% for the word-pair grammar and no grammar, respectively.
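Of the three improvements, the "additional dynamic features" are the easiest to make concrete. Below is one standard construction, delta coefficients by linear regression over a short window; the exact Sphinx feature set may differ:

```python
import numpy as np

def add_dynamic_features(cepstra, window=2):
    """Append first-order delta (dynamic) features computed by linear
    regression over a +/- `window` frame neighborhood."""
    T, _ = cepstra.shape
    padded = np.pad(cepstra, ((window, window), (0, 0)), mode="edge")
    num = sum(k * (padded[window + k: window + k + T] -
                   padded[window - k: window - k + T])
              for k in range(1, window + 1))
    den = 2 * sum(k * k for k in range(1, window + 1))
    return np.hstack([cepstra, num / den])
```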

26 citations


Proceedings ArticleDOI
Biing-Hwang Juang, F.K. Soong
03 Apr 1990
TL;DR: It is found that incorporation of memory in source coders in general enhances the speaker recognition accuracy but that more remarkable improvements can be accomplished by properly including potential source variations in the coder design/training.
Abstract: The use of nonmemoryless source coders in speaker recognition problems is studied, and the effects of source variations, including speaking inconsistency and channel mismatch, in source coder designs for the intended application are discussed. It is found that incorporation of memory in source coders in general enhances the speaker recognition accuracy but that more remarkable improvements can be accomplished by properly including potential source variations in the coder design/training. An experiment with a 100-speaker database shows a 99.5% recognition accuracy.
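As a point of reference, the memoryless-VQ baseline that the paper improves on can be sketched in a few lines; the paper's contribution, adding memory to the coder and designing for source variation, is not captured by this sketch:

```python
import numpy as np

def vq_distortion(feats, codebook):
    """Average distance from each frame to its nearest codeword."""
    d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def recognize_speaker(feats, codebooks):
    """Pick the speaker whose codebook quantizes the test speech
    with the least distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(feats, codebooks[spk]))
```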



Proceedings ArticleDOI
24 Jun 1990
TL;DR: A new paradigm for speaker-independent (SI) training of hidden Markov models (HMM) is presented, which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
Abstract: This paper reports on two contributions to large vocabulary continuous speech recognition. First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers. In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training. With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus. This performance is comparable to our best condition for this test suite, using 109 training speakers.

Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker. A probabilistic spectral mapping is estimated independently for each training (reference) speaker and the target speaker. Each reference model is transformed to the space of the target speaker and combined by averaging. Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1%, a 45% reduction in error compared to the SI result.
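The model-averaging idea, combining sufficient statistics of independently trained models instead of pooling raw speech, can be sketched for Gaussian state outputs; the statistics layout below is an assumed simplification of the paper's HMM estimation:

```python
def average_models(models):
    """Combine per-speaker models by averaging sufficient statistics.
    models: list of dicts, state -> {"count", "mean", "second_moment"},
    one dict per independently trained speaker model."""
    combined = {}
    for state in models[0]:
        n  = sum(m[state]["count"] for m in models)
        mu = sum(m[state]["count"] * m[state]["mean"] for m in models) / n
        m2 = sum(m[state]["count"] * m[state]["second_moment"] for m in models) / n
        combined[state] = {"mean": mu, "var": m2 - mu ** 2}
    return combined
```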

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The principle of trajectory space comparison for text-independent speaker recognition and some solutions to the space comparison problem based on vector quantization are presented and the comparison of the recognition rates of different solutions is reported.
Abstract: The principle of trajectory space comparison for text-independent speaker recognition and some solutions to the space comparison problem based on vector quantization are presented. The comparison of the recognition rates of different solutions is reported. The experimental system achieves a 99.5% text-independent speaker recognition rate for 23 speakers, using five phrases for training and five for test. A speaker-independent continuous speech recognition system is built in which this principle is used for speaker adaptation.
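One plausible realization of trajectory-space comparison via vector quantization is a symmetric codebook-to-codebook distance; the paper evaluates several variants, so the version below is only an assumed illustration:

```python
import numpy as np

def space_distance(cb_a, cb_b):
    """Symmetric average distance from each codeword of one speaker's
    codebook to the nearest codeword of the other's."""
    d = np.linalg.norm(cb_a[:, None, :] - cb_b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```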


Proceedings ArticleDOI
Stephen Cox, J.S. Bridle
03 Apr 1990
TL;DR: Results of using this technique with whole-word hidden Markov models (HMMs) indicate an improvement over speaker-independent performance and, for unlabeled data, a performance close to that achieved on labeled data.
Abstract: A particular form of neural network is described which has terminals for acoustic patterns, class labels, and speaker parameters. A method of training this network to tune in the speaker parameters to a new speaker is outlined. This process can also be viewed from a Bayesian perspective as maximizing the likelihood of the speaker's data by optimizing the model and speaker parameters. A method for doing this when the data are labeled is described. Results of using this technique with whole-word hidden Markov models (HMMs) indicate an improvement over speaker-independent performance and, for unlabeled data, a performance close to that achieved on labeled data.
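The Bayesian view, maximizing the likelihood of the speaker's unlabeled data over speaker parameters, can be illustrated with a toy model in which the speaker parameter is a single feature-space offset estimated by EM (the paper's network and whole-word HMMs are far richer than this sketch):

```python
import numpy as np

def tune_speaker_bias(X, means, priors, n_iter=10):
    """Unsupervised 'tuning in': estimate an offset b that maximizes the
    likelihood of unlabeled frames X under class models N(mu_k + b, I)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: soft class posteriors under the current offset
        diff = X[:, None, :] - (means + b)[None, :, :]
        logp = -0.5 * (diff ** 2).sum(-1) + np.log(priors)[None, :]
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: the offset is the mean residual after class explanation
        b = (X - gamma @ means).mean(axis=0)
    return b
```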

Book ChapterDOI
01 Jan 1990
TL;DR: This paper presents a connectionist approach to automatic speaker identification, based for the first time on the LVQ (Learning Vector Quantization) algorithm, and reports the results obtained for different combinations of parameters.
Abstract: This paper presents a connectionist approach to automatic speaker identification, based for the first time on the LVQ (Learning Vector Quantization) algorithm. For each “subscriber” to the identification system, a fixed number of references is stored. The algorithm is based on a nearest-neighbor principle, with adaptation through learning. Identification is realized by comparing the distance of the unknown utterance to the nearest reference against a given threshold. Preliminary tests run on a 10-speaker set show an identification rate of 97% for MFC coefficients. We present the identification system and database used, and indicate the results obtained for different combinations of parameters. We further evaluate our system by comparing its performance with that of a Bayesian system.
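A generic LVQ1 sketch of the adaptation-through-learning step and the thresholded nearest-reference decision; the paper's exact LVQ variant, learning schedule, and distance measure are not specified here, so these are assumptions:

```python
import numpy as np

def lvq1_epoch(refs, labels, X, y, lr=0.05):
    """One LVQ1 pass: pull the nearest reference toward same-speaker frames,
    push it away from other-speaker frames."""
    for x, spk in zip(X, y):
        i = np.argmin(np.linalg.norm(refs - x, axis=1))   # nearest reference
        sign = 1.0 if labels[i] == spk else -1.0
        refs[i] += sign * lr * (x - refs[i])
    return refs

def identify(refs, labels, utterance, threshold):
    """Accept the nearest reference's speaker if it is close enough.
    Averaging the utterance frames is a simplification."""
    d = np.linalg.norm(refs - utterance.mean(axis=0), axis=1)
    i = np.argmin(d)
    return labels[i] if d[i] <= threshold else None
```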



Proceedings ArticleDOI
03 Apr 1990
TL;DR: A method of dealing with articulatory speaker variations in hidden Markov models (HMMs) for speaker adaptation is proposed; on the /b,d,g/ recognition task it achieves 82.5% recognition accuracy, better than the rates of other adaptation methods.
Abstract: A method of dealing with articulatory speaker variations in hidden Markov models (HMMs) for speaker adaptation is proposed. Speech data from many speakers are spectrally mapped onto a standard speaker. These data are used to teach the HMM the interspeaker articulatory variations that subsist across the spectral mapping. The proposed method is compared to other adaptation methods on the /b,d,g/ recognition task. The results show 82.5% recognition accuracy, which is better than the rates of the other methods. Evaluation experiments on an all-phoneme Japanese recognition task and a continuous-speech recognition task are reported. Average recognition rates over all Japanese phonemes are 71.3% and 93.2% for the best candidate and the top three candidates, respectively. These are 0.7% and 1.5% higher than the rates of the basic spectrum-mapping method. In the continuous-speech recognition experiment, average phrase recognition rates are 74.9% and 96.2% for the best candidate and the top five candidates, respectively.
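The first step, spectrally mapping many speakers' data onto a standard speaker, can be sketched as an affine least-squares mapping over time-aligned frame pairs; the paper's actual mapping estimator may differ:

```python
import numpy as np

def fit_spectral_mapping(src_frames, std_frames):
    """Least-squares affine map from a training speaker's spectra onto the
    standard speaker's, assuming frame pairs are already time-aligned
    (e.g. by DTW)."""
    X = np.hstack([src_frames, np.ones((len(src_frames), 1))])  # affine term
    W, *_ = np.linalg.lstsq(X, std_frames, rcond=None)
    return W

def apply_mapping(frames, W):
    return np.hstack([frames, np.ones((len(frames), 1))]) @ W
```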

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A method is described for adapting the IBM speech recognition system in the situation where the system is already trained for a new speaker and one tries to further adapt and improve it while it is actually being used by that speaker in recognition mode.
Abstract: A method for adaptation of the IBM speech recognition system is described for the situation where the system is already trained for the new speaker and one tries to further adapt and improve the system while it is actually being used by the new speaker in recognition mode. A special kind of adaptation is investigated where the emphasis is not on the adaptation of the statistical parameters of the Markov models but on the adaptation of the structure of these models. This structure is defined by the baseforms describing the composition of word models from phone models in the system. Therefore, baseform adaptation corresponds directly to the adaptation of the system to the personal speaker characteristics of the new user. Several different baseform adaptation schemes are investigated, and it is demonstrated that for a speaker who has already trained the system and achieves a 95.2% recognition performance, the performance can be further improved to 96.3%.
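A deliberately simple stand-in for baseform adaptation: among candidate baseforms for each word, keep the one that accumulated the best acoustic score on the new speaker's recognized utterances (the paper's schemes are more elaborate than this):

```python
from collections import defaultdict

def adapt_baseforms(decoded, scores):
    """Pick, for each word, the phone-string baseform that scored best
    on the new speaker's utterances in recognition mode.
    decoded: list of (word, baseform) pairs observed during recognition
    scores:  dict (word, baseform) -> accumulated acoustic log-score"""
    best = defaultdict(lambda: (None, -float("inf")))
    for word, bf in decoded:
        s = scores[(word, bf)]
        if s > best[word][1]:
            best[word] = (bf, s)
    return {word: bf for word, (bf, _) in best.items()}
```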

Proceedings Article
01 Nov 1990
TL;DR: The results show that adaptation based on the fuzzy histogram algorithm yields the highest accuracy in an HMM-based speech recognition system.
Abstract: In this paper, we compare the performance of two-stage speaker adaptation methods for an HMM-based speech recognition system. We compare three kinds of VQ adaptation methods which may be used in the first stage to reduce the distortion error for a new speaker: label prototype adaptation, adaptation with a codebook built from the adaptation speech itself, and adaptation with a mapped codebook. We then compare the performance of four kinds of HMM parameter adaptation methods which may be used in the second stage to transform HMM parameters for a new speaker: adaptation by the Viterbi algorithm, by the DTW algorithm, by the iterative alignment algorithm, and by the fuzzy histogram algorithm. The results show that adaptation based on the fuzzy histogram algorithm yields the highest accuracy in an HMM-based speech recognition system.
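The "mapped codebook" option in the first stage admits a compact sketch: each speaker-independent codeword moves to the mean of the adaptation frames assigned to it (a common construction; the paper's mapping may differ):

```python
import numpy as np

def mapped_codebook(si_codebook, adapt_frames):
    """First-stage VQ adaptation: re-center each codeword of the
    speaker-independent codebook on the adaptation frames it quantizes."""
    d = np.linalg.norm(adapt_frames[:, None, :] - si_codebook[None, :, :], axis=-1)
    assign = d.argmin(axis=1)
    new_cb = si_codebook.copy()
    for k in range(len(si_codebook)):
        hits = adapt_frames[assign == k]
        if len(hits):
            new_cb[k] = hits.mean(axis=0)
    return new_cb
```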



Proceedings ArticleDOI
03 Apr 1990
TL;DR: A statistical method for recognizing phonemes in continuous speech is presented, featuring a parametric expression of speaker individuality and an effective calculation of phoneme likelihood, especially for consonants in various phoneme environments.
Abstract: A statistical method for recognizing phonemes in continuous speech is presented. Two aspects of the system are discussed. The first is speaker adaptation to improve the recognition rate. A parametric expression of speaker individuality is used, which is calculated from the spectral distortion in the vector quantization. Each acoustic feature space is divided according to the speaker individuality parameter. The second is the effective calculation of phoneme likelihood, especially for consonants in various phoneme environments. Since the acoustic features of consonants depend strongly on the surrounding phonemes, phoneme segments which have high scores are extracted first. The remaining parts are then discriminated under the assumption that the reliable parts of phonemes really exist in the utterance string. At the frame level, the correct recognition rate over all phoneme categories reaches 78.3% in a multispeaker experiment (six males) and 72.7% in a completely speaker-independent experiment.