
Showing papers on "Speaker diarisation published in 1991"


Journal ArticleDOI
TL;DR: The system described here is capable of accurately verifying an individual’s claimed identity from a short sample of his or her speech, and a rationale was developed for determining the size of the test required to allow hypotheses regarding the system's true error rates to be tested with stated confidence levels.

230 citations


Proceedings Article
01 Jan 1991
TL;DR: This paper presents some of the design considerations of BREF, a large read-speech corpus for French designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems, and for the study of phonological variations.
Abstract: This paper presents some of the design considerations of BREF, a large read-speech corpus for French. BREF was designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and for the study of phonological variations. The texts to be read were selected from 5 million words of the French newspaper Le Monde. In total, 11,000 texts were selected, with selection criteria that emphasized maximizing the number of distinct triphones. Separate text materials were selected for training and test corpora. Ninety speakers have been recorded, each providing between 5,000 and 10,000 words (approximately 40-70 min) of speech.

225 citations


Journal ArticleDOI
TL;DR: Recent advances in and perspectives of research on speaker-dependent-feature extraction from speech waves, automatic speaker identification and verification, speaker adaptation in speech recognition, and voice conversion techniques are discussed.

108 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker verification system using connected word verification phrases has been implemented and studied, and the system has been evaluated on a 20-speaker telephone database of connected digit utterances.
Abstract: A speaker verification system using connected word verification phrases has been implemented and studied. Verification utterances are represented as concatenated speaker-dependent whole-word hidden Markov models (HMMs). Verification phrases are specified as strings of words drawn from a small fixed vocabulary, such as the digits. Phrases can either be individualized or randomized for greater security. Training techniques to create speaker-dependent models for verification are used in which initial word models are created by bootstrapping from existing speaker-independent models. The system has been evaluated on a 20-speaker telephone database of connected digit utterances. Using approximately 66 s of connected digit training utterances per speaker, the verification equal-error rate is approximately 3.5% for 1.1 s test utterances and 0.3% for 4.4 s test utterances. In comparison, the performance of a template-based system using the same amount of training data is 6.7% and 1.5%, respectively.
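The equal-error rate quoted above is the operating point where the false-accept and false-reject rates coincide. As a minimal illustration (with made-up verification scores, not the paper's data), it can be computed by sweeping a decision threshold:

```python
# Hypothetical scores: higher means more likely the claimed speaker.
genuine = [0.9, 0.8, 0.7, 0.6, 0.35]    # true-speaker trials
impostor = [0.5, 0.4, 0.3, 0.2, 0.1]    # impostor trials

def equal_error_rate(genuine, impostor):
    """Sweep a threshold and return the point where false-accept
    and false-reject rates are (approximately) equal."""
    best = None
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

print(equal_error_rate(genuine, impostor))  # 0.2 for these toy scores
```

Real evaluations interpolate the FAR/FRR curves rather than averaging at the closest threshold, but the crossing-point idea is the same.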

88 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: Experimental results show that adapting the spectral observation probabilities of each state of the model by the back propagation of errors can correct misclassification errors.
Abstract: Speaker verification is performed by comparing the output probabilities of two Markov models of the same phonetic unit. One of these Markov models is speaker-specific, being built from utterances from the speaker whose identity is to be verified. The second model is built from utterances from a large population of speakers. The performance of the system is improved by treating the pair of models as a connectionist network, an alpha-net, which then allows discriminative training to be carried out. Experimental results show that adapting the spectral observation probabilities of each state of the model by the back propagation of errors can correct misclassification errors. The real-time implementation of the system produced an average digit error rate of 4.5% and only one misclassification in 600 trials using a five-digit sequence.
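The accept/reject decision described above compares the output probabilities of a speaker-specific model against a population (background) model. A toy sketch of that likelihood-ratio test, using single 1-D Gaussians with hypothetical parameters in place of the paper's Markov models:

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def verify(frames, spk, bkg, threshold=0.0):
    """Accept if the average log-likelihood ratio between the
    speaker-specific and background models exceeds the threshold."""
    llr = sum(gauss_loglik(x, *spk) - gauss_loglik(x, *bkg) for x in frames)
    return llr / len(frames) > threshold

spk_model = (1.0, 0.5)   # (mean, var) fit on the claimed speaker
bkg_model = (0.0, 2.0)   # (mean, var) fit on a large speaker population
print(verify([0.9, 1.1, 1.2], spk_model, bkg_model))     # True: near speaker
print(verify([-2.0, -1.5, -2.5], spk_model, bkg_model))  # False: far away
```

The alpha-net contribution is that the two models are then trained discriminatively as one network, rather than estimated independently as here.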

74 citations


Proceedings ArticleDOI
11 Jun 1991
TL;DR: The authors summarize a speaker adaptation algorithm based on codebook mapping from one speaker to a standard speaker, developed to be useful in various kinds of speech recognition systems such as hidden-Markov-model-based, feature-based, and neural-network-based systems.
Abstract: The authors summarize a speaker adaptation algorithm based on codebook mapping from one speaker to a standard speaker. This algorithm has been developed to be useful in various kinds of speech recognition systems such as hidden-Markov-model-based, feature-based, and neural-network-based systems. The codebook mapping speaker adaptation algorithm has been much improved by introducing several ideas based on fuzzy vector quantization. This fuzzy codebook mapping algorithm is also applicable to voice conversion between arbitrary speakers.
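The fuzzy codebook mapping idea can be sketched as follows: instead of mapping an input vector through its single nearest codeword, fuzzy memberships to all source codewords weight a sum over the paired target codewords. The 1-D codebooks and fuzziness exponent `m` below are illustrative, not from the paper:

```python
# Hypothetical paired 1-D codebooks: src[i] corresponds to tgt[i].
src = [0.0, 1.0, 2.0]       # input speaker's codewords
tgt = [0.2, 1.5, 2.9]       # standard speaker's paired codewords

def fuzzy_map(x, src, tgt, m=2.0, eps=1e-12):
    """Map x through fuzzy memberships to all source codewords,
    then take the membership-weighted sum of target codewords."""
    d = [abs(x - c) + eps for c in src]          # distances to codewords
    w = [dd ** (-2.0 / (m - 1.0)) for dd in d]   # fuzzy c-means memberships
    s = sum(w)
    return sum(wi / s * t for wi, t in zip(w, tgt))

print(fuzzy_map(1.0, src, tgt))  # ~1.5: exact hit on a codeword
print(fuzzy_map(0.5, src, tgt))  # between-codeword inputs blend targets
```

Hard VQ mapping would return exactly `tgt[nearest]`; the fuzzy version interpolates smoothly, which is what reduces quantization artifacts in adaptation and voice conversion.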

63 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: A VQ (vector-quantization)-based text-independent speaker recognition method which is robust against utterance variations, and a normalization method, talker variability normalization (TVN), which normalizes parameter variation taking both inter- and intra-speaker variability into consideration.
Abstract: The authors describe a VQ (vector-quantization)-based text-independent speaker recognition method which is robust against utterance variations. Three techniques are introduced to cope with temporal and text-dependent spectral variations. First, either an ergodic hidden Markov model or a voiced/unvoiced decision is used to classify input speech into broad phonetic classes. Second, a new distance measure, the distortion-intersection measure (DIM), is introduced for calculating VQ distortion of input speech compared to speaker-independent codebooks. Third, a normalization method, talker variability normalization (TVN), is introduced. TVN normalizes parameter variation taking both inter- and intra-speaker variability into consideration. The system was tested using utterances of nine speakers recorded over three years. The combination of the three techniques achieves high speaker identification accuracies of 98.5% using only vocal tract information and 99.0% using both vocal tract and pitch information.
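VQ-distortion-based identification, the backbone of the method above, scores each speaker by the average distance from the input frames to that speaker's nearest codewords (the DIM is a refinement of this basic measure). A minimal 1-D sketch with invented codebooks:

```python
def vq_distortion(frames, codebook):
    """Average distance from each input frame to its nearest codeword."""
    return sum(min(abs(f - c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Return the speaker whose codebook yields the smallest distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(frames, codebooks[spk]))

# Hypothetical per-speaker codebooks (real systems use spectral vectors).
books = {"alice": [0.0, 1.0, 2.0], "bob": [5.0, 6.0, 7.0]}
print(identify([0.9, 2.1, 1.2], books))  # 'alice': frames lie near her codewords
```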

59 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors already have a state-of-the-art speaker-independent speech recognition system, SPHINX, and extended it to speaker-dependent speech recognition, which demonstrated a substantial difference between speaker-dependent and speaker-independent systems.
Abstract: The DARPA Resource Management task is used as the domain to investigate the performance of speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. The authors already have a state-of-the-art speaker-independent speech recognition system, SPHINX. The error rate for the RM2 test set is 4.3%. They extended SPHINX to speaker-dependent speech recognition. The error rate is reduced to 1.4-2.6% with 600-2400 training sentences for each speaker, which demonstrated a substantial difference between speaker-dependent and -independent systems. Based on speaker-independent models, a study was made of speaker-adaptive speech recognition. With 40 adaptation sentences for each speaker, the error rate can be reduced from 4.3% to 3.1%.

41 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: A combination of a high-performance speaker identification system and an isolated word recognizer is presented, capable of automatically producing speech and speaker identification with a closed set of speakers.
Abstract: A combination of a high-performance speaker identification system and an isolated word recognizer is presented. The front-end text-independent speaker identification system determines the most likely speaker for an input word. The speaker identity is then used to choose the reference word models for the speech recognizer. When used with a closed set of speakers, the combination is capable of automatically producing speech and speaker identification. For an open set of speakers, the speaker recognition system acts as a speaker quantizer which associates the unknown speaker with an acoustically similar speaker. The matching speaker's word models are used in the speech recognizer. The application of this front-end speaker recognizer is described for a DTW and HMM speech recognizer. Results on a combination using a DTW word recognizer are 100% for closed-set experiments.

33 citations


Journal ArticleDOI
TL;DR: The methods and motivation for VAA data collection and validation procedures, the current contents of the database, and the results of exploratory research on a 1088-speaker subset of the database are described.

31 citations


Proceedings ArticleDOI
04 Nov 1991
TL;DR: It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.
Abstract: The author reviews major methods of applying the vector quantization (VQ) technique to speech and speaker recognition. These include speech recognition based on the combination of VQ and the DTW/HMM (dynamic time warping/hidden Markov model) technique, VQ-distortion-based recognition, learning VQ algorithms, speaker adaptation by VQ-codebook mapping, and VQ-distortion-based speaker recognition. It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.

PatentDOI
Kazunaga Yoshida, Takao Watanabe
TL;DR: A speech recognition apparatus is adapted to the speech of the particular speaker by converting the reference pattern into a normalized pattern by a neural network unit, internal parameters of which are modified through a learning operation using a normalized feature vector of the training pattern produced by the voice of the particular speaker and normalized on the basis of the reference pattern.
Abstract: A speech recognition apparatus of the speaker adaptation type operates to recognize an inputted speech pattern produced by a particular speaker by using a reference pattern produced by a voice of a standard speaker. The speech recognition apparatus is adapted to the speech of the particular speaker by converting the reference pattern into a normalized pattern by a neural network unit, internal parameters of which are modified through a learning operation using a normalized feature vector of the training pattern produced by the voice of the particular speaker and normalized on the basis of the reference pattern, so that the neural network unit provides an optimum output similar to the corresponding normalized feature vector of the training pattern. In the alternative, the speech recognition apparatus operates to recognize an inputted speech pattern by converting the inputted speech pattern into a normalized speech pattern by the neural network unit, internal parameters of which are modified through a learning operation using a feature vector of the reference pattern normalized on the basis of the training pattern, so that the neural network unit provides an optimum output similar to the corresponding normalized feature vector of the reference pattern and recognizing the normalized speech pattern according to the reference pattern.

Patent
17 Sep 1991
TL;DR: In this paper, a speech segment correspondence unit makes a dynamic programming (DP) based correspondence between the obtained speech segments and training speech data of the target speaker, thereby making a speech segment correspondence table.
Abstract: Input speech of a reference speaker, who wants to convert his/her voice quality, and speech of a target speaker are converted into a digital signal by an analog to digital (A/D) converter. The digital signal is then subjected to speech analysis by a linear predictive coding (LPC) analyzer. Speech data of the reference speaker is processed into speech segments by a speech segmentation unit. A speech segment correspondence unit makes a dynamic programming (DP) based correspondence between the obtained speech segments and training speech data of the target speaker, thereby making a speech segment correspondence table. A speaker individuality conversion is made on the basis of the speech segment correspondence table by a speech individuality conversion and synthesis unit.
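The DP-based correspondence step can be illustrated with the classic dynamic-time-warping recursion over two feature sequences; scalar sequences stand in here for the real spectral vectors:

```python
def dtw(a, b):
    """Dynamic-programming alignment cost between two scalar sequences,
    as used to pair speech segments of two speakers."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or diagonal match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated frame aligns freely
print(dtw([1, 2, 3], [1, 2, 4]))     # 1.0: one frame differs by 1
```

Backtracking through `D` (omitted here) recovers which segments of the reference speaker align with which segments of the target speaker, which is what populates the correspondence table.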

Proceedings ArticleDOI
14 Apr 1991
TL;DR: An attempt was made to enhance the performance of a DTW (dynamic time warping) speech recognizer by preprocessing speech parameters with a neural network transformation: a multilayer perceptron trained on speech utterances of a single speaker.
Abstract: An attempt was made to enhance the performance of a DTW (dynamic time warping) speech recognizer by preprocessing speech parameters using a neural network transformation. A multilayer perceptron trained with speech utterances of a single speaker has been used in front of a DTW recognizer. Results show an improvement of about 15% in the recognition rate in all cases, even with a speaker that was not used for training. If the network is not completely speaker independent, a dynamic adaptation to the speaker could be performed.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: The integrated noise model was noted for having a noise suppression characteristic that arises naturally from the statistical model, which is important if one wishes to avoid the ad hoc fine tuning of thresholds required in the noise processing approach implemented.
Abstract: The use of probabilistic mixture densities for text-independent speaker identification in a noisy telephone channel environment is investigated. Two techniques for noise compensation are considered. In the first approach, a background noise model is integrated directly into the model for speech. In the second approach, noise preprocessing techniques are used to compensate noisy observations before passing them along to the speaker identification system. Both approaches are evaluated on conversational utterances collected over long-distance telephone channels from ten speakers. The integrated noise model was noted for having a noise suppression characteristic that arises naturally from the statistical model, which is important if one wishes to avoid the ad hoc fine tuning of thresholds required in the noise preprocessing approach implemented.
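A probabilistic-mixture speaker identifier of the kind investigated above scores each speaker's Gaussian mixture over all observation frames and picks the highest total log-likelihood. The 1-D mixtures below are invented for illustration:

```python
import math

def gmm_loglik(x, mixture):
    """Log-likelihood of x under a 1-D Gaussian mixture [(weight, mean, var), ...]."""
    return math.log(sum(
        w / math.sqrt(2 * math.pi * v) * math.exp(-0.5 * (x - m) ** 2 / v)
        for w, m, v in mixture))

def identify(frames, models):
    """Sum per-frame log-likelihoods under each speaker's mixture; return the best."""
    return max(models, key=lambda s: sum(gmm_loglik(x, models[s]) for x in frames))

# Two hypothetical speakers modeled by two-component mixtures.
models = {
    "low":  [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],
    "high": [(0.5, 4.0, 1.0), (0.5, 5.0, 1.0)],
}
print(identify([0.2, 0.8, 1.1], models))  # 'low': frames sit near its components
```

The paper's integrated-noise variant would add a noise component to each mixture rather than denoising the frames first.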


Proceedings ArticleDOI
D.A. Gaganelis, E. Frangoulis
14 Apr 1991
TL;DR: In this method, K-means clustering is used during training to obtain robust speaker reference templates; classification through the Fourier-Bessel functions allows the combination of multiple feature sets in a single classification test.
Abstract: The authors report on a novel method for telephone-based speaker verification. In this method, K-means clustering is used during training for robust speaker reference templates. The classification is made through the use of the Fourier-Bessel functions, which transform the original problem to a multidimensional detection problem. This technique allows the combination of multiple feature sets in a single classification test. Experiments with a number of speakers and words over the telephone network show the potential benefits of the new techniques using the standard and multivariate Gaussian classifiers.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker model using a neural network is proposed for reference speaker clustering on speaker independent speech recognition and neural prediction modeling by multilayer perceptron and learning matrix vector-quantization are considered for the speaker modeling.
Abstract: A speaker model using a neural network is proposed for reference speaker clustering in speaker-independent speech recognition. Speaker individuality is embedded not only in a static short-time spectrum and a pitch frequency, but also in a dynamic spectral pattern and pitch pattern. In conventional modeling, speaker individuality is based on the former static features. The authors try to capture the latter dynamic features of a speaker with a neural speaker model. Two methods, neural prediction modeling by multilayer perceptron and learning matrix vector quantization, are considered for the speaker modeling. Using the measures of speaker modeling, speaker clustering of the reference patterns based on mutual information is carried out for speaker-independent speech recognition.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker-adaptive speech recognition method using a stochastic speaker classifier and four integrated 9-state ergodic speaker hidden Markov models estimated from the command words uttered by 116 training speakers is proposed.
Abstract: A speaker-adaptive speech recognition method using a stochastic speaker classifier is proposed. The stochastic speaker classifier decides which spectral feature subspace is suitable for the input speaker by using integrated speaker Markov models. In the acoustic HMMs (hidden Markov models), the observation emission probabilities are presented as joint probabilities for speaker individuality obtained from the speaker classifier and feature vectors from the acoustic preprocessor. Evaluation experiments are performed using a telephone speech database of 50 command words and 10 Japanese digits. Using four integrated 9-state ergodic speaker hidden Markov models estimated from the command words uttered by 116 training speakers, the best word recognition accuracy of 98.1% is achieved for the 10 digits uttered by 116 test speakers. This is an improvement of 2% over the conventional pooled training method.

Proceedings ArticleDOI
30 Sep 1991
TL;DR: A codeword-dependent neural network is presented as a nonlinear mapping function to transform speech data between two speakers; it significantly reduced the error rate and made full use of dynamic information.
Abstract: Speaker normalization may have a significant impact on both speaker-adaptive and speaker-independent speech recognition. In this paper, a codeword-dependent neural network (CDNN) is presented for speaker normalization. The network is used as a nonlinear mapping function to transform speech data between two speakers. The mapping function is characterized by two important properties. First, the assembly of mapping functions enhances overall mapping quality. Second, multiple input vectors are used simultaneously in the transformation. This not only makes full use of dynamic information but also alleviates possible errors in the supervision data. Large-vocabulary continuous speech recognition is chosen to study the effect of speaker normalization. Using speaker-dependent semi-continuous hidden Markov models, performance evaluation over 360 testing sentences from new speakers showed that speaker normalization significantly reduced the error rate from 41.9% to 5.0% when only 40 speaker-dependent sentences were used to estimate CDNN parameters.
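One of the two properties above, using multiple input vectors simultaneously, amounts to feeding the mapping network a context window of consecutive frames rather than a single frame. A sketch of that windowing step (the network itself is omitted; scalars stand in for feature vectors):

```python
def context_windows(frames, left=1, right=1):
    """Stack each frame with its neighbours so the mapping network sees
    dynamic (multi-frame) information, padding the edges by repetition."""
    padded = [frames[0]] * left + list(frames) + [frames[-1]] * right
    return [padded[i:i + left + 1 + right] for i in range(len(frames))]

print(context_windows([1, 2, 3]))  # [[1, 1, 2], [1, 2, 3], [2, 3, 3]]
```

Each window becomes one input to the speaker-transformation network, whose target is the corresponding single frame from the other speaker.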

Journal ArticleDOI
TL;DR: The DragonDictate recognizer was tested with two texts that differed greatly in vocabulary and style and performance for all three speakers was better than the performance for the reference speaker on unadapted models.

02 Sep 1991
TL;DR: An automatic text-independent speaker recognition system is presented, which is suitable for identification as well as for verification purposes, based on spotting the stable part of the vowel phonemes of the test utterances, extracting parameter vectors and classifying them to a speaker-dependent vowel reference database.
Abstract: An automatic text-independent speaker recognition system is presented, which is suitable for identification as well as for verification purposes. The system is based on spotting the stable part of the vowel phonemes of the test utterances, extracting parameter vectors and classifying them against a speaker-dependent vowel reference database. The system was tested over a period of four months with a population of 12 male and female speakers with non-correlated training and test data. The accuracy of the system as measured by experimentation is satisfactory considering that the training utterances per speaker do not exceed 50 s and the test utterances 1 s on average.



Proceedings ArticleDOI
Stephan Euler, J. Zinke
14 Apr 1991
TL;DR: The authors discuss the extension and adaptation of a speaker-independent, small-vocabulary, isolated word recognition system based on tied density hidden Markov models and compare different algorithms to avoid zero probabilities in the word models due to insufficient data.
Abstract: The authors discuss the extension and adaptation of a speaker-independent, small-vocabulary, isolated word recognition system based on tied density hidden Markov models. In the proposed approach, the density functions are trained from a basic set of words using acoustic segmentation, position-dependent segment labeling, and clustering of the segment specific densities. Then the parameters of the word models are estimated by means of a Viterbi update procedure. With a given set of densities the Viterbi update can also be used to generate models for words not included in the basic set. The dependency between the recognition performance and the amount of reference data both for speaker-independent and speaker-dependent experiments is examined in detail. The authors compare different algorithms to avoid zero probabilities in the word models due to insufficient data.



Proceedings ArticleDOI
Marco Ferretti, A.M. Mazza
14 Apr 1991
TL;DR: A set of techniques to perform fast speaker adaptation for a large-vocabulary, natural-language speech recognition system is presented; the experimentation has been carried out using a 20000-word, real-time, natural-language speech recognizer for the Italian language.
Abstract: A set of techniques to perform fast speaker adaptation for a large vocabulary, natural-language, speech recognition system are presented. The experimentation has been carried out using a 20000-word, real-time, natural-language speech recognizer for the Italian language. To perform speaker adaptation within the framework of the probabilistic approach to speech recognition two different problems must be addressed: codebook adaptation and hidden Markov model parameters adaptation. The basic idea is to use a set of data collected from several different speakers as a source of a priori knowledge with a small speech sample provided by the new speaker to perform the adaptation task. Several different techniques for codebook adaptation have been tried and discussed.
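Codebook adaptation, the first of the two problems mentioned, can be sketched as nudging each codeword toward the new speaker's samples that quantize to it; the shift `rate` and the 1-D codebook below are illustrative only, not the paper's technique:

```python
def adapt_codebook(codebook, samples, rate=0.5):
    """Move each codeword toward the mean of the new speaker's samples
    that quantize to it; codewords with no assigned samples stay put."""
    assigned = {i: [] for i in range(len(codebook))}
    for x in samples:
        i = min(range(len(codebook)), key=lambda k: abs(x - codebook[k]))
        assigned[i].append(x)
    return [
        c + rate * (sum(xs) / len(xs) - c) if (xs := assigned[i]) else c
        for i, c in enumerate(codebook)
    ]

# Two codewords; three adaptation samples from the (hypothetical) new speaker.
print(adapt_codebook([0.0, 10.0], [1.0, 1.0, 12.0]))  # [0.5, 11.0]
```

With only a small adaptation sample, a partial shift (`rate` < 1) keeps the prior codebook as a priori knowledge instead of overwriting it, matching the paper's idea of combining multi-speaker priors with a small new-speaker sample.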

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker adaptation method for HMM (hidden Markov model) based speaker-independent speech recognition without supervision is presented. It reduces the confusion between models caused by training on large amounts of data by controlling the influence of the training samples used in HMM training according to the similarity of speaker individuality.
Abstract: A speaker adaptation method for HMM (hidden Markov model) based speaker-independent speech recognition without supervising is presented. This method reduces the confusion between models, which is caused by training using large-size training data, by controlling the influences of the training samples used in HMM training by considering the similarity of speaker individuality. A Markov model and a hidden Markov model are used to represent an input speaker's individuality. These models are compared through their entropy and /b, d, g, m, n, N/ recognition task. The results show that a hidden Markov model is more suitable than a Markov model. >