
Showing papers on "Speaker diarisation" published in 1988



Proceedings ArticleDOI
11 Apr 1988
TL;DR: Normalization and selection techniques are described which improve speaker recognition accuracy using very short uncontrolled speech samples; a second normalization also facilitates setting acceptance thresholds for speaker verification against an open population.
Abstract: Normalization and selection techniques are described which improve speaker recognition accuracy using very short uncontrolled speech samples. The first normalization depends on the means and variances of scores for a short, unknown sample matched to different models for many speakers. The selection procedure discards portions of a speech sample with poor speaker-discrimination ability. A second normalization is based on the range of matching scores of the supposed speaker's model against other speakers' models. It facilitates setting acceptance thresholds for speaker verification against an open population.
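
The first normalization reads like a cohort-style z-normalization of match scores. A minimal sketch under that assumption; the function name, score scale, and example values are illustrative, not taken from the paper:

```python
import numpy as np

def cohort_normalize(score, cohort_scores):
    """Normalize a raw match score by the mean and standard deviation of the
    same short test sample's scores against many other speakers' models."""
    mu = np.mean(cohort_scores)
    sigma = np.std(cohort_scores)
    return (score - mu) / max(sigma, 1e-9)  # guard against a degenerate cohort

# Hypothetical usage with log-likelihood-style similarity scores.
claimed_score = -42.0
cohort = np.array([-55.3, -61.0, -58.7, -57.2])
print(cohort_normalize(claimed_score, cohort))
```

A single acceptance threshold on the normalized score can then serve for verification against an open population, which is the role the second normalization plays in the paper.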

124 citations


PatentDOI
TL;DR: In this article, label output probabilities for subsequent speakers are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth output for the reference speaker.
Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth label output for the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
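
A rough sketch of the re-parameterization and smoothing steps as the abstract describes them; the interpolation weights, array shapes, and normalization are assumptions for illustration, not values from the patent:

```python
import numpy as np

def adapt_label_probs(p_ref, confusion, p_init, p_basic,
                      w_smooth=0.5, w_basic=0.5):
    """Sketch: confusion-matrix re-parameterization plus smoothing.

    p_ref     : (K,) reference-speaker label output probabilities at a transition
    confusion : (L, K) similarity of new-speaker label l to reference label k
    p_init    : (L,) initialized label output probabilities for the new speaker
    p_basic   : (L,) "basic" probabilities estimated from the short training text
    """
    p_reparam = confusion @ p_ref          # map reference probabilities to new labels
    p_reparam /= p_reparam.sum()
    p_smoothed = w_smooth * p_reparam + (1.0 - w_smooth) * p_init   # "smoothed" distribution
    return w_basic * p_basic + (1.0 - w_basic) * p_smoothed         # improved probabilities
```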

65 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: This system extends an earlier robust continuous observation HMM IWR system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database.
Abstract: Most speech recognizers are sensitive to the speech style and the speaker's environment. This system extends an earlier robust continuous observation HMM isolated word recognition (IWR) system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database. Performance on a 207-word, perplexity-14 task is a 0.9% word error rate under office conditions, and 2.5% (best speaker) and 5% (4-speaker average) for the normal test condition of the database.

54 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: Two algorithms are given for automatically recognizing the gender of a speaker using acoustic parameters extracted from the speaker's speech based on vowels and fricatives.
Abstract: Two algorithms are given for automatically recognizing the gender of a speaker using acoustic parameters extracted from the speaker's speech. The speech data used for developing the algorithms were taken from a large data set. Only acoustic parameters for vowels and fricatives were used to develop and test the algorithms because the authors wanted the gender classification to be achieved rapidly using only a brief data record.
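
The abstract does not spell out the two algorithms, so the following is only a toy illustration of the kind of rapid, brief-record decision it describes: a single pitch threshold over vowel frames. The threshold value and names are assumptions:

```python
import numpy as np

def classify_gender(f0_vowel_frames, threshold_hz=165.0):
    """Toy sketch (not the paper's algorithm): decide gender from the median
    fundamental frequency of vowel frames, using one threshold between
    typical male and female pitch ranges."""
    return "female" if np.median(f0_vowel_frames) > threshold_hz else "male"
```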

28 citations


Proceedings ArticleDOI
Masafumi Nishimura, K. Sugawara
11 Apr 1988
TL;DR: The authors describe a speaker adaptation method consisting of two stages, in the first stage, label prototypes, which represent spectral features, are modified to reduce the total distortion error of vector quantization for a new speaker.
Abstract: The authors describe a speaker adaptation method consisting of two stages. In the first stage, label prototypes, which represent spectral features, are modified to reduce the total distortion error of vector quantization for a new speaker. In the second stage, well-trained hidden Markov model (HMM) parameters are transformed by using a linear mapping function. This is estimated by counting the correspondences along the alignment between a state sequence of an HMM and a label sequence of a new speaker utterance. This adaptation procedure was tested in an isolated word recognition task using 150 confusable Japanese words. The original label prototypes and HMM parameters were estimated for a male speaker, who spoke each word 10 times. When the adaptation procedure was applied with 25 words, the average error rate for another seven male speakers was reduced from 25.0% to 5.6%, which was roughly the same as that for the original speaker. This procedure was also effective for adaptation between male and female speakers.
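
A compact sketch of the two stages as described: a k-means-style pass that moves the label prototypes toward the new speaker's spectra, and a mapping estimated by counting label correspondences along a given alignment. The shapes, the learning rate, and the use of row-normalized counts as the "linear mapping" are assumptions for illustration:

```python
import numpy as np

def adapt_prototypes(prototypes, new_vectors, lr=0.5):
    """Stage 1 (sketch): nudge VQ label prototypes toward the new speaker's
    spectral vectors to reduce the total quantization distortion."""
    protos = np.array(prototypes, dtype=float)
    dists = ((new_vectors[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)                 # nearest prototype per vector
    for k in range(len(protos)):
        members = new_vectors[assign == k]
        if len(members):
            protos[k] += lr * (members.mean(axis=0) - protos[k])
    return protos

def estimate_label_mapping(ref_labels, new_labels, n_ref, n_new):
    """Stage 2 (sketch): count correspondences between reference HMM output
    labels and the new speaker's labels along an existing alignment, then
    row-normalize to obtain a mapping for transforming HMM output parameters."""
    counts = np.full((n_ref, n_new), 1e-6)        # small floor avoids empty rows
    for r, s in zip(ref_labels, new_labels):
        counts[r, s] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```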

25 citations


PatentDOI
TL;DR: Speaker verification is performed by converting a spectral analysis of an input speech signal into a digital format that is sent directly to the address input of a memory store, selecting the address which contains information pertaining to the actual speech spectrum.
Abstract: Speaker verification is performed by converting a spectral analysis of an input speech signal into a digital format. This digital format is sent directly to the address input of a memory store, selecting the address that contains information pertaining to the actual speech spectrum. After training, each address contains labels defining whether the address is not used, is used by multiple users, or is used by a single user. Actual verification is performed by counting each occurrence of a valid user address during speech input by a speaker and selecting the highest count as indicative of the user who was speaking.
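
A minimal sketch of the address-table idea, assuming each spectral frame has already been quantized to an integer "address"; the data structures and names are illustrative:

```python
from collections import Counter

SHARED = "SHARED"   # marker for addresses used by multiple users

def train_table(frames_by_user):
    """Sketch: map each quantized spectral frame (an address) to the single
    user who produced it, or mark it SHARED when several users hit it."""
    table = {}
    for user, frames in frames_by_user.items():
        for addr in frames:
            owner = table.get(addr)
            table[addr] = user if owner in (None, user) else SHARED
    return table

def identify(table, frames):
    """Count frames landing on single-user addresses; the user with the
    highest count is taken as the speaker."""
    votes = Counter(table[a] for a in frames if table.get(a) not in (None, SHARED))
    return votes.most_common(1)[0][0] if votes else None
```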

12 citations


Journal ArticleDOI
TL;DR: The Kohonen self-organizing feature mapping algorithm is used to derive speech templates for text-independent automatic speaker recognition and has the practical advantage that the desired number of templates is specified in advance.
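
A small sketch of deriving a fixed number of templates with a 1-D Kohonen self-organizing map; the feature dimensionality, learning-rate schedule, and neighborhood width are assumptions rather than the paper's settings (it also assumes at least n_templates feature vectors):

```python
import numpy as np

def som_templates(features, n_templates=32, epochs=20, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map over one speaker's feature vectors;
    the node weights after training serve as that speaker's templates."""
    rng = np.random.default_rng(seed)
    nodes = features[rng.choice(len(features), n_templates, replace=False)].astype(float)
    radius0 = n_templates / 2.0
    for e in range(epochs):
        lr = lr0 * (1.0 - e / epochs)
        radius = max(1.0, radius0 * (1.0 - e / epochs))
        for x in features[rng.permutation(len(features))]:
            winner = np.argmin(((nodes - x) ** 2).sum(axis=1))
            dist = np.abs(np.arange(n_templates) - winner)
            h = np.exp(-dist ** 2 / (2.0 * radius ** 2))   # neighborhood function
            nodes += lr * h[:, None] * (x - nodes)
    return nodes
```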

10 citations


Proceedings ArticleDOI
D. Bigorgne, A. Cozannet, M. Guyomard, Guy Mercier, Laurent Miclet, M. Querre, J. Sirox
11 Apr 1988
TL;DR: A speaker-dependent continuous speech understanding system is described: an extension of the KEAL system connected to ALOEMDA, an active chart parser that modifies its strategy and linguistic capabilities, and to a dialogue manager; a speaker adaptation module allows some of the system parameters to be adjusted.
Abstract: A description of a speaker-dependent continuous speech understanding system is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. This new system is an extension of the KEAL system, connected to ALOEMDA, an active chart parser that modifies its strategy and linguistic capabilities, and to a dialogue manager. A speaker adaptation module makes it possible to adjust some of the system parameters by matching known utterances with their acoustic representations. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. The new configuration is under test and first results are given. Continuously spoken sentences extracted from a 'pseudo-LOGO' language are analysed with two different linguistic modules, and recognition figures are presented. Another understanding task is described: the dialogue-driven interrogation of a small database.
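
A purely structural sketch of the recognition chain named in the abstract; the stage functions are hypothetical placeholders, not the KEAL or ALOEMDA interfaces:

```python
def understand(signal, acoustic_analysis, segment_and_label, word_analysis, sentence_analysis):
    """Chain the stages listed in the abstract: acoustic analysis, phonetic
    segmentation and identification, then word and sentence analysis."""
    features = acoustic_analysis(signal)
    phones = segment_and_label(features)
    word_lattice = word_analysis(phones)
    return sentence_analysis(word_lattice)
```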

7 citations


Patent
19 Feb 1988
TL;DR: In this article, a voice signal with individual features is obtained for plural languages by providing a regular synthesizing means, which synthesizes the voice signal of a 1st standard speaker for the plural languages, and a voice converting means, which converts the voice signal output by a selecting means into the voice signal of a 2nd speaker to whom individual features are to be added.
Abstract: PURPOSE: To obtain a voice signal which has individual features for plural languages by providing a regular synthesizing means which synthesizes the voice signal of a 1st standard speaker with respect to the plural languages and a voice converting means which converts a voice signal outputted by a selecting means into the voice signal of a 2nd speaker to whom individual features are to be added. CONSTITUTION: A switching part 100 selects one of the languages L1-Ln, e.g. L1, and outputs the voice signal s11 of a standard speaker A1 of the selected language L1 from a multilingual regular synthesizing group 104 to a voice conversion part 101. The voice quality conversion part 101 receives the voice signal s11, refers to data on the speaker B whose voice is to be given individuality in a voice individual information file 102, and converts the voice signal s11 of the standard speaker A1 into the voice signal s4 of the speaker B, which is outputted. Consequently, the individual features of the speaker are given to the regularly synthesized voices of the respective languages. COPYRIGHT: (C)1989,JPO&Japio
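
A structural sketch of the pipeline in the CONSTITUTION; the function and variable names are hypothetical stand-ins for parts 100, 104, 101, and 102:

```python
def personalized_synthesis(text, language, rule_synthesizers, voice_converter,
                           speaker_profiles, target_speaker):
    """Select the rule synthesizer for the chosen language, generate the
    standard speaker's signal, then convert it to the target speaker's voice
    using that speaker's stored individual voice data."""
    standard_signal = rule_synthesizers[language](text)        # multilingual regular synthesis (104)
    profile = speaker_profiles[target_speaker]                 # voice individual information file (102)
    return voice_converter(standard_signal, profile)           # voice-quality conversion (101)
```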

4 citations


Patent
Hiroaki Sakoe
13 Oct 1988
TL;DR: In this article, a speaker verification system and process is described, which consists of a control reference pattern memory (100), a verification reference memory (60), a counter, a control designation and decision unit (80), a control designator (90), a pattern matching unit (70), and a judge unit (120).
Abstract: The invention relates to a speaker verification system and process. The system comprises a control reference pattern memory (100), a verification reference pattern memory (60), a counter (110), a control designation and decision unit (80), a control designator (90), a pattern matching unit (70) and a judge unit (120) outputting a confirmation signal when a predetermined condition is met.

Proceedings ArticleDOI
Naftali Tishby
11 Apr 1988
TL;DR: An information theoretic approach to speech modeling with prior statistical knowledge is proposed; using the concept of minimum discrimination information (MDI), a model of speech can be factored into a prior distribution and an exponential correction term that depends on the specific training data.
Abstract: An information theoretic approach to speech modeling with prior statistical knowledge is proposed. Using the concept of minimum discrimination information (MDI), a model of speech can be factored into a prior distribution and an exponential correction term, depending on the specific training data. The discrimination information measures the statistical deviations of the training data from a prior model, in a way that is known to be optimal in a well-defined sense. The minimization of the discrimination information, subject to the given training data as constraints, yields a set of Lagrange multipliers. These multipliers serve to characterize the part of the training data which is not described by the prior model. The problem of separating the speaker dependent part from a 'universal' speaker independent prior in hidden Markov models is studied in this framework and a practical method for achieving this separation is derived. As an example, universal hidden Markov priors for isolated English digits are trained for male and female speakers using a database of 100 speakers and 20000 spoken digits. The speaker specific part is modeled by the individual Lagrange multipliers obtained by minimizing the discrimination information between the training data and the corresponding prior language model.
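
The factorization the abstract refers to can be written out in the standard MDI (maximum-entropy) form; the notation below is assumed, not taken from the paper:

```latex
% q(x): prior ("universal") model;  T_i(x): statistics constrained by the training data.
\[
\begin{aligned}
  \min_{p}\ D(p\,\|\,q) &= \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
  \quad\text{subject to}\quad \mathbb{E}_{p}\!\left[T_i(x)\right] = t_i ,\\[2pt]
  p^{*}(x) &= \frac{1}{Z(\lambda)}\, q(x)\,
      \exp\!\Big(\sum_i \lambda_i\,T_i(x)\Big).
\end{aligned}
\]
% The Lagrange multipliers \lambda_i characterize the speaker-dependent part
% that the prior model does not explain.
```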


Patent
22 Sep 1988
TL;DR: In this paper, different types of coding devices are successively switched between the output of a memory which outputs the speech samples to be detected and the input of the speech recogniser or speaker recogniser to be tested.
Abstract: In the method for testing speech recognisers and speaker recognisers, after the speech samples to be learnt have been input, the system is switched over to speech recognition, and different types of coding devices are successively switched between the output of a memory which outputs the speech samples to be detected and the input of the speech recogniser or speaker recogniser to be tested. In this process, the recognized meaning which appears in each case at the output of the speech recogniser or speaker recogniser is compared with the true meaning of the speech sample to be detected, the sample being supplied from the memory holding the speech samples to be detected. These comparisons are carried out by means of a control device, and a recognition rate is calculated from the comparison results.
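
A minimal sketch of the test loop described above; the callable interfaces (coding device, recogniser) and the sample format are assumptions:

```python
def recognition_rates(test_samples, codecs, recognizer):
    """For each coding device, pass every stored test sample through it, feed
    the result to the recogniser under test, compare with the true meaning,
    and report the resulting recognition rate per coding device."""
    rates = {}
    for name, codec in codecs.items():
        correct = sum(recognizer(codec(signal)) == truth for signal, truth in test_samples)
        rates[name] = correct / len(test_samples)
    return rates
```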


Book ChapterDOI
01 Jan 1988
TL;DR: A modified version of a “Condensing” algorithm combined with an “Editing” algorithm is implemented to select the reference templates for a speaker-independent isolated word recognition problem, and it is shown that these algorithms improve the recognition rate in comparison to using clustering techniques for template selection.
Abstract: This study explores the possibility of using the Condensed Nearest Neighbor (CNN) rule for classification in various word recognition problems. A modified version of “Condensing” combined with an “Editing” algorithm is implemented to select the reference templates for a speaker-independent isolated word recognition problem. It is shown that these algorithms improve the recognition rate in comparison to using clustering techniques for template selection.
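
A compact sketch of Hart-style condensing and Wilson-style editing over labelled templates. The distance measure (squared Euclidean over fixed-length feature vectors), the value of k, and the chapter's specific modification are assumptions:

```python
import numpy as np

def edit(X, y, k=3):
    """Editing (sketch): drop samples whose k nearest neighbours, excluding
    themselves, disagree with their own label."""
    keep = []
    for i in range(len(X)):
        nn = np.argsort(((X - X[i]) ** 2).sum(axis=1))[1:k + 1]
        if np.bincount(y[nn], minlength=y.max() + 1).argmax() == y[i]:
            keep.append(i)
    return X[keep], y[keep]

def condense(X, y):
    """Condensing (sketch): keep only the samples a 1-NN classifier needs to
    label the rest of the set correctly; the survivors become the templates."""
    store = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            d = ((X[store] - X[i]) ** 2).sum(axis=1)
            if y[store[int(d.argmin())]] != y[i]:
                store.append(i)
                changed = True
    return X[store], y[store]

# Hypothetical usage: templates, labels = condense(*edit(features, word_ids))
```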