
Showing papers on "Speaker diarisation" published in 1988



Proceedings ArticleDOI
11 Apr 1988
TL;DR: Normalization and selection techniques are described which improve speaker recognition accuracy using very short uncontrolled speech samples; a second normalization also facilitates setting acceptance thresholds for speaker verification against an open population.
Abstract: Normalization and selection techniques are described which improve speaker recognition accuracy using very short uncontrolled speech samples. The first normalization depends on the means and variances of scores for a short, unknown sample matched to different models for many speakers. The selection procedure discards portions of a speech sample with poor speaker-discrimination ability. A second normalization is based on the range of matching scores of the supposed speaker's model against other speakers' models. It facilitates setting acceptance thresholds for speaker verification against an open population.
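
The first normalization reads like a cohort-style z-normalization of match scores. A minimal sketch under that assumption; the function name, score scale, and example values are illustrative, not taken from the paper:

```python
import numpy as np

def cohort_normalize(score, cohort_scores):
    """Normalize a raw match score by the mean and standard deviation of the
    same short test sample's scores against many other speakers' models."""
    mu = np.mean(cohort_scores)
    sigma = np.std(cohort_scores)
    return (score - mu) / max(sigma, 1e-9)  # guard against a degenerate cohort

# Hypothetical usage with log-likelihood-style similarity scores.
claimed_score = -42.0
cohort = np.array([-55.3, -61.0, -58.7, -57.2])
print(cohort_normalize(claimed_score, cohort))
```

A single acceptance threshold on the normalized score can then serve for verification against an open population, which is the role the second normalization plays in the paper.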

124 citations


PatentDOI
TL;DR: In this article, label output probabilities for subsequent speakers are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth output for the reference speaker.
Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth label output for the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
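
A rough sketch of the re-parameterization and smoothing steps as the abstract describes them; the interpolation weights, array shapes, and normalization are assumptions for illustration, not values from the patent:

```python
import numpy as np

def adapt_label_probs(p_ref, confusion, p_init, p_basic,
                      w_smooth=0.5, w_basic=0.5):
    """Sketch: confusion-matrix re-parameterization plus smoothing.

    p_ref     : (K,) reference-speaker label output probabilities at a transition
    confusion : (L, K) similarity of new-speaker label l to reference label k
    p_init    : (L,) initialized label output probabilities for the new speaker
    p_basic   : (L,) "basic" probabilities estimated from the short training text
    """
    p_reparam = confusion @ p_ref          # map reference probabilities to new labels
    p_reparam /= p_reparam.sum()
    p_smoothed = w_smooth * p_reparam + (1.0 - w_smooth) * p_init   # "smoothed" distribution
    return w_basic * p_basic + (1.0 - w_basic) * p_smoothed         # improved probabilities
```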

65 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: This system extends an earlier robust continuous observation HMM IWR system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database.
Abstract: Most speech recognizers are sensitive to the speech style and the speaker's environment. This system extends an earlier robust continuous observation HMM isolated word recognition (IWR) system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database. Performance on a 207-word, perplexity-14 task is a 0.9% word error rate under office conditions, and 2.5% (best speaker) and 5% (4-speaker average) for the normal test condition of the database.

54 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: Two algorithms are given for automatically recognizing the gender of a speaker using acoustic parameters extracted from the speaker's speech based on vowels and fricatives.
Abstract: Two algorithms are given for automatically recognizing the gender of a speaker using acoustic parameters extracted from the speaker's speech. The speech data used for developing the algorithms were taken from a large data set. Only acoustic parameters for vowels and fricatives were used to develop and test the algorithms because the authors wanted the gender classification to be achieved rapidly using only a brief data record.
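
The abstract does not spell out the two algorithms, so the following is only a toy illustration of the kind of rapid, brief-record decision it describes: a single pitch threshold over vowel frames. The threshold value and names are assumptions:

```python
import numpy as np

def classify_gender(f0_vowel_frames, threshold_hz=165.0):
    """Toy sketch (not the paper's algorithm): decide gender from the median
    fundamental frequency of vowel frames, using one threshold between
    typical male and female pitch ranges."""
    return "female" if np.median(f0_vowel_frames) > threshold_hz else "male"
```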

28 citations


Proceedings ArticleDOI
Masafumi Nishimura, K. Sugawara
11 Apr 1988
TL;DR: The authors describe a speaker adaptation method consisting of two stages, in the first stage, label prototypes, which represent spectral features, are modified to reduce the total distortion error of vector quantization for a new speaker.
Abstract: The authors describe a speaker adaptation method consisting of two stages. In the first stage, label prototypes, which represent spectral features, are modified to reduce the total distortion error of vector quantization for a new speaker. In the second stage, well-trained hidden Markov model (HMM) parameters are transformed by using a linear mapping function. This is estimated by counting the correspondences along the alignment between a state sequence of an HMM and a label sequence of a new speaker utterance. This adaptation procedure was tested in an isolated word recognition task using 150 confusable Japanese words. The original label prototypes and HMM parameters were estimated for a male speaker, who spoke each word 10 times. When the adaptation procedure was applied with 25 words, the average error rate for another seven male speakers was reduced from 25.0% to 5.6%, which was roughly the same as that for the original speaker. This procedure was also effective for adaptation between male and female speakers.
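
A compact sketch of the two stages as described: a k-means-style pass that moves the label prototypes toward the new speaker's spectra, and a mapping estimated by counting label correspondences along a given alignment. The shapes, the learning rate, and the use of row-normalized counts as the "linear mapping" are assumptions for illustration:

```python
import numpy as np

def adapt_prototypes(prototypes, new_vectors, lr=0.5):
    """Stage 1 (sketch): nudge VQ label prototypes toward the new speaker's
    spectral vectors to reduce the total quantization distortion."""
    protos = np.array(prototypes, dtype=float)
    dists = ((new_vectors[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)                 # nearest prototype per vector
    for k in range(len(protos)):
        members = new_vectors[assign == k]
        if len(members):
            protos[k] += lr * (members.mean(axis=0) - protos[k])
    return protos

def estimate_label_mapping(ref_labels, new_labels, n_ref, n_new):
    """Stage 2 (sketch): count correspondences between reference HMM output
    labels and the new speaker's labels along an existing alignment, then
    row-normalize to obtain a mapping for transforming HMM output parameters."""
    counts = np.full((n_ref, n_new), 1e-6)        # small floor avoids empty rows
    for r, s in zip(ref_labels, new_labels):
        counts[r, s] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```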

25 citations


PatentDOI
TL;DR: Speaker verification is performed by converting a spectral analysis of an input speech signal into a digital format that is sent directly to the address input of a memory store, selecting the address which contains information pertaining to the actual speech spectrum.
Abstract: Speaker verification is performed by converting a spectral analysis of an input speech signal into a digital format. This digital format is sent directly to the address input of a memory store, selecting the address that contains information pertaining to the actual speech spectrum. After training, each address contains labels defining whether the address is not used, is used by multiple users, or is used by a single user. Actual verification is performed by counting each occurrence of a valid user address during speech input by a speaker and selecting the highest count as indicative of the user who was speaking.
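
A minimal sketch of the address-table idea, assuming each spectral frame has already been quantized to an integer "address"; the data structures and names are illustrative:

```python
from collections import Counter

SHARED = "SHARED"   # marker for addresses used by multiple users

def train_table(frames_by_user):
    """Sketch: map each quantized spectral frame (an address) to the single
    user who produced it, or mark it SHARED when several users hit it."""
    table = {}
    for user, frames in frames_by_user.items():
        for addr in frames:
            owner = table.get(addr)
            table[addr] = user if owner in (None, user) else SHARED
    return table

def identify(table, frames):
    """Count frames landing on single-user addresses; the user with the
    highest count is taken as the speaker."""
    votes = Counter(table[a] for a in frames if table.get(a) not in (None, SHARED))
    return votes.most_common(1)[0][0] if votes else None
```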

12 citations


Journal ArticleDOI
TL;DR: The Kohonen self-organizing feature mapping algorithm is used to derive speech templates for text-independent automatic speaker recognition and has the practical advantage that the desired number of templates is specified in advance.
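
A small sketch of deriving a fixed number of templates with a 1-D Kohonen self-organizing map; the feature dimensionality, learning-rate schedule, and neighborhood width are assumptions rather than the paper's settings (it also assumes at least n_templates feature vectors):

```python
import numpy as np

def som_templates(features, n_templates=32, epochs=20, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map over one speaker's feature vectors;
    the node weights after training serve as that speaker's templates."""
    rng = np.random.default_rng(seed)
    nodes = features[rng.choice(len(features), n_templates, replace=False)].astype(float)
    radius0 = n_templates / 2.0
    for e in range(epochs):
        lr = lr0 * (1.0 - e / epochs)
        radius = max(1.0, radius0 * (1.0 - e / epochs))
        for x in features[rng.permutation(len(features))]:
            winner = np.argmin(((nodes - x) ** 2).sum(axis=1))
            dist = np.abs(np.arange(n_templates) - winner)
            h = np.exp(-dist ** 2 / (2.0 * radius ** 2))   # neighborhood function
            nodes += lr * h[:, None] * (x - nodes)
    return nodes
```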

10 citations


Proceedings ArticleDOI
D. Bigorgne, A. Cozannet, M. Guyomard, Guy Mercier, Laurent Miclet, M. Querre, J. Sirox
11 Apr 1988
TL;DR: A speaker-dependent continuous speech understanding system is described: an extension of the KEAL system connected to ALOEMDA, an active chart parser that modifies its strategy and linguistic capabilities, and to a dialogue manager; a speaker adaptation module allows some of the system parameters to be adjusted.
Abstract: A description of a speaker-dependent continuous speech understanding system is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. This new system is an extension of the KEAL system, connected to ALOEMDA, an active chart parser that modifies its strategy and linguistic capabilities, and to a dialogue manager. A speaker adaptation module makes it possible to adjust some of the system parameters by matching known utterances with their acoustic representations. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. The new configuration is under test and first results are given. Continuously spoken sentences extracted from a 'pseudo-LOGO' language are analysed with two different linguistic modules, and recognition figures are presented. Another understanding task is described: the dialogue-driven interrogation of a small database.
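
A purely structural sketch of the recognition chain named in the abstract; the stage functions are hypothetical placeholders, not the KEAL or ALOEMDA interfaces:

```python
def understand(signal, acoustic_analysis, segment_and_label, word_analysis, sentence_analysis):
    """Chain the stages listed in the abstract: acoustic analysis, phonetic
    segmentation and identification, then word and sentence analysis."""
    features = acoustic_analysis(signal)
    phones = segment_and_label(features)
    word_lattice = word_analysis(phones)
    return sentence_analysis(word_lattice)
```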

7 citations


Patent
19 Feb 1988
TL;DR: In this article, a voice signal with individual features is obtained for plural languages by providing a regular synthesizing means, which synthesizes the voice signal of a 1st standard speaker for the plural languages, and a voice converting means, which converts the voice signal output by a selecting means into the voice signal of a 2nd speaker to whom individual features are to be added.
Abstract: PURPOSE: To obtain a voice signal which has individual features for plural languages by providing a regular synthesizing means which synthesizes the voice signal of a 1st standard speaker with respect to the plural languages and a voice converting means which converts a voice signal outputted by a selecting means into the voice signal of a 2nd speaker to whom individual features are to be added. CONSTITUTION: A switching part 100 selects one of the languages L1-Ln, e.g. L1, and outputs the voice signal s11 of a standard speaker A1 of the selected language L1 from a multilingual regular synthesizing group 104 to a voice conversion part 101. The voice quality conversion part 101 receives the voice signal s11, refers to data on the speaker B whose voice is to be given individuality in a voice individual information file 102, and converts the voice signal s11 of the standard speaker A1 into the voice signal s4 of the speaker B, which is outputted. Consequently, the individual features of the speaker are given to the regularly synthesized voices of the respective languages. COPYRIGHT: (C)1989,JPO&Japio
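
A structural sketch of the pipeline in the CONSTITUTION; the function and variable names are hypothetical stand-ins for parts 100, 104, 101, and 102:

```python
def personalized_synthesis(text, language, rule_synthesizers, voice_converter,
                           speaker_profiles, target_speaker):
    """Select the rule synthesizer for the chosen language, generate the
    standard speaker's signal, then convert it to the target speaker's voice
    using that speaker's stored individual voice data."""
    standard_signal = rule_synthesizers[language](text)        # multilingual regular synthesis (104)
    profile = speaker_profiles[target_speaker]                 # voice individual information file (102)
    return voice_converter(standard_signal, profile)           # voice-quality conversion (101)
```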

4 citations


Patent
Hiroaki Sakoe
13 Oct 1988
TL;DR: In this article, a speaker verification system and process is described, which consists of a control reference pattern memory (100), a verification reference memory (60), a counter, a control designation and decision unit (80), a control designator (90), a pattern matching unit (70), and a judge unit (120).
Abstract: The invention relates to a speaker verification system and process. The system comprises a control reference pattern memory (100), a verification reference pattern memory (60), a counter (110), a control designation and decision unit (80), a control designator (90), a pattern matching unit (70) and a judge unit (120) outputting a confirmation signal when a predetermined condition is met.

Proceedings ArticleDOI
Naftali Tishby
11 Apr 1988
TL;DR: An information theoretic approach to speech modeling with prior statistical knowledge is proposed; using the concept of minimum discrimination information (MDI), a model of speech can be factored into a prior distribution and an exponential correction term that depends on the specific training data.
Abstract: An information theoretic approach to speech modeling with prior statistical knowledge is proposed. Using the concept of minimum discrimination information (MDI), a model of speech can be factored into a prior distribution and an exponential correction term, depending on the specific training data. The discrimination information measures the statistical deviations of the training data from a prior model, in a way that is known to be optimal in a well-defined sense. The minimization of the discrimination information, subject to the given training data as constraints, yields a set of Lagrange multipliers. These multipliers serve to characterize the part of the training data which is not described by the prior model. The problem of separating the speaker dependent part from a 'universal' speaker independent prior in hidden Markov models is studied in this framework and a practical method for achieving this separation is derived. As an example, universal hidden Markov priors for isolated English digits are trained for male and female speakers using a database of 100 speakers and 20000 spoken digits. The speaker specific part is modeled by the individual Lagrange multipliers obtained by minimizing the discrimination information between the training data and the corresponding prior language model.
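
The factorization the abstract refers to can be written out in the standard MDI (maximum-entropy) form; the notation below is assumed, not taken from the paper:

```latex
% q(x): prior ("universal") model;  T_i(x): statistics constrained by the training data.
\[
\begin{aligned}
  \min_{p}\ D(p\,\|\,q) &= \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
  \quad\text{subject to}\quad \mathbb{E}_{p}\!\left[T_i(x)\right] = t_i ,\\[2pt]
  p^{*}(x) &= \frac{1}{Z(\lambda)}\, q(x)\,
      \exp\!\Big(\sum_i \lambda_i\,T_i(x)\Big).
\end{aligned}
\]
% The Lagrange multipliers \lambda_i characterize the speaker-dependent part
% that the prior model does not explain.
```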


Patent
22 Sep 1988
TL;DR: In this paper, different types of coding devices are successively switched between the output of a memory which outputs the speech samples to be detected and the input of the speech recogniser or speaker recogniser to be tested.
Abstract: In the method for testing speech recognisers and speaker recognisers, after the speech samples to be learnt have been input, the system is switched over to speech recognition, and different types of coding devices are successively switched between the output of a memory which outputs the speech samples to be detected and the input of the speech recogniser or speaker recogniser to be tested. In this process, the recognized meaning which appears in each case at the output of the speech recogniser or speaker recogniser is compared with the true meaning of the speech sample to be detected, the sample being supplied from the memory holding the speech samples to be detected. These comparisons are carried out by means of a control device, and a recognition rate is calculated from the comparison results.
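
A minimal sketch of the test loop described above; the callable interfaces (coding device, recogniser) and the sample format are assumptions:

```python
def recognition_rates(test_samples, codecs, recognizer):
    """For each coding device, pass every stored test sample through it, feed
    the result to the recogniser under test, compare with the true meaning,
    and report the resulting recognition rate per coding device."""
    rates = {}
    for name, codec in codecs.items():
        correct = sum(recognizer(codec(signal)) == truth for signal, truth in test_samples)
        rates[name] = correct / len(test_samples)
    return rates
```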


Book ChapterDOI
01 Jan 1988
TL;DR: A modified version of a “Condensing” algorithm combined with an “Editing” algorithm is implemented to select the reference templates for a speaker-independent isolated word recognition problem, and it is shown that these algorithms improve the recognition rate in comparison to using clustering techniques for template selection.
Abstract: This study explores the possibility of using the Condensed Nearest Neighbor (CNN) rule for classification in various word recognition problems. A modified version of “Condensing” combined with an “Editing” algorithm is implemented to select the reference templates for a speaker-independent isolated word recognition problem. It is shown that these algorithms improve the recognition rate in comparison to using clustering techniques for template selection.
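
A compact sketch of Hart-style condensing and Wilson-style editing over labelled templates. The distance measure (squared Euclidean over fixed-length feature vectors), the value of k, and the chapter's specific modification are assumptions:

```python
import numpy as np

def edit(X, y, k=3):
    """Editing (sketch): drop samples whose k nearest neighbours, excluding
    themselves, disagree with their own label."""
    keep = []
    for i in range(len(X)):
        nn = np.argsort(((X - X[i]) ** 2).sum(axis=1))[1:k + 1]
        if np.bincount(y[nn], minlength=y.max() + 1).argmax() == y[i]:
            keep.append(i)
    return X[keep], y[keep]

def condense(X, y):
    """Condensing (sketch): keep only the samples a 1-NN classifier needs to
    label the rest of the set correctly; the survivors become the templates."""
    store = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            d = ((X[store] - X[i]) ** 2).sum(axis=1)
            if y[store[int(d.argmin())]] != y[i]:
                store.append(i)
                changed = True
    return X[store], y[store]

# Hypothetical usage: templates, labels = condense(*edit(features, word_ids))
```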