Showing papers on "Speaker diarisation" published in 1982


PatentDOI
TL;DR: A method and apparatus for recognizing an unknown speaker from a plurality of speaker candidates is presented, in which portions of speech from the speaker candidates and from the unknown speaker are sampled and digitized.
Abstract: A method and apparatus for recognizing an unknown speaker from a plurality of speaker candidates. Portions of speech from the speaker candidates and from the unknown speaker are sampled and digitized. The digitized samples are converted into frames of speech, each frame representing a point in an LPC-12 multi-dimensional speech space. Using a character covering algorithm, a set of frames of speech is selected, called characters, from the frames of speech of all speaker candidates. The speaker candidates' portions of speech are divided into smaller portions called segments. A smaller plurality of model characters for each speaker candidate is selected from the character set. For each set of model characters the distance from each speaker candidate's frame of speech to the closest character in the model set is determined and stored in a model histogram. When a model histogram is completed for a segment a distance D is found whereby at least a majority of frames have distances greater than D. The mean distance value of D and variance across all segments for both speaker and imposter are then calculated. These values are added to the set of model characters to form the speaker model. To perform recognition the frames of the unknown speaker as they are received are buffered and compared with the sets of model characters to form model histograms for each speaker. A likelihood ratio is formed. The speaker candidate with the highest likelihood ratio is chosen as the unknown speaker.

44 citations
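
The nearest-character distance step described in the patent abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the Euclidean metric, the 2-D toy frames (the abstract uses an LPC-12 space), the function names, and the use of the median as a per-segment summary are all assumptions made for the example.

```python
import math
from statistics import median

def nearest_character_distance(frame, characters):
    """Distance from one frame to its closest model character (Euclidean, assumed)."""
    return min(math.dist(frame, c) for c in characters)

def segment_distances(frames, characters):
    """Nearest-character distance for every frame in one segment."""
    return [nearest_character_distance(f, characters) for f in frames]

# Toy 2-D data standing in for frames in a higher-dimensional speech space.
model_characters = [(0.0, 0.0), (3.0, 4.0)]   # hypothetical model characters
segment = [(0.0, 1.0), (3.0, 4.0), (6.0, 8.0)]  # hypothetical frames of one segment

distances = segment_distances(segment, model_characters)
# The median is used here only as a stand-in per-segment summary;
# the abstract does not specify how its distance D is computed.
d_summary = median(distances)
```

The distances collected this way are what would be binned into the model histogram; how the threshold D and the likelihood ratio are derived from it is not detailed in the abstract and is left out here.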


Journal ArticleDOI
TL;DR: A method for speaker independent connected word recognition is described, based on a syntax-directed dynamic programming algorithm which matches isolated word templates, obtained by clustering isolated word utterances of a 100 speaker population, to sentence length utterances.
Abstract: A method for speaker independent connected word recognition is described. Speaker independence is achieved by clustering isolated word utterances of a 100 speaker population. Connected word recognition is based on a syntax-directed dynamic programming algorithm which matches the isolated word templates to sentence length utterances. The method has been tested on an artificial task-oriented language based on a 127 word vocabulary. Four subjects, two men and two women, spoke a total of 209 sentences comprising 1750 words. At an average speaking rate of 171 words/min over dialed-up telephone lines, a correct word recognition rate of 97 percent was observed.

27 citations
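
As a rough illustration of the dynamic programming idea behind template matching in the entry above, the sketch below aligns one template with one utterance using plain dynamic time warping. It is a deliberate simplification: it is not syntax-directed, it does not handle connected words, and the scalar features and the name `dtw_distance` are assumptions for the example.

```python
def dtw_distance(template, utterance):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    inf = float("inf")
    n, m = len(template), len(utterance)
    # cost[i][j] = best cost of aligning the first i template frames
    # with the first j utterance frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - utterance[j - 1])  # scalar features, assumed
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame repeated against same utterance frame
                cost[i][j - 1],      # utterance frame repeated against same template frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # same shape, slower utterance
different = dtw_distance([1, 2, 3], [1, 2, 4])     # last frame differs
```

A slower rendition of the same pattern aligns at zero cost, while a genuinely different frame adds to the accumulated distance, which is what makes the score usable for choosing among templates.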


Proceedings ArticleDOI
01 May 1982
TL;DR: Speaker adaptation by a banded linear matrix transformation is described, and, because the cost of phoneme class-specific adaptation is very high, a grouping of phonemes is proposed so that one adaptation parameter set is used for all phonemes that belong to any one group.
Abstract: Speaker dependence of automatic speech recognition systems can be reduced by applying speaker-specific transformations to adapt the speech signal of a new speaker to that of the reference speaker. Initial investigations showed that speaker adaptation can be performed by transformations using spectral weighting and spectral warping. These heuristic methods can be substituted by a general linear matrix transformation, the parameters of which are determined by mean square error optimisation. The improvement of the recognition rate achievable by this matrix transformation is very high, but the method needs a large learning set. This can be reduced by restriction of the matrix to a band including the main diagonal in the middle. This banded matrix yields results close to those of the general matrix. Adaptation can be performed speaker-specifically as well as speaker- and class-specifically. As the cost of phoneme class-specific adaptation is very high, a grouping of phonemes is proposed so that one adaptation parameter set is used for all phonemes that belong to any one group.

15 citations
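
The band restriction mentioned in the abstract above can be illustrated with a small sketch. It only shows how limiting the matrix to a band around the main diagonal cuts the number of free parameters and how such a matrix would be applied to a spectral vector; the mean square error optimisation is not implemented, and the function names, the `half_width` parameter, and the toy values are assumptions.

```python
def in_band(i, j, half_width):
    """True if entry (i, j) lies within the band around the main diagonal."""
    return abs(i - j) <= half_width

def band_parameter_count(n, half_width):
    """Number of free entries in an n x n matrix restricted to the band."""
    return sum(1 for i in range(n) for j in range(n) if in_band(i, j, half_width))

def apply_banded(matrix, spectrum, half_width):
    """Multiply a spectral vector by the matrix, ignoring entries outside the band."""
    n = len(spectrum)
    return [
        sum(matrix[i][j] * spectrum[j] for j in range(n) if in_band(i, j, half_width))
        for i in range(n)
    ]

full_params = band_parameter_count(5, 4)  # band covers the whole 5 x 5 matrix
band_params = band_parameter_count(5, 1)  # tridiagonal band

ones = [[1.0] * 3 for _ in range(3)]      # hypothetical transformation matrix
adapted = apply_banded(ones, [1.0, 2.0, 3.0], half_width=1)
```

With fewer free entries to estimate, fewer training examples are needed, which is consistent with the abstract's remark that the band restriction reduces the required learning set.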


Journal ArticleDOI

13 citations


Journal ArticleDOI
TL;DR: A new approach to text‐independent speaker recognition, developed to perform with short unknown utterances, models the spectral traits of a speaker with multiple sub‐models rather than using a single statistical distribution as done with previous approaches.
Abstract: This paper presents a new approach to text‐independent speaker recognition. The technique, developed to perform with short unknown utterances, models the spectral traits of a speaker with multiple sub‐models rather than using a single statistical distribution as done with previous approaches. The recognition is based on the statistical distribution of the distances between the unknown speaker and each of the speaker models. Only frames that are close to one of the speaker's sub‐models are considered in the recognition decision, so that speech events not encountered in the training data do not bias the recognition. The technique has been tested on a conversational data base. Models were generated using 100 s of speech from each of 11 male talkers. Unknown speech was obtained one week after the model data. Recognition accuracies of 96%, 87%, and 79% were obtained for unknown speech durations of 10, 5, and 3 s, respectively. The use of multiple sub‐models to characterize spectral traits results in improved discrimination between speakers, particularly when short speech segments are recognized. [Work supported by U. S. Air Force, Rome Air Development Center.]

12 citations
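
The frame-gating idea in the abstract above (only frames close to one of the speaker's sub-models count toward the decision) might look roughly like this. It is a sketch under stated assumptions: the Euclidean distance, the fixed `threshold`, the mean of the retained distances as the score, and all names are illustrative; the abstract bases the decision on the statistical distribution of distances rather than a simple mean.

```python
import math

def gated_score(frames, sub_models, threshold):
    """Mean nearest-sub-model distance over frames that fall within the threshold.

    Returns None when no frame is close enough to any sub-model.
    """
    nearest = (min(math.dist(f, m) for m in sub_models) for f in frames)
    kept = [d for d in nearest if d <= threshold]
    return sum(kept) / len(kept) if kept else None

# Hypothetical 2-D sub-models for one speaker and three unknown frames.
speaker_sub_models = [(0.0, 0.0), (5.0, 5.0)]
unknown_frames = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

score = gated_score(unknown_frames, speaker_sub_models, threshold=2.0)
```

The third frame lies far from every sub-model and is discarded, so a speech event absent from the training data does not inflate the score.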


Journal ArticleDOI
TL;DR: The development of a high accuracy (about 99%) text-independent speaker recognition system is discussed in this paper. Any two parameter sets of the first stage tests are combined logically to obtain a significantly higher recognition accuracy than is possible with any single speaker-sensitive parameter set.

12 citations


Book ChapterDOI
Patrick Corsi
01 Jan 1982
TL;DR: This paper presents a unified discussion of the scientific and practical issues in the field of speaker recognition, and distinguishes between the Verification and Identification tasks.
Abstract: This paper presents a unified discussion of the scientific and practical issues in the field of speaker recognition. Besides some background on speaker recognition by listening and visual analysis of spectrograms, we survey the computer recognition methods, and briefly discuss some technical aspects of various speaker recognizers. Methods for selecting an efficient set of features, and examples of results of experimental studies are also presented. We then differentiate between the Verification and Identification tasks.

7 citations


Book ChapterDOI
Stephen E. Levinson
01 Jan 1982
TL;DR: A method for speaker independent connected word recognition is described, based on a syntax-directed dynamic programming algorithm which matches isolated word templates, obtained by clustering isolated word utterances of a 100 speaker population, to sentence length utterances.
Abstract: A method for speaker independent connected word recognition is described. Speaker independence is achieved by clustering isolated word utterances of a 100 speaker population. Connected word recognition is based on a syntax-directed dynamic programming algorithm which matches the isolated word templates to sentence length utterances. The method has been tested on a task oriented English-like language based on a 127 word vocabulary. Four subjects, two men and two women, spoke a total of 209 sentences comprising 1750 words. At an average speaking rate of 171 words per minute over dialed-up telephone lines, a correct word recognition rate of 97% was observed.

5 citations


01 Dec 1982
TL;DR: An experiment to determine the possibilities of obtaining some speaker independence using speaker dependent voice recognition equipment revealed about 99% accuracy when the user's speech templates were in memory along with those of four other users.
Abstract: This report discusses the results of an experiment to determine the possibilities of obtaining some speaker independence using speaker dependent voice recognition equipment. The results revealed about 99% accuracy when the user's speech templates were in memory along with those of four other users. If the user's voice patterns were not in memory but those of the four other users still were in memory, recognition accuracy still hovered around 95%. (Author)

4 citations


Journal ArticleDOI
Hermann Ney
TL;DR: New techniques for automatic speaker recognition from telephone speech are described, based on spectral analysis of fixed sentence-long utterances, which is carried out by a dynamic programming algorithm which minimizes timing differences between corresponding speech events.

3 citations



Journal ArticleDOI
H. Mutschler
TL;DR: In a series of experiments, 7 parameters (speaker training, long-term speech consistency, system training, speaker sex, vocabulary phonetics, vocabulary size, background noise) were tested in a simple word input task.