
Showing papers on "Speaker diarisation" published in 1985


Proceedings Article
26 Apr 1985
TL;DR: A vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker and was used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule.
Abstract: In this study a vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker. A set of such codebooks was then used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule. A series of speaker recognition experiments was performed using a 100-talker (50 male and 50 female) telephone recording database consisting of isolated digit utterances. For ten random but different isolated digits, over 98% speaker identification accuracy was achieved. The effects on performance of different system parameters, such as codebook size, the number of test digits, phonetic richness of the text, and difference in recording sessions, were also studied in detail.

493 citations
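
The minimum-distortion rule described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: it assumes plain k-means for codebook training and squared Euclidean distance as the distortion measure, and it uses synthetic 2-D "frames" in place of real short-time spectral features. The speaker labels and the helper `synth` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, size, iters=20, seed=0):
    """Build a VQ codebook with plain k-means (an assumed training method)."""
    rng = random.Random(seed)
    codebook = rng.sample(frames, size)
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(len(codebook)),
                          key=lambda i: sq_dist(f, codebook[i]))
            cells[nearest].append(f)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        codebook = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else old
            for cell, old in zip(cells, codebook)
        ]
    return codebook

def avg_distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Return the speaker whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))

def synth(center, n, rng):
    """Hypothetical stand-in for feature extraction: noisy points around a center."""
    return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]

rng = random.Random(1)
codebooks = {
    "alice": train_codebook(synth([0.0, 0.0], 200, rng), size=4),
    "bob": train_codebook(synth([3.0, 3.0], 200, rng), size=4),
}
test_frames = synth([3.0, 3.0], 50, rng)
print(identify(test_frames, codebooks))
```

Averaging the distortion over all frames of the test utterance is what lets longer inputs (for example, more test digits) give a more reliable decision in this kind of scheme.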




Proceedings Article
01 Apr 1985
TL;DR: The methods found to be most effective rely on the training process to incorporate channel variability, and it is shown that the direct approach, of using simple channel-invariant features, can discard much speaker dependent information.
Abstract: In this paper, we examine several methods for text-independent speaker identification of telephone speech with limited duration data. The issue addressed is the assessment of channel characteristics, especially linear aspects, and methods for improving speaker identification performance when the speaker to be identified is on a different telephone channel than the data used for training. We show experimental evidence illustrating the cross-channel problem and also show that the direct approach, of using simple channel-invariant features, can discard much speaker dependent information. The methods we have found to be most effective rely on the training process to incorporate channel variability.

31 citations


Proceedings Article
01 Apr 1985
TL;DR: An application of source coding to speaker recognition is described, where each speaker is represented by a sequence of vector quantization codebooks; known input utterances are classified using these codebook sequences and the resulting classification distortion is compared to a rejection threshold.
Abstract: An application of source coding to speaker recognition is described. The method is text-dependent - the text spoken is known, and the problem is to determine who said it. Each speaker is represented by a sequence of vector quantization codebooks; known input utterances are classified using these codebook sequences and the resulting classification distortion is compared to a rejection threshold. On a 16 speaker test population with an additional 111 imposters, this method achieved a false rejection rate of 0.8%, an imposter acceptance rate of 1.8%, and within the 16 speakers, an identification error rate of 0.0%.

31 citations
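
A rough sketch of classification against a sequence of codebooks with a rejection threshold, as outlined in the abstract above. Several details here are assumptions rather than the paper's method: the utterance is split uniformly across the codebook sequence, distortion is squared Euclidean distance, and the models, threshold, and speaker labels are made-up illustrative values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sequence_distortion(frames, codebook_seq):
    """Average distortion of an utterance against a sequence of codebooks.

    Assumption: frame t is scored against the codebook covering its
    position under a uniform split of the utterance.
    """
    n = len(codebook_seq)
    total = 0.0
    for t, f in enumerate(frames):
        k = min(t * n // len(frames), n - 1)
        total += min(sq_dist(f, c) for c in codebook_seq[k])
    return total / len(frames)

def classify_with_rejection(frames, models, threshold):
    """Pick the best-matching speaker, or None if the distortion is too high."""
    best = min(models, key=lambda s: sequence_distortion(frames, models[s]))
    d = sequence_distortion(frames, models[best])
    return (best if d <= threshold else None), d

# Hypothetical models: two speakers, two single-codeword codebooks each.
models = {
    "s1": [[[0.0, 0.0]], [[1.0, 1.0]]],
    "s2": [[[5.0, 5.0]], [[6.0, 6.0]]],
}
genuine = [[0.1, 0.0], [0.0, -0.1], [1.0, 1.1], [0.9, 1.0]]
imposter = [[2.5, 2.5]] * 4

print(classify_with_rejection(genuine, models, threshold=0.5))
print(classify_with_rejection(imposter, models, threshold=0.5))
```

Raising the threshold trades false rejections for imposter acceptances, which is why the abstract reports both rates.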


Journal Article
TL;DR: A text‐independent speaker clustering approach to speaker‐independent speaker recognition through vector quantization (VQ) was investigated, where the distortion value was used as a clustering measure.
Abstract: A text‐independent speaker clustering approach to speaker‐independent speaker recognition through vector quantization (VQ) was investigated, where the distortion value was used as a clustering measure. To show the possibility of text‐independent speaker clustering, speaker recognition experiments were carried out using the Harvard sentence database. Nine male speakers uttered ten different Harvard sentences each. Codebooks were generated from the first five sentences for each speaker using the Weighted Likelihood Ratio (WLR) measure through LPC analysis. Using 128 vectors in each codebook, a speaker recognition rate of 98% was attained on the latter five Harvard sentences. Effects of codebook size and input length are also discussed. The above approach based on framewise VQ only utilizes the static distribution of LPC spectra. VQ for multi‐frame codebooks was used to represent the coarticulation units. The results of speaker recognition experiments based on multi‐frame codebooks will be compared with fixed length VQ approaches.

17 citations


Journal Article
TL;DR: An experimental investigation to determine the human speaker recognition performance of LPC voice processors indicates the importance of high-frequency data bandwidth for speaker recognition.
Abstract: Immediate identification of speakers' voices can be highly important to efficient communication in certain applications. This correspondence describes an experimental investigation to determine the human speaker recognition performance of LPC voice processors. A small group of coworkers was used as the test subjects. The test results indicate the importance of high-frequency data bandwidth for speaker recognition.

11 citations


Proceedings Article
01 Apr 1985
TL;DR: Tests indicate that ASR using vocoded speech is definitely feasible, though further research is needed to determine which speech parameters are best suited for use with each voice processor.
Abstract: Automatic speaker recognition (ASR) offers potential benefit for numerous applications, including identification of users of communication channels such as the telephone and channels using processed or vocoded speech. Currently the listener must subjectively determine whether the person on the other end of the line is who he or she claims to be. However, when the speech is processed, recognition of voices can be very difficult for human listeners. A series of tests was conducted to evaluate the feasibility of automatic speaker recognition with processed or vocoded speech. The analog outputs of six different voice processors were used as input to a real-time ASR system. Recognition accuracy results for the processed speech were 70% to 95% using a 2500 Hz bandwidth input filter, and 75% to 95% using a 4000 Hz input filter. These results indicate that ASR using vocoded speech is definitely feasible, though further research is needed to determine which speech parameters are best suited for use with each voice processor.

5 citations



Journal Article

1 citation


20 Sep 1985
TL;DR: Although it seems that on the whole LPC processing reduces speaker recognition, the reverse may be the case for some speakers in some contexts. This suggests that one should be cautious about comparing speaker recognition for different voice systems on the basis of a single set of speakers.
Abstract: The effect of narrowband digital processing, using a linear predictive coding (LPC) algorithm at 2400 bits/s, on the recognition of previously unfamiliar speakers was investigated. Three sets of five speakers each (two sets of males differing in rated voice distinctiveness and one set of females) were tested for speaker recognition in two separate experiments using a familiarization-test procedure. In the first experiment three groups of listeners each heard a single set of speakers in both voice processing conditions, and in the second two groups of listeners each heard all three sets of speakers in a single voice processing condition. There were significant differences among speaker sets both with and without LPC processing, with the low distinctive males generally more poorly recognized than the other groups. There was also an interaction of speaker set and voice processing condition; the low distinctive males were no less recognizable over LPC than they were unprocessed, and one speaker in particular was actually better recognized over LPC. Although it seems that on the whole LPC processing reduces speaker recognition, the reverse may be the case for some speakers in some contexts. This suggests that one should be cautious about comparing speaker recognition for different voice systems on the basis of a single set of speakers. It also presents a serious obstacle to the development of a reliable standardized test of speaker recognizability. Keywords include: Speaker recognition; Linear predictive coding (LPC); Voice distinctiveness; and Speaker recognition test.

Proceedings Article
18 Aug 1985
TL;DR: The case of the "jion", a subset of character readings that "generates" a large subset of Japanese, is investigated, which suggests that combining DP-matching with limited-scope phoneme recognition could break through present limits.
Abstract: Attempts at automatic speech recognition have known several waves. Early efforts were based on the faith that speech is a string of phonemes that can be isolated and recognized one by one. This wave broke when it became clear that the physical realization of a phoneme is smeared in time and mingled with that of its neighbors, and is also context- and speaker-dependent. Next came the invention of the highly successful time-warping DP-matching methods, in which whole words are matched by templates. This wave is still going strong, at least in Japan, but it may have reached a high mark. To probe this question, we investigate the case of the "jion", a subset of character readings that "generates" a large subset of Japanese. This set has low redundancy and contains many minimal pairs. Error analysis of DP-matching shows that most errors occur between pairs that differ only in their initial consonant, especially if it belongs to groups such as plosives or nasals. Combining DP-matching with limited-scope phoneme recognition could break through present limits.
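
For readers unfamiliar with the time-warping DP-matching mentioned in the abstract above, here is a minimal sketch of template matching by dynamic time warping. It is only an illustration under simplifying assumptions: features are 1-D numbers rather than spectral vectors, the local cost is absolute difference, and the templates and their labels are invented to mimic a pair that differs only at its onset.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best time-warped alignment of sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Allowed steps: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(seq, templates):
    """Return the label of the template with the smallest warped distance."""
    return min(templates, key=lambda w: dtw_distance(seq, templates[w]))

# Hypothetical templates that differ only in their first two frames.
templates = {"ka": [1, 1, 5, 5], "ga": [2, 2, 5, 5]}
spoken = [1, 1, 1, 5, 5, 5]  # a slower rendition of "ka"

print(recognize(spoken, templates))  # -> ka
```

Because the two templates share most of their frames, the margin between the winning and losing distances comes entirely from the short onset, which is consistent with the abstract's observation that errors concentrate on pairs differing in the initial consonant.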