Showing papers on "Speaker diarisation" published in 1978


Journal ArticleDOI
TL;DR: Dynamic programming is applied to the selection of feature subsets in text-independent speaker identification, showing a lower average identification error in comparison to that of the "knock-out" strategy, the cepstral coefficients, and the PARCOR coefficients.
Abstract: Dynamic programming is applied to the selection of feature subsets in text-independent speaker identification. Each feature is long-term averaged in order to reduce its variability to text information. The resulting subset of features shows a lower average identification error in comparison to that of the "knock-out" strategy, the cepstral coefficients, and the PARCOR coefficients.

35 citations


Proceedings ArticleDOI
01 Apr 1978
TL;DR: A very large data base consisting of over thirty-six hours of linguistically unconstrained extemporaneous speech, from seventeen speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker identification.
Abstract: A very large data base consisting of over thirty-six hours of linguistically unconstrained extemporaneous speech, from seventeen speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker identification. The results were strongly dependent on the voiced speech averaging interval, L_v. Monotonic increases in the probability of correct identification were obtained as L_v increased, even with substantial time periods between successive sessions. Speaker identification performance in open tests improved if features with small between-class to within-class variance ratios were eliminated. For L_v corresponding to approximately thirty-nine seconds of speech, true text-independent results (no linguistic constraints embedded into the data base) of 98.05% for speaker identification were obtained.
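
The two ideas in this abstract, long-term averaging over an interval and dropping features with a small between-class to within-class variance ratio, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the block-averaging scheme, the exact form of the ratio (variance of speaker means over mean within-speaker variance), the threshold, and the toy data. None of it is taken from the paper.

```python
from statistics import mean, pvariance


def long_term_averages(frames, interval):
    """Average consecutive, non-overlapping blocks of `interval` frame values.

    `interval` stands in for an averaging interval; the unit is hypothetical.
    """
    return [
        mean(frames[i:i + interval])
        for i in range(0, len(frames) - interval + 1, interval)
    ]


def variance_ratio(samples_by_speaker):
    """Between-speaker variance of the means over mean within-speaker variance."""
    speaker_means = [mean(v) for v in samples_by_speaker.values()]
    between = pvariance(speaker_means)
    within = mean([pvariance(v) for v in samples_by_speaker.values()])
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def keep_features(features, threshold):
    """Return names of features whose variance ratio exceeds `threshold`."""
    return [
        name for name, by_speaker in features.items()
        if variance_ratio(by_speaker) > threshold
    ]


# Toy data (invented): one feature separates the speakers, one does not.
features = {
    "feature_a": {"spk1": [100, 102, 98], "spk2": [150, 148, 152]},
    "feature_b": {"spk1": [1, 5, 3], "spk2": [3, 1, 5]},
}
selected = keep_features(features, threshold=1.0)
```

In this toy data, `feature_b` has identical speaker means, so its ratio is zero and it is eliminated, while `feature_a` is kept.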

4 citations


01 Apr 1978
TL;DR: The development of a system for recognizing connected speech in real time using a commercially available speech preprocessor, a minicomputer and programs written in FORTRAN is described.
Abstract: This report describes the development of a system for recognizing connected speech in real time using a commercially available speech preprocessor, a minicomputer and programs written in FORTRAN. The system was tested on two speakers using the digits and the word 'point' with inconclusive results. Recognition accuracy of 86% was achieved for one speaker whereas accuracy for the other speaker was lower (39%) due to an anomalous difference between training and test data for that speaker's voice. (Author)

1 citation


01 Jan 1978
TL;DR: Candidate words in the vocabulary are reduced by pre-matching on both local and global features of a spoken word; eliminating the least similar group of candidates from the vocabulary list shortens recognition time.
Abstract: When the vocabulary of a word recognition system is enlarged to several hundred words, recognition time grows with the amount of processing and the recognition rate falls. To address both problems, we adopted a method that reduces the candidate words in the vocabulary by pre-matching on both local and global features of a spoken word. Eliminating the least similar group of candidates from the vocabulary list, using measurements of both features, reduces recognition time; it also removes misleading candidates and so raises the recognition rate. Adding this measurement to the final judgement raised the rate further. To absorb the influence of speaker differences, we also added a learning capability to the system. In an experiment on name recognition using 100 Japanese city names, the system, running on a minicomputer in real time, recognized the names correctly at a rate of 83% for unspecified speakers and 93% after learning. Pre-matching reduced the number of candidate words to one tenth.
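
The candidate-reduction idea can be sketched as a two-stage search: a cheap pre-match prunes the vocabulary, and a more detailed comparison runs only on the survivors. This is a rough illustration under stated assumptions, not the paper's method. The feature vectors, the two distance functions, the `keep_fraction` parameter, and the toy vocabulary are all hypothetical.

```python
def coarse_distance(a, b):
    """Cheap distance used for pre-matching (assumed: sum of absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detailed_distance(a, b):
    """Costlier distance used for the final decision (assumed: squared error)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prematch(query, vocabulary, keep_fraction=0.1):
    """Keep only the closest fraction of words according to the coarse features."""
    ranked = sorted(
        vocabulary,
        key=lambda w: coarse_distance(query["coarse"], vocabulary[w]["coarse"]),
    )
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


def recognize(query, vocabulary, keep_fraction=0.1):
    """Run the detailed comparison only on candidates that survive pre-matching."""
    candidates = prematch(query, vocabulary, keep_fraction)
    return min(
        candidates,
        key=lambda w: detailed_distance(query["detail"], vocabulary[w]["detail"]),
    )


# Toy vocabulary (invented): ten words with made-up feature vectors.
vocabulary = {
    f"w{i}": {"coarse": [float(i)], "detail": [float(i), float(2 * i)]}
    for i in range(10)
}
query = {"coarse": [3.2], "detail": [3.0, 6.0]}
survivors = prematch(query, vocabulary, keep_fraction=0.2)
best = recognize(query, vocabulary, keep_fraction=0.2)
```

The detailed comparison here touches only two of the ten words, which is where the saving in recognition time would come from.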

1 citation