Showing papers on "Speaker diarisation" published in 1978

Open Access

Journal Article•DOI•

Feature selection via dynamic programming for text-independent speaker identification

R. Cheung, B. Eisenstein

01 Oct 1978-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Dynamic programming is applied to the selection of feature subsets in text-independent speaker identification, showing a lower average identification error in comparison to that of the "knock-out" strategy, the cepstral coefficients, and the PARCOR coefficients.

Abstract: Dynamic programming is applied to the selection of feature subsets in text-independent speaker identification. Each feature is long-term averaged in order to reduce its variability to text information. The resulting subset of features shows a lower average identification error in comparison to that of the "knock-out" strategy, the cepstral coefficients, and the PARCOR coefficients.
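
The abstract does not state the recurrence, so the following is only a sketch of one common dynamic-programming formulation of subset selection, not necessarily the authors' procedure. At each stage, every feature keeps the best subset that contains it, built by adding that feature to one of the subsets that survived the previous stage. The function name `dp_feature_selection` and the `criterion` callback (any separability score, higher is better) are assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet


def dp_feature_selection(
    n_features: int,
    subset_size: int,
    criterion: Callable[[FrozenSet[int]], float],
) -> FrozenSet[int]:
    """Stage-wise subset search (illustrative, not the paper's exact method).

    criterion is a hypothetical user-supplied score for a feature subset;
    larger values mean better speaker separability.
    """
    if not 1 <= subset_size <= n_features:
        raise ValueError("subset_size must be between 1 and n_features")

    # Stage 1: one candidate subset per feature, containing only that feature.
    best: Dict[int, FrozenSet[int]] = {
        j: frozenset([j]) for j in range(n_features)
    }

    # Each later stage grows every surviving subset by one feature.
    for _ in range(subset_size - 1):
        new_best: Dict[int, FrozenSet[int]] = {}
        for j in range(n_features):
            # Extend each previous-stage subset that does not yet contain j.
            candidates = [s | {j} for s in best.values() if j not in s]
            if candidates:
                # Keep only the best subset that ends with feature j.
                new_best[j] = max(candidates, key=criterion)
        best = new_best

    # The answer is the best of the per-feature survivors at the final stage.
    return max(best.values(), key=criterion)
```

With a purely additive criterion this search simply returns the top-scoring features. It becomes more interesting when the criterion measures joint behaviour, such as an identification error estimated on the whole subset, because each stage evaluates far fewer subsets than an exhaustive search would.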

35 citations

Proceedings Article•DOI•

Text-independent speaker identification from a large linguistically unconstrained time-spaced data base

J. Markel, S. Davis

01 Apr 1978

TL;DR: A very large data base consisting of over thirty-six hours of linguistically unconstrained extemporaneous speech, from seventeen speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker identification.

Abstract: A very large data base consisting of over thirty-six hours of linguistically unconstrained extemporaneous speech, from seventeen speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker identification. The results were strongly dependent on the voiced speech averaging interval, or L_v. Monotonic increases in the probability of correct identification were obtained as L_v increased, even with substantial time periods between successive sessions. Speaker identification performance in open tests improved if features with small between-class to within-class variance ratios were eliminated. For L_v corresponding to approximately thirty-nine seconds of speech, true text-independent results (no linguistic constraints embedded into the data base) of 98.05% for speaker identification were obtained.
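
The two ideas in this abstract, long-term averaging and discarding features with a small between-class to within-class variance ratio, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' procedure: the names `long_term_average`, `variance_ratio` and `keep_discriminative` are hypothetical, `block_len` (a frame count) stands in for the averaging interval, and the variance estimator is a plain population variance because the abstract does not say which one was used.

```python
from statistics import mean, pvariance
from typing import Dict, List

Vector = List[float]


def long_term_average(frames: List[Vector], block_len: int) -> List[Vector]:
    """Average consecutive blocks of block_len frames, feature by feature.

    A trailing partial block is dropped.
    """
    averaged = []
    for start in range(0, len(frames) - block_len + 1, block_len):
        block = frames[start:start + block_len]
        averaged.append([mean(column) for column in zip(*block)])
    return averaged


def variance_ratio(samples: Dict[str, List[Vector]], feature: int) -> float:
    """Between-speaker variance divided by mean within-speaker variance."""
    per_speaker = [[vec[feature] for vec in vecs] for vecs in samples.values()]
    between = pvariance([mean(values) for values in per_speaker])
    within = mean([pvariance(values) for values in per_speaker])
    return between / within if within > 0 else float("inf")


def keep_discriminative(
    samples: Dict[str, List[Vector]], n_features: int, threshold: float
) -> List[int]:
    """Indices of features whose variance ratio reaches the threshold."""
    return [
        f for f in range(n_features)
        if variance_ratio(samples, f) >= threshold
    ]
```

A feature whose speaker means are far apart relative to its spread within each speaker gets a large ratio and is kept. A feature that varies as much within one speaker as across speakers gets a ratio near zero and is dropped.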

4 citations

LISTEN: A System for Recognizing Connected Speech Over Small, Fixed Vocabularies, In Real Time.

J. E. Porter

01 Apr 1978

TL;DR: The development of a system for recognizing connected speech in real time using a commercially available speech preprocessor, a minicomputer and programs written in FORTRAN is described.

Abstract: This report describes the development of a system for recognizing connected speech in real time using a commercially available speech preprocessor, a minicomputer and programs written in FORTRAN. The system was tested on two speakers using the digits and the word 'point' with inconclusive results. Recognition accuracy of 86% was achieved for one speaker whereas accuracy for the other speaker was lower (39%) due to an anomalous difference between training and test data for that speaker's voice. (Author)

1 citation

A Pre-Matching Method for a Real Time Spoken Word Recognition System and a Learning Procedure of Speaker Differences.

Seiichi Nakagawa, Toshiyuki Sakai

01 Jan 1978

TL;DR: Candidate words in the vocabulary are pruned by pre-matching on both local and global features of a spoken word, which eliminates the least similar group of candidates from the vocabulary list and reduces the recognition time.

Abstract: If the vocabulary of a word recognition system is enlarged to several hundred words, the recognition time becomes very long because of the increased processing, and the recognition rate also decreases. To cope with these weaknesses, we adopted a method that reduces the candidate words in the vocabulary by pre-matching on both local and global features of a spoken word. That is, the least similar group of candidates is eliminated from the vocabulary list using measurements of both features, which reduces the recognition time; this operation also eliminates misleading candidates and so increases the recognition rate. Furthermore, adding the measurement to the final judgement increased the recognition rate. Moreover, to absorb the influence of speaker differences, we added a learning capability to the system. In an experiment on name recognition using 100 Japanese city names, the system recognized the names correctly at a rate of 83% for unspecified speakers and 93% after learning, using a minicomputer in real time. The number of candidate words was reduced to one tenth by pre-matching.
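
As a rough illustration of the pre-matching idea, the sketch below ranks vocabulary words by a cheap distance on coarse feature vectors and passes only the closest tenth on to a detailed matcher. It is an assumption-laden sketch, not the authors' system: the function `prematch`, the `keep_one_in` parameter, the use of Euclidean distance and the shape of the feature vectors are all hypothetical, since the abstract does not specify how the local and global features are measured or compared.

```python
import math
from typing import Dict, List, Sequence


def prematch(
    query: Sequence[float],
    templates: Dict[str, Sequence[float]],
    keep_one_in: int = 10,
) -> List[str]:
    """Return the vocabulary words that survive a coarse first pass.

    query and templates hold hypothetical coarse feature vectors of equal
    length. Roughly one word in keep_one_in is kept, and never fewer than one.
    """
    # Rank every word by its cheap distance to the spoken input.
    ranked = sorted(templates, key=lambda word: math.dist(query, templates[word]))
    keep = max(1, len(ranked) // keep_one_in)
    # Only these survivors would go on to the slower, detailed matching stage.
    return ranked[:keep]
```

The detailed matcher then runs on a list one tenth the size, which is where the time saving comes from. Pruning can also remove candidates that would have misled the final decision.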

1 citation