
Showing papers on "Speaker diarisation" published in 1969


Journal Article
TL;DR: In this paper, a data reduction procedure based on the Karhunen-Loeve representation was used to represent the pitch information in each contour in a 20-dimensional space.
Abstract: The results of a study aimed at finding the importance of pitch for automatic speaker recognition are presented. Pitch contours were obtained for 60 utterances, each approximately 2‐sec in duration, of 10 female speakers. A data‐reduction procedure based on the Karhunen‐Loeve representation was found effective in representing the pitch information in each contour in a 20‐dimensional space. The data were divided into two portions; one part was used to design the speaker recognition system, while the other part was used to test the effectiveness of the design. The 20‐dimensional vectors representing the pitch contours of the design set were linearly transformed so that the ratio of interspeaker to intraspeaker variance in the transformed space was maximum. A reference utterance was formed for each speaker by averaging the transformed vectors of that speaker. The test utterance was assigned to the speaker corresponding to the reference utterance with the smallest Euclidean distance in the transformed space. The percentage of correct identifications using the above procedure was found to be 97%. The recognition rate increased to 98% with the addition of duration as an independent measure.

297 citations
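
The pipeline in this abstract (Karhunen-Loeve reduction, a linear transform that maximizes the ratio of interspeaker to intraspeaker variance, then nearest-reference classification by Euclidean distance) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pitch contours are synthetic, the data sizes (12 utterances per speaker, 40 samples per contour), the small ridge term, and all function names are assumptions made so the example runs on its own.

```python
import numpy as np

def fit_kl(X, n_components):
    """Karhunen-Loeve basis: data mean and top eigenvectors of the covariance."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))  # ascending eigenvalues
    basis = eigvecs[:, ::-1][:, :n_components]                  # largest-variance directions first
    return mean, basis

def fit_discriminant(Z, labels, ridge=1e-6):
    """Linear transform maximizing between-speaker over within-speaker scatter."""
    classes = np.unique(labels)
    d = Z.shape[1]
    overall = Z.mean(axis=0)
    Sw = np.zeros((d, d))  # within-speaker (intraspeaker) scatter
    Sb = np.zeros((d, d))  # between-speaker (interspeaker) scatter
    for c in classes:
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - overall, mc - overall)
    # Assumption: a small ridge keeps Sw invertible; the abstract does not mention one.
    Sw += ridge * np.trace(Sw) / d * np.eye(d)
    # Whiten Sw, then diagonalize Sb in the whitened space.
    w_vals, w_vecs = np.linalg.eigh(Sw)
    whiten = w_vecs / np.sqrt(w_vals)
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ Sb @ whiten)
    order = np.argsort(b_vals)[::-1][: len(classes) - 1]
    return whiten @ b_vecs[:, order]

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    n_speakers, per_speaker, contour_len = 10, 12, 40  # assumed sizes, synthetic data
    centers = rng.normal(0.0, 1.0, (n_speakers, contour_len))
    y = np.repeat(np.arange(n_speakers), per_speaker)
    X = centers[y] + rng.normal(0.0, 0.1, (len(y), contour_len))

    design = np.arange(len(y)) % 2 == 0                # half for design, half for test
    mean, basis = fit_kl(X[design], 20)                # 20-dimensional representation
    Zd = (X[design] - mean) @ basis
    W = fit_discriminant(Zd, y[design])

    # One reference per speaker: the average of that speaker's transformed design vectors.
    Td = Zd @ W
    refs = np.stack([Td[y[design] == s].mean(axis=0) for s in range(n_speakers)])

    # Assign each test utterance to the reference at the smallest Euclidean distance.
    Tt = ((X[~design] - mean) @ basis) @ W
    dists = np.linalg.norm(Tt[:, None, :] - refs[None, :, :], axis=2)
    predicted = dists.argmin(axis=1)
    return float((predicted == y[~design]).mean())

accuracy = run_demo()
print(f"identification accuracy on held-out synthetic contours: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 97% figure reported in the abstract. More utterances per speaker are used than in the study so that the within-speaker scatter matrix is well conditioned in 20 dimensions.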


Journal Article
TL;DR: Impostors attempting to mimic the authorized speaker could not improve their ability to deceive the syst...
Abstract: Automatic speaker verification was accomplished in this study using cepstral measurements to characterize short segments in each of the first two vowels of the standard test phrase “My code is .” The length of the word “my” and the speaker's pitch were used as additional parameters. The verification decision is treated as a two‐class problem, the speaker being either the authorized speaker or an impostor. Reference data is used only for the authorized speaker. The decision is based on the test sample's distance to the nearest reference sample. Data is presented to show that, if reference samples are collected over a period of many days, then verification is possible more than two months later, whereas, if reference data is collected at one sitting, verification is highly inaccurate as little as 1 h later. Four authorized speakers and 30 impostors were examined, with error rates obtained from 6% to 13%. Impostors attempting to mimic the authorized speaker could not improve their ability to deceive the system significantly.

52 citations
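
The decision rule described in this abstract, where only the authorized speaker has reference data and the test sample is judged by its distance to the nearest reference sample, might look like the sketch below. The feature vectors, the Euclidean metric, the acceptance threshold, and the function names are all assumptions for illustration; the abstract does not say how the distance was measured or how the threshold was chosen.

```python
import numpy as np

def nearest_reference_distance(sample, references):
    """Distance from the test sample to the closest stored reference sample."""
    return float(np.min(np.linalg.norm(references - sample, axis=1)))

def verify(sample, references, threshold):
    """Accept the claimed identity if the nearest reference is close enough."""
    return nearest_reference_distance(sample, references) <= threshold

# Hypothetical 2-D feature vectors standing in for the cepstral, duration,
# and pitch measurements mentioned in the abstract.
references = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # authorized speaker only
genuine = np.array([0.9, 0.1])
impostor = np.array([3.0, 3.0])
threshold = 0.5  # assumed value; in practice it trades false accepts against false rejects

print(verify(genuine, references, threshold))
print(verify(impostor, references, threshold))
```

Because the rule uses the nearest reference rather than an average, references collected over many days can cover more of the speaker's day-to-day variation, which is consistent with the abstract's observation about single-sitting reference data.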


Proceedings Article
18 Nov 1969
TL;DR: A two-class recognition scheme is of interest for speaker verification, where a speaker who desires verification of his identity based upon some previously stored characteristics of his speech represents one of the two classes, whereas the other class encompasses all other speakers.
Abstract: There are many ways in which a pattern recognition system may be implemented. In the specific problem of speaker verification, a two-class recognition scheme is of interest. A speaker who desires verification of his identity based upon some previously stored characteristics of his speech represents one of the two classes (real), whereas the other class (impostor) encompasses all other speakers.

8 citations


Journal Article
TL;DR: In this article, the authors proposed to use acoustic measures that are related in as direct a manner as possible to the voice characteristics of the unknown speaker and that are minimally affected by irrelevant factors.
Abstract: In a scheme for mechanical recognition of speakers, it is desirable to use acoustic measures that are related in as direct a manner as possible to the voice characteristics of the unknown speaker and that are minimally affected by irrelevant factors. Acoustic attributes that are dependent on anatomical properties of the speaker's vocal mechanism should be particularly effective ones. Certain phonemes or phoneme features are well suited for displaying speaker‐dependent characteristics. For example, aspects of the spectra of /∫/ (high frequency shape), /i/ (shape of the F2‐F3‐F4 peak), and /m/ (pole‐zero interplay and nasal formants) have been found to be effective. This approach suggests that speaker‐recognition procedures should utilize a strategy of measuring only significant features of certain segments of an utterance rather than general measurements over the extent of the utterance. Implementation of this approach in automated speaker‐recognition schemes is discussed. [Work supported in part by the Na...

5 citations