scispace - formally typeset

Showing papers on "Speaker diarisation" published in 1971


Patent
20 Apr 1971
TL;DR: A nonlinear process aligns the sample and reference utterances through a piecewise-linear continuous transformation of the time scale; the extent of time transformation required to achieve maximum similarity also influences the decision to accept or reject the identity claim.
Abstract: Speaker verification, as opposed to speaker identification, is carried out by matching a sample of a person's speech with a reference version of the same text derived from prerecorded samples of the same speaker. Acceptance or rejection of the person as the claimed individual is based on the concordance of a number of acoustic parameters, for example, formant frequencies, pitch period and speech energy. The degree of match is assessed by time aligning the sample and reference utterance. Time alignment is achieved by a nonlinear process which maximizes the similarity between the sample and reference through a piecewise-linear continuous transformation of the time scale. The extent of time transformation that is required to achieve maximum similarity also influences the decision to accept or reject the identity claim.
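The alignment step described in this abstract can be illustrated with a minimal sketch. It assumes a single one-dimensional contour (for example, speech energy), one interior breakpoint fixed at the midpoint, and a plain grid search; none of these details come from the patent, and the function names and thresholds are hypothetical.

```python
def piecewise_linear_warp(t, breakpoint_in, breakpoint_out):
    """Continuous two-segment warp of normalized time t in [0, 1]
    that maps breakpoint_in to breakpoint_out and fixes 0 and 1."""
    if t <= breakpoint_in:
        return t * breakpoint_out / breakpoint_in
    return breakpoint_out + (t - breakpoint_in) * (1.0 - breakpoint_out) / (1.0 - breakpoint_in)


def sample_at(contour, t):
    """Linearly interpolate a contour at normalized time t."""
    pos = t * (len(contour) - 1)
    i = max(0, min(int(pos), len(contour) - 2))
    frac = pos - i
    return contour[i] * (1.0 - frac) + contour[i + 1] * frac


def align(sample, reference, steps=19):
    """Grid-search the warp that minimizes mean squared distance.
    Returns (distance, breakpoint_out, warp_extent)."""
    breakpoint_in = 0.5  # assumption: one breakpoint at the midpoint
    n = len(reference)
    best = None
    for k in range(1, steps + 1):
        b_out = k / (steps + 1)
        dist = 0.0
        for j in range(n):
            t = j / (n - 1)
            warped = piecewise_linear_warp(t, breakpoint_in, b_out)
            dist += (sample_at(sample, warped) - reference[j]) ** 2
        dist /= n
        if best is None or dist < best[0]:
            best = (dist, b_out, abs(b_out - breakpoint_in))
    return best


def accept(sample, reference, max_distance=0.05, max_warp=0.3):
    """Accept only if the match is close AND little warping was needed
    (both thresholds are illustrative assumptions)."""
    dist, _, warp = align(sample, reference)
    return dist <= max_distance and warp <= max_warp
```

The `accept` function reflects the abstract's point that the amount of warping, and not only the residual distance, feeds into the decision.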

58 citations


Journal ArticleDOI
S. Das1, W. Mohn2
TL;DR: Experiments investigating adaptive pattern recognition in automatic speaker verification are reported, indicating that the utterances used for training purposes should preferably be collected over a relatively long period of time.
Abstract: Experiments investigating adaptive pattern recognition in automatic speaker verification are reported. A binary decision confirming or rejecting a speaker's purported identity is required. The experiments involve 7000 phrase length utterances of 118 speakers. An average misclassification rate of one percent with a "no decision" rate of ten percent is obtained. Other experiments indicate that the utterances used for training purposes should preferably be collected over a relatively long period of time.
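A decision rule with a "no decision" outcome can be sketched as two thresholds on a similarity score. This is only an illustration: the abstract does not describe the scoring method, the thresholds, or the denominator used for its rates, so the score scale, the threshold values, and the choice to compute both rates over all trials are assumptions.

```python
def verify(score, reject_below=0.4, accept_above=0.6):
    """Three-way decision on a similarity score (hypothetical 0..1 scale)."""
    if score >= accept_above:
        return "accept"
    if score <= reject_below:
        return "reject"
    return "no decision"


def rates(trials):
    """trials: list of (score, is_true_speaker).
    Returns (misclassification_rate, no_decision_rate), both over all trials."""
    wrong = 0
    undecided = 0
    for score, is_true_speaker in trials:
        decision = verify(score)
        if decision == "no decision":
            undecided += 1
        elif (decision == "accept") != is_true_speaker:
            wrong += 1
    return wrong / len(trials), undecided / len(trials)
```

Widening the gap between the two thresholds trades a higher "no decision" rate for fewer misclassifications.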

41 citations


Patent
04 Oct 1971
TL;DR: Identification of which speaker is speaking at a given time is enabled by a unit associated with the recording apparatus and having a key for each speaker, the court reporter or attendant operating the corresponding key while a given speaker is speaking.
Abstract: An electronic recording and playback aid for use in stenographic reporting, by means of which the speakers' voices are recorded and later played back to prepare a typewritten transcript of their words. Identification of which speaker is speaking at a given time is enabled by a unit associated with the recording apparatus and having a key for each speaker, the court reporter or attendant operating the appropriate corresponding key when a given speaker is speaking. Operation of the key applies a particular frequency of signal to a channel or track separate from that in which the speech is recorded, the frequency or tone continuing throughout the speech by that speaker, until a different key for a different speaker is operated. A light associated with each key is also automatically turned on when that key is operated, so that the operator can assure himself that the proper key is operated. When the recording is later played back, the particular speaker's identification signal is automatically detected and used to turn on the light associated with the corresponding speaker's key, while the speaker's words are also being played back. This enables the operator on playback to start at any point on the tape with the speaker automatically identified by the light associated with his key, and without any interference with the sound of the recorded words.
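The tone-per-speaker scheme can be sketched digitally. The patent describes analog apparatus; the sample rate, the key names, the tone frequencies, and the use of the Goertzel algorithm for detection below are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed
# Hypothetical key-to-tone assignment; the patent gives no frequencies.
KEY_TONES_HZ = {"judge": 500.0, "counsel": 700.0, "witness": 900.0}


def tone(freq_hz, n_samples):
    """Identification tone written to the separate track while a key is held."""
    return [math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(n_samples)]


def goertzel_power(samples, freq_hz):
    """Signal power at one frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def identify_speaker(id_track_block):
    """On playback, pick the key whose tone dominates the identification track."""
    return max(KEY_TONES_HZ, key=lambda name: goertzel_power(id_track_block, KEY_TONES_HZ[name]))
```

Because the tone lives on its own track, detection can start at any tape position without touching the speech channel, which matches the random-access playback described in the abstract.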

12 citations


Patent
29 Dec 1971
TL;DR: A switching system and matrix circuit connect the speaker to the appropriate interpretation booths, enabling each interpreter to translate the speaker's language into a different language, and a further circuit transmits the speaker's language and the language of each active interpreter to a conference audience.
Abstract: A simultaneous translation system used to enable interpretation from the language of a speaker into several different languages. A switching system and matrix circuit are used to connect the speaker to appropriate interpretation booths thereby enabling an interpreter to translate the speaker's language into the different appropriate language and a circuit for transmitting the speaker's language and the language of each active interpreter to a conference audience.

8 citations


Journal ArticleDOI
TL;DR: The preliminary results obtained indicate that most of the necessary adaptation can be achieved in a relatively short time, provided that the speakers are instructed in how to change their articulations to produce the desired effects.
Abstract: This study examines the feasibility and limitations of speaker adaptation in improving the performance of a fixed (speaker-independent) automatic speech recognition system. A fixed vocabulary of 55 [ƏCVd] syllables is used in the recognition system, where C is 1 of 11 stops and fricatives, and V is 1 of 5 tense vowels. The results of the experiment on speaker adaptation, performed with 6 male and 6 female adult speakers, show that speakers can learn to change their articulations to appreciably improve recognition scores. The preliminary results obtained also indicate that most of the necessary adaptation can be achieved in a relatively short time, provided that the speakers are instructed in how to change their articulations to produce the desired effects.

4 citations


Journal ArticleDOI
TL;DR: A computer procedure for speaker verification is compared with listener performance in the same task; the same test utterances used by Doddington served as stimuli in a subjective speaker verification experiment.
Abstract: A comparison has been made between the performance of a computer procedure of speaker verification [Doddington, J. Acoust. Soc. Amer. 49, 139(A) (1971)] and listener performance in the same task. In the evaluation of the computer method, 32 “casual” impostors were pitted against eight “true” speakers. A “casual” impostor is one who makes no attempt to mimic the “true” speaker but simply repeats the same test sentences in his own natural voice. After an a posteriori adjustment of the acceptance‐rejection criterion to equalize errors of false acceptance and false rejection, an average error rate of 1.5% is obtained. The same test utterances used by Doddington were used as stimuli in a subjective speaker verification experiment in which 10 listeners participated. Each stimulus presentation was a paired comparison consisting of a challenge and a reference utterance. The reference was one of the “true” speaker utterances while the challenge was either an utterance from the same “true” speaker or an “impostor” utterance with equal likelihood. Listeners were required to indicate whether they thought the utterances were by the same or different speakers. The over‐all average error rates were approximately 4.2% for both false acceptance and false rejection. The best false acceptance rate by an individual listener was 1.6%, while the best individual false rejection rate was 0.5%.
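The a posteriori adjustment of the acceptance-rejection criterion can be sketched as a search for the threshold at which false acceptance and false rejection are closest to equal. This is a minimal illustration that assumes a similarity score where higher means "more likely the true speaker"; the scoring used in the study is not given here, and the function names are hypothetical.

```python
def error_rates(true_scores, impostor_scores, threshold):
    """Return (false_acceptance_rate, false_rejection_rate) at a threshold.
    A trial is accepted when its score is >= threshold."""
    false_rejection = sum(1 for s in true_scores if s < threshold) / len(true_scores)
    false_acceptance = sum(1 for s in impostor_scores if s >= threshold) / len(impostor_scores)
    return false_acceptance, false_rejection


def equalize_errors(true_scores, impostor_scores):
    """Pick, after the fact, the observed score that best balances the two errors.
    Returns (threshold, false_acceptance_rate, false_rejection_rate)."""
    candidates = sorted(set(true_scores) | set(impostor_scores))

    def imbalance(threshold):
        fa, fr = error_rates(true_scores, impostor_scores, threshold)
        return abs(fa - fr)

    best = min(candidates, key=imbalance)
    fa, fr = error_rates(true_scores, impostor_scores, best)
    return best, fa, fr
```

Because the threshold is tuned on the same trials it is evaluated on, the resulting figure is an optimistic estimate, which is worth keeping in mind when comparing it with the listeners' error rates.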

2 citations


01 Jan 1971
TL;DR: An instant speaker algorithm and the digital circuitry employed to implement it are described, and some comments are made concerning “on-off” patterns found in normal conversation and their relationship to the instant speaker algorithm.
Abstract: When speech is transmitted over a conventional pulse-code modulation carrier, the connection between two subscribers on a digital basis is straightforward. However, techniques for digital conference arrangements are not well known. Implementation of a conference arrangement using traditional analog techniques requires elaborate hardware and use of analog-to-digital converters. This paper, after a brief discussion of existing conference arrangements, describes an instant speaker algorithm and the digital circuitry employed to implement this algorithm. During every time frame an “active speaker” sample is sought. The active speaker is identified by comparing each participant’s digital sample during the following time frame. The active speaker sample is transmitted to all participants in the conferencing arrangement. A “last speaker” sample is transmitted to the active speaker. The conference circuit consists of 3 modules: comparison and gating module, active speaker memory module, and last speaker memory module. The implementation of these modules with conventional transistor-transistor logic digital circuits is discussed. Overall arrangement of this conference circuit in a local digital exchange is also explored. Some comments are made concerning “on-off” patterns found in normal conversation and their relationship to the instant speaker algorithm.
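The per-frame routing described in this abstract can be sketched in software. The abstract says only that participants' samples are compared; choosing the sample with the largest magnitude, ignoring all-silent frames, and applying the decision within the same frame (rather than the following one) are assumptions, and the function and key names are hypothetical.

```python
def conference_frame(samples, state):
    """Route one time frame of a conference.

    samples: dict mapping participant -> integer PCM sample for this frame.
    state:   dict with keys "active" and "last" (the two memory modules).
    Returns a dict mapping participant -> sample sent to that participant.
    """
    # Comparison step (assumed criterion: largest sample magnitude).
    loudest = max(samples, key=lambda p: abs(samples[p]))
    if samples[loudest] != 0 and loudest != state["active"]:
        # New talker: the previous active speaker becomes the last speaker.
        state["last"], state["active"] = state["active"], loudest

    active, last = state["active"], state["last"]
    out = {}
    for participant in samples:
        # The active speaker hears the last speaker; everyone else hears the active speaker.
        source = last if participant == active else active
        out[participant] = samples[source] if source is not None else 0
    return out
```

A short usage example with three hypothetical participants:

```python
state = {"active": None, "last": None}
print(conference_frame({"a": 100, "b": 3, "c": 0}, state))
print(conference_frame({"a": 5, "b": -90, "c": 1}, state))
```

Sending the last speaker's sample to the active speaker keeps the current talker from hearing their own voice while still letting them hear an interruption.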

2 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used a total of about 7000 utterances of the phrase "Check available terminals" from 118 speakers and reported an average misclassification rate of 1% with a no decision rate of 10%.
Abstract: Speaker verification is defined as confirming a speaker's purported identity. The method of confirmation is to compare a speech sample of the person against his speech profile, which is based on previous utterances of the same phrase. Experiments have demonstrated the feasibility of an automated scheme of speaker verification under certain restrictions. Important among these restrictions are the use of only male speakers, wide‐band (180–8000 Hz) audio channel, and noise‐free environment. The experiments used a total of about 7000 utterances of the phrase “Check available terminals” from 118 speakers. An average misclassification rate of 1% with a “no decision” rate of 10% was obtained. Further work is directed toward removing some of the restrictions, such as reducing the bandwidth and including female utterances. The strategy for improving the system performance is to reduce the “no decision” rate without increasing the misclassification rate.

2 citations