Showing papers on "Speaker diarisation" published in 1984


Patent
31 Dec 1984
TL;DR: A method and system for speaker enrollment and speaker recognition are described, in which each candidate speaker is assigned a set of short acoustic segments, or templates, of phonemic duration.
Abstract: The invention provides a method and system for speaker enrollment, as well as for speaker recognition. Speaker enrollment creates for each candidate speaker a set of short acoustic segments, or templates, of phonemic duration. An equal number of templates is derived from every candidate speaker's training utterance. A speaker's template set serves as a model for that speaker. Recognition is accomplished by employing a continuous speech recognition (CSR) system to match the recognition utterance with each speaker's template set in turn. The system selects the speaker whose templates match the recognition utterance most closely, that is, the speaker whose CSR match score is lowest. The method of the invention incorporates the entire training utterance in each speaker model, and explains the entire test utterance. The method of the invention models individual short segments of the speech utterances as well as their long-term statistics. Both static and dynamic speaker characteristics are captured in the speaker models.
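
The selection rule stated in the abstract (choose the speaker whose match score is lowest) can be sketched as follows. This is a minimal illustration, not the patented method: the CSR matcher is replaced by a hypothetical stand-in, `match_score`, which sums the Euclidean distance from each frame of the utterance to its nearest template. Templates are reduced to single feature vectors rather than segments of phonemic duration. The function names and the toy data are assumptions.

```python
import math

def match_score(utterance, templates):
    """Stand-in for a CSR match score (assumption): lower means a closer match.
    For each frame, take the distance to the nearest template, then sum."""
    return sum(min(math.dist(frame, t) for t in templates) for frame in utterance)

def recognize_speaker(utterance, speaker_templates):
    """Match the utterance against each speaker's template set in turn and
    return the speaker with the lowest score, plus all scores."""
    scores = {name: match_score(utterance, tpls)
              for name, tpls in speaker_templates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical 2-D feature vectors; every speaker has the same number of templates.
speaker_templates = {
    "alice": [(0.0, 0.0), (1.0, 0.0)],
    "bob": [(5.0, 5.0), (6.0, 5.0)],
}
utterance = [(0.1, 0.0), (0.9, 0.1)]
best, scores = recognize_speaker(utterance, speaker_templates)
print(best, scores)
```

Because every frame of the utterance contributes to the sum, the score accounts for the whole test utterance, which loosely mirrors the abstract's point that the method "explains the entire test utterance".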

28 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: The paper describes an automatic method, called Automatic Diphone Bootstrapping (or A.D.B.), for template extraction for Speaker-Adaptive Continuous Speech Recognition using "diphones" as speech units, which operates without any manual intervention and performed very well for all the speakers on which it was tested.
Abstract: The paper describes an automatic method, called Automatic Diphone Bootstrapping (or A.D.B.), for template extraction for Speaker-Adaptive Continuous Speech Recognition using "diphones" as speech units. Diphones have proved to be very suitable for C.S.R. as they meet the main requirements of phonetic units: invariance with the context and economy. Furthermore the performance of diphone-based speaker dependent C.S.R. systems is very high. For a long time manual extraction has been presented in the literature as the only completely reliable method for sub-word template creation for any speaker (see [1] as an example). Recently some automatic techniques for reference pattern extraction were developed [2,3], but they also require some manual corrections. The A.D.B. procedure operates without any manual intervention and performed very well for all the speakers on which it was tested. In a connected digit recognition task, a W.R.R. of 98.79% was achieved by using the speaker-adaptive templates created by the A.D.B. procedure.

14 citations


Proceedings ArticleDOI
19 Mar 1984
TL;DR: An artificial speech recognition experiment is introduced as a convenient means of assessing alignment accuracy, and alignment accuracy is found to be improved considerably by applying certain speaker adaptation transformations to the synthetic speech.
Abstract: A capacity to carry out reliable automatic time alignment of synthetic speech to naturally produced speech offers potential benefits in speech recognition and speaker recognition as well as in synthesis itself. Phrase alignment experiments are described that indicate that alignment to synthetic speech is more difficult than alignment of speech from two natural speakers. An artificial speech recognition experiment is introduced as a convenient means of assessing alignment accuracy. By this measure, alignment accuracy is found to be improved considerably by applying certain speaker adaptation transformations to the synthetic speech, by modifying the spectrum similarity metric, and by generating the synthetic spectra directly from the control parameters using simplified excitation spectra. The improvements seem to limit, however, at a level below that found between natural speakers. It is conjectured that further improvement requires modifications to the synthesis rules themselves.

13 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: A Speaker Recognizability Test (SRT) is presented, which tries to establish how well a given communications system preserves a speaker's identity.
Abstract: Speech intelligibility and quality are the two most often tested features of speech coding systems. However, another feature of interest in store-and-forward applications is the preservation of a speaker's identity. Here, a Speaker Recognizability Test (SRT) is presented, which tries to establish how well a given communications system preserves a speaker's identity. Contrary to previous efforts, no attempt is made to identify the cues used by listeners for speaker recognition. Instead, listeners are asked directly to identify a speaker who says an utterance by comparing the uttered sentence with reference sentences, one from each speaker. Among the issues considered in the design of the test are the choice of speakers, the use of reference sentences from the same or different sessions of data collection, and the use of processed or unprocessed speech for reference.

6 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: A speaker adaptation method that follows two steps -- selection of "persons" who have voices similar to the user's and generation of a speaker-adapted dictionary from their dictionaries is studied.
Abstract: A speaker-trained voice recognition system with a large vocabulary has a serious weak point, that is, the user must register a large number of words prior to its use. To be freed from this problem, the authors have studied a speaker adaptation method. This method follows two steps -- 1) selection of "persons" who have voices similar to the user's and 2) generation of a speaker-adapted dictionary from their dictionaries. Results of simulation using 1000-word speech samples by 40 male speakers (20 for standard dictionaries and 20 for performance evaluation) are reported. The results indicated the advantage of this method. The speaker-trained dictionary gave 90.1% recognition accuracy, the speaker-independent dictionary gave 83.6%, and the speaker-adapted dictionary which required only 10% of the vocabulary for training gave 85.7%.
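
As a quick reading aid for the three accuracy figures above, the short calculation below expresses the adapted dictionary's gain as a fraction of the gap between the speaker-independent and speaker-trained dictionaries. Only the three percentages come from the abstract; the variable names and the "fraction of the gap recovered" framing are assumptions added here.

```python
# Recognition accuracies reported in the abstract (percent).
speaker_trained = 90.1
speaker_independent = 83.6
speaker_adapted = 85.7  # trained on only 10% of the vocabulary

# Gain of adaptation over the speaker-independent baseline, in points.
gain = speaker_adapted - speaker_independent
# Total gap between speaker-independent and fully speaker-trained dictionaries.
gap = speaker_trained - speaker_independent
# Share of that gap recovered by adaptation (a derived figure, not in the abstract).
fraction_recovered = gain / gap

print(f"gain: {gain:.1f} points, gap: {gap:.1f} points, "
      f"recovered: {fraction_recovered:.0%}")
```

On these figures, adaptation recovers about a third of the gap (2.1 of 6.5 percentage points) while requiring a tenth of the training vocabulary.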

4 citations


01 Jan 1984
TL;DR: In this paper, a speaker-independent spoken word recognition system for a large size vocabulary is described, in which speech is analyzed by the filter bank, from whose logarithmic spectrum the 11 features are extracted every 10 ms.
Abstract: This paper describes the speaker-independent spoken word recognition system for a large size vocabulary. Speech is analyzed by the filter bank, from whose logarithmic spectrum the 11 features are extracted every 10 ms. Using the features the speech is first segmented and the primary phoneme recognition is carried out for every segment using the Bayes decision method. After correcting errors in segmentation and phoneme recognition, the secondary recognition of part of the consonants is carried out and the phonemic sequence is determined. The word dictionary item having maximum likelihood to the sequence is chosen as the recognition output. The 75.9% score for the phoneme recognition and the 92.4% score for the word recognition are obtained for the training samples in the 212 words uttered by 10 male and 10 female speakers. For the same words uttered by 30 male and 20 female speakers different from the above speakers, the 88.1% word recognition score is obtained.

3 citations



Proceedings ArticleDOI
01 Jan 1984
TL;DR: This study uses operational evaluation techniques to model a system which processes human speech to verify the identity of persons seeking access to a facility resource and decides whether the speaker is valid or an imposter based on the degree of similarity observed.
Abstract: This study uses operational evaluation techniques to model a system which processes human speech to verify the identity of persons seeking access to a facility resource. The system consists of hardware and software for accepting analog speech; extracting time, frequency, and amplitude characteristics; producing compact digital templates containing the features for speaker identification; and cross-referencing templates with reference patterns to establish the degree of similarity between an utterance and a set of utterances for the person whose identity is being claimed. A decision algorithm is implemented to determine whether the speaker is valid or an imposter based on the degree of similarity observed. A conceptual model has been tested and used to simulate variations in system attributes in order to optimize system performance. Performance is evaluated in terms of the number of imposters who can defeat the system and the number of rejected valid speakers.
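
The two performance measures named in the abstract (imposters who defeat the system, and valid speakers who are rejected) can be illustrated with a simple threshold rule. This is a sketch under stated assumptions, not the study's decision algorithm: the accept-if-similarity-meets-threshold rule, the function names, and the trial data are all hypothetical.

```python
def verify(similarity, threshold):
    """Assumed decision rule: accept the identity claim when the similarity
    to the claimed speaker's reference patterns meets the threshold."""
    return similarity >= threshold

def count_errors(trials, threshold):
    """trials: list of (similarity, is_valid_speaker) pairs.
    Returns (false_accepts, false_rejects)."""
    false_accepts = sum(1 for s, valid in trials
                        if not valid and verify(s, threshold))
    false_rejects = sum(1 for s, valid in trials
                        if valid and not verify(s, threshold))
    return false_accepts, false_rejects

# Hypothetical similarity scores: three valid speakers, three imposters.
trials = [(0.92, True), (0.81, True), (0.60, True),
          (0.75, False), (0.40, False), (0.30, False)]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, count_errors(trials, threshold))
```

Raising the threshold trades accepted imposters for rejected valid speakers, which is the kind of system attribute a simulation like the one described could vary.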

3 citations




16 Jul 1984
TL;DR: A commonly cited drawback of narrowband systems such as the DoD standard linear predictive coding (LPC) algorithm is that speaker recognition is poor, yet it is the opinion of many users that they frequently recognize the speaker.
Abstract: A commonly cited drawback of narrowband systems such as the DoD standard linear predictive coding (LPC) algorithm is that speaker recognition is poor. Yet it is the opinion of many users that they frequently recognize the speaker. Tape recordings of 24 speakers conversing over an unprocessed channel and over an LPC voice processing system were subjected to listening tests. Twenty-four co-workers listened to the tapes and attempted to identify each speaker from a list of about 40 people in the same branch. Prior to the recognition tests, each of the listeners also rated his or her familiarity with each of the speakers and the distinctiveness of each speaker's voice. There was some loss in voice recognition over LPC, but the recognition rate was still quite high. Unprocessed voices were correctly identified 88% of the time, whereas the same people talking over the LPC system were correctly identified 69% of the time. Talker familiarity was significantly correlated with correct identifications. There was no significant correlation between the rated distinctiveness of the speaker and correct identifications. However, familiarity and distinctiveness ratings were highly correlated.