Showing papers on "Speaker recognition" published in 1979


Journal Article
H. Sakoe
TL;DR: A general principle of connected word recognition is given, based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns; computation time and memory requirements are both shown to be within reasonable limits.
Abstract: This paper reports a pattern matching approach to connected word recognition. First, a general principle of connected word recognition is given based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns. Time-normalization capability is allowed by use of dynamic programming-based time-warping technique (DP-matching). Then, it is shown that the matching process is efficiently carried out by breaking it down into two steps. The derived algorithm is extensively subjected to recognition experiments. It is shown in a talker-adapted recognition experiment that digit data (one to four digits) connectedly spoken by five persons are recognized with as high as 99.6 percent accuracy. Computation time and memory requirement are both proved to be within reasonable limits.

289 citations
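
The DP-matching named in the abstract above is a form of dynamic time warping. The following is a minimal sketch of the textbook recurrence only, assuming scalar features and an absolute-difference local cost; it is not the paper's two-step connected-word algorithm, and the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Assumptions (not from the paper): scalar features, absolute-difference
    local cost, no slope constraints or adjustment window.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed predecessors: vertical, horizontal, and diagonal steps.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Under these assumptions a sequence and a time-stretched copy of it align at zero cost, which is the time-normalization property the abstract refers to.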


Proceedings Article
01 Apr 1979
TL;DR: A speaker-independent, isolated word recognition system is proposed that uses multiple templates for each word in the vocabulary; the templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers).
Abstract: A speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers). The recognition system, which uses telephone recordings, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule to lower the probability of error. Results are presented on two test sets of data which show error rates that are comparable to, or better than, those obtained with speaker trained, isolated word recognition systems.

120 citations
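
One plausible reading of the K-nearest neighbor decision rule over multiple templates per word is sketched below: for each word, average the K smallest template distances and pick the word with the lowest average. The function name `knn_word_decision`, the input layout, and the default `k` are assumptions for illustration; the distances are taken as already computed (for example by time warping with an LPC distance) rather than derived here.

```python
def knn_word_decision(distances, k=2):
    """Pick a word from per-template distances using a KNN-style rule.

    distances: dict mapping each vocabulary word to a list of distances
    between the unknown utterance and that word's reference templates.
    The averaging rule and k are illustrative assumptions.
    """
    scores = {}
    for word, template_distances in distances.items():
        nearest = sorted(template_distances)[:k]
        # Averaging the k best matches damps the effect of one lucky template.
        scores[word] = sum(nearest) / len(nearest)
    best = min(scores, key=scores.get)
    return best, scores
```

With `k=1` this reduces to the plain nearest-template rule; a larger `k` requires several templates of the same word to agree before that word wins.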


Journal Article
TL;DR: In this article, a large data set consisting of over 36 hours of unconstrained extemporaneous speech, from 17 speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker recognition.
Abstract: A very large data base consisting of over 36 h of unconstrained extemporaneous speech, from 17 speakers, recorded over a period of more than three months, has been analyzed to determine the effectiveness of long-term average features for speaker recognition. Results are shown to be strongly dependent on the voiced speech averaging interval L_e. Monotonic increases in the probability of correct identification and monotonic decreases in the equal error probability for speaker verification were obtained as L_e increased, even with substantial time periods between successive sessions. For L_e corresponding to approximately 39 s of speech, text-independent results (no linguistic constraints embedded into the data base) of 98.05 percent for speaker identification and 4.25 percent for equal error speaker verification were obtained.

53 citations



Journal Article
TL;DR: Results obtained for four male speakers show how accounting for coarticulation effects gives substantially better performances than previous approaches.
Abstract: A system for the automatic recognition of bilabial /m/ and alveolar /n/ in vowel-consonant-vowel utterances extracted from continuous speech is presented. It is based on a syntactic pattern recognition approach and the use of fuzzy relations for evaluating phonemic hypotheses. The knowledge source, based on very simple transition networks with associated simple semantic rules, is inferred from experiments. Results obtained for four male speakers are presented together with an acoustic-phonetic motivation of the approach used. These show how accounting for coarticulation effects gives substantially better performances than previous approaches.

20 citations


Journal Article
TL;DR: Results from the present experiment suggest that subjects have the option to prevent the speaker’s-voice attribute from being stored with the contents of what is said when such processing would interfere with other cognitive operations.
Abstract: The voice-connotation hypothesis of Geiselman and Bellezza (1976, 1977) states that a speaker’s voice is sometimes remembered without intent because the connotation of the voice automatically influences the meaning of what is said. Results from the present experiment suggest that subjects have the option to prevent the speaker’s-voice attribute from being stored with the contents of what is said when such processing would interfere with other cognitive operations.

18 citations


Proceedings Article
01 Apr 1979
TL;DR: A speaker dependent system for recognizing carefully articulated continuous speech that accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task and achieves 75% sentence recognition.
Abstract: A speaker dependent system for recognizing carefully articulated continuous speech is described. The system accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task. The system is controlled by a finite state parser which generates word candidates and establishes their temporal locations in hypothetical sentences. The word candidates are evaluated by an LPC distance measure and a dynamic programming algorithm which nonlinearly time aligns isolated word reference templates with the input speech stream. The input is recognized as the hypothetical sentence having the lowest distance according to a well-defined criterion. In a preliminary test based on 100 sentences spoken over dialed-up telephone lines by two male talkers, 90% word accuracy, resulting in 75% sentence recognition, was achieved.

15 citations


Journal Article
TL;DR: Analysis and design of a two-stage pattern classifier for speaker identification in a population of 30 is considered: the first stage rejects most classes using a subset of the total feature set, and the second stage uses the remaining features for an absolute identification of the speaker's identity.
Abstract: Design of speaker identification schemes for a small number of speakers (around 10) with a high degree of accuracy in a controlled environment is a practical proposition today. When the number of speakers is large (say, above 20 or 30), many of these schemes cannot be directly utilized as both recognition error and computation time increase monotonically with population size. A multistage classification technique gives better results when the number of speakers is large. Such a scheme may be implemented as a decision tree classifier in which the final decision is made only after a predetermined number of stages. In the present paper, analysis and design of a two-stage pattern classifier is considered. At the first stage a large number of classes, to which the given pattern cannot belong, is rejected. This is to be done using a subset of the total feature set. Also, the accuracy of such a rejection process must be very high, consistent with the overall accuracy desired. This initial classification gives a subset of the total classes, which has to be carefully considered at the next stage utilizing the remaining features for an absolute identification of the class label (the speaker's identity). The procedure is illustrated by designing and testing a two-stage classifier for speaker identification in a population of 30.

13 citations
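
The two-stage idea in the abstract above (reject most classes on a feature subset, then decide among the survivors with the remaining features) can be sketched as follows. The nearest-mean distance, the fixed `keep` count, and every name here are assumptions made for illustration; the abstract does not specify the paper's rejection rule or how its accuracy target is enforced.

```python
def two_stage_classify(x, class_means, stage1_idx, stage2_idx, keep=3):
    """Two-stage nearest-mean classifier (illustrative sketch only).

    x: feature vector of the unknown pattern.
    class_means: dict mapping each class label to its mean feature vector.
    stage1_idx / stage2_idx: feature indices used at each stage (assumed split).
    keep: number of classes that survive the first stage (assumed rule).
    """

    def sqdist(u, v, idx):
        # Squared Euclidean distance restricted to the chosen features.
        return sum((u[i] - v[i]) ** 2 for i in idx)

    # Stage 1: rank all classes on the first feature subset, reject the rest.
    ranked = sorted(class_means, key=lambda c: sqdist(x, class_means[c], stage1_idx))
    survivors = ranked[:keep]

    # Stage 2: final decision among the survivors using the remaining features.
    label = min(survivors, key=lambda c: sqdist(x, class_means[c], stage2_idx))
    return label, survivors
```

In this sketch the second-stage features are compared only against the surviving classes, which is where a multistage scheme would save computation as the population grows.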



Proceedings Article
01 Apr 1979
TL;DR: An intensive experimental screening of more than 550 potential voice descriptors permitted the development of a sensitive, comprehensive voice-rating form which was used by an experienced listening crew to characterize voice samples from 80 male speakers.
Abstract: The development of practical methods of predicting speaker recognizability in communication systems has had to await the development of an adequate perceptual voice taxonomy. Previous efforts have been hampered by inadequate voice samples and inappropriate scaling techniques. An intensive experimental screening of more than 550 potential voice descriptors permitted the development of a sensitive, comprehensive voice-rating form which was used by an experienced listening crew to characterize voice samples from 80 male speakers. Factor analysis was used to identify the elementary perceptual parameters of individual differences in speech and to classify voices in a perceptual voice trait space. Implications for the development of tests of speaker recognizability are discussed.

11 citations


Journal Article
TL;DR: Two categorizations are presented of aspects of the speaker recognition field; the first examines the memory systems involved in experimental tasks and is based on a critical account of the taxonomy proposed by Bricker & Pruzansky (1976).
Abstract: Two categorizations are presented of aspects of the speaker recognition field. The first examines the memory systems involved in experimental tasks and is based on a critical account of the taxonomy proposed by Bricker & Pruzansky (1976). The second deals with the decisions which listeners are required to make in the experimental situation. Finally, the differences between the experimental situation and the real world are examined.



Proceedings Article
01 Apr 1979
TL;DR: Restricted to vowel spectra, adaptation methods based on spectral amplitude weighting and on spectral shifting are investigated, and a special method makes it possible to adapt test spectra class-specifically.
Abstract: An automatic speech recognition system based on the reference set of a single speaker can be extended for use by several speakers by applying appropriate preprocessing transformations. These transformations adapt the incoming patterns of a new speaker to the patterns of the reference set. Under the present restriction to vowel spectra, adaptation methods by spectral amplitude weighting and by spectral shifting are investigated. A special method made it possible to adapt test spectra class-specifically.

Proceedings Article
01 Apr 1979
TL;DR: It is proposed to characterize the speech short-term spectrum with a reduced number of parameters (4 to 7) computed from a rough spectral analysis, which permit a correct classification of the steady-state French speech sounds pronounced by different speakers.
Abstract: Tracking and identifying the formants in order to perform speech recognition is a time-consuming, error-prone and speaker-dependent operation. It is proposed to characterize the speech short-term spectrum with a reduced number of parameters (4 to 7) computed from a rough spectral analysis. These parameters permit a correct classification of the steady-state French speech sounds (vowels, including nasals, and unvoiced fricatives) pronounced by different speakers. A word recognition experiment based on the same parameters gives good results with words differing from each other by one phoneme only (single speaker, one learning pass).

Proceedings Article
G. Mian
01 Apr 1979
TL;DR: The aim of the present work was to evaluate the performance of an automatic speaker recognition system, based on LPC, on the same speech material recorded in three different conditions: in a quiet room, from dialed-up telephone lines via direct hookup, and via a suction cup tap.
Abstract: Linear prediction parameters are critically dependent upon the short-term spectrum of speech and are therefore sensitive to the noise and distortions introduced by transmission and recording systems. The aim of the present work was to evaluate the performance of an automatic speaker recognition system, based on LPC, on the same speech material recorded in three different conditions: in a quiet room, from dialed-up telephone lines via direct hookup, and via a suction cup tap. Each of ten speakers spoke an 8 s long sentence four times over a two-month period. Sentences were manually segmented and performance evaluation was conducted on phonemes, on breath groups and on the whole sentence using a minimum weighted distance classifier.

Proceedings Article
01 Apr 1979
TL;DR: This procedure is formulated as a stochastic optimal control problem and is illustrated by designing a speaker recognition system for 60 speakers with an overall accuracy of 97.2%.
Abstract: Speaker recognition schemes which work satisfactorily for small populations often fail when the number of classes is very large. One way of solving such problems is to use multistage classification schemes. The basic technique is to successively reduce the number of classes in several stages, using one feature at each stage; when the number of classes is less than a predetermined value, the final decision is made. The whole scheme is designed so that the probability of error is fixed at an acceptable level. The computational cost of such a multistage scheme depends on the features used at each stage and the cost of measurement of each feature. The features to be used at each stage are determined so as to reduce the average computational cost for making a decision. This procedure is formulated as a stochastic optimal control problem and is illustrated by designing a speaker recognition system for 60 speakers. The overall accuracy of the system is 97.2%.

01 Jan 1979
TL;DR: A hierarchical clustering algorithm was used, followed by an iterative optimization procedure, to develop a robust speaker-independent, connected digit-sequence recognition capability as the front-end for a speaker verification (voice authentication) program and to install and demonstrate that capability on the Base and Installation Security System Advanced Development Model for speaker verification located at RADC.
Abstract: The objective of this research has been to develop a robust speaker-independent, connected digit-sequence recognition capability as the front-end for a speaker verification (voice authentication) program and to install and demonstrate that capability on the Base and Installation Security System Advanced Development Model for speaker verification located at RADC. In such a system, the correct digit sequence recognition provides the user identification of the claimed identity. Verification is then performed on the same speech data. This total-voice system must recognize connected digits independent of speaker with high reliability. Two sequence constraints aid recognition: two parity checks must be satisfied, and difficult digit pairs were disallowed. A further sequence constraint added to aid verification was that all digits must be different. The selected constraints yield 320 possible sequences. The speech processing strategy features highly reliable time registration and accommodates multiple concurrent hypotheses at various processing levels. Basic to robust speaker-independent recognition is the existence of a set of reference patterns capable of allowing for the speaker's sex and dialect. Rather than arbitrary segmentation of the design data to produce reference patterns, a hierarchical clustering algorithm was used, followed by an iterative optimization procedure.