
Showing papers on "Speaker recognition" published in 1971


Patent
20 Apr 1971
TL;DR: A nonlinear process aligns the sample and reference utterances through a piece-wise linear continuous transformation of the time scale; the extent of time transformation required to achieve maximum similarity also influences the decision to accept or reject the identity claim.
Abstract: Speaker verification, as opposed to speaker identification, is carried out by matching a sample of a person's speech with a reference version of the same text derived from prerecorded samples of the same speaker. Acceptance or rejection of the person as the claimed individual is based on the concordance of a number of acoustic parameters, for example, formant frequencies, pitch period and speech energy. The degree of match is assessed by time aligning the sample and reference utterance. Time alignment is achieved by a nonlinear process which maximizes the similarity between the sample and reference through a piece-wise linear continuous transformation of the time scale. The extent of time transformation that is required to achieve maximum similarity also influences the decision to accept or reject the identity claim.

58 citations
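
The time-alignment idea in the patent abstract above can be illustrated with a minimal sketch. It is not the patented procedure: it assumes a single scalar parameter track per utterance, a warp with only two linear segments, squared error as the (inverse) similarity measure, and a plain grid search. The names `warp_time`, `resample`, and `align` are hypothetical.

```python
def warp_time(t, breakpoint, image):
    """Map t in [0, 1] through a continuous two-segment linear warp.

    The warp sends 0 -> 0, breakpoint -> image, and 1 -> 1.
    """
    if t <= breakpoint:
        return t * image / breakpoint
    return image + (t - breakpoint) * (1.0 - image) / (1.0 - breakpoint)


def resample(signal, t):
    """Linearly interpolate `signal` at normalized time t in [0, 1]."""
    pos = t * (len(signal) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(signal) - 1)
    frac = pos - lo
    return signal[lo] * (1.0 - frac) + signal[hi] * frac


def align(sample, reference, steps=9):
    """Grid-search the warp that minimizes squared error to the reference.

    Returns (best_error, breakpoint, image). The gap |image - breakpoint|
    is a crude measure of how much time warping was needed.
    """
    n = len(reference)
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # interior points only
    best = None
    for b in grid:
        for m in grid:
            err = 0.0
            for k in range(n):
                t = k / (n - 1)
                err += (resample(sample, warp_time(t, b, m)) - reference[k]) ** 2
            if best is None or err < best[0]:
                best = (err, b, m)
    return best
```

Under these assumptions, a decision rule could weigh both the residual error and the warp extent `abs(image - breakpoint)`, mirroring the abstract's remark that the amount of time transformation also influences acceptance.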



Journal ArticleDOI
S. Das1, W. Mohn2
TL;DR: Experiments investigating adaptive pattern recognition in automatic speaker verification are reported, indicating that the utterances used for training purposes should preferably be collected over a relatively long period of time.
Abstract: Experiments investigating adaptive pattern recognition in automatic speaker verification are reported. A binary decision confirming or rejecting a speaker's purported identity is required. The experiments involve 7000 phrase length utterances of 118 speakers. An average misclassification rate of one percent with a "no decision" rate of ten percent is obtained. Other experiments indicate that the utterances used for training purposes should preferably be collected over a relatively long period of time.

41 citations
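
The abstract above reports a misclassification rate alongside a "no decision" rate but does not say how the three outcomes are produced. A minimal sketch, assuming a scalar similarity score and two hypothetical thresholds (`reject_below`, `accept_above`), and assuming both rates are measured over all trials:

```python
def verify(score, reject_below=0.4, accept_above=0.6):
    """Return 'accept', 'reject', or 'no decision' for a similarity score."""
    if score >= accept_above:
        return "accept"
    if score < reject_below:
        return "reject"
    return "no decision"


def rates(trials):
    """Compute (misclassification_rate, no_decision_rate).

    `trials` is a list of (score, is_true_speaker) pairs; both rates are
    fractions of all trials, which is an assumption of this sketch.
    """
    wrong = undecided = 0
    for score, genuine in trials:
        decision = verify(score)
        if decision == "no decision":
            undecided += 1
        elif (decision == "accept") != genuine:
            wrong += 1
    return wrong / len(trials), undecided / len(trials)
```

Widening the gap between the two thresholds trades misclassifications for more "no decision" outcomes, which is the trade-off the reported figures reflect.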


Journal ArticleDOI
G. L. Clapper1
TL;DR: Simple circuitry is described that learns a word with a single utterance and recognizes it thereafter and is potentially economical for the spoken equivalent of key entry of data and data inquiry, and a limited vocabulary of commands to the equipment.
Abstract: Separately spoken individual words can be automatically recognized using a two-dimensional pattern of spectral density versus a nonlinear time base. The pattern for a given word differs from person to person and must be adaptively learned by the machine for each speaker. Simple circuitry is described that learns a word with a single utterance and recognizes it thereafter. The scheme is potentially economical for the spoken equivalent of key entry of data and data inquiry, and a limited vocabulary of commands to the equipment.

21 citations


Book ChapterDOI
Klaus W. Otten1
TL;DR: This chapter focuses on the analysis of the problems that have to be solved to achieve machine recognition of conversational speech, and discusses some possible avenues along which the design of conversational speech-recognition systems is likely to proceed.
Abstract: This chapter focuses on the analysis of the problems that have to be solved to achieve machine recognition of conversational speech, and discusses some possible avenues along which the design of conversational speech-recognition systems is likely to proceed. Machine recognition of speech is the translation of the continuously varying acoustical speech signal into a sequence of discrete symbols representing linguistically defined units. The chapter discusses the reasons for interest in machine recognition of conversational speech, fundamentals of speech recognition, speech recognition as a mapping operation, speech recognition as the complement to speech production, and operational requirements for speech-recognition systems. The approaches to the recognition of conversational speech show wide ranges in three major areas: segmentation and classification, consideration of speaker variations, and information processing. Three approaches that appear to have chances for successful implementation and that have potential for conversational speech recognition are described: continuous dynamic pattern match, phonological parametric decoding, and phonological digital decoding. Speech recognition efforts in the near future will play the role of providing instrumentation for research in phonetics, phonology, and linguistics rather than that of developing man–machine interfaces. The development of operational automatic-recognition systems for truly conversational speech appears to be a project for the far future.

11 citations


Journal ArticleDOI
TL;DR: A new pattern recognition technique is proposed that avoids the exhaustive comparison process associated with pattern matching; preliminary results show that performance very similar to that of exhaustive comparison is attainable with a significant saving in computational effort.
Abstract: A description is given of an unusual pattern recognition technique which has been used in an experimental speech recognition system. Preliminary results obtained using this technique are reported. The speech analyzer produces a multichannel ternary signal at its output, which is the short term digital autocorrelation function of the input signal. This output is sampled at regular intervals and this sampled information is transferred to a computer. A new pattern recognition technique is proposed that avoids the exhaustive comparison process associated with pattern matching. The technique is similar to a tree-structured process in that decisions are taken that exclude certain master patterns from further processing as it becomes apparent that these are sufficiently dissimilar to the unknown pattern. However, retracing within the structure and the substitution of an alternative path are permitted if the current path appears unlikely to lead to a correct decision. Some preliminary results obtained using this technique are described. These show that a performance very similar to that obtained from the exhaustive comparison process is attainable with a significant saving in computational effort. The effect of varying certain parameters within the recognition process is also considered and some preliminary optimization of parameter values is reported.

5 citations
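
The abstract above describes excluding dissimilar master patterns early while still allowing a switch to an alternative path. The authors' actual procedure is not given here, so the following sketch swaps in a standard stand-in: best-first search over partial distances. It assumes every pattern has the same number of frames, each frame is a list of ternary values (-1, 0, 1), and distance is a count of mismatched values. The name `nearest_master` is hypothetical.

```python
import heapq


def nearest_master(unknown, masters):
    """Return (index, distance, frames_compared) of the closest master pattern.

    Always extends the master with the smallest partial distance, so
    poorly matching masters are left behind without being fully compared.
    """
    # Heap entries: (partial distance, master index, frames compared so far).
    heap = [(0, i, 0) for i in range(len(masters))]
    heapq.heapify(heap)
    frames_compared = 0
    while heap:
        dist, i, pos = heapq.heappop(heap)
        if pos == len(unknown):
            # Partial distances never decrease, so the first master to be
            # fully compared is the nearest one overall.
            return i, dist, frames_compared
        frame_dist = sum(a != b for a, b in zip(unknown[pos], masters[i][pos]))
        frames_compared += 1
        heapq.heappush(heap, (dist + frame_dist, i, pos + 1))
    raise ValueError("no master patterns supplied")
```

Because the search can return to a previously set-aside master whenever the current one starts to accumulate mismatches, it gives the same answer as exhaustive comparison while usually comparing fewer frames.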


Journal ArticleDOI
TL;DR: The preliminary results obtained indicate that most of the necessary adaptation can be achieved in a relatively short time, provided that the speakers are instructed in how to change their articulations to produce the desired effects.
Abstract: This study examines the feasibility and limitations of speaker adaptation in improving the performance of a fixed (speaker-independent) automatic speech recognition system. A fixed vocabulary of 55 [ƏCVd] syllables is used in the recognition system, where C is 1 of 11 stops and fricatives, and V is 1 of 5 tense vowels. The results of the experiment on speaker adaptation, performed with 6 male and 6 female adult speakers, show that speakers can learn to change their articulations to appreciably improve recognition scores. The preliminary results obtained also indicate that most of the necessary adaptation can be achieved in a relatively short time, provided that the speakers are instructed in how to change their articulations to produce the desired effects.

4 citations


Proceedings ArticleDOI
01 Dec 1971

2 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used a total of about 7000 utterances of the phrase "Check available terminals" from 118 speakers and reported an average misclassification rate of 1% with a no decision rate of 10%.
Abstract: Speaker verification is defined as confirming a speaker's purported identity. The method of confirmation is to compare a speech sample of the person against his speech profile, which is based on previous utterances of the same phrase. Experiments have demonstrated the feasibility of an automated scheme of speaker verification under certain restrictions. Important among these restrictions are the use of only male speakers, wide‐band (180–8000 Hz) audio channel, and noise‐free environment. The experiments used a total of about 7000 utterances of the phrase “Check available terminals” from 118 speakers. An average misclassification rate of 1% with a “no decision” rate of 10% was obtained. Further work is directed toward removing some of the restrictions, such as reducing the bandwidth and including female utterances. The strategy for improving the system performance is to reduce the “no decision” rate without increasing the misclassification rate.

2 citations



Book ChapterDOI
01 Jan 1971
TL;DR: The transmission system for speech consists of source, transmission path and receiver, and the human organ for speech production always constitutes the source, even if a recording system is used.
Abstract: The transmission system for speech consists of source, transmission path and receiver. The human organ for speech production always constitutes the source, even if a recording system is used. The transmission path for direct communication is the distance through the air, to which an electrical line with connected networks may be added for telecommunication. The receiver of speech information is — so far — the human ear.

Journal ArticleDOI
TL;DR: In this paper, the authors describe progress in the implementation of an automated speaker-verification system and a technique for codifying information contained within a voiced signal, which is not dependent upon speaker amplitude variations and allows verification of speakers to be accomplished in real time.
Abstract: This paper describes progress in the implementation of an automated speaker‐verification system and a technique for codifying information contained within a voiced signal. The technique described is not dependent upon speaker amplitude variations and allows verification of speakers to be accomplished in real time. Subjective experiments of simulated automated systems utilizing spectrographic analysis and an adjustable filter with a decibel readout establish the feasibility of such a codification process. The manual codification process resulted in a speaker‐verification success score in excess of 60% when a single‐word utterance was used. By further processing called “serial elimination,” several utterances are analyzed serially and result in scores of the order of 95% and greater. A completely automated version of the above‐described simulated system employs a bank of narrow‐band filters to separate the voiced signals into component spectral parameters. A feature extraction strategy called a “sliding window” is described, which was found to reduce effects of multiple‐utterance “jitter.” In order to reduce rejections of the desired speaker, his stored code is derived from the range of variance his utterances exhibit when collected from a succession of time‐spaced utterances. With a serial elimination process, speaker‐verification scores are comparable to those derived in the simulated system.
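
The abstract above says the stored code is derived from the range of variance across time-spaced utterances. One possible reading, sketched here under stated assumptions: each utterance is reduced to a list of per-channel feature values (for example filter-bank outputs), and the stored code is simply the per-channel minimum and maximum seen during enrollment. The names `enroll` and `matches` and the `margin` parameter are hypothetical and not taken from the paper.

```python
def enroll(utterances):
    """Build a stored code: per-channel (min, max) over enrollment utterances."""
    channels = zip(*utterances)  # regroup values channel by channel
    return [(min(values), max(values)) for values in channels]


def matches(code, features, margin=0.0):
    """True if every channel value lies within its stored range (plus margin)."""
    return all(lo - margin <= x <= hi + margin
               for (lo, hi), x in zip(code, features))
```

Enrolling from utterances spread over time widens each range to cover the speaker's natural variation, which is how such a code could reduce false rejections of the true speaker.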