
Showing papers on "Speaker recognition" published in 1980


Book
01 Jan 1980

228 citations



Journal Article
TL;DR: The authors reviewed what is currently known about voice identification by human listeners and concluded that the caution and suspicion currently accorded to visual identification must be extended also, and perhaps more so, to voice identification.
Abstract: This paper reviews what is currently known about voice identification by human listeners. Our own experimental data from a four-year research program into this topic is used to elucidate, support, and in some cases to contradict published work into the effects on voice identification of such factors as speech sample size and quality, voice disguise, delay in holding voice identification sessions, incidental as opposed to intentional memory for voices, the effects of the age of the witness, training in specific modes of encoding voices, and the relationship between objective accuracy and subjective feelings of certainty of correctness. It is concluded that the caution and suspicion currently accorded to visual identification must be extended also, and perhaps more so, to voice identification.

81 citations


Proceedings Article
01 Apr 1980
TL;DR: It is shown that, in several simple performance evaluations, the local minimum method performed considerably better than the fixed range method.
Abstract: Several variations on algorithms for dynamic time warping have been proposed for speech processing applications. In this paper two general algorithms that have been proposed for word spotting and connected word recognition are studied. These algorithms are called the fixed range method and the local minimum method. The characteristics and properties of these algorithms are discussed. It is shown that, in several simple performance evaluations, the local minimum method performed considerably better than the fixed range method. Explanations of this behavior are given and an optimized method of applying the local minimum algorithm to word spotting and connected word recognition is described.

44 citations
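Both methods compared above are variants of dynamic time warping. For orientation, here is a minimal Python sketch of the basic whole-sequence DTW recurrence they build on, assuming one scalar feature per frame and absolute difference as the local distance. It is neither the fixed range nor the local minimum method, and `dtw_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Accumulated cost of the best monotonic alignment of a and b."""
    n, m = len(a), len(b)
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            # diagonal step = one-to-one match; vertical/horizontal = stretch one side
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated frame is absorbed
```

Word spotting and connected word recognition relax the fixed start and end points of this recurrence; how each method does so is what the paper compares.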


Journal Article
TL;DR: Back vowels as targets have been found to give improved classification of the preceding consonants, and the machine recognition results have been compared with published results of perception tests.
Abstract: In this paper the results of a study of the computer recognition of unaspirated plosives in commonly used polysyllabic words uttered by three different informants are presented. The onglide transitions of the first two formants and their durations have been found to be an effective set of features for the recognition of unaspirated plosives. The rates of transition of these two formants, used as a feature set, have been found to be significantly inferior to these features. The maximum likelihood method, under the assumption of a normal distribution for the feature set, provides an adequate tool for classification. The assumption of both intergroup and intragroup independence of the features reduces recognition scores. Prior knowledge of the target vowels is found necessary for attaining reasonable efficiency, and prior knowledge of the voicing manner improves classification efficiency to some extent. The physiological factors responsible for the variation of the recognition score across the various plosives are discussed. For labials and velars the recognition score is very high, nearly 90 percent. An attempt has been made to correlate the dynamics of tongue-body motion with the variations in recognition scores. Back vowels as targets have been found to give improved classification of the preceding consonants. The results of machine recognition are compared with published results of perception tests and are found to be of the same order.

29 citations
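The classifier described above can be sketched as a per-class Gaussian maximum likelihood rule. This Python sketch assumes numpy, equal class priors, and feature vectors (such as formant onglide transitions and durations) that have already been extracted. The `diagonal` flag is a hypothetical way to mimic the within-class independence assumption that the abstract reports lowers recognition scores. Class and parameter names are illustrative, not the authors'.

```python
import numpy as np


class GaussianMLClassifier:
    """Assign each vector to the class whose fitted normal density is highest."""

    def __init__(self, diagonal: bool = False):
        # diagonal=True treats features as independent within each class
        self.diagonal = diagonal

    def fit(self, X: np.ndarray, y: np.ndarray) -> "GaussianMLClassifier":
        self.classes_ = np.unique(y)
        self.means_, self.covs_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False)
            if self.diagonal:
                cov = np.diag(np.diag(cov))  # drop feature correlations
            self.means_.append(Xc.mean(axis=0))
            self.covs_.append(cov)
        return self

    def log_likelihoods(self, X: np.ndarray) -> np.ndarray:
        dim = X.shape[1]
        out = np.empty((X.shape[0], len(self.classes_)))
        for idx, (mu, cov) in enumerate(zip(self.means_, self.covs_)):
            diff = X - mu
            # squared Mahalanobis distance of every row to the class mean
            maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
            _, logdet = np.linalg.slogdet(cov)
            out[:, idx] = -0.5 * (maha + logdet + dim * np.log(2 * np.pi))
        return out

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]
```

With equal priors the arg-max of the log-likelihood is the maximum likelihood decision; unequal priors could be added as log terms before the arg-max.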


Proceedings Article
01 Apr 1980
TL;DR: The results of this study indicate that LPC-derived parameters perform better than those derived from cepstral and spectral data.
Abstract: Four automatic speaker recognition techniques were investigated with a common speech database to determine their effectiveness in a text-independent mode. These four techniques used the correlation of short- and long-term spectral averages, cepstral measurements of long-term spectral averages, orthogonal linear prediction of the speech waveform, and long-term average LPC reflection coefficients combined with pitch and overall power. The results of this study indicate that LPC-derived parameters perform better than those derived from cepstral and spectral data. Recognition accuracies of 95% and 93% were obtained for the LPC-based techniques with 13 seconds of unknown speech. The corresponding recognition accuracies for the cepstral- and spectral-based systems were 79% and 54%, respectively.

22 citations
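The fourth technique, long-term average LPC reflection coefficients, lends itself to a short sketch: compute reflection coefficients per frame with the Levinson-Durbin recursion, average them over the utterance, and pick the nearest speaker reference. Assumptions not taken from the paper: numpy, frames that are already windowed, Euclidean distance, and pitch and overall power omitted. All function names are hypothetical.

```python
import numpy as np


def autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    n = len(frame)
    return np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])


def reflection_coefficients(r: np.ndarray, order: int) -> np.ndarray:
    """Levinson-Durbin recursion; predictor convention x[n] ~ sum_j a[j] x[n-j]."""
    a = np.zeros(order + 1)  # a[1..order] are predictor coefficients; a[0] unused
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k_i = acc / err
        k[i - 1] = k_i
        prev = a.copy()
        a[i] = k_i
        a[1:i] = prev[1:i] - k_i * prev[i - 1:0:-1]
        err *= 1.0 - k_i * k_i  # prediction error shrinks at each order
    return k


def long_term_average(frames, order=12):
    ks = [reflection_coefficients(autocorrelation(f, order), order)
          for f in frames if np.dot(f, f) > 0.0]  # skip silent frames
    return np.mean(ks, axis=0)


def identify(unknown_frames, references, order=12):
    """Return the speaker whose reference vector is closest to the unknown average."""
    probe = long_term_average(unknown_frames, order)
    return min(references, key=lambda name: np.linalg.norm(probe - references[name]))
```

Because the averaging discards timing, the comparison does not depend on what was said, which is what makes the approach usable in a text-independent mode. Pitch and overall power could be appended to the averaged vector before comparison.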


Journal Article
TL;DR: A microprocessor-based speech recognition system for voice control of a wheelchair, a touch-tone phone, a typewriter, and an environmental control unit; with the set of ten digits it exhibits less than one percent substitutions and eleven percent rejections.

21 citations
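The entry reports results as substitution and rejection rates but does not say how rejections are made. One common scheme, shown here as a hypothetical Python sketch, rejects an utterance when the best template distance is too large or too close to the runner-up. The thresholds `max_dist` and `min_margin` are assumptions, not values from the paper.

```python
from typing import Dict, Optional, Sequence, Tuple


def decide(distances: Dict[str, float], max_dist: float, min_margin: float) -> Optional[str]:
    """Return the best-matching word, or None to reject the utterance."""
    ranked = sorted(distances.items(), key=lambda kv: kv[1])
    best, d1 = ranked[0]
    d2 = ranked[1][1] if len(ranked) > 1 else float("inf")
    if d1 > max_dist or d2 - d1 < min_margin:
        return None  # too far from every template, or ambiguous
    return best


def error_rates(decisions: Sequence[Optional[str]], truths: Sequence[str]) -> Tuple[float, float]:
    """Substitution rate (wrong word accepted) and rejection rate."""
    n = len(truths)
    rejections = sum(d is None for d in decisions)
    substitutions = sum(d is not None and d != t for d, t in zip(decisions, truths))
    return substitutions / n, rejections / n
```

Tightening either threshold generally trades substitutions for rejections, which matters for control applications where a wrong action is worse than asking the user to repeat.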


Book Chapter
01 Jan 1980
TL;DR: An approach to speech recognition which tries to avoid the problems of using a phoneme level of description and treats larger units such as words as patterns with a time axis is described.
Abstract: This is an overview of techniques which have been developed for automatic pattern recognition, with an indication of their relevance to automatic speech recognition. The first part is concerned with data transformations, distance measures, cluster analysis and other aspects of what could be called ‘classic’ mathematical pattern recognition. The second part is more directly concerned with speech, and the term ‘pattern recognition’ is used to denote an approach to speech recognition which tries to avoid the problems of using a phoneme level of description and treats larger units such as words as patterns with a time axis.

12 citations
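Cluster analysis, one of the 'classic' topics the overview covers, is often illustrated with k-means. A minimal numpy sketch follows, assuming Euclidean distance and caller-chosen initial centroids; the chapter may treat other clustering methods, and `kmeans` is a hypothetical name.

```python
import numpy as np


def kmeans(X, init_idx, n_iter=100):
    """Plain k-means: returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = X[list(init_idx)].copy()  # initial centroids chosen by the caller
    k = len(centroids)
    for _ in range(n_iter):
        # distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


labels, centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], init_idx=[0, 2])
```

In speech work the same procedure is commonly used to reduce many training tokens of a word to a few representative templates.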


Patent
16 Jun 1980

9 citations


Proceedings Article
01 Apr 1980
TL;DR: In this machine, a new method for connected word recognition, namely inverse dynamic programming (DP) matching, is adopted, and a recognition rate of 99.3% is obtained.
Abstract: Construction and performance of a machine for recognizing spoken connected words are described. In this machine, a new method for connected word recognition, namely inverse dynamic programming (DP) matching, is adopted. Inverse DP matching uses two kinds of DP matching: the usual DP matching, and matching performed in a time-reversed mode, starting from the end of the speech. The similarities between the input speech and word sequences are computed by combining the similarities obtained from these two kinds of matching. A candidate-rejection technique is also used in the machine to reduce the amount of computation. The machine's performance is tested on 1400 samples of connected digits, and a recognition rate of 99.3% is obtained.

5 citations
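The abstract does not give the combination rule. As an illustration of how forward and time-reversed DP matching can be combined, here is a Python sketch for a two-word input under stated assumptions: scalar frames, absolute-difference local distance, and a word boundary chosen to minimize the forward cost of the first word plus the time-reversed cost of the second. This is not the machine's inverse DP algorithm, and all function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple


def dp_matrix(template: Sequence[float], signal: Sequence[float]) -> List[List[float]]:
    n, m = len(template), len(signal)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - signal[j - 1])
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D


def forward_costs(template, signal) -> List[float]:
    # entry t: cost of matching the whole template to signal[:t]
    return dp_matrix(template, signal)[len(template)]


def reverse_costs(template, signal) -> List[float]:
    # matching in time-reversed mode, starting from the end of the speech;
    # entry t: cost of matching the whole template to signal[t:]
    rev = dp_matrix(template[::-1], signal[::-1])[len(template)]
    T = len(signal)
    return [rev[T - t] for t in range(T + 1)]


def best_two_word(templates: Dict[str, Sequence[float]],
                  signal: Sequence[float]) -> Tuple[float, str, str]:
    T = len(signal)
    fwd = {w: forward_costs(t, signal) for w, t in templates.items()}
    rev = {w: reverse_costs(t, signal) for w, t in templates.items()}
    best = (math.inf, "", "")
    for w1, w2 in product(templates, repeat=2):
        for t in range(1, T):  # boundary: w1 covers signal[:t], w2 covers signal[t:]
            cost = fwd[w1][t] + rev[w2][t]
            if cost < best[0]:
                best = (cost, w1, w2)
    return best


print(best_two_word({"one": [1.0, 1.0], "five": [5.0, 5.0]}, [1.0, 1.0, 5.0, 5.0]))
# (0.0, 'one', 'five')
```

Each template is matched once forward and once in reverse, so scoring every word pair at every candidate boundary costs only additions.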


Book Chapter
Frederick Jelinek
01 Feb 1980
TL;DR: Current efforts to recognize continuous (or “connected”) speech are aimed at constructing a voice-excited “typewriter” that automatically transcribes natural speech into ordinary (e.g. English) written form.
Abstract: Current efforts to recognize continuous (or “connected”) speech are aimed at constructing a voice-excited “typewriter” that automatically transcribes natural speech into ordinary (e.g. English) written form. So far, however, only very restricted speech has been recognized. The sentences that are spoken must either be prescribed a priori by an artificial grammar which the experimenter has designed, or else limited by a vocabulary and a restricted area of discourse such as that used in business letters, book reviews, or airline reservation systems. These latter so-called natural tasks are generally much more difficult than the artificial ones (given a fixed vocabulary).

Proceedings Article
01 Apr 1980
TL;DR: Experiments show that parameters derived from casual speech improve vowel recognition markedly, and that method e) appears strongest.
Abstract: Frequency normalization of talkers remains a problem in word recognition, especially where new talkers cannot be asked to provide samples (of their vowels, for example) in advance. Several methods were investigated; for each, parameters were derived by calculating their effect on formant histograms derived from casual speech. Methods tried were a) uniform multiplication of frequencies ("stretching" the vocal tract); b) "stretching" each formant region by a different amount; c) combined shift and stretch (affine mapping); d) different affine mappings for different formants (this includes warping each formant as a function of its range); e) warping each formant non-linearly as a function of its distribution. Experiments show that parameters derived from casual speech improve vowel recognition markedly, and that method e) appears strongest.
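Three of the listed mappings, a) uniform scaling, c) affine shift-and-stretch, and e) nonlinear warping by distribution, can be sketched for a single formant in numpy. The fitting criteria here are assumptions, since the abstract does not state the paper's exact criterion: a) and c) match the mean (and, for c, the spread) of the talker's casual-speech formant values to a reference, and e) uses quantile matching of the two histograms. Fitting one mapping per formant region gives the per-formant variants b) and d). Function names are hypothetical.

```python
import numpy as np


def uniform_scale(talker, reference):
    """a) multiply all frequencies by one factor ("stretching" the vocal tract)."""
    s = np.mean(reference) / np.mean(talker)
    return lambda f: s * np.asarray(f, dtype=float)


def affine_map(talker, reference):
    """c) combined shift and stretch, fitted by matching mean and spread."""
    a = np.std(reference) / np.std(talker)
    b = np.mean(reference) - a * np.mean(talker)
    return lambda f: a * np.asarray(f, dtype=float) + b


def distribution_warp(talker, reference, n_quantiles=101):
    """e) non-linear warp: map talker quantiles onto reference quantiles."""
    q = np.linspace(0.0, 100.0, n_quantiles)
    talker_q = np.percentile(talker, q)
    reference_q = np.percentile(reference, q)
    return lambda f: np.interp(f, talker_q, reference_q)
```

For a talker whose formant values all sit 20% above the reference, method a) alone recovers the reference values; e) can additionally absorb differences in the shape of the distribution, which would be consistent with it performing best.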

Journal Article
TL;DR: The DP-100 design overcomes two serious handicaps which cause inaccuracies in automatic speech recognition systems, namely the variation in the rate at which words are spoken and the general problem of continuous speech recognition.
Abstract: Considers the Nippon Electric Co.'s DP-100 automatic continuous speech recognition system, which has an identification capability of approximately 100 words and is aimed at applications such as routing and inventory control in warehouses. The DP-100 design overcomes two serious handicaps which cause inaccuracies in automatic speech recognition systems, namely the variation in the rate at which words are spoken and the general problem of continuous speech recognition. The author gives details of the design and how these problems are overcome.

Journal Article
TL;DR: Discusses the Harpy experimental system using a low-cost minicomputer which is capable of automatic speech recognition with up to 98% accuracy when the vocabulary is restricted to 1011 words, and sentence structure limited to that used in the retrieval of abstracts of documents relating to computer technology.
Abstract: Discusses the Harpy experimental system using a low-cost minicomputer which is capable of automatic speech recognition with up to 98% accuracy when the vocabulary is restricted to 1011 words, and sentence structure limited to that used in the retrieval of abstracts of documents relating to computer technology. The recognition process of the system is described.


Proceedings Article
01 Apr 1980
TL;DR: A speech analyzer plus an LPC synthesizer realized on a single-chip signal processing microprocessor that runs both algorithms in real time, forming an interactive voice analyzer/response system that operates under the control of a microprocessor with the LPC speech data stored in a ROM.
Abstract: The realization of a speech analyzer plus an LPC synthesizer in a single-chip signal processing microprocessor is described. The chip is able to process both algorithms in real time to create an interactive voice analyzer/response system operating under the control of a microprocessor and with the LPC speech data stored in a ROM. The chip is a 16-bit microprocessor with an architecture designed specifically for signal processing. All of its instructions execute in a single cycle with a 300 ns cycle time, and it has a 12 × 12-bit parallel multiplier pipelined to operate in a single cycle. It can be programmed to perform a wide variety of signal processing functions, including speech processing.
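The synthesis half of such a system is an all-pole filter driven by an excitation signal. Below is a floating-point Python sketch of direct-form LPC synthesis, assuming predictor coefficients `a` (for example, decoded from ROM frames) and a pulse-train excitation. The chip's own 16-bit fixed-point structure, which may use a lattice rather than direct form, is not described in the abstract, and the coefficient and pitch values are hypothetical.

```python
from typing import List, Sequence


def lpc_synthesize(a: Sequence[float], excitation: Sequence[float], gain: float = 1.0) -> List[float]:
    """All-pole synthesis: y[n] = gain * e[n] + sum_{j=1..p} a[j-1] * y[n-j]."""
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for j, coeff in enumerate(a, start=1):
            if n - j >= 0:
                acc += coeff * y[n - j]  # feedback from earlier output samples
        y.append(acc)
    return y


# Voiced excitation: a unit pulse every 80 samples (an assumed 100 Hz pitch at 8 kHz).
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
speech_like = lpc_synthesize([1.3, -0.8], pulses, gain=0.5)  # one stable resonance
```

Each output sample costs p multiply-accumulates, which is why a single-cycle hardware multiplier makes real-time synthesis feasible on one chip.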



Journal Article
TL;DR: An access control system that uses speech samples to verify a claimed identity automatically, comparing spectral features against a stored reference with a pattern recognition algorithm implemented on a minicomputer; the per-speaker reference is small enough to be carried by the user on a magnetic identity card.
Abstract: An access control system is described that uses speech samples for automatic verification of a claimed identity. A hardware speech analysis processor extracts spectral information from the utterance, and a suitable pattern recognition algorithm implemented on a minicomputer compares these features with the reference stored under the claimed identity. The per-speaker reference can be reduced in storage size so that a magnetic identity card carried by the user serves as its memory. Thus the number of users is not limited by the system, and no large memory is required in the minicomputer. The results of initial experiments are reported.
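The storage idea above, a compact per-speaker reference that fits on a magnetic card, can be sketched in numpy by quantizing an averaged spectral feature vector to one byte per coefficient and verifying by a distance threshold. The feature range [-1, 1], 8-bit quantization, RMS distance, and threshold are all hypothetical choices, not the system's documented design.

```python
import numpy as np

LO, HI = -1.0, 1.0  # hypothetical feature range covered by the card encoding


def encode_reference(features: np.ndarray) -> bytes:
    """Quantize an averaged feature vector to one byte per coefficient."""
    clipped = np.clip(features, LO, HI)
    return np.round((clipped - LO) / (HI - LO) * 255).astype(np.uint8).tobytes()


def decode_reference(blob: bytes) -> np.ndarray:
    codes = np.frombuffer(blob, dtype=np.uint8).astype(float)
    return LO + codes / 255 * (HI - LO)


def verify(features: np.ndarray, card_blob: bytes, threshold: float) -> bool:
    """Accept the claimed identity if the RMS distance to the card reference is small."""
    reference = decode_reference(card_blob)
    rms = np.sqrt(np.mean((np.asarray(features, dtype=float) - reference) ** 2))
    return bool(rms <= threshold)
```

A 16-coefficient reference occupies 16 bytes, and the minicomputer needs no per-user storage because the card supplies the reference at each access attempt.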


Book Chapter
01 Jan 1980
TL;DR: Part of the authors' ongoing investigations into methods and applications of pattern recognition and image analysis, dealing with automatic computer recognition of speech sounds using fuzzy logic.
Abstract: The paper presents a part of the investigations being carried out by the authors on methods and applications of pattern recognition and image analysis. It deals with a class of problems in automatic computer recognition of speech sounds using fuzzy logic. The input patterns are usually given as deterministic data, although they may contain some fuzziness, and the output decision is also deterministic, but the process of classification is fuzzy in nature.
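The deterministic-in, fuzzy-process, deterministic-out scheme described above can be sketched with a distance-based membership function and a max-membership decision. The membership form 1 / (1 + (d / F_d)^F_e), the parameter values, and the vowel prototypes in formant space (F1, F2 in Hz) are assumptions for illustration, not taken from the chapter.

```python
import math
from typing import Dict, Sequence, Tuple


def membership(x: Sequence[float], prototype: Sequence[float],
               f_d: float = 200.0, f_e: float = 2.0) -> float:
    """Grade in (0, 1]: 1 at the prototype, falling off with Euclidean distance."""
    d = math.dist(x, prototype)
    return 1.0 / (1.0 + (d / f_d) ** f_e)


def classify(x: Sequence[float],
             prototypes: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    # deterministic input -> fuzzy membership grades -> deterministic decision
    grades = {label: membership(x, p) for label, p in prototypes.items()}
    return max(grades, key=grades.get), grades


label, grades = classify([320.0, 2250.0], {"a": [700.0, 1200.0], "i": [300.0, 2300.0]})
```

The graded memberships are kept alongside the hard label, so an ambiguous input (two similar grades) can be flagged instead of silently forced into one class.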

Book Chapter
01 Jan 1980
TL;DR: The possibility of identifying a speaker is examined when there are strong differences in both the type of speech and the voice-recording conditions.
Abstract: The possibility of identifying a speaker is examined when there are strong differences in both the type of speech and the voice-recording conditions.