scispace - formally typeset

Showing papers on "Speaker recognition published in 1976"


Journal ArticleDOI
Frederick Jelinek
01 Apr 1976
TL;DR: Statistical methods for automatic recognition of continuous speech are described, concerning modeling of a speaker and of an acoustic processor, extraction of the models' statistical parameters, and hypothesis search procedures and likelihood computations of linguistic decoding; experimental results indicate the power of the methods.
Abstract: Statistical methods useful in automatic recognition of continuous speech are described. They concern modeling of a speaker and of an acoustic processor, extraction of the models' statistical parameters and hypothesis search procedures and likelihood computations of linguistic decoding. Experimental results are presented that indicate the power of the methods.

1,024 citations
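
The abstract names likelihood computations for linguistic decoding but gives no formulas. As an assumed illustration of that kind of computation (not Jelinek's own models or notation), the standard forward algorithm below computes the probability of an observed symbol sequence under a small discrete hidden Markov model; `forward_likelihood` and its parameter names are hypothetical.

```python
def forward_likelihood(obs, init, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(init)
    # alpha[i] = P(o_1 .. o_t, state at time t = i), starting with t = 1
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    # Marginalize over the final state
    return sum(alpha)
```

For example, with start probabilities `[1.0, 0.0]`, transitions `[[0.5, 0.5], [0.0, 1.0]]`, and emissions `[[0.9, 0.1], [0.2, 0.8]]`, the sequence `[0, 1]` has likelihood 0.9 × 0.5 × 0.1 + 0.9 × 0.5 × 0.8 = 0.405. Practical decoders work with log probabilities or per-step scaling to avoid underflow on long utterances.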


Journal ArticleDOI
B.S. Atal
01 Apr 1976
TL;DR: The paper includes a discussion of the speaker-dependent properties of the speech signal, methods for selecting an efficient set of speech measurements, results of experimental studies illustrating the performance of various methods of speaker recognition, and a comparison of the performance of automatic methods with that of human listeners.
Abstract: This paper presents a survey of automatic speaker recognition techniques. The paper includes a discussion of the speaker-dependent properties of the speech signal, methods for selecting an efficient set of speech measurements, results of experimental studies illustrating the performance of various methods of speaker recognition, and a comparison of the performance of automatic methods with that of human listeners. Both text-dependent and text-independent speaker-recognition techniques are discussed.

420 citations


Journal ArticleDOI
A.E. Rosenberg
01 Apr 1976
TL;DR: The techniques, evaluations, and implementations of various proposed speaker recognition systems are reviewed with special emphasis on issues peculiar to speaker verification, especially the distinction between speaker verification and speaker identification.
Abstract: The relation of speaker verification to other pattern-recognition problems in speech is discussed, especially the distinction between speaker verification and speaker identification. The prospects for automatic speaker verification, its settings and applications are outlined. The techniques, evaluations, and implementations of various proposed speaker recognition systems are reviewed with special emphasis on issues peculiar to speaker verification. Two large-scale operating systems using different analysis techniques and applied to different settings are described.

229 citations


Journal ArticleDOI
TL;DR: A very brief survey of recent developments in basic pattern recognition and image processing techniques is presented.
Abstract: Extensive research and development has taken place over the last 20 years in the areas of pattern recognition and image processing. Areas to which these disciplines have been applied include business (e.g., character recognition), medicine (diagnosis, abnormality detection), automation (robot vision), military intelligence, communications (data compression, speech recognition), and many others. This paper presents a very brief survey of recent developments in basic pattern recognition and image processing techniques.

153 citations


Journal ArticleDOI
TL;DR: Characteristic perceptual features for four perceived age decades, which could be classified according to pitch, rate of speech, quality, and articulation, should be useful in establishing criteria for defining "normal" aging speech, planning management strategies for individuals with speech deviances, and in speaker recognition research.
Abstract: Spontaneous speech samples of 46 male speakers between the ages of 25 and 70 years were played to 40 untrained listeners who estimated the speakers’ ages. Samples which showed agreement among untrained listeners were played to 20 trained listeners who described the perceptual features of the given perceived ages via a closed‐response schema. Results showed characteristic perceptual features for four perceived age decades which could be classified according to pitch, rate of speech, quality, and articulation. The features were discussed in light of earlier findings. The features and their weightings by the listeners were derived from an a posteriori schema and samples of spontaneous speech. It was concluded that these features have perceptual importance and should be useful in establishing criteria for defining "normal" aging speech, planning management strategies for individuals with speech deviances, and in speaker recognition research. Subject Classification: [43]70.30, [43]70.35.

105 citations


Journal ArticleDOI
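M. Sambur
TL;DR: The speaker discrimination potential of the linear prediction orthogonal parameters was formally tested in both a speaker identification and a speaker verification experiment, with accuracies exceeding 99 percent for high-quality speech inputs and 96 percent for telephone inputs; in a separate text-independent identification experiment, an accuracy of 94 percent was achieved for high-quality speech inputs.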
Abstract: Recent experiments in speech synthesis have shown that, by an appropriate eigenvector analysis, a set of orthogonal parameters can be obtained that is essentially independent of all linguistic information across an analyzed utterance, but highly indicative of the identity of the speaker. The orthogonal parameters are formed by a linear transformation of the linear prediction parameters, and can achieve their recognition potential without the need of any time-normalization procedure. The speaker discrimination potential of the linear prediction orthogonal parameters was formally tested in both a speaker identification and a speaker verification experiment. The speech data for these experiments consisted of six repetitions of the same sentence spoken by 21 male speakers on six separate occasions. For both identification and verification, the recognition accuracy of the orthogonal parameters exceeded 99 percent for high-quality speech inputs. For telephone inputs, the accuracy exceeded 96 percent. In a separate text-independent speaker identification experiment, an accuracy of 94 percent was achieved for high-quality speech inputs.

61 citations


Book ChapterDOI
01 Jan 1976

51 citations


Journal ArticleDOI
TL;DR: In this paper, the residual energy of linear prediction was found useful for recognizing steady-state vowels when the reference data are taken from the intended speaker, and its sharp speaker selectivity suggests it may also be useful for speaker identification, especially for speaker screening in a large population.
Abstract: Recognition of steady-state vowels based on the residual energy of linear prediction was ascertained to be useful for a recognition system in which the reference data are taken from the intended speaker. Sharp speaker selectivity based on a threshold criterion suggests that the use of the residual signal energy may also be useful for speaker identification, especially for speaker screening in a large population.

37 citations
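
The abstract does not spell out how the residual energy is computed. A minimal sketch, assuming the common autocorrelation method: fit linear prediction coefficients to a reference vowel with the Levinson-Durbin recursion, inverse-filter an unknown frame with those coefficients, and treat the residual energy as a distance (lower means a closer match). The function names are hypothetical, and the threshold for speaker screening is not given in the source.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (autocorrelation method, no window)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor x[n] ~ sum_k a[k] * x[n - k].

    Returns (coefficients a[1..order], final prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def residual_energy(x, coeffs):
    """Energy of the inverse-filtered frame e[n] = x[n] - sum_k a[k] * x[n - k]."""
    p = len(coeffs)
    return sum(
        (x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))) ** 2
        for n in range(p, len(x))
    )
```

A screening rule in this spirit would compute `residual_energy(unknown_frame, reference_coeffs)` for each enrolled speaker and accept only when the energy stays below a chosen threshold; how that threshold is set is left open here.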


Proceedings ArticleDOI
01 Apr 1976
TL;DR: This report presents results obtained in some experiments on the computer recognition of continuous speech with two simple languages having vocabularies of 11 and 250 words.
Abstract: This report presents results obtained in some experiments on the computer recognition of continuous speech. The experiments deal with two simple languages having vocabularies of 11 and 250 words.

36 citations


Patent
01 Mar 1976
TL;DR: In this article, an electrical voice print is converted into sampled digital values, which are in turn converted into corresponding moment invariants (MI); comparing the moment invariants of a standard phrase uttered by the same person and held in storage against the most recently converted MI determines the degree of correlation.
Abstract: An electrical voice print is converted into sampled digital values which are converted into corresponding moment invariants (MI). A comparison of the moment invariant values of a standard phrase uttered by the same person and stored in the storage means against the most recently converted moment invariants determines the degree of correlation. A high degree of correlation is indicative of a voice match.

27 citations
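
The patent abstract does not say which moment invariants are used or how correlation is measured. Assuming the classical Hu invariants of a two-dimensional array (for example, a sampled spectrogram treated as an image), the sketch below computes the first two invariants and compares a stored set with a new one; `hu_invariants`, `is_match`, and the 5% tolerance are hypothetical stand-ins for the patent's correlation test.

```python
def hu_invariants(img):
    """First two Hu moment invariants of a 2-D grid of non-negative values."""
    cells = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in cells)
    xc = sum(x * v for x, _, v in cells) / m00  # centroid column
    yc = sum(y * v for _, y, v in cells) / m00  # centroid row

    def eta(p, q):
        # Normalized central moment: mu_pq / mu_00 ** (1 + (p + q) / 2)
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in cells)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2


def is_match(stored, probe, tol=0.05):
    """Hypothetical decision rule: every invariant within a relative tolerance."""
    return all(abs(s - p) <= tol * max(abs(s), 1e-12) for s, p in zip(stored, probe))
```

These two invariants do not change when the pattern is shifted or rotated by 90 degrees, which is what makes moment invariants attractive for comparing patterns without first aligning them.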


Journal ArticleDOI
G.M. White
TL;DR: The nature of some of these advances is explained, and an introduction to the state of the art of automatic speech recognition is provided.
Abstract: Research toward mechanical recognition of speech is laying the foundation for significant advances in pattern recognition and artificial intelligence. This paper explains the nature of some of these advances and provides an introduction to the state of the art of automatic speech recognition.

Journal ArticleDOI
TL;DR: In this paper, a connected digit recognition system has been developed that uses a statistical decision approach based on an expanded form of the principle of minimum residual error; its expanded distance measure includes the effects of analysis estimation error, coarticulation, and speaker variability.
Abstract: A connected digit recognition system that uses a statistical decision approach based on an expanded form of the principle of minimum residual error has been developed. The expanded distance measure includes the effects of analysis estimation error, the effects of coarticulation, and the effects of speaker variability. The recognition system has been tested on six speakers in a speaker dependent mode with recognition accuracies near 100%. It has also been tested with ten new speakers in a speaker independent mode, with a digit recognition accuracy exceeding 95%.

Journal ArticleDOI
R.L. Kashyap
TL;DR: Using statistical decision theory, various types of tests for speaker verification and identification using only one phoneme segment or the entire utterance are developed.
Abstract: We are interested in determining whether the given utterance comes from a member of a given speaker group or an imposter. If it is the former, we are interested in determining the identity of the speaker. The only knowledge available is a set of known utterances from the given group of speakers. The given utterance is manually divided into phonemes without necessarily ascertaining the identity of phonemes. Using statistical decision theory, we will develop various types of tests for speaker verification and identification using only one phoneme segment or the entire utterance. We will consider related problems such as the methods of clustering speakers to aid speaker verification and the optimal choice of phonemes for speaker recognition. Next, we consider the role of speaker variability in speech recognition and recognize its complementarity to the problem of optimal choice of phonemes for speaker recognition. We illustrate the efficacy of the various methods developed here by considering the speaker and speech identification problems with three speech data bases.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: This paper describes a connected speech understanding system being implemented in Nancy, made up of an acoustic recognizer that produces a string of phoneme-like segments from a spoken sentence, a syntactic parser that controls the recognition process, a word recognizer working on words predicted by the parser, and a dialog procedure that takes semantic constraints into account in order to avoid some of the errors and ambiguities.
Abstract: This paper describes a connected speech understanding system being implemented in Nancy, drawing on the work done in automatic speech recognition since 1968. This system is made up of four parts: an acoustic recognizer that produces a string of phoneme-like segments from a spoken sentence, a syntactic parser that controls the recognition process, a word recognizer working on words predicted by the parser, and a dialog procedure that takes semantic constraints into account in order to avoid some of the errors and ambiguities. Some original features of the system are pointed out: modularity (e.g., the language used is considered as a parameter), the possibility of processing slightly syntactically incorrect sentences, ... Applications both in data management and in oral control of a telephone center have given very promising results. Work is in progress on generalizing our model: extension of the vocabulary and of the grammar, multi-speaker operation, etc.

R.L. Kashyap
01 Jan 1976
TL;DR: In this paper, the authors use statistical decision theory to develop various types of tests for speaker verification and identification based on only one phoneme segment or on the entire utterance, and they consider the role of speaker variability in speech recognition and its complementarity to the problem of optimal choice of phonemes for speaker recognition.
Abstract: We are interested in determining whether the given utterance comes from a member of a given speaker group or an imposter. If it is the former, we are interested in determining the identity of the speaker. The only knowledge available is a set of known utterances from the given group of speakers. The given utterance is manually divided into phonemes without necessarily ascertaining the identity of phonemes. Using statistical decision theory, we will develop various types of tests for speaker verification and identification using only one phoneme segment or the entire utterance. We will consider related problems such as the methods of clustering speakers to aid speaker verification and the optimal choice of phonemes for speaker recognition. Next, we consider the role of speaker variability in speech recognition and recognize its complementarity to the problem of optimal choice of phonemes for speaker recognition. We illustrate the efficacy of the various methods developed here by considering the speaker and speech identification problems with three speech data bases.
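
The abstract describes verification and identification tests derived from statistical decision theory without fixing a model. A minimal sketch, assuming each speaker's phoneme-segment features follow a diagonal-covariance Gaussian: identification picks the speaker model with the highest log-likelihood, and verification compares the claimed speaker's log-likelihood against a pooled background model (a likelihood-ratio test). The Gaussian assumption, the background model, the function names, and the zero threshold are all illustrative choices, not Kashyap's.

```python
import math


def fit_diag_gaussian(samples):
    """Maximum-likelihood mean and per-dimension variance (floored) of feature vectors."""
    n, dims = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, 1e-6) for d in range(dims)]
    return mean, var


def log_likelihood(x, model):
    """Log density of x under a diagonal-covariance Gaussian model (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def identify(x, models):
    """Closed-set identification: the speaker whose model gives x the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(x, models[name]))


def verify(x, claimed, background, threshold=0.0):
    """Accept the claim if log p(x | claimed) - log p(x | background) > threshold."""
    llr = log_likelihood(x, claimed) - log_likelihood(x, background)
    return llr > threshold, llr
```

Raising the threshold trades more false rejections of true speakers for fewer accepted impostors; the abstract does not say how that trade-off was set.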

Proceedings ArticleDOI
E. Bunge
12 Apr 1976
TL;DR: A new modular speaker recognition system consisting of a set of real-time speech analysis processors and a pattern recognition software package is described, and the results of a comparative study are discussed.
Abstract: Summary form only given, as follows. This paper describes a new modular speaker recognition system consisting of a set of real-time speech analysis processors and a pattern recognition software package. Within a government-sponsored research project, combinations of different speech analysis procedures and different pattern recognition algorithms are compared in order to find optimal subsystems, to be applied to security systems or law enforcement, for given boundary conditions. A study has been carried out to find the influence of different techniques, distance measures, and quantisation band distortions on the recognition rate for a given data base (2,500 utterances), and its results are discussed.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: Grammars are proposed that generate all possible reference signals of speech from a finite number of elementary reference signals and a finite system of rules, and the recognition problem is solved by the dynamic programming method.
Abstract: Grammars are proposed that generate all possible reference signals of speech using a finite number of elementary reference signals and a finite system of rules. These grammars take into account the essential factors of variability of speech signals: coarticulation of sounds and nonlinear changes in the rate and intensity of pronunciation. Recognition of an input speech signal consists in finding, among all reference signals, the one that most closely resembles the input signal and indicating the sequence of phonemes, syllables, or words that corresponds to this reference signal. The recognition problem is solved by the dynamic programming method. Learning consists in estimating the parameters of the grammars: the alphabet of elementary reference signals, acoustic-phonetic transcriptions of words, and so on. Several speech recognition algorithms are proposed: recognition of words, recognition of connected speech composed of words from a finite vocabulary, speech recognition by phonemes without vocabulary restriction, and recognition of words and phrases by phonemes. Good experimental results were obtained for all algorithms.
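
The grammar that generates reference signals is beyond a short example, but the dynamic-programming matching step can be sketched. Assuming each signal is a sequence of feature vectors and that the set of references is small and fixed (rather than grammar-generated), the code below computes a dynamic-time-warping distance and returns the label of the closest reference; the names and the Euclidean local cost are hypothetical choices.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Allowed steps: repeat a reference frame, repeat an input frame, or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(signal, references):
    """Return the label of the reference with the smallest DTW distance to the signal."""
    return min(references, key=lambda label: dtw_distance(signal, references[label]))
```

Because the warping path may repeat frames of either sequence, an input spoken more slowly than the reference, such as `[[1.0], [1.0], [2.0], [3.0], [3.0]]` against `[[1.0], [2.0], [3.0]]`, still matches with distance 0, which is how this kind of matching absorbs nonlinear changes in speaking rate.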

Proceedings ArticleDOI
12 Apr 1976
TL;DR: This paper describes an implementation of a speaker independent system which can recognize connected digits and evaluates the accuracy of the system in segmenting and recognizing digit strings.
Abstract: This paper describes an implementation of a speaker independent system which can recognize connected digits. The overall recognition system consists of two separate but inter-related parts. The function of the first part of the system is to segment the digit string into the individual digits which comprise the string; the second part of the system then recognizes the individual digits based on the results of the segmentation. To evaluate the accuracy of the system in segmenting and recognizing digit strings a series of experiments was conducted. Using high quality recordings from a soundproof booth the segmentation accuracy was found to be about 99%, and the recognition accuracy was about 91% across 10 speakers (5 male, 5 female). With recordings made in a noisy computer room the segmentation accuracy remained close to 99%, and the recognition accuracy was about 87% across another group of 10 speakers (5 male, 5 female).

Proceedings ArticleDOI
M. Sambur
01 Apr 1976
TL;DR: The speaker discrimination potential of the linear prediction orthogonal parameters was formally tested in both a speaker recognition and a speaker verification experiment, and the identification accuracy of the orthogonal parameters exceeded 99%.
Abstract: Recent experiments in speech synthesis have shown that, by an appropriate eigenvector analysis, a set of orthogonal parameters can be obtained that is essentially independent of all linguistic information across an analyzed utterance but highly indicative of the identity of the speaker. The orthogonal parameters are formed by a linear transformation of the linear prediction parameters and can achieve their recognition potential without the need for any time-normalization procedure. The speaker discrimination potential of the linear prediction orthogonal parameters was formally tested in both a speaker recognition and a speaker verification experiment. The speech data for these experiments consisted of six repetitions of the same sentence spoken by 21 male speakers on six separate occasions. For both recognition and verification, the identification accuracy of the orthogonal parameters exceeded 99%. In a separate text-independent speaker recognition experiment, an accuracy of 94% was achieved.

Journal ArticleDOI
R. Kimball, M. Rothkopf
TL;DR: The results show that it is possible to reliably predict when the utterance classifier has made the correct decision.
Abstract: A variety of automatic speech recognition experiments have been executed that support a measure of confidence for utterance classification. The confidence measure tested was the ratio of the two best "Hamming distance" scores obtained in matching utterance templates with an unknown utterance. The results show that it is possible to reliably predict when the utterance classifier has made the correct decision.
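
The confidence measure is described only as the ratio of the two best Hamming-distance scores. A minimal sketch, assuming binary utterance templates of equal length and defining the ratio as best distance over second-best distance (so values near 0 suggest a confident decision and values near 1 an ambiguous one); the orientation of the ratio, the function names, and any acceptance threshold are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary templates differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_with_confidence(unknown, templates):
    """Return (best_label, ratio), where ratio = best / second-best Hamming distance.

    Requires at least two templates. A tie (including two exact matches) gives 1.0.
    """
    scored = sorted((hamming(unknown, t), label) for label, t in templates.items())
    (d1, label), (d2, _) = scored[0], scored[1]
    ratio = d1 / d2 if d2 else 1.0
    return label, ratio
```

For an unknown pattern `[0, 0, 0, 0, 1, 1, 1, 0]` against templates at distances 1, 3, and 7, the classifier returns the closest template with ratio 1/3; a hypothetical rule would flag the decision as unreliable whenever the ratio exceeds some tuned cutoff.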

Proceedings ArticleDOI
01 Apr 1976
TL;DR: A system for recognizing spoken words with a restricted number of learning samples is described; when half of the twenty words to be recognized are used as learning samples, a recognition rate of 98.6% is obtained with the optimum learning sample.
Abstract: A system for recognizing spoken words with a restricted number of learning samples is described. The learning samples consist of a subset of the words to be recognized and are used to derive the reference patterns of phonemic spectra needed by the recognition system. Four algorithms for selecting the optimum set of words to serve as learning samples are proposed and tested. A recognition test of twenty words was carried out with 20 speakers. Using half of the twenty words as learning samples, a recognition rate of 98.6% was obtained with the optimum learning sample.

01 Jan 1976
TL;DR: The LISTEN (LIMITED SPOKEN TEXT ENCODER) system, which automatically recognizes isolated spoken words from a limited vocabulary, has been developed as a subpart of the LITHAN (LISTEN-THINK-ANSWER) speech understanding system.
Abstract: Summary: We have developed the LISTEN (LIMITED SPOKEN TEXT ENCODER) system, which automatically recognizes isolated spoken words from a limited vocabulary. This system is a subpart of the LITHAN (LISTEN-THINK-ANSWER) speech understanding system [1, 2]. A major feature is real-time recognition on a minicomputer, which makes it possible to run various experiments on large amounts of speech data. The system has two other features: it learns speaker differences from preliminarily uttered vowels, and it is composed of two stages, phoneme recognition and word recognition, with the effect of coarticulation taken into account in the latter stage. The system achieved a recognition rate of 98.0% in experiments on spoken digits uttered by 40 adult males. With preliminary learning on some spoken digits, the rate was 98.4%; without any learning procedure, however, it decreased to 95.8%.

Proceedings ArticleDOI
12 Apr 1976
TL;DR: A method based on linear prediction (LP) analysis is described which yields features that are more speaker dependent than the usual linear predictor coefficients (LPC) through cascade realization of digital inverse filtering (DIF) for speech signals.
Abstract: The search for new speaker-dependent features is a constant problem in the design of automatic speaker recognition systems. In speech, information about the speaker usually arises along with the semantic information, which makes its independent use difficult. In this paper, a method based on linear prediction (LP) analysis is described which yields features that are more speaker dependent than the usual linear predictor coefficients (LPC). In this method, the LPC contours are obtained through cascade realization of digital inverse filtering (DIF) for speech signals. A low-order (2-4) DIF removes the gross spectral characteristics, such as the large dynamic range and some significant peaks, which tend to mask the weaker formants. Visual comparison of the contours and a preliminary statistical analysis indicate that the LPC contours obtained by processing the output signal of the first stage contain better features for speaker dependency than the direct LPC contours.
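
The abstract describes the cascade only in outline: a low-order (2-4) inverse filter first, then LPC analysis of its output. A minimal sketch under those assumptions, using the Levinson-Durbin recursion for both stages; the second-stage order of 10, the zero initial conditions in the inverse filter, and all function names are hypothetical, and LPC contours would come from repeating this frame by frame.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Predictor coefficients a[1..order] via the Levinson-Durbin recursion."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def inverse_filter(x, coeffs):
    """e[n] = x[n] - sum_k a[k] * x[n - k], assuming zero samples before the frame."""
    return [
        x[n] - sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


def cascade_lpc(frame, first_order=2, second_order=10):
    """Stage 1: low-order inverse filter; stage 2: LPC of the flattened signal."""
    stage1 = lpc(frame, first_order)
    flattened = inverse_filter(frame, stage1)
    return stage1, lpc(flattened, second_order)
```

If a frame is the impulse response of a second-order all-pole filter, the first stage recovers that filter's coefficients and the flattened output is essentially an impulse, so the second-stage coefficients come out near zero; on real speech the second stage instead describes whatever spectral detail the first stage left behind.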

H. Yilmaz, L. Ferber, J. Shao, W. Park, H. Kellett
01 Sep 1976
TL;DR: A speaker-independent speech recognition system was constructed which implements a solution to one of the most difficult and most important problems in speech, that of speaker-to-speaker variability, based on a theory of speech perception which is consistent with the linguistic universals of world languages.
Abstract: A speaker-independent speech recognition system was constructed which implements a solution to one of the most difficult and most important problems in speech, that of speaker-to-speaker variability. The system, which recognizes words in naturally spoken, uncontrolled text, is based on a theory of speech perception which is consistent with the linguistic universals of world languages. The representation is invariant under certain adaptive transformations which render the speech speaker-independent. The problem of speaker-to-speaker variability was solved by reducing the multi-speaker problem to a single-speaker proposition. A single speaker may train the system to recognize a given vocabulary. A subsequent speaker need speak only a predetermined sentence or word sequence to transform the system for operation on his voice. Performance has been evaluated using constraint-free speech, spoken in natural word sequences.

Journal ArticleDOI
TL;DR: Results on the effect of training the phonological variation and acoustic processor models used in a speech recognition system developed at the IBM Research Center are presented.
Abstract: Statistical models for characterizing various aspects of speech production and processing are discussed. Such models are being used effectively in speech recognition systems. The basic philosophy and rationale behind the use of such models are presented. One major advantage of these models is that they can be trained automatically by the use of a parameter estimation algorithm. This makes training the system for new speakers a fairly simple task. This training algorithm and other algorithms that can be used in conjunction with statistical models are discussed. In particular, results on the effect of training the phonological variation and acoustic processor models used in a speech recognition system developed at the IBM Research Center are presented. Some ideas relating to statistical modeling of natural language are also discussed.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: This paper presents the preliminary study of a new approach to Automatic Speech Recognition using Acoustic-Phonetic analysis and Statistical Pattern Recognition techniques.
Abstract: This paper presents the preliminary study of a new approach to Automatic Speech Recognition using Acoustic-Phonetic analysis and Statistical Pattern Recognition techniques. Implementation of an On-Line, Adaptive, Speaker-Independent Word Recognition System is also described to illustrate the approach.

Proceedings ArticleDOI
01 Apr 1976
TL;DR: Results of this study indicate that one of the zero-crossing methods appears to have the potential for more accurate vowel recognition than the others.
Abstract: Earlier results have determined mathematical interrelationships between several zero-crossing analysis methods which have been applied to automatic speech recognition. In the present paper, some initial work aimed at determining the relative applicability of these zero-crossing analysis methods for speech recognition is presented. Toward this end, two simulation studies with English vowels are described. One study is aimed at determining the relative discriminability of the methods for vowel recognition. The other study is aimed at determining the noise vulnerability of the zero-crossing analysis methods. Results of this study indicate that one of the zero-crossing methods appears to have the potential for more accurate vowel recognition than the others.
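
The abstract compares several zero-crossing analysis methods without defining them. As an assumed baseline only (not necessarily any of the methods the paper studies), the sketch below counts sign changes in a frame and turns the count into a rate and a crude frequency estimate; the function names and the 0.3 rad phase offset of the test tone are hypothetical.

```python
import math


def tone(freq_hz, fs, num_samples, phase=0.3):
    """Hypothetical test signal: a pure sinusoid sampled at fs Hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs + phase) for n in range(num_samples)]


def zero_crossings(x):
    """Count sign changes between consecutive samples (zero counts as positive)."""
    return sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:]))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs that straddle zero."""
    return zero_crossings(x) / (len(x) - 1)


def zero_crossing_frequency(x, fs):
    """Crude dominant-frequency estimate: a sinusoid crosses zero twice per period."""
    return zero_crossings(x) * fs / (2 * len(x))
```

For `tone(500, 8000, 800)`, the count is 99 and the frequency estimate is 495 Hz. Low-level noise around zero adds spurious crossings, which is the kind of vulnerability the paper's second study examines.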

Journal ArticleDOI
King-Sun Fu
TL;DR: Applications of pattern recognition include character recognition, target detection, medical diagnosis, analysis of biomedical signals and images, remote sensing, identification of human faces and fingerprints, reliability, speech recognition and understanding, and machine parts recognition.
Abstract: During the past fifteen years, there has been a considerable growth of interest in problems of pattern recognition. This interest has created an increasing need for theoretical methods and experimental software and hardware for use in the design of pattern recognition systems. A number of books have been published on this subject [1-16], and some special pattern recognition machines have been designed and built for practical use. Applications of pattern recognition include character recognition [12], target detection, medical diagnosis, analysis of biomedical signals and images, remote sensing, identification of human faces and fingerprints, reliability [17], socio-economics [18], speech recognition and understanding [19], and machine parts recognition.