
Showing papers on "Speaker recognition published in 1977"



01 Aug 1977
TL;DR: Two decision-algorithmic methods using weighted-distance functions and property sets are developed and implemented, with the optimum size of training set, on a large number of Telugu speech sounds, achieving recognition scores of 82 percent for vowels and 97 percent for the speaker.
Abstract: Applications of the theory of fuzzy sets to problems of computer recognition of vowels and of identifying a person from his spoken words, using only the first three formants (F1, F2, and F3) of the unknown utterance, are presented. Two decision-algorithmic methods using weighted-distance functions and property sets are developed and implemented, with the optimum size of training set, on a large number of Telugu (an important Indian language) speech sounds, with recognition scores of 82 percent for vowels and 97 percent for the speaker.
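The weighted-distance classification the abstract describes can be sketched roughly as follows. The reference formants, weights, and vowel set below are illustrative assumptions for the sketch, not values from the paper:

```python
# Nearest-prototype classification of a vowel from its first three
# formants (F1, F2, F3) using a weighted Euclidean distance.
# All reference formants and weights below are illustrative only.

REFERENCE_FORMANTS = {          # rough prototype formants in Hz
    "a": (730, 1090, 2440),
    "i": (270, 2290, 3010),
    "u": (300, 870, 2240),
}
WEIGHTS = (1.0, 0.5, 0.25)      # lower formants weighted more heavily

def weighted_distance(x, proto, weights=WEIGHTS):
    """Weighted Euclidean distance between two formant triples."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, proto)) ** 0.5

def classify(formants):
    """Return the vowel whose prototype is nearest to the input formants."""
    return min(REFERENCE_FORMANTS,
               key=lambda v: weighted_distance(formants, REFERENCE_FORMANTS[v]))
```

The same nearest-prototype idea extends to speaker identification by replacing vowel prototypes with per-speaker reference formants.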

187 citations


Journal ArticleDOI
TL;DR: It was demonstrated that the between-to-within speaker variance ratio was significantly increased by performing long-term averaging of the parameter sets, and the reflection coefficient averages for k2 and k6 were shown to produce the highest variance ratios.
Abstract: The potential benefits of long-term parameter averaging for speaker recognition were investigated. Parameters studied were pitch, gain, and reflection coefficients. Parameter variability was computed over various averaging lengths from one-frame averaging (in effect, no averaging) to 1000-frame averaging (about 70 s of speech). It was demonstrated that the between-to-within speaker variance ratio, measured over several speakers, was significantly increased by performing long-term averaging of the parameter sets. The reflection coefficient averages for k2 and k6 were shown to produce the highest variance ratios.
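The effect the paper measures can be sketched on synthetic data: averaging frames shrinks within-speaker variance while leaving between-speaker spread intact, raising the variance ratio. The distributions and block length below are illustrative, not the paper's data:

```python
import numpy as np

# Between-to-within speaker variance ratio of one parameter, with and
# without long-term averaging. Synthetic data; numbers are illustrative.

rng = np.random.default_rng(0)
n_speakers, n_frames = 5, 1000
speaker_means = rng.normal(0.0, 1.0, n_speakers)   # between-speaker spread
frames = speaker_means[:, None] + rng.normal(0.0, 2.0, (n_speakers, n_frames))

def f_ratio(data):
    """Variance of per-speaker means over mean within-speaker variance."""
    between = np.var(data.mean(axis=1))
    within = np.mean(np.var(data, axis=1))
    return between / within

def block_average(data, block):
    """Average consecutive blocks of `block` frames for each speaker."""
    n = data.shape[1] // block
    return data[:, :n * block].reshape(data.shape[0], n, block).mean(axis=2)

ratio_raw = f_ratio(frames)                        # one-frame "averaging"
ratio_avg = f_ratio(block_average(frames, 100))    # 100-frame averaging
```

With 100-frame averaging the within-speaker variance drops by roughly a factor of 100, so the ratio rises sharply, mirroring the trend the paper reports.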

96 citations


Journal ArticleDOI
TL;DR: A summary of the state-of-the-art of automatic speech recognition (ASR) and its relevance to military applications, together with a number of unsolved problems and techniques which need to be perfected before solutions to a number of military applications of ASR are possible.
Abstract: The objective of this paper is to provide a summary of the state-of-the-art of automatic speech recognition (ASR) and its relevance to military applications. Until recently, speech recognition had its widest application in the development of vocoders for narrow-band speech communications. Presently, research in ASR has been accelerated for military tasks such as command and control, secure voice systems, surveillance of communication channels, and others. Research in voice control technology and digital narrow-band systems are of special interest. Much of the emphasis of today's military-supported research is to reduce to practice the current state of knowledge of ASR, as well as directing research in such a way as to have future military relevance. In coordination with the above-mentioned emphasis in military-supported research, this paper is divided into two major sections. The first section presents discussion of the state-of-the-art and problems in the various subareas of the ASR field. The second section presents a number of unsolved problems and techniques which need to be perfected before the solutions to a number of military applications of the ASR field are possible.

51 citations


Journal ArticleDOI
TL;DR: A standard for comparing the performance of different recognizers on arbitrary vocabularies based on a human word recognition model is developed, which allows recognition results to be normalized for comparison according to two intuitively meaningful figures of merit.
Abstract: Although automatic word recognition systems have existed for some twenty-five years there is still no suitable standard for evaluating their relative performances. Currently, the merits of two systems cannot be meaningfully compared unless they have been tested with at least the same vocabulary or, preferably, with the same acoustic samples. This paper develops a standard for comparing the performance of different recognizers on arbitrary vocabularies based on a human word recognition model. This standard allows recognition results to be normalized for comparison according to two intuitively meaningful figures of merit: 1) the noise level necessary to achieve comparable human performance and 2) the deviation of the pattern of confusions from human performance. Examples are given of recognizers evaluated in this way, and the role of these performance measures in automatic speech recognition and other related areas is discussed.

33 citations


Book
01 Jan 1977

26 citations


Proceedings ArticleDOI
01 Jan 1977
TL;DR: Results are presented of evaluations for speakers using their own stored reference patterns, the reference patterns of other speakers, and reference patterns averaged over several speakers, for a system combining a speaker-dependent word recognizer with syntax analysis.
Abstract: A speech recognition system has been implemented which accepts reasonably natural English sentences spoken as isolated words. The major components of the system are a speaker dependent word recognizer and a syntax analyzer. The set of sentences selected for investigation is intended for use as requests in an automated flight information and reservation system. Results are presented of evaluations for speakers using their own stored reference patterns, the reference patterns of other speakers and reference patterns averaged over several speakers. For speakers using their own reference pattern the median word recognition error rate fell from 11.7% to 0.4% with the use of syntax analysis.
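The large error reduction from syntax analysis comes from discarding acoustically plausible words that the grammar forbids at the current position. A minimal sketch of that idea, with hypothetical distances and an invented grammar fragment (not the paper's flight-reservation grammar):

```python
# Sketch: a word-level recognizer whose candidate choice is restricted
# by a syntax analyzer that knows which words may follow the sentence
# so far. Distances and the allowed-word set are illustrative only.

# Hypothetical acoustic distances from the input word to each stored
# reference pattern (lower is better); "too" scores best acoustically.
acoustic_distance = {"to": 1.2, "too": 1.1, "two": 3.0, "from": 5.0}

# Hypothetical grammar constraint: words the syntax allows next.
allowed_next = {"to", "from"}

def recognize(distances, allowed=None):
    """Pick the lowest-distance word, optionally restricted by syntax."""
    candidates = distances if allowed is None else {
        w: d for w, d in distances.items() if w in allowed}
    return min(candidates, key=candidates.get)
```

Without the constraint the recognizer picks the homophone "too"; with it, only syntactically legal words compete, which is the mechanism behind the drop from 11.7% to 0.4% word error reported above.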

17 citations


Proceedings ArticleDOI
01 May 1977
TL;DR: A third method has been devised to allow the Harpy speech recognition system to dynamically learn the speaker dependent parameters while using the system.
Abstract: The Harpy speech recognition system works optimally when it "knows" the speaker, i.e., when it has learned the speaker dependent characteristics (speaker dependent parameters) of the speaker. There are three methods of learning these parameters. One way is to generate them from a set of training data which covers all the allophones that occur in the task language. A second method is to use "speaker independent" parameters with a resulting reduction in accuracy performance. Since it is inconvenient for a "new" speaker to say a set of training data before using the system, and the low accuracy with speaker independent parameters is unacceptable, a third method has been devised to allow the system to dynamically learn the speaker dependent parameters while using the system. The new speaker starts with a set of speaker independent parameters. These parameters are then altered after correct recognition (which can be forced if necessary) to match the spoken utterance.
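The adaptation scheme above can be sketched as a running update that moves the matched parameter template toward each correctly recognized utterance. The update rule, rate, and feature values are illustrative assumptions, not Harpy's actual parameter representation:

```python
# Sketch of dynamic speaker adaptation: start from speaker-independent
# templates and, after each correct recognition, pull the matched
# template toward the observed features. Rate and values are invented.

def adapt(template, observation, rate=0.2):
    """Move each template coefficient a fraction `rate` toward the observation."""
    return [t + rate * (o - t) for t, o in zip(template, observation)]

# Hypothetical speaker-independent template for one allophone.
template = [0.0, 1.0, -0.5]
utterance = [1.0, 0.0, 0.5]   # features from a correctly recognized utterance

for _ in range(10):           # repeated correct recognitions by one speaker
    template = adapt(template, utterance)
```

After repeated updates the template converges geometrically toward the new speaker's values, which is the sense in which the system "learns" the speaker during use.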

17 citations


Journal ArticleDOI
TL;DR: The authors examined utterances intended by a foreign speaker to be speech acts, but which are unsuccessful, and identified the following areas of possible failure in the purported speech act pattern: focus, semantic redundancy, prosody, and listener expectation.

8 citations


Proceedings ArticleDOI
01 May 1977
TL;DR: A comparative study of two syntactic models used concurrently in this experiment, a classical top-down model and a generalized bottom-up, island-driven model, is presented, and the generalization of this model to a natural language understanding system is discussed.
Abstract: The Signal Processing and Pattern Recognition Group in Nancy is developing the MYRTILLE System for recognizing and understanding sentences in artificial and natural languages. A model for the oral control of a telephone exchange with an artificial language was proposed and described at ICASSP 76. In this paper, we first present a comparative study of two syntactic models that we have used concurrently in this experiment: a classical top-down model and a generalized bottom-up, island-driven model. Then, we discuss the generalization of this model to a natural language understanding system (e.g. for the oral consultation of a data base). Such a generalization involves a very different structure, which is described in detail.

7 citations


Proceedings ArticleDOI
01 May 1977
TL;DR: Acoustic-phonetic conversion is probably the most critical step in continuous speech recognition; transitional information can be used as described to improve the results.
Abstract: Acoustic-phonetic conversion is probably the most critical step in continuous speech recognition. The transitional information can be used as follows, in order to improve the results. First we constitute a lexicon of the phoneme steady-state spectra and a lexicon of all the transitions (diphones), each one being characterized by a "differential spectrum". The unknown continuous speech wave is segmented into quasi-steady-state and transitional segments; the labelling of the quasi-steady-state segments admits several candidates. The transitional segment between two quasi-steady-state spectra is then compared to the diphones of the lexicon selected from the combination of the surrounding possible phoneme labels. Actually, only the comparisons which are compatible with the recent past of the message are made. When working as a phoneme vocoder, the whole procedure needs about 3x real time, without any optimization.
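The diphone-matching step above can be sketched with toy spectra: a transition is characterized by a differential spectrum and compared only against diphones compatible with the candidate labels on either side. The phoneme set, spectra, and the simple "difference of means" definition of the differential spectrum are assumptions for illustration:

```python
import numpy as np

# Sketch of transitional (diphone) matching. Steady-state spectra are
# toy 4-band vectors; the diphone lexicon stores one "differential
# spectrum" per transition. All values are illustrative only.

steady = {
    "a": np.array([5.0, 3.0, 1.0, 0.5]),
    "i": np.array([1.0, 4.0, 4.0, 1.0]),
    "u": np.array([4.0, 1.0, 0.5, 0.2]),
}
# Diphone lexicon: differential spectrum of each possible transition.
diphones = {(p, q): steady[q] - steady[p]
            for p in steady for q in steady if p != q}

def match_transition(diff_spectrum, left_candidates, right_candidates):
    """Find the diphone (p, q), compatible with the candidate phoneme
    labels on each side, whose differential spectrum is closest."""
    keys = [(p, q) for (p, q) in diphones
            if p in left_candidates and q in right_candidates]
    return min(keys, key=lambda k: np.linalg.norm(diff_spectrum - diphones[k]))

# A slightly noisy observed transition from /a/ toward /i/.
observed = steady["i"] - steady["a"] + 0.1
```

Restricting the comparison to label-compatible diphones is what keeps the search small enough for the roughly 3x-real-time figure quoted above.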

Proceedings ArticleDOI
01 May 1977
TL;DR: A system is described for the automatic comparison of speakers given short samples of their speech; a useful level of recognition performance has been obtained using a total of 154 twenty-second samples of read speech from thirteen typical speakers of British English.
Abstract: A system is described for the automatic comparison of speakers given short samples of their speech. The method does not depend on knowing what is being said, and is to a large extent independent of the degradations likely to be suffered by the speech during transmission. A small computer has been used to generate statistics on fundamental frequency and spectral shape information produced by a real-time cepstrum processor. Fundamental frequency is intrinsically resistant to most transmission degradations, and the spectral statistics taken are independent of linear spectral shaping. A useful level of recognition performance has been obtained using a total of 154 twenty-second samples of read speech from thirteen typical speakers of British English.
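The fundamental-frequency statistic at the heart of the system comes from the cepstrum. A generic textbook sketch of cepstral F0 estimation (not the authors' real-time hardware processor) looks like this:

```python
import numpy as np

# Minimal cepstrum-based fundamental-frequency estimate: the cepstrum
# is the inverse FFT of the log magnitude spectrum, and voiced speech
# produces a peak at the quefrency 1/F0. Generic sketch, not the
# paper's implementation.

def cepstral_f0(signal, fs, fmin=50.0, fmax=400.0):
    """Estimate F0 as the quefrency of the cepstral peak in [fmin, fmax]."""
    windowed = signal * np.hanning(len(signal))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-9)
    cepstrum = np.fft.irfft(log_mag)
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / peak

# Synthetic harmonic-rich voiced sound at 100 Hz for a quick check.
fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * k * 100 * t) / k for k in range(1, 11))
```

Because the cepstral peak depends on harmonic spacing rather than absolute level, the F0 statistic inherits the robustness to transmission degradation that the abstract notes.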

Journal ArticleDOI
01 Jan 1977-Frequenz
TL;DR: An account is given of the pattern recognition and classification procedures applied, the results obtained with the AUROS system so far, and future directions of research.
Abstract: The AUROS system is one of the most recently developed systems for speaker recognition. It allows flexible combination of parameter extraction and classification procedures in order to conduct experiments under realistic environmental conditions. An account is given of the pattern recognition and classification procedures applied, the results obtained with the AUROS system so far, and future directions of research. Keywords: speaker recognition / speaker verification / speaker identification / speech signal processing / acoustic pattern recognition

Proceedings ArticleDOI
E. Bunge, U. Hofker, P. Jesorsky, B. Kriener, D. Wesseling
09 May 1977
TL;DR: Real-time speech signal analysis, mainly based on two-stage statistical measurements in combination with minimum-risk classifiers, allows code-word-related as well as text-independent speaker verification and identification, both with very high accuracy; the structure and modules of the speaker recognition system and results of comparative experiments are discussed.
Abstract: Within a government sponsored research program, various speech analysis techniques and pattern recognition methods have been applied to the speaker identification and verification problem. For this purpose a modular speaker recognition system has been developed to be used for comparative studies. Real-time speech signal analysis, mainly based on two-stage statistical measurements in combination with minimum risk classifiers, allows code-word related as well as text-independent speaker verification and identification, both with very high accuracy for male and female voices. This paper describes the structure and modules of the speaker recognition system, and results of comparative experiments are discussed.

Proceedings ArticleDOI
01 May 1977
TL;DR: Studies made on designing speaker recognition schemes using an interactive signal processing facility that consists of an HP 2100S minicomputer-based Fourier Analyzer System are described.
Abstract: In this paper, we describe studies made on designing speaker recognition schemes using an interactive signal processing facility. The facility consists of an HP 2100S minicomputer-based Fourier Analyzer System. This facility accepts speech input and provides a display of results at various stages of the recognition procedure. Such a system permits the use of large design and test data sets, with consequent advantages in performance evaluation. We describe attempts at isolating better features purely from the speaker recognition point of view. Feature selection criteria, choice of code words, design of classifiers, and performance assessment are discussed.

Proceedings ArticleDOI
01 May 1977
TL;DR: A mathematical formulation of an automatic speaker verification scheme as a two class pattern recognition problem is presented, and the bound on the performance of an automatic speaker identification system as a cascade of independent verification systems is derived.
Abstract: A mathematical formulation of an automatic speaker verification scheme as a two class pattern recognition problem is presented. Expressions for the expected values and the variance of the design-set and the test set error rates are derived. The bound on the performance of an automatic speaker identification system as a cascade of independent verification systems is derived. The implications of these results in the design of an automatic speaker recognition system are discussed.
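The cascade bound mentioned above can be illustrated in a simple form. Assuming each of the independent verification stages errs with probability $\epsilon$ (an illustrative simplification; the paper derives the exact expressions):

```latex
P(\text{correct identification}) \;\ge\; (1-\epsilon)^{N-1}
\quad\Longrightarrow\quad
P(\text{identification error}) \;\le\; 1-(1-\epsilon)^{N-1} \;\le\; (N-1)\,\epsilon ,
```

so the identification error for a population of $N$ speakers grows at most linearly in the number of verification stages, which is why verification error rates bound identification performance.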

Proceedings ArticleDOI
09 May 1977
TL;DR: The results have been encouraging, and the combined identification power of these vectors, in both laboratory and field ("simucrimes") experiments, is currently under evaluation.
Abstract: A four vector computerized semi-automatic speaker identification system has been developed. The ultimate purpose of this machine approach is to permit field identification of unknown talkers, rather than to carry out speaker verification tasks in controlled environments. The four vectors presently utilized consist of 11-45 parameters each; they include analysis of 1) fundamental frequency (17-25 parameters); 2) power spectra (11-23 parameters); 3) vowel formants (32-45 parameters); and 4) temporal features (15-24 parameters). Vectors 1, 2, and 4 have been subjected to considerable laboratory analysis, for both large and small populations and under both ideal and distorted conditions. Some testing of vector 3 and of combinations of vectors also has been conducted. The results have been encouraging, and the combined identification power of these vectors, in both laboratory and field ("simucrimes") experiments, is currently under evaluation.

Proceedings ArticleDOI
01 May 1977
TL;DR: A voice recognition experiment for speech understanding, based on the observation that a voice recognition system can be substantially improved by exploiting the intrinsic redundancy of spoken natural language, that is, by delaying every decision to the highest available information level.
Abstract: This paper presents a voice recognition experiment for speech understanding. The approach is based on the fact that a voice recognition system can be substantially improved by exploiting the intrinsic redundancy of spoken natural language, that is, by delaying every decision to the highest available information level. Namely, any decision taken at the phoneme (acoustic) level carries the loss of a certain amount of information. The linguistic recognition system we have so far developed is based on a linguistic model where decisions are taken only at the full-message level. This approach follows the same basic idea as a system now successfully working for Mail Address Optical Recognition (1). Such a system has been successfully improved via EMMA, a special network of associative minicomputers consisting, for that application, of about 60 processors.

01 Jan 1977
TL;DR: Research capabilities on speech understanding, speech recognition, and voice control are described, and research activities, including those involving text input rather than speech, are discussed.
Abstract: Research capabilities on speech understanding, speech recognition, and voice control are described. Research activities, including those which involve text input rather than speech, are discussed.

01 Jun 1977
TL;DR: The effectiveness of inverse filtering led to the conclusion that the major problem was the face mask worn by the subjects, which introduced a variable element into the acoustic transmission path; additional work will be required to eliminate face mask effects.
Abstract: The effects of g-force stress on human voice patterns were investigated with the objective of finding means for making isolated word recognition devices work in the fighter aircraft cockpit environment. Data were taken in a human centrifuge with SCOPE Electronics Inc's Voice Data Entry System (VDETS) used to prompt and pace the subjects. Data were subsequently digitized and stored for analysis and recognition experiments using the VDETS algorithm with a number of variations. Recognition performance on the centrifuge data was initially poor. Means were found for improving it substantially through modifications to the VDETS algorithm and through preprocessing techniques. VDETS modifications included increased coding resolution, improved segmentation techniques, and provision for multimode training. Breathing noise elimination and inverse filtering preprocessing routines were effective. Variations in spectral characteristics with g-force stress were found, but no consistent pattern was discerned. The effectiveness of the inverse filtering led to the conclusion that the major problem was the face mask worn by the subjects, causing a variable element in the acoustic transmission path. Additional work will be required to eliminate face mask effects.