Showing papers on "Speaker recognition" published in 1979

Journal Article•DOI•

Two-level DP-matching--A dynamic programming-based pattern matching algorithm for connected word recognition

H. Sakoe¹•Institutions (1)

01 Dec 1979-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A general principle of connected word recognition is given, based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns. Computation time and memory requirement are both proved to be within reasonable limits.

Abstract: This paper reports a pattern matching approach to connected word recognition. First, a general principle of connected word recognition is given based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns. Time-normalization capability is allowed by use of dynamic programming-based time-warping technique (DP-matching). Then, it is shown that the matching process is efficiently carried out by breaking it down into two steps. The derived algorithm is extensively subjected to recognition experiments. It is shown in a talker-adapted recognition experiment that digit data (one to four digits) connectedly spoken by five persons are recognized with as high as 99.6 percent accuracy. Computation time and memory requirement are both proved to be within reasonable limits.

289 citations
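
The time-warping idea mentioned in the abstract above can be illustrated with a minimal sketch of single-level dynamic time warping (DTW). This is not the paper's two-level algorithm: it only aligns one input sequence with one reference. The function name `dtw_distance`, the use of scalar features, and the absolute-difference local cost are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Basic DTW cost between two sequences of numbers.

    Illustrative only: scalar features and an absolute-difference
    local cost are assumptions, not taken from the paper.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allow a diagonal step, or a repeat of either sequence's frame
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    return cost[n][m]
```

In this sketch a reference that is simply a time-stretched copy of the input, such as `[1, 2, 2, 3]` against `[1, 2, 3]`, aligns at zero cost, which is the time-normalization property the abstract refers to.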

Proceedings Article•DOI•

Speaker independent recognition of isolated words using clustering techniques

Lawrence R. Rabiner¹, Stephen E. Levinson², Aaron E. Rosenberg², Jay G. Wilpon²•Institutions (2)

Bell Labs¹, Alcatel-Lucent²

01 Apr 1979

TL;DR: A speaker-independent, isolated word recognition system is proposed that uses multiple templates for each word in the vocabulary. The templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e., once by each of 100 talkers).

Abstract: A speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers). The recognition system, which uses telephone recordings, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule to lower the probability of error. Results are presented on two test sets of data which show error rates that are comparable to, or better than, those obtained with speaker trained, isolated word recognition systems.

120 citations
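
The abstract above does not spell out its K-nearest neighbor rule, so the following is only a sketch of one plausible reading: for each vocabulary word, average the K smallest template distances and pick the word with the lowest average. The function name `knn_decide`, the input layout (a dict from word to a non-empty list of precomputed distances), and the averaging step are assumptions for illustration.

```python
def knn_decide(distances, k=2):
    """Pick the word whose k closest templates are nearest on average.

    `distances` maps each word to a non-empty list of distances between
    the unknown utterance and that word's templates (hypothetical layout;
    the distances would come from some template matcher).
    """
    best_word, best_score = None, float("inf")
    for word, dists in distances.items():
        nearest = sorted(dists)[:k]
        score = sum(nearest) / len(nearest)
        if score < best_score:
            best_word, best_score = word, score
    return best_word
```

With `k=1` the rule reduces to a plain nearest-template decision; a larger `k` makes a single unusually close template less able to decide the outcome on its own.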

Journal Article•DOI•

Text-independent speaker recognition from a large linguistically unconstrained time-spaced data base

J. Markel, S. Davis

01 Feb 1979-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: In this article, a large data set consisting of over 36 hours of unconstrained extemporaneous speech, from 17 speakers, recorded over a period of more than three months, was analyzed to determine the effectiveness of long-term average features for speaker recognition.

Abstract: A very large data base consisting of over 36 h of unconstrained extemporaneous speech, from 17 speakers, recorded over a period of more than three months, has been analyzed to determine the effectiveness of long-term average features for speaker recognition. Results are shown to be strongly dependent on the voiced speech averaging interval L_e. Monotonic increases in the probability of correct identification and monotonic decreases in the equal error probability for speaker verification were obtained as L_e increased, even with substantial time periods between successive sessions. For L_e corresponding to approximately 39 s of speech, text-independent results (no linguistic constraints embedded into the data base) of 98.05 percent for speaker identification and 4.25 percent for equal error speaker verification were obtained.

53 citations
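
As a rough illustration of identification from long-term average features, the sketch below averages per-frame feature vectors over an interval and assigns the speaker whose stored reference vector is closest. The abstract does not say which features or distance measure were used, so the squared Euclidean distance and the names `long_term_average` and `identify` are assumptions.

```python
def long_term_average(frames):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def identify(frames, references):
    """Return the speaker whose reference vector is closest to the average.

    `references` maps a speaker name to a reference feature vector.
    Squared Euclidean distance is an assumption made for this sketch.
    """
    avg = long_term_average(frames)

    def sqdist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    return min(references, key=lambda name: sqdist(avg, references[name]))
```

Passing more frames corresponds to a longer averaging interval, which is the quantity the abstract reports as strongly affecting accuracy.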

Book•

Automatic Speech and Speaker Recognition

N. Rex Dixon, Thomas B. Martin

01 Aug 1979

30 citations

Journal Article•DOI•

Inference of a knowledge source for the recognition of nasals in continuous speech

R. De Mori¹, Ryszard Gubrynowicz, Pietro Laface•Institutions (1)

University of Turin¹

01 Oct 1979-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Results obtained for four male speakers show how accounting for coarticulation effects gives substantially better performances than previous approaches.

Abstract: A system for the automatic recognition of bilabial /m/ and alveolar /n/ in vowel-consonant-vowel utterances extracted from continuous speech is presented. It is based on a syntactic pattern recognition approach and the use of fuzzy relations for evaluating phonemic hypotheses. The knowledge source, based on very simple transition networks with associated simple semantic rules, is inferred from experiments. Results obtained for four male speakers are presented together with an acoustic-phonetic motivation of the approach used. These show how accounting for coarticulation effects gives substantially better performances than previous approaches.

20 citations

Journal Article•DOI•

Inhibition of the automatic storage of speaker's voice.

Ralph E. Geiselman¹•Institutions (1)

University of California, Los Angeles¹

01 May 1979-Memory & Cognition

TL;DR: Results from the present experiment suggest that subjects have the option to prevent the speaker’s-voice attribute from being stored with the contents of what is said when such processing would interfere with other cognitive operations.

Abstract: The voice-connotation hypothesis of Geiselman and Bellezza (1976, 1977) states that a speaker’s voice is sometimes remembered without intent because the connotation of the voice automatically influences the meaning of what is said. Results from the present experiment suggest that subjects have the option to prevent the speaker’s-voice attribute from being stored with the contents of what is said when such processing would interfere with other cognitive operations.

18 citations

Proceedings Article•DOI•

A new system for continuous speech recognition - preliminary results

Stephen E. Levinson¹, A. Rosenberg•Institutions (1)

Bell Labs¹

01 Apr 1979

TL;DR: A speaker dependent system for recognizing carefully articulated continuous speech that accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task and achieves 75% sentence recognition.

Abstract: A speaker dependent system for recognizing carefully articulated continuous speech is described. The system accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task. The system is controlled by a finite state parser which generates word candidates and establishes their temporal locations in hypothetical sentences. The word candidates are evaluated by an LPC distance measure and a dynamic programming algorithm which nonlinearly time aligns isolated word reference templates with the input speech stream. The input is recognized as the hypothetical sentence having the lowest distance according to a well-defined criterion. In a preliminary test based on 100 sentences spoken over dialed up telephone lines by two male talkers, 90% word accuracy, resulting in 75% sentence recognition, was achieved.

15 citations

Journal Article•DOI•

Automatic speaker identification for a large population

H. Dante¹, V. Sarma¹•Institutions (1)

Indian Institute of Science¹

01 Jun 1979-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Analysis and design of a two-stage pattern classifier for speaker identification in a population of 30 is considered. The first stage uses a subset of the total feature set to reject most classes, and the second stage uses the remaining features for an absolute identification of the speaker.

Abstract: Design of speaker identification schemes for a small number of speakers (around 10) with a high degree of accuracy in a controlled environment is a practical proposition today. When the number of speakers is large (say, above 20 or 30), many of these schemes cannot be directly utilized as both recognition error and computation time increase monotonically with population size. A multistage classification technique gives better results when the number of speakers is large. Such a scheme may be implemented as a decision tree classifier in which the final decision is made only after a predetermined number of stages. In the present paper, analysis and design of a two-stage pattern classifier is considered. At the first stage a large number of classes, to which the given pattern cannot belong, is rejected. This is to be done using a subset of the total feature set. Also, the accuracy of such a rejection process must be very high, consistent with the overall accuracy desired. This initial classification gives a subset of the total classes, which has to be carefully considered at the next stage utilizing the remaining features for an absolute identification of the class label (the speaker's identity). The procedure is illustrated by designing and testing a two-stage classifier for speaker identification in a population of 30.

13 citations
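
The two-stage scheme described in the abstract above can be sketched as follows: a first pass ranks speakers on a subset of the features and rejects all but a few, and a second pass decides among the survivors using the remaining features. Keeping a fixed number of survivors, the squared Euclidean distance, and the names `two_stage_identify`, `stage1_idx`, `stage2_idx`, and `keep` are assumptions for illustration; the paper's actual rejection rule is not given in the abstract.

```python
def two_stage_identify(x, refs, stage1_idx, stage2_idx, keep=3):
    """Two-stage nearest-reference decision (illustrative sketch).

    x          -- feature vector of the unknown speaker
    refs       -- dict mapping speaker name to a reference feature vector
    stage1_idx -- feature indices used for the coarse rejection stage
    stage2_idx -- feature indices used for the final decision
    keep       -- number of speakers retained after stage 1 (assumption)
    """
    def dist(u, v, idx):
        return sum((u[i] - v[i]) ** 2 for i in idx)

    # Stage 1: reject every speaker outside the `keep` closest on the subset
    ranked = sorted(refs, key=lambda s: dist(x, refs[s], stage1_idx))
    survivors = ranked[:keep]
    # Stage 2: decide among survivors using the remaining features
    return min(survivors, key=lambda s: dist(x, refs[s], stage2_idx))
```

The second stage only evaluates its features for the survivors, so the cost of those measurements grows with `keep` rather than with the full population size.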

Journal Article•DOI•

More People Are Talking to Computers as Speech Recognition Enters the Real World

Arthur L. Robinson

16 Feb 1979-Science

12 citations

Proceedings Article•DOI•

Toward the development of practical methods of evaluating speaker recognizability

W. Voiers

01 Apr 1979

TL;DR: An intensive experimental screening of more than 550 potential voice descriptors permitted the development of a sensitive, comprehensive voice-rating form which was used by an experienced listening crew to characterize voice samples from 80 male speakers.

Abstract: The development of practical methods of predicting speaker recognizability in communication systems has had to await the development of an adequate perceptual voice taxonomy. Previous efforts have been hampered by inadequate voice samples and inappropriate scaling techniques. An intensive experimental screening of more than 550 potential voice descriptors permitted the development of a sensitive, comprehensive voice-rating form which was used by an experienced listening crew to characterize voice samples from 80 male speakers. Factor analysis was used to identify the elementary perceptual parameters of individual differences in speech and to classify voices in a perceptual voice trait space. Implications for the development of tests of speaker recognizability are discussed.

11 citations

Journal Article•DOI•

Memory and decision in speaker recognition

Roger W. Brown¹•Institutions (1)

University of Edinburgh¹

01 Nov 1979-International Journal of Human-Computer Studies / International Journal of Man-Machine Studies

TL;DR: Two categorizations are presented of aspects of the speaker recognition field; the first examines the memory systems involved in experimental tasks and is based on a critical account of the taxonomy proposed by Bricker & Pruzansky (1976).

Abstract: Two categorizations are presented of aspects of the speaker recognition field. The first examines the memory systems involved in experimental tasks and is based on a critical account of the taxonomy proposed by Bricker & Pruzansky (1976). The second deals with the decisions which listeners are required to make in the experimental situation. Finally, the differences between the experimental situation and the real world are examined.

Book•

Automatic speech & speaker recognition

N. Rex Dixon, Thomas B. Martin

01 Jan 1979

Patent•

Standard pattern input system for voice identification

Riyouhei Nakatsu, Hiromi Nagashima

17 Aug 1979

Proceedings Article•DOI•

An approach to speaker normalization in an automatic speech recognition system

J. Jaschul

01 Apr 1979

TL;DR: Under the present restriction to vowel spectra, adaptation methods based on spectral amplitude weighting and on spectral shifting are investigated. A special method made it possible to adapt test spectra class-specifically.

Abstract: An automatic speech recognition system based on the reference set of a single speaker can be extended for use by several speakers by applying appropriate preprocessing transformations. These transformations adapt the incoming patterns of a new speaker to the patterns of the reference set. Under the present restriction to vowel spectra, adaptation methods based on spectral amplitude weighting and on spectral shifting are investigated. A special method made it possible to adapt test spectra class-specifically.

Proceedings Article•DOI•

Speech characterization from a rough spectral analysis

J. Lienard

01 Apr 1979

TL;DR: It is proposed to characterize the speech short-term spectrum with a reduced number of parameters (4 to 7) computed from a rough spectral analysis that permits a correct classification of the steady-state French speech sounds pronounced by different speakers.

Abstract: Tracking and identifying the formants in order to perform speech recognition is a time-consuming, error-prone and speaker-dependent operation. It is proposed to characterize the speech short-term spectrum with a reduced number of parameters (4 to 7) computed from a rough spectral analysis. These parameters permit a correct classification of the steady-state French speech sounds (vowels, including nasals, and unvoiced fricatives) pronounced by different speakers. A word recognition experiment based on the same parameters gives good results with words differing from each other by one phoneme only (single speaker, one learning pass).

Proceedings Article•DOI•

Some factors influencing the performances of a speaker recognition system based on LPC

G. Mian¹•Institutions (1)

University of Padua¹

01 Apr 1979

TL;DR: The aim of the present work was to evaluate the performance of an automatic speaker recognition system, based on LPC, on the same speech material recorded in three different conditions: in a quiet room, from dialled-up telephone lines via direct hookup, and via a suction cup tap.

Abstract: Linear prediction parameters are critically dependent upon the short-term spectrum of speech and are therefore sensitive to noise and distortions introduced by transmission and recording systems. The aim of the present work was to evaluate the performance of an automatic speaker recognition system, based on LPC, on the same speech material recorded in three different conditions: in a quiet room, from dialled-up telephone lines via direct hookup, and via a suction cup tap. Each of ten speakers spoke an 8 s long sentence four times over a two-month period. Sentences were manually segmented and performance evaluation was conducted on phonemes, on breath groups and on the whole sentence using a minimum weighted distance classifier.

Proceedings Article•DOI•

Multistage decision schemes for speaker recognition

H. Dante¹, V. Sarma, G. Dattatreya•Institutions (1)

Indian Institute of Science¹

01 Apr 1979

TL;DR: This procedure is formulated as a stochastic optimal control problem and is illustrated by designing a speaker recognition system for 60 speakers with an overall accuracy of 97.2%.

Abstract: Speaker recognition schemes which work satisfactorily for small populations often fail when the number of classes is very large. One way of solving such problems is to use multistage classification schemes. The basic technique is to successively reduce the number of classes in several stages, using one feature at each stage; when the number of classes is less than a predetermined value, the final decision is made. The whole scheme is designed so that the probability of error is fixed at an acceptable level. The computational cost of such a multistage scheme depends on the features used at each stage and the cost of measurement of each feature. The features to be used at each stage are determined so as to reduce the average computational cost of making a decision. This procedure is formulated as a stochastic optimal control problem and is illustrated by designing a speaker recognition system for 60 speakers. The overall accuracy of the system is 97.2%.

Total Voice Speaker Verification.

Robert L Davis, Barbara M. Hydrick, George R. Doddington

01 Jan 1979

TL;DR: A hierarchical clustering algorithm was used, followed by an iterative optimization procedure, to develop a robust speaker-independent, connected digit-sequence recognition capability as the front-end for a speaker verification (voice authentication) program and to install and demonstrate that capability on the Base and Installation Security System Advanced Development Model for speaker verification located at RADC.

Abstract: The objective of this research has been to develop a robust speaker-independent, connected digit-sequence recognition capability as the front-end for a speaker verification (voice authentication) program and to install and demonstrate that capability on the Base and Installation Security System Advanced Development Model for speaker verification located at RADC. In such a system, the correct digit sequence recognition provides the user identification of the claimed identity. Verification is then performed on the same speech data. This total-voice system must recognize connected digits independent of speaker with high reliability. Two sequence constraints aid recognition: two parity checks must be satisfied, and difficult digit pairs were disallowed. A further sequence constraint added to aid verification was that all digits must be different. The selected constraints yield 320 possible sequences. The speech processing strategy features highly reliable time registration and accommodates multiple concurrent hypotheses at various processing levels. Basic to robust speaker-independent recognition is the existence of a set of reference patterns capable of allowing for the speaker's sex and dialect. Rather than arbitrary segmentation of the design data to produce reference patterns, a hierarchical clustering algorithm was used, followed by an iterative optimization procedure.
