
Showing papers on "Speaker recognition published in 1983"


Book
28 Oct 1983
TL;DR: This volume clearly demonstrates that any valid theory of speaker recognition must integrate the approaches of a number of disciplines and it is itself an important step towards that integration.
Abstract: List of tables List of figures Acknowledgements Introduction 1. Perspectives on speaker recognition 2. The bases of between-speaker differences 3. Short term parameters: segments and co-articulation 4. Long term quality 5. Conclusions References Index.

235 citations


Journal ArticleDOI
TL;DR: The results of a new method based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures for discrete utterance speech recognition are presented.
Abstract: The results of a new method are presented for discrete utterance speech recognition. The method is based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures. Separate vector quantization code books are designed from training sequences for each word in the recognition vocabulary. Inputs from outside the training sequence are classified by performing vector quantization and finding the code book that achieves the lowest average distortion per speech frame. The new method obviates time alignment. It achieves 99 percent accuracy for speaker-dependent recognition of a 20-word vocabulary that includes the ten digits, with higher accuracy for recognition of the digit subset. For speaker-independent recognition, the method achieves 88 percent accuracy for the 20-word vocabulary and 95 percent for the digit subset. Background of the method, detailed empirical results, and an analysis of computational requirements are presented.

92 citations
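The per-word codebook classification described above can be sketched in a few lines. The following is a toy illustration, not the paper's implementation: it assumes 2-D feature frames and uses a naive k-means codebook trainer.

```python
import numpy as np

def train_codebook(frames, k=4, iters=20, seed=0):
    """Toy VQ codebook: k-means over a word's training frames."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest code vector, then update centroids.
        d = np.linalg.norm(frames[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = frames[labels == j].mean(axis=0)
    return codebook

def avg_distortion(frames, codebook):
    """Average per-frame distortion of an utterance against one codebook."""
    d = np.linalg.norm(frames[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def classify(frames, codebooks):
    """Pick the word whose codebook yields the lowest average distortion.
    No time alignment is needed, as the abstract notes."""
    return min(codebooks, key=lambda w: avg_distortion(frames, codebooks[w]))
```

Because only the average distortion matters, utterances of different lengths are compared directly, which is what lets the method skip time alignment.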


Patent
27 Jan 1983
TL;DR: In this article, an individual verification apparatus consisting of a verification data file (20), a speech input section (10), a data memory (30), speech recognition unit (40), and a speaker verification unit (50) is described.
Abstract: An individual verification apparatus comprises a verification data file (20), a speech input section (10), a data memory (30), a speech recognition unit (40), and a speaker verification unit (50). In the verification data file key codes set by customers and corresponding reference data for individual verification are registered. Speech of the key code spoken by a customer is processed by the speech input section (10) and the result is stored in the data memory (30). The speech recognition unit (40) recognizes the input key code based on the key code data stored in the data memory (30). The speaker verification unit (50) verifies the customer by comparing the key code data with speech reference data of customers having the recognized key code.

68 citations


Proceedings ArticleDOI
14 Apr 1983
TL;DR: A new technique for text-independent speaker recognition is proposed which uses a statistical model of the speaker's vector quantized speech which retains text-independent properties while allowing considerably shorter test utterances than comparable speaker recognition systems.
Abstract: A new technique for text-independent speaker recognition is proposed which uses a statistical model of the speaker's vector quantized speech. The technique retains text-independent properties while allowing considerably shorter test utterances than comparable speaker recognition systems. The frequently occurring vectors or characters form a model of multiple points in the n-dimensional speech space instead of the usual single-point models. The speaker recognition depends on the statistical distribution of the distances between the speech frames from the unknown speaker and the closest points in the model. Models were generated with 100 seconds of conversational training speech for each of 11 male speakers. The system was able to identify 11 speakers with 96%, 87%, and 79% accuracy from sections of unknown speech of durations of 10, 5, and 3 seconds, respectively. Accurate recognition was also obtained even when there were variations in channels over which the training and testing data were obtained. A real-time demonstration system has been implemented including both training and recognition processes.

66 citations


Journal ArticleDOI
TL;DR: A new model for recognizing the speaker's intended meaning in determining a response is presented, which makes use of the speaker's plan, his beliefs about the domain and about the hearer's relevant capacities.
Abstract: Human conversational participants depend upon the ability of their partners to recognize their intentions, so that those partners may respond appropriately. In such interactions, the speaker encodes his intentions about the hearer's response in a variety of sentence types. Instead of telling the hearer what to do, the speaker may just state his goals, and expect a response that meets these goals at least part way. This paper presents a new model for recognizing the speaker's intended meaning in determining a response. It shows that this recognition makes use of the speaker's plan, his beliefs about the domain and about the hearer's relevant capacities.

58 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: A new technique for use in a word recognition system where word templates are represented as sequences of discrete phoneme-like (pseudo-phoneme) templates which are automatically determined from a training set of word utterances by a clustering technique.
Abstract: This paper describes a new technique for use in a word recognition system. This recognition system is especially effective in speaker-dependent large-vocabulary word recognition based on multiple reference templates. In this system, word templates are represented as sequences of discrete phoneme-like (pseudo-phoneme) templates which are automatically determined from a training set of word utterances by a clustering technique. In speaker-dependent word recognition experiments on 641 city names, 96.3% recognition accuracy was obtained using 256 phoneme-like templates.

56 citations


Journal ArticleDOI
TL;DR: Two experiments investigated how subjects remember paralinguistic speaker-voice information without apparent intent; with the subjects' stated task being only to remember the sentences, incidental memory for which speaker spoke which sentences was facilitated by fabricated personal histories of the speakers.

53 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: It is demonstrated that by using Bayesian techniques, prior knowledge derived from speaker-independent data can be combined with speaker-dependent training data to improve system performance.
Abstract: In order to achieve state-of-the-art performance in a speaker-dependent speech recognition task, it is necessary to collect a large number of acoustic data samples during the training process. Providing these samples to the system can be a long and tedious process for users. One way to attack this problem is to make use of extra information from a data bank representing a large population of speakers. In this paper we demonstrate that by using Bayesian techniques, prior knowledge derived from speaker-independent data can be combined with speaker-dependent training data to improve system performance.

35 citations
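The Bayesian idea above can be illustrated with a minimal sketch (an assumed conjugate-prior parameterization, not necessarily the authors' formulation): a template mean is estimated by shrinking the speaker-dependent sample mean toward a speaker-independent prior, with the prior weighted as a count of "virtual" observations.

```python
import numpy as np

def map_mean(prior_mean, prior_weight, samples):
    """MAP-style estimate of a template mean: blend a speaker-independent
    prior mean with the speaker-dependent sample mean. prior_weight plays
    the role of a virtual observation count (hypothetical parameter name)."""
    n = len(samples)
    sample_mean = np.mean(samples, axis=0)
    return (prior_weight * prior_mean + n * sample_mean) / (prior_weight + n)
```

With few enrollment samples the estimate stays near the population prior; as speaker-dependent data accumulates, it converges to the speaker's own sample mean, which is the effect the paper exploits to shorten training.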


Proceedings ArticleDOI
01 Apr 1983
TL;DR: Results indicate that discrimination between similar sounding words can be greatly improved, and an alternative DTW approach which is able to focus its attention on those parts of a speech pattern which serve to distinguish it from similar patterns is presented.
Abstract: Whole-word pattern matching using dynamic time-warping (DTW) has achieved considerable success as an algorithm for automatic speech recognition. However, the performance of such an algorithm is ultimately limited by its inability to discriminate between similar sounding words. The problem arises because all differences between speech patterns are treated as being equally important, hence the algorithm is particularly susceptible to confusions caused by irrelevant differences. This paper presents an alternative DTW approach which is able to focus its attention on those parts of a speech pattern which serve to distinguish it from similar patterns. A network-type data structure is derived from reference speech patterns, and the separate paths through the network determine the regions where recognition takes place. Results indicate that discrimination between similar sounding words can be greatly improved.

23 citations
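For reference, the baseline whole-word DTW match that this paper improves upon can be written as the standard recurrence. This is a minimal sketch of plain DTW, not the network-based variant the paper proposes.

```python
import numpy as np

def dtw_distance(a, b):
    """Standard dynamic time warping: cumulative cost of the best
    monotonic alignment between two frame sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between frames (Euclidean; frames may be scalars).
            cost = float(np.linalg.norm(np.atleast_1d(a[i - 1]) - np.atleast_1d(b[j - 1])))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Note how every frame contributes equally to the total cost: this is exactly the weakness the paper identifies, since irrelevant differences between similar-sounding words count as much as the discriminating regions.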


Proceedings ArticleDOI
01 Apr 1983
TL;DR: Recognition results on sentences from a 5000-word vocabulary drawn from office correspondence are presented, which comprises the 5000 most frequently occurring words in a data-base of 14,000 office memoranda and letters, and has a perplexity of 90.
Abstract: Recognition results on sentences from a 5000-word vocabulary drawn from office correspondence are presented. The sentences were read with pauses between the words. The vocabulary comprises the 5000 most frequently occurring words in a data-base of 14,000 office memoranda and letters, and has a perplexity of 90, measured from a trigram language model. Experiments were carried out with 6 speakers (4 male, 2 female) in an office environment using a close-talking microphone. The recognition system was automatically trained to each speaker by having the speaker read 100 typical sentences from the office correspondence data-base. Recognition was carried out for each speaker on 20 test sentences, consisting of 299 words. The recognition rate (% words correct) averaged across the 6 speakers was 94.5%.

22 citations
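Perplexity, as quoted above, is the geometric mean of the inverse probabilities the language model assigns to each word. A minimal sketch (the probabilities below are made up for illustration):

```python
import math

def perplexity(probs):
    """Perplexity of a word sequence given per-word model probabilities:
    exp of the average negative log-probability."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)
```

A perplexity of 90 means the model faces, on average, the same uncertainty as choosing uniformly among 90 equally likely words at each position, even though the vocabulary holds 5000.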



Journal ArticleDOI
TL;DR: A speaker-independent segmentation procedure which automatically adapts the classifier to the speaker-dependent effects of coarticulation and is well suited for a speech input where the number of words in a word string is not known to the recognition system.
Abstract: Recognition of connected words can be performed by segmenting the word string automatically into single-word components which are then classified by a single-word recognition system. We propose and investigate a speaker-independent segmentation procedure which is based completely on statistical principles. An estimation algorithm, adapted to the statistical data of the signal parameters, determines the word boundaries. The statistical data are computed from vocabulary-dependent speech samples of different speakers. The segmentation procedure, which operates independently of the single-word recognizer, has been tested with connected digits. The results show that an estimation algorithm based on quadratic polynomials yields a very reliable segmentation. The segmentation procedure is also well suited for a speech input where the number of words in a word string is not known to the recognition system. Based on the above segmentation procedure, we have carried out several recognition experiments on two-to-four-digit strings. The investigations show that the proposed segmentation algorithm provides an efficient tool to tackle the effects of coarticulation between adjacent words. We present a training procedure which automatically adapts the classifier to the speaker-dependent effects of coarticulation.
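One illustrative reading of the quadratic-polynomial idea (an assumption for the sketch, not the authors' statistical estimation algorithm): a candidate word boundary at a local energy minimum can be refined to sub-frame precision by fitting a parabola through the three surrounding samples.

```python
def refine_boundary(energy, i):
    """Refine a candidate boundary at frame i by fitting a quadratic through
    (i-1, i, i+1) and returning the parabola's vertex position."""
    y0, y1, y2 = energy[i - 1], energy[i], energy[i + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(i)  # Degenerate (flat) case: keep the integer frame.
    return i + 0.5 * (y0 - y2) / denom
```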

Proceedings ArticleDOI
T. Iwata, H. Ishizuka, M. Watari, T. Hoshi, M. Mizuno 
01 Jan 1983
TL;DR: A single chip implementing a distance calculator, a dynamic programming equation calculator, and pipelined operations for use in speech recognition; up to 340 isolated words or 40 connected words can be recognized in real time.
Abstract: This report discusses a single chip implementing a distance calculator, a dynamic programming equation calculator, and pipelined operations for use in speech recognition. Up to 340 isolated words or 40 connected words can be recognized in real time.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: The results of the simulation indicate that performance estimates from recognition experiments should be allowed wide error tolerances, and they illustrate the danger of trying too many features on the same database.
Abstract: Experiments are described in automatic, text-independent speaker recognition using three databases: good quality read speech, conversations over simulated telephone links, and conversations over real telephone links. A recognition system is evaluated on this material using a set of features which were believed to have some resistance to transmission degradations, namely, F 0 statistics and statistics of low-order cepstrum coefficient variation. Performance is reasonable on the first two databases but poor on the telephone speech. A new set of features based on the frequencies of peaks in the short-term smoothed spectrum is found to perform better on the telephone speech, presumably because of its greater resistance to noise and nonlinear distortions. A computer simulation of the recognition experiments is described. The results of the simulation indicate that performance estimates from recognition experiments should be allowed wide error tolerances, and they illustrate the danger of trying too many features on the same database.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: A methodology is described to obtain a set of segments and rules that represents adequately the speech performance of a given speaker and how such a segment data base can be used for speech coding at very low bit rate, synthesis from unrestricted text, and continuous speech recognition.
Abstract: A methodology is described to obtain a set of segments and rules that adequately represents the speech performance of a given speaker. This methodology proceeds from an initial set of diphones extracted from a neutral context and modifies this set with larger and/or smaller segments depending on the match with natural utterances. Each segment is stored as a sequence of frames coded using LPC coefficients. An estimate of the likelihood of timescale distortion is associated with each frame. It represents knowledge on temporal variability that can be used by synthesis rules and/or pattern matching algorithms. It is then shown how such a segment database can be used for 1) speech coding at very low bit rate (~400 bits/s), 2) synthesis from unrestricted text, and 3) continuous speech recognition.


Proceedings ArticleDOI
J. Wolf1, M. Krasner, K. Karnofsky, Richard Schwartz, S. Roucos 
01 Apr 1983
TL;DR: Preliminary results show that the probabilistic methods perform significantly better than a minimum-distance classifier for the multi-session paradigm.
Abstract: In this paper, we present the preliminary performance of four methods for text-independent speaker identification using speech transmitted over radio channels. In a previous paper [1], we showed that for both laboratory-quality and simulated noisy-channel data in a single-session paradigm, new probabilistic classifiers yielded performance superior to that of a minimum distance classifier. We have recently compiled a speech database consisting of speech transmissions over a radio-channel. The lower quality and higher variability of this database differ markedly from the laboratory-quality databases often used in speech processing research. We present preliminary results with the same four methods of text-independent speaker identification using the radio-channel database with several experimental paradigms including multi-session paradigms. These results show that the probabilistic methods perform significantly better than a minimum-distance classifier for the multi-session paradigm.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: This paper addresses the use of linear frequency warping for template normalization and describes both a technique for estimating the long-term distribution of the frequencies of a talker's formants and a techniques for automatically predicting an optimal linear frequency warp.
Abstract: In a template-based, speaker-independent, speech recognition system, stored templates may be used in matching the speech of new users. For optimal results, templates should be carefully selected and proper normalization algorithms should be applied for each new talker. This paper addresses the use of linear frequency warping for template normalization and describes both a technique for estimating the long-term distribution of the frequencies of a talker's formants and a technique for automatically predicting an optimal linear frequency warp.
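A linear frequency warp of a magnitude spectrum, plus a brute-force search for the warp factor that best matches a reference template, can be sketched as follows. This is a toy illustration: the paper predicts the warp automatically from formant statistics rather than searching.

```python
import numpy as np

def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum along a linearly scaled frequency axis:
    warped bin k takes the value at original bin k / alpha."""
    bins = np.arange(len(spectrum), dtype=float)
    return np.interp(bins / alpha, bins, spectrum)

def best_warp(spectrum, reference, alphas):
    """Brute-force search for the warp factor minimizing spectral distance
    to a reference template."""
    return min(alphas,
               key=lambda a: np.linalg.norm(warp_spectrum(spectrum, a) - reference))
```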




Proceedings Article
08 Aug 1983
TL;DR: The FOPHO (Foreign Phonetician) speech recognition project concerns the development of a system to produce a reasonably high-quality phonetic transcription output from continuous speech input.
Abstract: The FOPHO (Foreign Phonetician) speech recognition project concerns the development of a system to produce a reasonably high-quality phonetic transcription output from continuous speech input. The system is developed to perform in a way which approximates the actions of a phonetician trying to transcribe a foreign tongue (in the case of FOPHO, Australian English). Because of this central philosophy, FOPHO is a very interactive system and has facilities for automatic learning and analysis of its own performance. Good-quality recognition is achieved through algorithms which are very context-dependent and which are sensitive to a variety of possible productions of similar sounds, even though the system itself is speaker-independent.

Journal ArticleDOI
TL;DR: This system, which is designed to provide three major functions — intelligent interface, knowledge-base management, and problem solving — will provide communications with the computer in a form natural to humans, particularly via speech and graphics.
Abstract: Work in the area of speech recognition by computer has been taking place for about 30 years — almost since the inception of the computer itself. The results of these efforts can be seen today in a number of products capable of various degrees of speech recognition. Although the systems used in these products exhibit limitations when compared to people's ability to recognize and respond to speech, the use of speech in our everyday work is so natural and desirable that these systems are finding numerous applications. Speech recognition has even become important on a global level, as can be seen by the integral role it will play in the proposed fifth-generation Japanese computer system. This system, which is designed to provide three major functions — intelligent interface, knowledge-base management, and problem solving — will provide communications with the computer in a form natural to humans, particularly via speech and graphics.

Proceedings ArticleDOI
E. Bronson1
14 Apr 1983
TL;DR: A discrete utterance recognition technique which applies formal language theory to a symbol string derived from the speech input using stored context-free grammars for the allowed vocabulary is described.
Abstract: This paper describes a discrete utterance recognition technique which applies formal language theory to a symbol string derived from the speech input. Analysis is performed to obtain a representation of the input utterance in terms of acoustically consistent labeled regions. Syntactic pattern recognition is then used to parse this representation of the input word using stored context-free grammars for the allowed vocabulary. Preliminary results are reported.

Proceedings ArticleDOI
Aaron E. Rosenberg1
14 Apr 1983
TL;DR: A probabilistic model is developed to account for the error rate behavior of isolated word speech recognition systems and results indicate that two-way mixture distributions account quite well for the experimental performance results.
Abstract: A probabilistic model is developed to account for the error rate behavior of isolated word speech recognition systems. Two kinds of error are examined, confusion error, an a priori characterization of a recognizer which measures differences between words, and recognition rank error, an a posteriori characterization, which, in addition to taking into account differences between words, accounts for differences between different tokens of the same word. It is shown that these kinds of error can be modelled by describing recognition trials as Bernoulli trials. Good models of error rate behavior as a function of vocabulary size can be obtained if the distributions of confusion or rank number are considered to be mixtures of binomial distributions. The data obtained from a recent experiment in isolated word recognition with a large vocabulary, (1109 words), are used to evaluate the model. Model functions based on mixture distributions are fit by means of an optimization algorithm to experimental error rate functions obtained from each of six talkers and three partitions of the vocabulary. The results indicate that two-way mixture distributions account quite well for the experimental performance results.
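One simple reading of the binomial-mixture idea (an illustrative parameterization, not necessarily the paper's exact functional form): each of the V-1 competing words independently out-scores the correct word with some confusion probability p, and accuracy as a function of vocabulary size is a weighted mixture over two such probabilities.

```python
def mixture_accuracy(v, weights, confusion_probs):
    """Recognition accuracy vs. vocabulary size v under a mixture of
    Bernoulli confusion probabilities: a trial succeeds only if none of
    the v - 1 competitors out-scores the correct word."""
    return sum(w * (1.0 - p) ** (v - 1)
               for w, p in zip(weights, confusion_probs))
```

A single binomial decays too uniformly with vocabulary size; the two-way mixture lets a small "hard" component dominate the error rate at large vocabularies, which matches the paper's finding that two-way mixtures fit the 1109-word data well.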

Journal ArticleDOI
TL;DR: A low cost, microcomputer-based voice recognition device makes a convenient input channel for an interactive model of a manufacturing system and potential exists for useful voice control of simulations in the near future.
Abstract: A low cost, microcomputer-based voice recognition device makes a convenient input channel for an interactive model of a manufacturing system. The problems with current hardware are its limited capabilities and unreliable operation. However, the potential exists for useful voice control of simulations in the near future

Journal ArticleDOI
TL;DR: The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control and an application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech.
Abstract: This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.

01 Sep 1983
TL;DR: The results of tests for both the speaker-dependent and speaker-independent cases indicate that phase may be an important feature to consider in the development of word recognition systems.
Abstract: The use of phase-only representations of speech for isolated word recognition is explored. Until recently the ear was thought to be short-term phase insensitive. However, short-term phase-only reconstructed speech has been shown to retain much of the intelligibility of the original signal. Using cepstral and analytic signal processing techniques, a system for isolated word recognition is developed. The results of tests for both the speaker-dependent and speaker-independent cases indicate that phase may be an important feature to consider in the development of word recognition systems.
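A phase-only representation of the kind described can be sketched with an FFT: keep each bin's phase, flatten the magnitude to unity, and invert. This is a toy illustration of the representation only, not the study's cepstral/analytic-signal pipeline.

```python
import numpy as np

def phase_only(signal):
    """Phase-only reconstruction: discard the magnitude spectrum, keep the
    phase, and invert back to the time domain."""
    spec = np.fft.rfft(signal)
    flat = np.exp(1j * np.angle(spec))  # unit magnitude, original phase
    return np.fft.irfft(flat, n=len(signal))
```

An impulse at time zero already has unit magnitude and zero phase in every bin, so it passes through unchanged, which makes a convenient sanity check.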

Proceedings ArticleDOI
14 Apr 1983
TL;DR: In this work several distance classifiers are evaluated for use in text-independent speaker identification and it is found that both the maximum a posteriori probability criterion and the correlation distance measure yield extremely poor results.
Abstract: A survey of research efforts in the area of speaker recognition indicates that for the same choice of speaker-dependent speech parameters the recognition accuracy is significantly affected by the distance measure used. In this work several distance classifiers are evaluated for use in text-independent speaker identification. The four distance measures investigated are the Mahalanobis distance, maximum a posteriori probability, the nearest neighbor criterion, and the correlation distance measure. It is found that both the maximum a posteriori probability criterion and the correlation distance measure yield extremely poor results. The Mahalanobis distance and the nearest neighbor criterion yield relatively poor results (error ~20-30%), with the former consistently superior to the latter. It is shown that these scores can be improved through a proposed variation of the nearest neighbor method.
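Of the four measures compared, the Mahalanobis distance can be sketched as follows, with hypothetical speaker models given as (mean, covariance) pairs; the paper's actual feature set and model estimation are not reproduced here.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of a feature vector x from a speaker model,
    i.e. Euclidean distance rescaled by the model's covariance."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def identify(x, models):
    """Minimum-distance speaker identification: pick the speaker whose
    model is nearest in Mahalanobis distance."""
    return min(models, key=lambda s: mahalanobis(x, *models[s]))
```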

Journal ArticleDOI
TL;DR: In this paper, a Speaker Recognizability Test (SRT) was designed which tries to establish how well a given communications system preserves a speaker's identity, and no attempt was made to identify the cues used by listeners for speaker recognition.
Abstract: A Speaker Recognizability Test (SRT) has been designed which tries to establish how well a given communications system preserves a speaker's identity. Contrary to previous efforts, no attempt is made to identify the cues used by listeners for speaker recognition. Instead, listeners are asked directly to identify a speaker who says an utterance. The test is constructed as follows: Several sentences are collected from five male and five female speakers. One sentence from each speaker is used as reference. The listening team consists of ten listeners. Each listener is presented 20 different sentences and is asked to identify the speaker of each one of them by comparing it with the ten reference sentences. Among the issues considered in the design of the test is the choice of speakers, the use of reference sentences from the same or different sessions of data collection, and the use of processed or unprocessed speech for reference.