
Showing papers on "Speaker recognition published in 1985"


Proceedings ArticleDOI
26 Apr 1985
TL;DR: A vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker, and a set of such codebooks was then used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum-distance (distortion) classification rule.
Abstract: In this study a vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker. A set of such codebooks was then used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule. A series of speaker recognition experiments was performed using a 100-talker (50 male and 50 female) telephone recording database consisting of isolated digit utterances. For ten random but different isolated digits, over 98% speaker identification accuracy was achieved. The effects on performance of different system parameters, such as codebook sizes, the number of test digits, phonetic richness of the text, and difference in recording sessions, were also studied in detail.
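The minimum-distance (distortion) rule described above can be sketched in a few lines: each speaker's codebook quantizes the test frames, and the utterance is assigned to the speaker whose codebook yields the lowest average distortion. This is a toy illustration with made-up 2-D features and two hypothetical speakers, not the paper's 100-talker system.

```python
def distortion(frame, codebook):
    """Squared-error distortion of one frame against its nearest codeword."""
    return min(sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook)

def identify(utterance, codebooks):
    """Return the speaker whose codebook gives minimum average distortion."""
    def avg_distortion(speaker):
        cb = codebooks[speaker]
        return sum(distortion(fr, cb) for fr in utterance) / len(utterance)
    return min(codebooks, key=avg_distortion)

# Toy codebooks for two hypothetical speakers (illustrative values only).
codebooks = {
    "spk_a": [(0.0, 0.0), (1.0, 1.0)],
    "spk_b": [(5.0, 5.0), (6.0, 6.0)],
}
frames = [(0.9, 1.1), (0.1, -0.2), (1.2, 0.8)]   # resembles spk_a
print(identify(frames, codebooks))                # spk_a
```

In a real system the codebooks would be trained per speaker with a clustering procedure (e.g. k-means/LBG) on labelled training frames; here they are simply written out by hand.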

493 citations


Proceedings ArticleDOI
Richard Schwartz1, Y. L. Chow1, Owen Kimball1, S. Roucos1, M. Krasner1, John Makhoul1 
26 Apr 1985
TL;DR: The combination of general spectral information and specific acoustic-phonetic features is shown to result in more accurate phonetic recognition than either representation by itself.
Abstract: This paper describes the results of our work in designing a system for phonetic recognition of unrestricted continuous speech. We describe several algorithms used to recognize phonemes using context-dependent Hidden Markov Models of the phonemes. We present results for several variations of the parameters of the algorithms. In addition, we propose a technique that makes it possible to integrate traditional acoustic-phonetic features into a hidden Markov process. The categorical decisions usually associated with heuristic acoustic-phonetic algorithms are replaced by automated training techniques and global search strategies. The combination of general spectral information and specific acoustic-phonetic features is shown to result in more accurate phonetic recognition than either representation by itself.

367 citations


Journal ArticleDOI
01 Nov 1985
TL;DR: A discussion of inherent performance limitations, along with a review of the performance achieved by listening, visual examination of spectrograms, and automatic computer techniques, attempts to provide a perspective with which to evaluate the potential of speaker recognition and productive directions for research into and application of speaker recognition technology.
Abstract: The usefulness of identifying a person from the characteristics of his voice is increasing with the growing importance of automatic information processing and telecommunications. This paper reviews the voice characteristics and identification techniques used in recognizing people by their voices. A discussion of inherent performance limitations, along with a review of the performance achieved by listening, visual examination of spectrograms, and automatic computer techniques, attempts to provide a perspective with which to evaluate the potential of speaker recognition and productive directions for research into and application of speaker recognition technology.

350 citations


PatentDOI
TL;DR: A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified, which enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word.
Abstract: A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor the alternate guesses was correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
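The per-frame pruning step described above (combine each active word's acoustic likelihood with a language-model score based on the previously recognized word, then drop words whose combined score falls below a threshold) can be sketched as follows. All words, scores, and the beam width are invented for illustration; this is not the patent's actual scoring.

```python
def prune(active, lm_logprob, prev_word, beam=5.0):
    """Keep only words whose combined score is within `beam` of the best.

    active      -- {word: acoustic log-likelihood so far}
    lm_logprob  -- {(previous_word, word): language-model log-probability}
    """
    combined = {w: ll + lm_logprob[(prev_word, w)] for w, ll in active.items()}
    best = max(combined.values())
    return {w: s for w, s in combined.items() if s >= best - beam}

# Hypothetical active vocabulary after some frame, given previous word "the".
active = {"cat": -10.0, "hat": -11.0, "zebra": -25.0}
lm = {("the", "cat"): -1.0, ("the", "hat"): -2.0, ("the", "zebra"): -8.0}
survivors = prune(active, lm, "the")
print(sorted(survivors))   # ['cat', 'hat']  -- 'zebra' is pruned
```

This beam-style pruning keeps the active vocabulary small from frame to frame, which is what makes evaluating a large vocabulary in real time feasible.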

208 citations


Journal ArticleDOI
01 Nov 1985
TL;DR: The nature of variabilities in speech is discussed, the kinds of speech knowledge that may help us understand those variabilities are described, and specific procedures are advocated for the increased utilization of speech knowledge in automatic speech recognition.
Abstract: In automatic speech recognition, the acoustic signal is the only tangible connection between the talker and the machine. While the signal conveys linguistic information, this information is often encoded in such a complex manner that the signal exhibits a great deal of variability. In addition, variations in environment and speaker can introduce further distortions that are linguistically irrelevant. This paper has three aims: 1) to discuss the nature of variabilities; 2) to describe the kinds of speech knowledge that may help us understand variabilities; and 3) to advocate and suggest specific procedures for the increased utilization of speech knowledge in automatic speech recognition.

186 citations


Journal Article
TL;DR: The technique of electroglottography is reviewed from the perspective of a laboratory instrument for assessing laryngeal function, a device to assist speech and speaker recognition, and a potential diagnostic aid in the clinic.
Abstract: The technique of electroglottography is reviewed from the perspective of a laboratory instrument for assessing laryngeal function, a device to assist speech and speaker recognition, and as a potential diagnostic aid in the clinic. A description of the electronic functioning of the electroglottograph (EGG) is provided. Considerable emphasis is given to contemporary research which has focused on laryngeal assessment using the EGG. Methods for validating and aiding the interpretation or reading of the EGG are discussed, including photoglottography, stroboscopy, ultrahigh-speed laryngeal cinematography, and others. The relationship of the EGG to glottal area and glottal volume velocity estimated by inverse filtering is presented. An elementary model of the EGG is described and used to predict characteristic features of the EGG waveform. Clinical data as well as data obtained from subjects with a normal functioning larynx are analyzed. Applications of the EGG to speech processing are outlined, including real-time detection of voicing, voiced and unvoiced speech segments, and silence intervals. The EGG device has potential for assisting speech and speaker recognition systems in certain applications.

177 citations


Journal ArticleDOI
TL;DR: Tape recordings of 24 speakers conversing over an unprocessed channel and over an LPC voice processing system were subjected to listening tests, suggesting that frequently voiced concerns about speaker recognition over narrow‐band voice communication systems may not be justified.
Abstract: Tape recordings of 24 speakers conversing over an unprocessed channel and over an LPC voice processing system were subjected to listening tests. The listeners were 24 co‐workers who attempted to identify each speaker from a group of about 40 people working in the same branch. Prior to the recognition test, each of the listeners also rated his or her familiarity with each of the speakers and the distinctiveness of each speaker’s voice. There was some loss in voice recognition over LPC, but the recognition accuracy was still quite high (69% vs 88% for unprocessed voices), suggesting that frequently voiced concerns about speaker recognition over narrow‐band voice communication systems may not be justified. Talker familiarity was significantly correlated with correct identifications. There was no significant correlation between the rated distinctiveness of the speaker and correct identifications. However, familiarity and distinctiveness ratings were highly correlated. This suggests that people consider a familiar voice to be distinctive regardless of whatever characteristics might make that particular voice stand out in a crowd.

45 citations


Journal Article
TL;DR: Results indicate that the voice recognition system might be appropriate for rehabilitation programs though further technologic refinement of the device would increase its effectiveness.

Proceedings ArticleDOI
Frederick Jelinek1
01 Apr 1985
TL;DR: The architecture of an experimental, real-time, isolated-word, speech recognition system with a 5,000-word vocabulary which can be used for dictating office correspondence is described and some recent experimental results obtained are given.
Abstract: The Speech Recognition Group at IBM, Yorktown Heights, has recently completed the implementation of an experimental, real-time, isolated-word, speech recognition system with a 5,000-word vocabulary which can be used for dictating office correspondence. Typical recognition accuracy is greater than 94% correct word recognition for words within the vocabulary. We first describe the architecture of this system, and then give some recent experimental results obtained with it for read and spontaneously dictated speech from five speakers.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: The methods found to be most effective rely on the training process to incorporate channel variability, and it is shown that the direct approach, of using simple channel-invariant features, can discard much speaker dependent information.
Abstract: In this paper, we examine several methods for text-independent speaker identification of telephone speech with limited-duration data. The issue addressed is the assessment of channel characteristics, especially linear aspects, and methods for improving speaker identification performance when the speaker to be identified is on a different telephone channel than the data used for training. We show experimental evidence illustrating the cross-channel problem and also show that the direct approach, of using simple channel-invariant features, can discard much speaker-dependent information. The methods we have found to be most effective rely on the training process to incorporate channel variability.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: An application of source coding to speaker recognition is described, where each speaker is represented by a sequence of vector quantization codebooks; known input utterances are classified using these codebook sequences and the resulting classification distortion is compared to a rejection threshold.
Abstract: An application of source coding to speaker recognition is described. The method is text-dependent - the text spoken is known, and the problem is to determine who said it. Each speaker is represented by a sequence of vector quantization codebooks; known input utterances are classified using these codebook sequences and the resulting classification distortion is compared to a rejection threshold. On a 16 speaker test population with an additional 111 imposters, this method achieved a false rejection rate of 0.8%, an imposter acceptance rate of 1.8%, and within the 16 speakers, an identification error rate of 0.0%.
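The accept/reject decision described above can be sketched as follows. For simplicity a single codebook stands in for the paper's per-utterance sequence of codebooks, and all vectors and the rejection threshold are illustrative values, not the paper's.

```python
def avg_distortion(frames, codebook):
    """Average squared-error distortion of frames against nearest codewords."""
    def d(frame):
        return min(sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook)
    return sum(d(fr) for fr in frames) / len(frames)

def verify(frames, codebook, threshold=0.5):
    """Accept the identity claim only if the distortion stays under threshold."""
    return avg_distortion(frames, codebook) <= threshold

claimed = [(0.0, 0.0), (1.0, 1.0)]     # hypothetical codebook of claimed speaker
genuine = [(0.1, 0.1), (0.9, 1.0)]     # frames close to the claimed codebook
imposter = [(4.0, 4.0), (5.0, 5.0)]    # frames far from it
print(verify(genuine, claimed), verify(imposter, claimed))  # True False
```

The threshold trades off the two error rates the paper reports: lowering it reduces imposter acceptances at the cost of more false rejections of true speakers.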

Proceedings ArticleDOI
01 Apr 1985
TL;DR: A method for speaker dependent connected speech recognition based on phonemic units is described, in which each phoneme is characterized by a very simple 3-state Hidden Markov Model which is trained on connected speech by a Viterbi algorithm.
Abstract: In this paper, a method for speaker dependent connected speech recognition based on phonemic units is described. In this recognition system, each phoneme is characterized by a very simple 3-state Hidden Markov Model (HMM) which is trained on connected speech by a Viterbi algorithm. Each state has associated with it a continuous (Gaussian) or discrete probability density function (pdf). With the phonemic models so obtained, the recognition is then performed either directly at word level (by the reconstruction of reference words from the models of the constituting phonemes) or via a phonemic labelling. Good results are obtained both with a German ten-digit vocabulary (20 phonemes) and with a French 80-word vocabulary (36 phonemes).
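The Viterbi alignment at the heart of such a 3-state left-to-right phoneme model can be sketched with discrete emission pdfs. All transition and emission probabilities below are invented for illustration, and the training step (re-estimating the pdfs from the aligned frames) is omitted.

```python
import math

def viterbi(obs, trans, emit, n_states=3):
    """Best state path through a left-to-right HMM, in log-prob arithmetic."""
    delta = [emit[0][obs[0]]] + [-math.inf] * (n_states - 1)  # must start in state 0
    paths = [[0]] + [[] for _ in range(n_states - 1)]
    for o in obs[1:]:
        new_delta, new_paths = [], []
        for s in range(n_states):
            cands = [(delta[s] + trans[s][s], s)]                       # self-loop
            if s > 0:
                cands.append((delta[s - 1] + trans[s - 1][s], s - 1))   # advance
            best, prev = max(cands)
            new_delta.append(best + emit[s][o])
            new_paths.append(paths[prev] + [s])
        delta, paths = new_delta, new_paths
    return paths[-1]   # best path ending in the final state

log = math.log
# Left-to-right transitions: each state loops or moves one state forward.
trans = [[log(0.5), log(0.5), -math.inf],
         [-math.inf, log(0.5), log(0.5)],
         [-math.inf, -math.inf, log(1.0)]]
# Discrete emission log-probs per state for the symbols 'a' and 'b'.
emit = [{"a": log(0.9), "b": log(0.1)},
        {"a": log(0.5), "b": log(0.5)},
        {"a": log(0.1), "b": log(0.9)}]
print(viterbi(["a", "a", "b", "b"], trans, emit))   # [0, 1, 2, 2]
```

Viterbi training then alternates this alignment with re-estimation of each state's pdf from the frames assigned to it.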

Journal ArticleDOI
TL;DR: This article pointed out the logical fallacy in Douglas and Gibbins' argument that self-other and acquaintance-other recognition errors are often instances of self-deception, and they presented no evidence that either type of recognition error was not an instance of self deception.
Abstract: Douglas and Gibbins (1983) recently argued that our demonstration that errors in self-other recognition are often instances of self-deception was inadequate. In their study, they found that both self-other and acquaintance-other recognition errors met two of the four criteria we had offered as necessary and sufficient for ascribing self-deception. They presented no evidence that either type of recognition error was not an instance of self-deception. Here we describe the original basis of our demonstration and point out the logical fallacy in Douglas and Gibbins' argument.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: A speaker-independent recognition method for telephone speech response systems is described; the category of the nearest reference pattern is taken as the result, and the recognition accuracy on non-training voice data was 95.8% with automatic segmentation.
Abstract: This paper describes the recognition method, reference pattern generation method, and evaluation of speaker-independent recognition for telephone speech response systems. The input utterance is analyzed by 19-channel BPFs. The power and vocal cord source characteristics are normalized. The time normalization is realized by linearly compressing or expanding to 32 frames. The speech pattern undergoes pattern matching with male and female reference patterns, and the category of the nearest reference pattern is taken as the result. It is necessary to optimize the reference patterns so that the speech can be correctly recognized in spite of differences in formant frequencies and slight segmentation errors. To optimize the reference patterns, the recognition of the training patterns and updating of the reference patterns are repeated. A total of 256 male and female reference patterns were generated. The speech recognition accuracy of this method in recognizing non-training voice data was 95.8% with automatic segmentation.

Journal ArticleDOI
TL;DR: A text‐independent speaker clustering approach to speaker‐independent speaker recognition through vector quantization (VQ) was investigated, where the distortion value was used as a clustering measure.
Abstract: A text‐independent speaker clustering approach to speaker‐independent speaker recognition through vector quantization (VQ) was investigated, where the distortion value was used as a clustering measure. To show the possibility of the text‐independent speaker clustering, speaker recognition experiments were carried out using the Harvard sentence database. Nine male speakers uttered ten different Harvard sentences each. Codebooks were generated from the first five sentences for each speaker using the Weighted Likelihood Ratio (WLR) measure through LPC analysis. Using 128 vectors in each codebook, a speaker recognition rate of 98% was attained on the latter five Harvard sentences. Effects of codebook size and input length are also discussed. The above approach based on framewise VQ only utilizes the static distribution of LPC spectra. VQ for multiframe codebooks was used to represent the coarticulation units. The results of speaker recognition experiments based on multi‐frame codebooks will be compared with fixed-length VQ approaches.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: This paper describes a speaker-independent large-vocabulary speech recognition system based on phoneme recognition, which employs LPC cepstrum coefficients as the feature parameter and a statistical distance measure between an input pattern and phoneme reference templates.
Abstract: This paper describes a speaker-independent large-vocabulary speech recognition system based on phoneme recognition. Phoneme recognition employs LPC cepstrum coefficients as the feature parameter and a statistical distance measure between an input pattern and a phoneme reference template. Using power dips of the low and high frequency ranges, similarity to an unvoiced feature, and similarity to a nasal feature, the consonant segments are detected. The discrimination of phonemes is performed individually for vowels, semi-vowels and consonants. The phoneme sequence resulting from phoneme recognition is matched with each item of the word dictionary, and the item with the highest similarity in the dictionary is output as the recognition result. The average phoneme recognition score is 81.4% for 212 words uttered by forty speakers including males and females; 90.6% for vowels, 78.0% for semivowels and 71.9% for consonants. The average score of word recognition is 95.6% for 274 Japanese city names uttered by forty speakers.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: A very positive experience is reported with a system based on very short sub-word units, called "diphones"; the problems related to storage requirements, discrimination of similar words, and training time are much alleviated.
Abstract: Almost all CSR systems presently in practical use are based on whole-word template matching. Although their performances are quite high, a few problems arise due to the use of whole words as basic units. They are related to storage requirements, coarticulation effects at the junction between words, discrimination of similar words and training time, especially for large vocabularies. In this paper we report a very positive experience which is being made with a system based on very short sub-word units, called "diphones". With the approach described in this paper, the above mentioned problems are much alleviated, with no penalty on performance. The first part is devoted to the presentation and discussion of the peculiar characteristics of the diphones, of the language model based on them and of the overall recognition system. Then a set of procedures used for training the system on a new application and for extracting the diphone templates for any new speaker are briefly described. Finally we report and discuss the results of different tests performed on various recognition tasks.

Proceedings ArticleDOI
D. Mergel1, Hermann Ney
01 Apr 1985
TL;DR: A variant of the Markov source modelling of entire words based on automatically determined subword units is described, applied to speaker-dependent and independent recognition of the German digits (telephone speech).
Abstract: A variant of the Markov source modelling of entire words based on automatically determined subword units is described. Each word of the vocabulary is modelled as a linear sequence of phoneme segments given by a phonetic transcription. For every phoneme a minimum and maximum duration are to be specified. Matching an utterance to the models must be performed within these absolute durational constraints. This is achieved by a dynamic programming time alignment different from the conventional ones. The acoustic emission is defined by means of phonetically labelled prototype vectors. The parameters of the models are automatically trained by an iterative procedure similar to the Viterbi algorithm. The method is applied to speaker-dependent and independent recognition of the German digits (telephone speech).

Journal ArticleDOI
TL;DR: An experimental investigation to determine the human speaker recognition performance of LPC voice processors indicates the importance of high-frequency data bandwidth for speaker recognition.
Abstract: Immediate identification of speakers' voices can be highly important to efficient communication in certain applications. This correspondence describes an experimental investigation to determine the human speaker recognition performance of LPC voice processors. A small group of coworkers were used as the test subjects. The test results indicate the importance of high-frequency data bandwidth for speaker recognition.

01 Jul 1985
TL;DR: A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis; the potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control.
Abstract: A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. At first, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable, and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.

Proceedings ArticleDOI
19 Dec 1985
TL;DR: A speech recognition time warping algorithm is adapted to picture analysis to recognize patterns despite variations in scale and orientation, so that objects may be recognized whether they are embedded in other parts or distorted.
Abstract: The aim of this study is to adapt a speech recognition time warping algorithm to picture analysis. Our goal is to recognize patterns despite variations in scale and orientation. Objects may be recognized regardless of whether they are embedded in other parts or are distorted. The programs input real pictures, extract the contours, and then encode and compare them to a pattern dictionary. The computer time is particularly short for such a recognition process.
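The time-warping algorithm being adapted here is standard dynamic time warping (DTW), the classic speech-recognition alignment technique. A generic 1-D version (not the paper's contour-matching variant) can be sketched as:

```python
def dtw(a, b):
    """Minimum cumulative |a_i - b_j| cost over monotone alignments of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = best cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],       # stretch a
                                 D[i][j - 1],       # stretch b
                                 D[i - 1][j - 1])   # match and advance both
    return D[n][m]

print(dtw([1, 2, 3], [1, 1, 2, 3]))   # 0.0 -- same shape at a different tempo
print(dtw([1, 2, 3], [3, 2, 1]) > 0)  # True -- reversed shape has nonzero cost
```

For contour matching, the scalar samples would be replaced by encoded contour elements and the local cost by a distance tolerant to scale and orientation changes.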

Journal ArticleDOI
TL;DR: Quality evaluation tests are reported which show that this type of coder, operating at 7.2 kbps, allows the transmission of telephone speech with communications quality and is a good candidate for telephony applications such as digital trunk transmissions, satellite speech communications, secure voice communications, and audio distribution systems.
Abstract: In this paper, we discuss the implementation of a medium-bit-rate linear prediction baseband coder on an IBM bipolar signal processor prototype having a high processing capacity. We show that the implementation of our algorithm requires a processing load of 5 MIPS, with a program size of 5K instructions. We then discuss the application of our coder in a normal telephone environment, which requires mu-law to linear PCM conversion and other signal processing functions such as voice activity detection, automatic gain control, echo control, and error recovery. Quality evaluation tests are also reported which show that this type of coder, operating at 7.2 kbps, allows the transmission of telephone speech with communications quality. Moreover, obtained intelligibility scores and speaker recognition levels are high enough to demonstrate that this coder is a good candidate for telephony applications such as digital trunk transmissions, satellite speech communications, secure voice communications, and audio distribution systems.


Journal ArticleDOI
TL;DR: In this article, a weighted cepstral distance measure using LPC derived cepstrum coefficient variability was tested in a speaker-independent English digit recognition system using standard DTW alignment techniques.
Abstract: The cepstral distance has been one of the most efficient spectral distance measures in speech and speaker recognition [S. Furui, IEEE Trans. Acoust. Speech Signal Process. ASSP‐29, 254–272 (1981)]. A new weighted cepstral distance measure using LPC derived cepstrum coefficient variability was tested in a speaker‐independent English digit recognition system using standard DTW alignment techniques [L. R. Rabiner, S. E. Levinson, A. E. Rosenberg, and J. G. Wilpon, IEEE Trans. Acoust. Speech Signal Process. ASSP‐27, 134–141 (1979)]. The results show a recognition accuracy of > 99% for the digits [K. L. Shipley, A. E. Rosenberg, and D. E. Bock, J. Acoust. Soc. Am. Suppl. 1 72, S80 (1982)]. Recognition results using the same database and the log likelihood LPC distance are about 97.4%. Hence there is a large improvement in performance with the new weighted cepstral distance.
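A variability-weighted cepstral distance of the kind described can be sketched as below. The abstract does not give the exact weighting, so inverse-variance weights (down-weighting coefficients that vary a lot across training data) are an assumption, and all numbers are illustrative.

```python
def weighted_cepstral_distance(c1, c2, variances):
    """Squared cepstral distance with each term scaled by 1/variance."""
    return sum((a - b) ** 2 / v for a, b, v in zip(c1, c2, variances))

ref = [1.0, 0.5, 0.2]                 # hypothetical reference cepstrum
test = [1.2, 0.4, 0.6]                # hypothetical test cepstrum
variances = [0.1, 0.1, 1.0]           # third coefficient is highly variable
d = weighted_cepstral_distance(ref, test, variances)
print(round(d, 3))   # 0.66
```

With uniform weights the third coefficient's large deviation would dominate; inverse-variance weighting suppresses it, which is the intuition behind the improved digit recognition accuracy reported above.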

Proceedings ArticleDOI
01 Apr 1985
TL;DR: This paper describes an approach for enhancing the performance of a speaker-dependent, discrete word recognition system in a noisy environment by means of cepstral subtraction techniques applied iteratively, which results in a significant improvement in speech recognition accuracy.
Abstract: This paper describes the approach for enhancing the performance of a speaker-dependent, discrete word recognition system in a noisy environment by means of cepstral subtraction techniques applied iteratively. A series of experiments have shown that these iterative methods provide enhanced performance in the word boundary detector which results in a significant improvement in speech recognition accuracy.
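One plausible form of the iterative cepstral-subtraction idea (the paper's exact iteration is not given in the abstract) is sketched below: a noise cepstrum estimated from leading non-speech frames is subtracted from every frame, and the noise estimate is refined over a few iterations using frames a simple energy test classifies as non-speech. Frame values, the energy threshold, and the iteration count are all illustrative assumptions.

```python
def cepstral_subtract(frames, n_noise_frames=2, iterations=3):
    """Iteratively remove an estimated noise bias from cepstral frames."""
    dim = len(frames[0])
    # Initial noise estimate: mean of the leading (presumed non-speech) frames.
    noise = [sum(f[k] for f in frames[:n_noise_frames]) / n_noise_frames
             for k in range(dim)]
    cleaned = frames
    for _ in range(iterations):
        cleaned = [[f[k] - noise[k] for k in range(dim)] for f in frames]
        # Refine the noise estimate from low-energy (presumed non-speech) frames.
        quiet = [f for f, c in zip(frames, cleaned)
                 if sum(x * x for x in c) < 0.1] or frames[:n_noise_frames]
        noise = [sum(f[k] for f in quiet) / len(quiet) for k in range(dim)]
    return cleaned

frames = [[1.0, 1.0], [1.0, 1.0], [3.0, 2.0]]   # two noise frames, one speech frame
out = cepstral_subtract(frames)
print(out[2])   # speech frame with the noise bias removed -> [2.0, 1.0]
```

Removing the noise bias in this way sharpens the energy contrast between speech and silence, which is how it helps the word boundary detector mentioned above.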

Patent
27 Sep 1985
TL;DR: Keyword occurrences in continuously-spoken speech are detected by evaluating both the keyword hypothesis and the alternative hypothesis that the observed speech is not a keyword; the alternative hypothesis is evaluated with a general language model in which arbitrary utterances are approximated by concatenations of a set of filler templates.
Abstract: A system employing a method that detects the occurrence of keywords in continuously-spoken speech evaluates both the keyword hypothesis, and the alternative hypothesis that the observed speech is not a keyword. A general language model is used to evaluate the latter hypothesis. Arbitrary utterances of the language, according to this model, are approximated by concatenations of a set of filler templates. The system allows for automatic detection of the occurrence of keywords in unrestricted natural speech. The system can be trained by a particular speaker, or can function independently of the speaker.

Patent
11 Mar 1985
TL;DR: In this article, the output from a voice recognition system is synthesized by a voice synthesizer into a voice corresponding with the switch code signal and announced through a speaker to confirm the operator of the command.
Abstract: PURPOSE: To control vehicle loads easily by voice, by providing means for recognizing spoken words, means for controlling each load on the basis of the voice recognition result, and means for identifying the driver's voice alone. CONSTITUTION: The driver speaks a control command into a microphone 1, the voice recognition sensor provided in the cabin, and the command is recognized by a voice recognition system 2. The output of system 2 is converted by a voice synthesizer 3 into speech corresponding to the switch code signal and announced through a speaker 4 so that the operator can confirm the command; alternatively, it is converted by a display driver 5 into characters and shown on a display 6. If the command contains no error, each load is controlled by an operating-system controller 7 and a non-operating-system controller 8 after a predetermined time has elapsed. The operator's voice has been stored in a voice identifier to prevent control of the operating-system loads by any voice other than the operator's.