
Showing papers on "Speaker recognition published in 1984"


Proceedings ArticleDOI
R. Leonard1
19 Mar 1984
TL;DR: A large speech database has been collected for use in designing and evaluating algorithms for speaker independent recognition of connected digit sequences and formal human listening tests on this database provided certification of the labelling of the digit sequences.
Abstract: A large speech database has been collected for use in designing and evaluating algorithms for speaker independent recognition of connected digit sequences. This dialect balanced database consists of more than 25 thousand digit sequences spoken by over 300 men, women, and children. The data were collected in a quiet environment and digitized at 20 kHz. Formal human listening tests on this database provided certification of the labelling of the digit sequences, and also provided information about human recognition performance and the inherent recognizability of the data.

599 citations


01 Jan 1984
TL;DR: An automatic lipreading system which has been developed and the combination of the acoustic and visual recognition candidates is shown to yield a final recognition accuracy which greatly exceeds the acoustic recognition accuracy alone.
Abstract: Automatic recognition of the acoustic speech signal alone is inaccurate and computationally expensive. Additional sources of speech information, such as lipreading (or speechreading), should enhance automatic speech recognition, just as lipreading is used by humans to enhance speech recognition when the acoustic signal is degraded. This paper describes an automatic lipreading system which has been developed. A commercial device performs the acoustic speech recognition independently of the lipreading system. The recognition domain is restricted to isolated utterances and speaker dependent recognition. The speaker faces a solid state camera which sends digitized video to a minicomputer system with custom video processing hardware. The video data is sampled during an utterance and then reduced to a template consisting of visual speech parameter time sequences. The distances between the incoming template and all of the trained templates for each utterance in the vocabulary are computed and a visual recognition candidate is obtained. The combination of the acoustic and visual recognition candidates is shown to yield a final recognition accuracy which greatly exceeds the acoustic recognition accuracy alone. Practical considerations and the possible enhancement of speaker independent and continuous speech recognition systems are also discussed.

389 citations
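The final step of the lipreading system above, combining acoustic and visual recognition candidates, can be sketched as a late-fusion score combination. The equal weighting and the per-word score dictionaries are assumptions for illustration; the paper does not specify its combination rule:

```python
def fuse_candidates(acoustic, visual, w=0.5):
    # Late fusion sketch: combine per-word acoustic and visual match
    # scores (lower = better) with a fixed weight w, then pick the
    # best-scoring word. The 50/50 weighting is an assumption.
    fused = {word: w * acoustic[word] + (1 - w) * visual[word]
             for word in acoustic}
    return min(fused, key=fused.get)
```

Here a word that scores poorly acoustically can still win when the visual evidence strongly favors it, which is the effect the paper reports.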


Journal ArticleDOI
TL;DR: In this article, subjects listened to a series of recorded voice samples obtained from unfamiliar speakers and were then given a two-alternative forced-choice recognition test; voice learning was found to be inferior to face learning.
Abstract: Subjects listened to a series of recorded voice samples obtained from unfamiliar speakers and were then given a two-alternative forced-choice recognition test. Recognition performance improved when the voice-sample duration was increased from 6 to 60 s, when the target set size was reduced from 20 to 5 voices, and when slides of faces provided context information. Recognition performance was not significantly different for retention intervals of 15 min and 10 days. For the conditions of our experiments, voice learning was inferior to face learning.

93 citations


PatentDOI
TL;DR: In this paper, a speech recognition apparatus includes a speech signal analyzing circuit for time-sequentially generating acoustic parameter patterns representing the phonetic features of speech signals, and phoneme reference memories each storing a plurality of reference parameter pattern vectors.
Abstract: A speech recognition apparatus includes a speech signal analyzing circuit for time-sequentially generating acoustic parameter patterns representing the phonetic features of speech signals, and phoneme reference memories each storing a plurality of reference parameter pattern vectors. A phoneme pattern vector from the speech signal analyzing circuit is compared with each of the reference pattern vectors stored in the phoneme reference memories in order to recognize an input speech. The speech signal analyzing circuit has a parameter extraction circuit for time-sequentially extracting acoustic parameter patterns representing the speech signal, a first phoneme pattern vector memory for storing a phoneme pattern vector including an acoustic parameter pattern of each frame from the parameter extraction circuit, and a second phoneme pattern vector memory for storing a phoneme pattern vector including a plurality of parameter patterns from the parameter extraction circuit.

69 citations


Patent
31 Dec 1984
TL;DR: In this paper, a method and system for speaker enrollment, as well as for speaker recognition, is described, where each candidate speaker is assigned a set of short acoustic segments of phonemic duration.
Abstract: The invention provides a method and system for speaker enrollment, as well as for speaker recognition. Speaker enrollment creates for each candidate speaker a set of short acoustic segments, or templates, of phonemic duration. An equal number of templates is derived from every candidate speaker's training utterance. A speaker's template set serves as a model for that speaker. Recognition is accomplished by employing a continuous speech recognition (CSR) system to match the recognition utterance with each speaker's template set in turn. The system selects the speaker whose templates match the recognition utterance most closely, that is, the speaker whose CSR match score is lowest. The method of the invention incorporates the entire training utterance in each speaker model, and explains the entire test utterance. The method of the invention models individual short segments of the speech utterances as well as their long-term statistics. Both static and dynamic speaker characteristics are captured in the speaker models.

28 citations
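The selection rule in the patent above, choosing the speaker whose template set yields the lowest match score, can be sketched as follows. The scalar "segments" and the absolute-difference distance are toy stand-ins for real acoustic features and a CSR match:

```python
def match_score(utterance, templates):
    # Toy stand-in for a CSR match: each utterance segment is scored
    # against its nearest template and segment scores are summed
    # (lower total = closer match).
    return sum(min(abs(seg - t) for t in templates) for seg in utterance)

def identify_speaker(utterance, speaker_templates):
    # Select the enrolled speaker whose template set matches the
    # recognition utterance most closely, i.e. whose score is lowest.
    scores = {spk: match_score(utterance, tpl)
              for spk, tpl in speaker_templates.items()}
    return min(scores, key=scores.get)
```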


Proceedings ArticleDOI
01 Mar 1984
TL;DR: The paper describes an automatic method, called Automatic Diphone Bootstrapping (or A.D.B.), for template extraction for Speaker-Adaptive Continuous Speech Recognition using "diphones" as speech units, which operates without any manual intervention and performed very well for all the speakers on which it was tested.
Abstract: The paper describes an automatic method, called Automatic Diphone Bootstrapping (or A.D.B.), for template extraction for Speaker-Adaptive Continuous Speech Recognition using "diphones" as speech units. Diphones have proved to be very suitable for C.S.R. as they meet the main requirements of phonetic units: invariance with the context and economy. Furthermore the performance of diphone-based speaker dependent C.S.R. systems is very high. For a long time manual extraction has been presented in the literature as the only completely reliable method for sub-word template creation for any speaker (see [1] as an example). Recently some automatic techniques for reference pattern extraction were developed [2,3], but they also require some manual corrections. The A.D.B. procedure operates without any manual intervention and performed very well for all the speakers on which it was tested. In a connected digit recognition task, a W.R.R. of 98.79% was achieved by using the speaker-adaptive templates created by the A.D.B. procedure.

14 citations


Proceedings ArticleDOI
19 Mar 1984
TL;DR: An artificial speech recognition experiment is introduced as a convenient means of assessing alignment accuracy, and alignment accuracy is found to be improved considerably by applying certain speaker adaptation transformations to the synthetic speech.
Abstract: A capacity to carry out reliable automatic time alignment of synthetic speech to naturally produced speech offers potential benefits in speech recognition and speaker recognition as well as in synthesis itself. Phrase alignment experiments are described that indicate that alignment to synthetic speech is more difficult than alignment of speech from two natural speakers. An artificial speech recognition experiment is introduced as a convenient means of assessing alignment accuracy. By this measure, alignment accuracy is found to be improved considerably by applying certain speaker adaptation transformations to the synthetic speech, by modifying the spectrum similarity metric, and by generating the synthetic spectra directly from the control parameters using simplified excitation spectra. The improvements seem to limit, however, at a level below that found between natural speakers. It is conjectured that further improvement requires modifications to the synthesis rules themselves.

13 citations
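Automatic time alignment of the kind studied above is classically done with dynamic time warping. A minimal one-dimensional sketch (real systems align spectral frame vectors under a spectrum similarity metric, not scalars):

```python
import math

def dtw(ref, test, dist=lambda a, b: abs(a - b)):
    # Classic dynamic time warping: accumulate the cheapest monotonic
    # alignment path between two sequences.
    n, m = len(ref), len(test)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(ref[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a ref frame
                                 D[i][j - 1],      # skip a test frame
                                 D[i - 1][j - 1])  # match frames
    return D[n][m]
```

A stretched-but-identical sequence aligns at zero cost, which is exactly the tolerance to timing differences that alignment between a synthesizer and a natural speaker requires.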


Journal ArticleDOI
R. Pieraccini1
TL;DR: In this work three different pattern compression techniques are compared on the basis of efficiency as well as recognition performance when applied to pattern matching by means of dynamic programming in a speaker dependent context.

10 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: The basic idea is to reduce the number of word candidates for the recognition by looking for robust phonetic features computed from the input signal, and it is possible to design a multiprocessor structure in order to reduce the overall recognition time.
Abstract: Our group has been designing for the past twelve years several speech recognition systems, from isolated vocabulary pattern matching systems to continuous speech understanding systems. The experiments we carried out showed us that the systems designed for restricted-vocabulary tasks were not readily extensible to large vocabularies. We therefore started some years ago implementing a 200 word recognition system using a phonetic approach. This system was tested successfully in 1980. In continuation of this research we decided to extend our approach to a 1000 word vocabulary. This paper describes the principles involved in this system together with the preliminary results already obtained. The basic idea is to reduce the number of word candidates for the recognition by looking for robust phonetic features computed from the input signal. These features are used as a key for accessing the lexicon. Since the determination of the features is carried out in parallel with the phonetic decoding of the input word, it is possible to design a multiprocessor structure in order to reduce the overall recognition time. The determination of crude phonetic features is described together with the organization of the lexicon. Some preliminary results are finally presented and discussed.

8 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: An algorithm for speaker-independent connected digit recognition for telephone use, and its experimental results are described, which shows the average correct recognition score to be 94% for each Japanese digit in their connected utterances through actual telephone lines.
Abstract: An algorithm for speaker-independent connected digit recognition for telephone use, and its experimental results, are described. The main features of this algorithm are the use of multiple reference templates assigned to each speaker class, a continuous DP matching process for word spotting, and partial reference templates to confirm spotted digits. The K-nearest neighbor decision rule and pair-comparison judgement are used to obtain the final result from spotted digit sequences. Experimental results show the average correct recognition score to be 94% for each Japanese digit in their connected utterances through actual telephone lines.

8 citations
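The K-nearest neighbor decision rule named in the abstract above can be sketched as a majority vote among the k closest reference templates for one spotted digit position. The (distance, digit) candidate pairs are an assumed input format for illustration:

```python
from collections import Counter

def knn_digit(candidates, k=3):
    # candidates: (distance, digit) pairs for one spotted position.
    # Take the k references nearest to the spotted segment and vote;
    # the majority digit wins (K-nearest-neighbor decision rule).
    nearest = sorted(candidates)[:k]
    votes = Counter(digit for _, digit in nearest)
    return votes.most_common(1)[0][0]
```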


Proceedings ArticleDOI
01 Mar 1984
TL;DR: A system is proposed for automatic speech recognition using syllable templates that is based on an overall likelihood measure calculated for each item stored in the lexicon.
Abstract: A system is proposed for automatic speech recognition using syllable templates. In this system, input speech signal is analyzed and matched against syllable templates and converted into parameters characterizing each candidate syllable. Word recognition is based on an overall likelihood measure calculated for each item stored in the lexicon. A method is also developed for the optimization of syllable templates. The validity of the proposed method was tested in a preliminary recognition experiment in which a lexicon consisting of 1000 city names was used to recognize utterances of 100 city names by a female speaker. The rate of correct recognition was 96.5%.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: This paper describes the speaker-independent spoken word recognition system for a large size vocabulary and results are obtained for the training samples in the 212 words uttered by 10 male and 10 female speakers.
Abstract: This paper describes the speaker-independent spoken word recognition system for a large size vocabulary. Speech is analyzed by the filter bank, from whose logarithmic spectrum the 11 features are extracted every 10 ms. Using the features the speech is first segmented and the primary phoneme recognition is carried out for every segment using the Bayes decision method. After correcting errors in segmentation and phoneme recognition, the secondary recognition of part of the consonants is carried out and the phonemic sequence is determined. The word dictionary item having maximum likelihood to the sequence is chosen as the recognition output. The 75.9% score for the phoneme recognition and the 92.4% score for the word recognition are obtained for the training samples in the 212 words uttered by 10 male and 10 female speakers. For the same words uttered by 30 male and 20 female speakers different from the above speakers, the 88.1% word recognition score is obtained.
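The Bayes decision step used for the primary phoneme recognition above can be sketched with one-dimensional Gaussian phoneme models. Real systems would use the 11-dimensional filter-bank features; the single scalar feature here is a simplification:

```python
import math

def gaussian_loglike(x, mean, var):
    # Log likelihood of scalar feature x under a 1-D Gaussian model.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def bayes_decide(x, models, priors):
    # Bayes decision: pick the phoneme maximizing
    # log prior + log likelihood.
    return max(models, key=lambda p: math.log(priors[p]) +
                                     gaussian_loglike(x, *models[p]))
```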

Proceedings ArticleDOI
01 Mar 1984
TL;DR: A Speaker Recognizability Test (SRT) is presented, which tries to establish how well a given communications system preserves a speaker's identity.
Abstract: Speech intelligibility and quality are the two most often tested features of speech coding systems. However, another feature of interest in store-and-forward applications is the preservation of a speaker's identity. Here, a Speaker Recognizability Test (SRT) is presented, which tries to establish how well a given communications system preserves a speaker's identity. Contrary to previous efforts, no attempt is made to identify the cues used by listeners for speaker recognition. Instead, listeners are asked directly to identify a speaker who says an utterance by comparing the uttered sentence with reference sentences, one from each speaker. Among the issues considered in the design of the test is the choice of speakers, the use of reference sentences from the same or different sessions of data collection, and the use of processed or unprocessed speech for reference.

Proceedings ArticleDOI
19 Mar 1984
TL;DR: A phrase unit speech recognition system is discussed, which is applicable for a large vocabulary and is independent of the task, and a technique to recognize phrases based on the phoneme recognition is introduced.
Abstract: A phrase unit speech recognition system is discussed, which is applicable for a large vocabulary and is independent of the task. In the case of large vocabulary, it is desirable to express the words in the dictionary by the sequence of phonemes or phoneme-like units. Therefore, the recognition of phonemes in continuous speech is essential to achieve a flexible speech understanding system. In this paper, a technique to recognize phrases based on the phoneme recognition is introduced. The system is composed of the phoneme recognition part and the phrase recognition part. In the phoneme recognition part, the features in the articulatory domain are extracted and applied to compensate coarticulation. In the phrase recognition part, a word sequence corresponding to the phoneme sequence is determined by using two-level DP matching with automaton control, in which words are processed symbolically to attain the acceptable processing speed.

Journal ArticleDOI
TL;DR: Word classification based on the Mahalanobis distance metric, and using templates derived from cluster analysis of the training inputs, was found to give results superior to the other strategies studied, and the principle of clustering was successfully applied to produce an adaptive system which tracked changes in the user's voice.
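Classification by Mahalanobis distance to cluster-derived templates can be sketched as follows; a diagonal covariance is assumed here for simplicity (the paper's clusters may use full covariances):

```python
import math

def mahalanobis_diag(x, mean, var):
    # Mahalanobis distance under a diagonal covariance: each feature
    # difference is scaled by that feature's variance before summing.
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

def classify_word(x, clusters):
    # clusters: {word: (mean, var)} templates from cluster analysis
    # of the training inputs; pick the nearest word.
    return min(clusters, key=lambda w: mahalanobis_diag(x, *clusters[w]))
```

Scaling by per-feature variance is what distinguishes this metric from plain Euclidean distance: features the training data shows to be variable count for less.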

Proceedings ArticleDOI
19 Mar 1984
TL;DR: Two new methods by which the CMU feature-based recognition system can learn the acoustical characteristics of individual speakers without feedback from the user are described.
Abstract: This paper describes two new methods by which the CMU feature-based recognition system can learn the acoustical characteristics of individual speakers without feedback from the user. We have previously described how the system uses MAP techniques to update its estimates of the mean values of features used by the classifier in recognizing the letters of the English alphabet on the basis of a priori information and labelled observations. In the first of the new procedures described in this paper the system assumes a correct decision every time it classifies a new utterance with a sufficiently high confidence level. In the second new procedure the system adjusts its estimates of the means on the basis of their correlation with the average values of the features over all utterances. Experiments were conducted on two confusable sets of letters using both speaker adaptation procedures. In each case classification performance using the unsupervised estimation procedures could equal that obtained using speaker adaptation with feedback from the user, although which method provided the better performance depended on which set of letters was being classified.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: A speaker adaptation method that follows two steps -- selection of "persons" who have voices similar to the user's and generation of a speaker-adapted dictionary from their dictionaries is studied.
Abstract: A speaker-trained voice recognition system with a large vocabulary has a serious weak point, that is, the user must register a large number of words prior to its use. To be freed from this problem, the authors have studied a speaker adaptation method. This method follows two steps -- 1) selection of "persons" who have voices similar to the user's and 2) generation of a speaker-adapted dictionary from their dictionaries. Results of simulation using 1000-word speech samples by 40 male speakers (20 for standard dictionaries and 20 for performance evaluation) are reported. The results indicated the advantage of this method. The speaker-trained dictionary gave 90.1% recognition accuracy, the speaker-independent dictionary gave 83.6%, and the speaker-adapted dictionary which required only 10% of the vocabulary for training gave 85.7%.
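The two adaptation steps above can be sketched directly; the scalar per-word templates and the averaging rule in step 2 are illustrative assumptions:

```python
def select_similar(user_words, dictionaries, n=3):
    # Step 1: rank stored speakers by how closely their templates for
    # the user's few enrolled words match, keeping the n closest.
    def dist(d):
        return sum(abs(user_words[w] - d[w]) for w in user_words)
    return sorted(dictionaries, key=lambda s: dist(dictionaries[s]))[:n]

def adapted_dictionary(vocab, dictionaries, chosen):
    # Step 2: build the speaker-adapted dictionary by averaging the
    # chosen speakers' templates for every word in the vocabulary,
    # including words the user never uttered.
    return {w: sum(dictionaries[s][w] for s in chosen) / len(chosen)
            for w in vocab}
```

The payoff reported in the abstract follows from step 2: the user enrolls only a fraction of the vocabulary, yet receives adapted templates for all of it.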

Proceedings Article
06 Aug 1984
TL;DR: A planning system for recognizing connected letters is described and some preliminary experimental results are reported.
Abstract: A planning system for recognizing connected letters is described and some preliminary experimental results are reported.




Proceedings ArticleDOI
01 Mar 1984
TL;DR: It is shown that this type of coder operating at 7.2 kbps, provides a good communications quality, an intelligibility which is sufficient for most of telephony applications, and a perfect speaker recognition (natural voice).
Abstract: In this paper, we discuss the implementation of a low bit-rate linear prediction base-band coder on a bipolar signal processor having a processing capacity of 10 million instructions per second (MIPS). We show that the implementation of our algorithm requires less than 5 MIPS, with a ROS occupancy less than 5 K instructions. Some quality evaluation tests are also reported, and show that this type of coder operating at 7.2 kbps provides a good communications quality, an intelligibility which is sufficient for most telephony applications, and a perfect speaker recognition (natural voice).

Proceedings ArticleDOI
01 Mar 1984
TL;DR: The Mark II system provides both speaker dependent and multiple speaker recognition of up to a 32 isolated word active vocabulary in real-time on a 2 MHz 6502, with no custom hardware, except an inexpensive microphone, pre-amp, and 8-bit A/D converter.
Abstract: Recent developments have made it possible to implement high performance speech recognition with much less computation than traditional techniques, thereby enabling real-time computation on standard microprocessors. Concepts such as time-domain acoustic-phonetic speech signal processing as well as efficient adaptations of hidden Markov models can provide this type of capability. The Mark II system provides both speaker dependent and multiple speaker recognition of an active vocabulary of up to 32 isolated words in real-time on a 2 MHz 6502, with no custom hardware except an inexpensive microphone, pre-amp, and 8-bit A/D converter. On an initial test of 5120 test utterances (Texas Instruments isolated word database, Spectrum, Sept., 1981), the Mark II achieved an error rate of only 0.67% (34 errors).

Journal ArticleDOI
TL;DR: A method of speaker-independent connected-word recognition by robust segmentation for speaker variation by varying the matching path adaptively with respect to each phoneme, at the dynamic-programming word-matching level is proposed.

Proceedings ArticleDOI
01 Jan 1984
TL;DR: This study uses operational evaluation techniques to model a system which processes human speech to verify the identity of persons seeking access to a facility resource, and decides whether the speaker is valid or an imposter based on the degree of similarity observed.
Abstract: This study uses operational evaluation techniques to model a system which processes human speech to verify the identity of persons seeking access to a facility resource. The system consists of hardware and software for accepting analog speech; extracting time, frequency, and amplitude characteristics; producing compact digital templates containing the features for speaker identification; and cross-referencing templates with reference patterns to establish the degree of similarity between an utterance and a set of utterances for the person whose identity is being claimed. A decision algorithm is implemented to determine whether the speaker is valid or an imposter based on the degree of similarity observed. A conceptual model has been tested and used to simulate variations in system attributes in order to optimize system performance. Performance is evaluated in terms of the number of imposters who can defeat the system, and the number of rejected valid speakers.
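The two performance figures the study trades off, imposters who defeat the system versus rejected valid speakers, can be sketched as a threshold sweep over similarity scores. The score lists and threshold are illustrative inputs:

```python
def error_rates(valid_scores, imposter_scores, threshold):
    # Accept an identity claim when similarity >= threshold.
    # Returns (false-rejection rate, false-acceptance rate): the
    # fraction of valid speakers rejected and of imposters accepted.
    frr = sum(s < threshold for s in valid_scores) / len(valid_scores)
    far = sum(s >= threshold for s in imposter_scores) / len(imposter_scores)
    return frr, far
```

Raising the threshold lowers the false-acceptance rate at the cost of rejecting more valid speakers, which is exactly the optimization the simulated model explores.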


Patent
02 Oct 1984
TL;DR: In this paper, a plurality of speech feature vectors are generated from the time series of the speech feature parameters for the input speech pattern, by taking account of knowledge concerning the variation tendencies of speech patterns, and the learning (preparation) of reference pattern vectors for speech recognition is carried out by the use of the feature vectors thus generated.
Abstract: In the learning method of reference pattern vectors for speech recognition in accordance with the present invention, a plurality of speech feature vectors are generated (block 20) from the time series of speech feature parameters for the input speech pattern, by taking account of knowledge concerning the variation tendencies of the speech patterns, and the learning (preparation) of reference pattern vectors for speech recognition is carried out (block 22) by the use of the speech feature vectors thus generated. In particular, the method according to the present invention will become effective when it is combined with a statistical pattern recognition method that can absorb wide variations in the speech patterns.


16 Jul 1984
TL;DR: A commonly cited drawback of narrowband systems such as the DoD standard linear predictive coding (LPC) algorithm is that speaker recognition is poor, yet it is the opinion of many users that they frequently recognize the speaker.
Abstract: A commonly cited drawback of narrowband systems such as the DoD standard linear predictive coding (LPC) algorithm is that speaker recognition is poor. Yet it is the opinion of many users that they frequently recognize the speaker. Tape recordings of 24 speakers conversing over an unprocessed channel and over an LPC voice processing system were subjected to listening tests. Twenty-four co-workers listened to the tapes and attempted to identify each speaker from a list of about 40 people in the same branch. Prior to the recognition tests, each of the listeners also rated his or her familiarity with each of the speakers and the distinctiveness of each speaker's voice. There was some loss in voice recognition over LPC, but the recognition rate was still quite high. Unprocessed voices were correctly identified 88% of the time, whereas the same people talking over the LPC system were correctly identified 69% of the time. Talker familiarity was significantly correlated with correct identifications. There was no significant correlation between the rated distinctiveness of the speaker and correct identifications. However, familiarity and distinctiveness ratings were highly correlated.