
Showing papers on "Speaker recognition published in 1982"


Book ChapterDOI
TL;DR: This chapter focuses on Continuous Speech Recognition (CSR) and summarizes acoustic processing techniques and describes an elegant linguistic decoder based on dynamic programming that is practical under certain conditions.
Abstract: Publisher Summary Speech recognition research can be distinguished into three areas: isolated word recognition, where words are separated by distinct pauses; continuous speech recognition, where sentences are produced continuously in a natural manner; and speech understanding, where the aim is not transcription but understanding, in the sense that the system responds correctly to a spoken instruction or request. This chapter focuses on Continuous Speech Recognition (CSR) and summarizes acoustic processing techniques. The chapter introduces Markov models of speech processes and describes an elegant linguistic decoder based on dynamic programming that is practical under certain conditions. It discusses the practical aspects of the sentence hypothesis search conducted by the linguistic decoder and introduces algorithms for extracting model parameter values automatically from the data. Methods of assessing the performance of CSR systems and the relative difficulty of recognition tasks are discussed. The chapter illustrates the capabilities of present recognition systems by describing the results of certain recognition experiments.

66 citations


Journal ArticleDOI
TL;DR: In this article, a set of studies explored the nature of the 7-month-old infant's perception of human voices and found that infants learned to respond discriminatively to groups of male vs. female voices.
Abstract: The present set of studies explored the nature of the 7-month-old infant's perception of human voices. In Experiment I, infants learned to respond discriminatively to groups of male vs. female voices. That this was evidence of male/female categorization was supported in Experiment II, in which it was shown that infants did not learn to respond discriminatively to the same voices when they were randomly organized into “categories” containing both male and female voices. The extent to which fundamental frequency may have contributed to this male/female classification was investigated in Experiment III. The combined results of these three studies suggested that, although pitch is possibly one cue to which infants are attending when classifying these voices, it could not account fully for this ability. It remains for future research to identify other cues which may contribute to male/female categorization, as well as to investigate the developmental course of speaker recognition and classification in general.

45 citations


PatentDOI
TL;DR: In this article, a method and apparatus for recognizing an unknown speaker from a plurality of speaker candidates is presented, where portions of speech from the speaker candidates and from the unknown speaker are sampled and digitized.
Abstract: A method and apparatus for recognizing an unknown speaker from a plurality of speaker candidates. Portions of speech from the speaker candidates and from the unknown speaker are sampled and digitized. The digitized samples are converted into frames of speech, each frame representing a point in an LPC-12 multi-dimensional speech space. Using a character covering algorithm, a set of frames of speech, called characters, is selected from the frames of speech of all speaker candidates. The speaker candidates' portions of speech are divided into smaller portions called segments. A smaller plurality of model characters for each speaker candidate is selected from the character set. For each set of model characters, the distance from each speaker candidate's frame of speech to the closest character in the model set is determined and stored in a model histogram. When a model histogram is completed for a segment, a distance D is found such that at least a majority of frames have distances greater than D. The mean value and variance of D across all segments, for both speaker and impostor, are then calculated. These values are added to the set of model characters to form the speaker model. To perform recognition, the frames of the unknown speaker are buffered as they are received and compared with the sets of model characters to form model histograms for each speaker. A likelihood ratio is formed, and the speaker candidate with the highest likelihood ratio is chosen as the unknown speaker.
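As a rough illustration of the histogram step above, the sketch below computes nearest-character distances for one segment and a distance D below which only a minority of frames fall. The function names and the 0.5 fraction are illustrative, not from the patent, and the character-covering, variance, and likelihood-ratio stages are omitted.

```python
import math

def min_distances(frames, characters):
    """Distance from each speech frame to the closest model character."""
    return [min(math.dist(f, c) for c in characters) for f in frames]

def segment_distance_d(frames, characters, fraction=0.5):
    """A distance D such that at least `fraction` of the frames have
    nearest-character distances greater than D (a low quantile of the
    sorted distance list)."""
    d = sorted(min_distances(frames, characters))
    k = max(0, math.ceil(len(d) * (1.0 - fraction)) - 1)
    return d[k]
```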

44 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: This paper develops the use of probability density function (pdf) estimation for text-independent speaker identification and compares the performance of two parametric and one non-parametric pdf estimation methods to one distance classification method that uses the Mahalanobis distance.
Abstract: Most text-independent speaker identification methods to date depend on the use of some distance metric for classification. In this paper we develop the use of probability density function (pdf) estimation for text-independent speaker identification. We compare the performance of two parametric and one non-parametric pdf estimation methods to one distance classification method that uses the Mahalanobis distance. Under all conditions tested, the pdf estimation methods performed substantially better than the Mahalanobis distance method. The best method is a non-parametric pdf estimation method.
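The paper's estimators and data are not reproduced here; a minimal one-dimensional sketch of the two competing ideas (Mahalanobis-style distance classification versus non-parametric Parzen-window density estimation) might look like this, with all names illustrative:

```python
import math

def mahalanobis_1d(x, mean, var):
    """One-dimensional Mahalanobis distance to a class model."""
    return abs(x - mean) / math.sqrt(var)

def parzen_pdf(x, samples, h=0.5):
    """Non-parametric Parzen-window (Gaussian kernel) density estimate."""
    n = len(samples)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) \
        / (n * h * math.sqrt(2 * math.pi))

def classify_distance(x, models):
    # models: {speaker: (mean, var)} -- pick the smallest distance
    return min(models, key=lambda s: mahalanobis_1d(x, *models[s]))

def classify_pdf(x, training):
    # training: {speaker: [samples]} -- pick the highest estimated density
    return max(training, key=lambda s: parzen_pdf(x, training[s]))
```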

35 citations


Journal ArticleDOI
TL;DR: A method for speaker independent connected word recognition is described, based on a syntax-directed dynamic programming algorithm which matches the isolated word templates to sentence length utterances of a 100 speaker population.
Abstract: A method for speaker independent connected word recognition is described. Speaker independence is achieved by clustering isolated word utterances of a 100 speaker population. Connected word recognition is based on a syntax-directed dynamic programming algorithm which matches the isolated word templates to sentence length utterances. The method has been tested on an artificial task-oriented language based on a 127 word vocabulary. Four subjects, two men and two women, spoke a total of 209 sentences comprising 1750 words. At an average speaking rate of 171 words/min over dialed-up telephone lines, a correct word recognition rate of 97 percent was observed.
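The syntax-directed decoder itself is more involved, but the dynamic-programming template match at its core can be sketched as plain DTW over scalar feature sequences (a simplification: the actual system matches multi-dimensional frames under syntactic constraints):

```python
def dtw(template, utterance, dist=lambda a, b: abs(a - b)):
    """Dynamic-programming time alignment: minimum cumulative frame
    distance over all monotonic warping paths between two sequences."""
    INF = float("inf")
    n, m = len(template), len(utterance)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(template[i - 1], utterance[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

A stretched utterance that repeats a frame still aligns to the template at zero cost, which is exactly the time-normalization property the matcher relies on.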

27 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: This paper examines several template-based recognition techniques using isolated utterances and highly ambiguous vocabularies in a speaker-dependent recognition system and concludes that a system which combined both featural and template information led to the best performance for six out of eight speakers.
Abstract: Template-based recognition systems overcome errors in the short-term matching process by comparing whole sequences of acoustic events. In many vocabularies, each word has a highly distinctive sequence. Some vocabularies have confusable words with very similar sequences, leading to poor recognition performance. Improvements in discriminability among similar words may be achieved by altering the matching algorithm, or by improving the reference template set. Both techniques are instances of multi-exemplar learning techniques which improve recognition performance through automatic evaluation of training data. This paper examines several such techniques using isolated utterances and highly ambiguous vocabularies (e.g., the "E" set: 3, B, C, D, E, G, P, V, T, Z) in a speaker-dependent recognition system. A system which combined both featural and template information led to the best performance for six out of eight speakers. Using this technique, E-set error rates improved from 37% to 10%.

18 citations


PatentDOI
TL;DR: In a speech recognition system, similarity calculations between speech feature patterns are reduced by stopping similarity calculations for any one reference pattern when a frame in the pattern fails to exceed a corresponding similarity threshold as discussed by the authors.
Abstract: In a speech recognition system, similarity calculations between speech feature patterns are reduced by stopping similarity calculations for any one reference pattern when a frame in the pattern fails to exceed a corresponding similarity threshold.
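A toy sketch of this early-abandon idea follows; the names and the similarity function are illustrative, not from the patent:

```python
def similarity_with_pruning(input_frames, reference, thresholds, sim):
    """Accumulate frame similarities, abandoning the reference pattern as
    soon as one frame fails its threshold (returns None when pruned)."""
    total = 0.0
    for frame, ref, th in zip(input_frames, reference, thresholds):
        s = sim(frame, ref)
        if s <= th:
            return None  # stop early: skip the remaining frames
        total += s
    return total
```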

17 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: A new, fast method for discrete utterance recognition of telephone bandwidth speech that obviates time normalization and uses approximately 6000 bits to represent each utterance in the recognition vocabulary is presented.
Abstract: We present a new, fast method for discrete utterance recognition of telephone bandwidth speech. The method is based on speech coding by vector quantization and minimum cross-entropy pattern classification. Separate vector quantization codebooks are designed from training sequences for each word in the recognition vocabulary. Inputs from outside the training sequence are classified by performing vector quantization and finding the codebook that achieves the lowest average distortion per speech frame. The new method obviates time normalization and uses approximately 6000 bits to represent each utterance in the recognition vocabulary. Preliminary limited testing on speaker dependent digit recognition has demonstrated excellent performance. Detailed tests are now in progress.
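The classification rule lends itself to a compact sketch; codebook design from the training sequences (e.g., by a clustering algorithm) is omitted, and the names are illustrative:

```python
import math

def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) \
        / len(frames)

def classify(frames, codebooks):
    """Pick the word whose codebook quantizes the utterance with the
    lowest average distortion per frame -- no time normalization needed."""
    return min(codebooks, key=lambda w: avg_distortion(frames, codebooks[w]))
```

Because the score is an average over frames, utterance length drops out of the comparison, which is why the method needs no time alignment.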

16 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: A general linear matrix transformation, restricted to a band about the main diagonal, adapts a new speaker's spectra to a reference speaker; since phoneme class-specific adaptation is costly, a grouping of phonemes is proposed so that one adaptation parameter set is used for all phonemes that belong to any one group.
Abstract: Speaker dependence of automatic speech recognition systems can be reduced by applying speaker-specific transformations to adapt the speech signal of a new speaker to that of the reference speaker. Initial investigations showed that speaker adaptation can be performed by transformations using spectral weighting and spectral warping. These heuristic methods can be substituted by a general linear matrix transformation, the parameters of which are determined by mean square error optimisation. The improvement of the recognition rate achievable by this matrix transformation is very high, but the method needs a large learning set. This can be reduced by restricting the matrix to a band including the main diagonal in the middle. This banded matrix yields results close to those of the general matrix. Adaptation can be performed speaker-specifically as well as speaker- and class-specifically. As the cost of phoneme class-specific adaptation is very high, a grouping of phonemes is proposed so that one adaptation parameter set is used for all phonemes that belong to any one group.
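As an illustration of fitting such a transform by mean-square-error optimisation, the sketch below handles only the extreme bandwidth-zero (purely diagonal) case, with one independently fitted weight per spectral channel; the general banded solve is omitted and the names are illustrative:

```python
def fit_diagonal_transform(new_spectra, ref_spectra):
    """Least-squares per-channel weights mapping the new speaker's
    spectra onto the reference speaker's (diagonal special case of the
    banded matrix transformation)."""
    dims = len(new_spectra[0])
    w = []
    for k in range(dims):
        num = sum(x[k] * y[k] for x, y in zip(new_spectra, ref_spectra))
        den = sum(x[k] * x[k] for x in new_spectra)
        w.append(num / den if den else 0.0)
    return w

def adapt(frame, w):
    """Apply the fitted weights to one spectral frame."""
    return [wk * xk for wk, xk in zip(w, frame)]
```

Widening the band lets each output channel draw on its spectral neighbours, at the cost of more parameters and hence more learning data, which is the trade-off the abstract describes.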

15 citations


Journal ArticleDOI
TL;DR: A new approach to text‐independent speaker recognition, developed to perform with short unknown utterances, models the spectral traits of a speaker with multiple sub‐models rather than using a single statistical distribution as done with previous approaches.
Abstract: This paper presents a new approach to text‐independent speaker recognition. The technique, developed to perform with short unknown utterances, models the spectral traits of a speaker with multiple sub‐models rather than using a single statistical distribution as done with previous approaches. The recognition is based on the statistical distribution of the distances between the unknown speaker and each of the speaker models. Only frames that are close to one of the speaker's sub‐models are considered in the recognition decision, so that speech events not encountered in the training data do not bias the recognition. The technique has been tested on a conversational data base. Models were generated using 100 s of speech from each of 11 male talkers. Unknown speech was obtained one week after the model data. Recognition accuracies of 96%, 87%, and 79% were obtained for unknown speech durations of 10, 5, and 3 s, respectively. The use of multiple sub‐models to characterize spectral traits results in improved discrimination between speakers, particularly when short speech segments are recognized. [Work supported by U. S. Air Force, Rome Air Development Center.]
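A sketch of the frame-filtering idea above: sub-model training (e.g., by clustering) and the distance-distribution statistics are omitted, and `accept_radius` is an illustrative name, not the paper's.

```python
import math

def nearest_submodel_distance(frame, submodels):
    """Distance from a frame to the closest of a speaker's sub-models."""
    return min(math.dist(frame, m) for m in submodels)

def score_speaker(frames, submodels, accept_radius):
    """Average nearest-sub-model distance over only those frames that fall
    within accept_radius of some sub-model; frames far from every
    sub-model (speech events unseen in training) are ignored."""
    d = [nearest_submodel_distance(f, submodels) for f in frames]
    kept = [x for x in d if x <= accept_radius]
    return sum(kept) / len(kept) if kept else float("inf")
```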

12 citations


Journal ArticleDOI
TL;DR: The development of a high accuracy (about 99%) text-independent speaker recognition system is discussed in this paper. Any two parameter sets from the first-stage tests are combined logically to obtain a significantly higher recognition accuracy than is possible with any single speaker-sensitive parameter set.


Proceedings ArticleDOI
Akio Komatsu1, Akira Ichikawa1, Kazuo Nakata1, Yoshiaki Asakawa1, H. Matsuzaka1 
03 May 1982
TL;DR: An algorithm for phoneme recognition in continuous speech is presented; a continuous matching process is employed to bypass the segmentation problem, and a hierarchical recognition algorithm is proposed to realize feasible matching in real time.
Abstract: An algorithm for phoneme recognition in continuous speech is presented. A continuous matching process is employed to bypass the segmentation problem. A large set of standard patterns is used to solve the allophonic variation problem. Also, a hierarchical recognition algorithm is proposed to realize feasible matching in real time. In the first stage of the hierarchical recognition algorithm, vowels in speech are spotted. To optimize accuracy in vowel spotting, each standard pattern is carefully selected, constraints on the "phoneme chain" of continuous speech are utilized, and partial standard pattern matching is employed for detailed phoneme analysis. The second stage recognizes consonants between vowels. Experimental results show a 91% vowel recognition rate and an 80% consonant recognition rate for a specified speaker.

Proceedings ArticleDOI
T. Nitta1, T. Murata, Harumi Tsuboi, Koichi Takeda, T. Kawada, S. Watanabe 
01 May 1982
TL;DR: A newly developed voice-activated word processor and a two-stage recognition method to achieve a precise recognition of isolated monosyllables are described.
Abstract: This paper describes a newly developed voice-activated word processor and a two-stage recognition method to achieve a precise recognition of isolated monosyllables. At the first stage, the recognizer segments a monosyllable into an initial consonantal part and a final part (i.e., the vowel region), and computes similarities between the input speech and orthonormal mode functions of each consonantal segment, which are designed from multiple speakers using K-L expansion and adapted to a new speaker (Adaptive Multiple Similarity Method). At the second stage, frame-by-frame similarity scores, extracted at the phoneme recognizer using the Multiple Similarity Method, are applied to candidate monosyllables to make a final decision. The average monosyllable recognition accuracy with six speakers was about 95%.

Proceedings ArticleDOI
Hermann Ney1, R. Gierloff
01 May 1982
TL;DR: The experiments indicate that feature weighting and feature selection can reduce the error rates by a factor of two or more both for speaker identification and speaker verification.
Abstract: This paper describes a technique for increasing the ability of a text-dependent speaker recognition system to discriminate between speaker classes; this technique is to be performed in conjunction with the nonlinear time alignment between a reference pattern and a test pattern. Unlike the standard approach, where the training of the recognition system merely consists of storing and averaging or selecting the time normalized training patterns separately for each class, the training phase of the system is extended in that a weight is determined for each individual feature component of the complete reference pattern according to the ability of the feature to distinguish between speaker classes. The weights depend on the time axis as well as on the frequency axis. The overall distance computed after nonlinear time alignment between a reference pattern and a test pattern thus becomes a function of the given set of weights of the reference class considered. For each class, the optimum weights result from the ideal criterion of minimum error rate. Instead of this criterion, the closely related but mathematically more convenient Fisher criterion is used, which leads to a closed-form solution for the unknown weights. Based on these weights, the selection of subsets of effective features is studied in order to further improve the class discrimination. The feature weighting and selection techniques are tested using a data base of utterances recorded off dialed-up telephone lines. The experiments indicate that feature weighting and feature selection can reduce the error rates by a factor of two or more, both for speaker identification and speaker verification.
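The per-feature Fisher criterion can be sketched as the ratio of between-class scatter of a feature to its pooled within-class scatter; this is a simplified version that ignores the time-axis dependence of the paper's weights, and the names are illustrative:

```python
def fisher_weights(class_samples):
    """Per-feature Fisher ratio: spread of the class means over the pooled
    within-class variance. class_samples: {label: [feature vectors]}."""
    dims = len(next(iter(class_samples.values()))[0])
    all_vecs = [v for vs in class_samples.values() for v in vs]
    grand = [sum(v[k] for v in all_vecs) / len(all_vecs) for k in range(dims)]
    weights = []
    for k in range(dims):
        between = within = 0.0
        for vs in class_samples.values():
            mu = sum(v[k] for v in vs) / len(vs)
            between += len(vs) * (mu - grand[k]) ** 2
            within += sum((v[k] - mu) ** 2 for v in vs)
        weights.append(between / within if within else 0.0)
    return weights
```

A feature whose class means are far apart relative to its within-class spread gets a large weight, which is exactly the discriminability the weighted distance is meant to emphasize.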

Journal ArticleDOI
TL;DR: A low cost speaker-dependent speech recognition unit using the Walsh-Hadamard transform (WHT) is described; a WHT LSI has been developed to reduce the cost and size of the recognition unit, and a high rate of recognition has been obtained.
Abstract: Speech recognition systems are coming to a practical stage thanks to the recent progress of semiconductor technology. We have developed a low cost speaker-dependent speech recognition unit using the Walsh-Hadamard transform (WHT). A WHT LSI has been developed to reduce the cost and the space of the recognition unit, and a high rate of recognition has been obtained. The speech recognition algorithm and the LSI are described in this paper.
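The transform itself, which the LSI implements in hardware, is easy to sketch in software: a standard in-place fast Walsh-Hadamard transform over a power-of-two-length frame (natural Hadamard ordering; this is the generic algorithm, not the chip's exact circuit):

```python
def fwht(x):
    """Fast Walsh-Hadamard transform; the input length must be a power
    of two. Returns the transformed sequence (unnormalized)."""
    a = list(x)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                # butterfly: sum and difference of paired elements
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a
```

The transform is its own inverse up to a factor of the length, and it needs only additions and subtractions, which is what makes it attractive for a low-cost LSI.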


Book ChapterDOI
Patrick Corsi1
01 Jan 1982
TL;DR: This paper presents a unified discussion of the scientific and practical issues in the field of speaker recognition, and distinguishes between the Verification and Identification tasks.
Abstract: This paper presents a unified discussion of the scientific and practical issues in the field of speaker recognition. Besides some background on speaker recognition by listening and by visual analysis of spectrograms, we survey the computer recognition methods and briefly discuss some technical aspects of various speaker recognizers. Methods for selecting an efficient set of features, and examples of results of experimental studies, are also presented. We then differentiate between the Verification and Identification tasks.

Book ChapterDOI
Stephen E. Levinson1
01 Jan 1982
TL;DR: A method for speaker independent connected word recognition is described, based on a syntax-directed dynamic programming algorithm which matches the isolated word templates to sentence length utterances of a 100 speaker population.
Abstract: A method for speaker independent connected word recognition is described. Speaker independence is achieved by clustering isolated word utterances of a 100 speaker population. Connected word recognition is based on a syntax-directed dynamic programming algorithm which matches the isolated word templates to sentence length utterances. The method has been tested on a task oriented English-like language based on a 127 word vocabulary. Four subjects, two men and two women, spoke a total of 209 sentences comprising 1750 words. At an average speaking rate of 171 words per minute over dialed-up telephone lines, a correct word recognition rate of 97% was observed.


Journal ArticleDOI
01 Mar 1982
TL;DR: Reports on major areas of technological development in the field of automatic speech recognition and identifies its major characteristics, device construction, and applications for speech processing.
Abstract: Reports on major areas of technological development in the field of automatic speech recognition. Identifies its major characteristics, device construction, and applications for speech processing.

Proceedings ArticleDOI
01 May 1982
TL;DR: This work addresses the development of a reliable, high accuracy text-independent speaker recognition system for a small population, with the reference parameters characterizing each speaker obtained from short segments of speech.
Abstract: This work addresses the development of a reliable, high accuracy text-independent speaker recognition system for a small population, with the reference parameters characterizing each speaker obtained from short segments of speech. Initially the potential for speaker discrimination of several different vocal parameter sets was investigated. These included the LPC, Reflection, Cepstrum and Log Area Ratio coefficients, speech power spectrum parameters and the inverse filter spectral coefficients. It was then decided to use any two parameter sets in a composite decision-making scheme. A "repeat feature" was incorporated into the speaker recognition system, whereby a speaker was asked to read a fresh test speech segment if the decisions made by using the two different parameter sets individually were not coincident. Test results indicate that a significant improvement in accuracy is realizable.
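The "repeat feature" reduces to a simple agreement loop; the classifier functions and segment stream here are illustrative stand-ins for the two parameter-set decisions described above:

```python
def composite_decision(decide_a, decide_b, segments):
    """Run both classifiers on successive fresh test segments until their
    decisions coincide (the "repeat feature"); if the segments run out,
    fall back to the first classifier's last answer."""
    a = None
    for seg in segments:
        a, b = decide_a(seg), decide_b(seg)
        if a == b:
            return a
    return a
```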


01 Dec 1982
TL;DR: An experiment to determine the possibilities of obtaining some speaker independence using speaker dependent voice recognition equipment revealed about 99% accuracy when the user's speech templates were in memory along with those of four other users.
Abstract: This report discusses the results of an experiment to determine the possibilities of obtaining some speaker independence using speaker-dependent voice recognition equipment. The results revealed about 99% accuracy when the user's speech templates were in memory along with those of four other users. If the user's voice patterns were not in memory but those of the four other users still were, recognition accuracy still hovered around 95%.

Proceedings ArticleDOI
03 May 1982
TL;DR: The final goal of this work is to provide the deaf person with additional information (or "keys") which disambiguates the labial image.
Abstract: Lip-reading is widely used by profoundly deaf individuals for the reception of spoken language. This is a very difficult task because the labial image is ambiguous. The final goal of this work is to provide the deaf person with additional information (or "keys") which disambiguates the labial image. Phoneme recognition in continuous speech is used to produce the keys. To allow complete freedom in running speech, and in order to provide keys synchronously with speech production, no lexical, syntactic, or semantic information is used. Algorithms are adapted to a given speaker through a learning phase in which prototypes are built for the phonetic units to be recognized. The recognition algorithms are a combination of segmentation and centisecond labeling. The keys system is optimized taking into account the confusions made by the recognition programs. Recognition scores for multiple speakers are given both at the phonetic level and at the keys level.

Proceedings ArticleDOI
Guy Mercier1, A. Callec, J. Monne, M. Querre, O. Trevarain 
01 May 1982
TL;DR: The acoustic-phonetic recognizer which performs the early stages of analysis in the KEAL system is described, which is to transform the continuous speech signal representing the uttered sentence into a string of lower units.
Abstract: This paper describes the acoustic-phonetic recognizer which performs the early stages of analysis in the KEAL system. The objective of this module is to transform the continuous speech signal representing the uttered sentence into a string of lower-level units. Four main linguistic units have been considered: phones, phonemes, syllables and words. The KEAL acoustic-phonetic recognizer consists of components for carrying out three main tasks: acoustic analysis, labelling and training. A syllabic segmentation accuracy of 95%, an average phonemic recognition rate of 61% and a word recognition accuracy of 93% are obtained using 26 phonemic classes, isolated words (digits and operators: +, -, *, ...) and continuous speech from different speakers. Preliminary results on number recognition (each number being composed of several digits spoken without insertion of pauses) give an accuracy of 90% after speaker adaptation.

Proceedings ArticleDOI
Y. Nara1, K. Iwata, Y. Kijima, A. Kobayashi, S. Kimura, S. Sasaki, J. Tanahashi 
01 May 1982
TL;DR: A new matching algorithm for large vocabulary spoken word recognition is proposed, which gives a recognition score comparable to that of the traditional DP matching algorithm, but requires less than 1/10 as much calculation.
Abstract: We propose a new matching algorithm for large vocabulary spoken word recognition, which gives a recognition score comparable to that of the traditional DP matching algorithm, but requires less than 1/10 as much calculation. In a computer simulation of 1,000 categories in speaker-dependent recognition of speech samples uttered by five male adult speakers, an average recognition score of 95.8% was obtained. We have constructed a real-time speaker-dependent speech recognizer using our algorithm. We are now examining the application of this recognizer to Japanese text input.

Proceedings Article
01 Sep 1982
TL;DR: In this article, a low cost voice recognition system for isolated words and small vocabularies (typically 15 words) is described, where the two main features of the system are: possibility of integration in a small size CMOS chip (typically 35 mm2) having minimum power consumption (less than 200?W at 3V) and automatic adaptation to the speaker without any tedious training mode.
Abstract: A low cost voice recognition system for isolated words and small vocabularies (typically 15 words) is described. The two main features of the system are: possibility of integration in a small CMOS chip (typically 35 mm²) having minimum power consumption (less than 200 µW at 3 V), and automatic adaptation to the speaker without any tedious training mode.

Journal ArticleDOI
Hermann Ney1
TL;DR: New techniques for automatic speaker recognition from telephone speech are described, based on spectral analysis of fixed sentence-long utterances, which is carried out by a dynamic programming algorithm which minimizes timing differences between corresponding speech events.

Journal ArticleDOI
TL;DR: A speaker‐independent isolated word recognition system which accepts telephone line speech which gets the recognition accuracy greater than 96% with 12 words spoken by 130 talkers and the same result was also obtained in the recognition test of the prototype machine.
Abstract: This paper describes a speaker‐independent isolated word recognition system which accepts telephone line speech. A recognition method is named selective weighted matching (SWM) which uses a weighted distance measure. The input speech signal is frequency‐analyzed every 10 ms by a filter bank. The individual glottal characteristic is normalized frame by frame using a least‐square‐fit line of the speech spectrum. Each reference pattern has a specific region in the time‐frequency domain. In the matching process of that region, the weighted distance computation is carried out under the predetermined condition. In the computer simulation of telephone line speech, we got the recognition accuracy greater than 96% with 12 words (digits and two command words in Japanese) spoken by 130 talkers. The same result was also obtained in the recognition test of the prototype machine.
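The frame-by-frame glottal normalization can be sketched as subtracting a least-squares regression line, fitted over channel index, from each log-spectral frame; this is an interpretation of the description above, not the system's exact code:

```python
def normalize_frame(log_spectrum):
    """Remove the least-squares line over channel index from one
    log-spectral frame (per-frame spectral-tilt normalization)."""
    n = len(log_spectrum)
    xs = range(n)
    mx = (n - 1) / 2.0
    my = sum(log_spectrum) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, log_spectrum))
    slope = sxy / sxx
    # residual after subtracting the fitted line my + slope * (x - mx)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, log_spectrum)]
```

A frame whose log spectrum is exactly linear in channel index, i.e. pure tilt, normalizes to all zeros, so the individual glottal slope no longer influences the weighted distance.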