Showing papers on "Word error rate published in 1982"


Journal ArticleDOI
TL;DR: The probability distribution of the phase angle between two vectors perturbed by correlated Gaussian noises is studied in detail and its asymptotic behavior for large signal-to-noise for "small," "near \pi/2 ," and "large" angles is found.
Abstract: The probability distribution of the phase angle between two vectors perturbed by correlated Gaussian noises is studied in detail. Definite integral expressions are derived for the distribution function, and its asymptotic behavior for large signal-to-noise ratio is found for "small," "near \pi/2 ," and "large" angles. The results are applied to obtain new formulas for the symbol error rate in MDPSK, to calculate the distribution of instantaneous frequency, to study the error rate in digital FM with partial-bit integration in the postdetection filter, and to obtain a simplified expression for the error rate in DPSK with a phase error in the reference signal. In the degenerate case in which one of the vectors is noise free, the results lead to the symbol error rate in MPSK.

452 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: The distribution of words in terms of patterns derived from broad categorization of the phonemes was investigated and implications for phonetically-based isolated word recognition strategies are discussed.
Abstract: As part of our goal to design large-vocabulary, phonetically-based isolated word recognition systems, we investigated the statistical properties and constraints of the phonemic structures of English words. Our database consisted of five lexicons varying in size from 1250 to 20,000 words. The lexicons included, in addition to a phonemic transcription for each word, the word's frequency of occurrence as determined from the Brown Corpus. We studied the distributions of the phonemes, both individually and by class, within the lexicon and within the corpus. Distributions of consonant clusters were also obtained. Finally, the distribution of words in terms of patterns derived from broad categorization of the phonemes was investigated. This paper summarizes the results of these studies and discusses implications for phonetically-based isolated word recognition strategies.

104 citations


Journal ArticleDOI
01 Aug 1982
TL;DR: This paper provides a tutorial overview of methods of error monitoring under four broad classifications, namely, test sequences, parameter measurements, violation detection, and pseudo-error monitoring.
Abstract: The error rate is an important measure of performance in a digital communications system, since it gives an indication of the quality of the received information. This paper provides a tutorial overview of methods of error monitoring under four broad classifications, namely, test sequences, parameter measurements, violation detection, and pseudo-error monitoring. A brief discussion of several approaches towards performance monitoring and the definition of error rate parameters is also given. The various methods are described and compared, but no one monitor is singled out as the "best" since the final choice depends largely on the specific requirements of a given application.

90 citations


Patent
03 Mar 1982
TL;DR: In this article, a time-independent feature vector for any given word consists of a representation of the frequency of occurrence of each particular feature at any of several "time slots" in the word and the extra information about the word gained by a comparison of its vector with a corresponding vector of each training word assists in the final decision.
Abstract: Speech recognition accuracy is significantly enhanced by employing recognition criteria that involve comparison (in blocks 74, 84), as between spoken command words and stored "training" words, of both time-dependent feature arrays (72,73) and time-independent feature vectors (82, 83). The novel time-independent feature vector for any given word consists of a representation of the frequency of occurrence of each particular feature at any of several "time slots" in the word. The extra information about the word gained by a comparison (84) of its vector with a corresponding vector of each training word assists in the final decision (90).

53 citations


PatentDOI
Masao Watari, Hiroaki Sakoe
TL;DR: In this paper, a continuous speech recognition system determines the similarity between input patterns and reference patterns over time such that similarities between previously spoken speech patterns and reference patterns are determined while speech continues to be spoken.
Abstract: A continuous speech recognition system determines the similarity between input patterns and reference patterns over time such that similarities between previously spoken speech patterns and reference patterns are determined while speech continues to be spoken. Degrees of dissimilarity at arbitrary reference pattern word times are determined asymptotically and are recorded. The minimum degree of dissimilarity is determined and the corresponding word is categorized. Recognition decisions are ultimately made in reverse chronological order.

52 citations


Patent
03 May 1982
TL;DR: Speech recognition is improved by adaptive update of reference word codes: the user repeats a word until the displayed word agrees with what was said, and the coded form of that spoken word is then stored as the reference.
Abstract: Speech recognition is improved by adaptive update of reference word codes. The apparatus can be used for controlling a piece of equipment and comprises memories which contain, in coded form, the references of the vocabulary of the machine; a microphone, a coding circuit and a memory for coding and memorizing the word pronounced by the user; a device for displaying the word recognized by the apparatus; and a control circuit for comparing the pronounced word to the different memorized references and for displaying at each repetition of the word the words corresponding to the references in their order of resemblance to the pronounced word. The user is able to train the apparatus easily by repeating a word until the displayed word agrees with what he has said, the coded form of this spoken word then being stored as the reference for the corresponding word of the vocabulary.

52 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: A dynamic programming pattern matching isolated word recognition system has been modified to emphasize the transient parts of speech in the similarity measure by weighting the word distances with a normalized spectral change function.
Abstract: A dynamic programming pattern matching isolated word recognition system has been modified in order to emphasize the transient parts of speech in the similarity measure. The technique is to weight the word distances with a normalized spectral change function. A small positive effect is measured. Emphasizing the stationary parts is shown to substantially decrease the performance. Adding the time derivative of the speech parameters to the word patterns improves performance significantly. This is probably a consequence of an improvement in the description of the transient segments.
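The idea of adding the time derivative of the speech parameters to the word patterns before dynamic programming matching can be sketched as follows. This is only an illustration under stated assumptions: the function names `add_deltas` and `dtw_distance` are hypothetical, the derivative is approximated by a simple first difference, the local distance is Euclidean, and the spectral change weighting described in the abstract is not reproduced.

```python
import math


def add_deltas(frames):
    """Append a first-difference estimate of the time derivative to each frame.

    Assumption: a plain backward difference stands in for the derivative;
    the first frame gets a zero delta.
    """
    out = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame
        delta = [x - p for x, p in zip(frame, prev)]
        out.append(list(frame) + delta)
    return out


def dtw_distance(a, b):
    """Accumulated distance of the best time alignment between patterns a and b.

    Plain dynamic time warping with Euclidean local distance and the
    three basic steps (insert, delete, match).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

A word pattern would be passed through `add_deltas` once, and the extended reference and test patterns then compared with `dtw_distance`, so that transitions contribute to the distance through the delta components.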

30 citations


Patent
Takashi Hoshino, Takao Arai
09 Jun 1982
TL;DR: In this paper, a series of data words in the input data is first modified into a form in which an erroneous input word is replaced by the correct input word immediately preceding it.
Abstract: Errors in input data are concealed by interpolation using a mean value and the previous word, which are automatically interchanged in accordance with the conditions of occurrence of errors. A series of data words in the input data is first modified into a form in which, when the input word is erroneous, it is replaced by the correct input word immediately preceding the erroneous word. Then, in association with each word in the modified data word series, a mean value between the words immediately before and after that word, or between that word and a word thereafter, is produced, and if that word is the replaced word it is further replaced by the mean value produced in association with that word. Thus, an independent error is concealed by a mean value between the correct words immediately before and after the erroneous word, while continuous errors are concealed in such a manner that the last erroneous word is replaced by the mean value between the correct words occurring immediately before and after the continuous erroneous words and the other erroneous words are all replaced by the correct word occurring immediately before the continuous erroneous words.
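The two-stage concealment described above can be sketched in a few lines. This is a minimal illustration, not the patented circuit: the function name `conceal_errors`, the `initial` fallback value for an error in the very first word, and the choice to leave a trailing error at its held value are all assumptions made for the example.

```python
def conceal_errors(words, error_flags, initial=0):
    """Conceal flagged words by previous-word hold followed by mean interpolation.

    words:       list of numeric data words
    error_flags: list of booleans, True where the word is erroneous
    initial:     hypothetical value used if the first word is erroneous
    """
    # Stage 1: replace each erroneous word by the last correct word before it.
    held = []
    last_good = initial
    for word, bad in zip(words, error_flags):
        if not bad:
            last_good = word
        held.append(last_good)

    # Stage 2: each replaced word is further replaced by the mean of its
    # neighbours in the held series. Inside a burst both neighbours equal the
    # held value, so only the last word of the burst actually changes.
    out = list(held)
    for i, bad in enumerate(error_flags):
        if bad and 0 < i < len(held) - 1:
            out[i] = (held[i - 1] + held[i + 1]) / 2
    return out
```

For an isolated error, `conceal_errors([10, 99, 30], [False, True, False])` gives the mean of the two neighbours in the middle position. For a burst such as `[10, 0, 0, 0, 50]` with the three middle words flagged, the first two erroneous words take the value 10 and only the last one becomes the mean of 10 and 50, which matches the behaviour the abstract describes.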

25 citations


Proceedings ArticleDOI
03 May 1982
TL;DR: A speaker-dependent, limited vocabulary connected word recognition system using an 8088 microprocessor is described; a compression method carries out a preliminary time normalisation and reduces the amount of information used, and a dynamic time warping procedure eliminates any remaining time distortion during the recognition phase.
Abstract: In the last few years microprocessors have been used successfully in the construction of single board isolated word recognition systems. The interest that connected word recognition presently attracts, and the work that has been carried out at LIMSI in isolated word recognition, have brought us to implement a connected word and word spotting algorithm on a microprocessor. Herein we describe a speaker-dependent, limited vocabulary system using an 8088 microprocessor. There is only one training pass for each vocabulary word, except in the case of very short words, where we use two. There is no limit to the number of words in each utterance. We employ a compression method which carries out a preliminary time normalisation and reduces the amount of information used. A concise and efficient dynamic time warping procedure eliminates any remaining time distortion during the recognition phase. Real time processing is made possible by compressing and using the time warping method while acquisition is being carried out.

24 citations


Proceedings ArticleDOI
Akio Komatsu, Akira Ichikawa, Kazuo Nakata, Yoshiaki Asakawa, H. Matsuzaka
03 May 1982
TL;DR: An algorithm for phoneme recognition in continuous speech is presented; a continuous matching process is employed to bypass the segmentation problem, and a hierarchical recognition algorithm is proposed to realize feasible matching in real time.
Abstract: An algorithm for phoneme recognition in continuous speech is presented. A continuous matching process is employed to bypass the segmentation problem. A large set of standard patterns is used to solve the allophonic variation problem. Also, a hierarchical recognition algorithm is proposed to realize feasible matching in real time. In the first stage of the hierarchical recognition algorithm, vowels in speech are spotted. To optimize accuracy in vowel spotting, each standard pattern is carefully selected, constraints on the "phoneme chain" of continuous speech are utilized, and partial standard pattern matching is employed for detailed phoneme analysis. The second stage recognizes consonants between vowels. Experimental results show a 91% vowel recognition rate and 80% consonant recognition rate for a specified speaker.

11 citations


Proceedings ArticleDOI
01 May 1982
TL;DR: An improved training procedure for connected word (digit) recognition is proposed in which word reference patterns from isolated occurrences of the vocabulary words are combined with word reference patterns extracted from within connected word strings to give a robust, reliable word recognizer over all normal speaking rates.
Abstract: The "conventional" way of obtaining word reference patterns for connected word recognition systems is to use isolated word patterns, and to rely on the dynamics of the matching algorithm to account for the differences in connected speech. Connected word recognition, based on such an approach, tends to become unreliable (high error rates) when the talking rate becomes grossly incommensurate with the rate at which the isolated word training patterns were spoken. To alleviate this problem, an improved training procedure for connected word (digit) recognition is proposed in which word reference patterns from isolated occurrences of the vocabulary words are combined with word reference patterns extracted from within connected word strings to give a robust, reliable word recognizer over all normal speaking rates. In a test of the system (as a speaker trained, connected digit recognizer) with 18 talkers each speaking 40 different strings (of variable length from 2 to 5 digits), median string error rates of 0% and 2.5% were obtained for deliberately spoken strings and naturally spoken strings, respectively, when the string length was known. Using just isolated word training tokens, the comparable error rates were 10% and 11.3% respectively.

Proceedings ArticleDOI
Hermann Ney, R. Gierloff
01 May 1982
TL;DR: The experiments indicate that feature weighting and feature selection can reduce the error rates by a factor of two or more both for speaker identification and speaker verification.
Abstract: This paper describes a technique for increasing the ability of a text-dependent speaker recognition system to discriminate between speaker classes; this technique is to be performed in conjunction with the nonlinear time alignment between a reference pattern and a test pattern. Unlike the standard approach, where the training of the recognition system merely consists of storing and averaging or selecting the time normalized training patterns separately for each class, the training phase of the system is extended in that a weight is determined for each individual feature component of the complete reference pattern according to the ability of the feature to distinguish between speaker classes. The weights depend on the time axis as well as on the frequency axis. The overall distance computed after nonlinear time alignment between a reference pattern and a test pattern thus becomes a function of the given set of weights of the reference class considered. For each class, the optimum weights result from the ideal criterion of minimum error rate. Instead of this criterion, the closely related but mathematically more convenient Fisher criterion is used, which leads to a closed-form solution for the unknown weights. Based on these weights, the selection of subsets of effective features is studied in order to further improve the class discrimination. The feature weighting and selecting techniques are tested using a data base of utterances recorded off dialed-up telephone lines. The experiments indicate that feature weighting and feature selection can reduce the error rates by a factor of two or more both for speaker identification and speaker verification.
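As a rough illustration of feature weighting by a Fisher-type criterion, the sketch below computes one weight per feature component from two classes of already time-aligned patterns and uses the weights in a distance. It is an assumption-laden example, not the paper's closed-form solution (which the abstract does not give): the classic two-class Fisher ratio, squared mean difference over summed variances, is substituted, and the names `fisher_weights`, `weighted_distance` and the `eps` guard are hypothetical.

```python
def fisher_weights(class_a, class_b, eps=1e-9):
    """Per-feature Fisher ratio: (mean_a - mean_b)^2 / (var_a + var_b).

    class_a, class_b: lists of equal-length feature vectors, assumed to be
    time-aligned already. eps is a hypothetical guard against zero variance.
    """
    dims = len(class_a[0])
    weights = []
    for k in range(dims):
        xa = [v[k] for v in class_a]
        xb = [v[k] for v in class_b]
        mean_a = sum(xa) / len(xa)
        mean_b = sum(xb) / len(xb)
        var_a = sum((x - mean_a) ** 2 for x in xa) / len(xa)
        var_b = sum((x - mean_b) ** 2 for x in xb) / len(xb)
        weights.append((mean_a - mean_b) ** 2 / (var_a + var_b + eps))
    return weights


def weighted_distance(test, reference, weights):
    """Weighted squared distance between a test pattern and a reference pattern."""
    return sum(w * (t - r) ** 2 for w, t, r in zip(weights, test, reference))
```

Components whose class means coincide receive a weight of zero and drop out of the distance, which also suggests one simple way of selecting a subset of effective features: keep only the components with the largest weights.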

Journal ArticleDOI
TL;DR: A modification of a compound model is shown that allows more accurate modeling of a wider class of channels including the high error rate radio channel.
Abstract: Most of the existing mathematical models for binary communication channels describe low error rate wire lines satisfactorily. For typical high error rate channels, like the ultrahigh frequency (UHF) or very high frequency (VHF) wideband data channel encountered in military uses, finite-state as well as denumerable infinite state Markov chain models do not achieve an accurate characterization. The above models, including some more recent compound models, are compared against data from the actual channel using the multigap distribution as a tool. A modification of a compound model is shown that allows more accurate modeling of a wider class of channels including the high error rate radio channel.


Journal ArticleDOI
TL;DR: This paper describes a general, practical decoding method for use in a double-encoding system and a decoding system which is a practical simplification of the above method.
Abstract: Double encoding is a method by which a new code is constructed by twofold encoding with two codes. Using this method, a code with excellent error-correcting capabilities can be constructed from relatively simple codes. The decoding can also be made simpler compared with a single code with the same code length and the same error-correcting performance. Both random and burst errors can be corrected. This paper describes a general, practical decoding method for use in a double-encoding system. Algebraic decoding is used in both the first and the second stages, but the detailed information obtained in the first decoding is fully utilized in the second stage: the usual error-correcting method is applied in the first decoding and a Chase algorithm is applied in the second decoding. A decoding system which is a practical simplification of the above method is also proposed. The proposed methods are compared with widely-used conventional systems in terms of the decoding error rate; it is shown that the decoding error rate can drastically be improved.

Journal ArticleDOI
30 Aug 1982
TL;DR: The error rate of a software application may function as a measure of code quality, and a methodology is developed that predicts the error rate, and hence code quality, prior to an application's release.
Abstract: The error rate of a software application may function as a measure of code quality. A methodology has been developed which allows for the accurate prediction of the error rate, and hence code quality, prior to an application's release. Many factors were considered which could conceivably be related to the error rate. These factors were divided into two categories: those factors which vary with time, and those factors which do not vary with time. Factors which vary with time were termed environmental factors and included such items as: number of users, errors submitted to date, etc. Factors which do not vary with time were termed internal factors and included Halstead metrics, McCabe metrics and lines of code.


Patent
03 Jun 1982
TL;DR: In this paper, the intensity of the receiving field is measured for every bit in parallel with data reception, converted to a binary bit error rate by an A/D converter and an error rate correspondence ROM 6, and stored in a shift register 7.
Abstract: PURPOSE: To decode a code with a lower probability of erroneous decoding by processing a syndrome, which is calculated from a received code, together with the intensity of the receiving field. CONSTITUTION: Data received by a receiver 2 is led to a register 3 and a syndrome calculating circuit 4. The intensity of the receiving field is measured for every bit in parallel with data reception, is converted to a binary bit error rate by an A/D converter 5 and an error rate correspondence ROM 6, and is stored in a shift register 7. Meanwhile, the syndrome calculated by the calculating circuit 4 gives an address to an error position correspondence ROM 8, and the product of the outputs of the shift register 7 and the ROM 8 is obtained by a multiplier 11 and is applied to a control circuit 9. The control circuit 9 detects the combination of error positions of the highest occurrence probability from the combinations of error positions which generate the same syndrome, and the output of the error position correspondence ROM 8 at this time and the output of the register 3 are combined by exclusive OR to obtain corrected data.
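The selection step, choosing the most probable error pattern among all patterns that produce the observed syndrome, can be sketched in software. Everything concrete here is an assumption for illustration: the patent abstract does not name the code, so a (7,4) Hamming code with the parity-check matrix `H` below is used, the ROM lookups are replaced by a brute-force search, and the per-bit error probabilities are supplied directly rather than derived from field intensity.

```python
from itertools import product

# Parity-check matrix of a (7,4) Hamming code (assumed example code).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(word):
    """Syndrome of a 7-bit word with respect to H, as a tuple of bits."""
    return tuple(sum(h * b for h, b in zip(row, word)) % 2 for row in H)


def decode(received, bit_error_prob):
    """Correct `received` using the most probable error pattern for its syndrome.

    bit_error_prob[i] is the estimated probability that bit i is wrong
    (standing in for the value derived from the measured field intensity).
    """
    target = syndrome(received)
    best_pattern, best_prob = None, -1.0
    for pattern in product((0, 1), repeat=len(received)):
        if syndrome(pattern) != target:
            continue  # this pattern would not explain the observed syndrome
        prob = 1.0
        for e, p in zip(pattern, bit_error_prob):
            prob *= p if e else 1.0 - p
        if prob > best_prob:
            best_pattern, best_prob = pattern, prob
    # Exclusive OR of the received word and the chosen error pattern.
    return [r ^ e for r, e in zip(received, best_pattern)]
```

With equal error probabilities on every bit this reduces to ordinary single-error correction. When two bits are known to be unreliable, the search can prefer a two-bit error pattern over the single-bit pattern with the same syndrome, which is the situation where the reliability information lowers the probability of erroneous decoding.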

Journal ArticleDOI
TL;DR: A Semantic Syntax-Directed Translation is presented, its rules are used to segment continuous speech and, at the same time, to produce phonetic interpretations.

Journal ArticleDOI
TL;DR: Results indicate that a dynamic range of 30 dB seems to be adequate and three or four bits per channel are sufficient to encode the amplitude information, and the influence of varying the filter parameters was investigated.

Journal ArticleDOI
TL;DR: A method for speaker independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov process is described; training comprises linear prediction analysis, vector quantization of the LPCs, and an algorithm for estimating the parameters of a hidden Markov process.
Abstract: A method for speaker independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov process is described. Training is a three‐part process comprising conventional methods of linear prediction analysis and vector quantization of the LPCs followed by an algorithm [L. E. Baum, Inequalities 3, 1–8 (1972)] for estimating the parameters of a hidden Markov process. Recognition utilizes linear prediction and vector quantization steps prior to maximum likelihood classification based on the Viterbi algorithm [A. J. Viterbi, IEEE Trans. Inf. Theo. IT‐13, 260–269 (1967)]. After training based on a 1000‐token set, recognition experiments were conducted on a separate 1000‐token test set obtained from 100 new talkers. In this test a 3.5% error rate was observed which is comparable to that measured in an identical test of an LPC/DTW system [L. R. Rabiner et al., IEEE Trans. Acoust. Speech Signal Process. ASSP‐37, 336–349 (1979)]. The computational demand for recognit...
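The recognition step, maximum likelihood classification based on the Viterbi algorithm over discrete observation symbols, can be sketched as below. This is a toy illustration under assumptions: the function names `viterbi_log_score` and `classify` are hypothetical, each word model is given as a `(start, trans, emit)` triple of plain probability lists, and the linear prediction analysis, vector quantization and Baum parameter estimation stages are not shown.

```python
import math


def _log(p):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(obs, start, trans, emit):
    """Log probability of the best state path for a discrete HMM.

    obs:   non-empty list of symbol indices (e.g. vector quantizer codes)
    start: start[s]    = probability of starting in state s
    trans: trans[r][s] = probability of moving from state r to state s
    emit:  emit[s][o]  = probability of emitting symbol o in state s
    """
    n = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        score = [
            max(score[r] + _log(trans[r][s]) for r in range(n)) + _log(emit[s][o])
            for s in range(n)
        ]
    return max(score)


def classify(obs, models):
    """Return the word whose model gives the highest Viterbi score.

    models: dict mapping a word label to its (start, trans, emit) triple.
    """
    return max(models, key=lambda word: viterbi_log_score(obs, *models[word]))
```

In use, one model per vocabulary word would be scored against the quantized input sequence and the best-scoring label returned; working in the log domain avoids numerical underflow on longer sequences.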

Proceedings ArticleDOI
01 May 1982
TL;DR: The hypothesizing process is supported successfully by a fast lexicon access method based on hash-coding and it proves to be robust even under failure in the prerecognized phonemes.
Abstract: Recognition of isolated or connected spoken words or sentences including a large vocabulary results in a great amount of classification expenditure. Reducing this expense by hypothesizing the words embedded in the speech signal is the goal of the hypothesizing process proposed in this paper. The process is based on the acoustic sound patterns and is accomplished by preclassification of significant phonemes such as vowels and voiced consonants. The sequence of these phonemes and their time distances within the speech signal is an appropriate criterion for hypothesizing and selecting references from the lexicon. It is shown that this method can be applied successfully to isolated and connected word recognition at the word and subword level, reducing the classification expenditure by a great amount (120 to 2860 for isolated words). Results of the hypothesizing efficiency are presented for a vocabulary of the 5000 most frequently used German words. The hypothesizing process is supported successfully by a fast lexicon access method based on hash-coding and it proves to be robust even under failure in the prerecognized phonemes.

Journal ArticleDOI
TL;DR: A compact and low cost text to speech synthesis unit for the personal computer has been developed that uses an LSI speech synthesizer with formant parameters and the analysis-synthesis technique has been introduced.
Abstract: A compact and low cost text to speech synthesis unit for the personal computer has been developed. This unit uses an LSI speech synthesizer with formant parameters. The speech synthesis method used is based on speech synthesis by rule using the CV, VC speech segments compilation method and the glottal pole formant model. In order to obtain high quality, the analysis-synthesis technique has been introduced. The glottal pole and formant parameters for CV, VC segments are extracted by a kind of analysis by synthesis technique. A Japanese text to speech synthesis system has been developed using these techniques. Intelligibility tests for well known Japanese words indicated an about 1% error rate.

Journal Article

Proceedings ArticleDOI
Y. Nara, K. Iwata, Y. Kijima, A. Kobayashi, S. Kimura, S. Sasaki, J. Tanahashi
01 May 1982
TL;DR: A new matching algorithm for large vocabulary spoken word recognition is proposed, which gives a recognition score comparable to that of the traditional DP matching algorithm, but requires less than 1/10 as much calculation.
Abstract: We propose a new matching algorithm for large vocabulary spoken word recognition, which gives a recognition score comparable to that of the traditional DP matching algorithm, but requires less than 1/10 as much calculation. In a computer simulation with 1,000 categories of speaker dependent recognition of speech samples uttered by five male adult speakers, an average recognition score of 95.8% was obtained. We have constructed a real-time speaker dependent speech recognizer using our algorithm. We are now examining the application of this recognizer to Japanese text input.

Journal ArticleDOI
TL;DR: A speaker‐independent isolated word recognition system which accepts telephone line speech is described; it achieves recognition accuracy greater than 96% with 12 words spoken by 130 talkers, and the same result was also obtained in the recognition test of the prototype machine.
Abstract: This paper describes a speaker‐independent isolated word recognition system which accepts telephone line speech. The recognition method, named selective weighted matching (SWM), uses a weighted distance measure. The input speech signal is frequency‐analyzed every 10 ms by a filter bank. The individual glottal characteristic is normalized frame by frame using a least‐square‐fit line of the speech spectrum. Each reference pattern has a specific region in the time‐frequency domain. In the matching process of that region, the weighted distance computation is carried out under the predetermined condition. In the computer simulation of telephone line speech, we obtained recognition accuracy greater than 96% with 12 words (digits and two command words in Japanese) spoken by 130 talkers. The same result was also obtained in the recognition test of the prototype machine.

Proceedings ArticleDOI
G. Kuhn
01 May 1982
TL;DR: An experiment on talker-independent word recognition in which three experimental parameters were manipulated found a multivariate word model did not do better than a word model based on a sum of univariate distributions and the Pass 2 log odds score did better than the Pass 1 distance score.
Abstract: Results are presented from an experiment on talker-independent word recognition in which three experimental parameters were manipulated. Those parameters were 1) whether the word model had a mean vector and a variance vector, or a mean vector and a covariance matrix, at each time frame; 2) whether a single estimate was made of the word model or whether there was one re-estimate; and 3) whether word recognition was based on a Pass 1 distance score or on a novel Pass 2 log odds score. The results were 1) a multivariate word model did not do better than a word model based on a sum of univariate distributions; 2) a re-estimated word model did better than a word model based on initial estimates; and 3) the Pass 2 log odds score did better than the Pass 1 distance score.

Patent
21 Aug 1982
TL;DR: In this paper, the authors propose to compose bit error detecting circuits of the same constitution by installing a reference time pulse generating circuit in common to the bit error counting circuits and selecting reference pulses on the side of the bit counting circuit.
Abstract: PURPOSE: To compose bit error detecting circuits of the same constitution by installing a reference time pulse generating circuit in common to the bit error counting circuits and selecting reference pulses on the side of the bit counting circuit. CONSTITUTION: To detect a bit error rate of 10 error bits/second with regard to a digital circuit 21, reference pulses are supplied from a reference time pulse generating circuit 27 via a selection switch 29 to a decimal counter 25, and for the detection of a bit error rate of 10 error bits/second, the reference pulses from the reference time pulse generating circuit 28 are selected similarly. Then, reference time pulses from the circuits 27 and 28 are also supplied in common to a decimal counter 26, which detects the bit error rate of a digital circuit 22, to be selected with a selection switch 30. Even when both the digital multiplex circuits have different set error rate values, their circuit terminating devices, decimal counters and selection switches are connected in exactly the same way, new devices are added, and exactly the same circuits are provided as 0 counting devices.


Book
01 Jan 1982