
Showing papers on "Word error rate published in 1984"


Proceedings ArticleDOI
19 Mar 1984
TL;DR: Results for a speaker dependent connected digit speech recognition task with a base error rate of 1.6%, show that preprocessing the noisy unknown speech with a 10 dB signal-to-noise ratio reduces the error rate from 42% to 10%.
Abstract: Acoustic noise suppression is treated as a problem of finding the minimum mean square error estimate of the speech spectrum from a noisy version. This estimate equals the expected value of its conditional distribution given the noisy spectral value, the mean noise power and the mean speech power. It is shown that speech is not Gaussian. This results in an optimal estimate which is a non-linear function of the spectral magnitude. This function differs from the Wiener filter, especially at high instantaneous signal-to-noise ratios. Since both speech and Gaussian noise have a uniform phase distribution, the optimal estimator of the phase equals the noisy phase. The paper describes how the estimator can be calculated directly from noise-free speech. It describes how to find the optimal estimator for the complex spectrum, the magnitude, the squared magnitude, the log magnitude, and the root-magnitude spectra. Results for a speaker dependent connected digit speech recognition task with a base error rate of 1.6%, show that preprocessing the noisy unknown speech with a 10 dB signal-to-noise ratio reduces the error rate from 42% to 10%. If the template data are also preprocessed in the same way, the error rate reduces to 2.1%, thus recovering 99% of the recognition performance lost due to noise.
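The optimal estimator described above must be tabulated from clean speech, so it cannot be reproduced from the abstract alone. The sketch below, assuming standard STFT processing and illustrative frame sizes, implements only the Wiener-style gain that the paper's non-linear estimator is contrasted with, and keeps the noisy phase as the abstract prescribes; the function and variable names are not from the paper.

```python
import numpy as np

def wiener_suppress(noisy, noise_power, frame_len=256, hop=128):
    """Frame-based spectral suppression sketch.

    Applies a Wiener-style gain to each FFT bin and keeps the noisy phase,
    since the paper notes the optimal phase estimate equals the noisy phase.
    The paper's own estimator is a *non-linear* function of the spectral
    magnitude tabulated from clean speech; the Wiener gain below is only the
    baseline it is compared against.
    """
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        noisy_power = np.abs(spec) ** 2
        # Wiener gain: estimated clean power / noisy power, floored at zero.
        clean_power = np.maximum(noisy_power - noise_power, 0.0)
        gain = clean_power / np.maximum(noisy_power, 1e-12)
        cleaned = np.fft.irfft(gain * spec, n=frame_len)
        out[start:start + frame_len] += cleaned * window   # overlap-add
    return out

# Example: suppress white noise added to a synthetic tone.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = 0.3 * rng.standard_normal(16000)
noise_power_per_bin = np.mean(
    np.abs(np.fft.rfft(noise[:256] * np.hanning(256))) ** 2)
enhanced = wiener_suppress(clean + noise, noise_power_per_bin)
```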

138 citations


Journal ArticleDOI
TL;DR: This paper examined the relation between word frequency, repetition and stimulus quality in a lexical decision experiment and found that, in contrast to earlier studies, word frequency and stimulus quality interact; the implications of this result for models of word recognition are discussed within the framework of Becker's verification model.
Abstract: This paper describes a lexical decision experiment, which examined the relation between word frequency, repetition and stimulus quality. In contrast to earlier studies (Stanners, Jastrzembski and Westbrook, 1975; Becker and Killion, 1977), frequency and stimulus quality were found to interact. The implications of this result for models of word recognition are discussed within the framework of Becker's verification model.

83 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: A model of lexical access using partial phonetic information, where rather than performing detailed phonetic analysis, a word is characterized in terms of broad phonetic and prosodic information, which is used to retrieve a small set of words from a large lexicon.
Abstract: Current approaches to isolated word recognition rely on classical pattern recognition techniques which utilize little or no speech specific knowledge. While the performance of these systems is quite good, they are not readily extensible to tasks involving very large vocabularies and many different speakers. This paper presents a model of lexical access using partial phonetic information. Rather than performing detailed phonetic analysis, a word is characterized in terms of broad phonetic and prosodic information. This partial description is then used to retrieve a small set of words from a large lexicon. The broad class representation used in the model is both relatively insensitive to variability in the speech signal, and very powerful in differentiating among the words in a large lexicon. In order to evaluate the use of this model, we have implemented a word hypothesizer which uses partial phonetic information in lexical access. The system performs a broad phonetic categorization of the acoustic signal. This broad classification is used to return a small set of word candidates from a 20,000 word lexicon. The system is not trained to a specific speaker or vocabulary.
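As a rough illustration of lexical access from broad classes, the sketch below indexes a toy lexicon by a broad-class string and retrieves the cohort matching the pattern of an input word. The class inventory, the letter-as-phone transcriptions and all names are hypothetical; the paper defines its own broad phonetic and prosodic categories and works with a 20,000 word lexicon.

```python
from collections import defaultdict

# Hypothetical broad phonetic classes; the paper defines its own inventory.
BROAD_CLASS = {
    **{p: "V" for p in "aeiou"},     # vowels
    **{p: "S" for p in "szfh"},      # fricatives (rough grouping)
    **{p: "T" for p in "ptkbdg"},    # stops
    **{p: "N" for p in "mn"},        # nasals
    **{p: "L" for p in "lrwy"},      # liquids and glides
}

def broad_pattern(phones):
    """Collapse a phone string into a broad-class pattern, e.g. 'kat' -> 'TVT'."""
    return "".join(BROAD_CLASS.get(p, "?") for p in phones)

def build_index(lexicon):
    """Index every word in the lexicon by its broad-class pattern."""
    index = defaultdict(list)
    for word, phones in lexicon.items():
        index[broad_pattern(phones)].append(word)
    return index

# Toy lexicon keyed by a crude letter-as-phone transcription.
lexicon = {"cat": "kat", "bat": "bat", "man": "man", "seal": "sil"}
index = build_index(lexicon)

# A broad classification of the input retrieves a small cohort of candidates
# instead of requiring fine phonetic analysis against the whole lexicon.
print(index[broad_pattern("kat")])   # ['cat', 'bat'] share the pattern 'TVT'
```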

61 citations


Journal Article
TL;DR: In this paper, the effects of three error correction procedures on word recognition and fluency were examined, and the results indicated that word drill and phrase drill procedures are similarly better than word supply on the recognition of words in isolation, but that phrase drill is superior to word drill for recognition in context.
Abstract: The correction of oral reading errors is viewed differently from alternative perspectives on the process of reading. Proponents of the psycholinguistic perspective view error correction as detrimental to the "hypothesis testing" process while supporters of direct instruction view error correction as an aid to word recognition. Based upon the latter view point, the effects of three error correction procedures on word recognition and fluency were examined. The results indicate that word drill and phrase drill procedures are similarly better than word supply on the recognition of words in isolation, but that phrase drill is superior to word drill and word supply for recognition of words in context. No differences were found between the word drill and phrase drill procedures in improving fluency rates.

50 citations


PatentDOI
TL;DR: The word recognition system may be incorporated within an electronic device which is also equipped with speech synthesis capability such that the electronic device is able to recognize simple words as spoken thereto and to provide an audible comment via speech synthesis which is related to the spoken word.
Abstract: Speaker-independent word recognition method and system for identifying individual spoken words based upon an acoustically distinct vocabulary of a limited number of words. The word recognition system may employ memory storage associated with a microprocessor or microcomputer in which reference templates of digital speech data representative of a limited number of words comprising the word vocabulary are stored. The word recognition system accepts an input analog speech signal from a microphone as derived from a single word-voice command spoken by any speaker. The analog speech signal is directed to an energy measuring circuit and a zero-crossing detector for determining a sequence of feature vectors based upon the zero-crossing rate and energy measurements of the sampled analog speech signal. The sequence of feature vectors is then input to the microprocessor or microcomputer for individual comparison with the feature vectors included in each of the reference templates as stored in the memory portion of the microprocessor or microcomputer. Comparison of the sequence of feature vectors as determined from the input analog speech signal with the feature vectors included in the plurality of reference templates produces a cumulative cost profile for enabling logic circuitry within the microprocessor or microcomputer to make a decision as to the identity of the spoken word. The word recognition system may be incorporated within an electronic device which is also equipped with speech synthesis capability such that the electronic device is able to recognize simple words as spoken thereto and to provide an audible comment via speech synthesis which is related to the spoken word.
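A minimal software sketch of the feature extraction and template comparison described above is given below, assuming fixed-length frames and a plain frame-by-frame distance; the patent's actual cumulative-cost alignment and decision logic are implemented in hardware/firmware and are not specified in the abstract.

```python
import numpy as np

def features(signal, frame_len=160):
    """Per-frame (energy, zero-crossing count) feature vectors, the two
    measurements named in the abstract; the frame length is illustrative."""
    n_frames = len(signal) // frame_len
    feats = []
    for i in range(n_frames):
        frame = np.asarray(signal[i * frame_len:(i + 1) * frame_len], dtype=float)
        energy = float(np.sum(frame ** 2))
        zero_crossings = int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zero_crossings))
    return np.array(feats, dtype=float)

def cumulative_cost(test, template):
    """Accumulate frame-to-frame distances over the common length.  The patent
    builds a cumulative cost profile per reference template; the exact
    alignment is not given in the abstract, so a plain frame-by-frame
    comparison stands in for it here."""
    n = min(len(test), len(template))
    return float(np.sum(np.linalg.norm(test[:n] - template[:n], axis=1)))

def recognize(test_feats, reference_templates):
    """Return the vocabulary word whose stored template yields the lowest
    cumulative cost for the unknown utterance."""
    return min(reference_templates,
               key=lambda word: cumulative_cost(test_feats, reference_templates[word]))
```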

49 citations


Patent
21 Aug 1984
TL;DR: In this paper, a system for through checking the accuracy of generation and decoding of the error correction codes is described, in which a data word parity signal is generated for storage with the associated data word and its associated check bits.
Abstract: For use with a digital memory system that generates error correction code signals for storage with associated data words and for correction of detected error(s) in the associated data words when accessed, a system for through checking the accuracy of generation of the error correction codes and the decoding of the error correction codes is described. A data word parity signal is generated for storage with the associated data word and its associated check bits. When a data word is accessed, the read data word and its associated check bits are applied to error correction circuitry that results in a determination of whether or not any bits of the read data word are in error. Correction circuitry corrects those errors in the read data word that are correctable. The corrected read data word is applied to a parity generator circuit that generates the parity of the corrected read data word. A comparison circuit compares the stored data word parity signal with the word parity calculated for the corrected read data word. Agreement indicates that the error correction system and through check system functioned properly, and failure of comparison indicates that an error occurred in the throughput of the data word.
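The through-check idea above can be illustrated in a few lines: a parity bit is generated when the word is written, and after the ECC logic has corrected the read word the parity is recomputed and compared with the stored value. The sketch below uses identity stubs in place of a real ECC encoder/corrector, and the even-parity convention is an assumption.

```python
def parity(word_bits):
    """Even parity over a list of bits."""
    return sum(word_bits) % 2

def store(data_bits, ecc_encode):
    """Store data together with its ECC check bits and a separate parity bit."""
    return {
        "data": list(data_bits),
        "check": ecc_encode(data_bits),
        "parity": parity(data_bits),      # generated at write time
    }

def read(entry, ecc_correct):
    """Read path: ECC-correct the data, then through-check it with parity."""
    corrected = ecc_correct(entry["data"], entry["check"])
    # Recompute parity of the corrected word and compare with the stored
    # parity.  Agreement means the correction path worked; disagreement flags
    # an error somewhere in the read/correct pipeline itself.
    ok = parity(corrected) == entry["parity"]
    return corrected, ok

# The ECC itself is out of scope here; identity stubs stand in for it.
entry = store([1, 0, 1, 1], ecc_encode=lambda d: [])
word, ok = read(entry, ecc_correct=lambda d, c: d)
print(word, ok)   # [1, 0, 1, 1] True
```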

43 citations


Journal ArticleDOI
TL;DR: The letter proposes a continuous ARQ scheme, to be used under high error rate conditions, that preserves the ordering of the data blocks and yields a better throughput efficiency than some known comparable schemes.
Abstract: The letter proposes a continuous ARQ scheme, to be used under high error rate conditions. The scheme preserves the ordering of the data blocks and yields a better throughput efficiency—especially for channels with large round-trip delay—than some known comparable schemes, for all block error probabilities larger than 50%.
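The abstract does not describe the proposed scheme itself, so the sketch below only computes the classical reference points such a scheme is measured against: the ideal selective-repeat throughput 1 - p and a common textbook approximation for go-back-N, where one block error costs a whole round-trip window. The parameter names and the window size are illustrative.

```python
def selective_repeat_throughput(p):
    """Ideal selective-repeat ARQ: only erroneous blocks are resent."""
    return 1.0 - p

def go_back_n_throughput(p, round_trip_blocks):
    """Common textbook approximation for go-back-N ARQ, where a block error
    forces retransmission of the whole round-trip window of blocks."""
    return (1.0 - p) / (1.0 + p * round_trip_blocks)

# The regime of interest in the letter: block error probabilities above 50 %
# on a channel with a large round-trip delay.
for p in (0.5, 0.7, 0.9):
    print(p,
          go_back_n_throughput(p, round_trip_blocks=32),
          selective_repeat_throughput(p))
```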

41 citations


PatentDOI
John W. Klovstad1
TL;DR: In this article, a word is a sequence of acoustic kernels, each kernel a phoneme spectral vector with min-max duration data on a template, and each kernel is activated or deactivated at the "kernel" level.
Abstract: Speech recognition calculations are decreased by deactivating (or activating) a word in a grammar graph at the "kernel" level. A word is a sequence of acoustic kernels, each kernel a phoneme spectral vector with min-max duration data on a template.
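A minimal sketch of the data structure implied by the abstract, with field names that are assumptions rather than the patent's terminology: a word template is a list of kernels, and activation or deactivation is applied kernel by kernel so the word drops out of the matching without modifying the grammar graph itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Kernel:
    """One acoustic kernel: a phoneme spectral vector with min/max duration
    (in frames), as described in the abstract."""
    spectrum: List[float]
    min_dur: int
    max_dur: int
    active: bool = True          # kernels can be switched off individually

@dataclass
class Word:
    """A word is a sequence of acoustic kernels on a template."""
    name: str
    kernels: List[Kernel] = field(default_factory=list)

    def deactivate(self):
        """Deactivating at the kernel level removes the word from the search
        without touching the grammar graph structure."""
        for k in self.kernels:
            k.active = False

    def activate(self):
        for k in self.kernels:
            k.active = True

# Hypothetical example: a two-kernel word that the grammar temporarily prunes.
word = Word("yes", [Kernel([0.1, 0.9], 3, 12), Kernel([0.7, 0.2], 2, 8)])
word.deactivate()
print(any(k.active for k in word.kernels))   # False: skipped during matching
```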

33 citations


PatentDOI
TL;DR: In a speech recognition system, the beginning of speech versus non-speech (a cough or noise) is distinguished by reverting to a non-speech decision process whenever the likelihood cost of template (vocabulary) patterns, including silence, is worse than a predetermined threshold, established by a Joker Word which represents a non-vocabulary word score and path in the grammar graph as discussed by the authors.
Abstract: In a speech recognition system, the beginning of speech versus non-speech (a cough or noise) is distinguished by reverting to a non-speech decision process whenever the likelihood cost of template (vocabulary) patterns, including silence, is worse than a predetermined threshold, established by a Joker Word which represents a non-vocabulary word score and path in the grammar graph.
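The rejection rule can be sketched as a single comparison: if even the best-scoring vocabulary or silence template is costlier than the Joker Word path, the input is treated as non-speech. The function and cost values below are illustrative, not taken from the patent.

```python
def classify_start(template_costs, joker_cost):
    """Decide whether the current best hypothesis is real speech.

    `template_costs` maps each active template (including silence) to its
    current likelihood cost (lower is better); `joker_cost` is the score of
    the Joker Word, a non-vocabulary path in the grammar graph.  If every
    vocabulary/silence path scores worse than the Joker Word, the input is
    treated as non-speech (a cough, noise, ...) and the recognizer reverts
    to its non-speech decision process.
    """
    best_vocab_cost = min(template_costs.values())
    if best_vocab_cost > joker_cost:
        return "non-speech"
    return "speech"

# Hypothetical costs for one decision point: noise beats every template.
print(classify_start({"yes": 41.0, "no": 37.5, "<silence>": 44.0},
                     joker_cost=30.0))   # 'non-speech'
```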

29 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: This paper deals with two experiments with a large vocabulary isolated word recognizer and investigates the performance of the recognition system on sentences containing words outside the vocabulary of the recognizer.
Abstract: This paper deals with two experiments with a large vocabulary isolated word recognizer. The first compares word error rates for 1) meaningful sentences belonging to actual documents and 2) random word lists from the same vocabulary. The error rate is considerably lower for random word lists. The second experiment investigates the performance of the recognition system on sentences containing words outside the vocabulary of the recognizer. Sentences from a 5000 word vocabulary task are recognized with a recognizer limited to a 2000 word subvocabulary. The error rate is only slightly higher than it would be if recognition of the full 5000 word vocabulary was allowed.

19 citations


Patent
Takashi Kaneko1
27 Feb 1984
TL;DR: In this article, an interpolation circuit and method corrects a data word having an error by substituting either the most recent prior correct data word or the mean value of that word and the data word subsequent to the one having the error, where appropriate, using a minimum of hardware.
Abstract: An interpolation circuit and method corrects a data word having an error by either substituting the most recent prior correct data word or substituting the mean value of the most recent prior correct data word and the data word subsequent to the one having the error. The method is carried out using a novel architecture having memory access control. A RAM stores the most recent prior correct word, A_j, which will be the current word A_n if A_n has no error. If the subsequent word A_(n+1) is without error, it is stored. The two memory locations are, in effect, reversible, to enable the subsequent word A_(n+1) to become the most recent correct prior word, A_j, for subsequent processing, where appropriate, using a minimum of hardware. The most recent prior word A_j (which may be A_n) is read to an adder which performs addition and division by two. Depending upon the error content of A_n and A_(n+1), A_j is either added to itself or to A_(n+1) to obtain the final interpolated data. All words are divided into upper-half and lower-half portions to cut down on the hardware required.
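In software, the interpolation rule reduces to the two cases named in the abstract; the sketch below applies it over a list of integer samples. The patent's hardware details (the two reversible RAM locations, the shared adder/divider, and the split into upper and lower word halves) are not modeled.

```python
def conceal(words, errors):
    """Error concealment by interpolation, following the two cases in the
    abstract: replace an erroneous word with the most recent prior correct
    word, or with the mean of that word and the next word when the next word
    is itself correct.  `errors[i]` is True when word i is flagged as bad."""
    out = list(words)
    last_good = None
    for i, bad in enumerate(errors):
        if not bad:
            last_good = out[i]
            continue
        if last_good is None:
            continue                               # nothing earlier to repair with
        next_ok = i + 1 < len(words) and not errors[i + 1]
        if next_ok:
            out[i] = (last_good + words[i + 1]) // 2   # mean of neighbours
        else:
            out[i] = last_good                         # hold previous value
        last_good = out[i]
    return out

print(conceal([10, 999, 20, 999, 999], [False, True, False, True, True]))
# [10, 15, 20, 20, 20]
```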


Journal ArticleDOI
TL;DR: The gating paradigm (Grosjean, 1980) was used to show that the impact of the frequency of occurrence of a word can be reduced by the semantic context preceding that word.
Abstract: There is increasing evidence that the properties of a spoken word such as its length or its phonotactic configuration interact with the preceding semantic context during word recognition: the more constraining the context, the less important the role of the word properties. The gating paradigm (Grosjean, 1980) was used to show that the impact of the frequency of occurrence of a word can also be reduced by the semantic context preceding that word. A 66-msec difference between the time it takes to isolate low- and high-frequency words in a low-constraint condition was reduced to a 4-msec difference in a high-constraint condition. The theoretical implications for this significant interaction are discussed briefly.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: Experiments performed show that the use of individual word templates for references together with the k-nearest neighbor decision procedure substantially improves the performance in isolated word recognition.
Abstract: This study compares the recognition rates attainable with the aid of two different methods of generating reference templates from training words and two different decision rules. The test environment consists of isolated words from a small vocabulary spoken by a large number of speakers over the public telephone system. Experiments performed show that the use of individual word templates for references together with the k-nearest neighbor decision procedure substantially improves the performance in isolated word recognition. We attempted to minimize the computations involved in the k-nearest neighbor decision procedure by assuming that the dynamic time-warp distance was a metric, which would allow use of a 1-nearest neighbor decision rule with appropriately relabeled reference data. Results indicate that this step leads to an error rate exceeding that obtainable with the 1-nearest neighbor rule on the original nonrelabeled data.
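A sketch of the k-nearest-neighbor decision over dynamic time warping distances is shown below, with one template per training utterance as in the individual-template condition. The DTW local path and the value of k are illustrative choices; the paper's relabeling experiment for the 1-nearest-neighbor shortcut is not reproduced.

```python
import numpy as np
from collections import Counter

def dtw_distance(a, b):
    """Plain dynamic time warping distance between two feature sequences
    (arrays of frames), with a symmetric local path and Euclidean frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_recognize(test, references, k=3):
    """k-nearest-neighbour decision over individual word templates:
    `references` is a list of (label, template) pairs, one template per
    training utterance rather than one averaged template per word."""
    dists = sorted((dtw_distance(test, tpl), label) for label, tpl in references)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```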

Journal ArticleDOI
TL;DR: The main objective is to show the additional degradation of an error rate of a binary DPSK system when undesired multiple co-channel M-ary DPSK interferers are virtually generated in a practical land mobile radio channel.
Abstract: The error rate performance of a binary differentially encoded phase-shift keying (DPSK) system in the presence of both thermal noise and multiple co-channel interferers is theoretically analyzed in the fast Rayleigh fading environment encountered in the typical UHF or microwave land mobile radio channels. The main objective is to show the additional degradation of an error rate of a binary DPSK system when undesired multiple co-channel M-ary DPSK interferers are virtually generated in a practical land mobile radio channel. The error probabilities are presented by a simple closed form as a function of noise correlation, interferer correlation, the number of interferers, M-ary modulating phase of interferer, Doppler frequency, carrier-to-noise (CNR) and carrier-to-interference (CIR) average power ratios.
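The paper's closed form involves noise and interferer correlations, the number of interferers, Doppler frequency and the carrier-to-interference ratio, and is not reproduced here. As a point of reference, the sketch below computes only the classical interference-free, slow Rayleigh fading baseline for binary DPSK, P_e = 1 / (2(1 + mean SNR)).

```python
def dpsk_rayleigh_ber(mean_snr):
    """Average bit error probability of binary DPSK over a slowly fading,
    flat Rayleigh channel with thermal noise only (no interferers, no
    Doppler): P_e = 1 / (2 * (1 + mean SNR)).  This is the classical
    baseline; the paper's result additionally accounts for fast fading and
    multiple co-channel M-ary DPSK interferers."""
    return 1.0 / (2.0 * (1.0 + mean_snr))

for snr_db in (10, 20, 30):
    snr = 10 ** (snr_db / 10)
    print(snr_db, "dB ->", dpsk_rayleigh_ber(snr))
```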

Journal ArticleDOI
R. Pieraccini1
TL;DR: In this work three different pattern compression techniques are compared on the basis of efficiency as well as recognition performance when applied to pattern matching by means of dynamic programming in a speaker dependent context.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: The results indicate that smoothing in addition to twicing provides significant performance improvement in high noise background, and the twicing algorithm reduces the error rate in the noisy environments by a significant amount.
Abstract: In this paper we present a study on the performance of a speaker-dependent continuous speech recognition algorithm in various background noise levels, including the mismatch tolerance of the algorithm. This mismatch exists in most applications where the user is trained in one noise level and does the recognition in different and highly variable noise levels. Finally, we introduce a pre-processing technique called 'twicing' and a simple 3-point moving average post-processor. The twicing algorithm, while still maintaining the high performance in the quiet background, reduces the error rate in the noisy environments by a significant amount. The results indicate that smoothing in addition to twicing provides significant performance improvement in high noise background.
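The abstract does not define its twicing operation, so the sketch below assumes Tukey's classical definition (smooth, then add back the smoothed residual), alongside the 3-point moving average mentioned as the post-processor. Treat it as an interpretation under that assumption, not the authors' algorithm.

```python
import numpy as np

def moving_average_3(x):
    """3-point moving average (the post-processor mentioned in the abstract);
    edge samples are left unchanged."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3.0
    return y

def twicing(x, smoother=moving_average_3):
    """Tukey-style 'twicing': smooth the signal, then smooth the residual and
    add it back, partly restoring detail lost in the first pass.  The abstract
    does not define its twicing pre-processor, so Tukey's definition is
    assumed here, with an arbitrary choice of smoother."""
    x = np.asarray(x, dtype=float)
    s = smoother(x)
    return s + smoother(x - s)

# Illustrative pipeline on a toy parameter track: twicing as pre-processing,
# moving average as post-processing.
track = np.array([1.0, 5.0, 2.0, 8.0, 3.0, 7.0, 2.0])
print(moving_average_3(twicing(track)))
```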

Proceedings ArticleDOI
01 Mar 1984
TL;DR: It is shown that prosodic information, e.g., the rhythmic structure of an input word, its syllabic structure, voiced/unvoiced regions in the word and the temporal distribution of back/front vowels, nasals, and liquids and glides, can be used effectively to select a substantially reduced subvocabulary of candidates, before any fine phonetic analysis is attempted to recognize the word.
Abstract: Prosodic information is believed to be valuable information in human speech perception, but speech recognition systems to date have largely been based on segmental spectral analysis. In this paper I describe parts of a front end to a very-large-vocabulary isolated word recognition system using prosodic information. The present front end is template independent (speaker training for large vocabulary systems (> 20,000 words) is undesirable) and makes use of robust cues in the incoming speech to obtain a presorted vocabulary of candidates. It is shown that prosodic information, e.g., the rhythmic structure of an input word, its syllabic structure, voiced/unvoiced regions in the word and the temporal distribution of back/front vowels, nasals, and liquids and glides, can be used effectively to select a substantially reduced subvocabulary of candidates, before any fine phonetic analysis is attempted to recognize the word.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: The basic idea is to reduce the number of word candidates for the recognition by looking for robust phonetic features computed from the input signal, and it is possible to design a multiprocessor structure in order to reduce the overall recognition time.
Abstract: Our group has been designing for the past twelve years several speech recognition systems, from isolated vocabulary pattern matching systems to continuous speech understanding systems. The experiments we carried out showed us that the systems designed for restricted vocabulary tasks were not readily extensible to large vocabularies. We therefore started some years ago implementing a 200 word recognition system using a phonetic approach. This system was tested successfully in 1980. In continuation of this research we decided to extend our approach to a 1000 word vocabulary. This paper describes the principles involved in this system together with the preliminary results already obtained. The basic idea is to reduce the number of word candidates for the recognition by looking for robust phonetic features computed from the input signal. These features are used as a key for accessing the lexicon. Since the determination of the features is carried out in parallel with the phonetic decoding of the input word, it is possible to design a multiprocessor structure in order to reduce the overall recognition time. The determination of crude phonetic features is described together with the organization of the lexicon. Some preliminary results are finally presented and discussed.

Patent
26 Nov 1984
TL;DR: In this paper, the number of information bits per PCM word is reduced and the transmission capacity thus obtained is used to transmit check bits of an error-correcting code, which is performed automatically via a back channel on the basis of error rate measurement at the receiving end.
Abstract: In a noise-affected transmission channel, the number of information bits per PCM word is reduced and the transmission capacity thus obtained is used to transmit check bits of an error-correcting code. The changeover to error-protection mode is performed automatically via a back channel on the basis of error rate measurement at the receiving end.
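The changeover logic described above amounts to a threshold decision on the error rate reported over the back channel. The sketch below is a toy version of that decision; the threshold value and mode names are illustrative, and the bit reallocation and coding themselves are not modeled.

```python
def choose_mode(measured_error_rate, threshold=1e-3):
    """Mode decision sketch for the patent's scheme: when the error rate
    measured at the receiving end (and reported over the back channel)
    exceeds a threshold, drop some information bits per PCM word and spend
    the freed capacity on check bits of an error-correcting code; otherwise
    transmit plain full-resolution PCM.  The threshold is illustrative."""
    if measured_error_rate > threshold:
        return "error-protected"     # fewer data bits + ECC check bits
    return "plain"                   # full-resolution PCM, no check bits

print(choose_mode(5e-3))   # 'error-protected'
print(choose_mode(1e-5))   # 'plain'
```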

Proceedings ArticleDOI
01 Mar 1984
TL;DR: This paper describes the speaker-independent spoken word recognition system for a large size vocabulary and results are obtained for the training samples in the 212 words uttered by 10 male and 10 female speakers.
Abstract: This paper describes the speaker-independent spoken word recognition system for a large size vocabulary. Speech is analyzed by the filter bank, from whose logarithmic spectrum the 11 features are extracted every 10 ms. Using the features the speech is first segmented and the primary phoneme recognition is carried out for every segment using the Bayes decision method. After correcting errors in segmentation and phoneme recognition, the secondary recognition of part of the consonants is carried out and the phonemic sequence is determined. The word dictionary item having maximum likelihood to the sequence is chosen as the recognition output. The 75.9% score for the phoneme recognition and the 92.4% score for the word recognition are obtained for the training samples in the 212 words uttered by 10 male and 10 female speakers. For the same words uttered by 30 male and 20 female speakers different from the above speakers, the 88.1% word recognition score is obtained.
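The abstract states only that an 11-dimensional filter-bank feature vector, extracted every 10 ms, is classified with a Bayes decision method. The sketch below assumes Gaussian class models purely for illustration; the actual densities, priors and phoneme inventory of the system are not given.

```python
import numpy as np

def bayes_decide(feature_vec, class_means, class_covs, priors):
    """Bayes decision over phoneme classes: pick the class with the highest
    posterior, here under a Gaussian model of the 11-dimensional filter-bank
    features.  The Gaussian assumption and all names are illustrative."""
    best, best_score = None, -np.inf
    for phoneme in class_means:
        mean, cov, prior = class_means[phoneme], class_covs[phoneme], priors[phoneme]
        diff = feature_vec - mean
        # Log of prior times Gaussian likelihood, dropping the constant term.
        score = (np.log(prior)
                 - 0.5 * np.log(np.linalg.det(cov))
                 - 0.5 * diff @ np.linalg.inv(cov) @ diff)
        if score > best_score:
            best, best_score = phoneme, score
    return best

# Toy two-class example in the 11-dimensional feature space.
means = {"a": np.zeros(11), "i": np.ones(11)}
covs = {"a": np.eye(11), "i": np.eye(11)}
priors = {"a": 0.5, "i": 0.5}
print(bayes_decide(0.2 * np.ones(11), means, covs, priors))   # 'a'
```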

Journal ArticleDOI
TL;DR: This talk presents results of a series of speaker independent, isolated word recognition tests using a 10-word digits vocabulary, and shows that the information in the prosodic energy contour complements the segmental information of the LPC spectrum, thereby providing small but consistent improvements in performance for small word vocabularies.
Abstract: The technique of vector quantization has been widely applied in the area of speech coding and has recently been introduced into the area of speech recognition. For the conventional statistical pattern recognition word recognizer using LPC feature sets as the analysis frames, the use of vector quantization leads to a large reduction in computation for the dynamic time warping pattern matching, and a concomitant small increase in average word error rate. A second technique that has been recommended for improving the performance of isolated word recognizers is the addition of temporal energy information into the distance metric for comparing frames of speech. It has been shown that the information in the prosodic energy contour complements the segmental information of the LPC spectrum, thereby providing small but consistent improvements in performance for small word vocabularies. In this talk we present results of a series of speaker independent, isolated word recognition tests using a 10-word digits vocabulary...

Proceedings ArticleDOI
19 Mar 1984
TL;DR: A phrase unit speech recognition system is discussed, which is applicable for a large vocabulary and is independent of the task, and a technique to recognize phrases based on the phoneme recognition is introduced.
Abstract: A phrase unit speech recognition system is discussed, which is applicable for a large vocabulary and is independent of the task. In the case of large vocabulary, it is desirable to express the words in the dictionary by the sequence of phonemes or phoneme-like units. Therefore, the recognition of phonemes in continuous speech is essential to achieve a flexible speech understanding system. In this paper, a technique to recognize phrases based on the phoneme recognition is introduced. The system is composed of the phoneme recognition part and the phrase recognition part. In the phoneme recognition part, the features in the articulatory domain are extracted and applied to compensate coarticulation. In the phrase recognition part, a word sequence corresponding to the phoneme sequence is determined by using two-level DP matching with automaton control, in which words are processed symbolically to attain the acceptable processing speed.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: This paper discusses some of the limitations of the existing isolated word speech recognition system (IWSR) when applied to confusable vocabulary, and uses a new measure, called Performance Index, to evaluate the changes in performance due to innovations carried out on small data sets.
Abstract: In this paper we discuss some of the limitations of the existing isolated word speech recognition system (IWSR) when applied to confusable vocabulary. For our study we have chosen a subset of Hindi stop consonants as the confusable word set. The members of this set differ among themselves primarily in the short leading consonant part and at the interface of the consonant and the following dominant vowel part. We adopt a signal-dependent approach for parameter extraction and matching strategy. This approach gives better performance compared with the conventional approach, but the performance still falls far short of the desired goal of 100% recognition. Refined signal processing suitable for appropriate segments of speech appears to be the way out of this problem. We discuss our studies in this direction. We use a new measure, called Performance Index, to evaluate the changes in performance due to innovations carried out on small data sets.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: Vector quantization techniques were applied to a continuous speech recognition system as a means of reducing both memory usage and computation time and provided a significant computational advantage over the clustering technique.
Abstract: Vector quantization techniques were applied to a continuous speech recognition system as a means of reducing both memory usage and computation time. The speech recognition system computes time-aligned distances between unknown speech segments and template frames. Vector quantization allowed the replacement of speech frames (vectors) with single index numbers which referenced an ordered set, or codebook, of representative frames. Two techniques for generating this codebook, clustering and covering, were examined. The covering technique provided a significant computational advantage over the clustering technique although both techniques generated codebooks which performed well in this task. Results are presented for a ten speaker, 100 word vocabulary experiment. Using speaker dependent codebooks, system performance levels were maintained while the number of distance calculations was reduced by a factor of 2 and the template storage required was reduced by a factor of 4.6. With an increase in error rate of about one third, these factors were 6.8 and 7.8 respectively.
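The computational saving comes from replacing frames with codebook indices so that frame distances become table lookups during time alignment. The sketch below shows that replacement with a toy codebook; the clustering and covering procedures used to build the codebooks, and the time alignment itself, are out of scope here.

```python
import numpy as np

def quantize(frames, codebook):
    """Replace each speech frame (vector) by the index of its nearest
    codebook entry, as in the abstract."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def precompute_distance_table(codebook):
    """With both unknown frames and template frames reduced to codebook
    indices, frame-to-frame distances during time alignment become a table
    lookup instead of a per-frame vector computation."""
    return np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=2)

# Toy example with a 4-entry codebook of 2-dimensional 'frames'.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
table = precompute_distance_table(codebook)
test = quantize(np.array([[0.1, 0.1], [0.9, 0.2]]), codebook)      # -> [0, 1]
template = quantize(np.array([[0.0, 0.2], [1.1, 0.1]]), codebook)  # -> [0, 1]
# Distance between test frame i and template frame j is table[test[i], template[j]].
print(table[test[0], template[0]], table[test[1], template[1]])    # 0.0 0.0
```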


Patent
13 Nov 1984
TL;DR: An error correcting decoder consisting of a branching circuit, a register, a syndrome extractor, an error corrector, and an error rate detector is provided in the receiving demodulation system to improve the detection precision of the circuit error rate.
Abstract: PURPOSE: To improve the detection precision of a circuit error rate by providing a syndrome extractor in a receiving demodulation system device. CONSTITUTION: An error correcting decoder consisting of a branching circuit 20, a register 21, a syndrome extractor 22, an error corrector 23, and an error rate detector 24 is provided in the receiving demodulation system devices of a terminal device and a specific repeater which form a digital radio transmission system. A signal 101 containing a digital multiplex signal and a clock signal is branched by the circuit 20 into signals 102 and 103, and the signal 103 is inputted to the syndrome extractor 22 to extract a syndrome from the error correction code. Whether an error has occurred in the signal 101 is detected from the syndrome, and the syndrome outputted by the extractor 22 and a clock signal 105 are inputted to the detector 24, which detects the error rate over a specific time length and outputs it. Further, a correction pulse signal 107 for the signal 101 is outputted to the corrector 23.
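In software terms, the extractor's job is to compute a syndrome per received word and the detector's job is to count non-zero syndromes over a time window. The sketch below uses the standard (7,4) Hamming code as a stand-in, since the patent does not name its code; the matrix and names are illustrative.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; columns are the binary
# representations of bit positions 1..7.  The specific code used by the
# patent is not stated, so this standard code stands in for it.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def syndrome(received):
    """Extract the syndrome of a received 7-bit word; a non-zero syndrome
    means the word arrived with a (detectable) error."""
    return tuple(H @ np.asarray(received) % 2)

def error_rate(received_words):
    """Estimate the line error rate over a block of received words by
    counting non-zero syndromes, as the error rate detector does over a
    specific time length."""
    bad = sum(1 for w in received_words if any(syndrome(w)))
    return bad / len(received_words)

valid = [0, 1, 1, 0, 0, 1, 1]        # a valid codeword
corrupted = valid.copy()
corrupted[4] ^= 1                    # one bit flipped in transit
print(error_rate([valid, corrupted, valid, valid]))   # 0.25
```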

Proceedings ArticleDOI
19 Mar 1984
TL;DR: A new method to compensate for endpoint detection errors is proposed and an improved method is compared with two existing methods on the alphadigit vocabulary.
Abstract: Inaccurate detection of the endpoints of the test and reference patterns is a major source of errors in discrete utterance recognition by dynamic time warping. If the vocabulary contains similar sounding words whose differences are at their beginnings or ends as in the alphadigit vocabulary, the error rate may greatly increase due to endpoint detection errors. Several methods to improve the recognition accuracy by relaxing or adjusting the endpoints have been suggested. They, however, do not work well in all cases and actually the error rate may increase. We propose a new method to compensate for endpoint detection errors and compare our improved method with two existing methods on the alphadigit vocabulary.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: The confusion matrices from an extension of the Doddington-Schalk tests of commercial speech recognizers were used to determine the Relative Information Loss (RIL), a measure that accounts for the distribution of errors and, when used in conjunction with a rate distortion model, can reflect the costs of individual errors in voice entry systems.
Abstract: The popular "recognition accuracy" and "substitutionary error rate" measures of performance for speech recognizers fail to account for the distribution of errors among vocabulary items, the potentially different costs of errors, and the difficulties of various recognition tasks. One new performance measure, called the Relative Information Loss (RIL), can account for the distribution of errors, and, when used in conjunction with a rate distortion model, can reflect the costs of individual errors in voice entry systems. When error rate is used as the performance measure, worst-case performances vary with vocabulary size, while, with RIL, the scale from best to worst-case performance is independent of vocabulary size. The confusion matrices from an extension of the Doddington-Schalk [1] tests of commercial speech recognizers were used to determine the RIL of each of 10 tested recognizers. Cost-performance analysis, using a program for rate-distortion analysis, determined for any user-defined limits on the expected costs of errors, which of the recognizers would perform adequately for the tested task conditions.
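The abstract does not give the formula for RIL, so the sketch below uses one plausible formalization as a loud assumption: the fraction of the spoken-word entropy H(X) that is not conveyed by the recognizer output, H(X|Y)/H(X), estimated from a confusion matrix. The paper's actual definition, and its rate-distortion cost analysis, may differ.

```python
import numpy as np

def relative_information_loss(confusion):
    """One *assumed* formalization of Relative Information Loss: the share of
    the input word entropy H(X) that the recognizer output fails to convey,
    H(X|Y) / H(X), estimated from a confusion matrix whose rows are spoken
    words and whose columns are recognized words."""
    joint = confusion / confusion.sum()
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h_x = entropy(p_x)
    mutual_info = h_x + entropy(p_y) - entropy(joint.ravel())
    return (h_x - mutual_info) / h_x      # = H(X|Y) / H(X)

# Perfect recognizer: no information loss.  Uniformly confused one: total loss.
print(round(relative_information_loss(np.eye(3) * 100), 6))        # 0.0
print(round(relative_information_loss(np.ones((3, 3)) * 100), 6))  # 1.0
```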
