
Showing papers on "Word error rate published in 1978"


Proceedings ArticleDOI
Lalit R. Bahl, J. Baker, Paul S. Cohen, Frederick Jelinek, Burn L. Lewis, Robert Leroy Mercer
10 Apr 1978
TL;DR: Preliminary results have been obtained with a system for recognizing continuously read sentences from a naturally-occurring corpus (Laser Patents), restricted to a 1000-word vocabulary.
Abstract: Preliminary results have been obtained with a system for recognizing continuously read sentences from a naturally-occurring corpus (Laser Patents), restricted to a 1000-word vocabulary. Our model of the task language has an entropy of about 4.8 bits/word and a perplexity of 21.11 words. Many new problems arise in recognition of a substantial natural corpus (compared to recognition of an artificially constrained language). Some techniques are described for treating these problems. On a test set consisting of 20 sentences having a total of 486 words, there was a word error rate of 33.1%.

132 citations



Journal ArticleDOI
TL;DR: A modification to the basic Go-Back-N ARQ error control technique is described which yields improved throughput efficiency for all block error rates.
Abstract: A modification to the basic Go-Back-N ARQ error control technique is described which yields improved throughput efficiency for all block error rates.
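For context, the throughput efficiency of the classical, unmodified Go-Back-N scheme under independent block errors can be sketched as follows; the window size and the closed-form expression are standard textbook assumptions, not the modified technique this paper proposes.

```python
def go_back_n_throughput(block_error_rate: float, window: int) -> float:
    """Throughput efficiency of classical (unmodified) Go-Back-N ARQ.

    Assumes independent block errors and that every block received in error
    forces retransmission of `window` blocks (the block itself plus the
    blocks already in flight while its acknowledgement is pending). This is
    the textbook baseline, not the paper's modified scheme.
    """
    p = block_error_rate
    return (1.0 - p) / (1.0 + (window - 1) * p)


if __name__ == "__main__":
    # Efficiency falls quickly with block error rate for a window of 8 blocks.
    for p in (0.001, 0.01, 0.05, 0.1):
        print(f"P = {p:>5}: efficiency = {go_back_n_throughput(p, window=8):.3f}")
```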

93 citations


Journal ArticleDOI
TL;DR: It was found that, with word frequency controlled, words judged to be of early acquisition had a significantly lower recognition threshold than words judged to be of later acquisition.
Abstract: Recent investigations have shown that the latency in object naming is affected by when in life the naming word is learned—the age-at-acquisition of the naming word. The present study investigated the effect of age-at-acquisition in the recognition of tachistoscopically presented words. It was found that, with word frequency controlled, words judged to be of early acquisition had a significantly lower recognition threshold than words judged to be of later acquisition.

47 citations


Proceedings ArticleDOI
10 Apr 1978
TL;DR: Performance results on the recognition of continuously spoken sentences from the finite state grammar for the "New Raleigh Language" are reported, using a new centisecond-level model for the acoustic processor.
Abstract: We report performance results on the recognition of continuously spoken sentences from the finite state grammar for the "New Raleigh Language" (vocabulary: 250 words; average sentence length: 8 words; entropy: 2.86 bits/word; perplexity: 7.27 words). Sentence and word error rates of 5% and 0.6%, respectively, are achieved using a new centisecond-level model for the acoustic processor. We also report results for the "CMU-AIX05 Language" (vocabulary: 1011 words; average sentence length: about 7 words; entropy: 2.18 bits/word; perplexity: 4.53 words), using both our earlier phone-level model and the centisecond-level model. With the phone-level acoustic-processor model, sentence and word error rates of 2% and 0.8%, respectively, are achieved. With the centisecond-level model, sentence and word error rates are 1% and 0.1%, respectively.
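The quoted perplexities are consistent with the standard relation between per-word entropy H and perplexity PP (a textbook identity, stated here only to connect the two figures given in the abstract):

\[
PP = 2^{H}, \qquad 2^{2.86} \approx 7.27, \qquad 2^{2.18} \approx 4.53 .
\]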

45 citations


Journal ArticleDOI
TL;DR: In this article, a speech recognition system has been implemented which accepts reasonably natural English sentences spoken as isolated words, and the major components of the system are a speaker-dependent word recognizer, a programmed grammar, and a syntax analyzer.
Abstract: A speech recognition system has been implemented which accepts reasonably natural English sentences spoken as isolated words. The major components of the system are a speaker-dependent word recognizer, a programmed grammar, and a syntax analyzer. The system permits formulation of complete sentences from a vocabulary of 127 words. The set of sentences selected for investigation is intended for use as requests in an automated travel information system. Results are presented of evaluations for speakers using their own stored reference patterns, the reference patterns of other speakers, and composite reference patterns averaged over several speakers. For speakers using their own reference patterns the median error rate for acoustic recognition of the individual words is 11.7 percent. When syntax analysis is applied to the complete sentence, word recognition errors can be corrected and the error rate reduced to 0.4 percent.

35 citations


Journal ArticleDOI
TL;DR: It is found that an acoustic word error rate of 10 percent is reduced to 0.2 percent after syntactic analysis, resulting in a sentence error rate of 1 percent, which is indicative of the performance that will be attained by a real speech recognition system using the syntactic analysis algorithm described herein.
Abstract: In this paper we examine the effects of an algorithm for syntactic analysis on word recognition accuracy. The behavior of the algorithm is studied by means of a computer simulation. We describe the syntactic analysis technique, the problem domain to which it was applied, and the details of the simulation. We then present the results of the simulation and their implications. We find, for example, that an acoustic word error rate of 10 percent is reduced to 0.2 percent after syntactic analysis, resulting in a sentence error rate of 1 percent. These figures are based on a 127-word vocabulary and an average of 10.3 words per sentence for 1000 sentences. We expect that these results are indicative of the performance which will be attained by a real speech recognition system which uses the syntactic analysis algorithm described herein.
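As a rough sanity check (under an independence assumption of our own, not part of the paper's simulation), the probability that a sentence of L words contains at least one word error is

\[
P_{\text{sent}} = 1 - (1 - p_w)^{L},
\]

so the 10 percent acoustic word error rate with L = 10.3 implies that about 1 - 0.9^{10.3} ≈ 0.66 of sentences contain an acoustic error before syntactic correction, while p_w = 0.002 gives roughly 2 percent, a little above the reported 1 percent, consistent with the residual errors clustering within a few sentences.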

29 citations


Proceedings ArticleDOI
01 Apr 1978
TL;DR: In this paper, the authors discuss two systems which are much faster than, but no less accurate than, systems in which the acoustic and syntactic analyses are sequential and isolated, and discuss methods for using syntax directed recognition to improve overall system accuracy.
Abstract: In syntax directed speech recognition, communication between the acoustic and syntactic processors of the system causes the acoustic analyzer to evaluate only those grammatically correct word hypotheses generated by the syntax analysis algorithm. In this paper we discuss two such systems which are much faster than, but no less accurate than, systems in which the acoustic and syntactic analyses are sequential and isolated. Theoretical details of the systems are given and experimental results of tests are presented. Finally, we discuss methods for using syntax directed recognition to improve overall system accuracy. We also show the relevance of one of our methods to the recognition of connected speech.

8 citations


Journal ArticleDOI
Kashyap, Mittal
TL;DR: A method of recognizing isolated words and phrases from a given vocabulary, spoken by any member of a given group of speakers whose identity is unknown to the system, is described.
Abstract: We describe a method of recognizing isolated words and phrases from a given vocabulary spoken by any member of a given group of speakers, the identity of the speaker being unknown to the system. The word utterance is divided into 20-30 nearly equal frames, frame boundaries being aligned with glottal pulses for voiced speech. A constant number of pitch periods is included in each frame. Statistical decision rules are used to determine the phoneme in each frame. Using the string of phonemes from all the frames of the utterance, a word decision is obtained using (phonological) syntactic rules. The syntactic rules used here are of two types, namely, 1) those obtained from the theory of word construction from phonemes in English as applied to our vocabulary, and 2) those used to correct possible errors in the earlier phonemic decisions based on the decisions of neighboring segments. In our experiment, the vocabulary had 40 words, consisting of many pairs of words which are phonemically close to each other. The number of speakers was 6; the identity of the speaker was not known to the system. In testing 400 word utterances, the recognition rate was about 80 percent for phonemes (over 11 phonemes), but word recognition was 98.1 percent correct. Phonological-syntactic rules played an important role in upgrading the word recognition rate over the phoneme recognition rate.

8 citations


01 Jan 1978
TL;DR: The parameters and the decision rules used in the segmentation are described; even the observed error rate does not significantly affect the recognition accuracy of the Harpy system because the scheme is designed to provide several extra segments at the cost of speed of operation.
Abstract: The first step in the recognition of continuous speech by machine is segmentation of the utterance. The Harpy continuous speech recognition system, developed at Carnegie-Mellon University, uses a segmentation procedure based on simple time domain parameters called ZAPDASH. In this paper the parameters and the decision rules used in the segmentation are described. Considerations in the choice of parameters are discussed briefly. The heuristics used in arriving at some of the decision rules are also discussed. The performance of the segmentation scheme is evaluated by comparing the results with hand segmentation of the waveform of the utterance. The results show an overall error rate of 4% over 34 utterances. However, this error rate does not significantly affect the recognition accuracy of the Harpy system, because the scheme is designed to provide several extra segments at the cost of speed of operation. The average duration of the segments obtained by this technique was found to be 4.7 centiseconds. The robustness of the segmentation scheme to noise and distortion in the input speech is currently being investigated.

8 citations


Journal ArticleDOI
TL;DR: One of the most difficult problems in speaker recognition is that the feature parameters frequently vary after a long time interval; this effect is examined for two recognition methods, one using the time pattern of both the fundamental frequency and log-area-ratio parameters and the other using several kinds of statistical features derived from them.
Abstract: One of the most difficult problems in speaker recognition is that the feature parameters frequently vary after a long time interval. We examined this effect on two kinds of speaker recognition; one uses the time pattern of both the fundamental frequency and log-area-ratio parameters and the other uses several kinds of statistical features derived from them. Results of speaker recognition experiments revealed that the long-term variation effects have a great influence on both recognition methods, but are more evident in recognition using statistical parameters. In order to reduce the error rate after a long interval, it is desirable to collect learning samples of each speaker over a long period and measure the weighted distance based on the long-term variability of the feature parameters. When the learning samples are collected over a short period, it is effective to apply spectral equalization using the spectrum averaged over all the voiced portions of the input speech. By this method, an accuracy of 95% can be obtained in speaker verification even after five years using statistical parameters of a spoken word.

Proceedings ArticleDOI
01 Apr 1978
TL;DR: The accuracy of word recognition is improved by implementing a new feature extraction method and new phoneme connecting rules in addition to the previously reported system.
Abstract: This paper describes a newly improved spoken word recognition system. The accuracy of word recognition is improved by implementing a new feature extraction method and new phoneme connecting rules in addition to the previously reported system. The new features for phoneme recognition are the spectral local peaks and four parameters derived from the least-squares fit line of the speech spectrum. By increasing the features for phoneme recognition and by using the new phoneme connecting rules, the accuracies of phoneme recognition and segmentation are improved. In the last step of the system, the dictionary item having maximum similarity to the recognized phonemic sequence is chosen. Every item of the dictionary is written in phonemic symbols, so it is easy to change the target words. The word recognition score was found to be 85.7% for 166 city names uttered by 15 male speakers.

Proceedings ArticleDOI
01 Apr 1978
TL;DR: An optimum matching method between a classified phoneme string and a phoneme string of a lexical entry in a word dictionary is described, using a phonemic similarity matrix and a dynamic programming technique.
Abstract: In this paper, we describe an optimum matching method between a classified phoneme string and a phoneme string of a lexical entry in a word dictionary. This method is performed by using a phoneme similarity matrix and a dynamic programming technique. A classified segment consists of the first candidate, second candidate, reliability and duration. The effect of coarticulation is normalized in this matching procedure. We also describe a word spotting method in continuous speech by modifying this method.
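A minimal sketch of this kind of similarity-based dynamic-programming alignment is given below; the similarity values, gap penalty, and toy phoneme inventory are illustrative assumptions, not the matrix, reliability weighting, or coarticulation normalization actually used in the paper.

```python
# Sketch: align a classified phoneme string against a dictionary entry with
# dynamic programming, scoring matches by a phoneme similarity matrix.

def align_score(observed, entry, sim, gap=-1.0):
    """Best global alignment score between two phoneme strings.

    sim[a][b] is the similarity of observed phoneme a to lexical phoneme b;
    insertions and deletions (segmentation errors, coarticulation effects)
    are absorbed by the gap penalty.
    """
    n, m = len(observed), len(entry)
    # dp[i][j] = best score aligning observed[:i] with entry[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sim[observed[i - 1]][entry[j - 1]],  # match/substitution
                dp[i - 1][j] + gap,   # extra observed segment
                dp[i][j - 1] + gap,   # missing segment
            )
    return dp[n][m]


def recognize(observed, dictionary, sim):
    """Word decision: the dictionary entry with the highest alignment score."""
    return max(dictionary, key=lambda word: align_score(observed, dictionary[word], sim))


# Toy usage with a two-phoneme inventory (hypothetical similarity values).
sim = {"a": {"a": 1.0, "t": -1.0}, "t": {"a": -1.0, "t": 1.0}}
dictionary = {"atta": list("atta"), "taat": list("taat")}
print(recognize(list("atta"), dictionary, sim))  # -> atta
```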


Journal ArticleDOI
TL;DR: The authors performed speaker-independent recognition of words spoken in isolation using a very large vocabulary of over 26 000 words taken from the “Brown” data set, with negligible difference in performance between male and female speakers.
Abstract: Speaker-independent recognition of words spoken in isolation was performed using a very large vocabulary of over 26 000 words taken from the “Brown” data set (Computational Analysis of Present-Day American English, Kucera and Francis). After discarding 4% of the data judged to be spoken incorrectly, the experimental recognition error rate was 2.3% (1.8% substitution and 0.5% rejection), with negligible difference in performance between male and female speakers. The experimental error rate for vocabulary subsets, ordered by frequency of usage, was 1.0% for the first 50 words, 0.8% for the first 120 words, and 1.2% for the first 1500 words. An analysis of recognition errors and a discussion of ultimate performance limitations will be presented.

Journal ArticleDOI
R. E. Langseth
TL;DR: Calculated results were obtained for both linear and square arrays, and for rectangular-pulse signaling over single-pole and two-pole Butterworth equivalent low-pass channels, as well as for impulsive signaling over an idealized channel with a raised-cosine frequency response.
Abstract: Some calculations of the effect of phased-array dispersion on degrading phase-shift-keyed (PSK) error rates are presented. The results are given in the form of curves of the bit error rate versus the ratio of array "fill-time" (propagation time across the array) to signaling interval. Values of this ratio in the range 0.5 to 0.8 are required to degrade the error rate by a factor of 2. These results were obtained for both linear and square arrays, and for rectangular-pulse signaling over single-pole and two-pole Butterworth equivalent low-pass channels, as well as for impulsive signaling over an idealized channel with a raised-cosine frequency response.

Patent
07 Oct 1978
TL;DR: In this article, the authors propose to observe an entire reception burst at once and shorten the time required for error rate measurement by providing bit counters, a memory circuit, and a counting circuit that count the number of reference timings and the number of bit errors.
Abstract: PURPOSE: To observe the entire reception burst simultaneously and shorten the time required for error rate measurement, by providing bit counters, a memory circuit, and a counting circuit, and counting the number of reference timings and the number of bit errors.

ReportDOI
01 Nov 1978
TL;DR: This work was an initial effort in the use of voice data entry for information data handling; the objective was to develop the technology for a large-vocabulary (1000-word) isolated word recognition system capable of quick adaptation and high accuracy for a limited number of people.
Abstract: This work was an initial effort in the use of voice data entry for information data handling. The objective of this effort was to develop the technology for a large-vocabulary (1000-word) isolated word recognition system capable of quick adaptation and high accuracy for a limited number of people. Techniques for word boundary detection, noise suppression, and frequency scaling were examined. Tests were conducted on a 1000-word and a 100-word unstructured vocabulary. Recognition accuracies of 30.5% and 66% were obtained for the untrained case, and 62.4% and 90% after training each word once. (Author)


Patent
27 Jun 1978
TL;DR: In this paper, a quality-detecting signal derived from the error rate of the received signal is obtained, enabling the quality of the received data signal to be judged.
Abstract: PURPOSE: To clearly determine the data error rate during data transmission and to enable the quality of the data signal to be judged, by obtaining a quality-detecting signal according to the error rate of the received signal.

Patent
01 Dec 1978
TL;DR: In this article, the authors aim to improve the real-time performance of data transfer on a time-sharing circuit and the delivery error rate by completing the coding or decoding within the time slot of one assignment of the time-sharing circuit.
Abstract: PURPOSE: To improve the real-time performance of data transfer on a time-sharing circuit and the delivery error rate, by completing the coding or decoding within the time slot of one assignment of the time-sharing circuit.


Journal ArticleDOI
TL;DR: In this paper, the changes in the parameters of a statistical model of magnetic-tape errors with increasing packing density are discussed; the resulting simulations will be helpful in determining tape formats for digital audio recorders with a stationary head.
Abstract: In spite of the great advance in performance, digital audio recorders have some critical points in handling and reliability resulting from code errors. These code errors are caused by (1) dropouts on the tape, (2) peak shift, jitter, or noise, and (3) fingerprints, damage to the tape edge, or dust. (1) and (3) cause mainly burst errors, and (2) causes random errors. In practice, however, the burst and random errors cannot be separated discretely, and the trend is described by one parameter, the "bit error correlation coefficient." In this paper, the changes in the parameters of a statistical model of magnetic-tape errors with increasing packing density are discussed: not only an increase of the error rate but also a decrease of the bit error correlation coefficient. Various error correcting schemes are evaluated by means of computer simulation for various values of error rate and bit error correlation coefficient. Another important point for increasing the reliability of such systems is the error interpolation ability when errors exceed the capability of the error correcting schemes. These simulations will be helpful in determining tape formats for digital audio recorders with a stationary head.
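To illustrate what a single "bit error correlation coefficient" can stand for, here is a minimal sketch of a correlated bit-error process with marginal error rate p and lag-1 correlation rho; the model form and parameter values are our own assumptions, not the statistical model or figures of the paper.

```python
import random


def error_stream(p, rho, length, seed=0):
    """Yield 0/1 error indicators with marginal error rate p and lag-1 correlation rho.

    rho = 0 gives independent (random) errors; rho near 1 makes errors cluster
    into bursts while keeping the same long-run error rate.
    """
    rng = random.Random(seed)
    p_after_error = p + rho * (1.0 - p)  # P(error | previous bit in error)
    p_after_ok = p * (1.0 - rho)         # P(error | previous bit correct)
    prev = 1 if rng.random() < p else 0
    for _ in range(length):
        threshold = p_after_error if prev else p_after_ok
        prev = 1 if rng.random() < threshold else 0
        yield prev


if __name__ == "__main__":
    # Same marginal error rate, increasingly bursty as rho grows.
    bits = list(error_stream(p=1e-3, rho=0.5, length=500_000))
    print("empirical error rate:", sum(bits) / len(bits))
```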


Patent
12 Aug 1978
TL;DR: In this article, the authors propose to decrease the error rate and the processing time of writing, reading, and processing data blocks in a tape control system by setting the idle-feed distance of the tape according to whether an error is detected in the data block.
Abstract: PURPOSE: To decrease the error rate and the processing time of writing, reading, and processing data blocks in a tape control system, by setting the idle-feed distance of the tape according to whether an error is detected in the data block.

Journal ArticleDOI
TL;DR: When using experimental designs with more than three treatments, an investigator is often confronted with the problem of locating mean differences following rejection of the overall null hypothesis, and conceptual and procedural differences between a priori and a posteriori multiple comparisons arise.
Abstract: When using experimental designs with more than three treatments, an investigator is often confronted with the problem of locating mean differences following rejection of the overall null hypothesis. That is, if no specific hypotheses have been proposed in advance that are amenable to analysis by a priori procedures (e.g., orthogonal comparisons using either the t or F ratio), it becomes necessary to use "data snooping" procedures a posteriori (Hays, 1973; Kirk, 1968; Lindquist, 1956; Winer, 1971). Conceptual and procedural differences between a priori and a posteriori multiple comparisons arise, in part, out of concern with error rate. That is, "Should the probability of committing a Type I error be set at a for each individual comparison or should the probability of an error equal a or less for some larger conceptual unit such as the collection of comparisons?" (Kirk, 1968, p.78). Conceptually, the problem is one of decision-a decision on the "conceptual unit for error rate" (e.g., individual comparison, hypothesis, family of comparisons, or the experiment). Confusion arises because, as the number of comparisons increases, so does the probability of making a false positive (Type I error) (Kirk, 1968; Ryan, 1962). Procedurally, the problem is one "of regulating and apportioning the Type I error rate" (Winer, 1971, p. 199). "For planned orthogonal comparisons, contemporary practice in the behavioral sciences favors setting the Type I error probability at a for each comparison. For planned and unplanned nonorthogonal comparisons it is suggested
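The error-rate concern described above can be made concrete with the standard relation for the familywise Type I error rate of c independent comparisons, each tested at level α (a textbook identity, not a result of this article):

\[
\alpha_{\text{FW}} = 1 - (1 - \alpha)^{c}, \qquad \text{e.g.}\ 1 - (1 - 0.05)^{10} \approx 0.40,
\]

so ten comparisons each run at α = .05 carry roughly a 40 percent chance of at least one false positive, which is why the choice of the conceptual unit for error rate matters.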