
Showing papers on "Word error rate published in 1979"


Journal ArticleDOI
H. Sakoe1
TL;DR: A general principle of connected word recognition is given, based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns; computation time and memory requirements are both proved to be within reasonable limits.
Abstract: This paper reports a pattern matching approach to connected word recognition. First, a general principle of connected word recognition is given based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns. Time-normalization capability is allowed by use of dynamic programming-based time-warping technique (DP-matching). Then, it is shown that the matching process is efficiently carried out by breaking it down into two steps. The derived algorithm is extensively subjected to recognition experiments. It is shown in a talker-adapted recognition experiment that digit data (one to four digits) connectedly spoken by five persons are recognized with as high as 99.6 percent accuracy. Computation time and memory requirement are both proved to be within reasonable limits.

289 citations
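
The time-warping idea behind the DP-matching mentioned in the abstract above can be illustrated with a minimal sketch. It assumes scalar feature frames and an absolute-difference local distance, both chosen only for illustration; it shows the basic time alignment of two patterns, not the paper's two-step connected-word algorithm, and `dtw_distance` is a hypothetical name.

```python
import math


def dtw_distance(a, b):
    """Accumulated distance of the best monotonic time alignment of a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between frames (scalar frames are an assumption).
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A pattern and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost, which is the time-normalization property the abstract refers to.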


Journal ArticleDOI
TL;DR: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary, and shows error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
Abstract: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers). The recognition system, which accepts telephone quality speech input, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule. Results for several test sets of data are presented. They show error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.

245 citations
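
One plausible reading of a K-nearest neighbor decision rule over multiple templates per word, as described in the abstract above, is sketched below. It assumes the template distances (for example from dynamic time warping) have already been computed; the function name, the averaging of the K smallest distances, and the example words are illustrative assumptions, not details confirmed by the abstract.

```python
def knn_word_decision(distances, k=2):
    """Pick the word whose k closest templates are nearest on average.

    distances: dict mapping each vocabulary word to a non-empty list of
    distances between the unknown word and that word's templates.
    """
    scores = {}
    for word, dists in distances.items():
        nearest = sorted(dists)[:k]            # the k best-matching templates
        scores[word] = sum(nearest) / len(nearest)
    return min(scores, key=scores.get)         # smallest average distance wins


# Hypothetical distances for two candidate words with three templates each.
example = {"one": [0.2, 0.9, 1.0], "nine": [0.5, 0.5, 0.6]}
```

With `k=1` the single closest template decides (`"one"` here), while with `k=2` a word supported by several close templates can win instead (`"nine"` here), which is how such a rule can reduce the influence of one outlying template.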


Journal ArticleDOI
TL;DR: The authors investigated the relationship between the contextual probability of lexical items in spontaneous speech, as measured by the Cloze procedure, and word frequency, and attempted to dete...
Abstract: This study investigated the relationship between the contextual probability of lexical items in spontaneous speech, as measured by the Cloze procedure, and word frequency. It also attempted to dete...

121 citations


Proceedings ArticleDOI
01 Apr 1979
TL;DR: A speaker-independent isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary; the templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers).
Abstract: A speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers). The recognition system, which uses telephone recordings, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule to lower the probability of error. Results are presented on two test sets of data which show error rates that are comparable to, or better than, those obtained with speaker trained, isolated word recognition systems.

120 citations


Journal ArticleDOI
TL;DR: This study examined the effects of word characteristics on word recognition and found that subjects with high vocabulary scores were more rapid in Experiment 1 (word naming) but slower in Experiment 2 (lexical decision) than subjects with low vocabulary scores.
Abstract: Previous studies of the effects of word characteristics on word recognition have used orthogonal combinations of word variables and have failed to consider individual differences. The present study examined word naming (Experiment 1) and lexical decision (Experiment 2) tasks using an unrestricted set of words and a correlational analysis. Individual differences were considered using a measure of the subjects’ knowledge of the English vocabulary. The results of Experiment 1 indicated that log (RT) for word naming is affected by word length, word frequency, and the number of syllables in the word; the results of Experiment 2 confirmed the effects of length and frequency but also showed that log (RT) is a function of the age at which the word is introduced to a child’s reading vocabulary. Subjects with a high vocabulary score were more rapid in Experiment 1 but were slower in Experiment 2, compared to subjects with a low vocabulary score. More importantly, high-vocabulary subjects, in both studies, were less affected by word length than the low-vocabulary subjects. The results suggest that subjects do differ in their reading strategy and that word length and word frequency may affect different stages in the word recognition process.

94 citations


Journal ArticleDOI
TL;DR: The next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates.
Abstract: Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker‐independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP‐27 (2), 134–141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process.(in press)]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications for each word in the vocabulary. Subsequent word recognition tests verified that the templates chosen were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template is obtained as either the cluster minimax, or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [IEEE Trans. Acoust. Speech Signal Process. ASSP‐26 (3), 34–42 (1978)] in which averaging techniques are directly combined with the nearest neighbor rule to simultaneously define both the word template (i.e., the cluster center) and the elements in the cluster. Experimental data show the first method to be superior to the second method when three or more clusters per word are used in the recognition task.

71 citations
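
The "cluster minimax" template choice in the abstract above can be read as picking the cluster member whose largest distance to any other member is smallest. The sketch below follows that reading; the function name and the example matrix are hypothetical, and the pairwise distances are assumed to be given.

```python
def minimax_center(dist):
    """Return the index of the minimax element of a cluster.

    dist: symmetric square matrix (list of lists) of pairwise distances
    between the replications assigned to one cluster.
    """
    # Worst-case distance from each candidate to any other cluster member.
    worst = [max(row) for row in dist]
    # The minimax element minimizes that worst-case distance.
    return min(range(len(dist)), key=worst.__getitem__)


# Hypothetical distances among three replications of one word.
example = [[0, 1, 4],
           [1, 0, 2],
           [4, 2, 0]]
```

For the example matrix the worst-case distances are 4, 2 and 4, so replication 1 would serve as the template; the averaged alternative mentioned in the abstract would combine all members instead of selecting one.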


Proceedings Article
01 Jan 1979

35 citations


Journal ArticleDOI
TL;DR: A distribution-free lower bound on the Bayes error rate is formulated in terms of the asymptotic error rate of the nearest neighbor rule with a reject option and a closed form expression for an upper bound is established.
Abstract: A distribution-free lower bound on the Bayes error rate is formulated in terms of the asymptotic error rate of the nearest neighbor rule with a reject option. Next, a closed form expression for an upper bound of the k th nearest neighbor error rate in terms of the Bayes rate is established. These results are discussed in the framework of recent works on nonparametric estimation of the Bayes error rate.

31 citations
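
For context, the classical two-class nearest neighbor result of Cover and Hart (1967) is the kind of relation that bounds like those in the abstract above refine. It is given here as background only and is not the bound derived in this paper; $R^{*}$ denotes the Bayes error rate and $R_{\mathrm{NN}}$ the asymptotic nearest neighbor error rate.

```latex
R^{*} \;\le\; R_{\mathrm{NN}} \;\le\; 2R^{*}\,(1 - R^{*})
\quad\Longrightarrow\quad
R^{*} \;\ge\; \tfrac{1}{2}\left(1 - \sqrt{1 - 2R_{\mathrm{NN}}}\right)
```

The right-hand inequality follows by solving the quadratic $2R^{*2} - 2R^{*} + R_{\mathrm{NN}} \le 0$ for $R^{*} \le 1/2$, which turns a measured nearest neighbor error rate into a distribution-free lower bound on the Bayes rate.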


Proceedings ArticleDOI
01 Apr 1979
TL;DR: Two fully automatic techniques for clustering multiple versions of a single word into a set of speaker-independent word templates are investigated; the first method is shown to be superior to the second when three or more clusters per word are used in the recognition task.
Abstract: Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker independent word templates for an isolated word recognition system [1,2]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications for each word in the vocabulary. Subsequent word recognition tests verified that the templates chosen were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template is obtained as either the cluster minimax, or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [3] in which averaging techniques are directly combined with the nearest neighbor rule to simultaneously define both the word template (i.e. the cluster center) and the elements in the cluster. Experimental data shows the first method to be superior to the second method when 3 or more clusters per word are used in the recognition task.

28 citations


Proceedings ArticleDOI
01 Apr 1979
TL;DR: One acoustic processor that achieved a 0% error rate on New Raleigh Language sentences was also used to decode such sentences without syntactic guidance, where it reached a word error rate of 8.8%, corresponding to a sentence error rate of 53%.
Abstract: The statistical training and decoding procedures developed at IBM Research can be used with a wide variety of acoustic processors. We have recently (July and August 1978) achieved error-free or nearly error-free decoding results with several different acoustic processors on sentences from the New Raleigh Language (vocabulary 250 words, perplexity 7.27 words). One of these processors, which achieved a 0% error rate on New Raleigh sentences, has been used to decode sentences from the New Raleigh Language without the benefit of syntactic guidance during the decoding process. On this much more difficult task, it has achieved an error rate of 8.8% at the word level, corresponding to a sentence error rate of 53%. All of these processors are non-segmenting processors which produce output once every 10ms.

19 citations
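
The gap between the 8.8% word error rate and the 53% sentence error rate in the abstract above can be made plausible with a back-of-the-envelope model. The sketch assumes word errors are independent and that every sentence has the same length; neither assumption comes from the paper, and both function names are hypothetical.

```python
import math


def sentence_error_rate(word_error_rate, n_words):
    """Probability that at least one of n_words independent words is wrong."""
    return 1.0 - (1.0 - word_error_rate) ** n_words


def implied_sentence_length(word_error_rate, sentence_error):
    """Sentence length at which the independence model gives sentence_error."""
    return math.log(1.0 - sentence_error) / math.log(1.0 - word_error_rate)
```

Under these assumptions, an eight-word sentence at an 8.8% word error rate fails about 52% of the time, and a 53% sentence error rate corresponds to sentences a little over eight words long, so the two reported figures are mutually consistent for sentences of that order.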


Journal ArticleDOI
TL;DR: This study found that although recognition test scores were higher for low frequency target words than for high frequency ones when general population distractors were used, scores were higher for high frequency target words when orthographic distractors were used.
Abstract: In two experiments, subjects were asked to learn high and low frequency words, following which they were given a recognition test which employed either orthographic or general population distractors. Results revealed that although recognition test scores were higher for low frequency target words than for high when general population distractors were utilized, recognition test scores were higher for high frequency target words when orthographic distractors were used. It is suggested that the findings of earlier investigators who obtained results revealing superior recognition test scores for low frequency words were a function of the kind of distractors which were employed, and that any theory of recognition memory must take into consideration the fact that recognition scores are always related to the kinds of distractors which are used.

01 Aug 1979
TL;DR: Two major experiments were conducted to assess the potential operational utility of state-of-the-art word recognition technology in air traffic control applications and the quality and efficiency of the voice system versus the existing keyboard method of entering complete operational messages.
Abstract: Two major experiments and a number of subsidiary pilot studies were conducted to assess the potential operational utility of state-of-the-art word recognition technology in air traffic control applications. Experiment I, employing 12 operators or 'talkers', secured baseline data representing the inherent 'best case' recognition accuracy of the system. Three of the subvocabularies of an operational data entry language were tested exhaustively to a total of over 46,000 spoken words. On the average, across all speakers and all three subvocabularies, only 1 percent of the words spoken were erroneously recognized. Subsequent 'tuning' of the recognition algorithm reduced the error rate to less than 0.4 percent. Experiment II compared the quality and efficiency of the voice system versus the existing keyboard method of entering complete operational messages.


Proceedings ArticleDOI
01 Apr 1979
TL;DR: A speaker dependent system for recognizing carefully articulated continuous speech is described; it accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task and achieved 90% word accuracy and 75% sentence recognition in a preliminary test.
Abstract: A speaker dependent system for recognizing carefully articulated continuous speech is described. The system accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task. The system is controlled by a finite state parser which generates word candidates and establishes their temporal locations in hypothetical sentences. The word candidates are evaluated by an LPC distance measure and a dynamic programming algorithm which nonlinearly time aligns isolated word reference templates with the input speech stream. The input is recognized as the hypothetical sentence having the lowest distance according to a well-defined criterion. In a preliminary test based on 100 sentences spoken over dialed up telephone lines by two male talkers, 90% word accuracy, resulting in 75% sentence recognition, was achieved.

Journal ArticleDOI
TL;DR: The authors assess the Kurzweil Reading Machine's ability to read three different type styles produced by five different means.
Abstract: This study was designed to assess the Kurzweil Reading Machine's ability to read three different type styles produced by five different means. The results indicate that the Kurzweil Reading Machine...

Patent
10 Mar 1979
TL;DR: This patent aims to increase the accuracy of an error rate supervisory circuit by adding the noise component to the base band signal before identification and reproduction, and then carrying out a logic operation between the identified reproduction output and the identified reproduction signal of the main route to obtain the false error signal.
Abstract: PURPOSE:To increase the accuracy for the error rate supervisory circuit by adding the noise component to the base band signal before identification for identification and reproduction and then carrying out the logic calculation between the identified reproduction output and the identified reproduction signal of the main route to obtain the false error signal. CONSTITUTION:For the received 4-phase modulated wave, synchronous detection 16 is given with 90 deg. phase shift 15 between the 4-multiplied 11 signal and the 4-multiplied carrier obtained from the 4-multiplied wave with deletion of the noise component through LPF12. Thus, the base band converted noise component is drawn out to be added 20 to the base band signal which is obtained by branching off from the main route and before identification and reproduction. Then identified reproduction 21 is carried out, and logic operation 22 is given between the identified reproduction output and the identified reproduction signal of the main route to obtain the false error signal. In this way, a constant corresponding relation can be kept between the deterioration quantity of the error rate of the supervisory circuit group and the real error rate of the real circuit against the phase fluctuation of the timing signal. As a result, the accuracy can be increased for the error rate supervisory circuit.


Journal ArticleDOI
TL;DR: A method for evaluating the character error rate for QPRS in the signal space using the classical Bayes hypothesis-testing technique is presented and may be extended to account for the effect of carrier tracking error and, in principle, phase jitter.
Abstract: A method for evaluating the character error rate for QPRS in the signal space using the classical Bayes hypothesis-testing technique is presented. Since the decision regions are rectangular, analytical expressions for the error rates can be found. This method may be extended to account for the effect of carrier tracking error and, in principle, phase jitter.

Proceedings ArticleDOI
01 Apr 1979
TL;DR: It is proposed to characterize the speech short-term spectrum with a reduced number of parameters (4 to 7) computed from a rough spectral analysis that permits a correct classification of the steady-state French speech sounds pronounced by different speakers.
Abstract: Tracking and identifying the formants in order to perform speech recognition is a time-consuming, error-prone and speaker-dependent operation. It is proposed to characterize the speech short-term spectrum with a reduced number of parameters (4 to 7) computed from a rough spectral analysis. These parameters permit a correct classification of the steady-state French speech sounds (vowels, including nasals, and unvoiced fricatives) pronounced by different speakers. A word recognition experiment based on the same parameters gives good results with words differing from each other by one phoneme only (single speaker, one learning pass).

Patent
06 Dec 1979
TL;DR: In this article, the authors secure the correspondence between the generation period of an alarm display and the deterioration period of the error rate by counting the code error number within each delay time and every fixed time.
Abstract: PURPOSE:To secure the correspondence between the generation period of the alarm display and the deterioration period of the error rate by counting the code error number within each delay time and every fixed time and then generating the alarm signal when counting the fixed error number within each supervisory time. CONSTITUTION:The counter receives the error correction pulse from decoding part 1 of the error circuit and counts the error pulses generated in the period of time t1 until being reset by the reset pulse whose repetitive cycle is t1 (t1: the integer to satisfy t1 = T1M; M: a positive integer). The output of counter 2 is shifted by M-step shift register 5, and the M-step output is added through operator 6. When the number of the output is more than N-units, the count-up signal is supplied to FF3, and the output turns to a higher rank. At the same time, FF3 is reset at every time t1 and sampled then via memory 4 to be held for period of time t1. Thus, the correspondence can be secured between the generation period of the alarm display and the deterioration period of the error rate.
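
The counting scheme in the abstract above (per-interval error counts shifted through M stages, summed, and compared with a threshold N) can be mimicked in software as a sliding-window sum. This is only a rough analogue of the described circuit: the class and method names are hypothetical, and raising the alarm when the sum is at least N (rather than strictly greater) is an assumption.

```python
from collections import deque


class ErrorRateAlarm:
    """Sliding-window error counter: alarm when the last M intervals hold >= N errors."""

    def __init__(self, m_stages, threshold):
        # The deque plays the role of the M-step shift register of interval counts.
        self.window = deque([0] * m_stages, maxlen=m_stages)
        self.threshold = threshold  # N in the abstract (">=" is an assumption)

    def push_interval(self, error_count):
        """Shift in the error count of one t1 interval and return the alarm state."""
        self.window.append(error_count)  # the oldest interval drops out automatically
        return sum(self.window) >= self.threshold


alarm = ErrorRateAlarm(m_stages=3, threshold=5)
states = [alarm.push_interval(c) for c in [2, 2, 1, 0]]
```

Because the window slides by one interval at a time, the alarm state is re-evaluated every t1 and clears as soon as the errors age out of the last M intervals, which is the correspondence between alarm period and degradation period that the abstract aims for.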

Journal ArticleDOI
TL;DR: Two adaptive differential pulse code modulation algorithms are discussed that provide for transmitter-receiver resynchronization when used in a non-zero error rate digital transmission medium.
Abstract: Two adaptive differential pulse code modulation (ADPCM) algorithms are discussed that provide for transmitter-receiver resynchronization when used in a non-zero error rate digital transmission medium. Novel techniques are used in the algorithms (e.g., step-size bias) and in their extensive characterization (e.g., intermodulation distortion measurements).

Patent
28 May 1979
TL;DR: A frame synchronous circuit detects the frame synchronizing position of a time-division multiple PCM signal, shift registers 20 23 are provided which have a number of bits equivalent to the word length of the signal, and the phase of a write clock to registers 20 23 is compared with that of a switching signal sequentially changing over paralleled outputs of registers 20 23.
Abstract: PURPOSE:To attain the improvement of speech quality and a decrease in error rate by arranging shift registers for the write control of input PCM signals and by allowing them to be an elastic storage and series-parallel converter circuit. CONSTITUTION:Frame synchronous circuit 18 detects the frame synchronizing position of a time-division multiple PCM signal, shift registers 20 23 are provided which have a number of bits equivalent to the word length of the signal, and the phase of a write clock to registers 20 23 is compared 23 with that of a switching signal sequentially changing over paralleled outputs of registers 20 23. Corresponding to its result, it is decided through write control 19 whether a word of fixed codes in the PCM signal should be written to the 1st and 2nd registers at the same time and the following word should be written to the 3rd register or the following words should be written skipping over the word to be written to the 1st register. Namely, registers 20 22 are used as both an elastic storage and series-parallel converter, thereby realizing the improvement of speech quality and a decrease in error rate.



Proceedings ArticleDOI
02 Apr 1979
TL;DR: This work describes a system developed on a PDP 11/45 computer which recognizes spoken commands and controls a Scheinmann MIT mechanical arm to serve the needs of an immobilized patient.
Abstract: This work describes a system developed on a PDP 11/45 computer which recognizes spoken commands and controls a Scheinmann MIT mechanical arm to serve the needs of an immobilized patient. The speech recognition is discrete word with template matching, using as features zero crossing rate, average absolute magnitude, and frequency and normalized error as derived from the two pole linear predictive analysis. Commands are made up of words from a 16-word vocabulary obeying a specified syntax. Upon recognition of a command the arm performs the corresponding pre-programmed task. A recognition time of two seconds is attained (including the one second sampling interval), plus an additional one second to verify the word. With the command syntax a recognition rate of 88% is attained, as compared with only 56% when each word is matched against every word in the vocabulary.
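
The jump from 56% to 88% in the abstract above comes from letting the command syntax restrict which templates are compared at each position. A minimal sketch of that idea follows; the function name, the example words and the distances are hypothetical and are not taken from the system described.

```python
def recognize(distances, allowed=None):
    """Return the best-matching word, optionally limited to syntax-allowed words.

    distances: dict mapping each vocabulary word to its template distance.
    allowed:   set of words the syntax permits at this position, or None
               to match against the whole vocabulary.
    """
    if allowed is None:
        candidates = distances
    else:
        # Only templates the syntax permits here are considered at all.
        candidates = {w: d for w, d in distances.items() if w in allowed}
    return min(candidates, key=candidates.get)


# Hypothetical distances: an acoustically confusable word scores slightly better.
example = {"stop": 0.30, "top": 0.28, "up": 0.90}
```

Matching against every word picks the confusable `"top"`, while restricting the candidates to `{"stop", "up"}` recovers `"stop"`; fewer competitors per position means fewer opportunities for such confusions.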

Journal ArticleDOI
TL;DR: The results show a significant improvement in power, of 1.1 dB per degree of phase error, which can be achieved at an error rate of 10^-3 over the range of signal-to-noise ratio values of 32–40 dB.
Abstract: An analysis is presented briefly showing the effect of carrier phase error due to jitter on the error rate in digital communication systems. The information-bearing signal is taken to be impaired by peak-limited additive interference and multiplicative noise, such as carrier phase jitter. Probability of error and distance function due to slowly varying phase jitter (φ<π/2) and computer simulation results on the system performance for different signal-to-noise ratios corresponding to bit rates 10.8, 14.4, 18.0 and 21.6 kbits/s and various channel bandwidths 1.2, 2.4, 3.6 and 4.8 kHz are presented along with the possibility of minimization of such effects using a decision feedback equalizer. The results show a significant improvement in power, of 1.1 dB per degree of phase error, which can be achieved at an error rate of 10^-3 over the range of signal-to-noise ratio values of 32–40 dB.