
Showing papers on "Word error rate published in 1983"


Journal ArticleDOI
TL;DR: In this paper, a prediction rule is constructed on the basis of some data, and then the error rate of this rule is estimated in classifying future observations using cross-validation.
Abstract: We construct a prediction rule on the basis of some data, and then wish to estimate the error rate of this rule in classifying future observations. Cross-validation provides a nearly unbiased estimate, using only the original data. Cross-validation turns out to be related closely to the bootstrap estimate of the error rate. This article has two purposes: to understand better the theoretical basis of the prediction problem, and to investigate some related estimators, which seem to offer considerably improved estimation in small samples.

2,331 citations
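
The estimators discussed above can be illustrated with a small numerical sketch. The code below is not from the paper; the nearest-mean rule and the simulated data are purely illustrative. It contrasts the optimistic apparent (resubstitution) error rate with the leave-one-out cross-validation estimate of the same rule's error rate.

```python
# Minimal sketch: leave-one-out cross-validation versus the apparent
# (resubstitution) error rate for a simple nearest-mean classification rule.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(1.5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

def nearest_mean_rule(X_train, y_train):
    """Fit class means; classify a point to the nearer mean."""
    means = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    return lambda x: int(np.argmin(np.linalg.norm(means - x, axis=1)))

# Apparent error rate: test the rule on the same data used to build it.
rule = nearest_mean_rule(X, y)
apparent = np.mean([rule(x) != t for x, t in zip(X, y)])

# Leave-one-out cross-validation: hold out each observation in turn.
errors = []
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    rule_i = nearest_mean_rule(X[mask], y[mask])
    errors.append(rule_i(X[i]) != y[i])
cv_error = np.mean(errors)

print(f"apparent error rate: {apparent:.3f}")
print(f"cross-validated error rate: {cv_error:.3f}")
```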


Journal ArticleDOI
TL;DR: The results of a new method based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures for discrete utterance speech recognition are presented.
Abstract: The results of a new method are presented for discrete utterance speech recognition. The method is based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures. Separate vector quantization code books are designed from training sequences for each word in the recognition vocabulary. Inputs from outside the training sequence are classified by performing vector quantization and finding the code book that achieves the lowest average distortion per speech frame. The new method obviates time alignment. It achieves 99 percent accuracy for speaker-dependent recognition of a 20-word vocabulary that includes the ten digits, with higher accuracy for recognition of the digit subset. For speaker-independent recognition, the method achieves 88 percent accuracy for the 20-word vocabulary and 95 percent for the digit subset. Background of the method, detailed empirical results, and an analysis of computational requirements are presented.

92 citations
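
A rough sketch of the classification step may help: one codebook per vocabulary word, and the unknown utterance is assigned to the word whose codebook yields the lowest average per-frame distortion. The code below is illustrative only; it substitutes a squared Euclidean distortion for the paper's information-theoretic spectral distortion measures, and the codebooks and utterance are random placeholders.

```python
# Illustrative sketch: one vector-quantization codebook per vocabulary word,
# classification by lowest average per-frame distortion.
import numpy as np

def avg_distortion(frames, codebook):
    """Mean over frames of the distortion to the nearest codeword."""
    # frames: (T, d) feature vectors; codebook: (K, d) codewords
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def classify(frames, codebooks):
    """Return the vocabulary word with the lowest average distortion."""
    scores = {word: avg_distortion(frames, cb) for word, cb in codebooks.items()}
    return min(scores, key=scores.get)

# Toy example with random "codebooks" and a random "utterance".
rng = np.random.default_rng(1)
codebooks = {w: rng.normal(size=(8, 12)) for w in ["zero", "one", "two"]}
utterance = rng.normal(size=(40, 12))
print(classify(utterance, codebooks))
```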


Patent
Donald W. Peterson
11 Mar 1983
TL;DR: In this article, an error detection and correction circuit is used to detect and correct transient errors in information readout of a memory location and generate an error signal which interrupts the microprocessor.
Abstract: A microcomputer system in which transient errors occurring in a memory are corrected and logged by a program controlled microprocessor and a simple error detection and correction circuit. When an error occurs in information readout of a memory location, the error detection and correction circuit is responsive to the error to (1) store the address of the memory block containing the location, (2) store the type of error, and (3) generate an error signal which interrupts the microprocessor. In response to the interrupt, the microprocessor enters an interrupt routine to: (1) identify the block of memory locations in which the error occurred, (2) determine the type of error, (3) reaccess each memory location of the memory block to effect a rereading thereof, (4) receive each word of readout information, corrected if necessary by the error detection and correction circuit, (5) rewrite each of the received words back into the memory at the proper reaccessed memory location, (6) read out each of the rewritten locations to determine if any error is still present, which would indicate a permanent rather than a transient error, and (7) finally, log the error in an error rate table if it is a transient error.

78 citations
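
The interrupt routine's reread/rewrite/recheck logic can be sketched roughly as follows. Everything here is a toy stand-in for the patent's hardware: ToyMemory, its ecc_read method, and the fault sets are hypothetical names used only to show how a transient error is scrubbed and logged while a permanent fault is distinguished.

```python
# Hedged sketch of the scrubbing steps from the abstract, over a simulated
# memory with a stand-in ECC circuit.
import collections

BLOCK_SIZE = 8

class ToyMemory:
    """Simulated memory read through a stand-in ECC circuit (hypothetical)."""
    def __init__(self, size):
        self.words = [0xAA] * size
        self.transient = set()   # soft errors: cleared by rewriting the word
        self.stuck = set()       # hard faults: error reappears on every read

    def ecc_read(self, addr):
        """Return (corrected word, error-detected flag), as the ECC circuit would."""
        return self.words[addr], addr in (self.transient | self.stuck)

    def write(self, addr, word):
        self.words[addr] = word
        self.transient.discard(addr)   # rewriting repairs a transient error

def scrub_block(mem, block, error_rate_table, hard_errors):
    """Interrupt-routine sketch: reread, rewrite and recheck a flagged block."""
    base = block * BLOCK_SIZE
    for addr in range(base, base + BLOCK_SIZE):
        corrected, was_bad = mem.ecc_read(addr)   # reread, take the corrected word
        mem.write(addr, corrected)                # write the corrected word back
        _, still_bad = mem.ecc_read(addr)         # reread the rewritten location
        if still_bad:
            hard_errors.append(addr)              # error persists: permanent fault
        elif was_bad:
            error_rate_table[block] += 1          # transient error: log it

mem = ToyMemory(64)
mem.transient.add(9)      # a soft error in block 1
mem.stuck.add(13)         # a hard fault in block 1
table = collections.defaultdict(int)
hard = []
scrub_block(mem, block=1, error_rate_table=table, hard_errors=hard)
print(dict(table), hard)   # {1: 1} [13]
```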


Proceedings ArticleDOI
01 Apr 1983
TL;DR: A new technique for use in a word recognition system in which word templates are represented as sequences of discrete phoneme-like (pseudo-phoneme) templates that are automatically determined from a training set of word utterances by a clustering technique.
Abstract: This paper describes a new technique for use in a word recognition system. This recognition system is especially effective in speaker-dependent, large-vocabulary word recognition based on multiple reference templates. In this system, word templates are represented as sequences of discrete phoneme-like (pseudo-phoneme) templates which are automatically determined from a training set of word utterances by a clustering technique. In speaker-dependent word recognition experiments on 641 city names, 96.3% recognition accuracy was obtained using 256 phoneme-like templates.

56 citations


Journal ArticleDOI
TL;DR: The whole word pattern-matching principles used in these machines are described, and it is shown how these principles can be extended to deal with continuously spoken sequences of words.
Abstract: Machines that recognize isolated words from a small, predefined vocabulary have been commercially available for many years. The whole word pattern-matching principles used in these machines are described, and it is shown how these principles can be extended to deal with continuously spoken sequences of words. Details are given of the resulting connected word recognition algorithm which has several novel features and potentially useful extensions. The algorithm has already been implemented in real-time hardware, which will be used to explore the full potential and limitations of the method in many different applications.

51 citations
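
The whole-word pattern-matching principle referred to above is essentially dynamic time warping against stored templates. The sketch below shows only the isolated-word case with a plain Euclidean frame distance and random placeholder templates; the paper's contribution, extending the alignment to connected sequences of templates, is not reproduced here.

```python
# A minimal dynamic time warping (DTW) sketch of whole-word pattern matching:
# an unknown utterance is aligned nonlinearly against each stored template.
import numpy as np

def dtw_distance(a, b):
    """Accumulated frame distance between feature sequences a (T1,d), b (T2,d)."""
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / (T1 + T2)   # length-normalised path cost

def recognise(utterance, templates):
    """Pick the template word with the smallest warped distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

rng = np.random.default_rng(2)
templates = {w: rng.normal(size=(30, 10)) for w in ["yes", "no", "stop"]}
test = templates["no"] + 0.1 * rng.normal(size=(30, 10))
print(recognise(test, templates))   # expected: "no"
```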


Journal ArticleDOI
TL;DR: Lower complexity decoding approaches are presented that can achieve asymptotic optimality of error rate while being computationally faster and simpler than MLSE for many modulations, and error rate performance can be traded for complexity reduction.
Abstract: Digital angle modulations having input symbol memory can be demodulated using maximum likelihood sequence estimation (MLSE or Viterbi decoding). The demodulation of the more bandwidth efficient of these can require a large number of computations. In this paper, lower complexity decoding approaches are presented. These decoders use a predetermined processing order and a reduced number of survivor signals, S, at every time NT. Processing is performed on the signal sequences using metrics (likelihoods) obtained by a matched filter bank similar to that needed for MLSE. The decoders can achieve asymptotic optimality of error rate while being computationally faster and simpler than MLSE for many modulations. In addition, error rate performance can be traded for complexity reduction. Expected performance has been verified for representative modulations.

51 citations
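
The survivor-reduction idea can be sketched in a few lines: instead of retaining one survivor per trellis state as full MLSE does, keep only the S best partial sequences at each symbol interval. The branch metric below is a toy placeholder for the matched-filter likelihoods described in the abstract, so this illustrates the pruning strategy, not the decoders themselves.

```python
# Sketch of a trellis search that keeps only the S highest-metric partial
# sequences (survivors) at every step.
import heapq

def reduced_search(symbols, branch_metric, n_steps, S):
    """Keep the S best partial paths at every step; return the best path."""
    # Each survivor is (accumulated_metric, path); larger metric = more likely.
    survivors = [(0.0, ())]
    for t in range(n_steps):
        candidates = []
        for metric, path in survivors:
            for sym in symbols:
                candidates.append((metric + branch_metric(t, path, sym),
                                   path + (sym,)))
        # Retain only the S highest-metric extensions.
        survivors = heapq.nlargest(S, candidates, key=lambda c: c[0])
    return max(survivors, key=lambda c: c[0])[1]

# Toy usage: a "channel" metric that rewards matching a hidden reference sequence.
reference = (1, 0, 1, 1, 0, 1)
metric = lambda t, path, sym: 1.0 if sym == reference[t] else -1.0
print(reduced_search(symbols=(0, 1), branch_metric=metric, n_steps=6, S=4))
```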



PatentDOI
TL;DR: In this paper, a low cost, speaker independent, limited vocabulary, word recognizing microcomputer was proposed, which divides each spoken word into a series of word states, determines the length of each state and classifies each state as fricative, vowel-like, or silent.
Abstract: A low cost, speaker independent, limited vocabulary, word recognizing microcomputer. The microcomputer divides each spoken word into a series of word states, determines the length of each state, and classifies each state as fricative, vowel-like, or silent. The incoming speech pattern, in the form of two arrays (an array of classified word states and an array of associated word lengths), is then compared sequentially with a series of templates defining the limited vocabulary stored in the microcomputer's memory. Where the states match, an error score is generated based on the differences between the template lengths and the word state lengths. Provision is made for recognizing a spoken word as a template word even when the array of states representing the spoken word is not identical to an array of states in any of the template words. This permits recognition of the same word by the microcomputer even when the word is spoken in substantially different ways.

45 citations
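
A minimal sketch of the matching step described in the abstract follows: utterances and templates are sequences of coarse word states with durations, and templates are scored by summed length differences. The state labels and example vocabulary are invented for illustration, and only the exact state-sequence match is shown, whereas the patent also provides for recognition when the state arrays are not identical.

```python
# Sketch of word-state template matching with a length-difference error score.

def score(word_states, word_lengths, tmpl_states, tmpl_lengths):
    """Length-difference error score, or None if the state sequences differ."""
    if word_states != tmpl_states:
        return None
    return sum(abs(a - b) for a, b in zip(word_lengths, tmpl_lengths))

def recognise(states, lengths, templates):
    """Return the template word with the lowest error score."""
    best_word, best_score = None, None
    for word, (t_states, t_lengths) in templates.items():
        s = score(states, lengths, t_states, t_lengths)
        if s is not None and (best_score is None or s < best_score):
            best_word, best_score = word, s
    return best_word

# F = fricative, V = vowel-like, S = silence; lengths are in frames.
templates = {
    "six":  (["F", "V", "S", "F"], [8, 12, 4, 9]),
    "on":   (["V", "V"],           [10, 14]),
    "stop": (["F", "V", "S"],      [7, 11, 6]),
}
print(recognise(["F", "V", "S", "F"], [9, 10, 5, 8], templates))  # "six"
```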


Patent
Jonathan S. Turner
07 Dec 1983
TL;DR: A trunk controller and processor arrangement for monitoring the error rate occurring in packets received from a high speed trunk is described, in which each trunk controller has an error rate monitoring circuit.
Abstract: A trunk controller and processor arrangement for monitoring the error rate occurring in packets received from a high speed trunk. Within a packet switching system, packets comprising logical addresses and voice/data information are communicated through the system by packet switching networks which are interconnected by high speed digital trunks, with each of the latter being directly terminated on both ends by trunk controllers. During initial call setup of a particular call, central processors associated with each network in the desired route store the necessary logical-to-physical address information in the controllers, which perform all logical-to-physical address translation on subsequent packets of the call. Each network comprises stages of switching nodes which are responsive to the physical address associated with a packet by a controller to communicate that packet to a designated subsequent node. Each trunk controller has an error rate monitoring circuit for measuring the error rate occurring in packets during transmission over the attached trunk. The error rate circuit notifies the associated processor when error rate excursions increase or decrease beyond any of a number of processor-specified error rate thresholds.

41 citations


Journal ArticleDOI
TL;DR: The error rate performance of the proposed demodulation method is theoretically and experimentally studied for quaternary DPSK, and the experimental results agree with the theory, which indicates that performance is superior to conventional DPSK but poorer than coherent detection.
Abstract: Theoretical analysis and experimental results for a DPSK system with nonredundant error correction are described. The error correction capability of the proposed demodulation method is achieved without utilizing additional bandwidth. The demodulator utilizes outputs of differentially coherent detectors that employ the received signal delayed by two or more time slots as references. These outputs are shown to be the parity check sums of two or more consecutive outputs of the conventional detector under noise-free conditions. The error rate performance of the proposed demodulation method is theoretically and experimentally studied for quaternary DPSK. Experimental results agree with the theory, which indicates that performance is superior to conventional DPSK by 1.2 dB, but poorer than coherent detection by 1.3 dB. This method can be applied effectively to TDMA communications and to on-board regenerative repeaters.

41 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: The experiments reported here are the first in which a direct comparison is made between two conceptually different methods of treating the non-stationarity problem in speech recognition by implicitly dividing the speech signal into quasi-stationary intervals.
Abstract: A method for speaker independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov chain is described. Training is a three part process comprising conventional methods of linear prediction coding (LPC) and vector quantization of the LPCs followed by an algorithm for estimating the parameters of a hidden Markov process. Recognition utilizes linear prediction and vector quantization steps prior to maximum likelihood classification based on the Viterbi algorithm. Vector quantization is performed by a K-means algorithm which finds a codebook of 64 prototypical vectors that minimize the distortion measure (Itakura distance) over the training set. After training based on a 1,000 token set, recognition experiments were conducted on a separate 1,000 token test set obtained from the same talkers. In this test a 3.5% error rate was observed which is comparable to that measured in an identical test of an LPC/DTW (dynamic time warping) system. The computational demand for recognition under the new system is reduced by a factor of approximately 10 in both time and memory compared to that of the LPC/DTW system. It is also of interest that the classification errors made by the two systems are virtually disjoint; thus the possibility exists to obtain error rates near 1% by a combination of the methods. In describing our experiments we discuss several issues of theoretical importance, namely: 1) Alternatives to the Baum-Welch algorithm for model parameter estimation, e.g., Lagrangian techniques; 2) Model combining techniques by means of a bipartite graph matching algorithm providing improved model stability; 3) Methods for treating the finite training data problem by modifications to both the Baum-Welch algorithm and Lagrangian techniques; and 4) Use of non-ergodic Markov chains for isolated word recognition. We note that the experiments reported here are the first in which a direct comparison is made between two conceptually different (i.e. parametric and non-parametric) methods of treating the non-stationarity problem in speech recognition by implicitly dividing the speech signal into quasi-stationary intervals.
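
The recognition step, maximum likelihood classification over whole-word discrete hidden Markov models of vector-quantised frames, can be sketched as below. The models here are random toy parameters rather than ones trained with the Baum-Welch or Lagrangian procedures discussed in the paper, so this only illustrates the Viterbi scoring and the arg-max over word models.

```python
# Sketch: Viterbi (best state path) log-likelihood scoring of discrete
# whole-word HMMs over vector-quantised frame labels, with toy parameters.
import numpy as np

def viterbi_loglik(obs, log_pi, log_A, log_B):
    """Best-path log-likelihood of a discrete observation sequence."""
    # log_pi: (N,) initial log-probs; log_A: (N,N) transitions; log_B: (N,M) emissions
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return delta.max()

def classify(obs, models):
    """Pick the word model with the highest Viterbi score."""
    return max(models, key=lambda w: viterbi_loglik(obs, *models[w]))

rng = np.random.default_rng(3)
def random_model(n_states=5, n_symbols=64):
    A = np.triu(rng.random((n_states, n_states)))        # left-to-right structure
    A /= A.sum(axis=1, keepdims=True)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    pi = np.eye(n_states)[0]
    with np.errstate(divide="ignore"):
        return np.log(pi), np.log(A), np.log(B)

models = {d: random_model() for d in "0123456789"}
obs = rng.integers(0, 64, size=30)                       # VQ codeword indices
print(classify(obs, models))
```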

Proceedings ArticleDOI
01 Apr 1983
TL;DR: FEATURE as discussed by the authors is a speaker-independent isolated letter recognition system that performs a series of feature measurements on an input utterance, then classifies the sound as one of 26 English letters using statistical pattern classification techniques.
Abstract: FEATURE is a speaker-independent isolated letter recognition system. The system performs a series of feature measurements on an input utterance, then classifies the sound as one of 26 English letters using statistical pattern classification techniques. Performance was evaluated for 10 male and 10 female speakers. For each speaker tested, the system was trained on 4 tokens of each letter provided by the remaining 19 speakers. The average error rate was 10.5%. The system can be used in either a speaker-independent or dynamic adaptation mode. In this latter mode, the user provides feedback when an error is made, and the system changes the statistical parameters that are used during classification. In this way, the system dynamically adapts to the speech patterns of the current user. The use of tuning produced a decrease in the error rate from 10.5% to 6.2%, averaged across the 20 speakers. FEATURE is significant because it is able to perform fine phonetic distinctions (such as between the letters B-D-E, P-T-G, V-Z, M-N, J-K, I-R) in a speaker-independent mode.

Journal ArticleDOI
M. Kuhn, H. Tomaschewski
TL;DR: Possibilities for improving the recognition accuracy are investigated for a given feature extraction, which is based on a short term spectrum analysis by means of band-pass filtering, and a method based on spectral change is investigated, alone and in combination with dynamic programming.
Abstract: For isolated word recognition, possibilities for improving the recognition accuracy are investigated for a given feature extraction, which is based on a short term spectrum analysis by means of band-pass filtering. A number of preprocessing steps are discussed, which are to be applied prior to time alignment via dynamic programming. These preprocessing steps include normalization of short term spectra with respect to the long term spectrum, amplitude normalization and spectral channel contour smoothing. For nonlinear time alignment, a method based on spectral change is investigated, alone and in combination with dynamic programming. The resulting distance measure is incorporated into pattern recognition schemes according to the minimum distance and nearest neighbor principles. The different processing steps are evaluated in a speaker dependent mode of operation separately for two vocabularies: the ten German digits and twelve major German airport city names. In comparison with the use of standard mean normalization of short term spectra and dynamic programming, the aforementioned techniques allow for a performance improvement in terms of error rate reduction by a factor of 3-5, while at the same time offering savings in computing time and reference memory requirements by a factor of 10 and 3, respectively.

Journal ArticleDOI
TL;DR: The apparent error rate is defined, some real examples are given illustrating how poor the optimistic apparent error rate is as an estimate of true future performance, and alternative measures are suggested.
Abstract: Classification and diagnosis are concepts of fundamental importance in medicine. Yet all too frequently in published papers the only measure of performance of a classification rule is the optimistic apparent error rate. This is defined, some real examples are given illustrating how poor it is as an estimate of true future performance, and alternative measures are suggested.

Journal ArticleDOI
TL;DR: A speaker-independent segmentation procedure is proposed that is well suited for speech input where the number of words in a word string is not known to the recognition system, together with a training procedure which automatically adapts the classifier to the speaker-dependent effects of coarticulation.
Abstract: Recognition of connected words can be performed by segmenting the word string automatically into single-word components which are then classified by a single-word recognition system. We propose and investigate a speaker-independent segmentation procedure which is based completely on statistical principles. An estimation algorithm, adapted to the statistical data of the signal parameters, determines the word boundaries. The statistical data are computed from vocabulary-dependent speech samples of different speakers. The segmentation procedure, which operates independently of the single-word recognizer, has been tested with connected digits. The results show that an estimation algorithm based on quadratic polynomials yields a very reliable segmentation. The segmentation procedure is also well suited for a speech input where the number of words in a word string is not known to the recognition system. Based on the above segmentation procedure, we have carried out several recognition experiments on two-to-four-digit strings. The investigations show that the proposed segmentation algorithm provides an efficient tool to tackle the effects of coarticulation between adjacent words. We present a training procedure which automatically adapts the classifier to the speaker-dependent effects of coarticulation.

Journal ArticleDOI
J.A. Spriet, P. Herman
TL;DR: In this paper, a Monte-Carlo simulation study with second order autoregressive models has been carried out and it is found that the properties of the structure discriminating statistics are remarkably independent of the parameter values or the location of the poles.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: A dynamic speaker-adaptation algorithm for the C-MU feature-based isolated letter recognition system, FEATURE, is described and a significant improvement in the recognition performance was observed for different vocabularies as the system tuned to the the characteristics of a new speaker.
Abstract: A dynamic speaker-adaptation algorithm for the C-MU feature-based isolated letter recognition system, FEATURE, is described. The algorithm, based on maximum a posteriori probability estimation techniques, uses the labelled observations input thus far to the classifier, as well as the a priori correlations of the features within and across the various letters or sets of letters (classes). The probability density functions (pdf) of all the classes are updated simultaneously rather than on a class-by-class basis, so that the pdf of a given class is updated before any observation from that class has been input. A significant improvement in the recognition performance was observed for different vocabularies as the system tuned to the characteristics of a new speaker. Finally, the algorithm was compared to simpler forms of dynamic adaptation. It produced a faster decrease of the error rate than the other tuning procedures. After a small number of iterations, however, the various procedures yielded similar results.
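
A much-simplified sketch of this kind of dynamic adaptation is given below: after each labelled observation, a conjugate (normal-normal) MAP update nudges one class mean toward the new speaker. The class names, prior weight, and nearest-mean classifier are illustrative assumptions; the paper's algorithm additionally exploits feature correlations across classes so that all class pdfs are updated simultaneously, which this sketch does not attempt.

```python
# Simplified sequential MAP adaptation of class means from labelled feedback.
import numpy as np

class AdaptiveMeans:
    def __init__(self, prior_means, prior_weight=5.0):
        """prior_means: dict class -> speaker-independent mean vector."""
        self.means = {c: np.asarray(m, float) for c, m in prior_means.items()}
        self.counts = {c: prior_weight for c in prior_means}   # prior pseudo-counts

    def update(self, label, x):
        """MAP update of one class mean from a labelled observation x."""
        n = self.counts[label] + 1.0
        self.means[label] += (np.asarray(x, float) - self.means[label]) / n
        self.counts[label] = n

    def classify(self, x):
        """Nearest adapted mean (stand-in for the full statistical classifier)."""
        return min(self.means, key=lambda c: np.linalg.norm(self.means[c] - x))

# Toy usage: the new speaker's "E" is shifted relative to the prior model.
model = AdaptiveMeans({"B": [0.0, 0.0], "E": [1.0, 1.0]})
for _ in range(10):
    model.update("E", [1.6, 1.7])        # user feedback supplies the label
print(model.classify([1.5, 1.6]))        # "E", using the adapted mean
```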


Patent
16 Jun 1983
TL;DR: A word recognizing part, under the control of a system control part, calculates the similarity between a candidate word and the words registered in a word dictionary, and when a word in some hierarchy cannot be recognized, retrial processing that exploits the hierarchical structure of the input is performed.
Abstract: PURPOSE: To perform recognition easily and efficiently by employing processing which utilizes hierarchical structure and performing recognition retrial processing when the result of recognition of a word in some hierarchy is not obtained. CONSTITUTION: A word recognizing part 5 is brought under the control of a system control part 10 to calculate (8) the similarity between a candidate word and words registered in a word dictionary, performing word recognition. If the result of recognition of a word in some hierarchy of an input word data string is not obtained in the word recognition, at least one of combination processing, omission recovery processing, integration processing, coupling processing, and separation processing, which utilize the hierarchical structure, is used to perform recognition retrial processing. Consequently, the word data string is recognized effectively from the relation among the words forming the hierarchical structure of the input word data string. Therefore, a word data string for address display, etc., is recognized effectively.


01 Jan 1983
TL;DR: In this article, a special module, called an expectation system, is designed and implemented to aid in the speech recognition process, based on the study of repetition and patterns in dialogues.
Abstract: A commercial voice recognizer, the NEC DP-200, has been added to an existing natural language processor, called the Natural Language Computer (NLC). The resulting voice-interactive natural language system is called the Voice Natural Language Computer (VNLC). This system can accept both discrete speech, where a pause must be inserted after each word, and connected speech, where adjacent words do not necessarily have to be separated by a pause. Many errors arise during speech recognition, creating faulty input to the natural language processor. Due to these errors, system performance drops dramatically. In connected speech, errors were too serious to be repaired by low level linguistic and domain information. To reach the goal of a viable connected-speech-interactive natural language system, special software was developed to provide higher level knowledge as an aid to error correction. The higher level knowledge provided for this purpose is called 'expectation'. A special module, called an expectation system, has been designed and implemented to aid in the speech recognition process. This has been done through the study of repetition and patterns in dialogues. As a user talks to VNLC, the expectation system stores information about the user's dialogues. It then attempts to find patterns that are repeated in the dialogues and to create a more general dialogue based on these patterns. This generalized dialogue is then used as an aid in error correcting future sentences by making predictions about what might be said when. An analysis was made concerning the error correcting power that could be anticipated from such a technique. Tests were run with results showing error correction capabilities quite similar to the theoretical predictions. Tests were also run on the system using human subjects to determine the performance of expectation in various dialogue situations. Results of this experiment indicate that the system is capable of reducing an average sentence error rate of 53% to less than 8%. Finally, the expectation system is shown to be a technique viable in predicting what might happen in any situation that tends to be repeated. Such situations include one so common as a trip to a restaurant.

Journal ArticleDOI
Jean Hudson, John Haworth
01 Jul 1983 - Literacy

Journal ArticleDOI
TL;DR: An error correction system for use with multitrack parallel recording of High Density Digital Recording on magnetic tape corrects for error bursts of any length on two tracks, simultaneously, providing improvements in bit error rate.
Abstract: An error correction system for use with multitrack parallel recording of High Density Digital Recording on magnetic tape is described. The system corrects for error bursts of any length on two tracks simultaneously, providing improvements in bit error rate on the order of 10^6. Calculated theoretical performance curves are provided and compared with actual test data. The test described is one in which new, unconditioned magnetic tape was used, which provided an uncorrected error rate on the order of 10^-5 and a corrected, or output, error rate of less than 1 in 10^10.

Journal ArticleDOI
TL;DR: The system comprises a low‐cost solution to the problem of high‐volume, fast data recording at remote locations when a small error rate can be tolerated and is capable of recording and playing back 172 800 bytes/s.
Abstract: A system is described that records digital data onto video cassettes and writes the data to a computer tape for subsequent processing. The system is capable of recording and playing back 172 800 bytes/s. This corresponds to the maximum rate achievable with a nine‐track computer tape drive writing 1600 bits per inch at 125 ips. A standard two‐hour cassette has a capacity of 1.2 Gbytes. Six checkbits are written with each 16‐bit word to facilitate error detection and correction. Upon playback, the data are written via DMA into a computer and then to a nine‐track computer tape. Error rates of less than 1 word in 300 000 have been achieved with an off‐the‐shelf portable video recorder and commercially available tape. The system comprises a low‐cost solution to the problem of high‐volume, fast data recording at remote locations when a small error rate can be tolerated.


Proceedings ArticleDOI
Aaron E. Rosenberg1
14 Apr 1983
TL;DR: A probabilistic model is developed to account for the error rate behavior of isolated word speech recognition systems and results indicate that two-way mixture distributions account quite well for the experimental performance results.
Abstract: A probabilistic model is developed to account for the error rate behavior of isolated word speech recognition systems. Two kinds of error are examined: confusion error, an a priori characterization of a recognizer which measures differences between words, and recognition rank error, an a posteriori characterization which, in addition to taking into account differences between words, accounts for differences between different tokens of the same word. It is shown that these kinds of error can be modelled by describing recognition trials as Bernoulli trials. Good models of error rate behavior as a function of vocabulary size can be obtained if the distributions of confusion or rank number are considered to be mixtures of binomial distributions. The data obtained from a recent experiment in isolated word recognition with a large vocabulary (1109 words) are used to evaluate the model. Model functions based on mixture distributions are fit by means of an optimization algorithm to experimental error rate functions obtained from each of six talkers and three partitions of the vocabulary. The results indicate that two-way mixture distributions account quite well for the experimental performance results.
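
One hedged numerical reading of the Bernoulli-trial model is sketched below: if each of the V-1 competing vocabulary words independently outscores the correct word with some small probability p, the error rate grows with vocabulary size as 1-(1-p)^(V-1), and a two-way mixture allows two sub-populations of tokens with different p. The mixture weights and probabilities in the example are made up, not the paper's fitted values.

```python
# Mixture-of-Bernoulli-trials model of error rate versus vocabulary size,
# with illustrative (not fitted) parameters.
import numpy as np

def error_rate(V, weights, probs):
    """Mixture error rate for vocabulary size V (scalar or array)."""
    V = np.asarray(V, dtype=float)
    rate = np.zeros_like(V)
    for w, p in zip(weights, probs):
        rate += w * (1.0 - (1.0 - p) ** (V - 1.0))
    return rate

sizes = np.array([20, 100, 300, 600, 1109])
# Two-way mixture: most tokens are rarely confused, a minority often are.
print(np.round(error_rate(sizes, weights=[0.9, 0.1], probs=[1e-4, 5e-3]), 3))
```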

Patent
19 Dec 1983
TL;DR: In this paper, a correcting operation control circuit is controlled on the basis of the output of an error rate counting circuit which counts the code error rate of an input signal, so that error correcting operation is started and stopped according to the code error rate of the input signal.
Abstract: PURPOSE: To control a correcting operation control circuit on the basis of the output of an error rate counting circuit which counts the code error rate of an input signal, and to start and stop error correcting operation according to the code error rate of the input signal, by providing the counting circuit, control circuit, etc. CONSTITUTION: Respective symbols from a data and parity symbol input terminal 1 are inputted to the 1st to 3rd arithmetic circuits 2-4, which output their syndromes; a syndrome checking circuit 5 checks the 1st to 3rd syndromes, and when one syndrome is found, it is set in an R-S latch 6 as an error. At the same time, an error detection pulse is sent to the error rate counting circuit 14 and the circuits 3 and 4 perform specific operations; a coincidence detecting circuit 7 checks the contents of the circuits 2-4 to apply a coincidence pulse to an AND gate 16. Then, the circuit 14 counts errors within a specific period to control the correcting operation control circuit 15, whose output is applied to the gate 16, which sends its output to an error address latch 9 to start and stop the error correcting operation.

Proceedings ArticleDOI
10 Nov 1983
TL;DR: This paper describes an error detection and correction system, based on an interleaved Reed-Solomon code, that is capable of correcting multiple burst errors, and achieves an increase in reliability of more than ten orders of magnitude.
Abstract: Due to the relatively high raw error rate on present day Optical Media, a high performance, real-time error detection and correction system was necessary to achieve the kind of data reliability required in computer storage systems. This paper describes such an error detection and correction system, based on an interleaved Reed-Solomon code, that is capable of correcting multiple burst errors. A pipelined architecture combines several special-purpose processors designed for specific decoding functions. This system, operating at speeds of up to 30 megabits per second, achieves an increase in reliability of more than ten orders of magnitude.
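
Only the interleaving idea is sketched below (the Reed-Solomon encoding and the pipelined decoder are omitted): spreading consecutive channel symbols across several codewords means that a long error burst touches at most a few symbols of any one codeword, which a modest per-codeword correction capability can then repair. The depth and data are arbitrary illustrative values.

```python
# Sketch of symbol interleaving across several codewords; encoding omitted.

DEPTH = 4   # number of interleaved codewords

def interleave(symbols, depth=DEPTH):
    """Assign channel symbol i to codeword i % depth."""
    return [symbols[i::depth] for i in range(depth)]

def deinterleave(rows):
    """Reassemble the original symbol order from the interleaved codewords."""
    out = []
    for i in range(max(len(r) for r in rows)):
        for r in rows:
            if i < len(r):
                out.append(r[i])
    return out

data = list(range(20))
codewords = interleave(data)
# A burst hitting 4 consecutive channel symbols lands on 4 different codewords,
# i.e. only one symbol per codeword needs correcting.
assert deinterleave(codewords) == data
for i, cw in enumerate(codewords):
    print(f"codeword {i}: {cw}")
```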

Book ChapterDOI
01 Jan 1983
TL;DR: Jet Propulsion Laboratory flight projects and the Deep Space Network require accurate prediction of telecommunications link capability, and the analyst commonly translates error rates into corresponding signal-to-noise ratios or received signal power levels to validate predicted capability.
Abstract: Jet Propulsion Laboratory flight projects and the Deep Space Network require accurate prediction of telecommunications link capability. Link capability determines spacecraft command message error rate, science data telemetry error rate, and radiometric angle, velocity, and position errors during a mission. To establish link performance capability, the analyst commonly translates these error rates into corresponding signal-to-noise ratios or received signal power levels. To validate predicted capability, the analyst compares actual (measured) signal-to-noise ratios or signal levels with previously predicted values for the same link configuration and time.
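
As a hedged illustration of the error-rate-to-signal-to-noise translation mentioned above, the sketch below uses the textbook relation for ideal coherent BPSK on an additive white Gaussian noise channel, BER = 0.5*erfc(sqrt(Eb/N0)); actual deep-space links involve coding and other modulation formats, so the numbers are only indicative.

```python
# Translating a bit error rate into a required Eb/N0 (and back) for ideal
# coherent BPSK on an AWGN channel.
import numpy as np
from scipy.special import erfc, erfcinv

def ber_from_ebn0_db(ebn0_db):
    """Bit error rate for ideal coherent BPSK at a given Eb/N0 in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * erfc(np.sqrt(ebn0))

def ebn0_db_from_ber(ber):
    """Invert the relation: required Eb/N0 (dB) for a target bit error rate."""
    return 10.0 * np.log10(erfcinv(2.0 * ber) ** 2)

for target in (1e-3, 1e-5):
    print(f"BER {target:g}  ->  Eb/N0 {ebn0_db_from_ber(target):.2f} dB")
print(f"Eb/N0 9.6 dB  ->  BER {ber_from_ebn0_db(9.6):.2e}")
```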

Journal ArticleDOI
TL;DR: In this article, a word model is used to reject nonspeech artifacts, such as lip smacks, tongue and teeth clicks, and breath noise, that cause endpoint errors in isolated utterance recognition.
Abstract: Endpoint detection is a critical issue for several types of isolated utterance recognizers, because improper endpoints often result in recognition errors. Endpoint errors often stem from nonspeech artifacts, namely lip smacks, tongue and teeth clicks, and breath noise. Endpoint detectors based only on energy thresholds cannot correctly reject these artifacts, but adding a word model allows most of these artifacts to be properly rejected. The rules which implement the word model are (1) the word cannot begin or end with two released plosives, (2) word initial stop gaps are less than 120 ms and word final ones less than 200 ms, (3) a word must contain a vocalic nucleus and be at least 100 ms in length, (4) word final sounds containing only mid‐frequency energy are breath noise. The detection algorithm has been implemented on a Heuristics Speech Recognizer and tested using the Texas Instruments isolated word data base. The word model based system substantially reduced the error rate relative to an energy threshold based detector.
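
The four word-model rules can be read as a simple post-filter on candidate endpoints, as in the sketch below. The segment representation, a list of (label, duration in ms) pairs with labels such as plosive, stop_gap, vowel and mid_freq, is an assumption made for illustration and is not the recognizer's actual data structure.

```python
# Sketch of the four word-model rules applied to a candidate segment sequence.

def plausible_word(segments):
    """Apply the four word-model rules to a candidate (label, ms) sequence."""
    if not segments:
        return False
    labels = [lab for lab, _ in segments]
    total_ms = sum(ms for _, ms in segments)

    # Rule 1: no two released plosives at the very start or very end.
    if labels[:2] == ["plosive", "plosive"] or labels[-2:] == ["plosive", "plosive"]:
        return False
    # Rule 2: word-initial stop gaps under 120 ms, word-final ones under 200 ms.
    if labels[0] == "stop_gap" and segments[0][1] >= 120:
        return False
    if labels[-1] == "stop_gap" and segments[-1][1] >= 200:
        return False
    # Rule 3: a vocalic nucleus must be present and the word at least 100 ms long.
    if "vowel" not in labels or total_ms < 100:
        return False
    # Rule 4: a word-final sound with only mid-frequency energy is breath noise.
    if labels[-1] == "mid_freq":
        return False
    return True

print(plausible_word([("stop_gap", 60), ("plosive", 10), ("vowel", 180)]))  # True
print(plausible_word([("vowel", 150), ("mid_freq", 90)]))                   # False
```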