scispace - formally typeset

Showing papers on "Word error rate published in 1981"


Journal Article (DOI)
TL;DR: In this paper, a set of functions of time is obtained from acoustic analysis of a fixed, sentence-long utterance: cepstrum coefficients are extracted by means of LPC analysis successively throughout the utterance to form time functions, and frequency response distortions introduced by transmission systems are removed.
Abstract: This paper describes new techniques for automatic speaker verification using telephone speech. The operation of the system is based on a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance. Cepstrum coefficients are extracted by means of LPC analysis successively throughout an utterance to form time functions, and frequency response distortions introduced by transmission systems are removed. The time functions are expanded by orthogonal polynomial representations and, after a feature selection procedure, brought into time registration with stored reference functions to calculate the overall distance. This is accomplished by a new time warping method using a dynamic programming technique. A decision is made to accept or reject an identity claim, based on the overall distance. Reference functions and decision thresholds are updated for each customer. Several sets of experimental utterances were used for the evaluation of the system, which include male and female utterances recorded over a conventional telephone connection. Male utterances processed by ADPCM and LPC coding systems were used together with unprocessed utterances. Results of the experiment indicate that verification error rate of one percent or less can be obtained even if the reference and test utterances are subjected to different transmission conditions.

1,187 citations
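
The abstract does not spell out how the cepstrum is computed or how transmission distortions are removed. The sketch below is minimal and assumes the usual all-pole convention 1 / (1 - sum_k a_k z^-k). Under that convention, a standard recursion turns LPC predictor coefficients into cepstral coefficients. Subtracting the utterance-average cepstrum then cancels any fixed linear channel, because a convolutional distortion becomes an additive constant in the cepstral domain. Whether the paper uses exactly this mean-subtraction step is an assumption here, and the function names are illustrative.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p (list a[0..p-1]) of the
    all-pole model 1 / (1 - sum_k a_k z^-k) into cepstral coefficients c_1..c_n."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # c_n = a_n + sum_{k} (k/n) c_k a_{n-k}, with a_n = 0 beyond the model order
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_mean_subtraction(frames):
    """Subtract the utterance-average cepstrum from every frame.

    A fixed channel adds the same cepstral vector to every frame,
    so removing the mean removes the channel (assumed approach)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in frames]
```

For a one-pole model with a_1 = 0.5, the recursion reproduces the closed form c_n = 0.5^n / n. Adding the same constant vector to every frame leaves the mean-subtracted output unchanged.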


Journal Article (DOI)
TL;DR: The results of the experiments show that there is only a slight difference between the recognition accuracies for statistical features and dynamic features over the long term, and it is more efficient to use statistical features than dynamic features.
Abstract: This paper describes results of speaker recognition experiments using statistical features and dynamic features of speech spectra extracted from fixed Japanese word utterances. The speech wave is transformed into a set of time functions of log area ratios and a fundamental frequency. In the case of statistical features, a mean value and a standard deviation for each time function and a correlation matrix between these functions are calculated in the voiced portion of each word, and after a feature selection procedure, they are compared with reference features. In the case of dynamic features, the time functions are brought into time registration with reference functions. The results of the experiments show that there is only a slight difference between the recognition accuracies for statistical features and dynamic features over the long term. Since the amount of calculation necessary for recognition using statistical features is only about one-tenth of that for recognition using dynamic features, it is more efficient to use statistical features than dynamic features. When training utterances are recorded over ten months for each customer and spectral equalization is applied, 99.5 percent and 96.3 percent verification accuracies can be obtained for input utterances ten months and five years later, respectively, using statistical features extracted from two words. Combination of dynamic features with statistical features can reduce the error rate to half that obtained with either one alone.

131 citations
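
The statistical features described above can be sketched as follows. The sketch assumes that each frame is already a vector of time-function values (for instance log area ratios plus fundamental frequency) and that a per-frame voiced/unvoiced decision is available. The abstract does not give the feature selection step or the distance measure. The flattening into one vector (`feature_vector`) and the Euclidean comparison (`distance`) are therefore hypothetical choices.

```python
import math


def statistical_features(frames, voiced):
    """Mean, standard deviation and correlation matrix of each time function,
    computed over voiced frames only.

    frames: list of per-frame vectors; voiced: list of booleans, one per frame.
    Assumes no time function is constant (a zero std would divide by zero)."""
    sel = [f for f, v in zip(frames, voiced) if v]
    n, dim = len(sel), len(sel[0])
    mean = [sum(f[i] for f in sel) / n for i in range(dim)]
    std = [math.sqrt(sum((f[i] - mean[i]) ** 2 for f in sel) / n) for i in range(dim)]
    corr = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in sel) / (n * std[i] * std[j])
             for j in range(dim)] for i in range(dim)]
    return mean, std, corr


def feature_vector(mean, std, corr):
    """Hypothetical flattening: means, stds, then upper-triangle correlations."""
    dim = len(mean)
    upper = [corr[i][j] for i in range(dim) for j in range(i + 1, dim)]
    return mean + std + upper


def distance(u, v):
    """Hypothetical comparison with a reference feature vector."""
    return math.dist(u, v)
```

Only one pass over the voiced frames is needed per utterance, with no time alignment against the reference. This is consistent with the abstract's point that statistical features cost far less computation than dynamic ones.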


Journal Article (DOI)
TL;DR: In this paper, the effects of age-of-acquisition on word naming speed and auditory recognition of words presented at a low volume were investigated and the results are interpreted as supporting the view that the age of acquisition variable mainly affects word production and has little effect on word recognition processes.
Abstract: This paper reports two experiments concerning the effects of word age-of-acquisition on word naming speed and auditory recognition of words presented at a low volume. The first experiment found significant facilitating effects of word age-of-acquisition in word naming even when word length, frequency and familiarity were taken into account. The second experiment found no evidence of age-of-acquisition effects in auditory word recognition. The results are interpreted as supporting the view that the age-of-acquisition variable mainly affects word production and has little effect on word recognition processes.

76 citations


01 Jan 1981
TL;DR: The results of the experiments show that there is only a slight difference between the recognition accuracies for statistical features and dynamic features over the long term, and it is more efficient to use statistical features than dynamic features.
Abstract: This paper describes results of speaker recognition experiments using statistical features and dynamic features of speech spectra extracted from fixed Japanese word utterances. The speech wave is transformed into a set of time functions of log area ratios and a fundamental frequency. In the case of statistical features, a mean value and a standard deviation for each time function and a correlation matrix between these functions are calculated in the voiced portion of each word, and after a feature selection procedure, they are compared with reference features. In the case of dynamic features, the time functions are brought into time registration with reference functions. The results of the experiments show that there is only a slight difference between the recognition accuracies for statistical features and dynamic features over the long term. Since the amount of calculation necessary for recognition using statistical features is only about one-tenth of that for recognition using dynamic features, it is more efficient to use statistical features than dynamic features. When training utterances are recorded over ten months for each customer and spectral equalization is applied, 99.5 percent and 96.3 percent verification accuracies can be obtained for input utterances ten months and five years later, respectively, using statistical features extracted from two words. Combination of dynamic features with statistical features can reduce the error rate to half that obtained with either one alone.

52 citations


Proceedings Article (DOI)
01 Apr 1981
TL;DR: Improvements in discriminability among similar words can be achieved by modifying the pattern similarity algorithm so that the recognition decision is made in two passes.
Abstract: One of the major drawbacks of the standard pattern recognition approach to isolated word recognition is that poor performance is generally achieved for word vocabularies with acoustically similar words. This poor performance is related to the pattern similarity (distance) algorithms that are generally used in which a global distance between the test pattern and each reference pattern is computed. Since acoustically similar words are, by definition, globally similar, it is difficult to reliably discriminate such words, and a high error rate is obtained. By modifying the pattern similarity algorithm so that the recognition decision is made in two passes, improvements in discriminability among similar words can be achieved. In particular, on the first pass the recognizer provides a set of global distance scores which are used to decide a class (or a set of possible classes) in which the spoken word is estimated to belong. On the second pass a locally weighted distance is used to provide optimal separation among words in the chosen class (or classes) and the recognition decision is made on the basis of these local distance scores. For a highly complex vocabulary (letters of the alphabet, digits, and 3 command words) recognition improvements of from 3 to 7 percent were obtained using the two-pass recognition strategy.

32 citations
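
The two-pass decision can be sketched as below, under simplifying assumptions:
- patterns are already time-normalized to equal length, so no time warping is needed;
- the global distance is an unweighted sum of frame differences;
- the second pass uses a per-class weight vector that emphasizes the frames separating words within that class.

All names, templates and weights are hypothetical. The abstract does not say how the local weights are derived.

```python
def global_distance(x, y):
    """Unweighted frame-by-frame distance (patterns assumed time-normalized)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def local_distance(x, y, weights):
    """Locally weighted distance emphasizing discriminative frames."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def two_pass_recognize(test, templates, word_class, class_weights, margin=0.0):
    """Pass 1 picks the class (or all classes within `margin` of the best
    global score); pass 2 decides among the words in those classes."""
    scores = {w: global_distance(test, t) for w, t in templates.items()}
    best = min(scores.values())
    classes = {word_class[w] for w, s in scores.items() if s <= best + margin}
    candidates = [w for w in templates if word_class[w] in classes]
    return min(candidates,
               key=lambda w: local_distance(test, templates[w], class_weights[word_class[w]]))
```

Take toy templates b = [5, 5, 3, 1], d = [5, 5, 0, 0] and one = [0, 0, 0, 0], where b and d share one class whose weights are [0, 0, 0, 1]. The test pattern [5, 5, 1, 0.9] is globally closest to d, but the second pass returns b because only the last frame is weighted. Comparing local distances across classes with different weights is a simplification.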


Proceedings Article (DOI)
01 Apr 1981
TL;DR: This algorithm is shown to be significantly more efficient than the one proposed by Sakoe while solving the exact same problem and the effectiveness of the proposed algorithm for connected digit recognition is experimentally verified.
Abstract: The technique of dynamic time warping has proven itself reliable and robust for a wide variety of isolated word recognition tasks. Recently extensions of the algorithm have been investigated for application to the problem of connected word recognition. In this paper a level building technique is proposed for optimally aligning a test pattern, consisting of a sequence of connected words, with a sequence of isolated word reference patterns. This algorithm is shown to be significantly more efficient than the one proposed by Sakoe while solving the exact same problem. Implementation parameters for the level building algorithm are presented and the effectiveness of the proposed algorithm for connected digit recognition is experimentally verified.

31 citations
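
The level building idea can be sketched as follows. Level l stores, for every test frame, the best distance of any l-word string ending at that frame. Each reference word is then time-warped starting from the end points of the previous level. This is a simplified illustration, not Myers and Rabiner's implementation. It assumes:
- Euclidean frame distances;
- a plain three-step DTW recursion with unnormalized accumulated distance;
- no range or slope constraints.

`level_building` and its arguments are hypothetical names.

```python
import math


def level_building(test, references, max_levels):
    """Match `test` (list of feature vectors) against the best concatenation of
    reference templates (dict word -> list of vectors), using at most
    `max_levels` words. Returns (accumulated_distance, word_sequence)."""
    n, inf = len(test), math.inf
    prev = [0.0] + [inf] * n          # prev[i]: best string covering test frames [0, i)
    backpointers = []                 # per level: end frame -> (word, start frame)
    best = (inf, [])
    for level in range(max_levels):
        cur, cur_back = [inf] * (n + 1), [None] * (n + 1)
        for word, ref in references.items():
            m = len(ref)
            D = [[inf] * (m + 1) for _ in range(n + 1)]   # accumulated distance
            S = [[None] * (m + 1) for _ in range(n + 1)]  # start frame of this word
            for i in range(n + 1):
                D[i][0], S[i][0] = prev[i], i  # word may start where the last level ended
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    steps = [(D[i - 1][j - 1], S[i - 1][j - 1]),  # diagonal
                             (D[i - 1][j], S[i - 1][j])]          # repeat reference frame
                    if j > 1:  # never re-use a frame already consumed by the previous word
                        steps.append((D[i][j - 1], S[i][j - 1]))  # repeat test frame
                    cost, start = min(steps, key=lambda s: s[0])
                    D[i][j] = cost + math.dist(test[i - 1], ref[j - 1])
                    S[i][j] = start
            for i in range(1, n + 1):  # keep the best word ending at each frame
                if D[i][m] < cur[i]:
                    cur[i], cur_back[i] = D[i][m], (word, S[i][m])
        backpointers.append(cur_back)
        if cur[n] < best[0]:  # a string with level+1 words beats shorter ones
            words, end = [], n
            for lv in range(level, -1, -1):
                word, end = backpointers[lv][end]
                words.append(word)
            best = (cur[n], words[::-1])
        prev = cur
    return best
```

With one-dimensional toy templates a = [[0], [0]] and b = [[5], [5]], the test [[0], [0], [5], [5], [5], [0]] is matched as ['a', 'b', 'a'] with zero distance. Each level reuses only the previous level's end-point scores, which is where the efficiency gain over trying every word sequence comes from.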


Journal Article (DOI)
TL;DR: Modifying the pattern-similarity algorithm so that the recognition decision is made in two passes can improve discriminability among similar words; for a highly complex vocabulary, this strategy yields recognition improvements of 3 to 7 percent.
Abstract: One of the major drawbacks of the standard pattern-recognition approach to isolated word recognition is that poor performance is generally achieved for word vocabularies with acoustically similar words. This poor performance is related to the pattern similarity (distance) algorithms that are generally used in which a global distance between the test pattern and each reference pattern is computed. Since acoustically similar words are, by definition, globally similar, it is difficult to reliably discriminate such words, and a high error rate is obtained. By modifying the pattern-similarity algorithm so that the recognition decision is made in two passes, we can achieve improvements in discriminability among similar words. In particular, on the first pass the recognizer provides a set of global distance scores which are used to decide a class (or a set of possible classes) in which the spoken word is estimated to belong. On the second pass we use a locally weighted distance to provide optimal separation among words in the chosen class (or classes), and make the recognition decision on the basis of these local distance scores. For a highly complex vocabulary (letters of the alphabet, digits, and three command words), we obtain recognition improvements of from 3 to 7 percent using the two-pass recognition strategy.

30 citations


Patent (DOI)
TL;DR: In this patent, the reference candidate series of overlap-words is transformed under dynamic time warping so as to time-match the utterance series of overlap-words, i.e., words whose first phoneme is the end phoneme of the preceding word in a string of words.
Abstract: Recognition of continuous speech by comparison with prestored isolated words may be confused by the merging together of spoken adjacent words (coarticulation). Improved recognition is attained by generating overlap-words, e.g., words whose first phoneme is the end phoneme of the preceding word in a string of words. The reference candidate series of overlap-words is transformed under dynamic time warping so as to time-match the utterance series of overlap-words.

30 citations


01 Jan 1981
TL;DR: In this paper, a pattern matching word recognition system has been modified in order to emphasize the transient parts of speech in the similarity measure; the technique is to weight the word distances with a norma...
Abstract: A pattern matching word recognition system has been modified in order to emphasize the transient parts of speech in the similarity measure. The technique is to weight the word distances with a norma ...

30 citations


Journal Article (DOI)
TL;DR: This article reported two experiments concerning the effects of word age-of-acquisition and other word attributes on visual recognition thresholds and found that word length and frequency were the major determinants of the exposure durations required for correct recognition of visually presented words.
Abstract: This paper reports two experiments concerning the effects of word age-of-acquisition and other word attributes on visual recognition thresholds. The results indicated that word length and frequency were the major determinants of the exposure durations required for correct recognition of visually presented words. Apparent age-of-acquisition effects were redundant on length and frequency.

28 citations


Patent
16 Apr 1981
TL;DR: In this article, the data transmission rate of a digital data transmission system is automatically varied to maintain the measured error rate within predetermined limits to achieve the highest transmission rate consistent with predetermined error rate limits.
Abstract: The data transmission rate of a digital data transmission system is automatically varied to maintain the measured error rate within predetermined limits to achieve the highest transmission rate consistent with predetermined error rate limits. The initial data transmission rate is determined by varying the data transmission rate until the measured error rate reaches the upper limit thereof and then transmitting at slightly reduced data transmission rate.
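
The rate-selection rule above can be sketched as follows, under several assumptions:
- the system supports a fixed ascending table of rates;
- "slightly reduced" means one step down the table;
- during operation the rate steps down when the measured error rate exceeds the upper limit, and steps up when it falls below a lower limit.

The rate table, the error-rate model and the function names are hypothetical.

```python
def initial_rate(rates, measure_error_rate, upper_limit):
    """Step up through ascending `rates` until the measured error rate reaches
    `upper_limit`, then back off one step (assumed meaning of 'slightly reduced')."""
    for i, rate in enumerate(rates):
        if measure_error_rate(rate) >= upper_limit:
            return rates[max(i - 1, 0)]
    return rates[-1]  # even the fastest rate stays under the limit


def adjust_rate(index, n_rates, error_rate, lower_limit, upper_limit):
    """One control step that keeps the error rate between the two limits
    while favoring the highest usable rate."""
    if error_rate > upper_limit and index > 0:
        return index - 1  # too many errors: slow down
    if error_rate < lower_limit and index < n_rates - 1:
        return index + 1  # comfortably clean: try a faster rate
    return index
```

With rates [1200, 2400, 4800, 9600] and measured error rates of 1e-7, 1e-6, 1e-4 and 1e-2 respectively, an upper limit of 1e-4 is first reached at 4800, so transmission starts at 2400.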

Patent
15 Apr 1981
TL;DR: In this paper, a hierarchical communication system has multiple paths for different levels of the hierarchy; each set of paths is assigned a criticalness to the successful operation of the system, and error rates for all of the paths are monitored.
Abstract: A hierarchical communication system has multipaths for different levels of the hierarchy, and each set of paths is assigned a criticalness to the successful operation of the system. Error rates for all of the paths are monitored. A threshold for defining an unusable data path is based upon the criticalness of the path to successful operation. That is, the more critical the path, the higher the error rate that will be sustained. A specific embodiment employs shift registers for indicating the error rate of the last predetermined number of usages of the given paths. A mass storage system employing the error-rate system is described.
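
The monitoring scheme above can be sketched in software as follows. A fixed-length `deque` stands in for the shift register that remembers the outcome of the last N usages of a path. Each path is declared unusable when its windowed error rate exceeds a threshold chosen by criticalness, with more critical paths tolerating higher error rates. The window length and the threshold table are hypothetical values.

```python
from collections import deque

# Hypothetical thresholds: the abstract only says that more critical paths
# tolerate a higher error rate before being declared unusable.
THRESHOLD_BY_CRITICALNESS = {"high": 0.5, "medium": 0.25, "low": 0.1}


class PathMonitor:
    def __init__(self, criticalness, window=8):
        # Software stand-in for the shift register: only the last `window`
        # usages are remembered; the oldest drops out automatically.
        self.history = deque(maxlen=window)
        self.threshold = THRESHOLD_BY_CRITICALNESS[criticalness]

    def record(self, had_error):
        self.history.append(1 if had_error else 0)

    def error_rate(self):
        return sum(self.history) / len(self.history) if self.history else 0.0

    def usable(self):
        return self.error_rate() <= self.threshold
```

After three errors in eight usages (rate 0.375), a "high" criticalness path is still usable under these thresholds, while a "low" one is not. Eight clean usages then flush the window and restore the low-criticalness path.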

Journal Article (DOI)
TL;DR: It is shown that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method.
Abstract: Several variations on algorithms for dynamic time warping for speech processing applications have been proposed. This paper compares two of these algorithms, the fixed-range method and the local minimum method. We show that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method. We offer explanations for this behavior and describe techniques for optimizing the parameters of the local minimum algorithm for both word spotting and connected word recognition.

Proceedings Article (DOI)
01 Apr 1981
TL;DR: This paper describes the results of an experiment in which the speech recognition system was applied to the problem of recognizing sentences from the restricted laser-patent corpus when the sentences are read with pauses between the words.
Abstract: This paper describes the results of an experiment in which we have applied our speech recognition system to the problem of recognizing sentences from our restricted laser-patent corpus when the sentences are read with pauses between the words. Except for changes to the phonology and the training data, nothing has been done to adapt the system to isolated word recognition. On 20 sentences a word error rate of 3.1% was obtained. This compares with 8.7% for the same sentences when spoken continuously by the same talker.
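
The abstract reports word error rates of 3.1% and 8.7% without saying how they were scored. The conventional definition aligns the recognized words with the reference by minimum edit distance. It then counts substitutions, deletions and insertions and divides by the number of reference words. A minimal sketch follows, assuming whitespace-tokenized transcripts. Pooling errors over all sentences (`corpus_wer`) rather than averaging per-sentence rates is also an assumption.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance: the minimum number of substitutions,
    deletions and insertions turning the reference into the hypothesis.
    Returns (errors, number_of_reference_words)."""
    ref, hyp = reference.split(), hypothesis.split()
    row = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cost = min(row[j] + 1,         # deletion of reference word r
                       row[j - 1] + 1,     # insertion of hypothesis word h
                       diag + (r != h))    # substitution (or match when equal)
            diag, row[j] = row[j], cost
    return row[-1], len(ref)


def corpus_wer(pairs):
    """Total errors over all sentences divided by total reference words."""
    errors = total = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    return errors / total
```

For example, "the cat sat on the mat" recognized as "the cat sat on mat" has one deletion in six reference words. "a b c" recognized as "a x c d" has one substitution and one insertion.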

Journal Article (DOI)
TL;DR: This paper examined the effect of context on the size of the perceptual unit used in word recognition and found that this unit is sensitive to reader skill and context condition: poor second grade readers used holistic processing under all conditions, while poor fourth grade readers used holistic processing with context but component-letter processing in the no-context and miscue conditions.
Abstract: Good and poor readers from the second and fourth grades read words which varied in length from 3 to 6 letters under three exposure conditions: context, miscue and no-context. Word recognition latency for the nouns in each word length category was recorded. An increase in latency relative to word length would suggest component-letter processing, while no increase would suggest holistic processing. Results indicated that under all conditions poor second grade readers used holistic processing. Poor fourth grade readers used holistic processing with context but component-letter processing in no-context and miscue conditions. These findings suggest that the size of the word recognition unit is sensitive to reader skill and context condition. The effect of context on speed and accuracy of word recognition has been well documented. It is known, for example, that for subjects with some degree of reading skill, accuracy of recognition is increased for words presented in context as contrasted with words presented in isolation. It is also known that, depending upon the kind of context which is used, context can either facilitate or retard recognition speed (Tulving & Gold, 1963; Samuels, Begy, & Chen, 1975). What has not been well documented is the effect of context on the size of the perceptual unit used in word recognition. Previous studies have examined the size of the perceptual unit for words in isolation. In the present study, while the design also allows us to examine the effects of context on speed of word recognition, the major focus and interest are on the perceptual unit of recognition. There are contrasting models of word recognition which suggest that different size units may be used in the recognition process. 
Hierarchical models (Estes, 1974; LaBerge & Samuels, 1974) assume that a word code may be activated through outputs ...


Book Chapter (DOI)
24 Aug 1981

Journal Article (DOI)
TL;DR: The error performance of differentially coherent detection of a binary differential phase-shift keying (DPSK) system operating over a hard-limiting satellite channel is derived and shows that as long as the symbols are equiprobable, the error probability is not dependent upon the downlink noise correlation.
Abstract: The error performance of differentially coherent detection of a binary differential phase-shift keying (DPSK) system operating over a hard-limiting satellite channel is derived. The main objective is to show the extent of error rate degradation of a DPSK system when a power imbalance exists between the two symbol pulses that are used in a bit decision interval. Consideration is also given to the DPSK error rate performance for the special case of uncorrelated uplink and correlated downlink noises at the sampling instants in adjacent time slots. Error probabilities are given as functions of uplink signal-to-noise ratio (SNR) and downlink SNR with different levels of SNR imbalance and different downlink SNR and uplink SNR as parameters, respectively. Our numerical results show that 1) as long as the symbols are equiprobable, the error probability is not dependent upon the downlink noise correlation, regardless of whether there is a power imbalance; 2) error performance is definitely affected by the power imbalance for all cases of symbol distributions; and 3) the error probability does depend upon downlink noise correlation for all levels of power imbalance if the symbol probabilities are not equal.
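
For orientation only, here is a much simpler setting than the paper's. In plain additive white Gaussian noise, binary DPSK with equal-energy pulses has the textbook bit error probability 0.5 * exp(-Eb/N0). The Monte Carlo sketch below checks that baseline and lets the two pulses in a decision interval carry unequal energy. It assumes:
- Eb = 1;
- the reference pulse has energy Eb(1 + imbalance) and the data pulse Eb(1 - imbalance);
- the noise on the two pulses is independent;
- there is no hard limiter and no uplink/downlink split.

It does not reproduce the paper's satellite-channel analysis, and the function names are hypothetical.

```python
import cmath
import math
import random


def dpsk_ber_awgn(ebn0):
    """Textbook binary DPSK bit error probability in AWGN: 0.5 * exp(-Eb/N0)."""
    return 0.5 * math.exp(-ebn0)


def simulate_dpsk(ebn0, n_bits, imbalance=0.0, seed=1):
    """Monte Carlo estimate of differentially coherent detection (simplified
    AWGN model; Eb = 1 and complex noise with E|n|^2 = N0 = 1/ebn0)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))  # noise std per real dimension
    a_ref = math.sqrt(1.0 + imbalance)     # reference (previous) pulse amplitude
    a_dat = math.sqrt(1.0 - imbalance)     # data (current) pulse amplitude
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        phase = rng.uniform(0.0, 2.0 * math.pi)  # unknown but common carrier phase
        r_ref = a_ref * cmath.exp(1j * phase) + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        r_dat = (a_dat * cmath.exp(1j * (phase + math.pi * bit))
                 + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)))
        # Differential decision: sign of the real part of r_dat * conj(r_ref)
        decided = 0 if (r_dat * r_ref.conjugate()).real > 0 else 1
        errors += decided != bit
    return errors / n_bits
```

In this simplified model the balanced case agrees with 0.5 * exp(-Eb/N0). Shifting energy away from one pulse raises the error rate, which is the kind of degradation the paper quantifies for its more realistic channel.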

Proceedings Article (DOI)
01 Apr 1981
TL;DR: The paper describes isolated-word recognition experiments on a multi-speaker speech recognition system that uses Redundant Hash Addressing for fast comparison of the phonemic transcriptions with referent strings stored in a dictionary.
Abstract: The paper describes isolated-word recognition experiments on a multi-speaker speech recognition system. The system is organized in two main stages. At the phonemic recognition stage the phonemic transcription of the speech waveform is produced by simultaneous segmentation and labeling accomplished by the Learning Subspace Method. It directly produces an approximately correct number of phonemes. At the word recognition stage Redundant Hash Addressing is used for fast comparison of the phonemic transcriptions with referent strings stored in a dictionary. The average word recognition accuracy in a 200-word experiment with five speakers was about 95 per cent.

Journal Article (DOI)
TL;DR: The results showed that system performance was best with an analysis parameter set equivalent to what is currently being used in the computer simulations, and that variations in parameter values that reduced computation also degraded performance, whereas variations in parameters that increased computation did not lead to improved performance.
Abstract: For practical hardware implementations of isolated-word recognition systems, it is important to understand how the feature set chosen for recognition affects the overall performance of the recognizer. In particular, we would like to determine whether hardware implementations could be simplified by reducing computation and memory requirements without significantly degrading overall system performance. The effects of system bandwidth (both in training and testing the recognizer) on the performance must also be considered since the conditions under which the system is used may be different than those under which it was trained. Finally, we must take account of the effects of finite word-length implementations, on both the computation of features and of distances, for the system to properly operate. In this paper we present the results of a study to determine the effects on recognition error rate of varying the basic analysis parameters of a linear predictive coding (LPC) model of speech. The results showed that system performance was best with an analysis parameter set equivalent to what is currently being used in the computer simulations, and that variations in parameter values that reduced computation also degraded performance, whereas variations in parameter values that increased computation did not lead to improved performance.
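
To make "basic analysis parameters of an LPC model" concrete, here is a minimal sketch of autocorrelation-method LPC. It uses a Hamming window and the Levinson-Durbin recursion. The frame length, frame shift and predictor order are the kinds of parameters such a study would vary. The specific values in the usage example, and the choice of window, are assumptions rather than the paper's settings.

```python
import math


def autocorrelation(frame, max_lag):
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p with s[n] ~ sum_k a_k s[n-k].
    Returns (coefficients, final prediction error)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for order i + 1
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_analysis(signal, frame_len, shift, order):
    """Frame-by-frame LPC with a Hamming window (silent frames are skipped)."""
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = [signal[start + n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)))
                 for n in range(frame_len)]
        r = autocorrelation(frame, order)
        if r[0] > 0:
            coeffs.append(levinson_durbin(r, order)[0])
    return coeffs
```

Computation grows with the frame length times the order for the autocorrelation, plus the square of the order for the recursion. This is why reducing these parameters is attractive for hardware, and, per the abstract, why reducing them cost accuracy.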

Journal Article (DOI)
TL;DR: Time-altered versions of the Auditec recordings of CID W-22 and Northwestern Auditory Test No. 6 (NU-6) were compared at five time-compressed ratios; NU-6 scores were consistently poorer than W-22 scores, with significant differences observed at the 30% and 60% time-compressed conditions.
Abstract: Time-altered versions of the Auditec recordings of CID W-22 and Northwestern Auditory Test No. 6 (NU-6) were compared at five time-compressed ratios (0%, 30%, 40%, 50% and 60%) on twenty-eight norm...

Proceedings Article (DOI)
01 Jan 1981
TL;DR: In this paper, the authors describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least.
Abstract: In this paper we describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least. This is accomplished by means of a single monolithic algorithm in which temporal registration, segmentation and grammatical analysis are performed simultaneously. The algorithm is a syntax-directed version of the level building dynamic time warping algorithm of Myers and Rabiner. A test was conducted on a total of 208 sentences comprising 1781 words and spoken by two male and two female speakers. The sentences were composed from a 127 word vocabulary according to a moderately complex grammar and semantic structure appropriate to an airline information and reservation task. Test results revealed a 13% sentence error rate and a 6% word error rate.

Proceedings Article (DOI)
01 Apr 1981
TL;DR: A segmentation procedure which is based completely on statistical principles is proposed and investigated, which shows that an estimation algorithm based on quadratic polynomials yields sufficiently accurate segmentation.
Abstract: Recognition of connected word strings can be performed by segmenting the word string automatically into single-word components which are then classified by a single-word recognition system. We propose and investigate a segmentation procedure which is based completely on statistical principles. An estimation algorithm, adapted to the statistical data of the signal parameters, determines the word boundaries. This procedure, which offers several advantages over other methods, has been tested with connected digits. The results show that an estimation algorithm based on quadratic polynomials yields sufficiently accurate segmentation. Recognition results for 2- to 4-digit strings are presented in this paper.

01 Jan 1981
TL;DR: A system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least by means of a single monolithic algorithm.
Abstract: In this paper we describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least. This is accomplished by means of a single monolithic algorithm in which temporal registration, segmentation and grammatical analysis are performed simultaneously. The algorithm is a syntax-directed version of the level building dynamic time warping algorithm of Myers and Rabiner. A test was conducted on a total of 208 sentences comprising 1781 words and spoken by two male and two female speakers. The sentences were composed from a 127 word vocabulary according to a moderately complex grammar and semantic structure appropriate to an airline information and reservation task. Test results revealed a 13% sentence error rate and a 6% word error rate.

Journal Article (DOI)
TL;DR: A 33 percent increase in linear density at an error rate of 10^-10 can be obtained by the proper choice of code relative to MFM without changing the head/media/system interface.
Abstract: The limitations of selected run length limited codes relative to their practical recording density at given error rates are examined with a known head/disc system interface. A method is presented for the evaluation of run length limited codes on the basis of error rate as a function of linear density. Experimental measurements of the intrinsic error rate as a function of the linear density were utilized with theoretical characteristics of the code to determine the practical data density limitations. Depending upon the system configuration, the effectiveness of the code can be limited by the noise characteristics, by pattern-induced peak shift, or by both. A 33 percent increase in linear density at an error rate of 10^-10 can be obtained by the proper choice of code relative to MFM without changing the head/media/system interface.
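
As background for comparing run length limited codes, a (d, k) constraint requires at least d and at most k zeros between consecutive ones in the NRZI channel sequence, where a 1 marks a transition. A common idealized figure of merit is the density ratio (d + 1) * R: the data bits carried per minimum transition spacing for a code of rate R. The sketch below checks the constraint and computes that ratio. MFM is a (1, 3) code of rate 1/2. The abstract does not name the code it compares with MFM, and this ratio is not the paper's error-rate-based evaluation. The (1, 7) rate-2/3 example is purely illustrative, and the function names are hypothetical.

```python
from fractions import Fraction


def satisfies_dk(bits, d, k):
    """True if every run of 0s between consecutive 1s has length in [d, k]
    and no leading or trailing run of 0s exceeds k (NRZI: 1 = transition)."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    for prev, nxt in zip(ones, ones[1:]):
        if not d <= nxt - prev - 1 <= k:
            return False
    runs = [ones[0], len(bits) - ones[-1] - 1] if ones else [len(bits)]
    return all(r <= k for r in runs)


def density_ratio(d, rate):
    """Idealized data bits per minimum transition spacing: (d + 1) * R."""
    return (d + 1) * Fraction(rate)
```

Under this idealized measure, MFM scores 1 and a (1, 7) rate-2/3 code scores 4/3. Real gains depend on noise and peak shift, which is exactly what the paper's measurements address.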

Journal Article (DOI)
TL;DR: In this paper, the authors discuss the complex stylistic problems involved in translating the short story "The Lady and the Pedlar" by S. Y. Agnon, whose Hebrew original contains a word system composed of three sub-systems.
Abstract: In this paper we discuss the complex stylistic problems involved in translating the short story "The Lady and the Pedlar" by S. Y. Agnon. Most of these translation difficulties stem from the existence of a "word System" (Aphek 1979) in the original Hebrew text. A word System is a matrix of words with a common denominator which may be semantic, phonological, etymological, folk-etymological, or associative. In the original Hebrew version of this short story there is a word System related to the semantic concept of "seeing" or "looking" and its variations and nuances. This word System is composed of three sub-systems:

Journal Article (DOI)
TL;DR: The authors presented an algorithm which chooses a reference template for each word in the vocabulary from a set of N exemplars, which minimizes the worst matching behavior and total error over the N sets of exemplars.
Abstract: Presented here, for a speaker-dependent system, is an algorithm which chooses a reference template for each word in the vocabulary from a set of N exemplars. The goal of the algorithm is to produce a reference set that minimizes the worst matching behavior and total error over the N sets of exemplars. The results of the experiments presented here show a reduction in the average error rate from 16.4% to 10.2% over a set of 4 male speakers and 4 female speakers.
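
The abstract names two goals, minimizing the worst-case match and the total error. One way to read them is a minimax choice with a tie-break on total distance, sketched below for a single word. The sketch assumes a generic distance function between exemplars (in practice a time-warped distance) and at least two exemplars per word. It is an interpretation, not necessarily the paper's algorithm, and `select_template` is a hypothetical name.

```python
def select_template(exemplars, dist):
    """Pick the exemplar whose worst-case distance to the other exemplars is
    smallest, breaking ties by total distance (assumed reading of the goal)."""
    def score(i):
        ds = [dist(exemplars[i], exemplars[j]) for j in range(len(exemplars)) if j != i]
        return (max(ds), sum(ds))
    return exemplars[min(range(len(exemplars)), key=score)]
```

With scalar exemplars [0.0, 1.0, 2.0, 10.0] and absolute difference as the distance, the outlier 10.0 pulls the choice toward 2.0, whose worst-case distance (8.0) is the smallest.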

Patent
11 Mar 1981
TL;DR: In this article, a pseudo television signal from a VTR is amplified at 1, is DC-reproduced by a clamping circuit 2, is led to plural comparators 3a-3c, and is discriminated by plural discrimination levels.
Abstract: PURPOSE: To achieve stable data separation with a low data error rate by discriminating digital data that carry an error detecting code at plural discrimination levels, detecting errors in accordance with the error detecting code, and comparing the resulting error rates. CONSTITUTION: A pseudo television signal from a VTR is amplified at 1, DC-reproduced by a clamping circuit 2, led to plural comparators 3a-3c, and discriminated at plural discrimination levels. Errors in the discriminated data are detected by error detecting circuits 11a-11c, and whenever an error is detected, a detection pulse is supplied to ternary counters 13a-13c. When the counters 13a-13c have counted three data errors, they supply outputs to monostable multivibrators 14a-14c and an AND circuit 16, yielding information corresponding to the error rate. Output pulses from the monostable multivibrators are input to FFs 15a-15c, respectively; when circuit 16 has produced an output, switches 6a-6c are driven selectively by the outputs of the FFs, and the separated data from the comparators 3a-3c are sent to a PCM demodulator 7.
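
The selection principle above can be sketched in software as follows:
- slice the same analogue samples at several discrimination levels;
- count the errors the error detecting code reports for each sliced bit stream;
- keep the level with the fewest errors.

The patent does not say which error detecting code it uses. The odd parity per fixed-length block here is a hypothetical stand-in, as are the sample values and levels.

```python
def slice_bits(samples, level):
    """Discriminate analogue samples into bits at one threshold level."""
    return [1 if s > level else 0 for s in samples]


def parity_errors(bits, block_len):
    """Hypothetical error detecting code: each block must contain an odd
    number of ones; count the blocks that violate it."""
    errors = 0
    for start in range(0, len(bits) - block_len + 1, block_len):
        if sum(bits[start:start + block_len]) % 2 == 0:
            errors += 1
    return errors


def best_level(samples, levels, block_len):
    """Choose the discrimination level whose output shows the fewest detected errors."""
    return min(levels, key=lambda lv: parity_errors(slice_bits(samples, lv), block_len))
```

Suppose "1" arrives as 0.6 and "0" as 0.1. A level of 0.05 or 0.65 turns every block into all ones or all zeros, and both fail odd parity in 4-bit blocks. The middle level of 0.35 recovers the data with no detected errors.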

Patent
09 Jul 1981
TL;DR: In this patent, repeated switching between transmission lines is prevented by monitoring the error rate of the in-use transmission line, providing two error-rate thresholds, and changing the threshold once the system has been switched over to the stand-by line.
Abstract: PURPOSE: To prevent switching operations from being repeated when transmission lines are switched, by monitoring the error rate of the in-use transmission line, providing two error-rate thresholds, and changing the threshold when the system is switched over to the stand-by transmission line. CONSTITUTION: An in-use transmission line is provided between transmitting terminal station 2a and receiving terminal station 4a, a stand-by transmission line is provided between transmitting terminal station 2b and receiving terminal station 4b, and receiving terminal station 4a is provided with a means of measuring the error rates of the transmission lines. First, the error-rate threshold is set to 10 and, if the error rate of the stand-by system reaches 10 or more, a switching indication signal is sent to switch control part 6, which applies a switching control signal to switch parts 1 and 5 to change the in-use system over to the stand-by system. This switching control signal is applied to receiving terminal station 4a, and the error-rate threshold is changed over to 10 to measure the error rate of the monitoring pattern of the in-use system; even if the error rate changes to about 10, the switching operation is not performed, and only when it decreases to 10 or less is the switching indication signal sent to control part 6 to change the stand-by system back over to the in-use system.
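
The two-threshold rule above is a form of hysteresis, sketched minimally below. The system moves to the stand-by line when the monitored error rate reaches a switch threshold. It returns only when the rate falls to a lower return threshold, so a rate hovering near the first threshold cannot cause repeated switching. The threshold values in the usage example are hypothetical, since the patent's actual values cannot be recovered from the abstract. `LineSwitch` is an illustrative name.

```python
class LineSwitch:
    """Two-threshold (hysteresis) switching between in-use and stand-by lines."""

    def __init__(self, switch_threshold, return_threshold):
        assert return_threshold < switch_threshold
        self.switch_threshold = switch_threshold
        self.return_threshold = return_threshold
        self.on_standby = False

    def update(self, error_rate):
        """Feed one error-rate measurement; returns True while on stand-by."""
        if not self.on_standby and error_rate >= self.switch_threshold:
            self.on_standby = True   # switch over to the stand-by line
        elif self.on_standby and error_rate <= self.return_threshold:
            self.on_standby = False  # switch back only after a clear recovery
        return self.on_standby
```

With thresholds 1e-4 and 1e-6, the measurements 1e-7, 1e-3, 1e-5, 1e-4 and 1e-7 give False, True, True, True and False. The intermediate readings do not trigger a switch back.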

Patent
23 Apr 1981
TL;DR: In this patent, the error rate of received data is detected from the received data alone by counting, during a fixed time, false error pulses whose number has a fixed relation to the number of errors introduced in transmission, and averaging the count after D/A conversion.
Abstract: PURPOSE: To make it possible to detect the error rate of received data from the received data alone, by counting, during a fixed time, the false error pulses whose number has a fixed relation to the number of errored data, and by averaging the count after D/A conversion. CONSTITUTION: False error pulse generator 1 generates false error pulses whose number has a fixed relation to the number of errored data included in the received data and generated in transmission. Counter 4 counts false error pulses during the N-bit-long gate pulse generated at a suitable position in the transmission data part of the data burst to be measured, and the contents of counter 4 are set to register 5. D/A converter 6 generates an analogue signal corresponding to the digital output held in register 5, low-pass filter 7 integrates and averages this analogue signal, and the result is output through amplifier 8 as a DC output voltage indicating the error rate.
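
A digital stand-in for the counter, D/A converter and low-pass filter chain can be sketched as follows. Each burst's pulse count over the N-bit gate gives an instantaneous error-rate sample. A first-order recursive average then plays the role of the analogue integrator. The smoothing constant `alpha` and the scale factor `pulses_per_error` (the "fixed relation" between false error pulses and real errors) are hypothetical parameters.

```python
def error_rate_monitor(pulse_counts, gate_bits, pulses_per_error=1.0, alpha=0.1):
    """pulse_counts: false-error pulses counted in each N-bit gate (one per burst).
    Returns the smoothed error-rate estimate after each burst."""
    estimate = 0.0
    history = []
    for count in pulse_counts:
        instantaneous = count / (pulses_per_error * gate_bits)
        # First-order low-pass: move a fraction alpha toward the new sample
        estimate += alpha * (instantaneous - estimate)
        history.append(estimate)
    return history
```

With a steady 10 pulses per 1000-bit gate, the estimate starts at 0.001 after the first burst and settles at 0.01. A smaller `alpha` gives a steadier but slower reading, the same trade-off as choosing the low-pass filter's time constant.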