
Showing papers on "Word error rate published in 1987"


Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions.
Abstract: A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced talking styles. This technique was tested using a speech data base that included stress speech produced during a workload task and when intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (5 normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). The average error rate under stress and normal conditions fell by more than a factor of two with multi-style training, and the average error rate under conditions sampled during training fell by a factor of four.

344 citations


Journal ArticleDOI
TL;DR: This paper has found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer.
Abstract: In a template-based speech recognition system, distortion measures that compute the distance or dissimilarity between two spectral representations have a strong influence on the performance of the recognizer. Accordingly, extensive comparative studies have been conducted to determine good distortion measures for improved recognition accuracy. Previous studies have shown that the log likelihood ratio measure, the likelihood ratio measure, and the truncated cepstral measures all gave good recognition performance (comparable accuracy) for isolated word recognition tasks. In this paper we extend the interpretation of distortion measures, based upon the observation that measurements of speech spectral envelopes (as normally obtained from standard analysis procedures such as LPC or filter banks) are prone to statistical variations due to window position fluctuations, excitation interference, measurement noise, etc., and may not accurately characterize the true speech spectrum because of analysis model constraints. We have found that these undesirable spectral measurement variations can be partially controlled (i.e., reduced in the level of variation) by appropriate signal processing techniques. In particular, we have found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer. We have applied this liftering process to several speech recognition tasks: in particular, single frame vowel recognition and isolated word recognition. Using the liftering process, we have been able to achieve an average digit error rate of 1 percent in a speaker-independent isolated digit test. This error rate is about one-half that obtained without the liftering process.
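As a rough illustration of the liftering idea (not the exact configuration reported above), the sketch below applies a raised-sine bandpass lifter, a common choice for de-emphasizing the low- and high-order cepstral coefficients, before a Euclidean cepstral distance; the lifter length Q and the raised-sine form are assumptions made for illustration.

```python
import numpy as np

def bandpass_lifter(Q=12):
    """Raised-sine lifter weights w[n] = 1 + (Q/2) * sin(pi * n / Q), n = 1..Q."""
    n = np.arange(1, Q + 1)
    return 1.0 + (Q / 2.0) * np.sin(np.pi * n / Q)

def liftered_cepstral_distance(c_test, c_ref, Q=12):
    """Euclidean distance between liftered cepstral vectors.

    c_test, c_ref: arrays holding cepstral coefficients c[1..Q] (c0 excluded),
    e.g. derived from an LPC analysis of one speech frame.
    """
    w = bandpass_lifter(Q)
    diff = w * (np.asarray(c_test)[:Q] - np.asarray(c_ref)[:Q])
    return float(np.sqrt(np.sum(diff ** 2)))
```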

291 citations


Journal ArticleDOI
TL;DR: It is argued that the written language notion of the word has had too much impact on models of spoken word recognition and a view of continuous word recognition is presented which takes into account the alternating pattern of weak and strong syllables in the speech stream.

221 citations


Journal ArticleDOI
TL;DR: Results on the application of several bootstrap techniques in estimating the error rate of 1-NN and quadratic classifiers show that, in most cases, the confidence interval of a bootstrap estimator of classification error is smaller than that of the leave-one-out estimator.
Abstract: The design of a pattern recognition system requires careful attention to error estimation. The error rate is the most important descriptor of a classifier's performance. The commonly used estimates of error rate are based on the holdout method, the resubstitution method, and the leave-one-out method. All suffer either from large bias or large variance, and their sample distributions are not known. Bootstrapping refers to a class of procedures that resample given data by computer. It permits determining the statistical properties of an estimator when very little is known about the underlying distribution and no additional samples are available. Since its publication in the last decade, the bootstrap technique has been successfully applied to many statistical estimation and inference problems. However, it has not been exploited in the design of pattern recognition systems. We report results on the application of several bootstrap techniques in estimating the error rate of 1-NN and quadratic classifiers. Our experiments show that, in most cases, the confidence interval of a bootstrap estimator of classification error is smaller than that of the leave-one-out estimator. The error rates of 1-NN, quadratic, and Fisher classifiers are estimated for several real data sets.
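A minimal sketch of one bootstrap error estimator of the kind discussed above: train on a bootstrap resample, test on the points left out of that resample, and average over replicates. The 1-NN classifier, the number of replicates B, and the use of scikit-learn are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bootstrap_error_rate(X, y, B=100, seed=0):
    """Average error of a 1-NN classifier trained on bootstrap resamples
    and tested on the samples left out of each resample."""
    rng = np.random.default_rng(seed)
    n, errors = len(y), []
    for _ in range(B):
        idx = rng.integers(0, n, n)                 # sample with replacement
        held_out = np.setdiff1d(np.arange(n), idx)  # points not drawn this round
        if held_out.size == 0:
            continue
        clf = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
        errors.append(np.mean(clf.predict(X[held_out]) != y[held_out]))
    return float(np.mean(errors))
```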

201 citations


Journal ArticleDOI
TL;DR: The experimental results show that the weighted cepstral distance measure works substantially better than both the Euclidean cepStral distance and the log likelihood ratio distance measures across two different databases.
Abstract: A weighted cepstral distance measure is proposed and is tested in a speaker-independent isolated word recognition system using standard DTW (dynamic time warping) techniques. The measure is a statistically weighted distance measure with weights equal to the inverse variance of the cepstral coefficients. The experimental results show that the weighted cepstral distance measure works substantially better than both the Euclidean cepstral distance and the log likelihood ratio distance measures across two different databases. The recognition error rate obtained using the weighted cepstral distance measure was about 1 percent for digit recognition. This result was less than one-fourth of that obtained using the simple Euclidean cepstral distance measure and about one-third of the results using the log likelihood ratio distance measure. The most significant performance characteristic of the weighted cepstral distance was that it tended to equalize the performance of the recognizer across different talkers.
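A small sketch of the inverse-variance weighting described above; estimating the per-coefficient variances from a pooled set of training frames is an illustrative simplification.

```python
import numpy as np

def inverse_variance_weights(training_frames):
    """training_frames: (N, Q) array of cepstral vectors pooled from training data.
    Returns per-coefficient weights equal to the inverse variance."""
    return 1.0 / np.var(training_frames, axis=0)

def weighted_cepstral_distance(c_a, c_b, weights):
    """Statistically weighted cepstral distance, usable as the local
    distance inside a DTW alignment."""
    d = np.asarray(c_a) - np.asarray(c_b)
    return float(np.sum(weights * d * d))
```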

181 citations


Journal ArticleDOI
01 Jul 1987
TL;DR: This paper confirms that the asymptotic classification error rate of the (unweighted) k-nearest neighbor (k-NN) rule is lower than that of any weighted k-NN rule, and presents experimental results that were obtained using a generalized form of a weighting function proposed by Dudani.
Abstract: It was previously proved by Bailey and Jain that the asymptotic classification error rate of the (unweighted) k-nearest neighbor (k-NN) rule is lower than that of any weighted k-NN rule. Equations are developed for the classification error rate of a test sample when the number of training samples is finite, and it is argued intuitively that a weighted rule may then in some cases achieve a lower error rate than the unweighted rule. This conclusion is confirmed by analytically solving a particular simple problem, and as an illustration, experimental results are presented that were obtained using a generalized form of a weighting function proposed by Dudani.
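For reference, a sketch of the classic Dudani distance-weighted k-NN rule, w_i = (d_k - d_i) / (d_k - d_1); the generalized weighting function studied in the paper is not reproduced here.

```python
import numpy as np

def dudani_weighted_knn(X_train, y_train, x, k=5):
    """Classify x by k-NN with Dudani's distance weights
    w_i = (d_k - d_i) / (d_k - d_1); equal weights are used when d_k == d_1."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    d1, dk = d[nearest[0]], d[nearest[-1]]
    w = np.ones(k) if dk == d1 else (dk - d[nearest]) / (dk - d1)
    votes = {}
    for wi, label in zip(w, y_train[nearest]):
        votes[label] = votes.get(label, 0.0) + wi
    return max(votes, key=votes.get)
```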

94 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: An effort to make a Hidden Markov Model Isolated Word Recognizer (IWR) tolerant to such speech changes caused by speaker stress was made.
Abstract: Most current speech recognition systems are sensitive to variations in speaker style. The following is the result of an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant to such speech changes caused by speaker stress. More than an order-of-magnitude reduction of the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated word database.

55 citations


Book ChapterDOI
01 Jan 1987
TL;DR: NEXUS, a computer speech recognition system, incorporates learning heuristics based on this method that permit the system to identify a set of primitive acoustic concepts from experience with words, where the recognition error rate is only one-seventh that of traditional architectures.
Abstract: Pattern recognition systems of necessity incorporate an approximate-matching process to determine the degree of similarity between an unknown input and all stored references. The matching process serves as an automatic generalization mechanism that permits each reference pattern to act as a set of specific instances. Learning mechanisms do not need to operate by manipulating hypotheses in an abstraction hierarchy, but instead can “seed” instances into the concept space, leaving generalization to the matching algorithm. This strategy represents an attractive alternative to data-driven generalization and discrimination techniques when the abstraction space is very large. NEXUS, a computer speech recognition system, incorporates learning heuristics based on this method that permit the system to identify a set of primitive acoustic concepts from experience with words. The efficacy of the resulting concepts is demonstrated by comparative recognition tests where the recognition error rate is only one-seventh that of traditional architectures.

38 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A Gaussian probabilistic model was developed to screen and select from the large set of features and the significant harmonics of the signature were sorted according to the chi-square value, which is equivalent to the signal-to-noise ratio.
Abstract: Features such as shape, motion and pressure, minutiae details and timing, and transformation methods such as Hadamard and Walsh have been used in signature recognition with various degrees of success. One of the better studies was done by Sato and Kogure using a nonlinear warping function. However, it is time-consuming in terms of computer time and programming time. In this research, the signatures were normalized for size, orientation, etc. After normalization, the X and Y coordinates of each sampled point of a signature over time (to capture the dynamics of signature writing) were represented as a complex number and the set of complex numbers transformed into the frequency domain via the fast Fourier transform. A Gaussian probabilistic model was developed to screen and select from the large set of features (e.g., the amplitude of each harmonic). The significant harmonics of the signature were sorted according to the chi-square value, which is equivalent to the signal-to-noise ratio. Fifteen harmonics with the largest signal-to-noise ratios from the true signatures were used in a discriminant analysis. A total of eight true signatures from a single person and eight each from nineteen forgers were used. This resulted in an error rate of 2.5%, with the normally more conservative jackknife procedure yielding the same small error rate.
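A sketch of the front end described above: resample the pen trajectory, form x + jy, take the FFT, and rank harmonic amplitudes by a signal-to-noise-style score over the genuine signatures. The resampling length, the squared-mean-over-variance SNR proxy, and the omission of the Gaussian screening model are simplifying assumptions.

```python
import numpy as np

def signature_harmonics(xy, n_points=256):
    """Resample an (L, 2) array of pen coordinates to n_points, form the
    complex signal x + jy, and return the FFT amplitude of each harmonic."""
    t = np.linspace(0.0, 1.0, len(xy))
    ti = np.linspace(0.0, 1.0, n_points)
    x = np.interp(ti, t, xy[:, 0])
    y = np.interp(ti, t, xy[:, 1])
    return np.abs(np.fft.fft(x + 1j * y))

def select_harmonics(genuine_signatures, n_keep=15):
    """Rank harmonics by an SNR-like score (squared mean over variance across
    the genuine signatures) and keep the indices of the n_keep best."""
    A = np.array([signature_harmonics(s) for s in genuine_signatures])
    snr = A.mean(axis=0) ** 2 / (A.var(axis=0) + 1e-12)
    return np.argsort(snr)[::-1][:n_keep]
```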

37 citations


Patent
09 Jun 1987
TL;DR: A mobile that detects a threshold exit bit or symbol error rate in a message from the base station with which it is communicating samples sequentially transmitted messages from adjacent base stations (on the same or different frequencies) and transfers to a base station whose error rate is below a stricter entry threshold.
Abstract: An automatic cell transfer is made when a mobile detects a certain threshold exit bit or symbol error rate in a message transmitted from the base station with which it is communicating. The mobile then samples sequentially transmitted messages from adjacent base stations (on the same or different frequencies) and monitors their bit or symbol error rates until a base station is found having a bit or symbol error rate less than a preselected minimum entry threshold, which in turn is less than the exit bit or symbol error rate; communication is then transferred from the mobile to the base station so selected. In one arrangement, each base station repeatedly and intermittently transmits a known quality assessment message, and the mobile continually reviews the received message for its error rate.
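A toy sketch of the exit/entry hysteresis in the claim: hand off only when the serving cell's error rate exceeds the exit threshold, and only to a neighbor below the stricter entry threshold. The threshold values and the per-cell error-rate inputs are hypothetical.

```python
def select_handoff_target(serving_error_rate, neighbor_error_rates,
                          exit_threshold=1e-2, entry_threshold=1e-3):
    """Return the index of a neighboring base station to transfer to, or None.

    A transfer is considered only when the serving cell's bit/symbol error
    rate exceeds the exit threshold; the target must have an error rate below
    the (stricter) entry threshold, which is itself below the exit threshold.
    """
    if serving_error_rate < exit_threshold:
        return None
    candidates = [(rate, i) for i, rate in enumerate(neighbor_error_rates)
                  if rate < entry_threshold]
    return min(candidates)[1] if candidates else None
```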

35 citations


Proceedings ArticleDOI
A.-M. Derouault
06 Apr 1987
TL;DR: This paper shows that both the analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models that limits the growth in the number of phonemes while still accounting for the most important coarticulation effects.
Abstract: One approach to large vocabulary speech recognition is to build phonetic Markov models and to concatenate them to obtain word models. In previous work, we designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000-word vocabulary ([3]) and, more recently, a 200,000-word vocabulary ([5]). Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both the analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models. This method makes it possible to limit the growth in the number of phonemes while still accounting for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system, including context-dependent machines, the phoneme recognition rate rises from 82.2% to 85.3%, and the word error rate with a 10,000-word dictionary decreases from 11.2% to 9.8%.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage, reducing the overall error rate by more than a factor of two.
Abstract: This paper describes a two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage. During recognition, when the first-stage recognizer is unable to clearly differentiate between acoustically similar words such as "go" and "no" the second-stage discriminator is used. The second-stage system focuses on those parts of the unknown token which are most effective at discriminating the confused words. The system was tested on a 35 word, 10,710 token stress speech isolated word data base created at Lincoln Laboratory. Adding the second-stage discriminating system produced the best results to date on this data base, reducing the overall error rate by more than a factor of two.

Journal ArticleDOI
TL;DR: In this article, the authors compared the frequency and type of articulation errors, including error migration, between single-word productions and connected speech samples when vocabulary was held constant, and found significantly more errors in connected speech samples than in single-word utterances.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper presents a study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed, based on Hidden Markov Models trained by speech tokens in various talking styles.
Abstract: Automatic speech recognition algorithms generally rely on the assumption that, for the distance measure used, intraword variabilities are smaller than interword variabilities, so that appropriate separation in the measurement space is possible. As evidenced by degradation of recognition performance, the validity of such an assumption decreases from simple tasks to complex tasks, from cooperative talkers to casual talkers, and from laboratory talking environments to practical talking environments. This paper presents a study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed. The study is based on Hidden Markov Models trained by speech tokens in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and talking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the Hidden Markov Models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Preliminary experiments indicate that a substantial reduction in recognition error rate can be achieved with relatively little increase in computation and storage requirements.
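The abstract notes that the compensation amounts to equalizing spectral tilts. Since spectral tilt is carried largely by the low-order cepstral coefficients, a very rough stand-in (an assumption, not the paper's hypothesis-driven algorithm) is to shift an utterance's first cepstral coefficient toward a reference mean:

```python
import numpy as np

def equalize_spectral_tilt(cepstra, reference_mean_c1):
    """cepstra: (T, D) array of per-frame cepstral vectors with c0 in column 0.
    Shift c1 (which largely encodes spectral tilt) so that the utterance mean
    matches a reference mean; a crude stand-in for stress compensation."""
    compensated = np.array(cepstra, dtype=float, copy=True)
    compensated[:, 1] += reference_mean_c1 - compensated[:, 1].mean()
    return compensated
```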

Proceedings ArticleDOI
06 Apr 1987
TL;DR: The stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech, including speaker-dependent continuous speech recognition, are described.
Abstract: Developing accurate and robust phonetic models for the different speech sounds is a major challenge for high performance continuous speech recognition. In this paper, we introduce a new approach, called the stochastic segment model, for modelling a variable-length phonetic segment X, an L-long sequence of feature vectors. The stochastic segment model consists of 1) time-warping the variable-length segment X into a fixed-length segment Y called a resampled segment, and 2) a joint density function of the parameters of the resampled segment Y, which in this work is assumed Gaussian. In this paper, we describe the stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech. For speaker-dependent continuous speech recognition, the segment model reduces the word error rate by one third over a hidden Markov phonetic model.
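A minimal sketch of the two ingredients named above: linear time-warping of a variable-length segment to a fixed-length resampled segment, and a Gaussian score over the resampled parameters. The fixed length M and the diagonal covariance are simplifying assumptions.

```python
import numpy as np

def resample_segment(X, M=10):
    """Linearly time-warp an (L, d) feature-vector segment to a fixed (M, d) segment."""
    L, d = X.shape
    src = np.linspace(0, L - 1, M)
    return np.column_stack([np.interp(src, np.arange(L), X[:, j]) for j in range(d)])

def segment_log_likelihood(X, mean, var, M=10):
    """Diagonal-covariance Gaussian log-likelihood of the resampled segment,
    flattened to a single M*d-dimensional vector (mean and var have length M*d)."""
    y = resample_segment(X, M).ravel()
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var))
```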

Proceedings ArticleDOI
F.K. Soong, Man Mohan Sondhi
06 Apr 1987
TL;DR: A more noise-resistant, weighted spectral distortion measure which weights the high SNR regions in frequency more than the low SNR regions, and weights spectral distortion more at the peaks than at the valleys of the spectrum.
Abstract: The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and to test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure which weights the high SNR regions in frequency more than the low SNR regions. For the weighting function we choose a "bandwidth broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of SNR, and becomes essentially constant in the noise-free case. The new measure has the dot product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10 speaker, isolated digit data base in a series of speaker-independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞ dB). The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49% while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNRs is about 5 - 7 dB.
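Below is a simplified frequency-domain analogue of the idea (emphasize high-SNR spectral regions when accumulating distortion); it does not reproduce the paper's bandwidth-broadened weighting or its autocorrelation-domain dot-product form, and the weight shape is an assumption.

```python
import numpy as np

def snr_weighted_log_spectral_distortion(log_spec_test, log_spec_ref, snr_db):
    """Accumulate squared log-spectral differences with per-bin weights that
    approach 1 at high SNR and 0 at low SNR.

    snr_db: per-frequency-bin SNR estimate in dB (same length as the spectra).
    """
    w = 1.0 / (1.0 + 10.0 ** (-np.asarray(snr_db) / 10.0))
    w = w / w.sum()
    diff = np.asarray(log_spec_test) - np.asarray(log_spec_ref)
    return float(np.sum(w * diff ** 2))
```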

Proceedings ArticleDOI
Bernard Merialdo
06 Apr 1987
TL;DR: A new strategy is proposed, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition; it is tested on a dictation task of French texts.
Abstract: This paper proposes a new strategy, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition. MLD proceeds in three steps: (1) a Syllable Match procedure uses an acoustic model to build a list of the most probable syllables that match the acoustic signal from a given time frame; (2) from this list, a Word Match procedure uses the dictionary to build partial word hypotheses; (3) a Sentence Match procedure then uses a probabilistic language model to build partial sentence hypotheses until complete sentences are found. An original matching algorithm is proposed for the Syllable Match procedure. This strategy is tested on a dictation task of French texts. Two different dictionaries are tested: one composed of the 10,000 most frequent words, the other composed of 200,000 words. The recognition results are given and compared. The word error rate with 10,000 words is 17.3%. If the errors due to the lack of coverage are not counted, the error rate with 10,000 words is reduced to 10.6%. The error rate with 200,000 words is 12.7%.

PatentDOI
Hirohiko Katayama
TL;DR: A system with a registration mode that stores a typical spoken word pattern for use in speech recognition by having the same spoken word input a plurality of times.
Abstract: A system having a mode for registering (storing) a typical spoken word pattern for use in speech recognition, in which the same spoken word is input a plurality of times. In the word-pattern registration mode, a previous input of the spoken word, made prior to the Nth time, is reproduced. After listening to the reproduced spoken prompt word, the Nth input of the same spoken word is made. The system thus registers one typical spoken word with a well-averaged voice pattern and high accuracy, increasing the system's speech recognition rate.

Patent
12 Mar 1987
TL;DR: In this article, a comparison is carried out between the true meaning provided from a transmission memory and the meaning of a speech sample or a test signal recognized by the speech recogniser or speaker recogniser.
Abstract: In the measuring method according to the invention, speech samples and/or test signals, from which a speech recogniser or speaker recogniser has previously formed a reference pattern during a learning phase, are presented to this speech recogniser or speaker recogniser via a speech coder to be assessed or a transmission route to be tested. Using an evaluation computer, a comparison is carried out between the true meaning provided from a transmission memory and the meaning of a speech sample or a test signal recognised by the speech recogniser or speaker recogniser. In this process, an error rate or a measure of the reliability of recognition is simultaneously calculated over a measurement cycle.

Proceedings ArticleDOI
Masafumi Nishimura, K. Toshioka
01 Apr 1987
TL;DR: A new vector quantization (VQ, or labeling) method for a speech recognition system based on hidden Markov models (HMM) that generates multiple labels at each frame while keeping the conventional HMM formulation.
Abstract: This paper describes a new vector quantization (VQ, or labeling) method for a speech recognition system based on hidden Markov models (HMM). To improve the VQ accuracy in a simple manner, "multi-labeling", which generates multiple labels at each frame, was introduced while keeping the conventional HMM formulation. Furthermore, in order to represent characteristics of speech accurately and effectively, "multi-dimensional labeling" was also introduced, which quantizes multiple features such as spectral dynamics and the spectrum independently. This labeling method was tested in an isolated word recognition task using 150 confusable Japanese words. The recognition error rate was reduced to roughly half or less of that of the conventional method.
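A sketch of the multi-labeling step: emit the k nearest codewords per frame with distance-derived weights instead of the single best label. The value of k and the inverse-distance weights are illustrative assumptions.

```python
import numpy as np

def multi_label(frame, codebook, k=3):
    """Return the indices of the k nearest codewords to one feature frame,
    together with normalized weights, instead of a single VQ label.

    codebook: (C, D) array of codewords; frame: length-D feature vector.
    """
    distances = np.linalg.norm(codebook - frame, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + 1e-9)
    return nearest, weights / weights.sum()
```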

Journal ArticleDOI
01 Apr 1987
TL;DR: It is shown that coherent detection followed by differential decoding yields better performance than limiter-discriminator detection and differential detection, whereas the two noncoherent detectors yield approximately identical performance.
Abstract: The paper presents a relatively simple method for analysing the effect of IF filtering on the performance of multilevel FM signals. Using this method, the error rate performance of narrowband FM signals is analysed for three different detection techniques, namely limiter-discriminator detection, differential detection and coherent detection followed by differential decoding. The symbol error probabilities are computed for a Gaussian IF filter and a second-order Butterworth IF filter. It is shown that coherent detection and differential decoding yields better performance than limiter-discriminator detection and differential detection, whereas two noncoherent detectors yield approximately identical performance.

Proceedings ArticleDOI
Chin-Hui Lee
01 Apr 1987
TL;DR: Preliminary experiments on natural speech data indicate that the robust LP procedure is relatively insensitive to the placement of the LPC analysis window and to the value of the pitch period, for a given section of speech signal.
Abstract: In this paper, a robust linear prediction algorithm is proposed. Rather than minimizing the sum of squared residuals as in the conventional linear prediction procedures, the robust LP procedure minimizes the sum of appropriately weighted residuals. The weight is a function of the prediction residual, and the cost function is selected to give more weight to the bulk of smaller residuals while de-weighting the small portion of large residuals. Based on Robustness Theory, the proposed algorithm will always give a more efficient (lower variance) estimate for the prediction coefficients if the excitation source is of Gaussian mixture such that a large portion of the excitations are from a normal distribution with a very small variance while a small portion of the excitations at the glottal openings and closures are from some unknown distribution with a much larger variance. The robust LP algorithm can be used in the front-end feature extractor for a speech recognition system and as an analyzer for a speech coding system. Testing on synthetic vowel data demonstrates that the robust LP procedure is able to reduce the formant and bandwidth error rate by more than an order of magnitude compared to the conventional LP procedures. Preliminary experiments on natural speech data indicate that the robust LP procedure is relatively insensitive to the placement of the LPC analysis window and to the value of the pitch period, for a given section of speech signal.
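A sketch of one way to realize weighted-residual linear prediction, using iteratively reweighted least squares with Huber-style weights that down-weight large residuals (e.g. at glottal closures); the weight function, scale estimate, and iteration count are assumptions rather than the procedure in the paper.

```python
import numpy as np

def robust_lp(x, order=10, n_iter=5, k=1.5):
    """Estimate LP coefficients by iteratively reweighted least squares.

    Each pass solves a weighted least-squares problem; large residuals are
    down-weighted by a Huber-style weight so that outlier excitations have
    less influence than in conventional (unweighted) LP analysis.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    A = np.column_stack([x[order - i - 1:N - i - 1] for i in range(order)])
    b = x[order:]
    w = np.ones_like(b)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        a, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = b - A @ a
        scale = 1.4826 * np.median(np.abs(r)) + 1e-12        # robust scale (MAD)
        w = np.minimum(1.0, k * scale / (np.abs(r) + 1e-12))  # Huber-style weights
    return a
```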

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A template-based connected speech recognition system, which represents words as sequences of diphone-like segments, has been implemented and evaluated and an evaluation of the recognizer has been carried out on a database of connected digit utterances spoken by a single male talker.
Abstract: A template-based connected speech recognition system, which represents words as sequences of diphone-like segments, has been implemented and evaluated. The inventory of segments is divided into two principal classes: "steady-state" speech sounds such as vowels, fricatives, and nasals, and "composite" speech sounds consisting of sequences of two or more speech sounds in which the transitions from one sound to another are intrinsic to the representation of the composite sound. Templates representing these segments are extracted from labelled training utterances. Words are represented by network models whose branches are diphone segments. Word juncture phenomena are accommodated by including segment branches that characterize transition pronunciations between specified classes of words. The recognition of a word in a specified utterance takes place by "spotting" all the segments contained in the model of the word. Putative words and word combinations are found by searching for best scoring sequences of segments specified by the models subject to segment separation constraints. A pruning procedure finds the best scoring string of words subject to constraints on word lengths, separations, and overlaps. An evaluation of the recognizer has been carried out on a database of connected digit utterances spoken by a single male talker. Templates are extracted from half the database consisting of 2100 digit utterances and system performance tested on the remaining 2100 utterances. The performance obtained to date is approximately 2% digit error rate and 7 to 8% digit string error rate.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new continuous speech recognition method by phoneme-based word spotting and time-synchronous context-free parsing that is task-independent in terms of reference patterns and task language.
Abstract: This paper proposes a new continuous speech recognition method by phoneme-based word spotting and time-synchronous context-free parsing. The word pattern is composed of the concatenation of phoneme patterns. The knowledge of syntax is given in Backus Normal Form. Therefore, our method is task-independent in terms of reference patterns and task language. The system first spots word candidates in an input sentence, and then generates a word lattice. The word spotting is performed by a dynamic time warping method. Secondly, it selects the best word sequences found in the word lattice from all possible sentences which are defined by a context-free grammar.

01 Jan 1987
TL;DR: This study examines the set of CV, VC, CVC and some CCVC sequences which are non-occurring in monomorphemic words in a 20,000 word lexicon and suggests that many sequences in which the prevocalic and postvocalic consonants are similar, or identical, are excluded.
Abstract: This study examines the set of CV, VC, CVC and some CCVC sequences which are non-occurring in monomorphemic words in a 20,000 word lexicon. A preliminary analysis suggests that many sequences in which the prevocalic and postvocalic consonants are similar, or identical, are excluded. The sequences are discussed in relation to 'reduced forms' characteristic of fast speech, word boundary assimilation, and lexical access.


Journal ArticleDOI
TL;DR: It is shown how the choice of a good metric in nearest neighbour estimates of posterior probability can lead to improved average conditional error rate estimates.

Patent
09 Mar 1987
TL;DR: In this patent, the required error correction encoder/decoder is placed inside the differential logic to improve the error rate characteristic of a multilevel QAM communication system with comparatively simple constitution.
Abstract: PURPOSE: To improve the error rate characteristic with comparatively simple constitution by providing the required error correction encoder/decoder inside the differential logic. CONSTITUTION: In a multilevel Quadrature Amplitude Modulation (QAM) communication system to which differential logic is applied with a multilevel value of Z_n, an error correction encoder 2, which performs error correction encoding independently for each of the n series, is provided inside the differential logic part 1 on the transmission side, and the multilevel QAM signal is transmitted via the encoder 2 and a transmission modulation part 3. On the reception side, an error correction decoder 5 is similarly provided inside a differential logic part 6. In this way, the expansion of a one-bit error into a two-bit error, which occurs when the error correction encoder/decoder is placed outside the differential logic, is prevented; no two-bit error correction circuit, etc. is required, and the error rate characteristic can be improved with comparatively simple constitution. COPYRIGHT: (C)1988,JPO&Japio

Patent
Thomas Cook
14 Apr 1987
TL;DR: Error rates above a given threshold are detected by initiating a counter to count a group of n bits on each occurrence of an error bit, and then inspecting the counters on each occurrence of an error to see whether the counter initiated x error bits earlier is still counting.
Abstract: Error rates above a given threshold are detected by initiating a counter to count a group of n bits on each occurrence of an error bit. The counters are inspected on each occurrence of an error to see whether the counter initiated x error bits earlier is still counting. If the counter is still counting the error rate is above a threshold of x error bits in a group of n bits in a serial stream.
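A compact sketch of the counting logic in the claim, reformulated as "x error bits within any window of n bits"; the particular values of n and x are parameters of the detector, not values from the patent.

```python
from collections import deque

def error_rate_exceeded(bit_error_flags, n=1000, x=10):
    """Return True if any n-bit window of the stream contains at least x error bits.

    bit_error_flags: iterable of 0/1 flags, one per received bit. Keeping the
    positions of the last x error bits plays the role of the per-error counters:
    the counter started x errors ago is "still counting" if fewer than n bits
    have elapsed since that error.
    """
    recent_error_positions = deque(maxlen=x)
    for position, is_error in enumerate(bit_error_flags):
        if not is_error:
            continue
        recent_error_positions.append(position)
        if len(recent_error_positions) == x and position - recent_error_positions[0] < n:
            return True
    return False
```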

Proceedings ArticleDOI
01 Apr 1987
TL;DR: An unsupervised procedure for the construction of template sets for connected speech recognition based on a "segment spotting" approach, where the segments are diphone-like units.
Abstract: This paper describes an unsupervised procedure for the construction of template sets for connected speech recognition. The procedure has been developed for use in a speech recognition system based on a "segment spotting" approach, where the segments are diphone-like units. The procedure makes use of both phonetic and acoustic knowledge: the former consists of a model of all the words in the task language in terms of the chosen units; the latter is implicitly represented by an initial set of "training" templates. The performance obtained by using the bootstrapped templates in a connected digit recognition task is good (average word error rate of less than 4%).