
Showing papers on "Word error rate published in 1990"


Journal ArticleDOI
TL;DR: A review is given of different ways of estimating the error rate of a prediction rule based on a statistical model, and of how cross-validation can be used to obtain an adjusted predictor with a smaller error rate.
Abstract: A review is given of different ways of estimating the error rate of a prediction rule based on a statistical model. A distinction is drawn between apparent, optimum and actual error rates. Moreover, it is shown how cross-validation can be used to obtain an adjusted predictor with a smaller error rate. A detailed discussion is given for ordinary least squares, logistic regression and Cox regression in survival analysis. Finally, the split-sample approach is discussed and demonstrated on two data sets.

528 citations
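The apparent-versus-actual distinction above can be illustrated with a small sketch: a toy nearest-class-mean classifier on synthetic one-dimensional data, comparing the resubstitution (apparent) error rate with a leave-one-out cross-validated estimate of the actual error rate. The data and classifier are illustrative assumptions, not the paper's examples.

```python
import random

def nearest_mean_classifier(train):
    """Fit a nearest-class-mean rule on (x, label) pairs; return a predict function."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def error_rate(predict, data):
    return sum(predict(x) != y for x, y in data) / len(data)

def loo_cv_error(data):
    """Leave-one-out cross-validation: hold out each point, train on the rest."""
    errs = 0
    for i in range(len(data)):
        held_x, held_y = data[i]
        predict = nearest_mean_classifier(data[:i] + data[i + 1:])
        errs += predict(held_x) != held_y
    return errs / len(data)

random.seed(0)
data = [(random.gauss(0.0, 1.0), 'a') for _ in range(30)] + \
       [(random.gauss(1.5, 1.0), 'b') for _ in range(30)]
apparent = error_rate(nearest_mean_classifier(data), data)
cv = loo_cv_error(data)
# The apparent (resubstitution) rate is typically optimistic relative to the
# cross-validated estimate of the actual error rate.
print(f"apparent={apparent:.3f}  loo_cv={cv:.3f}")
```

The gap between the two numbers is the optimism that the paper's adjustment methods aim to correct.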


Journal ArticleDOI
TL;DR: The modifications made to a connected word speech recognition algorithm based on hidden Markov models which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described.
Abstract: The modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMMs) which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described. The novelty of this approach is that statistical models of both the actual vocabulary word and the extraneous speech and background are created. An HMM-based connected word recognition system is then used to find the best sequence of background, extraneous speech, and vocabulary word models for matching the actual input. Word recognition accuracies of 99.3% on purely isolated speech (i.e., only vocabulary items and background noise were present) and 95.1% when the vocabulary word was embedded in unconstrained extraneous speech were obtained for the five-word vocabulary using the proposed recognition algorithm.

472 citations
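The recognizer described above strings together background and vocabulary-word models and picks the best-scoring sequence by Viterbi decoding. A minimal sketch of that decoding step, using a toy two-state model ('bg' for background/extraneous speech, 'kw' for a keyword) with invented probabilities and symbolic "acoustic frames" rather than real features:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence for an observation sequence (standard Viterbi)."""
    trellis = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for sym in obs[1:]:
        prev = trellis[-1]
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            col[s] = prev[best] + log_trans[best][s] + log_emit[s][sym]
            ptr[s] = best
        trellis.append(col)
        back.append(ptr)
    state = max(states, key=lambda s: trellis[-1][s])
    path = [state]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

LOG = math.log
states = ['bg', 'kw']          # background/extraneous-speech model vs keyword model
log_start = {'bg': LOG(0.8), 'kw': LOG(0.2)}
log_trans = {'bg': {'bg': LOG(0.7), 'kw': LOG(0.3)},
             'kw': {'bg': LOG(0.3), 'kw': LOG(0.7)}}
# toy acoustic symbols: 'n' = noise-like frame, 'v' = voiced keyword-like frame
log_emit = {'bg': {'n': LOG(0.9), 'v': LOG(0.1)},
            'kw': {'n': LOG(0.2), 'v': LOG(0.8)}}

path = viterbi("nnvvvnn", states, log_start, log_trans, log_emit)
print(path)
```

The decoded path segments the input into background and keyword regions, which is the essence of spotting a vocabulary word embedded in extraneous speech.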


Book ChapterDOI
TL;DR: Two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure.
Abstract: Context-dependent phone models are applied to speaker-independent continuous speech recognition and shown to be effective in this domain. Several previously proposed context-dependent models are evaluated, and two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure. The subword clustering procedure used for generalized triphones can find the optimal number of models, given a fixed amount of training data. It is shown that context-dependent modeling reduces the error rate by as much as 60%.

228 citations


Journal ArticleDOI
TL;DR: In this article, a fuzzy model of reliability analysis is presented, based on the operations of dependence and of fuzziness contained in qualitative expressions, together with the evaluation of failure possibility and error possibility.

165 citations


Proceedings Article
29 Jul 1990
TL;DR: The generalized mutual information statistic is derived, the parsing algorithm is described, and results and sample output from the parser are presented.
Abstract: The purpose of this paper is to characterize a constituent boundary parsing algorithm, using an information-theoretic measure called generalized mutual information, which serves as an alternative to traditional grammar-based parsing methods. This method is based on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part of speech n-grams within the sentence. This hypothesis is supported by the performance of an implementation of this parsing algorithm which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. This paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser.

142 citations
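The core idea above (constituent boundaries fall where adjacent parts of speech are weakly associated) can be sketched with plain pointwise mutual information over tag bigrams. The paper's generalized statistic is simplified here, and the tagged corpus is a toy assumption:

```python
import math
from collections import Counter

def pmi_table(corpus_tags):
    """Estimate pointwise mutual information of adjacent part-of-speech tags."""
    uni = Counter(t for sent in corpus_tags for t in sent)
    bi = Counter(p for sent in corpus_tags for p in zip(sent, sent[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    return {(a, b): math.log((c / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
            for (a, b), c in bi.items()}

def weakest_junction(sentence, pmi):
    """Index where a constituent boundary is most plausible: the junction whose
    adjacent tags have the lowest mutual information."""
    scores = [pmi[(a, b)] for a, b in zip(sentence, sentence[1:])]
    return scores.index(min(scores)) + 1

# Toy tagged corpus: DET NOUN sticks together; VERB is followed by varied tags.
corpus = ([["DET", "NOUN", "VERB", "PREP", "DET", "NOUN"]] * 10 +
          [["DET", "NOUN", "VERB"]] * 10 +
          [["PRON", "VERB", "DET", "NOUN"]] * 10)
pmi = pmi_table(corpus)
sent = ["DET", "NOUN", "VERB", "DET", "NOUN"]
print(weakest_junction(sent, pmi))
```

On this toy data the weakest junction falls between the verb and the object noun phrase, i.e. a plausible constituent boundary; the paper extends this to n-grams and recursive bracketing.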


Journal ArticleDOI
01 Aug 1990
TL;DR: Results indicate that the perceptron-based decision feedback equaliser provides better bit error rate performance than the least mean square decision feedback equaliser, especially in high noise conditions; bit error rate performance also degrades less owing to decision errors and is less sensitive to gain variation.
Abstract: The paper describes a new approach for a decision feedback equaliser using the multilayer perceptron structure for equalisation in digital communications systems. Results indicate that the perceptron-based decision feedback equaliser provides better bit error rate performance than the least mean square decision feedback equaliser, especially in high noise conditions; bit error rate performance also degrades less owing to decision errors and is less sensitive to gain variation.

130 citations
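For context, a sketch of the conventional LMS decision feedback equaliser that the perceptron equaliser is compared against: feed-forward taps over received samples plus feedback taps over past decisions, adapted by least mean squares. The channel model, step size, and tap counts here are invented for illustration; the perceptron variant replaces this linear combiner with a multilayer network.

```python
import random

def lms_dfe(received, n_ff=3, n_fb=2, mu=0.05, train_syms=None):
    """Adaptive decision-feedback equalizer: a feed-forward filter over received
    samples plus a feedback filter over past decisions, adapted by LMS."""
    ff = [0.0] * n_ff          # feed-forward taps
    fb = [0.0] * n_fb          # feedback taps (cancel postcursor ISI)
    past = [0.0] * n_fb        # previous symbol decisions
    decisions = []
    for k in range(len(received)):
        x = [received[k - i] if k - i >= 0 else 0.0 for i in range(n_ff)]
        y = sum(w * v for w, v in zip(ff, x)) - sum(w * v for w, v in zip(fb, past))
        d = 1.0 if y >= 0 else -1.0
        # use known training symbols first, then switch to decision-directed mode
        target = train_syms[k] if train_syms and k < len(train_syms) else d
        e = target - y
        ff = [w + mu * e * v for w, v in zip(ff, x)]
        fb = [w - mu * e * v for w, v in zip(fb, past)]
        past = [target] + past[:-1]
        decisions.append(d)
    return decisions

random.seed(1)
symbols = [random.choice([-1.0, 1.0]) for _ in range(400)]
# toy dispersive channel: r[k] = s[k] + 0.4*s[k-1], plus mild Gaussian noise
received = [symbols[k] + 0.4 * (symbols[k - 1] if k else 0.0)
            + random.gauss(0, 0.05) for k in range(len(symbols))]
out = lms_dfe(received, train_syms=symbols[:200])
ber = sum(d != s for d, s in zip(out[200:], symbols[200:])) / 200
print(f"bit error rate after training: {ber:.3f}")
```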


Journal ArticleDOI
TL;DR: The author's method requires considerably less computer time to obtain results comparable to those of the other methods, and it has a low degree of programming difficulty.
Abstract: A computationally simple approach for reliability-redundancy optimization problems is proposed. It is compared by means of a simulation study with the other two existing approaches: (1) the LMBB method, which incorporates the Lagrange multiplier technique in conjunction with the Kuhn-Tucker condition and the branch-and-bound method, and (2) sequential search techniques in combination with heuristic redundancy allocation methods, including an extension of combinations of four heuristics and two search techniques. Using 100 sets of randomly generated test problems with nonlinear constraints for both series systems and a complex system, the authors measured and evaluated the performance of these approaches in terms of optimality rate, error rate, and execution time. In general, the author's method requires considerably less computer time to obtain results comparable to those of the other methods, and it has a low degree of programming difficulty.

98 citations


PatentDOI
TL;DR: In this paper, a speaker independent recognition of small vocabularies, spoken over the long distance telephone network, is achieved using two types of models, one for defined vocabulary words (e.g., collect, calling-card, person, third number and operator), and one type for extraneous input which ranges from non-speech sounds to groups of non-vocabulary words.
Abstract: Speaker independent recognition of small vocabularies, spoken over the long distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number and operator), and one type for extraneous input which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please"). For this type of key word spotting, modifications are made to a connected word speech recognition algorithm based on state-transitional (hidden Markov) models which allow it to recognize words from a pre-defined vocabulary list spoken in an unconstrained fashion. Statistical models of both the actual vocabulary words and the extraneous speech and background noises are created. A syntax-driven connected word recognition system is then used to find the best sequence of extraneous input and vocabulary word models for matching the actual input speech.

92 citations


Proceedings Article
01 Oct 1990
TL;DR: The results suggest that genetic algorithms are becoming practical for pattern classification problems as faster serial and parallel computers are developed.
Abstract: Genetic algorithms were used to select and create features and to select reference exemplar patterns for machine vision and speech pattern classification tasks. For a complex speech recognition task, genetic algorithms required no more computation time than traditional approaches to feature selection but reduced the number of input features required by a factor of five (from 153 to 33 features). On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) which reduced classification error rates from 19% to almost 0%. Neural net and k nearest neighbor (KNN) classifiers were unable to provide such low error rates using only the original features. Genetic algorithms were also used to reduce the number of reference exemplar patterns for a KNN classifier. On a 338 training pattern vowel-recognition problem with 10 classes, genetic algorithms reduced the number of stored exemplars from 338 to 43 without significantly increasing classification error rate. In all applications, genetic algorithms were easy to apply and found good solutions in many fewer trials than would be required by exhaustive search. Run times were long, but not unreasonable. These results suggest that genetic algorithms are becoming practical for pattern classification problems as faster serial and parallel computers are developed.

82 citations
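A minimal sketch of the exemplar-selection idea above: a genetic algorithm over bitmasks, where each gene keeps or drops one training pattern for a 1-nearest-neighbour classifier, and fitness trades accuracy against the number of stored exemplars. The data, penalty weight, and GA settings are toy assumptions, not the paper's.

```python
import random

def nn_error(exemplars, data):
    """Error rate of a 1-nearest-neighbour rule using the given exemplars."""
    errs = 0
    for x, y in data:
        _, label = min(exemplars, key=lambda e: abs(x - e[0]))
        errs += label != y
    return errs / len(data)

def ga_select_exemplars(data, pop_size=30, gens=40, seed=2):
    """Genetic algorithm over bitmasks: each gene says whether a training
    pattern is kept as a reference exemplar for the 1-NN classifier."""
    rng = random.Random(seed)
    n = len(data)

    def fitness(mask):
        chosen = [d for d, bit in zip(data, mask) if bit]
        if not chosen:
            return -1.0
        # reward accuracy, lightly penalize the number of stored exemplars
        return 1.0 - nn_error(chosen, data) - 0.2 * sum(mask) / n

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]          # one-point crossover
            for i in range(n):
                if rng.random() < 0.02:        # per-gene mutation
                    child[i] ^= 1
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [d for d, bit in zip(data, best) if bit]

rng = random.Random(0)
data = [(rng.gauss(0.0, 0.5), 'a') for _ in range(25)] + \
       [(rng.gauss(2.0, 0.5), 'b') for _ in range(25)]
kept = ga_select_exemplars(data)
print(len(kept), nn_error(kept, data))
```

The size penalty plays the role of the paper's pressure toward fewer stored exemplars without significantly increasing the classification error rate.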


Patent
19 Mar 1990
TL;DR: In this article, the framing bit errors of a received digital communications signal are monitored and recorded and an audible alarm is sounded when the error rate exceeds a predetermined threshold value in a plurality of calculation modes.
Abstract: The framing bit errors of a received digital communications signal are monitored and recorded. The framing bit error rate is determined and an audible alarm is sounded when the error rate exceeds a predetermined threshold value in a plurality of calculation modes. The framing bit error rate and the total framing bit errors detected over a predetermined fixed time period are also displayed. A link to a remote network monitor can be implemented for monitoring and displaying framing bit error rate at a remote location.

64 citations
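A sketch of the monitoring idea: count framing-bit errors over a sliding window and raise an alarm flag when the windowed rate crosses a threshold. The window size and threshold values are invented for illustration; the patent does not specify them.

```python
from collections import deque

class FramingErrorMonitor:
    """Track framing-bit errors over a sliding window and raise an alarm flag
    when the windowed error rate exceeds a threshold."""
    def __init__(self, window=1000, threshold=1e-2):
        self.window = deque(maxlen=window)   # recent error indicators
        self.threshold = threshold
        self.total_errors = 0                # running total for display

    def record(self, bit_in_error):
        self.window.append(1 if bit_in_error else 0)
        self.total_errors += bit_in_error
        return self.error_rate() > self.threshold   # True => sound the alarm

    def error_rate(self):
        return sum(self.window) / len(self.window)

mon = FramingErrorMonitor(window=100, threshold=0.05)
alarms = [mon.record(i % 10 == 0) for i in range(100)]   # sustained 10% errors
print(mon.error_rate(), alarms[-1])
```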


Journal ArticleDOI
TL;DR: A general theory of software reliability that proposes that software failure rates are the product of the software average error size, apparent error density, and workload is developed and models of these factors that are consistent with the assumptions of classical software-reliability models are developed.
Abstract: A general theory of software reliability that proposes that software failure rates are the product of the software average error size, apparent error density, and workload is developed. Models of these factors that are consistent with the assumptions of classical software-reliability models are developed. The linear, geometric and Rayleigh models are special cases of the general theory. Linear reliability models result from assumptions that the average size of remaining errors and workload are constant and that its apparent error density equals its real error density. Geometric reliability models differ from linear models in assuming that the average-error size decreases geometrically as errors are corrected, whereas the Rayleigh model assumes that the average size of remaining errors increases linearly with time. The theory shows that the abstract proportionality constants of classical models are composed of more fundamental and more intuitively meaningful factors, namely, the initial values of average size of remaining errors, real error density, workload, and error content. It is shown how the assumed behavior of the reliability primitives of software (average-error size, error density, and workload) is modeled to accommodate diverse reliability factors.

Proceedings Article
01 Oct 1990
TL;DR: The results suggest that the selection of a classifier for a particular task should be guided not so much by small differences in error rate, but by practical considerations concerning memory usage, computational resources, ease of implementation, and restrictions on training and classification times.
Abstract: Seven different pattern classifiers were implemented on a serial computer and compared using artificial and speech recognition tasks. Two neural network (radial basis function and high order polynomial GMDH network) and five conventional classifiers (Gaussian mixture, linear tree, K nearest neighbor, KD-tree, and condensed K nearest neighbor) were evaluated. Classifiers were chosen to be representative of different approaches to pattern classification and to complement and extend those evaluated in a previous study (Lee and Lippmann, 1989). This and the previous study both demonstrate that classification error rates can be equivalent across different classifiers when they are powerful enough to form minimum error decision regions, when they are properly tuned, and when sufficient training data is available. Practical characteristics such as training time, classification time, and memory requirements, however, can differ by orders of magnitude. These results suggest that the selection of a classifier for a particular task should be guided not so much by small differences in error rate, but by practical considerations concerning memory usage, computational resources, ease of implementation, and restrictions on training and classification times.

Journal ArticleDOI
TL;DR: Comparisons to real failure/repair information obtained from field engineers show that, in about 85% of the cases, the error symptoms recognized by this approach correspond to real problems.
Abstract: A methodology is proposed for recognizing the symptoms of persistent problems in large systems. The system error rate is used to identify the error states among which relationships may exist. Statistical techniques are used to validate and quantify the strength of the relationship among these error states. As input, the approach takes the raw error logs containing a single entry for each error that is detected as an isolated event. As output, it produces a list of symptoms that characterize persistent errors. Thus, given a failure, it is determined whether the failure is an intermittent manifestation of a common fault or whether it is an isolated (transient) incident. The technique is shown to work on two CYBER systems and on an IBM 3081 multiprocessor system. Comparisons to real failure/repair information obtained from field engineers show that, in about 85% of the cases, the error symptoms recognized by this approach correspond to real problems. The remaining 15% of the cases, although not directly supported by field data, are confirmed as being valid problems.

Proceedings ArticleDOI
17 Jun 1990
TL;DR: EAR, an English alphabet recognizer that performs speaker-independent recognition of letters spoken in isolation, has high level of performance and is attributed to accurate and explicit phonetic segmentation, the use of speech knowledge to select features that measure the important linguistic information, and the ability of the neural classifier to model the variability of the data.
Abstract: A description is presented of EAR, an English alphabet recognizer that performs speaker-independent recognition of letters spoken in isolation. During recognition, (a) signal processing routines transform the digitized speech into useful representations, (b) rules are applied to the representations to locate segment boundaries, (c) feature measurements are computed on the speech segments, and (d) a neural network uses the feature measurements to classify the letter. The system was trained on one token of each letter from 120 speakers. Performance was 95% when tested on a new set of 30 speakers. Performance was 96% when tested on a second token of each letter from the original 120 speakers. The recognition accuracy is 6% higher than that of previously reported systems. The high level of performance is attributed to accurate and explicit phonetic segmentation, the use of speech knowledge to select features that measure the important linguistic information, and the ability of the neural classifier to model the variability of the data.

Journal ArticleDOI
TL;DR: An analysis is made of the impact of various design decisions on the error detection capability of the fiber distributed data interface (FDDI), a 100-Mb/s fiber-optic LAN standard being developed by ANSI, and the frame error rate, token loss rate, and undetected error rate are quantified.
Abstract: An analysis is made of the impact of various design decisions on the error detection capability of the fiber distributed data interface (FDDI), a 100-Mb/s fiber-optic LAN standard being developed by the American National Standards Institute (ANSI). In particular, the frame error rate, token loss rate, and undetected error rate are quantified. Several characteristics of the 32-b frame check sequence (FCS) polynomial, which is also used in IEEE 802 LAN protocols, are discussed. The standard uses a nonreturn to zero invert on ones (NRZI) signal encoding and a 4-b to 5-b (4b/5b) symbol encoding in the physical layer. Due to the combination of NRZI and 4b/5b encoding, many noise events are detected by code (or symbol) violations. A large percentage of errors are detected by FCS violations. The errors that escape these three violations remain undetected. The probability of undetected errors due to creation of false starting delimiters, false ending delimiters, or merging of two frames is analyzed. It is shown that every noise event results in two code bit errors, which in turn may result in up to four data bit errors. The FCS can detect up to two noise events. Creation of a false starting delimiter or ending delimiter on a symbol boundary also requires two noise events. This assumes enhanced frame validity criteria. The author justifies the enhancements by quantifying their effect.
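The FCS mechanism itself can be sketched with Python's binascii.crc32, which implements the same 32-bit polynomial used by the IEEE 802 protocols discussed here. The frame contents below are invented; the point is only that a noise event's two bit errors are caught by an FCS check:

```python
import binascii

def append_fcs(frame: bytes) -> bytes:
    """Append a 32-bit frame check sequence (the IEEE 802 CRC-32 polynomial)."""
    return frame + binascii.crc32(frame).to_bytes(4, "big")

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the data portion and compare with the received FCS."""
    data, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return binascii.crc32(data).to_bytes(4, "big") == fcs

sent = append_fcs(b"FDDI frame payload")
corrupted = bytearray(sent)
corrupted[3] ^= 0b00000101          # flip two bits in one byte (one noise event)
print(fcs_ok(sent), fcs_ok(bytes(corrupted)))
```

CRC-32 detects any two-bit error pattern in frames far longer than this one, consistent with the paper's observation that the FCS can detect up to two noise events.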

Proceedings ArticleDOI
20 Aug 1990
TL;DR: The proposed NETgram is a neural network for word category prediction that requires fewer parameters than the statistical model and performs effectively for unknown data, i.e., the NETgram interpolates sparse training data.
Abstract: Word category prediction is used to implement an accurate word recognition system. Traditional statistical approaches require considerable training data to estimate the probabilities of word sequences, and many parameters to memorize probabilities. To solve this problem, NETgram, a neural network for word category prediction, is proposed. Training results show that the performance of the NETgram is comparable to that of the statistical model although the NETgram requires fewer parameters than the statistical model. Also, the NETgram performs effectively for unknown data, i.e., the NETgram interpolates sparse training data. Results of analyzing the hidden layer show that the word categories are classified into linguistically significant groups. The results of applying the NETgram to HMM English word recognition show that the NETgram improves the word recognition rate from 81.0% to 86.9%.

Journal ArticleDOI
TL;DR: Several coding techniques are combined to create a simple error control system capable of providing very low end-to-end symbol error rates over a digital mobile phone link.
Abstract: Several coding techniques are combined to create a simple error control system capable of providing very low end-to-end symbol error rates over a digital mobile phone link. Interleaving is used to spread out the impact of deep fades, allowing for the use of shorter block codes with correspondingly simple encoders and decoders. The degree of error control provided by these block codes is then substantially improved through the adoption of a type-1 hybrid-ARQ protocol. The performance of this system is explored in detail for implementation based on Reed-Solomon codes.
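The interleaving step can be sketched directly: symbols are written row-wise into a block and transmitted column-wise, so a deep fade that wipes out consecutive transmitted symbols lands as isolated erasures in different codewords, which short block codes can then correct. The block dimensions and fade position below are illustrative assumptions.

```python
def interleave(symbols, rows, cols):
    """Write row-wise into a rows x cols block, read out column-wise."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse of interleave: write column-wise, read out row-wise."""
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(12))                 # e.g. three 4-symbol codewords, row-wise
tx = interleave(data, 3, 4)
# a deep fade wipes out three consecutive transmitted symbols
rx = ['X' if 4 <= i < 7 else s for i, s in enumerate(tx)]
recovered = deinterleave(rx, 3, 4)
codewords = [recovered[i:i + 4] for i in range(0, 12, 4)]
print(codewords)   # the burst is spread out: at most one erasure per codeword
```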

PatentDOI
TL;DR: In this article, a speech coding apparatus coupled to a transmission channel includes m (m is an integer greater than 1) coders, m decoders and m or (m-1) error-correcting coders.
Abstract: A speech coding apparatus coupled to a transmission channel includes m (m is an integer greater than 1) coders, m decoders and m or (m-1) error correcting coders. The apparatus also includes an evaluation unit which evaluates a quality of each of reproduced speech signals from the input speech signal and the reproduced speech signals and which outputs an evaluated quality of each of the reproduced speech signals. The quality of each of the reproduced speech signals is evaluated in a state having no transmission error. A decision unit identifies one of the m coders which provides the reproduced speech signal having a smallest distortion on the basis of the evaluated quality of each of the reproduced speech signals, a current error rate of the transmission channel and error correcting abilities of the error correcting coders, and generates a coder identification number representative of a selected one of the m coders. An output part outputs a multiplexed transmission signal including the coded speech signal generated by the one of the m coders identified by the decision unit and the error correcting code generated by a corresponding one of the m error correcting coders.

Journal ArticleDOI
TL;DR: Development of context-dependent allophonic hidden Markov models (HMMs) implemented in a 75 000-word speaker-dependent Gaussian-HMM recognizer are reported, showing that when a large amount of data is used to train context- dependent HMMs, the word recognition error rate is reduced by 33%, compared with the context-independent HMMs.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The use of an acoustic subword unit (ASWU)-based speech recognition system for the recognition of isolated words is discussed and it is shown that the use of a modified k-means algorithm on the likelihoods derived through the Viterbi algorithm provides the best deterministic-type of word lexicon.
Abstract: The use of an acoustic subword unit (ASWU)-based speech recognition system for the recognition of isolated words is discussed. Some methods are proposed for generating the deterministic and the statistical types of word lexicon. It is shown that the use of a modified k-means algorithm on the likelihoods derived through the Viterbi algorithm provides the best deterministic-type of word lexicon. However, the ASWU-based speech recognizer leads to better performance with the statistical type of word lexicon than with the deterministic type. Improving the design of the word lexicon makes it possible to narrow the gap in the recognition performances of the whole word unit (WWU)-based and the ASWU-based speech recognizers considerably. Further improvements are expected by designing the word lexicon better.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The use of vocabulary-independent (VI) models to improve the usability of speech recognizers is described and initial results show that with more training data and more detailed modeling, the error rate of VI models can be reduced substantially.
Abstract: The use of vocabulary-independent (VI) models to improve the usability of speech recognizers is described. Initial results using generalized triphones as VI models show that with more training data and more detailed modeling, the error rate of VI models can be reduced substantially. For example, the error rates for VI models with 5000, 10000, and 15000 training sentences are 23.9%, 15.2%, and 13.3%, respectively. Moreover, if task-specific training data are available, one can interpolate them with VI models. This task adaptation can reduce the error rate by 18% over task-specific models.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A novel speech recognition system based on this theory in which the acoustic-to-phonetic mapping is accomplished by means of a particular form of hidden Markov model and is independent of lexical and syntactic constraint is described.
Abstract: A widely accepted linguistic theory holds that speech recognition in humans proceeds from an intermediate representation of the acoustic signal in terms of a small number of phonetic symbols. A novel speech recognition system based on this theory in which the acoustic-to-phonetic mapping is accomplished by means of a particular form of hidden Markov model and is independent of lexical and syntactic constraint is described. Word recognition is then treated as a classical string-to-string editing problem which is solved with a two-level dynamic programming algorithm that accounts for lexical and syntactic structure. The system was tested on speaker-independent recognition of fluent speech from the 991-word DARPA resource management task, on which 76.6% word accuracy was achieved. In informal tests it was observed that the phonetic transcription can be resynthesized to provide a 100-bit/s vocoder with word intelligibility rates of approximately 75%.
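The "classical string-to-string editing problem" invoked above is the same dynamic program that defines word error rate: Levenshtein distance over word sequences (substitutions, insertions, deletions), normalized by reference length. A standard sketch, with invented example sentences:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words divided by the reference length --
    the usual word error rate definition."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# one deletion ("all") and one substitution ("pacific" -> "atlantic"): WER = 2/6
print(word_error_rate("show all ships in the pacific",
                      "show ships in the atlantic"))
```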

Proceedings ArticleDOI
24 Jun 1990
TL;DR: The results suggest that intra-word and inter-word units should be modeled independently, even when they appear in the same context, and that the spectral vectors, corresponding to the same speech unit, behave differently statistically, depending on whether they are at word boundaries or within a word.
Abstract: We report on some recent improvements to an HMM-based, continuous speech recognition system which is being developed at AT&T Bell Laboratories. These advances, which include the incorporation of inter-word, context-dependent units and an improved feature analysis, lead to a recognition system which achieves better than 95% word accuracy for speaker independent recognition of the 1000-word, DARPA resource management task using the standard word-pair grammar (with a perplexity of about 60). It will be shown that the incorporation of inter-word units into training results in better acoustic models of word juncture coarticulation and gives a 20% reduction in error rate. The effect of an improved set of spectral and log energy features is to further reduce word error rate by about 30%. We also found that the spectral vectors, corresponding to the same speech unit, behave differently statistically, depending on whether they are at word boundaries or within a word. The results suggest that intra-word and inter-word units should be modeled independently, even when they appear in the same context. Using a set of sub-word units which included variants for intra-word and inter-word, context-dependent phones, an additional decrease of about 10% in word error rate resulted.

Patent
Akihiro Shikakura
06 Jun 1990
TL;DR: An error detection and correction device in which a word train including a number of error correction codes each constructed from a plurality of words is input, the error correction code being of the type that two error words can be corrected by each error correction code, as discussed by the authors.
Abstract: An error detection and correction device in which a word train including a number of error correction codes each constructed from a plurality of words is input, the error correction code being of the type that two error words can be corrected by each error correction code. Error words are corrected by using the error correction code within the word train. Mode setting information associated with an error rate of the word train is generated, and an error correction means is controlled in a first or a second error correction mode depending on the mode setting. The error correction means corrects one or two error words for each error correction code in the first error correction mode and corrects only one error word for each error correction code in the second error correction mode.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: Different speaker adaptation methods for speech recognition systems adapting automatically to new and unknown speakers in a short training phase are discussed, and the results show that in both systems speaker-adaptive error rates are close to speaker-dependent error rates.
Abstract: Different speaker adaptation methods for speech recognition systems adapting automatically to new and unknown speakers in a short training phase are discussed. The adaptation techniques aim at transformations of feature vectors, optimized with respect to some constraints. Two different adaptation strategies are discussed. The first one is based on least mean-squared-error optimization. The second method is a codebook-driven feature transformation. Both adaptation techniques are incorporated into two different recognition systems: dynamic time warping (DTW) and hidden Markov modeling (HMM). The results show that in both systems speaker-adaptive error rates are close to speaker-dependent error rates. In the best case the mean error rate of four test speakers decreases by a factor of six compared to the interspeaker error rate without adaptation. A hardware realization of the speaker-adaptive HMM-recognizer is described.

Journal ArticleDOI
TL;DR: In this paper, an analysis technique for complex SEP response has been developed that allows system analysis for the data in order to increase confidence in their thoroughness, and the resulting data are directly applicable to existing orbital error rate codes.
Abstract: Single event phenomena (SEP) responses of complex analog and digital integrated circuits (ICs) cannot be characterized using common (i.e. memory) techniques. An analysis technique for complex SEP response has been developed that allows system analysis for the data in order to increase confidence in their thoroughness. The resulting data are directly applicable to existing orbital error rate codes. The general analysis technique is the main thrust of this study. For a circuit with a complex SEP response, guidelines are established which allow a complete solution to the linear system by establishing relevant error categories, the raw error variables, and a method of error separation. When the data are presented in the framework of a complete linear system, the required independent analysis for system use can proceed with high confidence in the SEP data. The technique is used to analyze the SEP response of an analog-to-digital converter.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A spectral-estimation algorithm designed to improve the noise robustness of speech-recognition systems is presented and evaluated and is tailored for filter-bank-based systems, where the estimation seeks to minimize the distortion as measured by the recognizer's distance metric.
Abstract: A spectral-estimation algorithm designed to improve the noise robustness of speech-recognition systems is presented and evaluated. The algorithm is tailored for filter-bank-based systems, where the estimation seeks to minimize the distortion as measured by the recognizer's distance metric. This minimization is achieved by modeling the speech distribution as consisting of clusters; the energies at different frequency channels are assumed to be uncorrelated within each cluster. The algorithm was tested with a continuous-speech, speaker-independent hidden Markov model (HMM) recognition system using the NIST Resource Management Task speech database. When trained on a clean speech database and tested with additive white Gaussian noise, the recognition accuracy with the new algorithm is comparable to that under the ideal condition of training and testing at constant SNR. When trained on clean speech and tested with a desktop microphone in a noisy environment, the error rate is only slightly higher than that with a close-talking microphone.

Journal ArticleDOI
TL;DR: This article used context-dependent microsegmental hidden Markov models (HMMs) for six stop phonemes in word lists consisting of CVC words, which reduced the error rate by 35% compared with the result obtained by using one HMM for each stop phoneme.
Abstract: The motivation of this study is the poor performance of speech recognizers on the stop consonants. To overcome this weakness, word initial and word final stop consonants are modeled at a subphonemic (microsegmental) level. Each stop consonant is segmented into a few relatively stationary microsegments: silence, voice bar, burst, and aspiration. Microsegments of certain phonemically different stops are trained together due to their similar spectral properties. Microsegmental models of burst and aspiration are conditioned on the adjacent vowel category: front versus nonfront vowels. The resulting context‐dependent microsegmental hidden Markov models (HMMs) for six stops possess the desired properties for a compromise between modeling accuracy and modeling robustness. They allow the recognizer to focus discrimination onto those regions of a stop that serve to distinguish it from other stops. Use of these models in recognition experiments for word lists consisting of CVC words reduces the error rate by 35% compared with the result obtained by using one HMM for each stop phoneme.

Patent
28 Aug 1990
TL;DR: In this paper, a spelling check apparatus checks a spelling of a word entered from a keyboard, and also displays a word having a spelling similar to the spelling of the input word.
Abstract: A spelling check apparatus checks the spelling of a word entered from a keyboard, and also displays words whose spelling is similar to that of the input word. The input word is modified by way of plural preselected methods, and when a modified word coincides with a word stored in the word dictionary memory employed in this spelling check apparatus, the modified word is regarded as a similar word. A similar word is displayed each time the modification process for the input word by one preselected method has been accomplished.
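The modify-and-look-up scheme can be sketched with four common modification methods (deletion, transposition, substitution, insertion). The specific methods and the tiny dictionary below are assumptions for illustration; the patent does not enumerate its preselected methods.

```python
def similar_words(word, dictionary):
    """Apply preselected modification methods to the input word; any modified
    form found in the dictionary is offered as a similar word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    candidates = []
    # method 1: delete one character        ("worrd" -> "word")
    candidates += [word[:i] + word[i+1:] for i in range(len(word))]
    # method 2: transpose adjacent characters ("wrod" -> "word")
    candidates += [word[:i] + word[i+1] + word[i] + word[i+2:]
                   for i in range(len(word) - 1)]
    # method 3: substitute one character    ("wird" -> "word")
    candidates += [word[:i] + c + word[i+1:]
                   for i in range(len(word)) for c in letters]
    # method 4: insert one character        ("wod" -> "word")
    candidates += [word[:i] + c + word[i:]
                   for i in range(len(word) + 1) for c in letters]
    seen, out = set(), []
    for cand in candidates:
        if cand in dictionary and cand != word and cand not in seen:
            seen.add(cand)
            out.append(cand)
    return out

dictionary = {"word", "ward", "wood", "sword"}
print(similar_words("wrod", dictionary))
```

Candidates are generated method by method, mirroring the patent's notion of displaying similar words as each preselected modification method completes.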

Proceedings ArticleDOI
24 Jun 1990
TL;DR: Recent efforts to further improve the performance of the Sphinx system for speaker-independent continuous speech recognition are reported, with incorporation of additional dynamic features, semi-continuous hidden Markov models, and speaker clustering.
Abstract: The paper reports recent efforts to further improve the performance of the Sphinx system for speaker-independent continuous speech recognition. The recognition error rate is significantly reduced with incorporation of additional dynamic features, semi-continuous hidden Markov models, and speaker clustering. For the June 1990 (RM2) evaluation test set, the error rates of our current system are 4.3% and 19.9% for the word-pair grammar and no grammar, respectively.