SciSpace
Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
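Concretely, word error rate is the Levenshtein (edit) distance between the reference and hypothesis word sequences, normalized by the reference length: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. A minimal self-contained sketch:

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + deletions + insertions) / len(ref),
    computed via Levenshtein edit distance over word sequences."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = minimum edits to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

# 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is a distance-based score rather than an accuracy.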


Papers
Proceedings ArticleDOI
07 May 1996
TL;DR: This work addresses efficiency issues associated with a search organization based on pronunciation prefix trees (PPTs) by presenting a mechanism that eliminates redundant computations in non-reentrant trees, a comparison of two methods for distributing language model probabilities in PPTs, and results on two look-ahead pruning strategies.
Abstract: The need for ever more efficient search organizations persists as the size and complexity of the knowledge sources used in continuous speech recognition (CSR) tasks continue to increase. We address efficiency issues associated with a search organization based on pronunciation prefix trees (PPTs). In particular, we present (1) a mechanism that eliminates redundant computations in non-reentrant trees, (2) a comparison of two methods for distributing language model probabilities in PPTs, and (3) results on two look-ahead pruning strategies. Using the 1994 DARPA 20k NAB word bigram for the male segment of si dev5m 92 (the 5k speaker-independent development test set for the WSJ), the error rate was 12.2% with a real-time factor of 1.0 on a 120 MHz Pentium.
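One common way to distribute LM probabilities in a prefix tree (the paper compares two methods; this sketch shows only the general idea, not its specific variants) is to smear each word's LM probability up the shared prefix, so every tree node stores the maximum probability over all words reachable below it and look-ahead pruning can act before the word identity is resolved. The phone strings, probabilities, and class names below are illustrative:

```python
# Toy pronunciation prefix tree with LM look-ahead (illustrative data).
class Node:
    def __init__(self):
        self.children = {}    # phone -> Node
        self.lookahead = 0.0  # max LM probability over words below this node

def build_tree(lexicon):
    """lexicon: dict mapping word -> (phone tuple, LM probability)."""
    root = Node()
    for word, (phones, prob) in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, Node())
            # Smear the LM probability up the tree: each node keeps the
            # best probability of any word whose pronunciation passes it.
            node.lookahead = max(node.lookahead, prob)
    return root

lexicon = {
    "cat": (("k", "ae", "t"), 0.02),
    "can": (("k", "ae", "n"), 0.05),
    "dog": (("d", "ao", "g"), 0.03),
}
root = build_tree(lexicon)
# The shared prefix k-ae carries the max over {cat, can} = 0.05.
print(root.children["k"].children["ae"].lookahead)
```

During decoding, a path's score can then include the node's look-ahead value, so hypotheses leading only to improbable words are pruned early.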

66 citations

Proceedings Article
01 Jan 1997
TL;DR: Preliminary experiments show that sharing acoustic models across the two languages has not resulted in improved performance, while sharing a backoff node at the LM component provides flexibility and ease in recognizing bilingual sentences at the expense of a slight increase in word error rate in some cases.
Abstract: This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized through sharing Gaussian codebooks across Swedish and English allophones. The language model (LM) components are constructed by training a statistical bigram model, with a common backoff node, on bilingual texts, and by combining two monolingual LMs into a probabilistic finite state grammar. This system uses a single decoder for Swedish and English sentences, and is capable of recognizing sentences with words from both languages. Preliminary experiments show that sharing acoustic models across the two languages has not resulted in improved performance, while sharing a backoff node at the LM component provides flexibility and ease in recognizing bilingual sentences at the expense of a slight increase in word error rate in some cases. As a by-product, the bilingual decoder also achieves good performance on language identification (LID).
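The common-backoff construction can be pictured as a bigram model that falls back to a single shared unigram node whenever a word pair is unseen, which is what permits transitions between the two languages. The vocabulary, probabilities, and backoff weight below are illustrative, not taken from the paper:

```python
import math

# Toy bigram LM with one shared backoff node across two languages.
bigrams = {("show", "flights"): 0.4}                       # seen pairs
unigram = {"show": 0.1, "flights": 0.1, "visa": 0.05, "flyg": 0.05}
BACKOFF_WEIGHT = 0.2  # probability mass reserved at the common backoff node

def bigram_logprob(prev, word):
    if (prev, word) in bigrams:
        return math.log(bigrams[(prev, word)])
    # Unseen bigram: route through the shared backoff node, which lets a
    # Swedish word follow an English one (and vice versa).
    return math.log(BACKOFF_WEIGHT * unigram[word])

print(bigram_logprob("show", "flights"))  # seen, same-language bigram
print(bigram_logprob("show", "flyg"))     # cross-language, via backoff
```

Because both vocabularies hang off the same backoff node, no special bilingual bigrams need to be trained for language switches; they inherit the backoff mass, at the cost of the slight WER increase the authors report.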

66 citations

Journal ArticleDOI
TL;DR: Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed, and could constitute a valuable tool for gene expression analysis in future studies.
Abstract: The purpose of gene expression analysis is to discriminate between classes of samples, and to predict the relative importance of each gene for sample classification. Microarray data with reference to gene expression profiles have provided some valuable results related to a variety of problems and contributed to advances in clinical medicine. Microarray data characteristically have a high dimension and a small sample size. This makes it difficult for a general classification method to obtain correct data for classification. However, not every gene is potentially relevant for distinguishing the sample class. Thus, in order to analyze gene expression profiles correctly, feature (gene) selection is crucial for the classification process, and an effective gene extraction method is necessary for eliminating irrelevant genes and decreasing the classification error rate. In this paper, correlation-based feature selection (CFS) and the Taguchi chaotic binary particle swarm optimization (TCBPSO) were combined into a hybrid method. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as a classifier for ten gene expression profiles. Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed. The proposed method achieved the lowest classification error rate on all ten of the gene expression data set problems tested. For six of the gene expression profile data sets, a classification error rate of zero could be reached. The introduced method outperformed five other methods from the literature in terms of classification error rate. It could thus constitute a valuable tool for gene expression analysis in future studies.
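The classifier/validation pair used here, K-NN with leave-one-out cross-validation, can be sketched in a few lines. The toy data and the choice K = 3 are illustrative, not from the paper:

```python
# K-NN with leave-one-out cross-validation (LOOCV): each sample is held
# out in turn and classified by majority vote of its K nearest neighbors
# among the remaining samples; the fraction misclassified is the error rate.
def knn_loocv_error(X, y, k=3):
    errors = 0
    for i in range(len(X)):
        # Squared Euclidean distances from the held-out sample to all others.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), y[j])
            for j in range(len(X)) if j != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        errors += predicted != y[i]
    return errors / len(X)

# Two well-separated clusters: LOOCV error should be zero.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = [0, 0, 0, 1, 1, 1]
print(knn_loocv_error(X, y))
```

LOOCV is a natural fit for microarray data precisely because the sample size is small: it uses all but one sample for training in every fold, at the cost of n classifier evaluations.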

66 citations

Patent
13 Jul 1994
TL;DR: In this article, a method of making a speech recognition model database is disclosed, which is formed based on a training string utterance signal and a plurality of sets of current speech recognition models.
Abstract: A method of making a speech recognition model database is disclosed. The database is formed based on a training string utterance signal and a plurality of sets of current speech recognition models. The sets of current speech recognition models may include acoustic models, language models, and other knowledge sources. In accordance with an illustrative embodiment of the invention, a set of confusable string models is generated, each confusable string model comprising speech recognition models from two or more sets of speech recognition models (such as acoustic and language models). A first scoring signal is generated based on the training string utterance signal and a string model for that utterance, wherein the string model for the utterance comprises speech recognition models from two or more sets of speech recognition models. One or more second scoring signals are also generated, wherein a second scoring signal is based on the training string utterance signal and a confusable string model. A misrecognition signal is generated based on the first scoring signal and the one or more second scoring signals. Current speech recognition models are modified, based on the misrecognition signal to increase the probability that a correct string model will have a rank order higher than other confusable string models.
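The scoring-and-misrecognition step resembles minimum classification error (MCE) training: the misrecognition signal compares the correct string model's score against a soft maximum over the confusable string models, and a sigmoid turns it into a differentiable loss that training can reduce. This is a hedged sketch of that general scheme; eta and the sigmoid slope are illustrative choices, not values from the patent:

```python
import math

def misrecognition_measure(correct_score, competitor_scores, eta=2.0):
    """MCE-style misrecognition signal: soft-max over confusable string
    models minus the correct string model's score (positive suggests a
    likely misrecognition)."""
    soft = math.log(sum(math.exp(eta * g) for g in competitor_scores)
                    / len(competitor_scores)) / eta
    return soft - correct_score

def smoothed_loss(d, slope=1.0):
    """Sigmoid-smoothed 0/1 loss of the misrecognition signal."""
    return 1.0 / (1.0 + math.exp(-slope * d))

# Correct string model scores higher (log-score -120) than its two
# confusable competitors, so d is negative and the loss is small.
d = misrecognition_measure(-120.0, [-125.0, -130.0])
print(d, smoothed_loss(d))
```

Model parameters would then be adjusted to decrease this loss, which is what pushes the correct string model's rank above the confusable ones.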

66 citations

Book ChapterDOI
31 Aug 2005
TL;DR: A zero-order local deformation model is employed to model the visual variability of video streams of American sign language (ASL) words and two possible ways of combining the model with the tangent distance used to compensate for affine global transformations are discussed.
Abstract: In this paper, we employ a zero-order local deformation model to model the visual variability of video streams of American sign language (ASL) words. We discuss two possible ways of combining the model with the tangent distance used to compensate for affine global transformations. The integration of the deformation model into our recognition system improves the error rate on a database of ASL words from 22.2% to 17.2%.

66 citations


Network Information
Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations (88% related)
- Feature extraction: 111.8K papers, 2.1M citations (86% related)
- Convolutional neural network: 74.7K papers, 2M citations (85% related)
- Artificial neural network: 207K papers, 4.5M citations (84% related)
- Cluster analysis: 146.5K papers, 2.9M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  271
2022  562
2021  640
2020  643
2019  633
2018  528