Topic

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.


Papers
PatentDOI
TL;DR: In this article, a continuous speech recognition system with a speech processor and a word recognition computer subsystem is described, which is characterized by an element for developing a graph for confluent links between confluent nodes.
Abstract: A continuous speech recognition system having a speech processor and a word recognition computer subsystem, characterized by an element for developing a graph of confluent links between confluent nodes; an element for developing a graph of boundary links between adjacent words; an element for storing an inventory of confluent links and boundary links as a coding inventory; and an element for converting an unknown utterance into an encoded sequence of confluent links and boundary links corresponding to recognition sequences stored in the word recognition subsystem's recognition vocabulary. The invention also includes a method for achieving continuous speech recognition by characterizing speech as a sequence of confluent links which are matched with candidate words. The invention applies to isolated word speech recognition as well as continuous speech recognition, except that in the isolated case there are no boundary links.

68 citations

01 Jan 2015
TL;DR: The basic principles of ASR evaluation are summarized and the state of current ASR error detection and correction research is reviewed, with a focus on emerging techniques based on the word error rate metric.
Abstract: Even though Automatic Speech Recognition (ASR) has matured to the point of commercial applications, the high error rate in some speech recognition domains remains one of the main impediments to the wide adoption of speech technology, especially for continuous large-vocabulary speech recognition applications. The persistent presence of ASR errors has intensified the need for alternative techniques to automatically detect and correct such errors. Correcting transcription errors is crucial not only to improve speech recognition accuracy, but also to avoid propagating the errors to subsequent language processing modules such as machine translation. In this paper, the basic principles of ASR evaluation are first summarized, and the state of current ASR error detection and correction research is then reviewed. We focus on emerging techniques based on the word error rate metric.

67 citations
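The word error rate metric that the survey above builds on is conventionally computed as the word-level Levenshtein (edit) distance between a reference transcript and the recognizer's hypothesis, normalized by the reference length: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. A minimal self-contained sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                     # delete all remaining ref words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                     # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is one reason the error-detection literature treats it as a distance rather than a bounded accuracy score.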

Journal ArticleDOI
H.-K. J. Kuo, Chin-Hui Lee
TL;DR: This paper shows how discriminative training can significantly improve classifiers used in natural language processing, using as an example the task of natural language call routing, where callers are transferred to desired departments based on natural spoken responses to an open-ended "How may I direct your call?" prompt.
Abstract: This paper shows how discriminative training can significantly improve classifiers used in natural language processing, using as an example the task of natural language call routing, where callers are transferred to desired departments based on natural spoken responses to an open-ended "How may I direct your call?" prompt. With vector-based natural language call routing, callers are transferred using a routing matrix trained on statistics of occurrence of words and word sequences in a training corpus. By re-training the routing matrix parameters using a minimum classification error criterion, a relative error rate reduction of 10-30% was achieved on a banking task. Increased robustness was demonstrated in that with 10% rejection, the error rate was reduced by 40%. Discriminative training also improves portability; we were able to train call routers with the highest known performance using as input only text transcription of routed calls, without any human intervention or knowledge about what terms are important or irrelevant for the routing task. This strategy was validated with both the banking task and a more difficult task involving calls to operators in the UK. The proposed formulation is applicable to algorithms addressing a broad range of speech understanding, information retrieval, and topic identification problems.

67 citations
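The vector-based baseline described above routes a caller by comparing a bag-of-words vector of the spoken response against per-department rows of a routing matrix. A minimal sketch of that baseline, using cosine similarity; the departments, vocabulary, and training transcripts below are hypothetical, and the discriminative (minimum classification error) re-training step from the paper is omitted:

```python
import math
from collections import Counter

def bow_vector(text: str, vocab: list[str]) -> list[int]:
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u: list[int], v: list[int]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def route_call(query, departments, routing_matrix, vocab):
    """Transfer the caller to the department whose routing-matrix row
    is most similar to the query's bag-of-words vector."""
    q = bow_vector(query, vocab)
    scores = [cosine(q, row) for row in routing_matrix]
    return departments[scores.index(max(scores))]

# Hypothetical training data: one pooled transcript per department.
training = {
    "loans": "i need a mortgage what is the home loan rate",
    "cards": "my credit card was stolen please block the card",
}
departments = sorted(training)
vocab = sorted({w for text in training.values() for w in text.split()})
routing_matrix = [bow_vector(training[d], vocab) for d in departments]
```

For example, `route_call("what is the mortgage rate", departments, routing_matrix, vocab)` returns `"loans"`. Discriminative re-training, as in the paper, would adjust the routing-matrix weights to minimize classification errors rather than relying on raw occurrence counts.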

Proceedings Article
01 Jan 2003
TL;DR: This paper shows that using cross- and multilingual detectors to support an HMM based speech recognition system significantly reduces the word error rate.
Abstract: The use of articulatory features, such as place and manner of articulation, has been shown to reduce the word error rate of speech recognition systems under different conditions and in different settings. For example, recognition systems based on articulatory features are more robust to noise and reverberation. In earlier work we showed that articulatory features can compensate for inter-language variability and can be recognized across languages. In this paper we show that using cross- and multilingual detectors to support an HMM-based speech recognition system significantly reduces the word error rate. By selecting and weighting the features in a discriminative way, we achieve an error rate reduction that lies in the same range as that seen when using language-specific feature detectors. By combining feature detectors from many languages and training the weights discriminatively, we even outperform the case where only monolingual detectors are used.

67 citations

Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper presents a new latent semantic indexing (LSI) method for spoken audio documents in which the vectors are smoothed by the closest document clusters; this smoothing is important because the documents are often short and have a high word error rate (WER).
Abstract: This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is indexing broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP) and information retrieval (IR). For indexing, the documents are presented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).

67 citations
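The core LSI step in the pipeline above projects word-count document vectors into a low-dimensional semantic subspace via SVD, where documents on the same topic end up close together even when their surface vocabularies differ. A minimal sketch of that projection step (the toy counts below are invented, and the random-mapping and SOM smoothing stages are omitted):

```python
import numpy as np

# Toy term-document count matrix: rows are documents, columns are vocabulary
# terms. The first two documents share one topic, the last two another.
docs = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 3.0, 1.0],
])

# Project into a k-dimensional latent semantic subspace via truncated SVD.
k = 2
U, s, Vt = np.linalg.svd(docs, full_matrices=False)
latent = U[:, :k] * s[:k]   # document coordinates in the subspace

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

same_topic = cos(latent[0], latent[1])  # documents sharing a topic
diff_topic = cos(latent[0], latent[2])  # documents from different topics
```

In this toy example `same_topic` comes out close to 1 while `diff_topic` is close to 0, which is the clustering property the paper's SOM visualization and query smoothing rely on.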


Network Information

Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations, 88% related
- Feature extraction: 111.8K papers, 2.1M citations, 86% related
- Convolutional neural network: 74.7K papers, 2M citations, 85% related
- Artificial neural network: 207K papers, 4.5M citations, 84% related
- Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance Metrics

No. of papers in the topic in previous years:

Year | Papers
2023 | 271
2022 | 562
2021 | 640
2020 | 643
2019 | 633
2018 | 528