
Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
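For reference, word error rate (WER) is conventionally computed as the word-level edit distance between a recognized transcript and its reference (substitutions + deletions + insertions), divided by the number of reference words. Below is a minimal sketch in plain Python, not tied to any particular system listed on this page:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a standard dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match / substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```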


Papers
Journal ArticleDOI
TL;DR: A novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the three classes of signal is obtained during a training session, and an LPC distance measure and an energy distance are nonlinearly combined to make the final discrimination.
Abstract: One of the most difficult problems in speech analysis is reliable discrimination among silence, unvoiced speech, and voiced speech which has been transmitted over a telephone line. Although several methods have been proposed for making this three-level decision, these schemes have met with only modest success. In this paper, a novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the three classes of signal is obtained during a training session, and an LPC distance measure and an energy distance are nonlinearly combined to make the final discrimination. This algorithm has been tested over conventional switched telephone lines, across a variety of speakers, and has been found to have an error rate of about 5 percent, with the majority of the errors (about two-thirds) occurring at the boundaries between signal classes. The algorithm is currently being used in a speaker-independent word recognition system.

73 citations
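The approach described in the abstract above lends itself to a simple distance-based formulation. The sketch below is illustrative only: it assumes per-class mean/variance statistics gathered in a training session and a made-up feature layout (spectral features plus a final log-energy term); the paper's actual LPC distance measure and its specific nonlinear combination with the energy distance are not reproduced.

```python
import numpy as np

# Illustrative sketch: a per-frame, three-class decision based on distances to
# class statistics learned in a training session. Feature layout and the
# combination rule are assumptions, not the paper's definitions.
CLASSES = ("silence", "unvoiced", "voiced")

def train_class_stats(frames_by_class):
    """frames_by_class: dict mapping class name -> (n_frames, n_features) array."""
    return {c: (x.mean(axis=0), x.std(axis=0) + 1e-8) for c, x in frames_by_class.items()}

def classify_frame(features, stats):
    """Combine a spectral distance (all but the last feature) and an energy
    distance (last feature) into one score per class; pick the smallest."""
    scores = {}
    for c, (mean, std) in stats.items():
        z = (features - mean) / std
        d_spectral = np.sqrt(np.mean(z[:-1] ** 2))   # stand-in for an LPC distance measure
        d_energy = abs(z[-1])                        # energy distance
        scores[c] = np.log1p(d_spectral) + np.log1p(d_energy)  # a simple nonlinear combination
    return min(scores, key=scores.get)
```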

Proceedings ArticleDOI
12 May 2019
TL;DR: This paper presents an RNN-free end-to-end model, the self-attention aligner (SAA), which applies self-attention networks to a simplified recurrent neural aligner (RNA) framework, and proposes a chunk-hopping mechanism that enables the SAA model to encode segmented frame chunks one after another to support online recognition.
Abstract: The self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks. However, it is not clear whether the self-attention network can be a good alternative to RNNs in automatic speech recognition (ASR), which processes longer speech sequences and may have online recognition requirements. In this paper, we present an RNN-free end-to-end model: the self-attention aligner (SAA), which applies self-attention networks to a simplified recurrent neural aligner (RNA) framework. We also propose a chunk-hopping mechanism, which enables the SAA model to encode segmented frame chunks one after another to support online recognition. Experiments on two Mandarin ASR datasets show that replacing RNNs with self-attention networks yields an 8.4%-10.2% relative character error rate (CER) reduction. In addition, the chunk-hopping mechanism limits the SAA to only a 2.5% relative CER degradation with a 320 ms latency. After joint training with a self-attention network language model, our SAA model obtains further error rate reductions on multiple datasets. In particular, it achieves 24.12% CER on the Mandarin ASR benchmark (HKUST), exceeding the best end-to-end model by over 2% absolute CER.

73 citations
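The chunk-hopping idea above can be illustrated independently of the SAA model itself: the input frame sequence is processed in fixed-size chunks, each carrying a small amount of left context, so encoding can start before the whole utterance is available. The chunk and hop sizes below are placeholders rather than the paper's values, and `encoder` stands for any per-frame encoder (for example, a self-attention block).

```python
import numpy as np

def encode_online(frames, encoder, chunk_size=64, hop_size=48):
    """Encode a frame sequence chunk by chunk (chunk-hopping sketch).
    Each chunk contains hop_size new frames plus (chunk_size - hop_size) frames
    of left context; only the outputs for the new frames are kept, so every frame
    is emitted exactly once and latency is bounded by the chunk length."""
    context = chunk_size - hop_size
    outputs = []
    for start in range(0, len(frames), hop_size):
        left = max(0, start - context)
        chunk = frames[left:start + hop_size]      # new frames plus left context
        encoded = encoder(chunk)                   # assumed to return one output per input frame
        outputs.append(encoded[start - left:])     # drop the context positions
    return np.concatenate(outputs, axis=0)

# Example with a dummy pass-through "encoder":
frames = np.random.randn(200, 40)                  # 200 frames of 40-dim features
out = encode_online(frames, encoder=lambda x: x)
assert out.shape == frames.shape
```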

Journal ArticleDOI
TL;DR: In this paper, a new technique is presented for searching digital audio at the word/phrase level, which combines high speed and accuracy, supports open vocabulary, imposes a low penalty for new words, permits phonetic and inexact spelling, enables a user-determined depth of search, and is amenable to parallel execution for highly scalable deployment.
Abstract: A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR), with their inherent problems of closed vocabulary and high word error rate, phonetic searching combines high speed and accuracy, supports open vocabulary, imposes a low penalty for new words, permits phonetic and inexact spelling, enables a user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one popular embodiment of LVCSR is presented, along with other operating characteristics of the new technique. The current implementation for Digital Media Asset Management (DMAM) is described, along with suggested applications in other domains.

72 citations
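As a rough illustration of how phonetic (rather than word-level) search stays open-vocabulary, the sketch below approximately matches a query's phoneme sequence against a phonetic transcript using edit distance. This is not the paper's algorithm: the phoneme symbols and sliding-window scoring are illustrative choices, the query phones are written by hand here, and a real system would derive them with a grapheme-to-phoneme converter and search phone lattices rather than a single transcript.

```python
# Illustrative sketch of open-vocabulary phonetic search by approximate matching.

def edit_distance(a, b):
    """Standard Levenshtein distance over symbol sequences (one-row DP)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def search_phonetic_index(query_phones, transcript_phones, max_cost=1):
    """Slide a window over the phonetic transcript and return start offsets whose
    phoneme window is within max_cost edits of the query (the query never has to
    appear in any recognizer lexicon)."""
    n = len(query_phones)
    hits = []
    for start in range(0, max(1, len(transcript_phones) - n + 1)):
        cost = edit_distance(query_phones, transcript_phones[start:start + n])
        if cost <= max_cost:
            hits.append((start, cost))
    return sorted(hits, key=lambda h: h[1])

# Example: an inexact pronunciation of "data" still matches the indexed phones.
index = ["dh", "ax", "d", "ey", "t", "ax", "ih", "z"]        # "the data is ..."
print(search_phonetic_index(["d", "ae", "t", "ax"], index))  # [(2, 1)]
```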

Proceedings ArticleDOI
07 May 1996
TL;DR: This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.
Abstract: This paper presents a new technique for overcoming several types of speech recognition (SR) errors by post-processing the output of a continuous speech recognizer. The post-processor output contains fewer errors, thereby making interpretation by higher-level modules, such as a parser, in a speech understanding system more reliable. The primary advantage of the post-processing approach over existing approaches for overcoming SR errors lies in its ability to introduce options that are not available in the SR module's output. This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.

72 citations
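The key property claimed above, that a post-processor can introduce words that never appear in the recognizer's output, can be illustrated with a deliberately simple repair step: replace recognized words that fall outside a task vocabulary with their closest in-vocabulary match. This is a sketch of the general idea only, not the paper's post-processor; the vocabulary and similarity cutoff are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical task vocabulary for illustration only.
DOMAIN_VOCAB = ["book", "flight", "boston", "tuesday", "morning", "cancel"]

def post_process(recognized_words, vocab=DOMAIN_VOCAB, cutoff=0.75):
    """Replace out-of-vocabulary words with their closest in-vocabulary match
    (by character similarity); the replacement may be a word the recognizer
    itself never proposed."""
    repaired = []
    for word in recognized_words:
        if word in vocab:
            repaired.append(word)
            continue
        candidates = get_close_matches(word, vocab, n=1, cutoff=cutoff)
        repaired.append(candidates[0] if candidates else word)  # keep the word if nothing is close
    return repaired

print(post_process(["look", "flight", "to", "bostin"]))
# ['book', 'flight', 'to', 'boston']  -- 'to' is left unchanged (no close match)
```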

Proceedings ArticleDOI
07 May 1996
TL;DR: A large vocabulary spontaneous dialogue speech recognizer using cross-word context constrained word graphs is proposed; the use of class bigram scores as the expected language score for each lexicon tree node decreases the word error rate by 25-30% compared to the case without approximation.
Abstract: This paper proposes a large vocabulary spontaneous dialogue speech recognizer using cross-word context constrained word graphs. In this method, two approximation methods, "cross-word context approximation" and "lenient language score smearing", are introduced to reduce the computational cost of word graph generation. The experimental results using a "travel arrangement corpus" show that this recognition method achieves a 25-40% reduction in word hypotheses and a 30-60% reduction in CPU time compared to the case without approximation, and that the use of class bigram scores as the expected language score for each lexicon tree node decreases the word error rate by 25-30%.

72 citations
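Language score smearing of the kind mentioned above is commonly realized as language-model look-ahead on the lexicon prefix tree: every tree node carries the best LM score of any word still reachable below it, so partial word hypotheses can be scored and pruned before the word ends. The sketch below uses unigram log-probabilities and characters in place of the paper's class bigram scores and phoneme arcs, and does not reproduce the "lenient" smearing rule.

```python
import math

def build_smeared_tree(lexicon_logprobs):
    """lexicon_logprobs: dict word -> LM log-probability (stand-in scores).
    Returns a nested dict tree; each node stores its children plus, under the
    key '_score', the max log-probability over all words reachable below it."""
    root = {"_score": -math.inf}
    for word, logp in lexicon_logprobs.items():
        node = root
        node["_score"] = max(node["_score"], logp)
        for ch in word:                                    # characters stand in for phonemes
            node = node.setdefault(ch, {"_score": -math.inf})
            node["_score"] = max(node["_score"], logp)     # smear the best word score upward
    return root

lexicon = {"ticket": math.log(0.02), "time": math.log(0.05), "train": math.log(0.01)}
tree = build_smeared_tree(lexicon)
# While expanding the prefix "t", the optimistic LM estimate is that of "time":
print(round(tree["t"]["_score"], 3))  # about -2.996, i.e. log(0.05)
```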


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (88% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Convolutional neural network: 74.7K papers, 2M citations (85% related)
Artificial neural network: 207K papers, 4.5M citations (84% related)
Cluster analysis: 146.5K papers, 2.9M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  271
2022  562
2021  640
2020  643
2019  633
2018  528