
Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
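For reference, word error rate (WER) is conventionally computed as the word-level edit distance between a recognized transcript and its reference (substitutions + deletions + insertions), divided by the number of reference words. Below is a minimal sketch in plain Python, not tied to any particular system listed on this page:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a standard dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match / substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```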


Papers
Journal ArticleDOI
TL;DR: A novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the three classes of signal is obtained during a training session, and an LPC distance measure and an energy distance are nonlinearly combined to make the final discrimination.
Abstract: One of the most difficult problems in speech analysis is reliable discrimination among silence, unvoiced speech, and voiced speech which has been transmitted over a telephone line. Although several methods have been proposed for making this three-level decision, these schemes have met with only modest success. In this paper, a novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the three classes of signal is obtained during a training session, and an LPC distance measure and an energy distance are nonlinearly combined to make the final discrimination. This algorithm has been tested over conventional switched telephone lines, across a variety of speakers, and has been found to have an error rate of about 5 percent, with the majority of the errors (about two-thirds) occurring at the boundaries between signal classes. The algorithm is currently being used in a speaker-independent word recognition system.

73 citations
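The approach described in the abstract above lends itself to a simple distance-based formulation. The sketch below is illustrative only: it assumes per-class mean/variance statistics gathered in a training session and a made-up feature layout (spectral features plus a final log-energy term); the paper's actual LPC distance measure and its specific nonlinear combination with the energy distance are not reproduced.

```python
import numpy as np

# Illustrative sketch: a per-frame, three-class decision based on distances to
# class statistics learned in a training session. Feature layout and the
# combination rule are assumptions, not the paper's definitions.
CLASSES = ("silence", "unvoiced", "voiced")

def train_class_stats(frames_by_class):
    """frames_by_class: dict mapping class name -> (n_frames, n_features) array."""
    return {c: (x.mean(axis=0), x.std(axis=0) + 1e-8) for c, x in frames_by_class.items()}

def classify_frame(features, stats):
    """Combine a spectral distance (all but the last feature) and an energy
    distance (last feature) into one score per class; pick the smallest."""
    scores = {}
    for c, (mean, std) in stats.items():
        z = (features - mean) / std
        d_spectral = np.sqrt(np.mean(z[:-1] ** 2))   # stand-in for an LPC distance measure
        d_energy = abs(z[-1])                        # energy distance
        scores[c] = np.log1p(d_spectral) + np.log1p(d_energy)  # a simple nonlinear combination
    return min(scores, key=scores.get)
```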

Proceedings ArticleDOI
12 May 2019
TL;DR: This paper presents an RNN-free end-to-end model, the self-attention aligner (SAA), which applies self-attention networks to a simplified recurrent neural aligner (RNA) framework, and proposes a chunk-hopping mechanism that enables the SAA model to encode segmented frame chunks one after another to support online recognition.
Abstract: The self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks. However, it is not clear whether the self-attention network can be a good alternative to RNNs in automatic speech recognition (ASR), which processes longer speech sequences and may have online recognition requirements. In this paper, we present an RNN-free end-to-end model: the self-attention aligner (SAA), which applies self-attention networks to a simplified recurrent neural aligner (RNA) framework. We also propose a chunk-hopping mechanism, which enables the SAA model to encode segmented frame chunks one after another to support online recognition. Experiments on two Mandarin ASR datasets show that replacing RNNs with self-attention networks yields an 8.4%-10.2% relative character error rate (CER) reduction. In addition, the chunk-hopping mechanism limits the SAA to only a 2.5% relative CER degradation with a 320 ms latency. After joint training with a self-attention network language model, our SAA model obtains further error rate reductions on multiple datasets. In particular, it achieves 24.12% CER on the Mandarin ASR benchmark (HKUST), exceeding the best end-to-end model by over 2% absolute CER.

73 citations
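The chunk-hopping idea above can be illustrated independently of the SAA model itself: the input frame sequence is processed in fixed-size chunks, each carrying a small amount of left context, so encoding can start before the whole utterance is available. The chunk and hop sizes below are placeholders rather than the paper's values, and `encoder` stands for any per-frame encoder (for example, a self-attention block).

```python
import numpy as np

def encode_online(frames, encoder, chunk_size=64, hop_size=48):
    """Encode a frame sequence chunk by chunk (chunk-hopping sketch).
    Each chunk contains hop_size new frames plus (chunk_size - hop_size) frames
    of left context; only the outputs for the new frames are kept, so every frame
    is emitted exactly once and latency is bounded by the chunk length."""
    context = chunk_size - hop_size
    outputs = []
    for start in range(0, len(frames), hop_size):
        left = max(0, start - context)
        chunk = frames[left:start + hop_size]      # new frames plus left context
        encoded = encoder(chunk)                   # assumed to return one output per input frame
        outputs.append(encoded[start - left:])     # drop the context positions
    return np.concatenate(outputs, axis=0)

# Example with a dummy pass-through "encoder":
frames = np.random.randn(200, 40)                  # 200 frames of 40-dim features
out = encode_online(frames, encoder=lambda x: x)
assert out.shape == frames.shape
```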

Journal ArticleDOI
TL;DR: In this paper, a new technique is presented for searching digital audio at the word/phrase level, which combines high speed and accuracy, supports open vocabulary, imposes a low penalty for new words, permits phonetic and inexact spelling, enables a user-determined depth of search, and is amenable to parallel execution for highly scalable deployment.
Abstract: A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR), with their inherent problems of closed vocabulary and high word error rate, phonetic searching combines high speed and accuracy, supports open vocabulary, imposes a low penalty for new words, permits phonetic and inexact spelling, enables a user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one popular embodiment of LVCSR is presented, along with other operating characteristics of the new technique. The current implementation for Digital Media Asset Management (DMAM) is described, along with suggested applications in other domains.

72 citations
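As a rough illustration of how phonetic (rather than word-level) search stays open-vocabulary, the sketch below approximately matches a query's phoneme sequence against a phonetic transcript using edit distance. This is not the paper's algorithm: the phoneme symbols and sliding-window scoring are illustrative choices, the query phones are written by hand here, and a real system would derive them with a grapheme-to-phoneme converter and search phone lattices rather than a single transcript.

```python
# Illustrative sketch of open-vocabulary phonetic search by approximate matching.

def edit_distance(a, b):
    """Standard Levenshtein distance over symbol sequences (one-row DP)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def search_phonetic_index(query_phones, transcript_phones, max_cost=1):
    """Slide a window over the phonetic transcript and return start offsets whose
    phoneme window is within max_cost edits of the query (the query never has to
    appear in any recognizer lexicon)."""
    n = len(query_phones)
    hits = []
    for start in range(0, max(1, len(transcript_phones) - n + 1)):
        cost = edit_distance(query_phones, transcript_phones[start:start + n])
        if cost <= max_cost:
            hits.append((start, cost))
    return sorted(hits, key=lambda h: h[1])

# Example: an inexact pronunciation of "data" still matches the indexed phones.
index = ["dh", "ax", "d", "ey", "t", "ax", "ih", "z"]        # "the data is ..."
print(search_phonetic_index(["d", "ae", "t", "ax"], index))  # [(2, 1)]
```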

Proceedings ArticleDOI
07 May 1996
TL;DR: This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.
Abstract: This paper presents a new technique for overcoming several types of speech recognition (SR) errors by post-processing the output of a continuous speech recognizer. The post-processor output contains fewer errors, thereby making interpretation by higher-level modules, such as a parser, in a speech understanding system more reliable. The primary advantage of the post-processing approach over existing approaches for overcoming SR errors lies in its ability to introduce options that are not available in the SR module's output. This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.

72 citations
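The key property claimed above, that a post-processor can introduce words that never appear in the recognizer's output, can be illustrated with a deliberately simple repair step: replace recognized words that fall outside a task vocabulary with their closest in-vocabulary match. This is a sketch of the general idea only, not the paper's post-processor; the vocabulary and similarity cutoff are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical task vocabulary for illustration only.
DOMAIN_VOCAB = ["book", "flight", "boston", "tuesday", "morning", "cancel"]

def post_process(recognized_words, vocab=DOMAIN_VOCAB, cutoff=0.75):
    """Replace out-of-vocabulary words with their closest in-vocabulary match
    (by character similarity); the replacement may be a word the recognizer
    itself never proposed."""
    repaired = []
    for word in recognized_words:
        if word in vocab:
            repaired.append(word)
            continue
        candidates = get_close_matches(word, vocab, n=1, cutoff=cutoff)
        repaired.append(candidates[0] if candidates else word)  # keep the word if nothing is close
    return repaired

print(post_process(["look", "flight", "to", "bostin"]))
# ['book', 'flight', 'to', 'boston']  -- 'to' is left unchanged (no close match)
```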

Proceedings ArticleDOI
07 May 1996
TL;DR: A large vocabulary spontaneous dialogue speech recognizer using cross-word context constrained word graphs is proposed; the use of class bigram scores as the expected language score for each lexicon tree node decreases the word error rate by 25-30% compared to the case without approximation.
Abstract: This paper proposes a large vocabulary spontaneous dialogue speech recognizer using cross-word context constrained word graphs. In this method, two approximation methods, "cross-word context approximation" and "lenient language score smearing", are introduced to reduce the computational cost of word graph generation. The experimental results using a "travel arrangement corpus" show that this recognition method achieves a 25-40% reduction in word hypotheses and a 30-60% reduction in CPU time compared to the case without approximation, and that the use of class bigram scores as the expected language score for each lexicon tree node decreases the word error rate by 25-30%.

72 citations
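Language score smearing of the kind mentioned above is commonly realized as language-model look-ahead on the lexicon prefix tree: every tree node carries the best LM score of any word still reachable below it, so partial word hypotheses can be scored and pruned before the word ends. The sketch below uses unigram log-probabilities and characters in place of the paper's class bigram scores and phoneme arcs, and does not reproduce the "lenient" smearing rule.

```python
import math

def build_smeared_tree(lexicon_logprobs):
    """lexicon_logprobs: dict word -> LM log-probability (stand-in scores).
    Returns a nested dict tree; each node stores its children plus, under the
    key '_score', the max log-probability over all words reachable below it."""
    root = {"_score": -math.inf}
    for word, logp in lexicon_logprobs.items():
        node = root
        node["_score"] = max(node["_score"], logp)
        for ch in word:                                    # characters stand in for phonemes
            node = node.setdefault(ch, {"_score": -math.inf})
            node["_score"] = max(node["_score"], logp)     # smear the best word score upward
    return root

lexicon = {"ticket": math.log(0.02), "time": math.log(0.05), "train": math.log(0.01)}
tree = build_smeared_tree(lexicon)
# While expanding the prefix "t", the optimistic LM estimate is that of "time":
print(round(tree["t"]["_score"], 3))  # about -2.996, i.e. log(0.05)
```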


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (88% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Convolutional neural network: 74.7K papers, 2M citations (85% related)
Artificial neural network: 207K papers, 4.5M citations (84% related)
Cluster analysis: 146.5K papers, 2.9M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  271
2022  562
2021  640
2020  643
2019  633
2018  528