Topic

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing

[...]

Marc Delcroix¹, Tomohiro Nakatani¹, Shinji Watanabe¹•Institutions (1)

Nippon Telegraph and Telephone¹

01 Feb 2009-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A dereverberation method to reduce reverberation prior to recognition and introduces a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverbers and a speech recognizer.

...read moreread less

Abstract: The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects. However, it often introduces distortion, causing a dynamic mismatch between speech features and the acoustic model used for recognition. Model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with a dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverberation and a speech recognizer. The model parameters are optimized using adaptive training implemented with the expectation maximization algorithm. An experiment using the proposed method with reverberant speech for a reverberation time of 0.5 s revealed that it was possible to achieve an 80% reduction in the relative error rate compared with the recognition of dereverberated speech (word error rate of 31%), and the final error rate was 5.4%, which was obtained by combining the proposed variance compensation and MLLR adaptation.

...read moreread less

63 citations

Journal Article•DOI•

Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

[...]

Joost van Doremalen¹, Catia Cucchiarini¹, Helmer Strik¹•Institutions (1)

Radboud University Nijmegen¹

01 Jan 2010-Eurasip Journal on Audio, Speech, and Music Processing

TL;DR: Two experiments aimed at selecting utterances from lists of responses indicate that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%.

...read moreread less

Abstract: Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29-26% to 10-8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR) yield an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.

...read moreread less

63 citations

Patent•

Speech recognition apparatus and speech recognition navigation apparatus

[...]

Yoshikazu Hirayama, Yoshiyuki Kobayashi

11 Sep 2000

TL;DR: A speech recognition apparatus includes a speech input device, a storage device that stores a recognition word indicating a pronunciation of a word to undergo speech recognition, and a speech recognition processing device that performs speech recognition by comparing audio data obtained through the voice input device and speech recognition data created in correspondence to the recognition word as discussed by the authors.

...read moreread less

Abstract: A speech recognition apparatus includes: a speech input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo speech recognition; and a speech recognition processing device that performs speech recognition processing by comparing audio data obtained through the voice input device and speech recognition data created in correspondence to the recognition word, and the storage device stores both a first recognition word corresponding to a pronunciation of an entirety of the word to undergo speech recognition and a second recognition word corresponding to a pronunciation of only a starting portion of a predetermined length of the entirety of the word to undergo speech recognition as recognition words for the word to undergo speech recognition.

...read moreread less

62 citations

Proceedings Article•DOI•

Contextual RNN-T for Open Domain ASR.

[...]

Mahaveer Jain¹, Gil Keren¹, Jay Mahadeokar¹, Geoffrey Zweig¹, Florian Metze², Yatharth Saraf¹ - Show less +2 more•Institutions (2)

Facebook¹, Carnegie Mellon University²

25 Oct 2020

TL;DR: Modifications to the RNN-T model are proposed that allow the model to utilize additional metadata text with the objective of improving performance on Named Entities (WER-NE) for videos with related metadata.

...read moreread less

Abstract: End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN Transducer (RNN-T) and Listen-Attend-Spell (LAS) blend the individual components of a traditional hybrid ASR system - acoustic model, language model, pronunciation model - into a single neural network. While this has some nice advantages, it limits the system to be trained using only paired audio and text. Because of this, E2E models tend to have difficulties with correctly recognizing rare words that are not frequently seen during training, such as entity names. In this paper, we propose modifications to the RNN-T model that allow the model to utilize additional metadata text with the objective of improving performance on these named entity words. We evaluate our approach on an in-house dataset sampled from de-identified public social media videos, which represent an open domain ASR task. By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata.

...read moreread less

62 citations

Proceedings Article•DOI•

Feature extraction based on minimum classification error/generalized probabilistic descent method

[...]

A. Biem, Shigeru Katagiri

27 Apr 1993

TL;DR: Although the proposed discriminative feature extraction approach is a direct and simple extension of MCE/GPD, it is a significant departure from conventional approaches, providing a comprehensive basis for the entire system design.

...read moreread less

Abstract: A novel approach to pattern recognition which comprehensively optimizes both a feature extraction process and a classification process is introduced. Assuming that the best features for recognition are the ones that yield the lowest classification error rate over unknown data, an overall recognizer, consisting of a feature extractor module and a classifier module, is trained using the minimum classification error (MCE)/generalized probabilistic descent (GPD) method. Although the proposed discriminative feature extraction approach is a direct and simple extension of MCE/GPD, it is a significant departure from conventional approaches, providing a comprehensive basis for the entire system design. Experimental results are presented for the simple example of optimally designing a cepstrum representation for vowel recognition. The results clearly demonstrate the effectiveness of the proposed method. >

...read moreread less

62 citations

Collapse

Network Information

Performance

Metrics

12,777

Papers

335,740

Citations

No. of papers in the topic in previous years
Year	Papers
2023	271
2022	562
2021	640
2020	643
2019	633
2018	528

Word error rate

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics