Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
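
For context on the metric itself: WER is the word-level Levenshtein (edit) distance between a recognizer's hypothesis and the reference transcript, normalized by the number of reference words, i.e. WER = (S + D + I) / N for S substitutions, D deletions and I insertions. The sketch below is a minimal, generic implementation of that definition; the function name and example strings are illustrative and not taken from any paper listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution and one deletion against four reference words -> WER = 0.5
print(word_error_rate("the cat sat on", "the dog sat"))
```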


Papers
Journal ArticleDOI
TL;DR: This paper describes how the design of the modified Kanerva model is derived from Kanerva's original theory, and develops a method to deal with the time-varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information.

63 citations

Proceedings ArticleDOI
03 Oct 1996
TL;DR: A new method for automatic detection of syllable nuclei is presented; two large spoken language corpora were labelled by three phoneticians and used to adjust the key parameters of the algorithm and to evaluate its error rate.
Abstract: Automatic syllable detection is an important task when analysing very large speech corpora in order to answer questions concerning prosody, rhythm, speech rate, speech recognition and synthesis. A new method for automatic detection of syllable nuclei is presented. Two large spoken language corpora (PhonDatII, Verbmobil) were labelled by three phoneticians and then used to adjust the key parameters of the algorithm and to evaluate its error rate. Additionally, parts of the corpora were used to test the inter- and intra-individual consistency of the transcribers. The evaluation of the algorithm currently shows an error rate of 12.87% for read speech and 21.03% for spontaneous speech. The inter-individual consistency of 95.8% might be considered as an upper limit for any automatic detection method.

63 citations
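
The abstract above reports the evaluation but does not spell out the detection algorithm itself. Purely as an illustration of the task, the sketch below implements a generic energy-peak picker of the kind commonly used for syllable nucleus detection; every parameter name and threshold here is an assumption, not a value from the paper.

```python
import numpy as np

def detect_syllable_nuclei(energy, frame_rate=100, min_gap_s=0.15, rel_threshold=0.3):
    """Return candidate syllable-nucleus times (in seconds) from a per-frame
    short-time energy contour: local maxima above a relative threshold,
    separated by a minimum gap."""
    threshold = rel_threshold * np.max(energy)
    min_gap = int(min_gap_s * frame_rate)   # minimum distance between nuclei, in frames
    nuclei = []
    for t in range(1, len(energy) - 1):
        is_peak = energy[t] > energy[t - 1] and energy[t] >= energy[t + 1]
        if is_peak and energy[t] > threshold and (not nuclei or t - nuclei[-1] >= min_gap):
            nuclei.append(t)
    return [t / frame_rate for t in nuclei]
```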

Proceedings ArticleDOI
02 Sep 2018
TL;DR: In this article, a quaternion-valued convolutional neural network (QCNN) was proposed for sequence-to-sequence mapping with the CTC model.
Abstract: Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time-frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.

63 citations
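
The abstract describes grouping each acoustic coefficient with related views into a quaternion and processing those quaternions with quaternion algebra. The sketch below illustrates only that grouping idea and the Hamilton product that quaternion-valued layers build on; the particular four-way grouping (static coefficient, its two derivatives, and frame energy) and the helper names are assumptions for illustration, not the paper's exact feature design.

```python
import numpy as np

def to_quaternion_features(static, delta, delta2, energy):
    """Group four related views of each time-frame coefficient into one quaternion
    (r, x, y, z). Inputs of shape (time, n_coeffs) -> output (time, n_coeffs, 4).
    The choice of the fourth component is an assumption for illustration."""
    return np.stack([static, delta, delta2, energy], axis=-1)

def hamilton_product(q, w):
    """Hamilton product of two quaternion arrays of shape (..., 4): the core
    operation a quaternion-valued layer applies between inputs and weights."""
    r1, x1, y1, z1 = np.moveaxis(q, -1, 0)
    r2, x2, y2, z2 = np.moveaxis(w, -1, 0)
    return np.stack([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,   # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,   # i
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,   # j
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,   # k
    ], axis=-1)
```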

Proceedings Article
01 Jan 2006
TL;DR: A simple word decomposition algorithm is introduced that requires only a text corpus and a predefined list of affixes; using it to create the lexicon for Iraqi Arabic ASR yields about a 10% relative improvement in word error rate (WER).
Abstract: Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR) this leads to a high out-of-vocabulary (OOV) rate for a typical lexicon size, and hence a potential increase in WER. This is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem we introduce a simple word decomposition algorithm which requires only a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for Iraqi Arabic ASR results in about a 10% relative improvement in word error rate (WER). Using the union of the segmented and unsegmented vocabularies and interpolating the corresponding language models yields a further WER reduction. The net WER improvement is about 13%.

63 citations
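
The abstract names the algorithm's inputs (a text corpus and a predefined affix list) but not its exact rules. The sketch below is a hypothetical illustration of this kind of affix-stripping decomposition: the transliterated affixes, the vocabulary check standing in for corpus statistics, and the minimum stem length are all assumptions, not the paper's actual lists or criteria.

```python
# Hypothetical transliterated affix lists, for illustration only.
PREFIXES = ["wa", "al", "bi", "li"]
SUFFIXES = ["ha", "hum", "at", "ya"]

def decompose(word, vocabulary, min_stem_len=3):
    """Strip one known prefix and/or suffix when the remaining stem is long enough
    and attested in the corpus vocabulary; otherwise keep the word whole."""
    for prefix in [""] + PREFIXES:
        for suffix in [""] + SUFFIXES:
            if not prefix and not suffix:
                continue  # nothing to strip
            if word.startswith(prefix) and word.endswith(suffix):
                stem = word[len(prefix):len(word) - len(suffix)] if suffix else word[len(prefix):]
                if len(stem) >= min_stem_len and stem in vocabulary:
                    return ([prefix + "+"] if prefix else []) + [stem] + (["+" + suffix] if suffix else [])
    return [word]

# Example (illustrative transliteration): with "kitab" in the corpus vocabulary,
# decompose("alkitab", {"kitab"}) -> ["al+", "kitab"]
```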

Journal ArticleDOI
TL;DR: A hidden-layer LSTM (H-LSTM) is proposed that adds hidden layers to the LSTM's original one-level nonlinear control gates, increasing accuracy while employing fewer external stacked layers and thus significantly reducing the number of parameters and run-time latency.
Abstract: Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to the LSTM's original one-level nonlinear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus significantly reducing the number of parameters and run-time latency. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning, speech recognition, and neural machine translation applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7× [floating-point operations (FLOPs) by 45.5×], reduce run-time latency by 4.5×, and improve the CIDEr-D score by 2.8 percent, respectively. For the DeepSpeech2 architecture on the AN4 dataset, the first model we generated reduces the number of parameters by 19.4× and run-time latency by 37.4 percent. The second model reduces the word error rate (WER) from 12.9 to 8.7 percent. For the encoder-decoder sequence-to-sequence network on the IWSLT 2014 German-English dataset, the first model we generated reduces the number of parameters by 10.8× and run-time latency by 14.2 percent. The second model increases the BLEU score from 30.02 to 30.98. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.

63 citations
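
The abstract states that each control gate gains hidden layers; the sketch below shows one way to express that idea in PyTorch, replacing the standard gate's single affine transform with a small two-layer network. It is a minimal sketch of the gate structure only: the ReLU inner layer, the layer sizes, and the cell wiring are assumptions, and the grow-and-prune training procedure is not shown.

```python
import torch
import torch.nn as nn

class HiddenLayerGate(nn.Module):
    """A control gate with an added hidden layer, in place of the standard LSTM
    gate's single affine transform followed by a sigmoid/tanh."""
    def __init__(self, input_size, hidden_size, activation=torch.sigmoid):
        super().__init__()
        self.inner = nn.Linear(input_size + hidden_size, hidden_size)
        self.outer = nn.Linear(hidden_size, hidden_size)
        self.activation = activation

    def forward(self, x, h_prev):
        z = torch.relu(self.inner(torch.cat([x, h_prev], dim=-1)))  # added hidden layer
        return self.activation(self.outer(z))

class HLSTMCell(nn.Module):
    """One H-LSTM-style cell in which all four gates use HiddenLayerGate."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.forget_gate = HiddenLayerGate(input_size, hidden_size)
        self.input_gate = HiddenLayerGate(input_size, hidden_size)
        self.output_gate = HiddenLayerGate(input_size, hidden_size)
        self.candidate = HiddenLayerGate(input_size, hidden_size, activation=torch.tanh)

    def forward(self, x, state):
        h_prev, c_prev = state
        c = self.forget_gate(x, h_prev) * c_prev + self.input_gate(x, h_prev) * self.candidate(x, h_prev)
        h = self.output_gate(x, h_prev) * torch.tanh(c)
        return h, (h, c)
```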


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528