Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
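
For context on the metric itself: WER is the word-level Levenshtein (edit) distance between a recognizer's hypothesis and the reference transcript, normalized by the number of reference words, i.e. WER = (S + D + I) / N for S substitutions, D deletions and I insertions. The sketch below is a minimal, generic implementation of that definition; the function name and example strings are illustrative and not taken from any paper listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution and one deletion against four reference words -> WER = 0.5
print(word_error_rate("the cat sat on", "the dog sat"))
```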


Papers
Journal ArticleDOI
TL;DR: This paper describes how the design of the modified Kanerva model is derived from Kanerva's original theory, and develops a method to deal with the time-varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information.

63 citations

Proceedings ArticleDOI
03 Oct 1996
TL;DR: A new method for automatic detection of syllable nuclei is presented; two large spoken language corpora were labelled by three phoneticians and used to adjust the key parameters of the algorithm and to evaluate its error rate.
Abstract: Automatic syllable detection is an important task when analysing very large speech corpora in order to answer questions concerning prosody, rhythm, speech rate, speech recognition and synthesis. A new method for automatic detection of syllable nuclei is presented. Two large spoken language corpora (PhonDatII, Verbmobil) were labelled by three phoneticians and then used to adjust the key parameters of the algorithm and to evaluate its error rate. Additionally, parts of the corpora were used to test the inter- and intra-individual consistency of the transcribers. The evaluation of the algorithm currently shows an error rate of 12.87% for read speech and 21.03% for spontaneous speech. The inter-individual consistency of 95.8% might be considered as an upper limit for any automatic detection method.

63 citations
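
The abstract above reports the evaluation but does not spell out the detection algorithm itself. Purely as an illustration of the task, the sketch below implements a generic energy-peak picker of the kind commonly used for syllable nucleus detection; every parameter name and threshold here is an assumption, not a value from the paper.

```python
import numpy as np

def detect_syllable_nuclei(energy, frame_rate=100, min_gap_s=0.15, rel_threshold=0.3):
    """Return candidate syllable-nucleus times (in seconds) from a per-frame
    short-time energy contour: local maxima above a relative threshold,
    separated by a minimum gap."""
    threshold = rel_threshold * np.max(energy)
    min_gap = int(min_gap_s * frame_rate)   # minimum distance between nuclei, in frames
    nuclei = []
    for t in range(1, len(energy) - 1):
        is_peak = energy[t] > energy[t - 1] and energy[t] >= energy[t + 1]
        if is_peak and energy[t] > threshold and (not nuclei or t - nuclei[-1] >= min_gap):
            nuclei.append(t)
    return [t / frame_rate for t in nuclei]
```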

Proceedings ArticleDOI
02 Sep 2018
TL;DR: In this article, a quaternion-valued convolutional neural network (QCNN) was proposed for sequence-to-sequence mapping with the CTC model.
Abstract: Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time-frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.

63 citations
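
The abstract describes grouping each acoustic coefficient with related views into a quaternion and processing those quaternions with quaternion algebra. The sketch below illustrates only that grouping idea and the Hamilton product that quaternion-valued layers build on; the particular four-way grouping (static coefficient, its two derivatives, and frame energy) and the helper names are assumptions for illustration, not the paper's exact feature design.

```python
import numpy as np

def to_quaternion_features(static, delta, delta2, energy):
    """Group four related views of each time-frame coefficient into one quaternion
    (r, x, y, z). Inputs of shape (time, n_coeffs) -> output (time, n_coeffs, 4).
    The choice of the fourth component is an assumption for illustration."""
    return np.stack([static, delta, delta2, energy], axis=-1)

def hamilton_product(q, w):
    """Hamilton product of two quaternion arrays of shape (..., 4): the core
    operation a quaternion-valued layer applies between inputs and weights."""
    r1, x1, y1, z1 = np.moveaxis(q, -1, 0)
    r2, x2, y2, z2 = np.moveaxis(w, -1, 0)
    return np.stack([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,   # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,   # i
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,   # j
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,   # k
    ], axis=-1)
```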

Proceedings Article
01 Jan 2006
TL;DR: A simple word decomposition algorithm is introduced that requires only a text corpus and a predefined list of affixes; using it to create the lexicon for Iraqi Arabic ASR yields about a 10% relative improvement in word error rate (WER).
Abstract: Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR) this leads to a high out-of-vocabulary (OOV) rate for a typical lexicon size, and hence a potential increase in WER. This is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem we introduce a simple word decomposition algorithm which requires only a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for Iraqi Arabic ASR results in about a 10% relative improvement in word error rate (WER). Using the union of the segmented and unsegmented vocabularies and interpolating the corresponding language models yields a further WER reduction. The net WER improvement is about 13%.

63 citations
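
The abstract names the algorithm's inputs (a text corpus and a predefined affix list) but not its exact rules. The sketch below is a hypothetical illustration of this kind of affix-stripping decomposition: the transliterated affixes, the vocabulary check standing in for corpus statistics, and the minimum stem length are all assumptions, not the paper's actual lists or criteria.

```python
# Hypothetical transliterated affix lists, for illustration only.
PREFIXES = ["wa", "al", "bi", "li"]
SUFFIXES = ["ha", "hum", "at", "ya"]

def decompose(word, vocabulary, min_stem_len=3):
    """Strip one known prefix and/or suffix when the remaining stem is long enough
    and attested in the corpus vocabulary; otherwise keep the word whole."""
    for prefix in [""] + PREFIXES:
        for suffix in [""] + SUFFIXES:
            if not prefix and not suffix:
                continue  # nothing to strip
            if word.startswith(prefix) and word.endswith(suffix):
                stem = word[len(prefix):len(word) - len(suffix)] if suffix else word[len(prefix):]
                if len(stem) >= min_stem_len and stem in vocabulary:
                    return ([prefix + "+"] if prefix else []) + [stem] + (["+" + suffix] if suffix else [])
    return [word]

# Example (illustrative transliteration): with "kitab" in the corpus vocabulary,
# decompose("alkitab", {"kitab"}) -> ["al+", "kitab"]
```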

Journal ArticleDOI
TL;DR: A hidden-layer LSTM (H-LSTM) is proposed that adds hidden layers to the LSTM's original one-level nonlinear control gates, increasing accuracy while employing fewer external stacked layers and thus significantly reducing the number of parameters and run-time latency.
Abstract: Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to the LSTM's original one-level nonlinear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus significantly reducing the number of parameters and run-time latency. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning, speech recognition, and neural machine translation applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7× [floating-point operations (FLOPs) by 45.5×], reduce run-time latency by 4.5×, and improve the CIDEr-D score by 2.8 percent, respectively. For the DeepSpeech2 architecture on the AN4 dataset, the first model we generated reduces the number of parameters by 19.4× and run-time latency by 37.4 percent. The second model reduces the word error rate (WER) from 12.9 to 8.7 percent. For the encoder-decoder sequence-to-sequence network on the IWSLT 2014 German-English dataset, the first model we generated reduces the number of parameters by 10.8× and run-time latency by 14.2 percent. The second model increases the BLEU score from 30.02 to 30.98. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.

63 citations
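
The abstract states that each control gate gains hidden layers; the sketch below shows one way to express that idea in PyTorch, replacing the standard gate's single affine transform with a small two-layer network. It is a minimal sketch of the gate structure only: the ReLU inner layer, the layer sizes, and the cell wiring are assumptions, and the grow-and-prune training procedure is not shown.

```python
import torch
import torch.nn as nn

class HiddenLayerGate(nn.Module):
    """A control gate with an added hidden layer, in place of the standard LSTM
    gate's single affine transform followed by a sigmoid/tanh."""
    def __init__(self, input_size, hidden_size, activation=torch.sigmoid):
        super().__init__()
        self.inner = nn.Linear(input_size + hidden_size, hidden_size)
        self.outer = nn.Linear(hidden_size, hidden_size)
        self.activation = activation

    def forward(self, x, h_prev):
        z = torch.relu(self.inner(torch.cat([x, h_prev], dim=-1)))  # added hidden layer
        return self.activation(self.outer(z))

class HLSTMCell(nn.Module):
    """One H-LSTM-style cell in which all four gates use HiddenLayerGate."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.forget_gate = HiddenLayerGate(input_size, hidden_size)
        self.input_gate = HiddenLayerGate(input_size, hidden_size)
        self.output_gate = HiddenLayerGate(input_size, hidden_size)
        self.candidate = HiddenLayerGate(input_size, hidden_size, activation=torch.tanh)

    def forward(self, x, state):
        h_prev, c_prev = state
        c = self.forget_gate(x, h_prev) * c_prev + self.input_gate(x, h_prev) * self.candidate(x, h_prev)
        h = self.output_gate(x, h_prev) * torch.tanh(c)
        return h, (h, c)
```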


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528