Topic
Word error rate
About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.
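The metric itself is the word-level Levenshtein (edit) distance between a reference transcript and a hypothesis, normalized by the reference length: WER = (S + D + I) / N, where S, D, I count substitutions, deletions, and insertions. A minimal Python sketch of the standard dynamic-programming computation:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference
    length, computed as word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion: 1/6
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.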
Papers published on a yearly basis
Papers
12 Apr 2018
TL;DR: The authors explore different topologies and variants of the attention layer, compare different pooling methods on the attention weights, and show that attention-based models can improve the Equal Error Rate (EER) of a speaker verification system by a relative 14% over a non-attention LSTM baseline.
Abstract: Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that expands through the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for the problem of sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies and their variants of the attention layer, and compare different pooling methods on the attention weights. Ultimately, we show that attention-based models can improve the Equal Error Rate (EER) of our speaker verification system by a relative 14% compared to our non-attention LSTM baseline model.
174 citations
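The sequence summarization described above can be sketched as a scalar score per frame, a softmax over those scores, and a weighted sum of the frames. This is one of the simplest attention topologies; the scoring vector `w` and the dimensions below are illustrative, not taken from the paper:

```python
import numpy as np

def attention_pool(frames, w):
    """Attention pooling over frame-level features (T, D): one scalar
    score per frame from scoring vector w, softmax to weights, then a
    weighted sum -> a single utterance-level embedding (D,)."""
    scores = frames @ w                       # (T,)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ frames                   # (D,) summary vector

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))  # 50 hypothetical LSTM output frames, 8-dim
w = rng.normal(size=8)             # hypothetical learned scoring vector
embedding = attention_pool(frames, w)
print(embedding.shape)  # (8,)
```

In a trained system, `w` (or a small scoring network in its place) is learned jointly with the rest of the model; the paper additionally compares pooling variants applied to the attention weights themselves.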
•
TL;DR: In this article, the reconstruction error of the nonlinear responses is minimized subject to a low-rank constraint that reduces the complexity of the filters, and an algorithm is presented for reducing the accumulated error when multiple layers are approximated.
Abstract: This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint which helps to reduce the complexity of filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated. A whole-model speedup ratio of 4x is demonstrated on a large network trained for ImageNet, while the top-5 error rate is only increased by 0.9%. Our accelerated model has a comparably fast speed as the "AlexNet", but is 4.7% more accurate.
173 citations
••
06 Jul 2002
TL;DR: A method for constructing a word graph that represents alternative translation hypotheses efficiently, so that these hypotheses can be rescored with a refined language or translation model.
Abstract: Statistical machine translation systems usually compute the single sentence that has the highest probability according to the models that are trained on data. We describe a method for constructing a word graph to represent alternative hypotheses in an efficient way. The advantage is that these hypotheses can be rescored using a refined language or translation model. Results are presented on the German-English Verbmobil corpus.
173 citations
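A word graph compactly encodes many hypotheses as paths through a shared DAG; rescoring amounts to swapping in refined edge scores and re-searching for the best path. A toy sketch (the node IDs, words, and log-scores below are made up for illustration; real lattices also carry time information):

```python
from functools import lru_cache

def best_path(edges, start, end):
    """edges: dict node -> list of (next_node, word, log_score).
    Returns (total_score, word_sequence) for the highest-scoring
    path from start to end in a word graph (DAG)."""
    @lru_cache(maxsize=None)
    def solve(node):
        if node == end:
            return 0.0, ()
        best = (float("-inf"), ())
        for nxt, word, score in edges.get(node, []):
            sub_score, sub_words = solve(nxt)
            cand = (score + sub_score, (word,) + sub_words)
            if cand[0] > best[0]:
                best = cand
        return best
    return solve(start)

# Two hypotheses sharing a prefix, as a tiny word graph:
edges = {0: [(1, "we", -0.1)],
         1: [(2, "see", -0.9), (2, "sea", -0.7)],
         2: []}
score, words = best_path(edges, 0, 2)
print(words)  # ('we', 'sea')
```

Because shared prefixes and suffixes are stored once, the graph can hold exponentially many hypotheses in linear space, which is what makes rescoring with a heavier model affordable.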
••
06 Sep 2015
TL;DR: Several integration architectures are proposed and tested, including a pipeline architecture of LSTM-based SE and ASR with sequence training, an alternating estimation architecture, and a multi-task hybrid LSTM network architecture.
Abstract: The Long Short-Term Memory (LSTM) recurrent neural network has proven effective in modeling speech and has achieved outstanding performance in both speech enhancement (SE) and automatic speech recognition (ASR). To further improve the performance of noise-robust speech recognition, a combination of speech enhancement and recognition was shown to be promising in earlier work. This paper aims to explore options for consistent integration of SE and ASR using LSTM networks. Since SE and ASR have different objective criteria, it is not clear what kind of integration would finally lead to the best word error rate for noise-robust ASR tasks. In this work, several integration architectures are proposed and tested, including: (1) a pipeline architecture of LSTM-based SE and ASR with sequence training, (2) an alternating estimation architecture, and (3) a multi-task hybrid LSTM network architecture. The proposed models were evaluated on the 2nd CHiME speech separation and recognition challenge task and show significant improvements relative to prior results.
173 citations
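Two of the three integration options above can be caricatured in a few lines: the pipeline feeds the enhancer's output into the recognizer, while the multi-task variant trains a shared network on a weighted sum of both objectives. The stand-in functions and the weight `alpha` below are illustrative, not the paper's:

```python
def pipeline(enhance, recognize, noisy):
    """(1) Pipeline: enhance the noisy signal first, then recognize it."""
    return recognize(enhance(noisy))

def multitask_loss(se_loss, asr_loss, alpha=0.5):
    """(3) Multi-task: shared lower layers trained on a weighted
    combination of the SE and ASR objectives (alpha is a hypothetical
    mixing weight)."""
    return alpha * se_loss + (1 - alpha) * asr_loss

# Stand-ins: a toy "enhancer" that scales the signal and a toy
# "recognizer" that just reports its length.
result = pipeline(lambda x: [v * 0.5 for v in x],
                  lambda x: len(x),
                  [0.2, 0.8, 0.1])
print(result)                            # 3
print(multitask_loss(0.8, 0.4))          # ~0.6
```

The paper's open question is precisely which of these couplings minimizes word error rate, since the SE objective (signal fidelity) and the ASR objective (recognition accuracy) can pull the shared network in different directions.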
••
03 Oct 1996
TL;DR: A new highly parallel approach to automatic speech recognition, inspired by Fletcher's early research on the articulation index and based on independent probability estimates in several sub-bands of the available speech spectrum.
Abstract: A new highly parallel approach to automatic recognition of speech, inspired by Fletcher's early research on the articulation index, and based on independent probability estimates in several sub-bands of the available speech spectrum, is presented. The approach is especially suitable for situations when part of the spectrum of speech is corrupted. In such cases, it can yield an order-of-magnitude improvement in the error rate over a conventional full-band recognizer.
172 citations
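The multi-band idea can be sketched as: estimate class scores independently in each sub-band, then recombine them, for example by a weighted sum of per-band log-probabilities. This is one simple recombination rule from the family such systems use; the band weights and the numbers below are illustrative:

```python
import numpy as np

def recombine_subbands(band_logprobs, weights=None):
    """Merge independent per-band class log-probability estimates
    (num_bands, num_classes) into full-band class scores by a
    weighted sum in the log domain. A corrupted band can simply be
    down-weighted instead of poisoning the whole estimate."""
    band_logprobs = np.asarray(band_logprobs)
    if weights is None:  # default: uniform band weights
        weights = np.ones(len(band_logprobs)) / len(band_logprobs)
    return weights @ band_logprobs  # (num_classes,) combined scores

# Two bands, two classes: both bands favor class 0.
scores = recombine_subbands([[-1.0, -2.0],
                             [-0.5, -3.0]])
print(scores.argmax())  # 0
```

The robustness claim in the abstract follows from this structure: when only one band is corrupted by noise, its estimator can be ignored or down-weighted while the remaining bands still carry enough evidence to classify correctly, unlike a single full-band estimator.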