Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, the topic has accumulated 11,939 publications, which have received 298,031 citations.
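For reference, WER is conventionally computed by aligning a hypothesis transcript with a reference transcript using word-level edit distance: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N is the number of reference words. The sketch below assumes whitespace tokenization and a unit cost for every edit. The helper `relative_reduction` shows how "relative reduction in word error rate" figures, such as those quoted in the papers below, are usually derived; the 25% and 23% WER values in the usage lines are hypothetical, and both function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and unit cost for every edit.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word ref[i-1] is dropped
            insertion = cur[j - 1] + 1    # extra hypothesis word hyp[j-1]
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction; 0.08 means an 8% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")  # one substitution + one deletion over 6 reference words
print(f"relative reduction = {relative_reduction(0.25, 0.23):.2%}")
```

Because WER counts insertions, it can exceed 100% when the hypothesis is much longer than the reference.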


Papers
Proceedings Article
04 May 2020
TL;DR: Experiments show that the proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, successfully achieves a word error rate comparable to an oracle mask-based MVDR beamformer.
Abstract: Recent studies have shown that acoustic beamforming using a microphone array plays an important role in building high-performance automatic speech recognition (ASR) systems, especially for noisy and overlapping speech. In parallel with the success of multichannel beamforming for ASR, in the speech separation field the time-domain audio separation network (TasNet), which accepts a time-domain mixture as input and directly estimates the time-domain waveform of each source, achieves remarkable speech separation performance. In light of these two recent trends, the question naturally arises whether TasNet can benefit from beamforming to achieve high ASR performance on overlapping speech. Motivated by this question, this paper proposes a novel speech separation scheme, Beam-TasNet, which combines TasNet with a frequency-domain beamformer, namely a minimum variance distortionless response (MVDR) beamformer, through spatial covariance computation to achieve better ASR performance. Experiments on the spatialized WSJ0-2mix corpus show that the proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, achieves a word error rate comparable to that of an oracle mask-based MVDR beamformer.
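The abstract does not give the beamformer equations. As a rough sketch, assuming the widely used reference-channel MVDR formulation (often attributed to Souden et al.) rather than anything confirmed for Beam-TasNet, the weights for one frequency bin can be computed from a target spatial covariance Phi_S and an interference-plus-noise covariance Phi_N as w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S), where u selects the reference microphone. In a Beam-TasNet-style pipeline these covariances would be estimated from the STFTs of the separated multichannel signals. The function names, the toy steering vector and the toy noise covariance are hypothetical.

```python
import numpy as np


def spatial_covariance(stft_bin: np.ndarray) -> np.ndarray:
    """Average outer product x_t x_t^H over frames.

    stft_bin: complex array of shape (channels, frames) for one frequency bin.
    """
    return stft_bin @ stft_bin.conj().T / stft_bin.shape[1]


def mvdr_weights(phi_s: np.ndarray, phi_n: np.ndarray, ref: int = 0) -> np.ndarray:
    """Reference-channel MVDR: w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S)."""
    # Solve Phi_N X = Phi_S rather than forming the explicit inverse.
    ratio = np.linalg.solve(phi_n, phi_s)
    return ratio[:, ref] / np.trace(ratio)


def apply_beamformer(w: np.ndarray, mixture_bin: np.ndarray) -> np.ndarray:
    """Beamformer output y_t = w^H x_t for every frame of one frequency bin."""
    return w.conj() @ mixture_bin


# Toy check: rank-1 target covariance built from a steering vector d.
d = np.array([1.0, 0.5 + 0.5j, -0.3j])
phi_s = np.outer(d, d.conj())
phi_n = np.array([[2.0, 0.3j, 0.0],
                  [-0.3j, 1.5, 0.1],
                  [0.0, 0.1, 1.2]])  # Hermitian, positive definite
w = mvdr_weights(phi_s, phi_n)
print(np.vdot(w, d))  # ~ d[0]: the target at the reference mic passes undistorted
```

Using `np.linalg.solve` instead of `np.linalg.inv` is the usual choice for numerical stability; a full pipeline would repeat this per frequency bin and then apply an inverse STFT.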

75 citations

Journal Article
TL;DR: Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.

75 citations

Proceedings Article
04 May 2014
TL;DR: This work evaluates the proposed direct SC-based adaptation method on the large-scale 320-hr Switchboard task and shows that it yields up to an 8% relative reduction in word error rate using only a very small number of adaptation utterances per speaker.
Abstract: Recently, an effective fast speaker adaptation method using discriminative speaker codes (SC) has been proposed for hybrid DNN-HMM models in speech recognition [1]. This adaptation method relies on jointly learning a large generic adaptation neural network for all speakers together with multiple small speaker codes, using the standard back-propagation algorithm. In this paper, we propose an alternative direct adaptation in model space, where speaker codes are directly connected to the original DNN models through a set of new connection weights, which can be estimated very efficiently from all or part of the training data. As a result, the proposed method is more suitable for large-scale speech recognition tasks, since it eliminates the time-consuming process of training an additional adaptation neural network. In this work, we have evaluated the proposed direct SC-based adaptation method on the large-scale 320-hr Switchboard task. Experimental results show that the proposed SC-based rapid adaptation method is very effective not only for small recognition tasks but also for very large-scale tasks. For example, the proposed method yields up to an 8% relative reduction in word error rate on Switchboard using only a very small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is significantly reduced compared with the method in [1].
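A minimal sketch of the direct connection, assuming a sigmoid hidden layer: the speaker code s enters an existing layer through an added weight matrix B, so the pre-activation becomes W x + b + B s. The original weights W and b stay fixed; for a new speaker only s is updated by back-propagation from a few adaptation utterances. The names, the sigmoid activation, the squared-error toy loss and the choice of a single layer are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def adapted_hidden(x, W, b, B, s):
    """Hidden activations with speaker code s added through connection weights B."""
    return sigmoid(W @ x + b + B @ s)


def speaker_code_step(x, W, b, B, s, grad_h, lr=0.1):
    """One gradient step on s only; W, b and B are treated as frozen.

    grad_h is dLoss/dh for this layer's output, supplied by back-propagation.
    """
    h = adapted_hidden(x, W, b, B, s)
    grad_s = B.T @ (grad_h * h * (1.0 - h))  # chain rule through the sigmoid
    return s - lr * grad_s


rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # toy acoustic feature vector
W, b = rng.standard_normal((3, 4)), np.zeros(3)
B = rng.standard_normal((3, 2))       # new connection weights (trained beforehand)
s = np.zeros(2)                       # speaker code for a new speaker
target = np.array([0.9, 0.1, 0.5])    # hypothetical target activations

initial_loss = 0.5 * np.sum((adapted_hidden(x, W, b, B, s) - target) ** 2)
for _ in range(100):
    h = adapted_hidden(x, W, b, B, s)
    s = speaker_code_step(x, W, b, B, s, grad_h=h - target)  # grad of 0.5*||h - t||^2
final_loss = 0.5 * np.sum((adapted_hidden(x, W, b, B, s) - target) ** 2)
print(initial_loss, final_loss)
```

Since only the low-dimensional vector s changes per speaker, very few adaptation utterances are needed to estimate it, which matches the motivation given in the abstract.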

75 citations

Journal Article
TL;DR: It is shown that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers, and the contribution of each model component to different aspects of alignment and translation performance is identified.
Abstract: We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. This is a source-channel model of translation inspired by the Alignment Template translation model. The model attempts to overcome the deficiencies of word-to-word translation models by considering phrases rather than words as units of translation. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers. One of the benefits of using this framework is that it avoids the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We report and analyze bitext word alignment and translation performance on the Hansards French-English task and the FBIS Chinese-English task under the Alignment Error Rate, BLEU, NIST and Word Error Rate metrics. These experiments identify the contribution of each of the model components to different aspects of alignment and translation performance. We finally discuss translation performance with large bitext training sets on the NIST 2004 Chinese-English and Arabic-English MT tasks.
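Of the metrics listed, Alignment Error Rate is the least familiar outside machine translation. In its standard definition (Och and Ney), gold alignments are annotated as sure links S and possible links P, with S a subset of P, and a predicted alignment A scores AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below represents links as (source_index, target_index) pairs; it is a generic illustration with a hypothetical function name and toy links, not the evaluation code used in the paper.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    Sure links are also counted as possible links, since S is a subset of P.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    if not a and not s:
        raise ValueError("AER is undefined when both A and S are empty")
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


sure = {(0, 0), (1, 2)}
possible = {(2, 1)}
# All sure links are found and the extra link is a possible one, so AER is 0.0.
print(alignment_error_rate({(0, 0), (1, 2), (2, 1)}, sure, possible))
```

Unlike WER, AER does not penalize a predicted link that annotators marked only as possible, which is why the two sets are kept separate.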

75 citations

Posted Content
TL;DR: Finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model are provided; they can be used to analyze many aggregation methods, including majority voting, weighted majority voting and the oracle Maximum A Posteriori rule.
Abstract: Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task and to aggregate their labels in order to obtain high-quality results. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model. The bounds are derived for multi-class labeling and can be used to analyze many aggregation methods, including majority voting, weighted majority voting and the oracle Maximum A Posteriori (MAP) rule. We show that the oracle MAP rule approximately optimizes our upper bound on the mean error rate of weighted majority voting in certain settings. We propose an iterative weighted majority voting (IWMV) method that optimizes the error rate bound and approximates the oracle MAP rule. Its one-step version has a provable theoretical guarantee on the error rate. The IWMV method is intuitive and computationally simple. Experimental results on simulated and real data show that IWMV performs at least on par with state-of-the-art methods at a much lower computational cost (around one hundred times faster).
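A simplified sketch of iterative weighted majority voting, assuming L classes labeled 0..L-1: start from plain majority voting, estimate each worker's accuracy as agreement with the current consensus, turn accuracies into weights, and re-vote until the labels stop changing. The weight rule v = L * accuracy - 1 (so a worker at chance level, 1/L, gets weight 0 and a worse-than-chance worker gets a negative weight) is our assumption of a reasonable choice and may differ from the exact update in the paper; the function names and data layout are also hypothetical.

```python
from collections import Counter


def weighted_vote(votes, weights, num_classes):
    """votes: {item: [(worker, label), ...]} -> {item: label} by weighted argmax."""
    result = {}
    for item, item_votes in votes.items():
        score = [0.0] * num_classes
        for worker, label in item_votes:
            score[label] += weights[worker]
        # max() returns the first maximum, so ties go to the smallest label index
        result[item] = max(range(num_classes), key=lambda k: score[k])
    return result


def iterative_wmv(votes, num_classes, max_iter=50):
    workers = {w for item_votes in votes.values() for w, _ in item_votes}
    weights = {w: 1.0 for w in workers}  # first pass is plain majority voting
    labels = weighted_vote(votes, weights, num_classes)
    for _ in range(max_iter):
        agree, total = Counter(), Counter()
        for item, item_votes in votes.items():
            for worker, label in item_votes:
                total[worker] += 1
                agree[worker] += label == labels[item]
        # assumed weight rule: chance-level accuracy (1/L) maps to weight 0
        weights = {w: num_classes * agree[w] / total[w] - 1.0 for w in workers}
        new_labels = weighted_vote(votes, weights, num_classes)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, weights


votes = {
    0: [("a", 0), ("b", 0), ("c", 1)],
    1: [("a", 1), ("b", 1), ("c", 0)],
    2: [("a", 0), ("b", 0), ("c", 1)],
    3: [("a", 1), ("b", 1), ("c", 0)],
}
labels, weights = iterative_wmv(votes, num_classes=2)
print(labels)  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Each iteration costs one pass over the collected labels, which is consistent with the abstract's point that IWMV is computationally simple.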

75 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (88% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Convolutional neural network: 74.7K papers, 2M citations (85% related)
Artificial neural network: 207K papers, 4.5M citations (84% related)
Cluster analysis: 146.5K papers, 2.9M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2023  271
2022  562
2021  640
2020  643
2019  633
2018  528