Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.

Papers

Proceedings Article•DOI•

Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-domain Beamformer

Tsubasa Ochiai¹, Marc Delcroix¹, Rintaro Ikeshita¹, Keisuke Kinoshita¹, Tomohiro Nakatani¹, Shoko Araki¹•Institutions (1)

Nippon Telegraph and Telephone¹

04 May 2020

TL;DR: Experiments show that the proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, successfully achieves a word error rate comparable to an oracle mask-based MVDR beamformer.

Abstract: Recent studies have shown that acoustic beamforming using a microphone array plays an important role in the construction of high-performance automatic speech recognition (ASR) systems, especially for noisy and overlapping speech conditions. In parallel with the success of multichannel beamforming for ASR, in the speech separation field, the time-domain audio separation network (TasNet), which accepts a time-domain mixture as input and directly estimates the time-domain waveforms for each source, achieves remarkable speech separation performance. In light of these two recent trends, the question of whether TasNet can benefit from beamforming to achieve high ASR performance in overlapping speech conditions naturally arises. Motivated by this question, this paper proposes a novel speech separation scheme, i.e., Beam-TasNet, which combines TasNet with the frequency-domain beamformer, i.e., a minimum variance distortionless response (MVDR) beamformer, through spatial covariance computation to achieve better ASR performance. Experiments on the spatialized WSJ0-2mix corpus show that our proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, successfully achieves a word error rate comparable to an oracle mask-based MVDR beamformer.
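The abstract says the beamformer is built "through spatial covariance computation". One common covariance-based MVDR formulation is sketched below; whether the paper uses exactly this form is an assumption, and all symbols are introduced here for illustration: \hat{\mathbf{x}}_S and \hat{\mathbf{x}}_N are multichannel estimates of the target and interference signals in the short-time Fourier domain (for example, derived from separated waveforms), \mathbf{y} is the observed mixture, and \mathbf{u} is a one-hot vector selecting the reference microphone.

```latex
% Spatial covariance matrices, averaged over T frames, per frequency bin f
\Phi_{\nu}(f) = \frac{1}{T} \sum_{t=1}^{T}
    \hat{\mathbf{x}}_{\nu}(t,f)\, \hat{\mathbf{x}}_{\nu}(t,f)^{\mathsf{H}},
    \qquad \nu \in \{S, N\}

% MVDR filter expressed through the two covariance matrices
\mathbf{w}(f) = \frac{\Phi_N(f)^{-1}\, \Phi_S(f)}
                     {\operatorname{tr}\!\left(\Phi_N(f)^{-1}\, \Phi_S(f)\right)}\, \mathbf{u}

% Beamformed output for the target source
\hat{s}(t,f) = \mathbf{w}(f)^{\mathsf{H}}\, \mathbf{y}(t,f)
```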

75 citations

Journal Article•DOI•

Improved MFCC-based feature for robust speaker identification

Zunjing Wu¹, Zhigang Cao¹•Institutions (1)

Tsinghua University¹

01 Apr 2005-Tsinghua Science & Technology

TL;DR: Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.

75 citations

Proceedings Article•DOI•

Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code

Shaofei Xue¹, Ossama Abdel-Hamid², Hui Jiang², Li-Rong Dai¹•Institutions (2)

University of Science and Technology of China¹, York University²

04 May 2014

TL;DR: This work has evaluated the proposed direct SC-based adaptation method in the large scale 320-hr Switchboard task and shown that the proposed method leads to up to 8% relative reduction in word error rate in Switchboard by using only a very small number of adaptation utterances per speaker.

Abstract: Recently an effective fast speaker adaptation method using discriminative speaker code (SC) has been proposed for the hybrid DNN-HMM models in speech recognition [1]. This adaptation method depends on a joint learning of a large generic adaptation neural network for all speakers as well as multiple small speaker codes using the standard back-propagation algorithm. In this paper, we propose an alternative direct adaptation in model space, where speaker codes are directly connected to the original DNN models through a set of new connection weights, which can be estimated very efficiently from all or part of training data. As a result, the proposed method is more suitable for large scale speech recognition tasks since it eliminates the time-consuming training process to estimate a separate adaptation neural network. In this work, we have evaluated the proposed direct SC-based adaptation method in the large scale 320-hr Switchboard task. Experimental results have shown that the proposed SC-based rapid adaptation method is very effective not only for small recognition tasks but also for very large scale tasks. For example, it has shown that the proposed method leads to up to 8% relative reduction in word error rate in Switchboard by using only a very small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is also significantly reduced from the method in [1].
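To make the "relative reduction in word error rate" figure concrete, here is a minimal sketch of how word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and how a relative reduction follows from two WER values. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only: one deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
# Illustrative values only: a drop from 25% to 23% absolute WER.
gain = relative_reduction(0.25, 0.23)
```

Note that a relative reduction is measured against the baseline WER, so the same relative figure corresponds to a smaller absolute change when the baseline is already low.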

75 citations

Journal Article•DOI•

A weighted finite state transducer translation template model for statistical machine translation

Shankar Kumar¹, Yonggang Deng¹, William Byrne¹•Institutions (1)

Johns Hopkins University¹

01 Mar 2006-Natural Language Engineering

TL;DR: It is shown that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers, and the contribution of each of the model components to different aspects of alignment and translation performance is identified.

Abstract: We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. This is a source-channel model of translation inspired by the Alignment Template translation model. The model attempts to overcome the deficiencies of word-to-word translation models by considering phrases rather than words as units of translation. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers. One of the benefits of using this framework is that it avoids the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We report and analyze bitext word alignment and translation performance on the Hansards French-English task and the FBIS Chinese-English task under the Alignment Error Rate, BLEU, NIST and Word Error-Rate metrics. These experiments identify the contribution of each of the model components to different aspects of alignment and translation performance. We finally discuss translation performance with large bitext training sets on the NIST 2004 Chinese-English and Arabic-English MT tasks.

75 citations

Posted Content•

Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing

Hongwei Li¹, Bin Yu¹•Institutions (1)

University of California, Berkeley¹

15 Nov 2014-arXiv: Machine Learning

TL;DR: Finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model are provided and can be used to analyze many aggregation methods, including majority voting, weighted majority voting and the oracle Maximum A Posteriori rule.

Abstract: Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to aggregate the labels in order to obtain results of high quality. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model. The bounds are derived for multi-class labeling, and can be used to analyze many aggregation methods, including majority voting, weighted majority voting and the oracle Maximum A Posteriori (MAP) rule. We show that the oracle MAP rule approximately optimizes our upper bound on the mean error rate of weighted majority voting in certain settings. We propose an iterative weighted majority voting (IWMV) method that optimizes the error rate bound and approximates the oracle MAP rule. Its one-step version has a provable theoretical guarantee on the error rate. The IWMV method is intuitive and computationally simple. Experimental results on simulated and real data show that IWMV performs at least on par with the state-of-the-art methods, and it has a much lower computational cost (around one hundred times faster) than the state-of-the-art methods.
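The idea of alternating between a weighted vote and a re-estimate of worker reliability can be sketched as follows. This is a simplified illustration in the spirit of the abstract, not the authors' exact algorithm: the input layout, the function name, the tie-breaking rule, and the weight rule (number of classes times estimated accuracy, minus one) are assumptions made for this example.

```python
from collections import defaultdict


def iterative_weighted_majority_vote(labels, num_classes, max_iters=20):
    """Aggregate crowd labels by alternating voting and weight estimation.

    labels: dict mapping (worker, item) -> class index in range(num_classes).
    Returns (predictions, weights).
    """
    workers = sorted({worker for worker, _ in labels})
    items = sorted({item for _, item in labels})
    weights = {worker: 1.0 for worker in workers}  # first pass = plain majority vote
    predictions = {}
    for _ in range(max_iters):
        # Step 1: weighted vote per item (ties go to the lowest class index).
        scores = {item: [0.0] * num_classes for item in items}
        for (worker, item), label in labels.items():
            scores[item][label] += weights[worker]
        new_predictions = {
            item: max(range(num_classes), key=item_scores.__getitem__)
            for item, item_scores in scores.items()
        }
        if new_predictions == predictions:
            break  # the vote no longer changes
        predictions = new_predictions
        # Step 2: estimate each worker's accuracy against the current predictions.
        correct = defaultdict(int)
        total = defaultdict(int)
        for (worker, item), label in labels.items():
            total[worker] += 1
            correct[worker] += int(label == predictions[item])
        # Step 3: assumed weight rule; a worker at chance level gets weight 0.
        for worker in workers:
            accuracy = correct[worker] / total[worker]
            weights[worker] = num_classes * accuracy - 1.0
    return predictions, weights


# Hypothetical toy data: workers "a" and "b" agree, worker "c" always disagrees.
toy_labels = {}
for item, true_label in enumerate([0, 1, 0, 1]):
    toy_labels[("a", item)] = true_label
    toy_labels[("b", item)] = true_label
    toy_labels[("c", item)] = 1 - true_label
toy_predictions, toy_weights = iterative_weighted_majority_vote(toy_labels, num_classes=2)
```

With this weight rule, a worker who consistently disagrees with the consensus ends up with a negative weight, so their votes count against the class they chose.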

75 citations

Network Information

Performance

Metrics

Papers: 12,777

Citations: 335,740

No. of papers in the topic in previous years
Year	Papers
2023	271
2022	562
2021	640
2020	643
2019	633
2018	528
