Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published on this topic, receiving 298,031 citations.


Papers
Journal Article
TL;DR: This paper showed that competition between simultaneously active word candidates can modulate the size of prosodic effects, which suggests that spoken-word recognition must be sensitive both to prosodic structure and to the effects of competition.
Abstract: Spoken utterances contain few reliable cues to word boundaries, but listeners nonetheless experience little difficulty identifying words in continuous speech. The authors present data and simulations that suggest that this ability is best accounted for by a model of spoken-word recognition combining competition between alternative lexical candidates and sensitivity to prosodic structure. In a word-spotting experiment, stress pattern effects emerged most clearly when there were many competing lexical candidates for part of the input. Thus, competition between simultaneously active word candidates can modulate the size of prosodic effects, which suggests that spoken-word recognition must be sensitive both to prosodic structure and to the effects of competition. A version of the Shortlist model (D. G. Norris, 1994b) incorporating the Metrical Segmentation Strategy (A. Cutler & D. Norris, 1988) accurately simulates the results using a lexicon of more than 25,000 words.

267 citations

Posted Content
Wayne Xiong, Lingfeng Wu, Fileno A. Alleva, Jasha Droppo, Xuedong Huang, Andreas Stolcke
TL;DR: This paper describes the 2017 version of Microsoft's conversational speech recognition system, which adds a CNN-BLSTM acoustic model to the previously combined set of model architectures and includes character-based and dialog-session-aware LSTM language models in rescoring.
Abstract: We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.

266 citations
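
For readers unfamiliar with the metric behind the 5.1% figure above: word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring pipeline used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which real scoring tools usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.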

Journal Article
TL;DR: The authors propose a "sending or not sending" protocol based on twin-field quantum key distribution (TF-QKD) that can tolerate large misalignment error.
Abstract: Based on the novel idea of twin-field quantum key distribution [TF-QKD; Lucamarini et al., Nature (London) 557, 400 (2018)], we present a protocol named the "sending or not sending TF-QKD" protocol, which can tolerate large misalignment error. A revolutionary theoretical breakthrough in quantum communication, TF-QKD changes the channel-loss dependence of the key rate from linear to square root of channel transmittance. However, it demands the challenging technology of long-distance single-photon interference, and also, as stated in the original paper, the security proof was not finalized there due to the possible effects of the later announced phase information. Here we show by a concrete eavesdropping scheme that the later phase announcement does have important effects and the traditional formulas of the decoy-state method do not apply to the original protocol. We then present our "sending or not sending" protocol. Our protocol does not take postselection for the bits in the Z basis (signal pulses), and hence the traditional decoy-state method directly applies and automatically resolves the issue of security proof. Most importantly, our protocol presents a negligibly small error rate in the Z basis because it does not request any single-photon interference in this basis. Thus our protocol greatly improves the tolerable threshold of misalignment error in single-photon interference from the original few percent to more than 45%. As shown numerically, our protocol exceeds a secure distance of 700, 600, 500, or 300 km even though the single-photon interference misalignment error rate is as large as 15%, 25%, 35%, or 45%.

266 citations

Posted Content
TL;DR: A simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding, trained to output letters, without the need for force alignment of phonemes is presented.
Abstract: This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding. It is trained to output letters, with transcribed speech, without the need for force alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simpler. We show competitive results in word error rate on the Librispeech corpus with MFCC features, and promising results from raw waveform.

266 citations

Proceedings Article
20 Jun 2005
TL;DR: An algorithm with near optimal time and error rate trade-off is proposed, called WaldBoost, which integrates the AdaBoost algorithm for measurement selection and ordering and the joint probability density estimation with the optimal SPRT decision strategy.
Abstract: In many computer vision classification problems, both the error and the time characterize the quality of a decision. We show that such problems can be formalized in the framework of sequential decision-making. If the false positive and false negative error rates are given, the optimal strategy in terms of the shortest average time to decision (number of measurements used) is Wald's sequential probability ratio test (SPRT). We build on the optimal SPRT test and enlarge its capabilities to problems with dependent measurements. We show how to overcome the requirements of SPRT: (i) a priori ordered measurements and (ii) known joint probability density functions. We propose an algorithm with near optimal time and error rate trade-off, called WaldBoost, which integrates the AdaBoost algorithm for measurement selection and ordering and the joint probability density estimation with the optimal SPRT decision strategy. The WaldBoost algorithm is tested on the face detection problem. The results are superior to the state-of-the-art methods in the average evaluation time and comparable in detection rates.

264 citations
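
The sequential test that the abstract above builds on can be illustrated with a short sketch. This shows only the textbook form of Wald's SPRT with independent measurements and known log-likelihood ratios, using Wald's approximate thresholds; it is not WaldBoost, which the abstract says additionally selects and orders measurements with AdaBoost and estimates the densities. The function name `sprt`, the decision labels, and the sign convention (positive ratios favor H1) are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def sprt(log_likelihood_ratios: Iterable[float],
         alpha: float,
         beta: float) -> Tuple[str, int]:
    """Textbook SPRT over a stream of per-measurement log-likelihood ratios.

    Each ratio is log(p(x | H1) / p(x | H0)). `alpha` is the target false
    positive rate and `beta` the target false negative rate. Returns the
    decision and the number of measurements consumed.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")

    # Wald's approximate thresholds, in log space.
    upper = math.log((1.0 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1.0 - alpha))   # cross this: accept H0

    total = 0.0
    used = 0
    for used, llr in enumerate(log_likelihood_ratios, start=1):
        total += llr
        if total >= upper:
            return "accept_h1", used
        if total <= lower:
            return "accept_h0", used
    # Measurements ran out before either threshold was crossed.
    return "undecided", used


if __name__ == "__main__":
    # With alpha = beta = 0.05 the upper threshold is log(19), about 2.94,
    # so three measurements of +1.0 each are enough to accept H1.
    print(sprt([1.0, 1.0, 1.0, 1.0], alpha=0.05, beta=0.05))
```

The early exit is what yields the short average decision time: clear-cut inputs cross a threshold after a few measurements, while ambiguous ones consume more.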


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  271
2022  562
2021  640
2020  643
2019  633
2018  528