Topic
Word error rate
About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.
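Word error rate itself is defined as the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch using Levenshtein distance over word tokens (function and example strings are illustrative, not from any paper below):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as Levenshtein edit distance over word tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.333
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions relative to the reference.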
Papers
TL;DR: This paper showed that competition between simultaneously active word candidates can modulate the size of prosodic effects, which suggests that spoken-word recognition must be sensitive both to prosodic structure and to the effects of competition.
Abstract: Spoken utterances contain few reliable cues to word boundaries, but listeners nonetheless experience little difficulty identifying words in continuous speech. The authors present data and simulations that suggest that this ability is best accounted for by a model of spoken-word recognition combining competition between alternative lexical candidates and sensitivity to prosodic structure. In a word-spotting experiment, stress pattern effects emerged most clearly when there were many competing lexical candidates for part of the input. Thus, competition between simultaneously active word candidates can modulate the size of prosodic effects, which suggests that spoken-word recognition must be sensitive both to prosodic structure and to the effects of competition. A version of the Shortlist model (D. G. Norris, 1994b) incorporating the Metrical Segmentation Strategy (A. Cutler & D. Norris, 1988) accurately simulates the results using a lexicon of more than 25,000 words.
267 citations
TL;DR: This article describes the 2017 version of Microsoft's conversational speech recognition system, which adds a CNN-BLSTM acoustic model to the previously combined set of model architectures and includes character-based and dialog-session-aware LSTM language models in rescoring.
Abstract: We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
266 citations
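The final stage of the combination above is word-level voting over confusion networks. A toy sketch of the voting idea, assuming hypotheses are already word-aligned (real confusion networks are built by aligning lattices and weighting candidates by posterior probability, which this simplification omits):

```python
from collections import Counter

def vote(aligned_hypotheses):
    """Pick the majority word in each slot of pre-aligned, equal-length
    hypotheses. A toy stand-in for confusion-network voting: real systems
    align lattices and weight each candidate word by its posterior."""
    slots = zip(*[h.split() for h in aligned_hypotheses])
    return " ".join(Counter(words).most_common(1)[0][0] for words in slots)

print(vote(["the cat sat", "the bat sat", "the cat sat"]))  # -> "the cat sat"
```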
TL;DR: In this article, the authors proposed a "sending or not sending" protocol based on twin-field quantum key distribution (TF-QKD) that can tolerate large misalignment errors.
Abstract: Based on the novel idea of twin-field quantum key distribution [TF-QKD; Lucamarini et al., Nature (London) 557, 400 (2018)], we present a protocol named the "sending or not sending TF-QKD" protocol, which can tolerate large misalignment error. A revolutionary theoretical breakthrough in quantum communication, TF-QKD changes the channel-loss dependence of the key rate from linear to the square root of the channel transmittance. However, it demands the challenging technology of long-distance single-photon interference, and, as stated in the original paper, the security proof was not finalized there due to the possible effects of the later-announced phase information. Here we show by a concrete eavesdropping scheme that the later phase announcement does have important effects and that the traditional formulas of the decoy-state method do not apply to the original protocol. We then present our "sending or not sending" protocol. Our protocol does not take postselection for the bits in the Z basis (signal pulses), so the traditional decoy-state method applies directly and automatically resolves the issue of the security proof. Most importantly, our protocol has a negligibly small error rate in the Z basis because it does not require any single-photon interference in this basis. Thus our protocol greatly improves the tolerable threshold of misalignment error in single-photon interference, from a few percent in the original protocol to more than 45%. As shown numerically, our protocol exceeds a secure distance of 700, 600, 500, or 300 km even when the single-photon interference misalignment error rate is as large as 15%, 25%, 35%, or 45%, respectively.
266 citations
TL;DR: A simple end-to-end model for speech recognition is presented, combining a convolutional-network-based acoustic model and graph decoding, trained to output letters without the need for forced alignment of phonemes.
Abstract: This paper presents a simple end-to-end model for speech recognition, combining a convolutional-network-based acoustic model and graph decoding. It is trained to output letters from transcribed speech, without the need for forced alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simpler. We show competitive word error rates on the Librispeech corpus with MFCC features, and promising results from the raw waveform.
266 citations
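For letter-output models like the one above, a decoder typically collapses the per-frame letter sequence into a transcript. A minimal sketch of greedy CTC-style collapsing (merge consecutive repeats, then drop the blank symbol), offered only to illustrate the general alignment-free decoding idea, not the paper's specific criterion:

```python
def ctc_greedy_collapse(frame_labels, blank="_"):
    """Greedy CTC-style decoding: merge consecutive repeated labels,
    then remove blank symbols."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

print(ctc_greedy_collapse(list("hh_e_ll_lo__")))  # -> "hello"
```

The blank symbol lets the model emit genuine doubled letters ("ll" in "hello") by separating them across frames.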
20 Jun 2005
TL;DR: An algorithm with a near-optimal time and error-rate trade-off, called WaldBoost, is proposed; it integrates the AdaBoost algorithm for measurement selection and ordering, and joint probability density estimation, with the optimal SPRT decision strategy.
Abstract: In many computer vision classification problems, both error and time characterize the quality of a decision. We show that such problems can be formalized in the framework of sequential decision-making. If the false positive and false negative error rates are given, the optimal strategy in terms of the shortest average time to decision (number of measurements used) is Wald's sequential probability ratio test (SPRT). We build on the optimal SPRT and enlarge its capabilities to problems with dependent measurements. We show how to overcome the requirements of SPRT: (i) a priori ordered measurements and (ii) known joint probability density functions. We propose an algorithm with a near-optimal time and error-rate trade-off, called WaldBoost, which integrates the AdaBoost algorithm for measurement selection and ordering, and the joint probability density estimation, with the optimal SPRT decision strategy. The WaldBoost algorithm is tested on the face detection problem. The results are superior to the state-of-the-art methods in average evaluation time and comparable in detection rates.
264 citations
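The SPRT underlying WaldBoost accumulates log-likelihood ratios and stops as soon as the running sum crosses thresholds set by the target false positive rate (alpha) and false negative rate (beta). A minimal sketch of the classic test (the function name and example values are illustrative):

```python
import math

def sprt(log_likelihood_ratios, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test: accumulate log-likelihood
    ratios and decide as soon as the sum crosses a threshold.
    alpha = target false positive rate, beta = target false negative rate."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    s = 0.0
    for t, llr in enumerate(log_likelihood_ratios, start=1):
        s += llr
        if s >= upper:
            return "H1", t  # decided for H1 after t measurements
        if s <= lower:
            return "H0", t  # decided for H0 after t measurements
    return "undecided", len(log_likelihood_ratios)

print(sprt([1.2, 0.8, 1.5, 1.3]))  # consistent positive evidence -> ('H1', 4)
```

WaldBoost replaces the known likelihood ratios assumed here with ratios estimated from AdaBoost's weak-classifier responses, which is what makes the test applicable to face detection.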