scispace - formally typeset

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have appeared on this topic, receiving 298,031 citations.


Papers
Journal Article
TL;DR: This work focuses on the two-class problem and uses a general definition of conditional entropy (including Shannon's as a special case) to derive upper/lower bounds on the optimal F-score, BER and cost-sensitive risk, extending Fano's result.
Abstract: Fano's inequality lower bounds the probability of transmission error through a communication channel. Applied to classification problems, it provides a lower bound on the Bayes error rate and motivates the widely used Infomax principle. In modern machine learning, we are often interested in more than just the error rate. In medical diagnosis, different errors incur different costs; hence, the overall risk is cost-sensitive. Two other popular criteria are the balanced error rate (BER) and the F-score. In this work, we focus on the two-class problem and use a general definition of conditional entropy (including Shannon's as a special case) to derive upper/lower bounds on the optimal F-score, BER, and cost-sensitive risk, extending Fano's result. As a consequence, we show that Infomax is not suitable for optimizing F-score or cost-sensitive risk, in that it can potentially lead to low F-score and high risk. For cost-sensitive risk, we propose a new conditional entropy formulation which avoids this inconsistency. In addition, we consider the common practice of using a threshold on the posterior probability to tune the performance of a classifier. As is widely known, a threshold of 0.5, where the posteriors cross, minimizes the error rate; we derive similar optimal thresholds for F-score and BER.

64 citations
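The thresholding observation in the abstract above can be checked numerically. Below is a minimal sketch on synthetic data (the posteriors and labels are invented for illustration, not taken from the paper): among a few candidate thresholds on the posterior probability, 0.5 attains the lowest error rate, while the F-score-optimal threshold can differ in general.

```python
# Toy demonstration: a 0.5 threshold on the posterior minimizes error rate.
# All data below are synthetic, chosen only to make the comparison concrete.

def metrics(posteriors, labels, threshold):
    """Predict positive when P(y=1|x) >= threshold; return (error_rate, f_score)."""
    tp = fp = fn = correct = 0
    for p, y in zip(posteriors, labels):
        pred = 1 if p >= threshold else 0
        if pred == y:
            correct += 1
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    error_rate = 1 - correct / len(labels)
    f_score = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return error_rate, f_score

# Imbalanced synthetic test set: 4 positives, 6 negatives.
posteriors = [0.95, 0.80, 0.60, 0.45, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
labels     = [1,    1,    1,    1,    0,    0,    0,    0,    0,    0]

for t in (0.3, 0.5, 0.7):
    err, f1 = metrics(posteriors, labels, t)
    print(f"threshold={t:.1f}  error_rate={err:.2f}  F-score={f1:.2f}")
```

On this toy set the 0.5 threshold gives the lowest error rate (0.10); the paper's contribution is deriving the analogous optimal thresholds for F-score and BER.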

Proceedings ArticleDOI
18 Apr 2018
TL;DR: The authors showed that end-to-end multi-lingual training of sequence models is effective on context independent models trained using Connectionist Temporal Classification (CTC) loss and showed that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data.
Abstract: Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We show that training on multiple languages is important for very low resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large well prepared datasets.

64 citations

Proceedings Article
01 Nov 2012
TL;DR: This paper proposes a recognition scheme for the Indian script of Devanagari using a Recurrent Neural Network known as Bidirectional Long Short-Term Memory (BLSTM) and reports a reduction of more than 20% in word error rate and over 9% in character error rate compared with the best available OCR system.
Abstract: In this paper, we propose a recognition scheme for the Indian script of Devanagari. Recognition accuracy for Devanagari script is not yet comparable to that of its Roman counterparts, mainly due to the complexity of the script, variations in writing style, etc. Our solution uses a Recurrent Neural Network known as Bidirectional Long Short-Term Memory (BLSTM). Our approach does not require word-to-character segmentation, which is one of the most common reasons for high word error rates. We report a reduction of more than 20% in word error rate and over 9% in character error rate compared with the best available OCR system.

64 citations
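For context, the word and character error rates reported above are both edit-distance-based metrics: the minimum number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, normalized by the reference length. A minimal sketch of how they are commonly computed (generic illustration code, not the paper's evaluation pipeline):

```python
# Word error rate (WER) and character error rate (CER) via Levenshtein distance.

def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions."""
    # d[j] holds the distance between ref[:i] and hyp[:j] for the current row i.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,              # deletion
                d[j - 1] + 1,          # insertion
                prev_diag + (r != h),  # substitution (free on a match)
            )
    return d[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)

ref = "the cat sat on the mat"
hyp = "the cat sit on mat"
print(f"WER = {wer(ref, hyp):.3f}")  # 2 edits over 6 reference words -> 0.333
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as an error rate rather than an accuracy.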

Posted Content
TL;DR: This work compacts a continuous probability distribution into a set of representative points, called support points, obtained by minimizing the energy distance; the minimization can be formulated as a difference-of-convex program, which two algorithms exploit to efficiently generate representative point sets.
Abstract: This paper introduces a new way to compact a continuous probability distribution $F$ into a set of representative points called support points. These points are obtained by minimizing the energy distance, a statistical potential measure initially proposed by Sz\'ekely and Rizzo (2004) for testing goodness-of-fit. The energy distance has two appealing features. First, its distance-based structure allows us to exploit the duality between powers of the Euclidean distance and its Fourier transform for theoretical analysis. Using this duality, we show that support points converge in distribution to $F$, and enjoy an improved error rate over Monte Carlo for integrating a large class of functions. Second, the minimization of the energy distance can be formulated as a difference-of-convex program, which we manipulate using two algorithms to efficiently generate representative point sets. In simulation studies, support points provide improved integration performance over both Monte Carlo and a specific Quasi-Monte Carlo method. Two important applications of support points are then highlighted: (a) as a way to quantify the propagation of uncertainty in expensive simulations, and (b) as a method to optimally compact Markov chain Monte Carlo (MCMC) samples in Bayesian computation.

64 citations
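The energy distance that support points minimize has a simple sample form: E(X, Y) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||. A minimal NumPy sketch on synthetic data (this illustrates the objective being minimized, not the paper's difference-of-convex algorithms):

```python
# Sample energy distance between a candidate point set and draws from F.
import numpy as np

def pairwise_mean_dist(a, b):
    """Mean Euclidean distance over all pairs of rows of a and b."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

def energy_distance(x, y):
    """Sample version of E(X, Y) = 2 E||X-Y|| - E||X-X'|| - E||Y-Y'||."""
    return (2 * pairwise_mean_dist(x, y)
            - pairwise_mean_dist(x, x)
            - pairwise_mean_dist(y, y))

rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 2))       # stand-in for draws from F
good_points = rng.normal(size=(20, 2))   # representative points near F
bad_points = good_points + 3.0           # same points shifted away from F

print(energy_distance(good_points, sample) < energy_distance(bad_points, sample))  # True
```

A point set that sits where the distribution's mass is scores a lower energy distance, which is exactly what drives the support-point optimization.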

Proceedings ArticleDOI
06 Nov 2017
TL;DR: DeepASL is presented, a transformative deep learning-based sign language translation technology that enables ubiquitous and non-intrusive American Sign Language (ASL) translation at both word and sentence levels and represents a significant step towards breaking the communication barrier between deaf people and hearing majority.
Abstract: There is an undeniable communication barrier between deaf people and people with normal hearing ability. Although innovations in sign language translation technology aim to tear down this communication barrier, the majority of existing sign language translation systems are either intrusive or constrained by resolution or ambient lighting conditions. Moreover, these existing systems can only perform single-sign ASL translation rather than sentence-level translation, making them much less useful in daily-life communication scenarios. In this work, we fill this critical gap by presenting DeepASL, a transformative deep learning-based sign language translation technology that enables ubiquitous and non-intrusive American Sign Language (ASL) translation at both word and sentence levels. DeepASL uses infrared light as its sensing mechanism to non-intrusively capture the ASL signs. It incorporates a novel hierarchical bidirectional deep recurrent neural network (HB-RNN) and a probabilistic framework based on Connectionist Temporal Classification (CTC) for word-level and sentence-level ASL translation, respectively. To evaluate its performance, we have collected 7,306 samples from 11 participants, covering 56 commonly used ASL words and 100 ASL sentences. DeepASL achieves an average 94.5% word-level translation accuracy and an average 8.2% word error rate on translating unseen ASL sentences. Given its promising performance, we believe DeepASL represents a significant step towards breaking the communication barrier between deaf people and the hearing majority, and thus has the significant potential to fundamentally change deaf people's lives.

64 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  271
2022  562
2021  640
2020  643
2019  633
2018  528