Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.

Papers

Journal Article•DOI•

Word Spotting and Recognition with Embedded Attributes

Jon Almazan¹, Albert Gordo², Alicia Fornés¹, Ernest Valveny¹•Institutions (2)

Autonomous University of Barcelona¹, Xerox²

17 Jul 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem; the resulting representation is very fast to compute and, especially, to compare.

Abstract: This paper addresses the problems of word spotting and word recognition on images. In word spotting, the goal is to find all instances of a query word in a dataset of images. In recognition, the goal is to recognize the content of the word image, usually aided by a dictionary or lexicon. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace. This is achieved by a combination of label embedding and attributes learning, and a common subspace regression. In this subspace, images and strings that represent the same word are close together, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem. Contrary to most other existing methods, our representation has a fixed length, is low dimensional, and is very fast to compute and, especially, to compare. We test our approach on four public datasets of both handwritten documents and natural images showing results comparable or better than the state-of-the-art on spotting and recognition tasks.

522 citations

Proceedings Article•DOI•

Learning the Speech Front-end with Raw Waveform CLDNNs

Tara N. Sainath¹, Ron Weiss², Andrew W. Senior¹, Kevin W. Wilson¹, Oriol Vinyals¹•Institutions (2)

Google¹, Massachusetts Institute of Technology²

06 Sep 2015

TL;DR: It is shown that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on over 2,000 hours of speech.

Abstract: Learning an acoustic model directly from the raw waveform has been an active area of research. However, waveform-based models have not yet matched the performance of log-mel trained neural networks. We will show that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on over 2,000 hours of speech. Specifically, we will show the benefit of the CLDNN, namely the time convolution layer in reducing temporal variations, the frequency convolution layer for preserving locality and reducing frequency variations, as well as the LSTM layers for temporal modeling. In addition, by stacking raw waveform features with log-mel features, we achieve a 3% relative reduction in word error rate.

506 citations
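
The abstract above reports a relative reduction in word error rate (WER). As a rough illustration that is not taken from the paper, WER is commonly computed as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: words are whitespace-separated
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    return (baseline_wer - improved_wer) / baseline_wer
```

Under this definition, a hypothetical system that moves from 10.0% to 9.7% WER shows a 3% relative reduction, even though the absolute change is only 0.3 percentage points.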

Journal Article•DOI•

DCT-Based Iris Recognition

Donald M. Monro¹, S. Rakshit¹, Dexin Zhang¹•Institutions (1)

University of Bath¹

01 Apr 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new worst-case metric is proposed for predicting practical system performance in the absence of matching failures, and the worst case theoretical equal error rate (EER) is predicted to be as low as 2.59 times 10-1 available data sets.

Abstract: This paper presents a novel iris coding method based on differences of discrete cosine transform (DCT) coefficients of overlapped angular patches from normalized iris images. The feature extraction capabilities of the DCT are optimized on the two largest publicly available iris image data sets, 2,156 images of 308 eyes from the CASIA database and 2,955 images of 150 eyes from the Bath database. On this data, we achieve 100 percent correct recognition rate (CRR) and perfect receiver-operating characteristic (ROC) curves with no registered false accepts or rejects. Individual feature bit and patch position parameters are optimized for matching through a product-of-sum approach to Hamming distance calculation. For verification, a variable threshold is applied to the distance metric and the false acceptance rate (FAR) and false rejection rate (FRR) are recorded. A new worst-case metric is proposed for predicting practical system performance in the absence of matching failures, and the worst case theoretical equal error rate (EER) is predicted to be as low as 2.59 times 10-1 available data sets

503 citations
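
The abstract above describes verification by thresholding a distance metric and recording the false acceptance rate (FAR) and false rejection rate (FRR). The following is a minimal sketch of that bookkeeping only. It assumes a plain normalized Hamming distance between equal-length bit strings, which stands in for (and is not) the paper's product-of-sum calculation; the function names and sample distances are hypothetical.

```python
from typing import Sequence, Tuple


def hamming_distance(code_a: str, code_b: str) -> float:
    """Fraction of positions at which two equal-length bit strings differ."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("codes must be non-empty and of equal length")
    return sum(x != y for x, y in zip(code_a, code_b)) / len(code_a)


def far_frr(
    genuine: Sequence[float],
    impostor: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FAR, FRR) when a pair is accepted if distance <= threshold.

    genuine:  distances between codes of the same eye
    impostor: distances between codes of different eyes
    """
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr
```

Sweeping `threshold` over a range of values and locating the point where the two rates are equal gives an empirical estimate of the equal error rate (EER).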

Proceedings Article•DOI•

PhotoOCR: Reading Text in Uncontrolled Conditions

Alessandro Bissacco¹, Mark Cummins¹, Yuval Netzer¹, Hartmut Neven¹•Institutions (1)

Google¹

01 Dec 2013

TL;DR: This work describes Photo OCR, a system for text extraction from images that is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions.

Abstract: We describe Photo OCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification, we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern data center-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency, mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.

499 citations

Journal Article•DOI•

Confidence measures for large vocabulary continuous speech recognition

Frank Wessel, Ralf Schlüter¹, Klaus Macherey¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

01 Mar 2001-IEEE Transactions on Speech and Audio Processing

TL;DR: It is shown that posterior probabilities computed on word graphs outperform all other confidence measures considered, including two alternatives: acoustic stability and hypothesis density.

Abstract: In this paper, we present several confidence measures for large vocabulary continuous speech recognition. We propose to estimate the confidence of a hypothesized word directly as its posterior probability, given all acoustic observations of the utterance. These probabilities are computed on word graphs using a forward-backward algorithm. We also study the estimation of posterior probabilities on N-best lists instead of word graphs and compare both algorithms in detail. In addition, we compare the posterior probabilities with two alternative confidence measures, i.e., the acoustic stability and the hypothesis density. We present experimental results on five different corpora: the Dutch ARISE 1k evaluation corpus, the German Verbmobil '98 7k evaluation corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 65k evaluation corpus. We show that the posterior probabilities computed on word graphs outperform all other confidence measures. The relative reduction in confidence error rate ranges between 19% and 35% compared to the baseline confidence error rate.

496 citations
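
The abstract above estimates the confidence of a hypothesized word as its posterior probability, computed either on word graphs or on N-best lists. The sketch below illustrates only the N-best variant, in a deliberately simplified form: it assumes each N-best entry carries a combined log score, normalizes those scores into hypothesis posteriors, and matches words by list position rather than by time alignment. The function name and data layout are hypothetical and are not the paper's algorithm.

```python
import math
from typing import List, Sequence, Tuple


def nbest_word_confidence(
    nbest: Sequence[Tuple[List[str], float]],
    word: str,
    position: int,
) -> float:
    """Posterior mass of the N-best entries that contain `word` at `position`.

    nbest: (word sequence, log score) pairs; scores need not be normalized.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    # Subtract the maximum log score before exponentiating for stability.
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)
    matching = sum(
        weight
        for (words, _), weight in zip(nbest, weights)
        if position < len(words) and words[position] == word
    )
    return matching / total
```

A word that appears in the same position across most high-scoring hypotheses receives a confidence near 1, while a word supported by a single low-scoring hypothesis receives a confidence near 0.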

Performance

Metrics

Papers: 12,777

Citations: 335,740

No. of papers in the topic in previous years
Year	Papers
2023	271
2022	562
2021	640
2020	643
2019	633
2018	528
