Proceedings Article

Improved degraded document recognition with hybrid modeling techniques and character n-grams

TL;DR
A robust multifont character recognition system for degraded documents, such as photocopies or faxes, is described; it clearly outperforms commercial systems and further reduces error rates compared to previous results reported on this database.
Abstract
A robust multifont character recognition system for degraded documents, such as photocopies or faxes, is described. The system is based on hidden Markov models using discrete and hybrid modeling techniques, where the hybrid approach makes use of an information-theory-based neural network. The presented recognition results were obtained on the SEDAL database of English documents without using a dictionary. It is also demonstrated that a language model consisting of character n-grams yields significantly better recognition results. The resulting system clearly outperforms commercial systems and further reduces error rates compared to previous results reported on this database.
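To illustrate why a character n-gram language model can help when no dictionary is used, the following minimal sketch scores two competing recognition hypotheses with a character bigram model. It is not the authors' implementation: the bigram order, the add-one smoothing, the `^` start marker, the toy training text, and the function names `train_char_bigrams` and `log_prob` are all assumptions made for illustration. In a full recognizer, such a score would be combined with the hidden Markov model scores rather than used alone.

```python
import math
from collections import Counter


def train_char_bigrams(text):
    """Count character bigrams and their left contexts in the training text."""
    padded = "^" + text  # "^" is a hypothetical start-of-text marker
    pair_counts = Counter(zip(padded, padded[1:]))
    context_counts = Counter(padded[:-1])
    vocab = set(padded)
    return pair_counts, context_counts, vocab


def log_prob(candidate, model):
    """Log-probability of a candidate string under the bigram model.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    character pairs from receiving zero probability.
    """
    pair_counts, context_counts, vocab = model
    padded = "^" + candidate
    v = len(vocab) + 1  # one extra slot for characters unseen in training
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = pair_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + v
        total += math.log(numerator / denominator)
    return total


# Toy training text; a real system would train on a large English corpus.
model = train_char_bigrams("the system recognizes the text on the degraded page")

# Two hypotheses that a degraded image might make hard to tell apart.
candidates = ["the", "tbe"]
best = max(candidates, key=lambda c: log_prob(c, model))
print(best)
```

Because the pair `t` followed by `h` is frequent in the training text while `t` followed by `b` never occurs, the model prefers the plausible spelling even though it has no word list.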


Citations
Journal Article

Markov models for offline handwriting recognition: a survey

TL;DR: A comprehensive overview of the application of Markov models in the research field of offline handwriting recognition, covering both the widely used hidden Markov model and the less complex Markov-chain or n-gram models is provided.
Proceedings Article

Robust Recognition of Degraded Documents Using Character N-Grams

TL;DR: A novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images by exploiting the additional context present in the character n-gram images, which enables better disambiguation between confusing characters in the recognition phase.

Off-line handwriting recognition using various hybrid modeling techniques and character n-grams

TL;DR: This is the first paper to present the novel tied-posteriors approach for handwriting recognition; it also demonstrates the use of a language model consisting of character n-grams as an alternative to recognition with a large dictionary of German words.
Proceedings Article

Improving Recognition of Novel Input with Similarity

TL;DR: A probabilistic framework that unifies similarity with prior identity and contextual information is proposed; fusing the information sources in a single model eliminates unrecoverable errors that result from processing the information in separate stages and improves overall accuracy.
Proceedings Article

Evaluation of N-grams conflation approach in text-based information retrieval

TL;DR: The experimental results generated using standard collections ADI, CISI and Medlars show a consistent improvement over the traditional conflation methods and demonstrate the viability of the introduced inverse frequency multiplier technique.
References
Journal Article

An introduction to hidden Markov models

TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Proceedings Article

Statistical Language Modeling using the CMU-Cambridge Toolkit

TL;DR: The CMU Statistical Language Modeling toolkit was released in order to facilitate the construction and testing of bigram and trigram language models, and the technology as implemented in the toolkit is outlined.
Journal Article

An omnifont open-vocabulary OCR system for English and Arabic

TL;DR: An omnifont, unlimited-vocabulary OCR system for English and Arabic based on hidden Markov models (HMM), an approach that has proven to be very successful in the area of automatic speech recognition, is presented.
Proceedings Article

Enhancing degraded document images via bitmap clustering and averaging

TL;DR: A method is proposed for finding and averaging bitmaps of the same symbol that are scattered across a text page; the result can be rendered at arbitrary resolution for better display quality and recognition accuracy.