Improved degraded document recognition with hybrid modeling techniques and character n-grams

doi:10.1109/ICPR.2000.902952

Proceedings Article

A. Brakensiek, +2 more

- Vol. 4, pp 4438-4441

TLDR

A robust multifont character recognition system for degraded documents, such as photocopies or faxes, is described. It clearly outperforms commercial systems and achieves further error rate reductions compared to previous results on the SEDAL database.

Abstract:

A robust multifont character recognition system for degraded documents, such as photocopy or fax, is described. The system is based on hidden Markov models using discrete and hybrid modeling techniques, where the latter makes use of an information theory-based neural network. The presented recognition results refer to the SEDAL-database of English documents using no dictionary. It is also demonstrated that the usage of a language model that consists of character n-grams yields significantly better recognition results. Our resulting system clearly outperforms commercial systems and leads to further error rate reductions compared to previous results reached on this database.
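
To illustrate why a character n-gram language model can help when no dictionary is available, the following is a minimal sketch, not the authors' implementation. It assumes a character bigram model with add-one smoothing, a toy training string, and a hypothetical start marker `^`; the function names are illustrative only. It shows the general idea of preferring the candidate transcription whose character sequence is more plausible.

```python
import math
from collections import Counter

def train_char_bigrams(text):
    """Count character bigrams and their left-context characters."""
    padded = "^" + text  # "^" is a hypothetical start-of-text marker
    pair_counts = Counter(zip(padded, padded[1:]))
    context_counts = Counter(padded[:-1])
    vocab = set(padded)
    return pair_counts, context_counts, vocab

def log_prob(candidate, model):
    """Add-one smoothed log-score of a candidate string.

    Characters never seen in training receive only the smoothing
    pseudo-count, so this is a relative score, not a normalized probability.
    """
    pair_counts, context_counts, vocab = model
    padded = "^" + candidate
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = pair_counts[(prev, cur)] + 1
        den = context_counts[prev] + len(vocab)
        total += math.log(num / den)
    return total

# Toy training text (an assumption; a real system would use a large corpus).
model = train_char_bigrams(
    "the model reads the degraded text and then the clean text"
)

# Two candidate readings of a degraded word: "h" and "b" are easily confused.
candidates = ["the", "tbe"]
best = max(candidates, key=lambda c: log_prob(c, model))
```

In a full recognizer such a score would be combined with the HMM's visual evidence during decoding rather than applied to finished candidates; the sketch only isolates the language-model term.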

Citations

Journal Article

Markov models for offline handwriting recognition: a survey

Thomas Plötz, +1 more

- 12 Nov 2009 -

International Journal on Document Analys...

TL;DR: A comprehensive overview of the application of Markov models in the research field of offline handwriting recognition, covering both the widely used hidden Markov model and the less complex Markov-chain or n-gram models is provided.

Proceedings Article

Robust Recognition of Degraded Documents Using Character N-Grams

Shrey Dutta, +3 more

TL;DR: A novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images by exploiting the additional context present in the character n-gram images, which enables better disambiguation between confusing characters in the recognition phase.

Off-line handwriting recognition using various hybrid modeling techniques and character n-grams

A. Brakensiek, +3 more

TL;DR: This is the first paper to present the novel tied-posteriors approach for handwriting recognition. It also demonstrates the use of a character n-gram language model as an alternative to recognition with a large dictionary of German words.

Proceedings Article

Improving Recognition of Novel Input with Similarity

Jerod Weinman, +1 more

TL;DR: A probabilistic framework that unifies similarity with prior identity and contextual information is proposed. Fusing the information sources in a single model eliminates unrecoverable errors that result from processing the information in separate stages and improves overall accuracy.

Proceedings Article

Evaluation of N-grams conflation approach in text-based information retrieval

Serhiy Kosinov

TL;DR: The experimental results generated using the standard collections ADI, CISI and Medlars show a consistent improvement over the traditional conflation methods and demonstrate the viability of the introduced inverse frequency multiplier technique.

References

Journal Article

An introduction to hidden Markov models

Lawrence R. Rabiner, +1 more

- 01 Jan 1986 -

IEEE Assp Magazine

TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

Proceedings Article

Statistical Language Modeling using the CMU-Cambridge Toolkit

Philip Clarkson, +1 more

TL;DR: The CMU Statistical Language Modeling toolkit was released in order to facilitate the construction and testing of bigram and trigram language models; the technology as implemented in the toolkit is outlined.

Experimental evaluation

Ravishankar K. Iyer

Journal Article

An omnifont open-vocabulary OCR system for English and Arabic

I. Bazzi, +2 more

- 01 Jun 1999 -

IEEE Transactions on Pattern Analysis an...

TL;DR: An omnifont, unlimited-vocabulary OCR system for English and Arabic based on hidden Markov models (HMM), an approach that has proven to be very successful in the area of automatic speech recognition, is presented.

Proceedings Article

Enhancing degraded document images via bitmap clustering and averaging

J.D. Hobby, +1 more

TL;DR: A method is proposed for finding and averaging bitmaps of the same symbol that are scattered across a text page, so that the symbols can be rendered at arbitrary resolution for better display quality and recognition accuracy.

Related Papers (5)

Word recognition in natural scene and video images using Hidden Markov Model

Sangheeta Roy, +3 more

Two-Step CNN Framework for Text Line Recognition in Camera-Captured Images

Yulia S. Chernyshova, +2 more

- 14 Feb 2020 -

IEEE Access

Comparative study with analysis of OCR algorithms and invention analysis of character recognition approached methodologies

Scene text recognition with high performance CNN classifier and efficient word inference

Recursive Recurrent Nets with Attention Modeling for OCR in the Wild