Cell Extraction and Horizontal-Scale Correction in Structured Documents

doi:10.1007/978-981-32-9291-8_5

Book Chapter

Divya Srivastava, +1 more

01 Jan 2020

Advances in intelligent systems and comp...

Vol. 1024, pp. 53-64

TLDR

The effectiveness of horizontal-scale correction is demonstrated by applying it as a preprocessing step in the recognition system proposed in (Almazan et al. in Pattern Anal Mach Intell 36(12):2552–2566, 2014 [2]).

Abstract:

Preprocessing is an important stage in document image analysis. In structured documents such as forms and cheques, a predefined space, called a frame field or cell, is provided for the user's entry. When users write in these fields, nonuniform inter-character spacing becomes a problem. Often, the first characters of a word are written with wide spacing, and the remaining characters are written progressively closer together so that the whole word fits within the frame field. To handle this variation in intra-word spacing, horizontal-scale correction is applied to the extracted form fields. The effectiveness of the approach is demonstrated by applying it as a preprocessing step in the recognition system proposed in (Almazan et al. in Pattern Anal Mach Intell 36(12):2552–2566, 2014 [2]). With this normalization, the recognition framework achieves reduced error rates.
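The abstract does not spell out how the horizontal-scale correction is computed, so the following is only an illustrative sketch of one simple way to even out intra-word spacing, not the chapter's method. It assumes an already binarized, rectangular word image (1 = ink), assumes that characters are separated by at least one fully blank column (touching or overlapping characters would need a real segmentation step), and uses a hypothetical `target_gap` parameter that defaults to the median observed gap. The names `column_runs` and `normalize_spacing` are likewise hypothetical.

```python
from statistics import median_low
from typing import List, Optional, Tuple

# Binarized word image: a list of rows, each a list of 0/1 pixels (1 = ink).
Image = List[List[int]]


def column_runs(img: Image) -> List[Tuple[bool, int, int]]:
    """Split the columns of `img` into maximal runs of ink or blank columns.

    Returns (has_ink, start, end) triples, with `end` exclusive.
    Assumes a non-empty, rectangular image with at least one column.
    """
    width = len(img[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[c] for row in img) for c in range(width)]
    runs: List[Tuple[bool, int, int]] = []
    start = 0
    for c in range(1, width + 1):
        # Close the current run at the right edge or where ink/blank status flips.
        if c == width or (profile[c] > 0) != (profile[start] > 0):
            runs.append((profile[start] > 0, start, c))
            start = c
    return runs


def normalize_spacing(img: Image, target_gap: Optional[int] = None) -> Image:
    """Rebuild `img` so that every interior blank gap is `target_gap` columns wide.

    Ink runs are copied unchanged; leading and trailing blank columns are dropped.
    If `target_gap` is None (hypothetical default), the median observed gap is used.
    """
    if not img or not img[0]:
        return [list(row) for row in img]
    runs = column_runs(img)
    # Discard margins so that only gaps *between* ink runs remain.
    while runs and not runs[0][0]:
        runs.pop(0)
    while runs and not runs[-1][0]:
        runs.pop()
    gaps = [end - start for has_ink, start, end in runs if not has_ink]
    if target_gap is None:
        target_gap = median_low(gaps) if gaps else 0
    out: Image = [[] for _ in img]
    for has_ink, start, end in runs:
        for r, row in enumerate(img):
            # Copy ink columns verbatim; replace each gap with a fixed-width blank.
            out[r].extend(row[start:end] if has_ink else [0] * target_gap)
    return out


if __name__ == "__main__":
    # Four one-column "strokes" with gaps of 3, 2 and 1 columns (sparse -> compact).
    word = [[0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0] for _ in range(2)]
    for row in normalize_spacing(word):
        print(row)  # each row: [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
```

Using the median gap tends to keep the overall word width close to the original while removing the sparse-to-compact drift the abstract describes; passing a fixed `target_gap` would instead standardize spacing across all fields. Either choice is an assumption made for this sketch.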

References

Book

Digital Image Processing Using MATLAB

Rafael C. Gonzalez, +2 more

TL;DR: 1. Fundamentals of Image Processing, 2. Intensity Transformations and Spatial Filtering, and 3. Frequency Domain Processing.

Journal Article

A Novel Connectionist System for Unconstrained Handwriting Recognition

Alex Graves, +5 more

01 May 2009

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies, significantly outperforming a state-of-the-art HMM-based system.

Journal Article

The IAM-database: an English sentence database for offline handwriting recognition

Urs-Viktor Marti, +1 more

01 Nov 2002

International Journal on Document Analys...

TL;DR: A database of handwritten English sentences based on the Lancaster-Oslo/Bergen corpus; the database is expected to be particularly useful for recognition tasks that use linguistic knowledge beyond the lexicon level.

Journal Article

Word Spotting and Recognition with Embedded Attributes

Jon Almazan, +3 more

17 Jul 2014

IEEE Transactions on Pattern Analysis an...

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing recognition and retrieval tasks to be cast as a nearest-neighbor problem; the representation is very fast to compute and, especially, to compare.

Journal Article

Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition systems

U.-V. Marti, +1 more

01 Jun 2001

International Journal of Pattern Recogni...

TL;DR: A novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided and linguistic knowledge beyond the lexicon level is incorporated in the recognition process.

Related Papers (5)

Method for segmenting text words in document images

Chinese word searching in imaged documents

Word Slant Estimation Using Non-horizontal Character Parts and Core-Region Information

Word spotting in Chinese document images without layout analysis

Slant estimation and core-region detection for handwritten Latin words