Proceedings Article

Automatic localization and correction of line segmentation errors

TLDR
The proposed segmentation post-processor, which works in a "learning by examples" framework, is not only independent of segmentation algorithms but also robust to the diversity of scanned pages.
Abstract
Text line segmentation is a basic step in any OCR system, and its failure degrades the performance of OCR engines. This is especially true for Indian languages because of the nature of their scripts. Many segmentation algorithms have been proposed in the literature, but they often fail to adapt dynamically to a given page and therefore tend to yield poor segmentation in specific regions or on specific pages. In this work we design a text line segmentation post-processor that automatically localizes and corrects segmentation errors. The proposed post-processor, which works in a "learning by examples" framework, is not only independent of the segmentation algorithm but also robust to the diversity of scanned pages. We show over 5% improvement in text line segmentation on a large dataset of scanned pages for multiple Indian languages.
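The abstract does not describe the post-processor's internals. As a minimal sketch of the general "localize, then correct" idea, assuming each segmented line is reduced to its vertical extent and a small set of labelled examples is available, a k-nearest-neighbour classifier can flag lines whose height relative to the page median looks like a merge of two lines (about 2x) or a fragment of one (about 0.5x), and a simple rule then splits or rejoins them. The `Line` class, the single relative-height feature, the labels `ok`/`merged`/`split`, the example values and the midpoint split rule are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Line:
    """Hypothetical vertical extent of one segmented text line, in pixels."""
    top: int
    bottom: int

    @property
    def height(self) -> int:
        return self.bottom - self.top


# Hypothetical labelled examples: (relative height,) -> verdict.
# A merge of two lines is roughly twice the median height; a line cut in two
# yields fragments roughly half the median height.
EXAMPLES = [
    ((1.0,), "ok"), ((0.9,), "ok"), ((1.1,), "ok"),
    ((2.0,), "merged"), ((1.9,), "merged"), ((2.1,), "merged"),
    ((0.5,), "split"), ((0.4,), "split"), ((0.6,), "split"),
]


def features(lines):
    """Describe each line by its height divided by the page's median line height."""
    med = median(line.height for line in lines)
    return [(line.height / med,) for line in lines]


def knn_label(x, examples, k=3):
    """Majority vote among the k labelled examples closest to feature vector x."""
    nearest = sorted(examples, key=lambda ex: math.dist(x, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def correct(lines, examples=EXAMPLES, k=3):
    """Localize suspect lines with k-NN, then split merges and rejoin fragments."""
    labels = [knn_label(f, examples, k) for f in features(lines)]
    out, i = [], 0
    while i < len(lines):
        line, label = lines[i], labels[i]
        if label == "merged":
            # Assumed correction: cut a merged line at its vertical midpoint.
            mid = (line.top + line.bottom) // 2
            out += [Line(line.top, mid), Line(mid, line.bottom)]
            i += 1
        elif label == "split" and i + 1 < len(lines) and labels[i + 1] == "split":
            # Assumed correction: join two adjacent fragments into one line.
            out.append(Line(line.top, lines[i + 1].bottom))
            i += 2
        else:
            out.append(line)
            i += 1
    return out
```

For instance, on a page whose line heights are 20, 20, 40, 10, 10 and 20 pixels (median 20), the 40-pixel line is classified as `merged` and cut in two, and the two adjacent 10-pixel fragments are classified as `split` and rejoined. Because the classifier only sees the segmenter's output, a scheme like this is agnostic to which segmentation algorithm produced the lines; a practical system would likely use richer features (such as inter-line gaps) and many more examples.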
