Script based text identification: a multi-level architecture

doi:10.1145/2034617.2034630

Proceedings Article

Ehtesham Hassan, +3 more

- pp 11

TLDR

The proposed framework takes a top-down approach, performing script identification at the page, block/paragraph, and word levels in multiple stages, and extracts features from the texture- and shape-based information embedded in the documents at each level.

Abstract:

Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval, or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework takes a top-down approach, performing page, block/paragraph, and word level script identification in multiple stages. We utilize texture- and shape-based information embedded in the documents at different levels for feature extraction. The prediction task at the different levels of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection-based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts has shown promising results.
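The top-down flow described in the abstract can be illustrated with a minimal sketch. It is one plausible reading and not the authors' implementation: the `Region` type, the `identify` function, the confidence threshold, and `toy_scorer` are all hypothetical names introduced here, and the toy scorer merely stands in for the SVM and AdaBoost-based rejection classifiers, whose features and decision rules the abstract does not specify. The assumption is that a region is labeled at the coarsest level where the classifier is confident, and is otherwise rejected and split into its sub-regions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical scorer type: maps a feature vector to (script_label, confidence in [0, 1]).
Scorer = Callable[[List[float]], Tuple[str, float]]


@dataclass
class Region:
    """A page, block/paragraph, or word, with its (assumed) precomputed features."""
    features: List[float]
    children: List["Region"] = field(default_factory=list)


def identify(region: Region, scorers: List[Scorer],
             threshold: float = 0.8, level: int = 0) -> List[Tuple[int, str]]:
    """Return (level, label) pairs; level 0 = page, 1 = block, 2 = word.

    A region keeps its label when the scorer is confident enough or when
    there is no finer level to fall back to; otherwise the prediction is
    rejected and each child region is classified at the next level.
    """
    label, confidence = scorers[level](region.features)
    is_leaf = not region.children or level + 1 >= len(scorers)
    if confidence >= threshold or is_leaf:
        return [(level, label)]
    results: List[Tuple[int, str]] = []
    for child in region.children:
        results.extend(identify(child, scorers, threshold, level + 1))
    return results


def toy_scorer(features: List[float]) -> Tuple[str, float]:
    """Stand-in classifier (assumption): features[0] is the fraction of Hindi content."""
    hindi_fraction = features[0]
    if hindi_fraction >= 0.5:
        return "Hindi", hindi_fraction
    return "English", 1.0 - hindi_fraction


# A mixed page: one clearly Hindi block and one mixed block with two words.
page = Region([0.5], [
    Region([0.95]),
    Region([0.6], [Region([0.9]), Region([0.1])]),
])
labels = identify(page, [toy_scorer, toy_scorer, toy_scorer])
print(labels)  # [(1, 'Hindi'), (2, 'Hindi'), (2, 'English')]
```

In this toy run the page and the mixed block are rejected and refined, while the confident block is labeled once at the block level, so word-level work is done only where the coarser levels are ambiguous.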

Citations

Offline Script Identification from multilingual Indic-script documents: A state-of-the-art

Benchmark databases of handwritten Bangla-Roman and Devanagari-Roman mixed-script document images

Hindi Text Document Classification System Using SVM and Fuzzy: A Survey

A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy: An Advancement

Advanced Applications on Bilingual Document Analysis and Processing Systems
