Script based text identification: a multi-level architecture
TL;DR: The proposed framework takes a top-down approach, performing script identification in multiple stages at the page, block/paragraph and word levels, and it uses texture- and shape-based information embedded in the documents at these levels for feature extraction.
Abstract: Script identification in a multilingual document environment has numerous applications in document image analysis, such as indexing and retrieval, and it serves as an initial step towards optical character recognition (OCR). In this paper, we propose a novel hierarchical framework for script identification in bilingual documents. The framework takes a top-down approach, performing script identification in multiple stages at the page, block/paragraph and word levels. We use texture- and shape-based information embedded in the documents at these different levels for feature extraction. Prediction at the different levels of the hierarchy is performed by Support Vector Machines (SVMs) and a rejection-based classifier built with AdaBoost. Experimental evaluation of the proposed approach on document collections of Hindi/English and Bangla/English scripts has shown promising results.
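The top-down scheme in the abstract can be sketched as a cascade: a classifier at each level (page, block/paragraph, word) either commits to one script label for the whole region or rejects it, in which case the region's children are passed to the next, finer level. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Region` type, the `identify` function and the toy classifiers are hypothetical, and the toy classifiers read Unicode code points as a stand-in for the texture and shape features the paper extracts from document images.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical classifier type: maps a region to a script label, or None to reject it.
Classifier = Callable[[str], Optional[str]]


@dataclass
class Region:
    """A page, block/paragraph or word; `content` stands in for image data."""
    content: str
    children: List["Region"] = field(default_factory=list)


def identify(region: Region, classifiers: List[Classifier],
             level: int = 0) -> List[Tuple[str, str]]:
    """Accept a label at this level, or descend to the children on rejection."""
    label = classifiers[level](region.content)
    if label is not None:
        return [(region.content, label)]  # one label covers the whole region
    if not region.children or level + 1 >= len(classifiers):
        return [(region.content, "rejected")]  # no finer level to fall back to
    results: List[Tuple[str, str]] = []
    for child in region.children:
        results.extend(identify(child, classifiers, level + 1))
    return results


def toy_script(word: str) -> str:
    """Toy stand-in for image features: read the script off Unicode code points."""
    if any("\u0900" <= ch <= "\u097f" for ch in word):
        return "Hindi"
    if any("\u0980" <= ch <= "\u09ff" for ch in word):
        return "Bangla"
    return "English"


def uniform_or_reject(text: str) -> Optional[str]:
    """Page/block level (hypothetical): accept only if every word agrees on one script."""
    scripts = {toy_script(w) for w in text.split()}
    return scripts.pop() if len(scripts) == 1 else None


def word_level(text: str) -> Optional[str]:
    """Word level (hypothetical): always commits to a label."""
    return toy_script(text)


block1 = Region("नमस्ते दुनिया", [Region("नमस्ते"), Region("दुनिया")])
block2 = Region("hello नमस्ते", [Region("hello"), Region("नमस्ते")])
page = Region(f"{block1.content} {block2.content}", [block1, block2])
labels = identify(page, [uniform_or_reject, uniform_or_reject, word_level])
print(labels)
```

In this toy run the page is rejected because it mixes scripts, the first block is accepted as Hindi as a whole, and only the mixed second block descends to word-level labels ("hello" as English, "नमस्ते" as Hindi). Deciding at the coarsest level that is confident leaves the finer, more numerous word-level decisions for the regions that actually need them.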
Citations
42 citations
Cites background from "Script based text identification: a..."
...12 – Architecture of the proposed work described in [34]....
...[34] | structural features | SVM and AdaBoost | Hindi, English, Bangla | Printed | Page level, Text line level, Word level | 98....
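The survey entry credits [34] with structural features without naming them. As an illustration only, one structural cue widely used to tell Devanagari (Hindi) and Bangla words from Latin (English) words is the headline, or shirorekha: a horizontal stroke along the top of the word that shows up as a near-solid row in the horizontal projection profile. The sketch below assumes binary word images given as nested lists; `row_profile`, `headline_ratio`, `has_headline` and the 0.8 threshold are hypothetical and not taken from the paper.

```python
from typing import List

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def row_profile(img: Image) -> List[float]:
    """Horizontal projection profile: fraction of inked pixels in each row."""
    return [sum(row) / len(row) for row in img]


def headline_ratio(img: Image) -> float:
    """Strongest row density in the upper half of the word image."""
    if not img or not img[0]:
        return 0.0
    upper = row_profile(img)[: max(1, len(img) // 2)]
    return max(upper)


def has_headline(img: Image, threshold: float = 0.8) -> bool:
    """Hypothetical rule: a near-solid upper row suggests Devanagari or Bangla."""
    return headline_ratio(img) >= threshold


# Toy 6x6 images: a word hanging from a solid headline vs. an "H"-like Latin glyph.
deva = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
latin = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
print(has_headline(deva), has_headline(latin))
```

A single scalar like this would normally be one entry in a larger feature vector fed to a classifier rather than a decision rule on its own.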
...[34] proposed a novel hierarchical framework for script identification in bi-lingual printed documents....
...13 – Hierarchical classifier for word level script identification [34]....
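The abstract does not specify how the AdaBoost classifier rejects inputs. One plausible reading, sketched here purely as an assumption, is to boost decision stumps for a two-script decision (for example Hindi versus English) and to reject a word when the normalized weighted vote falls inside a margin around zero; a rejected word could then be handed to another classifier such as the SVM. The functions `train_adaboost` and `predict_with_reject`, the stump representation and the `margin` value are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, int]     # (feature index, threshold, polarity)
Model = List[Tuple[float, Stump]]  # (alpha weight, stump) pairs


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Return +polarity if the feature reaches the threshold, else -polarity."""
    f, thr, pol = stump
    return pol if x[f] >= thr else -pol


def train_adaboost(X: List[Sequence[float]], y: List[int], rounds: int = 10) -> Model:
    """Discrete AdaBoost with exhaustive stump search; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: Model = []
    for _ in range(rounds):
        best: Optional[Stump] = None
        best_err = float("inf")
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    cand = (f, thr, pol)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(cand, xi) != yi)
                    if err < best_err:
                        best, best_err = cand, err
        if best is None:
            break
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        if best_err <= 1e-12:  # a perfect stump leaves nothing to correct
            break
    return model


def predict_with_reject(model: Model, x: Sequence[float],
                        margin: float = 0.3) -> Optional[int]:
    """Return +1/-1, or None (reject) when the normalized vote is within the margin."""
    total = sum(alpha for alpha, _ in model)
    if total <= 0:
        return None
    score = sum(alpha * stump_predict(s, x) for alpha, s in model) / total
    if abs(score) < margin:
        return None
    return 1 if score > 0 else -1


# Toy 1-D feature (e.g. a headline ratio): +1 = Hindi, -1 = English.
X = [[0.95], [0.90], [0.85], [0.30], [0.40], [0.20]]
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y)
print(predict_with_reject(model, [0.92]), predict_with_reject(model, [0.25]))

# Two equally weighted stumps that disagree produce a zero vote, so the word is rejected.
votes: Model = [(1.0, (0, 0.5, 1)), (1.0, (1, 0.5, 1))]
print(predict_with_reject(votes, [0.9, 0.1]))
```

Normalizing the vote by the sum of the alphas keeps the score in [-1, 1], so a single `margin` value has the same meaning regardless of how many boosting rounds were run.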