SVM Based Scheme for Thai and English Script Identification

doi:10.1109/ICDAR.2007.4378770

Proceedings Article

Sukalpa Chanda, +2 more

- Vol. 1, pp 551-555

TLDR

An SVM-based method is proposed for word-wise identification of printed English and Thai scripts within a single line of a document page; the proposed scheme achieves 99.36% script identification accuracy.

Abstract:

In some Thai documents, a single text line of a document page may contain both Thai and English scripts. For optical character recognition (OCR) of such a page, it is better to first identify the Thai and English portions and then apply the OCR system of the respective script to each identified portion. In this paper, an SVM-based method is proposed for word-wise identification of printed English and Thai scripts within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). The proposed scheme identifies the script of each character group by combining character features obtained from structural shape, profile, component overlapping information, topological properties, the water reservoir concept, etc. In an experiment on 6110 data items, the proposed scheme achieved 99.36% script identification accuracy.
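The abstract does not say how the line and word segmentation is carried out. As an illustration only, the sketch below assumes a binarized page and uses projection profiles, a common choice for printed text: rows containing ink form lines, and within a line, column runs separated by a wide enough gap form character groups. The names `runs`, `segment_lines`, `segment_words`, and the `min_gap` threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans (end exclusive) where the profile is non-zero.

    Spans separated by fewer than `min_gap` empty positions are merged.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])
        else:
            merged.append(span)
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Assumed approach: split the page into lines via the row projection."""
    row_profile = [sum(row) for row in image]
    return runs(row_profile)


def segment_words(image: Image, line: Tuple[int, int],
                  min_gap: int = 2) -> List[Tuple[int, int]]:
    """Assumed approach: split one line into character groups via the
    column projection, treating gaps of at least `min_gap` columns as
    word boundaries."""
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return runs(col_profile, min_gap=min_gap)


# Toy page: two text lines; the first holds two character groups.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
]

for line in segment_lines(page):
    print(line, segment_words(page, line))
```

In a full pipeline, each character group returned here would be passed to feature extraction and then to the SVM classifier; those stages depend on details the abstract does not give, so they are left out of this sketch.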

Citations

Journal Article

Character and numeral recognition for non-Indic and Indic scripts: a survey

Munish Kumar, +3 more

- 01 Dec 2019 -

Artificial Intelligence Review

TL;DR: A comprehensive survey on character and numeral recognition of non-Indic and Indic scripts is presented and major challenges/issues for character/numeral recognition are examined.

Journal Article

Script Identification of Multi-Script Documents: A Survey

Kurban Ubul, +5 more

- 30 Mar 2017 -

IEEE Access

TL;DR: The most vital processes in script identification are addressed in detail: identification and discriminating methods, feature extraction (local and global features), and classification.

Proceedings Article

Video Script Identification Based on Text Lines

Trung Quy Phan, +4 more

TL;DR: A new method for video script identification is presented; such identification is essential for choosing an appropriate OCR engine for text lines when a video frame contains more than one language.

Journal Article

New Gradient-Spatial-Structural Features for video script identification

Palaiahnakote Shivakumara, +4 more

- 01 Jan 2015 -

Computer Vision and Image Understanding

TL;DR: This paper proposes to integrate the spatial and the structural features based on end points, intersection points, junction points and straightness of the skeleton of text components in a novel way to identify the scripts.

Proceedings Article

New Spatial-Gradient-Features for Video Script Identification

Danni Zhao, +3 more

TL;DR: New features based on Spatial-Gradient-Features (SGF) at block level for identifying six video scripts namely, Arabic, Chinese, English, Japanese, Korean and Tamil are presented, which helps in enhancing the capability of the current OCR on video text recognition by choosing an appropriate OCR engine when video contains multi-script frames.

References

Book

The Nature of Statistical Learning Theory

Vladimir Vapnik

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

Journal Article

A Tutorial on Support Vector Machines for Pattern Recognition

Christopher John Burges

- 01 Jun 1998 -

Data Mining and Knowledge Discovery

TL;DR: Several arguments that support the observed high accuracy of SVMs are reviewed, and numerous examples and proofs of most of the key theorems are given.

Journal Article

Rotation invariant texture features and their use in automatic script identification

Tieniu Tan

- 01 Jul 1998 -

IEEE Transactions on Pattern Analysis an...

TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.

Journal Article

Determination of the script and language content of document images

A.L. Spitz

- 01 Mar 1997 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work has developed techniques for distinguishing which language is represented in an image of text using a technique based on character shape codes, a representation of Latin text that is inexpensive to compute.

Journal Article

Touching numeral segmentation using water reservoir concept

Umapada Pal, +2 more

- 01 Jan 2003 -

Pattern Recognition Letters

TL;DR: To handle the variability in the writing styles of different individuals, a robust scheme is presented, based mainly on features obtained from the water reservoir concept.
