Proceedings Article

SVM Based Scheme for Thai and English Script Identification

TL;DR
An SVM-based method is proposed for word-wise identification of printed English and Thai scripts within a single text line of a document page; the proposed scheme achieves 99.36% script identification accuracy.
Abstract
In some Thai documents, a single text line of a document page may contain both Thai and English scripts. For optical character recognition (OCR) of such a page, it is better to first identify the Thai and English portions and then apply the OCR system for the respective script to each identified portion. In this paper, an SVM-based method is proposed for word-wise identification of printed English and Thai scripts within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). The proposed scheme identifies the script of each character group by combining character features obtained from structural shape, profile, component overlapping information, topological properties, the water reservoir concept, etc. In experiments on 6110 samples, the proposed scheme achieved 99.36% script identification accuracy.
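The pipeline described in the abstract (segmentation into character groups, shape features including the water reservoir concept, SVM classification) can be illustrated with a toy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: binary images are lists of 0/1 rows, character groups are split on runs of blank columns in the vertical projection profile, the reservoir feature is a simplified top/bottom reservoir area computed on the glyph's outer profile, and a linear SVM trained with the Pegasos sub-gradient method stands in for the paper's SVM (whose kernel and parameters are not stated here). All function names (`segment_words`, `top_reservoir_area`, `features`, `train_linear_svm`, `predict`) and the toy glyphs and labels are hypothetical.

```python
"""Toy sketch of a word-wise script identification pipeline.

Hypothetical illustration only, not the authors' code. Assumptions:
binary images are lists of rows of 0/1 ints (1 = ink); the reservoir
feature is a simplified approximation; a linear Pegasos SVM stands in
for the SVM used in the paper.
"""
import random
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]


def segment_words(line: Image, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a binary text-line image into (first, last) column spans.

    Uses the vertical projection profile: a run of at least `min_gap`
    ink-free columns is treated as a gap between character groups.
    """
    profile = [sum(row[c] for row in line) for c in range(len(line[0]))]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end, gap = -1, 0
    for c, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = c
            end, gap = c, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gap wide enough: close the current group
                spans.append((start, end))
                start, gap = None, 0
    if start is not None:  # close a group that touches the right edge
        spans.append((start, end))
    return spans


def top_reservoir_area(glyph: Image) -> int:
    """Simplified top-reservoir area (count of water cells).

    Column height = distance from the bottom edge to the topmost ink pixel.
    Water poured from the top settles up to the lower of the tallest
    columns on its left and right (rain-water trapping on the upper profile).
    """
    rows, cols = len(glyph), len(glyph[0])
    heights = []
    for c in range(cols):
        top = next((r for r in range(rows) if glyph[r][c]), rows)
        heights.append(rows - top)
    area = 0
    for c in range(cols):
        level = min(max(heights[: c + 1]), max(heights[c:]))
        area += level - heights[c]
    return area


def features(glyph: Image) -> List[float]:
    """Hypothetical 2-D feature vector: [top reservoir, bottom reservoir]."""
    # A bottom reservoir is a top reservoir of the vertically flipped glyph.
    return [float(top_reservoir_area(glyph)), float(top_reservoir_area(glyph[::-1]))]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos sub-gradient training of a linear SVM (hinge loss + L2), no bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 regularizer)
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the linear decision function (+1 or -1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    line = [[1, 1, 0, 1, 0, 0, 0, 1, 1],
            [1, 1, 0, 1, 0, 0, 0, 1, 1]]
    print(segment_words(line))  # [(0, 3), (7, 8)]
    cup = [[1, 0, 1], [1, 0, 1], [1, 1, 1]]  # toy glyph open at the top
    cap = [[1, 1, 1], [1, 0, 1], [1, 0, 1]]  # toy glyph open at the bottom
    X, y = [features(cup), features(cap)], [1, -1]  # toy classes, not real scripts
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])  # [1, -1]
```

The toy glyphs only show how reservoir features separate shapes that open upward from shapes that open downward; a real system along the lines of the abstract would combine many more features (structural shape, profile, component overlap, topology) and would likely use a library SVM with a tuned kernel.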


Citations
Journal Article

Character and numeral recognition for non-Indic and Indic scripts: a survey

TL;DR: A comprehensive survey on character and numeral recognition of non-Indic and Indic scripts is presented and major challenges/issues for character/numeral recognition are examined.
Journal Article

Script Identification of Multi-Script Documents: A Survey

TL;DR: The most important stages of script identification are addressed in detail: identification and discrimination methods, feature extraction (local and global), and classification.
Proceedings Article

Video Script Identification Based on Text Lines

TL;DR: A new method for video script identification based on text lines is presented. Script identification is an essential step before choosing an appropriate OCR engine to recognize text lines when a video frame contains more than one language.
Journal Article

New Gradient-Spatial-Structural Features for video script identification

TL;DR: This paper proposes to integrate the spatial and the structural features based on end points, intersection points, junction points and straightness of the skeleton of text components in a novel way to identify the scripts.
Proceedings Article

New Spatial-Gradient-Features for Video Script Identification

TL;DR: New block-level Spatial-Gradient-Features (SGF) are presented for identifying six video scripts, namely Arabic, Chinese, English, Japanese, Korean and Tamil. These features help improve current OCR of video text by enabling selection of an appropriate OCR engine when a video contains multi-script frames.