Journal Article
Script and Language Identification in Noisy and Degraded Document Images
Lu Shijian, Chew Lim Tan +1 more
TL;DR: Experimental results show that the proposed identification technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.
Abstract:
This paper reports an identification technique that detects the scripts and languages of noisy and degraded document images. The proposed technique identifies scripts and languages through document vectorization, which converts each document image into a document vector that characterizes the shape and frequency of the character or word images it contains. Document images are vectorized using vertical component cuts and character extremum points, both of which are tolerant to variation in text fonts and styles, noise, and various types of document degradation. For each script or language under study, a script or language template is first constructed through a training process. The scripts and languages of document images are then determined according to the distances between the converted document vectors and the preconstructed script and language templates. Experimental results show that the proposed technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.
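The template-and-distance step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature extraction (vertical component cuts and character extremum points) is not reproduced, the document vectors are made-up toy values, the labels `script_a` and `script_b` are hypothetical, and both averaging the training vectors into a template and using Euclidean distance are assumptions, since the abstract does not specify either.

```python
import math
from collections import defaultdict


def build_templates(training_vectors):
    """Average the training vectors of each label into one template vector.

    Assumption: a template is the per-dimension mean of its training vectors.
    """
    grouped = defaultdict(list)
    for label, vec in training_vectors:
        grouped[label].append(vec)

    templates = {}
    for label, vecs in grouped.items():
        dim = len(vecs[0])
        templates[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return templates


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(doc_vector, templates, distance=euclidean):
    """Return the label whose template is closest to the document vector."""
    return min(templates, key=lambda label: distance(doc_vector, templates[label]))


# Toy training data: (label, document vector). Values are illustrative only.
training = [
    ("script_a", [0.8, 0.1, 0.1]),
    ("script_a", [0.7, 0.2, 0.1]),
    ("script_b", [0.1, 0.7, 0.2]),
    ("script_b", [0.2, 0.6, 0.2]),
]

templates = build_templates(training)
print(identify([0.75, 0.15, 0.10], templates))
```

In this sketch, supporting a new script or language only requires adding labeled training vectors and rebuilding the templates, which is one way to read the abstract's claim that the technique is easy to extend.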
Citations
Journal Article
Document Image Retrieval through Word Shape Coding
TL;DR: The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code.
Proceedings Article
An Efficient Edge Based Technique for Text Detection in Video Frames
TL;DR: A novel technique is presented for detecting both graphic text and scene text in video images. It finds segments containing text in an input image, then uses statistical features such as vertical and horizontal bars for edges in the segments to detect true text blocks efficiently.
Journal Article
Script Identification of Multi-Script Documents: A Survey
TL;DR: The most vital processes in script identification are addressed in detail: identification and discrimination methods, feature extraction (local and global), and classification.
Proceedings Article
Video Script Identification Based on Text Lines
TL;DR: A new method is presented for video script identification, a step that is essential before choosing an appropriate OCR engine for identifying text lines when a video frame contains more than one language.
Journal Article
New Gradient-Spatial-Structural Features for video script identification
TL;DR: This paper proposes to integrate the spatial and the structural features based on end points, intersection points, junction points and straightness of the skeleton of text components in a novel way to identify the scripts.
References
N-gram-based text categorization
W.B. Cavnar, John M. Trenkle +1 more
TL;DR: An N-gram-based approach to text categorization that is tolerant of textual errors is described, which worked very well for language classification and worked reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject.
Journal Article
Center weighted median filters and their applications to image enhancement
Sung-Jea Ko, Yong Hoon Lee +1 more
TL;DR: The center weighted median (CWM) filter is a weighted median filter that gives more weight only to the central value of each window, which lets it preserve image details while suppressing additive white and/or impulsive-type noise.
Journal Article
Evaluation of binarization methods for document images
O.D. Trier, Torfinn Taxt +1 more
TL;DR: This paper presents an evaluation of eleven locally adaptive binarization methods for gray scale images with low contrast, variable background intensity, and noise. Niblack's method, with the addition of the postprocessing step of Yanowitz and Bruckstein's method (1989), performed the best and was also one of the fastest binarization methods.
Journal Article
Rotation invariant texture features and their use in automatic script identification
TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.