Proceedings Article
Information extraction and text mining of Ancient Vattezhuthu characters in historical documents using image zoning
E. K. Vellingiriraj, M. Balamurugan, P. Balasubramanie, et al.
pp. 37-40
TL;DR: A system is developed that recognizes Brahmi, Grantha and Vattezhuthu characters in palm-leaf manuscripts of ancient Tamil historical documents, analyzes the text, and machine-translates it into present-day Tamil digital text format.
Abstract:
The aim of this paper is to develop a system that recognizes Brahmi, Grantha and Vattezhuthu characters in palm-leaf manuscripts of ancient Tamil historical documents, analyzes the text, and machine-translates it into present-day Tamil digital text format. Although many researchers have implemented algorithms and techniques for character recognition in different languages, converting ancient characters still poses a major challenge. Image recognition technology has reached near-perfection for scanning English and text in other languages, but optical character recognition (OCR) software capable of digitizing printed Tamil text with high accuracy is still elusive. Only a few people are familiar with the ancient characters, and they attempt to transcribe them into written documents manually. The proposed system addresses this situation by converting the ancient historical documents from inscriptions and palm manuscripts into Tamil digital text encoded in Tamil Unicode. Our algorithm comprises four stages: i) image preprocessing, ii) feature extraction, iii) character recognition, and iv) digital text conversion. In the first phase, the algorithm converts Brahmi script with an accuracy of 91.57% using a neural network and the image zoning method. The second phase, covering the Vattezhuthu character set, is still to be implemented; the reported conversion accuracy for Vattezhuthu is 89.75%.
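The four stages listed in the abstract can be sketched end to end as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The fixed binarization threshold, the 3 × 3 zone grid, the toy 6 × 6 glyphs and the class labels (`"a"`, `"aa"`, `"ka"`) are all assumptions. A nearest-centroid rule stands in for the neural network the paper uses. Only the Tamil code points (U+0B85 அ, U+0B86 ஆ, U+0B95 க) are standard Unicode.

```python
from typing import Callable, Dict, List, Sequence

Image = List[List[int]]  # grayscale, 0 (black ink) .. 255 (white background)

# Hypothetical class labels mapped to standard Tamil Unicode letters.
TAMIL_UNICODE: Dict[str, str] = {
    "a": "\u0B85",   # அ TAMIL LETTER A
    "aa": "\u0B86",  # ஆ TAMIL LETTER AA
    "ka": "\u0B95",  # க TAMIL LETTER KA
}


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Stage i (preprocessing, assumed): 1 for ink pixels darker than threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def zone_features(binary: Image, zones_y: int = 3, zones_x: int = 3) -> List[float]:
    """Stage ii (image zoning): split the glyph into a zones_y x zones_x grid
    and return the ink density (fraction of ink pixels) of each zone."""
    h, w = len(binary), len(binary[0])
    features: List[float] = []
    for zy in range(zones_y):
        y0, y1 = zy * h // zones_y, (zy + 1) * h // zones_y
        for zx in range(zones_x):
            x0, x1 = zx * w // zones_x, (zx + 1) * w // zones_x
            area = (y1 - y0) * (x1 - x0)
            ink = sum(binary[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(ink / area if area else 0.0)
    return features


def nearest_centroid(features: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Stage iii stand-in: the paper uses a neural network; a nearest-centroid
    rule (squared Euclidean distance) keeps this sketch self-contained."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))


def to_tamil_text(labels: Sequence[str]) -> str:
    """Stage iv (digital text conversion): emit Tamil Unicode for each label."""
    return "".join(TAMIL_UNICODE[label] for label in labels)


if __name__ == "__main__":
    def make_glyph(ink: Callable[[int, int], bool]) -> Image:
        """Toy 6x6 glyph: black where ink(y, x) is True, white elsewhere."""
        return [[0 if ink(y, x) else 255 for x in range(6)] for y in range(6)]

    # One toy training glyph per class, so each "centroid" is that glyph's features.
    train = {
        "ka": make_glyph(lambda y, x: x < 2),  # vertical stroke
        "a": make_glyph(lambda y, x: y < 2),   # horizontal stroke
    }
    centroids = {label: zone_features(binarize(g)) for label, g in train.items()}
    unknown = make_glyph(lambda y, x: x < 2 and y > 0)  # vertical stroke with a gap
    label = nearest_centroid(zone_features(binarize(unknown)), centroids)
    print(label, to_tamil_text([label]))  # label is "ka"
```

With a regular grid, each feature is simply the ink density of one fixed zone. The broken stroke still lands closest to the vertical-stroke class because zoning tolerates small local damage. The Voronoi-based zoning work listed among the references generalizes the fixed grid by letting the zone shapes adapt.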
Citations
Journal Article
An analytical study of information extraction from unstructured and multidimensional big data
Kiran Adnan, Rehan Akbar, et al.
TL;DR: This research work examines the capabilities and limitations of existing IE techniques for data pre-processing, data extraction and transformation, and representation of huge volumes of multidimensional unstructured data, and presents a systematic literature review of state-of-the-art techniques for a variety of big data.
Journal Article
Efficient English text classification using selected Machine Learning Techniques
TL;DR: A Support Vector Machine (SVM) model is implemented for classifying English text and documents, and the classification rate is observed to exceed 90% when more than 4,000 features are used.
Journal Article
Limitations of information extraction methods and techniques for heterogeneous unstructured big data
Kiran Adnan, Rehan Akbar, et al.
TL;DR: The review finds that advanced IE techniques, particularly for multifaceted unstructured big data sets, are a foremost requirement for organizations that need to manage big data and derive strategic information.
Journal Article
Brahmi character recognition based on SVM (support vector machine) classifier using image gradient features
Sandeep Kaur, B. B. Sagar, et al.
TL;DR: A recognition system for Brahmi characters is presented that uses a linear Support Vector Machine classifier, trained on features from 24 images of each character, and achieves an accuracy of 91.6%.
Proceedings Article
Character Recognition in Historical Handwritten Documents – A Survey
Nija Babu, A. Soumya, et al.
TL;DR: The paper reviews major works on handwritten character recognition (HCR) for ancient handwritten documents and states that promising results have not been achieved.
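The Brahmi work listed above classifies characters with a linear SVM over image gradient features. Its TL;DR does not specify the feature layout, so what follows is a minimal sketch assuming a magnitude-weighted histogram of gradient orientations over the whole glyph (a simplified, single-cell HOG-style descriptor). The function name and the 8-bin default are hypothetical. A vector like this could then be passed to any linear SVM implementation.

```python
import math
from typing import List


def gradient_orientation_histogram(img: List[List[float]], bins: int = 8) -> List[float]:
    """Magnitude-weighted histogram of gradient orientations, normalized to sum to 1.

    Gradients use central differences on interior pixels only; orientation is
    measured in image coordinates (y grows downward) over the full [0, 2*pi) range.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region: no orientation to vote for
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins  # guard against rounding to `bins`
            hist[b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

A vertical ink edge puts all the weight in the 0-radian bin, and a horizontal edge puts it in the π/2 bin. Binning by orientation rather than raw pixel values makes the feature less sensitive to uniform changes in ink darkness.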
References
Journal Article
A prototype document image analysis system for technical journals
TL;DR: The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. So is the process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools.
Journal Article
Performance Evaluation and Benchmarking of Six-Page Segmentation Algorithms
TL;DR: A vectorial score is proposed that is sensitive to, and identifies, the most important classes of segmentation errors (over-, under-, and mis-segmentation) and the page components (lines, blocks, etc.) they affect.
Journal Article
Handwritten Chinese Text Recognition by Integrating Multiple Contexts
TL;DR: The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly, and are superior by far to the best results reported in the literature.
Journal Article
A synthesised word approach to word retrieval in handwritten documents
TL;DR: A novel method is described to overcome the training data problem using a character-based modelling approach and a word modelling technique enabling the retrieval of keywords that have not explicitly been seen in the training set.
Journal Article
Adaptive Membership Functions for Handwritten Character Recognition by Voronoi-Based Image Zoning
Giuseppe Pirlo, Donato Impedovo, et al.
TL;DR: A new class of zone-based membership functions with adaptive capabilities is introduced and its effectiveness is shown. In addition, a genetic algorithm is proposed to determine, in a single process, the most favorable membership functions together with the optimal zoning topology, described by a Voronoi tessellation.
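Voronoi-based zoning generalizes the fixed grid: each zone is the Voronoi cell of a seed point. The following is a minimal sketch, assuming pixel-level nearest-seed assignment and ink density as the per-zone feature. The function name and seed format are hypothetical. The adaptive membership functions and the genetic algorithm that optimizes the seeds are not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # seed coordinates as (x, y)


def voronoi_zone_features(binary: List[List[int]], seeds: Sequence[Point]) -> List[float]:
    """Assign every pixel to its nearest seed (a discrete Voronoi tessellation)
    and return the ink density of each resulting zone, in seed order."""
    ink = [0] * len(seeds)
    area = [0] * len(seeds)
    for y, row in enumerate(binary):
        for x, px in enumerate(row):
            # Nearest seed by squared Euclidean distance; ties go to the lower index.
            k = min(range(len(seeds)),
                    key=lambda i: (x - seeds[i][0]) ** 2 + (y - seeds[i][1]) ** 2)
            area[k] += 1
            ink[k] += px
    # A seed that wins no pixels yields an empty zone with density 0.0.
    return [i / a if a else 0.0 for i, a in zip(ink, area)]
```

Placing the seeds at the centers of a regular grid reproduces uniform grid zoning. Moving them, for example under an optimizer such as the genetic algorithm described above, lets zone boundaries follow the regions that best discriminate between character classes.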
Related Papers
Feature selection for an automated ancient Tamil script classification system using machine learning techniques
T. S. Suganya, S. Murugavalli, et al.