Information extraction and text mining of Ancient Vattezhuthu characters in historical documents using image zoning

doi:10.1109/IALP.2016.7875929

Proceedings Article

E. K. Vellingiriraj, +2 more

- pp 37-40

Abstract:

The aim of this paper is to develop a system that recognizes Brahmi, Grantha and Vattezhuthu characters in palm manuscripts of ancient Tamil historical documents, analyzes the text, and machine-translates it into present-day Tamil digital text. Although many researchers have implemented algorithms and techniques for character recognition in different languages, converting ancient characters remains a major challenge. Image recognition technology has reached near-perfection for scanning text in English and other languages, but optical character recognition (OCR) software capable of digitizing printed Tamil text with high accuracy is still elusive. Only a few people are familiar with the ancient characters, and they attempt to convert them into written documents manually. The proposed system addresses this by converting ancient historical documents from inscriptions and palm manuscripts into Tamil digital text, encoded in Tamil Unicode. Our algorithm comprises four stages: i) image preprocessing, ii) feature extraction, iii) character recognition and iv) digital text conversion. In the first phase, the conversion accuracy of our algorithm for the Brahmi script is 91.57% using the neural network and image zoning method. The second phase, covering the Vattezhuthu character set, is to be implemented; the conversion accuracy for Vattezhuthu is 89.75%.
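The feature extraction and digital text conversion stages named in the abstract can be illustrated with a minimal sketch. The abstract does not state the zoning grid, the features computed per zone, or the network architecture, so everything below is an assumption: the function names `zone_densities` and `to_tamil`, the ink-density feature, and the two-entry label table are hypothetical and serve only to show the general idea of image zoning followed by a Unicode lookup.

```python
from typing import Dict, List, Sequence

def zone_densities(image: Sequence[Sequence[int]], rows: int, cols: int) -> List[float]:
    """Split a binary glyph image into rows x cols zones and return,
    for each zone in row-major order, the fraction of ink pixels (value 1)."""
    height = len(image)
    width = len(image[0])
    features: List[float] = []
    for r in range(rows):
        # Integer boundaries keep the zones tiling the image even when
        # the height or width is not an exact multiple of the grid size.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(image[y][x] for y in range(top, bottom) for x in range(left, right))
            features.append(ink / area if area else 0.0)
    return features

# Hypothetical mapping from recognizer class labels to modern Tamil code points.
# Only two entries are shown; a real table would cover the full character set.
TAMIL_UNICODE: Dict[str, str] = {
    "a": "\u0B85",   # TAMIL LETTER A
    "ka": "\u0B95",  # TAMIL LETTER KA
}

def to_tamil(labels: Sequence[str]) -> str:
    """Convert a sequence of recognized class labels to a Tamil Unicode string."""
    return "".join(TAMIL_UNICODE[label] for label in labels)

if __name__ == "__main__":
    glyph = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1],
    ]
    print(zone_densities(glyph, 2, 2))
    print(to_tamil(["a", "ka"]))
```

In a pipeline of this kind, the zone feature vector would be the input to the classifier (a neural network in the paper), and the predicted labels would then pass through the lookup to produce Tamil Unicode text.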
