Proceedings Article

Managing multilingual OCR project using XML

TLDR
This paper describes how a new XML-based tagging scheme was used to meet the objectives of a project to develop OCR for 11 scripts of Indian origin for which mature OCR technology was not available.
Abstract
This paper presents an XML-based scheme for managing a large multilingual OCR project. In particular, we describe how a new XML-based tagging scheme has been exploited to achieve the objectives of the project. Managing a large multilingual OCR project in which multiple research groups collaboratively develop script-specific and script-independent technologies is a challenging problem. We present some of the software and data management strategies designed for the project, which aimed to develop OCR for 11 scripts of Indian origin for which mature OCR technology was not available.


Citations
Proceedings Article

Experiences of integration and performance testing of multilingual OCR for printed Indian scripts

TL;DR: The project is an attempt to implement an integrated platform for OCR of different Indian languages. It is currently being enhanced to handle space and time constraints, achieve higher recognition accuracy, and add new functionality.

Overview of Xml based Knowledge Representation using Scripts

TL;DR: A symbol vocabulary and a system of logic are combined to enable inferences about elements in the knowledge representation, creating new knowledge representation sentences using various techniques.
Proceedings Article

Information retrieval system based on ontology

TL;DR: Over the years, the volume of information available through the World Wide Web has been increasing continuously, and never has so much information been readily available and shared among so many people.
References
Proceedings Article

Multimedia ontology learning for automatic annotation and video browsing

TL;DR: This work uses MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved.
Proceedings Article

Schema extraction for multimedia XML document retrieval

TL;DR: This paper proposes a method of schema extraction for multimedia XML data in which schemas are leveled with respect to the frequency of topological document structures in a database.
Book Chapter

Building Data Sets for Indian Language OCR Research

TL;DR: This chapter presents activities toward developing robust document understanding systems for Indian languages using a corpus of document images in Indian scripts, and describes the process followed to obtain word- and symbol-level annotated data sets.