Proceedings Article

Managing multilingual OCR project using XML

TLDR
This paper describes how a new XML-based tagging scheme was used to meet the objectives of a project to develop OCR for 11 scripts of Indian origin, for which mature OCR technology was not available.
Abstract
This paper presents an XML-based scheme for managing a large multilingual OCR project. In particular, we describe how a new XML-based tagging scheme has been used to achieve the objectives of the project. Managing a large multilingual OCR project in which multiple research groups collaboratively develop script-specific and script-independent technologies is a challenging problem. We present some of the software and data management strategies designed for the project, which aims to develop OCR for 11 scripts of Indian origin for which mature OCR technology was not available.


Citations
Proceedings Article

Experiences of integration and performance testing of multilingual OCR for printed Indian scripts

TL;DR: The project is an attempt to implement an integrated platform for OCR of different Indian languages; it is currently being enhanced to handle space and time constraints, achieve higher recognition accuracy, and add new functionality.

Overview of Xml based Knowledge Representation using Scripts

TL;DR: A symbol vocabulary and a system of logic are combined to enable inferences about elements in the knowledge representation to create new knowledge representation sentences by using various techniques.
Proceedings Article

Information retrieval system based on ontology

TL;DR: Over the years, the volume of information available through the World Wide Web has increased continuously, and never has so much information been readily available and shared among so many people.
References
Proceedings Article

Content-level Annotation of Large Collection of Printed Document Images

C. V. Jawahar et al.
TL;DR: The method is model-driven and is intended to annotate large collections of documents, scanned at three different resolutions, at the character level; it employs an XML representation to store the annotation information.
Journal Article

Eighth International Conference: on Grief and Bereavement in Contemporary Society

TL;DR: The HSM 2010 conference will be held at the ENIM Engineering School at Metz Technopole, in a new building with an original design, located in the city of Metz, France.
Journal Article

XML---an opportunity for meaningful data standards in the geosciences

TL;DR: This paper explores the evolution, benefits, and status of XML and related standards in the more general context of Web activities, and uses this as a basis for discussing XML's potential for developing data standards in the geosciences.
Proceedings Article

Representation and annotation of online handwritten data

TL;DR: Describes an XML representation for annotating online handwriting data, built on the emerging digital ink markup language (InkML) standard from W3C, along with a tool based on the proposed representation for annotating digital ink.
Proceedings Article

UPX: a new XML representation for annotated datasets of online handwriting data

TL;DR: The efforts to create UPX, an XML-based successor to the venerable UNIPEN format for the representation of annotated datasets of online handwriting data, are introduced and the goals of UPX are outlined.