Author

K. J. Jinesh

Bio: K. J. Jinesh is an academic researcher from the International Institute of Information Technology, Hyderabad. The author has contributed to research on word recognition and data management. The author has an h-index of 1 and has co-authored 2 publications receiving 4 citations.

Papers
Proceedings Article
25 Jul 2009
TL;DR: This paper describes how a new XML-based tagging scheme has been exploited to achieve the objectives of a project aimed at developing OCR for 11 scripts of Indian origin for which mature OCR technology was not available.
Abstract: This paper presents an XML-based scheme for managing a large multilingual OCR project. In particular, we describe how a new XML-based tagging scheme has been exploited to achieve the objectives of the project. Managing a large multilingual OCR project that involves multiple research groups developing script-specific and script-independent technologies in a collaborative fashion is a challenging problem. In this paper, we present some of the software and data management strategies designed for the project, which aims to develop OCR for 11 scripts of Indian origin for which mature OCR technology was not available.

3 citations

Proceedings Article
12 Dec 2010
TL;DR: An attempt to formulate the problem of degraded word recognition in a generic and formal structure as a probabilistic parsing problem and effectively combine it with an alternate word generator, symbol recognizer and verification unit to improve recognition rates of degraded words without compromising good characters.
Abstract: Though Indian-language OCRs have shown significant improvement in classification rates in recent years, recognition of degraded words still poses a big challenge for the development of robust OCR systems. Ours is an attempt to formulate the problem of degraded word recognition in a generic and formal structure. We formulate the problem of degraded word recognition as a probabilistic parsing problem. A probabilistic-parsing-based framework is used to rank and validate various possible hypotheses. We effectively combine it with an alternate word generator, symbol recognizer and verification unit to improve recognition rates of degraded words without compromising good characters. We demonstrate our method on Malayalam. We evaluate our method on a completely annotated book, where around 65% of the degraded words are correctly recognized using this approach.

1 citation


Cited by
Proceedings Article
17 Sep 2011
TL;DR: The project is an attempt to implement an integrated platform for OCR of different Indian languages and currently is being enhanced for handling the space and time constraints, achieving higher recognition accuracies and adding new functionalities.
Abstract: This paper presents integration and testing scheme for managing a large Multilingual OCR Project. The project is an attempt to implement an integrated platform for OCR of different Indian languages. Software engineering, workflow management and testing processes have been discussed in this paper. The OCR has now been experimentally deployed for some specific applications and currently is being enhanced for handling the space and time constraints, achieving higher recognition accuracies and adding new functionalities.

26 citations

Proceedings Article
15 Dec 2011
TL;DR: This paper has created an annotated dataset of 1034 word images with pixel level ground truth for quantitative evaluation of the multiple segmentation methods which address the problem of cuts and merges in degraded words.
Abstract: In most Optical Character Recognition software, a substantial percentage of errors are caused by the incorrect segmentation of degraded words. This is especially true for recognizing old books, newspapers and historical manuscripts. In this paper, we propose multiple segmentation methods which address the problem of cuts and merges in degraded words. We have created an annotated dataset of 1034 word images with pixel-level ground truth for quantitative evaluation of the methods. We compare the methods with a baseline implementation based on connected component analysis. We report substantial improvement in accuracy at both the character and the word level.

4 citations
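The connected-component baseline named in the abstract above can be illustrated with a generic sketch: every 8-connected blob of ink becomes one candidate symbol, ordered left to right by bounding box. This is not the paper's implementation; the function names and the list-of-lists binary image format are assumptions made for illustration. It shows why such a baseline suffers from cuts and merges: a glyph broken by degradation yields extra components, while touching glyphs collapse into one.

```python
from collections import deque


def label_components(image):
    """Label 8-connected foreground components in a binary image.

    image: list of rows, each a list of 0 (background) or 1 (ink).
    Returns (labels, count): labels has the same shape as image,
    0 for background and 1..count for each component.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                # Start a new component and flood-fill it breadth-first.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def component_boxes(labels):
    """Return (x0, y0, x1, y1) boxes of each component, sorted left to right."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                x0, y0, x1, y1 = boxes.get(lab, (x, y, x, y))
                boxes[lab] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return sorted(boxes.values())


# Hypothetical 3x5 "word" image containing two separate ink blobs.
word = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
labels, n = label_components(word)
print(n, component_boxes(labels))  # 2 [(0, 0, 1, 1), (3, 0, 4, 2)]
```

Each box would then be passed to a symbol classifier; the segmentation methods proposed in the paper aim to correct exactly the cases where this one-blob-one-symbol assumption fails.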

01 Jan 2014
TL;DR: A symbol vocabulary and a system of logic are combined to enable inferences about elements in the knowledge representation to create new knowledge representation sentences by using various techniques.
Abstract: A knowledge representation (KR) is an idea to enable an individual to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it. The knowledge acquired from experts or induced from a set of data must be represented in a format that is both understandable by humans and executable on computers. Knowledge representation research involves analysis of how to reason accurately and effectively and how best to use a set of symbols to represent a set of facts within a knowledge domain. A symbol vocabulary and a system of logic are combined to enable inferences about elements in the knowledge representation, creating new knowledge representation sentences by using various techniques.

2 citations
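The abstract above describes combining a symbol vocabulary with a system of logic to derive new sentences, without naming a specific logic. One simple, well-known instance is forward chaining over propositional Horn rules, sketched below. The choice of Horn logic, the function name and the atoms ("rain", "wet_ground", and so on) are all assumptions for illustration, not the cited work's method.

```python
def forward_chain(facts, rules):
    """Derive every atom entailed by a set of propositional Horn rules.

    facts: iterable of atom strings known to be true.
    rules: list of (premises, conclusion) pairs; premises is a tuple of atoms.
    Returns the set of known atoms plus every newly derived atom.
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new (fixpoint)
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)  # a new "sentence" in the KR
                changed = True
    return known


# Hypothetical vocabulary and rules.
rules = [
    (("rain",), "wet_ground"),
    (("wet_ground",), "slippery"),
    (("snow",), "cold"),
]
print(sorted(forward_chain({"rain"}, rules)))  # ['rain', 'slippery', 'wet_ground']
```

Here the symbols are the vocabulary, the rules are the logic, and the derived atoms are the new sentences; richer representations (first-order logic, description logics) replace string atoms with structured terms and need correspondingly richer inference.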

Proceedings Article
01 Oct 2016
TL;DR: Over the years, the volume of information available through the World Wide Web has been increasing continuously, and never has so much information been readily available and shared among so many people.
Abstract: Over the years, the volume of information available through the World Wide Web has been increasing continuously, and never has so much information been readily available and shared among so many people. Unfortunately, the unstructured nature and huge volume of information accessible over the network have made it difficult for users to sift through it and find relevant information. The information retrieval techniques commonly used are based on keywords. These techniques use keyword lists to describe the content of information, but one problem with such lists is that they say nothing about the semantic relationships between keywords, nor do they take into account the meaning of words or phrases.

1 citation
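The limitation of keyword-based retrieval described in the last abstract can be seen in a minimal sketch: each document is reduced to a set of words, and a document matches only if it literally contains every query keyword. The documents, the query and the function names are hypothetical, and real keyword engines add ranking (for example TF-IDF) on top of this matching step.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Return indices of documents containing every query keyword."""
    terms = tokenize(query)
    return [i for i, doc in enumerate(documents) if terms <= tokenize(doc)]


docs = [
    "Cheap car rental near the airport",
    "Affordable automobile hire at the terminal",
]
print(keyword_search("car rental", docs))  # [0]
```

Both documents describe the same service, yet only the first is returned, because the matcher has no notion that "automobile hire" and "car rental" are related; capturing such relationships is what semantic approaches aim to add.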