
Showing papers on "Devanagari published in 2007"


Journal ArticleDOI
TL;DR: A general fuzzy hyperline segment neural network is proposed that combines supervised and unsupervised learning in a single algorithm so that it can be used for pure classification, pure clustering and hybrid classification/clustering.

94 citations


Book ChapterDOI
18 Dec 2007
TL;DR: A hidden Markov model for recognition of handwritten Devanagari words is proposed, which has the property that its states are not defined a priori, but are determined automatically based on a database of handwritten word images.
Abstract: A hidden Markov model (HMM) for recognition of handwritten Devanagari words is proposed. The HMM has the property that its states are not defined a priori, but are determined automatically based on a database of handwritten word images. A handwritten word is assumed to be a string of several stroke primitives. These are in fact the states of the proposed HMM and are found using certain mixture distributions. One HMM is constructed for each word. To classify an unknown word image, its class conditional probability for each HMM is computed. The classification scheme has been tested on a small handwritten Devanagari word database developed recently. The classification accuracy is 87.71% and 82.89% for training and test sets respectively.
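The classification step described above — one HMM per word, with the unknown image assigned to the model giving the highest class-conditional probability — can be sketched with the standard forward algorithm. The toy discrete models below are purely illustrative assumptions; the paper's states come from mixture distributions over stroke primitives, not from hand-set tables.

```python
import math

def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_log_prob(obs, start_p, trans_p, emit_p):
    """Log-likelihood log P(obs | model) via the forward algorithm.
    start_p[i], trans_p[i][j], emit_p[i][o] are the model's probabilities;
    obs is a sequence of integer symbol indices."""
    n = len(start_p)
    alpha = [math.log(start_p[i]) + math.log(emit_p[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            math.log(emit_p[j][o])
            + _logsumexp([alpha[i] + math.log(trans_p[i][j]) for i in range(n)])
            for j in range(n)
        ]
    return _logsumexp(alpha)

def classify(obs, models):
    """One HMM per word: pick the word whose model best explains obs."""
    return max(models, key=lambda w: forward_log_prob(obs, *models[w]))

# Two hypothetical single-state word models over a binary symbol alphabet.
model_a = ([1.0], [[1.0]], [[0.9, 0.1]])   # prefers symbol 0
model_b = ([1.0], [[1.0]], [[0.1, 0.9]])   # prefers symbol 1
models = {"word_a": model_a, "word_b": model_b}
```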

33 citations


Book ChapterDOI
09 Sep 2007
TL;DR: The effectiveness of representing an online handwritten stroke using spatiostructural features is demonstrated, as indicated by its effect on the stroke classification accuracy by a Support Vector Machine (SVM) based classifier.
Abstract: The spatiostructural features proposed for recognition of online handwritten characters refer to offline-like features that convey information about both the positional and structural (shape) characteristics of the handwriting unit. This paper demonstrates the effectiveness of representing an online handwritten stroke using spatiostructural features, as indicated by its effect on the stroke classification accuracy by a Support Vector Machine (SVM) based classifier. The study has been done on two major Indian writing systems, Devanagari and Tamil. The importance of localization information of the structural features and handling of translational variance is studied using appropriate approaches to zoning the handwritten character.
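The zoning idea above — localizing structural features within a spatial grid while handling translational variance — can be sketched as follows. The 3x3 grid, the point-count feature, the synthetic strokes, and the scikit-learn SVM are all assumptions for illustration; the paper's spatiostructural features are richer than plain point counts.

```python
import numpy as np
from sklearn.svm import SVC

def zone_features(points, grid=3):
    """Normalize a stroke's (x, y) points to the unit square (removing
    translation and scale), then return the fraction of points falling
    in each cell of a grid x grid zoning — a coarse positional/structural
    descriptor of the stroke."""
    pts = np.asarray(points, dtype=float)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs - mins > 0, maxs - mins, 1.0)  # guard degenerate axes
    norm = (pts - mins) / span
    cells = np.minimum((norm * grid).astype(int), grid - 1)
    feats = np.zeros((grid, grid))
    for cx, cy in cells:
        feats[cy, cx] += 1
    return (feats / len(pts)).ravel()

# Toy training set: two synthetic stroke classes.
X = [zone_features([(t, 0.5) for t in np.linspace(0, 1, 20)]),
     zone_features([(0.5, t) for t in np.linspace(0, 1, 20)])]
y = ["horizontal", "vertical"]
clf = SVC(kernel="linear").fit(X, y)
```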

25 citations


Journal Article
TL;DR: This paper employs a "Dynamic Time Warping" (DTW) algorithm to align two on-line handwritten strokes and to estimate the similarity, and uses a template-based approach to identify stroke number and stroke order free natural handwritten alphanumeric characters.
Abstract: In this paper, we explore the efficacy of various stroke-based handwriting analysis strategies in classifying Nepalese handwritten alphanumeric characters by using a template-based approach. Writing units vary from time to time, even within the drawings of a specific character by the same user. Writing units include properties of a stroke such as number, shape and size, order, and writing speed. We propose to use structural properties of writing samples having such variability in writing units. We employ a "Dynamic Time Warping" (DTW) algorithm to align two on-line handwritten strokes and to estimate their similarity. We use two different features for stroke identification: a sequence of directions at every pen-tip position along the pen trajectory, and the inclusion of pen-tip position together with direction as the feature of the stroke. For each type of feature, two different systems are trained by using both original and pre-processed samples. To evaluate the system, we collected examples of 46 different alphanumeric characters from 25 Nepalese natives, and then performed a series of different experiments. Use of specific-stroke pre-processing and a sequence of both pen-tip position and slope at every position as the feature of a stroke yields improved results, which are confidently supported by a five-fold cross validation. The superiority of the present work over several related works on Devanagari script is the recognition of stroke-number and stroke-order free natural handwritten alphanumeric characters.
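The DTW alignment at the heart of this template-matching approach can be sketched as below. The template coordinates are hypothetical; the paper's actual features are direction (and position-plus-direction) sequences, while this sketch uses raw pen-tip positions for brevity.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two point sequences.
    The non-linear alignment tolerates differences in writing speed
    and in the number of samples per stroke."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])          # local point distance
            D[i][j] = cost + min(D[i - 1][j],             # insertion
                                 D[i][j - 1],             # deletion
                                 D[i - 1][j - 1])         # match
    return D[n][m]

# Template-based classification: nearest template under DTW.
templates = {"ka": [(0, 0), (1, 1), (2, 0)],
             "kha": [(0, 2), (1, 0), (2, 2)]}             # hypothetical strokes
unknown = [(0, 0), (0.5, 0.5), (1, 1), (2, 0)]            # extra sample point
best = min(templates, key=lambda k: dtw_distance(unknown, templates[k]))
```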

8 citations


Proceedings ArticleDOI
16 Dec 2007
TL;DR: Experiments and results show that the presented method is robust for preprocessing scanned images of Devanagari text documents.
Abstract: In this paper we present a rule-based approach for removing insignificant data and skew from scanned documents of Devanagari script. Developing an OCR system for Devanagari script is not easy; proper preprocessing of these scanned documents requires noise removal and skew correction in the image. The proposed system is based on rule-based methods, morphological operations and connected component labeling. Images used for the experiment are binarised grayscale images. Experiments and results show that the presented method is robust for preprocessing scanned images of Devanagari text documents.
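The connected-component-labeling step of such a pipeline can be sketched as below. The size threshold and the use of SciPy are assumptions; this covers only the noise-removal rule, not the paper's full rule set or skew correction.

```python
import numpy as np
from scipy import ndimage

def remove_small_components(binary, min_pixels=5):
    """Remove connected components smaller than min_pixels from a binary
    image — a simple rule-based noise-removal step (the threshold value
    here is illustrative, not taken from the paper)."""
    labels, n = ndimage.label(binary)                    # label 8-/4-connected blobs
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    keep = np.zeros(n + 1, dtype=bool)                   # index 0 = background
    keep[1:] = sizes >= min_pixels
    return keep[labels]                                  # mask of kept components

# Toy binarised page: one text-like blob plus an isolated noise speck.
img = np.zeros((10, 10), dtype=bool)
img[1:4, 1:4] = True      # 9-pixel component (kept)
img[8, 8] = True          # 1-pixel speck (removed)
clean = remove_small_components(img, min_pixels=5)
```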

5 citations


13 Dec 2007
TL;DR: This paper proposes to encode the Kaithi script in the international character encoding standard Unicode; the script was subsequently published in Unicode Standard version 5.2 in October 2009.
Abstract: Author: Anshuman Pandey. This is a proposal to encode the Kaithi script in the international character encoding standard Unicode. The script was published in Unicode Standard version 5.2 in October 2009. The script was used for administrative communication from at least the 19th century until the early 20th century to write Bhojpuri, Magahi, Awadhi, Maithili, Urdu, and other languages related to Hindi. It was also used in religious and literary materials, to record commercial transactions, and in correspondence and personal communication. The Kaithi script was eventually supplanted by Devanagari.

3 citations


01 Jan 2007
TL;DR: This work has developed a recognition driven segmentation method that generates multiple segmentation results for each character and takes advantage of the syllabic-alphabetic nature of the Devanagari script by designing a stochastic framework where the primitives are syllabic characters made of one or more alphabets.
Abstract: Font-independent OCR solutions for Latin and Oriental scripts are commercially available and widely used in Digital library applications. However, accurate OCRs are still not available for Devanagari, a script used by over 400 million people in more than forty languages including Hindi and Sanskrit. Challenges in Devanagari OCR include: (i) Large number of character classes, (ii) Character shapes made of complex primitives that cannot be easily segmented using conventional character segmentation approaches, (iii) Variable representations of the same character in different fonts, and (iv) Preponderance of poor print quality or poor quality paper that causes unpredictable character distortions. We address this challenge by segmenting the characters into components which are horizontally or vertically juxtaposed, and connected along non-linear boundaries. Most techniques in the literature have approximated the segmentation process by using sliding windows or projection profiles in a single direction. We adopt a Block Adjacency Graph (BAG) representation, where each node of the BAG represents a part of the character image and the edges represent their interconnections. Characters are segmented by selecting subgraphs while also accommodating the natural breaks and joints in characters and the various ways in which alphabets can join. Instead of the common approach of using font-dependent rules to guide the segmentation and classification process, we have developed a recognition driven segmentation method that generates multiple segmentation results for each character. Word hypotheses are generated by integrating image recognition results with a language model that encodes frequencies of alphabets and syllabic characters. This is in contrast with the previous use of language models primarily as a post-processing technique. 
We take advantage of the syllabic-alphabetic nature of the Devanagari script by designing a stochastic framework where the primitives are syllabic characters made of one or more alphabets. We use dictionary lookup to enhance the word hypotheses. On a publicly available, multi-font test set of 10,606 words, we have achieved top choice word accuracy of 75%, and top-5 choice word accuracy of 85%. This is a significant improvement over the performance of previous techniques on the same test set.
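The hypothesis-scoring idea above — integrating image recognition results with a language model during segmentation rather than as post-processing — can be sketched as below. The symbol names, frequencies, and the interpolation weight are hypothetical; this illustrates only the score combination, not the BAG-based segmentation itself.

```python
import math

def best_word(seg_hypotheses, unigram_logp, lam=0.5):
    """Pick the segmentation hypothesis with the highest combined score:
    recognition log-score plus lam times a unigram language-model
    log-probability over the recognized symbols. lam is illustrative.
    seg_hypotheses: list of (symbols, recognition_logp) pairs."""
    floor = math.log(1e-6)                       # smoothing for unseen symbols
    def score(hyp):
        symbols, rec_logp = hyp
        lm = sum(unigram_logp.get(s, floor) for s in symbols)
        return rec_logp + lam * lm
    return max(seg_hypotheses, key=score)[0]

# Hypothetical unigram frequencies for two syllabic characters.
lm = {"ka": math.log(0.6), "kha": math.log(0.1)}
# Two segmentations with equal recognition scores; the LM breaks the tie.
hyps = [(["ka", "ka"], -2.0), (["kha", "kha"], -2.0)]
winner = best_word(hyps, lm)
```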

2 citations