
Showing papers on "Devanagari published in 2006"


Proceedings Article
23 Oct 2006
TL;DR: A system for recognition of online handwritten characters has been presented for Indian writing systems and the results have been presented after testing the system on Devanagari and Telugu scripts.
Abstract: A system for recognition of online handwritten characters has been presented for Indian writing systems. A handwritten character is represented as a sequence of strokes whose features are extracted and classified. Support vector machines have been used for constructing the stroke recognition engine. The results have been presented after testing the system on Devanagari and Telugu scripts.

99 citations


Proceedings Article
23 Oct 2006
TL;DR: A two-stage classification system for recognition of handwritten Devanagari numerals is presented, in which a shape feature vector computed from certain directional-view-based strokes of an input character image is used by both the HMM and ANN classifiers.
Abstract: In this article, a two-stage classification system for recognition of handwritten Devanagari numerals is presented. A shape feature vector computed from certain directional-view-based strokes of an input character image has been used by both the HMM and ANN classifiers of the present recognition system. The two sets of posterior probabilities obtained from the outputs of these two classifiers are combined by another ANN classifier. Finally, the numeral image is classified according to the maximum score provided by the ANN of the second stage. With the proposed scheme, we achieved 92.83% recognition accuracy on the test set of a recently developed large image database [1] of handwritten isolated numerals of Devanagari, the first and third most popular language and script in India and the world, respectively. This result improves on the previously reported [2] accuracy of 91.28% on the same data set.

60 citations


Proceedings ArticleDOI
27 Apr 2006
TL;DR: Two different techniques for OCR of machine printed, multi-font Devanagari text are outlined, one segmentation driven and the other following the paradigm of recognition driven segmentation.
Abstract: We outline two different techniques for OCR of machine printed, multi-font Devanagari text. In the first design, words are segmented along linear boundaries. Subsequently, classification is performed with the assumption of accurate segmentation. The second approach uses classifiers to obtain preliminary hypotheses for each segment of the word. These results are used to guide further segmentation of certain pieces. While the former technique is segmentation driven, the latter method follows the paradigm of recognition driven segmentation. The two approaches are compared by using a standard data set.

29 citations


Proceedings ArticleDOI
01 Nov 2006
TL;DR: The proposed method uses segmentation and evolved regular expressions to recognize unconstrained Devanagari writing, and is reported to be efficient enough because of the power of regular expressions.
Abstract: Devanagari script is used in the Indian subcontinent for several major languages such as Hindi, Sanskrit, Marathi and Nepali. More than 500 million people use the script. Recognition of unconstrained (handwritten) Devanagari writing is more complex than that of English cursive because of the shape of the constituent strokes. The proposed method uses segmentation and evolved regular expressions. It takes into account the wide variation in writing styles, the size and thickness of characters, and any distortion introduced during scanning. Neither preprocessing nor training is needed. The notation of regular expressions is short and precise, and can easily be transformed into directed graphs or finite-state automata that accept all the symbol strings generated by the corresponding expressions. The method is efficient enough because of the power of regular expressions.

12 citations
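
As a rough illustration of the regular-expression idea in the entry above, the sketch below matches a segmented character, encoded as a string of stroke symbols, against one pattern per class. The stroke alphabet (H, V, C, L), the class labels and the patterns are all hypothetical and are not taken from the paper; Python's `re` module stands in for whatever automaton construction the authors used.

```python
import re

# Hypothetical stroke alphabet (an assumption, not from the paper):
#   H = headline, V = vertical bar, C = curve, L = loop
# Each class is described by one regular expression over these symbols.
CHAR_PATTERNS = {
    "char_A": re.compile(r"HVC+"),       # headline, bar, one or more curves
    "char_B": re.compile(r"HV(L|C)V"),   # headline, bar, loop or curve, bar
}

def classify(strokes: str):
    """Return the first class whose pattern accepts the whole stroke string."""
    for label, pattern in CHAR_PATTERNS.items():
        if pattern.fullmatch(strokes):
            return label
    return None  # no pattern accepts this stroke sequence

if __name__ == "__main__":
    for sample in ("HVCC", "HVLV", "VV"):
        print(sample, "->", classify(sample))
```

Because each pattern must accept the entire string (`fullmatch`), it behaves like the finite-state acceptor the abstract mentions, and no training step is involved.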


Journal ArticleDOI
TL;DR: In this paper, four Indic alphasyllabaries (Devanagari, Oriya, Kannada and Tamil) are analyzed in terms of a calculus involving two-dimensional catenation operators, and the paper discusses evidence that “phonemic awareness” (the ability of literate speakers to manipulate sounds consciously at the phoneme level) is much stronger with alphabetic scripts than with alphasyllabaries.
Abstract: In earlier work (Sproat 2000), I characterized the layout of symbols in a script in terms of a calculus involving two dimensional catenation operators: I claimed that leftwards, rightwards, upwards, downwards and surrounding catenation are sufficient to describe the layout of any script. In the first half of this paper I analyze four Indic alphasyllabaries — Devanagari, Oriya, Kannada and Tamil — in terms of this model. A crucial claim is that despite the complexities of layout in alphasyllabic scripts, they are essentially no different in nature than alphabetic scripts, such as Latin. The second part of the paper explores implications of this view for theories of phonology and human processing of orthography. Apparently problematic is evidence that “phonemic awareness” — the ability for literate speakers to manipulate sounds consciously at the phoneme level — is much stronger with alphabetic scripts, than with alphasyllabaries. But phonemic awareness is not categorically absent for readers of Indic scripts; in general, how aware a reader is of a particular phoneme is related to how that phoneme is rendered in the script. Relevant factors appear to include whether the symbol is written inline, whether it is a diacritic, and whether it is ligatured with another symbol.

11 citations


Proceedings Article
23 Oct 2006
TL;DR: Evaluation of the solution using gesture data collected from novice users shows a gesture recognition accuracy of 97% for supported writing styles and the system may be adapted for different scripts easily.
Abstract: Gesture Keyboard (GKB) is a novel method of text input for syllabic scripts whose success and acceptance is critically dependent on the reliability of handwritten gesture recognition. In this paper, we describe the solution we have developed for the Devanagari Gesture Keyboard. A data driven approach is adopted for the recognition of basic shapes corresponding to components of gestures. Script specific rules are then employed to combine basic shapes into gestures. These rules are captured externally in an XML configuration file so that the system may be adapted for different scripts easily. Evaluation of the solution using gesture data collected from novice users shows a gesture recognition accuracy of 97% for supported writing styles. The paper concludes with next steps.

10 citations
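
The entry above mentions script-specific rules kept in an external XML file. A minimal sketch of that arrangement follows, assuming an invented schema (`rules`, `gesture`, `shape`) and invented shape and gesture names; the paper's actual file format is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each gesture is an ordered list of basic shapes.
# Element names, shape names and gesture names are assumptions for illustration.
RULES_XML = """
<rules script="devanagari">
  <gesture name="g1"><shape>arc</shape><shape>line</shape></gesture>
  <gesture name="g2"><shape>loop</shape></gesture>
</rules>
"""

def load_rules(xml_text: str) -> dict:
    """Parse the rule file into a {shape sequence: gesture name} lookup table."""
    root = ET.fromstring(xml_text.strip())
    return {
        tuple(shape.text for shape in gesture.findall("shape")): gesture.get("name")
        for gesture in root.findall("gesture")
    }

def combine(shapes, rules):
    """Map a sequence of recognized basic shapes to a gesture, or None."""
    return rules.get(tuple(shapes))

if __name__ == "__main__":
    rules = load_rules(RULES_XML)
    print(combine(["arc", "line"], rules))
    print(combine(["line"], rules))
```

Keeping the table outside the code means that supporting another script would, in this sketch, only require a different XML file.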


01 Jan 2006
TL;DR: The main object of this paper is to describe in some detail how the printed MW has been encoded without changing its rather complicated structure and without losing any of the information contained in it.
Abstract: The Cologne Digital Sanskrit Lexicon (CDSL) project undertakes to digitize and merge the major bilingual Sanskrit dictionaries compiled in the 19th century. Its aim is to provide a basic lexical corpus that gives easy access to all available meanings of Sanskrit words and allows the creation of a number of computer programs that will help to analyze Sanskrit texts. In the first stage, Monier-Williams' Sanskrit-English dictionary (MW) has been digitized, to be followed at a second stage by three other dictionaries (Cap, PW2 and Sch). All these will be structured and unified to allow access to the meanings as developed by the different lexicographers. As a final goal, it is hoped that a step can be taken towards an integrated Sanskrit word catalogue which codifies the distribution of lexical units in Sanskrit text corpora by linking them to the existing descriptions in dictionaries through a numeric system, which functions as a placeholder for a word sense that can be expanded or changed. Last but not least, connecting Sanskrit with Tamil vocabulary is envisaged. To this end the major Tamil dictionaries have already been converted into digital form. The main object of this paper is to describe in some detail how the printed MW has been encoded without changing its rather complicated structure and without losing any of the information contained in it. 7-bit encoding has been used for the transliteration of DevanAgari to make it directly readable for humans as well as making it accessible to general text processing tools.

4 citations
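
To make the 7-bit transliteration point in the entry above concrete, the sketch below converts a small, Harvard-Kyoto-style ASCII subset to IAST diacritics. Which scheme the project actually used is an assumption here, and the table is deliberately partial: it covers single-character codes only, so two-character codes such as a long vocalic r are not handled.

```python
import unicodedata

# Partial, single-character mapping in the style of Harvard-Kyoto (an assumption;
# the exact scheme used by the project is not stated in the abstract).
HK_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "R": "ṛ",
    "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
}

def hk_to_iast(text: str) -> str:
    """Replace each mapped ASCII code with its IAST letter; pass others through."""
    converted = "".join(HK_TO_IAST.get(ch, ch) for ch in text)
    # Normalize so the result always uses precomposed characters.
    return unicodedata.normalize("NFC", converted)

if __name__ == "__main__":
    for word in ("devanAgarI", "saMskRta"):
        print(word, "->", hk_to_iast(word))
```

The input stays plain ASCII, which is what lets ordinary text tools (search, sort, diff) process it while remaining readable to a person familiar with the scheme.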