Topic

Devanagari

About: Devanagari is a research topic. Over its lifetime, 655 publications have been published within this topic, receiving 7,428 citations. The topic is also known as: Deva nagari and Hindi Script.


Papers
Journal Article
TL;DR: A two-pass algorithm is presented for segmenting and decomposing Devanagari composite characters/symbols into their constituent symbols; a recognition rate is reported on the segmented conjuncts.

143 citations

Journal Article
01 Jul 2000
TL;DR: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role, which is the underlying philosophy of the Devanagari document recognition system described in this work.
Abstract: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.

132 citations

Journal Article
TL;DR: A case of deep dyslexia in a patient who premorbidly could read English and Nepalese is described. His reading performance demonstrates that the reading of syllabic scripts is not necessarily abolished in deep dyslexia; the inability of Japanese deep dyslexics to read aloud or comprehend the syllabic script kana remains to be explained.
Abstract: Studies of deep dyslexia in Japanese patients, and of non-word reading by deep dyslexic readers of alphabetic scripts, suggest a general principle that reading that depends on the mapping of characters onto phonological segments (phonemes in the case of alphabetic scripts, syllables in the case of syllabaries) is impossible in deep dyslexia. We describe a case of deep dyslexia in a patient who premorbidly could read English and Nepalese. As the latter is written in the syllabic Devanagari script, this case may be used to explore the generality of this principle. It would be expected that reading Nepalese words written in the syllabic script would be more difficult than reading English words written in the Roman alphabet. In oral reading tasks this was the case, even though Nepalese was the patient's first language. However, further studies showed that he could understand Nepalese words written in the syllabic script at least as well as English words written in the Roman alphabet, and that he could read al...

131 citations

07 Jun 2012
TL;DR: This work annotates and releases a large collection of tweets in nine languages, focusing on confusable languages using the Cyrillic, Arabic, and Devanagari scripts, the first publicly-available collection of LID-annotated tweets in non-Latin scripts and should become a standard evaluation set for LID systems.
Abstract: Social media services such as Twitter offer an immense volume of real-world linguistic data. We explore the use of Twitter to obtain authentic user-generated text in low-resource languages such as Nepali, Urdu, and Ukrainian. Automatic language identification (LID) can be used to extract language-specific data from Twitter, but it is unclear how well LID performs on short, informal texts in low-resource languages. We address this question by annotating and releasing a large collection of tweets in nine languages, focusing on confusable languages using the Cyrillic, Arabic, and Devanagari scripts. This is the first publicly available collection of LID-annotated tweets in non-Latin scripts, and should become a standard evaluation set for LID systems. We also advance the state of the art by evaluating new, highly accurate LID systems, trained both on our new corpus and on standard materials only. Both types of systems achieve a huge performance improvement over the existing state of the art, correctly classifying around 98% of our gold standard tweets. We provide a detailed analysis showing how the accuracy of our systems varies along certain dimensions, such as the tweet length and the amount of in- and out-of-domain training data.

124 citations
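As a rough illustration of why the languages in the entry above are called confusable, the following minimal sketch detects only the dominant script of a short text from Unicode character names. It is not the LID systems evaluated in that work; the function name `dominant_script` and the list of scripts are assumptions made for this example. Script detection can at most narrow the candidate languages (for instance, several languages share Devanagari), so a real LID system still has to separate languages within a script.

```python
import unicodedata
from collections import Counter

# Hypothetical helper, not taken from the paper: scripts to look for.
SCRIPTS = ("DEVANAGARI", "CYRILLIC", "ARABIC", "LATIN")

def dominant_script(text):
    """Return the most frequent script among the characters of `text`,
    or None if no character belongs to one of SCRIPTS."""
    counts = Counter()
    for ch in text:
        # Unicode names begin with the script, e.g. "DEVANAGARI LETTER KA".
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break
    return counts.most_common(1)[0][0] if counts else None
```

A script check like this would typically run as a cheap pre-filter before a statistical classifier chooses among the languages that share the detected script.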

Journal Article
TL;DR: This paper proposes two techniques for word recognition based on Hidden Markov Models (HMM), lexicon driven and lexicon free; on handwritten Devanagari word samples, their combination significantly outperforms either of them used in isolation.
Abstract: Research for recognizing online handwritten words in Indic scripts is at its early stages when compared to Latin and Oriental scripts. In this paper, we address this problem specifically for two major Indic scripts: Devanagari and Tamil. In contrast to previous approaches, the techniques we propose are largely data driven and script independent. We propose two different techniques for word recognition based on Hidden Markov Models (HMM): lexicon driven and lexicon free. The lexicon-driven technique models each word in the lexicon as a sequence of symbol HMMs according to a standard symbol writing order derived from the phonetic representation. The lexicon-free technique uses a novel Bag-of-Symbols representation of the handwritten word that is independent of symbol order and allows rapid pruning of the lexicon. On handwritten Devanagari word samples featuring both standard and nonstandard symbol writing orders, a combination of lexicon-driven and lexicon-free recognizers significantly outperforms either of them used in isolation. In contrast, most Tamil word samples feature the standard symbol order, and the lexicon-driven recognizer outperforms the lexicon-free one as well as their combination. The best recognition accuracies obtained for 20,000-word lexicons are 87.13 percent for Devanagari when the two recognizers are combined, and 91.8 percent for Tamil using the lexicon-driven technique.

107 citations
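The order-independent Bag-of-Symbols idea mentioned in the entry above can be sketched as a multiset comparison. This is only an illustration under stated assumptions: the similarity measure (multiset overlap divided by the larger bag size), the function names, and the toy symbol labels are hypothetical and are not taken from the paper, which does not spell out its scoring in the abstract.

```python
from collections import Counter

def bag_similarity(a, b):
    """Overlap of two symbol multisets, normalized by the larger bag.
    Assumed measure for illustration; symbol order is ignored."""
    overlap = sum((a & b).values())
    total = max(sum(a.values()), sum(b.values()))
    return overlap / total if total else 0.0

def prune_lexicon(recognized_symbols, lexicon, keep=2):
    """Keep the `keep` lexicon words whose symbol bags best match the
    recognized symbols. `lexicon` maps word -> list of symbol labels."""
    query = Counter(recognized_symbols)
    scored = [(bag_similarity(query, Counter(symbols)), word)
              for word, symbols in lexicon.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:keep]]

# Toy lexicon with made-up symbol labels.
toy_lexicon = {
    "w1": ["ka", "ma", "la"],
    "w2": ["ka", "la", "ma", "aa"],
    "w3": ["ra", "ta"],
}
# Symbols written in a nonstandard order still match "w1" exactly.
shortlist = prune_lexicon(["la", "ka", "ma"], toy_lexicon, keep=2)
```

Because the comparison ignores order, a word written with a nonstandard symbol order still scores highly against its lexicon entry, which is the property the abstract attributes to the lexicon-free technique.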


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Image segmentation: 79.6K papers, 1.8M citations, 74% related
Convolutional neural network: 74.7K papers, 2M citations, 74% related
Encryption: 98.3K papers, 1.4M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    42
2022    98
2021    48
2020    61
2019    38
2018    43