Topic
Devanagari
About: Devanagari is a research topic. Over its lifetime, 655 publications have appeared within this topic, receiving 7428 citations. The topic is also known as: Deva nagari & Hindi Script.
Papers
••
TL;DR: This work focuses on the Devanagari script, whose 46 character categories make training difficult, especially when samples are few, and proposes deep structure learning of image quadrants, based on learning the hidden state activations derived from convolutional neural networks that are trained separately on five image quadrants.
Abstract: Ancient Indic languages were written in the Devanagari script, from which most of the modern-day Indic writing systems have evolved. The digitisation of ancient Devanagari manuscripts, now archived in national museums, is part of the language documentation and digital archiving initiative of the Government of India. The challenge in digitising these handwritten scripts is the lack of adequate datasets for training machine learning models. In our work, we focus on the Devanagari script, whose 46 character categories make training difficult, especially when samples are few. We propose deep structure learning of image quadrants, based on learning the hidden state activations derived from convolutional neural networks that are trained separately on five image quadrants. The second phase of our learning module comprises a deep neural network that learns the hidden state activations of the five convolutional neural networks, fused by concatenation. The experiments show that the proposed deep structure learning outperforms the state of the art.
9 citations
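The two-phase scheme in the abstract above (one network per image region, with the hidden activations fused by concatenation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the five regions are defined, so the fifth region is assumed here to be a centre crop, and `placeholder_features` is a hypothetical stand-in for the hidden activations of a trained convolutional network, not the authors' model.

```python
from typing import Callable, List

Image = List[List[float]]  # row-major grayscale image


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def five_regions(img: Image) -> List[Image]:
    """Split an image into four corner quadrants plus a centre crop.

    ASSUMPTION: the centre crop as the fifth region is a guess; the abstract
    only states that five regions are used. For odd dimensions the last
    row/column is dropped so that all regions share one size.
    """
    hh, hw = len(img) // 2, len(img[0]) // 2
    return [
        crop(img, 0, 0, hh, hw),              # top-left
        crop(img, 0, hw, hh, hw),             # top-right
        crop(img, hh, 0, hh, hw),             # bottom-left
        crop(img, hh, hw, hh, hw),            # bottom-right
        crop(img, hh // 2, hw // 2, hh, hw),  # centre (assumed)
    ]


def placeholder_features(region: Image) -> List[float]:
    """Hypothetical stand-in for one CNN's hidden activations: [mean, max]."""
    pixels = [p for row in region for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def fuse(img: Image, extractors: List[Callable[[Image], List[float]]]) -> List[float]:
    """Apply one extractor per region and concatenate the resulting vectors."""
    regions = five_regions(img)
    if len(extractors) != len(regions):
        raise ValueError("expected one extractor per region")
    fused: List[float] = []
    for extractor, region in zip(extractors, regions):
        fused.extend(extractor(region))
    return fused


# Usage: a 4x4 toy image; the fused vector would feed the second-phase network.
toy = [[float(4 * r + c) for c in range(4)] for r in range(4)]
fused_vector = fuse(toy, [placeholder_features] * 5)
```

In a real system each extractor would be a separately trained network, and `fused_vector` would be the input to the second-phase classifier over the 46 character categories.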
••
01 Dec 2017
TL;DR: A dataset of 24253 news articles was extracted, and the extractive summarisation results were evaluated on various parameters against manual gold summaries of exactly 60 words each.
Abstract: With an immense amount of Hindi data growing on the web, a text summariser would be helpful in summarising government data, medical reports, news, and research articles. Hindi is the fourth most-spoken first language in the world. Hindi written in the Devanagari script is the official language of the Government of India. There is no public dataset for extractive summarisation in Hindi; thus a dataset of 24253 news articles was extracted, and the extractive summarisation results were evaluated on various parameters against manual gold summaries of exactly 60 words each.
9 citations
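The abstract above does not name the parameters used to compare system summaries with the gold summaries. As one plausible illustration only, the sketch below computes unigram overlap (precision, recall, F1) between a system summary and a gold summary. The metric choice and the function name `unigram_overlap` are assumptions, not the authors' evaluation. Whitespace tokenisation is used because Hindi in Devanagari separates words with spaces.

```python
from collections import Counter
from typing import Dict


def unigram_overlap(system: str, gold: str) -> Dict[str, float]:
    """Clipped unigram overlap between a system summary and a gold summary.

    ASSUMPTION: this ROUGE-1-style measure is illustrative; the source does
    not specify which evaluation parameters were used.
    """
    sys_counts = Counter(system.split())
    gold_counts = Counter(gold.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((sys_counts & gold_counts).values())
    precision = overlap / max(sum(sys_counts.values()), 1)
    recall = overlap / max(sum(gold_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage with placeholder tokens standing in for Hindi words.
scores = unigram_overlap("a b c d", "a b e")
```

With gold summaries fixed at 60 words each, recall is directly comparable across articles, which is one reason a fixed gold length is convenient.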
••
15 Dec 2005
TL;DR: Experimental results show the validity and efficiency of the developed scheme for recognition of characters of this script, and the major challenge in developing the proposed scheme lay in striking the right balance between definiteness and flexibility to derive optimal solutions for out-of-sample data.
Abstract: In this paper, a Devanagari script recognition scheme based on a novel algorithm is proposed. Devanagari script poses new challenges in the field of pattern recognition, primarily due to the highly cursive nature of the script seen across its diverse character set. In the proposed algorithm, the character is initially subjected to a simple noise removal filter. Based on a reference co-ordinate system, the significant contours of the character are extracted and characterized as a contour set. The recognition of the character involves comparing these contour sets with those in the enrolled database. The matching of these contour sets is achieved by characterizing each contour based on its length, its relative position in the reference co-ordinate system, and an interpolation scheme which eliminates displacement errors. In the Devanagari script, similar contour sets are observed among a few characters; hence this method helps to filter out disparate characters and narrow down the possibilities to a limited set. The next step involves focusing on the subtle yet vital differences between the similar contours in this limited set. This is done by a prioritization scheme which concentrates only on those portions of the character which reflect its uniqueness. The major challenge in developing the proposed scheme lay in striking the right balance between definiteness and flexibility to derive optimal solutions for out-of-sample data. Experimental results show the validity and efficiency of the developed scheme for recognition of characters of this script.
9 citations
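The first stage described in the abstract above, comparing a character's contour set with enrolled contour sets to narrow the candidates, might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `Contour` fields, the cost function, the greedy matching, the fixed penalty for unmatched contours, and the romanised labels. The interpolation step and the later prioritization stage are not modelled.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List


@dataclass(frozen=True)
class Contour:
    length: float  # contour length in the normalised reference frame
    x: float       # contour position in the reference co-ordinate system
    y: float


def contour_cost(a: Contour, b: Contour) -> float:
    """Hypothetical dissimilarity: length difference plus positional offset."""
    return abs(a.length - b.length) + hypot(a.x - b.x, a.y - b.y)


def set_cost(query: List[Contour], ref: List[Contour], penalty: float = 1.0) -> float:
    """Greedy one-to-one matching cost between two contour sets.

    ASSUMPTION: greedy matching and a fixed penalty per unmatched contour
    are simplifications chosen for this sketch.
    """
    remaining = list(ref)
    total = 0.0
    for contour in query:
        if not remaining:
            total += penalty  # query contour with no partner left
            continue
        best = min(remaining, key=lambda r: contour_cost(contour, r))
        total += contour_cost(contour, best)
        remaining.remove(best)
    return total + penalty * len(remaining)  # enrolled contours left unmatched


def shortlist(query: List[Contour], database: Dict[str, List[Contour]], k: int = 3) -> List[str]:
    """Return the k enrolled labels whose contour sets are closest to the query."""
    return sorted(database, key=lambda label: set_cost(query, database[label]))[:k]


# Usage with made-up enrolled contour sets for three characters.
enrolled = {
    "ka": [Contour(1.0, 0.2, 0.2), Contour(0.5, 0.8, 0.5)],
    "kha": [Contour(1.0, 0.2, 0.2), Contour(0.5, 0.8, 0.6)],
    "ga": [Contour(0.3, 0.5, 0.5)],
}
query = [Contour(1.0, 0.2, 0.2), Contour(0.5, 0.8, 0.5)]
candidates = shortlist(query, enrolled, k=2)
```

The shortlist plays the role of the "limited set" in the abstract; a second pass would then weigh only the regions where the shortlisted characters differ.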
••
01 Jun 2010
9 citations
••
25 Aug 2013
TL;DR: This paper develops a novel part-based model technique for Devanagari character recognition from scene images that can use either a machine-printed or a handwritten dataset for training, and presents results on the publicly available dataset (DSIW2K) containing images of street scenes taken in New Delhi, India.
Abstract: Character recognition in scene images is an extremely challenging task. Although several techniques are reported to perform well, they pertain to English only. This paper focuses on Devanagari character recognition from scene images. The Devanagari script is very widely used and has characteristics quite different from those of other scripts, particularly English. Combining the basic Devanagari consonants and vowels in various ways can yield hundreds of characters. Building a classifier to recognize all these classes would be a difficult task. To alleviate this problem, a novel part-based model technique is proposed. For this purpose, 40 basic classes were identified from the Devanagari script. The technique classifies an instance of one of these classes in any given test sample. Procuring a large dataset for training is not feasible in the case of scene images. To address this problem as well, we developed our technique so that it can use either a machine-printed or a handwritten dataset for training. We present our results on the publicly available dataset (DSIW2K) containing images of street scenes taken in New Delhi, India.
9 citations