Showing papers on "Devanagari published in 2018"

Open Access

Journal Article•DOI•

PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification

Sk Md Obaidullah¹, Chayan Halder², K. C. Santosh³, Nibaran Das⁴, Kaushik Roy²•Institutions (4)

Aliah University¹, West Bengal State University², University of South Dakota³, Jadavpur University⁴

01 Jan 2018-Multimedia Tools and Applications

TL;DR: A page-level handwritten document image dataset of 11 official Indic scripts, composed of 1458 document text-pages written by 463 individuals from various parts of India, is presented and the benchmark results for handwritten script identification (HSI) are reported.

Abstract: Without a publicly available dataset, specifically in handwritten document recognition (HDR), we cannot make a fair and/or reliable comparison between methods. Within HDR, document recognition for Indic scripts is still at an early stage compared to scripts such as Roman and Arabic. In this paper, we present a page-level handwritten document image dataset (PHDIndic_11) of 11 official Indic scripts: Bangla, Devanagari, Roman, Urdu, Oriya, Gurumukhi, Gujarati, Tamil, Telugu, Malayalam and Kannada. PHDIndic_11 is composed of 1458 document text-pages written by 463 individuals from various parts of India. Further, we report the benchmark results for handwritten script identification (HSI). Besides script identification, the dataset can be used in many other applications of document image analysis, such as script sentence recognition/understanding, text-line segmentation, word segmentation/recognition, word spotting, separation of handwritten and machine-printed text, and writer identification.

70 citations

Proceedings Article•DOI•

Optical Character Recognition for Sanskrit Using Convolution Neural Networks

Meduri Avadesh¹, Navneet Goyal¹•Institutions (1)

Birla Institute of Technology and Science¹

24 Apr 2018

TL;DR: A Convolutional Neural Network based Optical Character Recognition system (OCR) which accurately digitizes Ancient Sanskrit manuscripts (Devanagari Script) that are not necessarily in good condition.

Abstract: Ancient Sanskrit manuscripts are a rich source of knowledge about Science, Mathematics, Hindu mythology, Indian civilization, and culture. It therefore becomes critical that access to these manuscripts is made easy, to share this knowledge with the world and to facilitate further research on this Ancient literature. In this paper, we propose a Convolutional Neural Network (CNN) based Optical Character Recognition system (OCR) which accurately digitizes Ancient Sanskrit manuscripts (Devanagari Script) that are not necessarily in good condition. We use an image segmentation algorithm for calculating pixel intensities to identify letters in the image. The OCR considers typical compound characters (half letter combinations) as separate classes in order to improve the segmentation accuracy. The novelty of the OCR is its robustness to image quality, image contrast, font style and font size, which makes it an ideal choice for digitizing soiled and poorly maintained Sanskrit manuscripts.

40 citations

Proceedings Article•DOI•

Offline Handwriting Recognition on Devanagari Using a New Benchmark Dataset

Kartik Dutta¹, Praveen Krishnan¹, Minesh Mathew¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

24 Apr 2018

TL;DR: This paper releases a new handwritten word dataset for Devanagari, IIIT-HW-Dev, and empirically shows that usage of synthetic data and cross lingual transfer learning helps alleviate the issue of lack of training data.

Abstract: Handwriting recognition (HWR) in Indic scripts, like Devanagari is very challenging due to the subtleties in the scripts, variations in rendering and the cursive nature of the handwriting. Lack of public handwriting datasets in Indic scripts has long stymied the development of offline handwritten word recognizers and made comparison across different methods a tedious task in the field. In this paper, we release a new handwritten word dataset for Devanagari, IIIT-HW-Dev to alleviate some of these issues. We benchmark the IIIT-HW-Dev dataset using a CNN-RNN hybrid architecture. Furthermore, using this architecture, we empirically show that usage of synthetic data and cross lingual transfer learning helps alleviate the issue of lack of training data. We use this proposed pipeline on a public dataset, RoyDB and achieve state of the art results.

38 citations

Proceedings Article•DOI•

Does Deeper Network Lead to Better Accuracy: A Case Study on Handwritten Devanagari Characters

Bappaditya Chakraborty, Bikash Shaw¹, Jayanta Aich, Ujjwal Bhattacharya¹, Swapan K. Parui•Institutions (1)

Indian Statistical Institute¹

24 Apr 2018

TL;DR: The recognition accuracy obtained in the best case improves significantly the existing state-of-the-art of this handwriting recognition problem and further analysis of the simulation results provides an answer to the question: does an increase in the depth of the network eventually lead to an improved recognition performance on unknown samples?

Abstract: Deep neural network architectures have been used successfully in various document analysis studies. Its strength in producing human like performance has already been explored in handwritten English numeral recognition task. In this context, a natural question that often arises in a practitioner's mind: does an increase in the depth of the network eventually lead to an improved recognition performance on unknown samples? A goal of the present work is to search for an answer of the same through a case study of a larger class handwriting recognition problem. Here, we have studied recognition of handwritten Devanagari characters. In this study, we have implemented convolutional neural network (CNN) architectures of five different depths. We have also implemented additional neural architectures by adding two Bidirectional Long Short Term Memory (BLSTM) layers between the convolutional stack and the fully connected part of each of these five CNN networks. Simulations have been performed on two different databases of handwritten Devanagari characters consisting of 30408 and 36172 samples and a combined set consisting of 58451 samples. The recognition accuracy obtained in the best case improves significantly the existing state-of-the-art of this handwriting recognition problem. Also, further analysis of our simulation results provides an answer to the above question. Additionally, we have trained a BLSTM network alone using the Histogram of Oriented Gradient (HOG) features. Performance of this architecture failed to compete with the performance of CNN-BLSTM hybrid architecture.

34 citations

Proceedings Article•DOI•

Handwritten Devanagari Character Classification using Deep Learning.

Prasad K. Sonawane¹, Sushama Shelke•Institutions (1)

College of Engineering, Pune¹

01 Aug 2018

TL;DR: This work classifies handwritten Devanagari characters using transfer learning with AlexNet, a convolutional neural network, and reports impressive results.

Abstract: Over the past few years, deep neural networks, because of their outstanding performance, have become widely used in computer vision and machine learning tasks such as regression, segmentation, classification, detection and pattern recognition. Recognition of handwritten Devanagari characters is a challenging task, but deep learning can be used effectively as a solution for such problems. Person-to-person variations in writing style make handwritten character recognition one of the most difficult tasks. In this experiment, we classify handwritten Devanagari characters using a transfer learning mechanism with the help of AlexNet. AlexNet, a convolutional neural network, is trained on a dataset of around 16870 samples of 22 consonants of the Devanagari script and shows impressive results. Transfer learning helps the network learn faster and better than training a CNN from scratch, even when few data samples are available.

34 citations

Journal Article•DOI•

Cross-language framework for word recognition and spotting of Indic scripts

Ayan Kumar Bhunia¹, Partha Pratim Roy², Akash Mohta¹, Umapada Pal³•Institutions (3)

Future Institute of Engineering and Management¹, Indian Institute of Technology Roorkee², Indian Statistical Institute³

01 Jul 2018-Pattern Recognition

TL;DR: In this article, a cross-language platform for handwritten word recognition and spotting is proposed for low-resource scripts, where training is performed with a sufficiently large dataset of an available script and testing is done on other scripts (considered as target scripts).

29 citations

Journal Article•DOI•

A lexicon-free approach for 3D handwriting recognition using classifier combination

Pradeep Kumar¹, Rajkumar Saini¹, Partha Pratim Roy¹, Umapada Pal²•Institutions (2)

Indian Institute of Technology Roorkee¹, Indian Statistical Institute²

01 Feb 2018-Pattern Recognition Letters

TL;DR: A lexicon free approach for the recognition of 3D handwritten words in Latin and Devanagari scripts by combining multiple classifiers by using the Recognizer Output Voting Error Reduction (ROVER) framework.

29 citations

Proceedings Article•DOI•

Deep Convolutional Neural Network for Recognition of Unified Multi-Language Handwritten Numerals

Ghazanfar Latif¹, Jaafar Alghazo¹, Loay Alzubaidi¹, M. Muzzamal Naseer², Yazan Alghazo¹•Institutions (2)

Prince Mohammad bin Fahd University¹, Australian National University²

12 Mar 2018

TL;DR: Results indicate that the proposed deep learning architecture for the recognition of handwritten multi-language numerals (mixed numerals belonging to multiple languages) produces better results than methods suggested in the previous literature.

Abstract: Deep learning systems have recently gained importance as the architecture of choice in artificial intelligence (AI). Handwritten numeral recognition is essential for the development of systems that can accurately recognize digits in different languages, which is a challenging task due to varying writing styles. Developing an optimized, writer-independent, multi-language technique for numerals is still an open area of research. In this paper, we propose a deep learning architecture for the recognition of handwritten multi-language numerals (mixed numerals belonging to multiple languages: Eastern Arabic, Persian, Devanagari, Urdu, Western Arabic). The overall accuracy on the combined multi-language database was 99.26%, with a precision of 99.29% on average. The average accuracy for each individual language was found to be 99.322%. Results indicate that the proposed deep learning architecture produces better results than methods suggested in the previous literature.
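
The abstract above reports both an overall accuracy and an average precision. As a rough illustration of how these two figures differ, the sketch below computes overall accuracy and macro-averaged precision from label lists. It assumes the "precision on average" is a per-class (macro) average, which the abstract does not confirm; the function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted = sum(p == c for p in y_pred)  # samples predicted as class c
        correct = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        # A class that is never predicted contributes 0.0 by convention here.
        per_class.append(correct / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels for three hypothetical numeral classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(accuracy(y_true, y_pred), macro_precision(y_true, y_pred))
```

The two numbers need not coincide: accuracy weights every sample equally, while macro precision weights every class equally.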

27 citations

Journal Article•DOI•

Visualizing and Understanding Customized Convolutional Neural Network for Recognition of Handwritten Marathi Numerals

D. T. Mane, Uday V. Kulkarni

01 Jan 2018-Procedia Computer Science

TL;DR: A Customized Convolutional Neural Network (CCNN) that learns features automatically and predicts the class of numerals from a wide-ranging dataset; verified using K-fold cross-validation, it achieves an average accuracy of 94.93% on the test datasets.

27 citations

Journal Article•DOI•

Benchmark databases of handwritten Bangla-Roman and Devanagari-Roman mixed-script document images

Pawan Kumar Singh¹, Ram Sarkar¹, Nibaran Das¹, Subhadip Basu¹, Mahantapas Kundu¹, Mita Nasipuri¹•Institutions (1)

Jadavpur University¹

01 Apr 2018-Multimedia Tools and Applications

TL;DR: This paper addresses three key challenges here: collection, compilation and organization of benchmark databases of images of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document pages respectively, and development of a bi-script and tri-script word-level script identification module using Modified log-Gabor filter as feature extractor.

Abstract: Handwritten document image dataset is one of the basic necessities to conduct research on developing Optical Character Recognition (OCR) systems. In a multilingual country like India, handwritten documents often contain more than one script, leading to complex pattern analysis problems. In this paper, we highlight two such situations where Devanagari and Bangla scripts, two most widely used scripts in Indian sub-continent, are individually used along with Roman script in documents. We address three key challenges here: 1) collection, compilation and organization of benchmark databases of images of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document pages respectively, 2) script-level annotation of 18931 Bangla words, 15528 Devanagari words and 10331 Roman words in those 300 document pages, and 3) development of a bi-script and tri-script word-level script identification module using Modified log-Gabor filter as feature extractor. The technique is statistically validated using multiple classifiers and it is found that Multi-Layer Perceptron (MLP) classifier performs the best. Average word-level script identification accuracies of 92.32%, 95.30% and 93.78% are achieved using 3-fold cross validation for Bangla-Roman, Devanagari-Roman and Bangla-Devanagari-Roman databases respectively. Both the mixed-script document databases along with the script-level annotations and 44790 extracted word images of the three aforementioned scripts are available freely at https://code.google.com/p/cmaterdb/ .

27 citations

Proceedings Article•DOI•

An Efficient Approach for Handwritten Devanagari Character Recognition based on Artificial Neural Network

Nikita Singh¹•Institutions (1)

Banasthali Vidyapith¹

01 Feb 2018

TL;DR: The proposed approach achieves the maximum of 99.27% classification accuracy in training and is able to recognize the different handwritten Devanagari characters with an average accuracy of 97.06%.

Abstract: Hindi is a common and popular language in countries such as India and Nepal. People use this language not only for conversation but also on vehicle license plates, documents, sign boards, handwritten notes, etc. In recent years, many approaches have been proposed for Hindi character recognition, along with various applications such as text-to-speech translation and automatic license plate recognition. Some computationally expensive approaches have achieved desirable accuracy, but for light computing devices, recognition of handwritten characters is still a challenging task. This paper proposes an approach for the recognition of handwritten Devanagari characters. The shape variance of characters in Devanagari script is exhibited by variations in curves. These characters are distinguished using feature extraction in a piecewise manner. An image partitioning technique is used for piecewise extraction of histogram of oriented gradients (HOG) features. To train the neural network, a feature vector comprising the HOG features of all partitions is used. The proposed approach achieves a maximum of 99.27% classification accuracy in training and is able to recognize the different handwritten Devanagari characters with an average accuracy of 97.06%. The proposed approach may be useful in applications that help blind people read handwritten content.
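
The piecewise HOG extraction described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 2×2 partition grid, the 9 unsigned orientation bins, the central-difference gradients, the L2 normalisation, and the function names `cell_histogram` and `piecewise_hog` are all assumptions chosen for brevity.

```python
import math


def cell_histogram(img, r0, r1, c0, c1, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    over rows r0..r1-1 and columns c0..c1-1 of a 2D list of intensities."""
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Central differences, clamped at the image border.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180)
            idx = min(int(angle / (180.0 / bins)), bins - 1)
            hist[idx] += mag
    return hist


def piecewise_hog(img, grid=(2, 2), bins=9):
    """Split the image into grid partitions, compute one histogram per
    partition, and concatenate them into a single L2-normalised vector."""
    h, w = len(img), len(img[0])
    rows, cols = grid
    features = []
    for i in range(rows):
        for j in range(cols):
            r0, r1 = i * h // rows, (i + 1) * h // rows
            c0, c1 = j * w // cols, (j + 1) * w // cols
            features.extend(cell_histogram(img, r0, r1, c0, c1, bins))
    norm = math.sqrt(sum(v * v for v in features)) or 1.0
    return [v / norm for v in features]


# An 8x8 toy image with a single vertical edge down the middle.
toy = [[0] * 4 + [1] * 4 for _ in range(8)]
print(len(piecewise_hog(toy)))  # 2 * 2 partitions x 9 bins
```

The resulting vector would then be fed to a classifier; the abstract mentions a neural network, whose architecture is not specified here.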

Journal Article•DOI•

Static and Dynamic Synthesis of Bengali and Devanagari Signatures

Miguel Ferrer¹, Sukalpa Chanda², Moises Diaz¹, Chayan Kumar Banerjee², Anirban Majumdar², Cristina Carmona-Duarte¹, Parikshit Acharya², Umapada Pal²•Institutions (2)

University of Las Palmas de Gran Canaria¹, Indian Statistical Institute²

01 Oct 2018-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This paper reports an effective synthesizer for static and dynamic signatures written in Devanagari or Bengali scripts, and obtains promising results with artificially generated signatures in terms of appearance and performance when compared with those for real signatures.

Abstract: Developing an automatic signature verification system is challenging and demands a large number of training samples. This is why synthetic handwriting generation is an emerging topic in document image analysis. Some handwriting synthesizers use the motor equivalence model, the well-established hypothesis from neuroscience, which analyses how a human being accomplishes movement. Specifically, a motor equivalence model divides human actions into two steps: 1) the effector independent step at cognitive level and 2) the effector dependent step at motor level. In fact, recent work reports the successful application to Western scripts of a handwriting synthesizer based on this theory. This paper aims to adapt this scheme for the generation of synthetic signatures in two Indic scripts, Bengali (Bangla) and Devanagari (Hindi). For this purpose, we use two different online and offline databases for both Bengali and Devanagari signatures. This paper reports an effective synthesizer for static and dynamic signatures written in Devanagari or Bengali scripts. We obtain promising results with artificially generated signatures in terms of appearance and performance when we compare the results with those for real signatures.

Proceedings Article•DOI•

Towards Spotting and Recognition of Handwritten Words in Indic Scripts

Kartik Dutta¹, Praveen Krishnan¹, Minesh Mathew¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

01 Aug 2018

TL;DR: A framework for annotating large scale of handwritten word images with ease and speed is proposed, and a new handwritten word dataset for Telugu is released, which is collected and annotated using the proposed framework.

Abstract: Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties in the scripts, cursive nature of the handwriting and similar shape of the characters. Lack of publicly available handwriting datasets in Indic scripts has affected the development of handwritten word recognizers, and made direct comparisons across different methods an impossible task in the field. In this paper, we propose a framework for annotating large scale of handwritten word images with ease and speed. We also release a new handwritten word dataset for Telugu, which is collected and annotated using the proposed framework. We also benchmark major Indic scripts such as Devanagari, Bangla and Telugu for the tasks of word spotting and handwriting recognition using state of the art deep neural architectures. Finally, we evaluate the proposed pipeline on RoyDB, a public dataset, and achieve significant reduction in error rates.

Book Chapter•DOI•

A Study on the Effect of CNN-Based Transfer Learning on Handwritten Indic and Mixed Numeral Recognition.

Rahul Pramanik¹, Prabhat Dansena¹, Soumen Bag¹•Institutions (1)

Indian Institutes of Technology¹

18 Dec 2018

TL;DR: This study uses readily available pre-trained Convolutional Neural Network architectures on four different Indic scripts, viz.

Abstract: Filling up forms at post offices, railway counters, and for application of jobs has become a routine for modern people, especially in a developing country like India. Research on automation for the recognition of such handwritten forms has become mandatory. This applies more for a multilingual country like India. In the present work, we use readily available pre-trained Convolutional Neural Network (CNN) architectures on four different Indic scripts, viz. Bangla, Devanagari, Oriya, and Telugu to achieve a satisfactory recognition rate for handwritten Indic numerals. Furthermore, we have mixed Bangla and Oriya numerals and applied transfer learning for recognition. The main objective of this study is to realize how good a CNN model trained on an entire different dataset (of natural images) works for small and unrelated datasets. As a part of practical application, we have applied the proposed approach to recognize Bangla handwritten pin codes after their extraction from postal letters.

Proceedings Article•DOI•

Handwritten Digit Recognition using DAISY Descriptor: A Study

Agneet Chatterjee¹, Samir Malakar², Ram Sarkar¹, Mita Nasipuri¹•Institutions (2)

Jadavpur University¹, Asutosh College²

01 Jan 2018

TL;DR: A script invariant feature vector is designed here based on the concept of the DAISY descriptor which has previously been applied in different research domains and is a computationally inexpensive approach when compared to other state-of-the-art prevalent architectures like LSTM or CNN.

Abstract: Handwritten digit recognition is a highly evolved research domain. The major issues that make this domain challenging are different photometric discrepancies, along with computation complexity. A script invariant feature vector is designed here based on the concept of the DAISY descriptor which has previously been applied in different research domains. We have applied this feature descriptor after suitable customization to fit it into the aforesaid classification problem. We have tested the same on handwritten digits written in four different scripts namely Arabic, Bangla, Devanagari and Roman. Bangla dataset is in-house, while the remaining are the standard databases. Experimental results demonstrate the effectiveness of the said feature descriptor for digit recognition. It is a computationally inexpensive approach when compared to other state-of-the-art prevalent architectures like LSTM or CNN.

Journal Article•DOI•

Copying helps novice learners build orthographic knowledge: methods for teaching Devanagari akshara

Adeetee Bhide¹, Adeetee Bhide²•Institutions (2)

National University of Singapore¹, University of Pittsburgh²

01 Jan 2018-Reading and Writing

TL;DR: The authors showed that copying and writing complex akshara improved orthographic knowledge, and that copying was more time efficient than writing, suggesting that having beginning learners copy akshara is an important pedagogical tool to use in classrooms.

Abstract: Hindi graphs, called akshara, are difficult to learn because of their visual complexity and large set of graphs. Akshara containing multiple consonants (complex akshara) are particularly difficult. In Hindi, complex akshara are formed by fusing individual consonantal graphs. Some complex akshara look similar to their component parts (transparent), whereas others do not (opaque). We taught 35 English-speaking adults a semi-artificial orthography that was modeled on the Devanagari script used for Hindi and other Indic languages. Participants were taught 80 complex akshara using 4 different methods: (1) choosing the components (from several choices) given the graph (2) choosing the correct graph (from several choices) given its components, (3) copying a graph while the graph and its components are displayed, and (4) writing a graph from memory given its components. Methods 1 and 2 compare emphasis on part-whole versus whole-part relationships, methods 1 & 2 and 3 & 4 compare motor effects, and methods 3 and 4 compare testing effects. We found that transparent graphs were better learned than opaque graphs. Testing on the akshara typically did not improve learning and there were few effects of emphasis on part-whole versus whole-part relationships. There was evidence for motor effects; copying & writing the akshara improved pure orthographic knowledge and people’s ability to produce the phonological form of a given akshara. These results corroborate other studies showing that copying and writing graphs helps beginning learners of English, Chinese, and Arabic build orthographic knowledge. Copying was more time efficient than writing, suggesting that having beginning learners copy akshara is an important pedagogical tool to use in classrooms.

Proceedings Article•DOI•

Recognition of Handwritten Numerals of various Indian Regional Languages using Deep Learning

Saumya Chaurasia¹, Suneeta Agarwal¹•Institutions (1)

Motilal Nehru National Institute of Technology Allahabad¹

01 Nov 2018

TL;DR: An effective handwritten numeral recognition approach based on Convolutional Neural Network (CNN) and Support Vector Machine (SVM) and the results show that the performance of the proposed approach is better than state-of-art approaches.

Abstract: Handwritten numeral recognition is an interesting area of research in the field of computer vision and pattern recognition. It plays an important role in postal automation services especially in a country like India where multiple languages and scripts are used. So, the recognition system needs to deal with many challenges like varying writing styles and cursive nature of handwriting. This paper proposes an effective handwritten numeral recognition approach based on Convolutional Neural Network (CNN) and Support Vector Machine (SVM). The proposed work is an attempt to develop a recognition system for recognizing the handwritten digits written in any one of the regional languages: Bangla, Devanagari, Oriya, and Telugu. The proposed system first normalizes the input image having single digit thereafter CNN works as a feature extractor while SVM as a classifier. Experiments have been conducted on benchmark database of ISI Kolkata (having numerals of Bangla, Devanagari, and Oriya languages) and CMaterdb (having numerals of Telugu language) (Jadavpur University). The results show that our model performs best for Devanagari language with accuracy 99.41% and for Bangla, Telugu, and Oriya, the accuracies are 99.14%, 99.16%, and 94.54% respectively. So, the performance of the proposed approach is better than state-of-art approaches.

Journal Article•DOI•

Devanagari and Gurmukhi Script Recognition in the Context of Machine Learning Classifiers

R. G. Sharma, Baijnath Kaushik, Naveen Kumar Gond

15 Jun 2018-Journal of Artificial Intelligence

Proceedings Article•DOI•

Devanagari Ancient Character Recognition using HOG and DCT Features

Sonika Rani Narang¹, Manish Kumar Jindal², Pooja Sharma¹•Institutions (2)

D.A.V. College, Koraput¹, Panjab University, Chandigarh²

01 Dec 2018

TL;DR: Two feature extraction techniques, namely DCT (Discrete Cosine Transformation) zigzag features and Histogram of Oriented Gradients, are considered for extracting features of ancient Devanagari manuscripts, for the recognition of ancient documents in Devanagari script.

Abstract: In the present work, a system for recognition of ancient documents in Devanagari script is presented. Two feature extraction techniques, namely, DCT(Discrete Cosine Transformation) zigzag features and Histogram of oriented gradients are considered for extracting features of Devanagari ancient manuscripts. For recognition, three classification techniques, namely, SVM (Support Vector Machine), decision tree, and Naive Bayes are used. A database for the experiments is collected from various libraries and museums. Using SVM classifier with RBF kernel, a recognition accuracy of 90.70% with DCT zigzag feature vector of length 100 has been reported. A recognition accuracy of 90.70% with a partitioning strategy of dataset (80% data as training data and the remaining 20% data as testing data) has been achieved.
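
The DCT zigzag features mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a naive orthonormal 2D DCT-II on a square block and a JPEG-style zigzag scan, and the function names `dct2`, `zigzag_indices` and `dct_zigzag_features` are hypothetical. The abstract does not state the image size or the exact scan order used.

```python
import math


def dct2(block):
    """Naive orthonormal 2D DCT-II of a square block (O(n^4), sketch only)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def zigzag_indices(n):
    """(row, col) pairs of an n x n matrix in JPEG-style zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals in turn; alternate the direction on each one.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def dct_zigzag_features(block, length):
    """Keep the first `length` DCT coefficients in zigzag order, so that
    low-frequency coefficients come first."""
    coeffs = dct2(block)
    return [coeffs[r][c] for r, c in zigzag_indices(len(block))[:length]]


print(zigzag_indices(3))
```

A feature vector of length 100, as reported in the abstract, would require an input of at least 10×10 pixels under this scheme.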

Proceedings Article•DOI•

Recognition of Offline Handwritten Devanagari Numerals using Regional Weighted Run Length Features

Pawan Kumar Singh, Supratim Das, Ram Sarkar, Mita Nasipuri

29 Jun 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, Mask Oriented Directional (MOD) features were used for the recognition of handwritten Devanagari script and achieved a 95.02% accuracy using SVM classifier.

Abstract: Recognition of handwritten Roman characters and numerals has been extensively studied in the last few decades and its accuracy reached to a satisfactory state. But the same cannot be said while talking about the Devanagari script which is one of most popular script in India. This paper proposes an efficient digit recognition system for handwritten Devanagari script. The system uses a novel 196-element Mask Oriented Directional (MOD) features for the recognition purpose. The methodology is tested using five conventional classifiers on 6000 handwritten digit samples. On applying 3-fold cross-validation scheme, the proposed system yields the highest recognition accuracy of 95.02% using Support Vector Machine (SVM) classifier.
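
The 3-fold cross-validation scheme mentioned in the abstract above can be illustrated with a short index-splitting sketch. It assumes a shuffled, unstratified split; the abstract does not say how the 6000 samples were partitioned, and the function name `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.
    Each sample lands in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # fixed seed for repeatability
    folds = [idx[i::k] for i in range(k)]   # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


for train, test in k_fold_indices(6000, k=3):
    print(len(train), len(test))
```

With 6000 samples and 3 folds, each round trains on 4000 samples and tests on the remaining 2000; the reported accuracy would typically be the mean over the three rounds.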

Proceedings Article•DOI•

Augmented Handwritten Devanagari Digit Recognition Using Convolutional Autoencoder

Sourabh Kumar¹, R. K Aggarwal¹•Institutions (1)

National Institute of Technology, Kurukshetra¹

11 Jul 2018

TL;DR: The improved accuracy of Hindi, English and Bangla digit dataset is shown by using the proposed approach and also performing a number of cross-validation experiments on all three datasets using image augmentation.

Abstract: Handwritten digit recognition has turned into one of the demanding areas of research in the field of image processing. Many approaches have been proposed which include a statistical method, fuzzy technique, and neural network for feature classification and feature selection but have not been found to use convolutional autoencoder for Devanagari digit after performing image augmentation on the training dataset. This paper shows the use of unsupervised training using convolutional autoencoder with deep ConvNet in order to detect handwritten Devanagari digits, i.e., 0–9. Convolutional autoencoder is the type of autoencoder that is used to encode the input for extracting important features and then try to reconstruct the input image. This paper shows the improved accuracy of Hindi, English and Bangla digit dataset by using the proposed approach and also performing a number of cross-validation experiments on all three datasets using image augmentation.

Proceedings Article•DOI•

RNN Based Online Handwritten Word Recognition in Devanagari Script

Pooja Keshri, Prabhat Kumar¹, Rajib Ghosh¹•Institutions (1)

National Institute of Technology, Patna¹

01 Aug 2018

TL;DR: This article proposes a novel approach for online handwritten word recognition in Devanagari script based on two recently developed models of Recurrent Neural Network (RNN), termed Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM), specifically designed for sequential data where segmenting the data into basic units is very difficult.

Abstract: Devanagari script is the most popular script in India, but very little recognized work has been done on online handwritten text recognition systems for this script. The large number of symbols and the variations in symbol order in this script have led to low recognition rates for even the best existing recognition systems. Most existing studies on Devanagari script have relied upon the same Hidden Markov Model (HMM) that has been used for many years in handwriting recognition, despite its familiar shortcomings. This article proposes a novel approach for online handwritten word recognition in Devanagari script based on two recently developed models of Recurrent Neural Network (RNN), termed Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM), specifically designed for sequential data where segmenting the data into basic units is very difficult. Analysis shows that words are written in a non-cursive fashion in Devanagari script. The proposed approach performs a local zone-wise analysis of each basic stroke of a word to extract various features from each basic stroke. In this local zone-wise feature extraction approach, dominant points are detected from strokes using slope angles to find the local features. These features are then studied using both the LSTM and BLSTM versions of RNN. Most existing word recognition systems for this script have followed the typical holistic approach, whereas the proposed system has been developed in an analytical scheme with a total of 10K words in the lexicon. An exhaustive experiment on large datasets has been performed to evaluate the proposed recognition approach using both LSTM and BLSTM and to make a comparative performance analysis. Experimental results show that the proposed system outperforms existing HMM-based systems in the literature.
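
The abstract above mentions detecting dominant points on a stroke using slope angles. One plausible reading is sketched below: a point is kept when the writing direction turns by more than a threshold between the incoming and outgoing segments. The 30-degree threshold, the rule of always keeping both endpoints, and the function name `dominant_points` are assumptions for illustration; the paper's actual criterion is not given in the abstract.

```python
import math


def dominant_points(stroke, angle_threshold_deg=30.0):
    """Return indices of stroke points where the slope angle changes by more
    than the threshold. `stroke` is a list of (x, y) pen positions in the
    order they were written; both endpoints are always kept."""
    if len(stroke) < 3:
        return list(range(len(stroke)))
    keep = [0]
    for i in range(1, len(stroke) - 1):
        (x0, y0), (x1, y1), (x2, y2) = stroke[i - 1], stroke[i], stroke[i + 1]
        a_in = math.atan2(y1 - y0, x1 - x0)    # slope angle of incoming segment
        a_out = math.atan2(y2 - y1, x2 - x1)   # slope angle of outgoing segment
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        turn = abs((a_out - a_in + math.pi) % (2 * math.pi) - math.pi)
        if math.degrees(turn) > angle_threshold_deg:
            keep.append(i)
    keep.append(len(stroke) - 1)
    return keep


# An L-shaped toy stroke: right along the x-axis, then straight up.
print(dominant_points([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]))
```

Local features for the sequence model would then be computed around the retained points; that step is not sketched here.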

Journal Article•DOI•

Script differences and masked translation priming: Evidence from Hindi-English bilinguals.

Namrata Dubey¹, Naoko Witzel¹, Jeffrey Witzel¹•Institutions (1)

University of Texas at Arlington¹

01 Jan 2018-Quarterly Journal of Experimental Psychology

TL;DR: This study reports on two experiments investigating the effects of script differences on masked translation priming in highly proficient early Hindi-English bilinguals and provides alternative accounts for these results in terms of how orthographic cues provided by L1 targets might lead to the discontinuation or disruption of processing for L2 primes.

Abstract: This study reports on two experiments investigating the effects of script differences on masked translation priming in highly proficient early Hindi-English bilinguals. In Experiment 1 (the cross-script experiment), L1 Hindi was presented in the standard Devanagari script, while L2 English was presented in the Roman alphabet. In Experiment 2 (the same-script experiment), both L1 Hindi and L2 English were presented in the Roman alphabet. Both experiments revealed translation priming in the L1-L2 direction. However, L2-L1 priming was obtained in the same-script experiment, but not in the cross-script experiment. These findings are discussed in relation to the orthographic cue hypothesis as well as hypotheses that hold that script differences influence the distance between the L1 and L2 in lexical space and/or cross-language lateral inhibition. We also provide alternative accounts for these results in terms of how orthographic cues provided by L1 targets might lead to the discontinuation or disruption of processing for L2 primes.

Book Chapter•DOI•

On-Line Devanagari Handwritten Character Recognition Using Moments Features

Shalaka Prasad Deore¹,², A. Pravin²•Institutions (2)

Savitribai Phule Pune University¹, Sathyabama University²

21 Dec 2018

TL;DR: A HWDCR system that recognizes handwritten characters of Devanagari, the most popular script in India, is proposed using an MLP-BP neural network classifier, achieving an average recognition accuracy of 90% on on-line data.

Abstract: Nowadays, recognizing handwritten characters is receiving high significance because of numerous applications such as educational software, on-line signature verification, bank cheque processing, postal code recognition, electronic libraries, etc. Very little work is reported in the research of Devanagari handwritten character recognition (HWDCR), so there is large scope for research in this area. In this paper we propose a HWDCR system that recognizes handwritten characters of Devanagari, the most popular script in India. The handwritten character is input using a pen tablet, and its on-line features, such as the sequence of (x, y) coordinates, stroke and pressure information, are extracted and passed to the classifier. We use an MLP-BP neural network classifier for classification. The average recognition accuracy achieved by the proposed HWDCR system is 90% using on-line data.

Journal Article•DOI•

Sub-Stroke-Wise Relative Feature for Online Indic Handwriting Recognition

Nilanjana Bhattacharya¹, Partha Pratim Roy², Umapada Pal³•Institutions (3)

Bose Institute¹, Indian Institute of Technology Roorkee², Indian Statistical Institute³

17 Dec 2018

TL;DR: A new category of features called ‘sub-stroke-wise relative feature’ (SRF), based on relative information of the constituent parts of the handwritten strokes, is proposed; it significantly outperforms the state-of-the-art feature sets for online Bangla and Devanagari cursive word recognition.

Abstract: The main problem of Bangla (Bengali) and Devanagari handwriting recognition is the shape similarity of characters. There are only a few pieces of work on writer-independent cursive online Indian text recognition, and the shape similarity problem needs more attention from the researchers. To handle the shape similarity problem of cursive characters of Bangla and Devanagari scripts, in this article, we propose a new category of features called ‘sub-stroke-wise relative feature’ (SRF) which are based on relative information of the constituent parts of the handwritten strokes. Relative information among some of the parts within a character can be a distinctive feature as it scales up small dissimilarities and enhances discrimination among similar-looking shapes. Also, contextual anticipatory phenomena are automatically modeled by this type of feature, as it takes into account the influence of previous and forthcoming strokes. We have tested popular state-of-the-art feature sets as well as proposed SRF using various (up to 20,000-word) lexicons and noticed that SRF significantly outperforms the state-of-the-art feature sets for online Bangla and Devanagari cursive word recognition.

Training deployable general domain MT for a low resource language pair: English–Bangla

Sandipan Dandapat, William Lewis

01 Jan 2018

TL;DR: Efforts towards developing general domain English–Bangla MT systems that are deployable to the Web are described; the NMT systems gain significant improvement over SMT baselines.

Abstract: A large percentage of the world’s population, upwards of 1.5 billion people, speaks a language of the Indian subcontinent. We call these Indic languages here; they comprise languages from both the Indo-European (e.g., Hindi, Bangla, Gujarati) and Dravidian (e.g., Tamil, Telugu, Malayalam) families. A universal characteristic of Indic languages is their complex morphology, which, when combined with the general lack of sufficient quantities of high quality parallel data, can make developing machine translation (MT) for these languages difficult. In this paper, we describe our efforts towards developing general domain English–Bangla MT systems which are deployable to the Web. We initially developed and deployed SMT-based systems, but over time migrated to NMT-based systems. Our initial SMT-based systems had reasonably good BLEU scores; however, using NMT systems, we have gained significant improvement over SMT baselines. This is achieved using a number of ideas to boost the data store and counter data sparsity: crowd translation of intelligently selected monolingual data (throughput enhanced by an IME (Input Method Editor) designed specifically for QWERTY keyboard entry for Devanagari scripted languages), back-translation, different regularization techniques, dataset augmentation and early stopping.

Book Chapter•DOI•

Artistic Multi-character Script Identification Using Iterative Isotropic Dilation Algorithm

Mridul Ghosh, Sk Md Obaidullah¹, K. C. Santosh², Nibaran Das³, Kaushik Roy⁴•Institutions (4)

Aliah University¹, University of South Dakota², Jadavpur University³, West Bengal State University⁴

21 Dec 2018

TL;DR: A novel iterative isotropic dilation algorithm is proposed here to convert the components into a single component object and promising accuracy has been observed.

Abstract: In this work, a new problem of script identification named artistic multi-character script identification has been addressed. Two types of datasets of artistic documents/images prepared with Bangla, Devanagari and Roman script have been used: one consists of real-life artistic multi-character script images and the other of synthetic artistic multi-character script images. After binarization using Otsu’s algorithm, some character images were found to be broken into components. To overcome this, a novel iterative isotropic dilation algorithm is proposed here to convert the components into a single component object. Then two types of features, namely shape-based and texture-based features, have been considered. Discrete Gabor wavelet has been exploited with 2 scales and 4 orientations for texture feature extraction, and PCA is used to reduce the dimensionality of the texture feature space. The performance of the proposed algorithm has been tested with different machine learning classifiers and promising accuracy has been observed.
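
The idea of dilating a broken binarized character until its pieces merge into one connected component can be sketched as follows. This is an illustrative approximation and not the algorithm from the paper: it assumes a binary image stored as a list of 0/1 rows, uses a 3×3 neighbourhood per dilation step as a stand-in for an isotropic structuring element, and the function names and the `max_iter` cap are hypothetical.

```python
from collections import deque

# 8-connected neighbourhood, used both for labelling and for one dilation step.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def count_components(img):
    """Count 8-connected foreground components with a breadth-first search."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def dilate(img):
    """Grow every foreground pixel by one pixel in all eight directions."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for dy, dx in NEIGHBOURS:
                    ny, nx = r + dy, c + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        out[ny][nx] = 1
    return out

def merge_components(img, max_iter=50):
    """Dilate repeatedly until at most one component remains (or max_iter is hit)."""
    iterations = 0
    while count_components(img) > 1 and iterations < max_iter:
        img = dilate(img)
        iterations += 1
    return img, iterations

# Hypothetical broken glyph: two vertical bars separated by a three-pixel gap.
broken = [[1, 0, 0, 0, 1],
          [1, 0, 0, 0, 1]]
merged, steps = merge_components(broken)
print(steps, count_components(merged))
```

Stopping as soon as one component remains keeps the dilation as small as possible, which limits how much the character shape is thickened before feature extraction.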

Proceedings Article•DOI•

RWIL: Robust Writer Identification for Indic Language

Babu Kumar¹, Parveen Kumar¹, Ambalika Sharma²•Institutions (2)

National Institute of Technology, Srinagar¹, Indian Institute of Technology Roorkee²

01 Jun 2018

TL;DR: A robust model for writer identification for Indic languages is proposed and it is efficient for recognizing and classifying data because of its feature extraction and training at different convolution and pooling stages.

Abstract: Writer identification plays an important role in fraud detection, for example in cases of unauthorized access to banks and other security checks. Forensic document analysis is also carried out for languages handwritten in Devanagari. We have proposed a robust model for writer identification for Indic languages. It is difficult to efficiently extract the words and characters from a Devanagari handwritten document because of overlapping, compound characters, modifiers, touching characters, etc. The proposed model is efficient for recognizing and classifying data because of its feature extraction and training at different convolution and pooling stages. We have prepared a Devanagari (Hindi) dataset from 80 students. The proposed model is trained using the prepared Hindi dataset and does not require any domain knowledge for handwriting recognition. The experiments were done on three different languages (Hindi, Kannada and Arabic) and obtained satisfactory results.

Journal Article•DOI•

Automatic line-level script identification from handwritten document images - a region-wise classification framework for indian subcontinent

Sk Md Obaidullah¹, Chayan Halder², K. C. Santosh³, Nibaran Das⁴, Kaushik Roy⁴•Institutions (4)

Aliah University¹, West Bengal State University², University of South Dakota³, Jadavpur University⁴

23 Jan 2018-Malaysian Journal of Computer Science

TL;DR: An automatic approach for line-level handwritten script identification (HSI), considering eight official Indic scripts namely: Bangla, Devanagari, Kannada, Malayalam, Oriya, Roman, Telugu, and Urdu is proposed, and multilayer perceptron (MLP) is found as the best performer.

Abstract: Script identification is a well-studied problem for automatic processing of document images. Several attempts have been made so far, but the problem is still far from a complete solution. In this paper, an automatic approach for line-level handwritten script identification (HSI), considering eight official Indic scripts, namely Bangla, Devanagari, Kannada, Malayalam, Oriya, Roman, Telugu, and Urdu, is proposed. We consider a 148-dimensional feature vector using: image component fractal dimension, structural and visual appearance, directional stroke, interpolation and Gabor energy based texture features. For classification, we divide the whole script dataset based on different regions of India, to study region-wise classification performance. Experimentation was carried out using the state-of-the-art classifiers: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and fuzzy unordered rule induction algorithm (FURIA). Among all, we found MLP to be the best performer, with average accuracies of 98.2%, 99.5%, 99.1%, 99.5%, 99.9%, 98%, and 98.9% for the eight-script, bi-script, eastern, north, and south Indian script groups, scripts with ‘matra’ vs. without ‘matra’, and Dravidian vs. non-Dravidian groups, respectively.

Proceedings Article•DOI•

A Rule Based Light Weight Inflectional Stemmer for Sindhi Devanagari Using Affix Stripping Approach

Bharti Nathani¹, Nisheeth Joshi¹, Gaurav Purohit¹•Institutions (1)

Banasthali Vidyapith¹

01 Nov 2018

TL;DR: This research paper has proposed a Light weight Inflectional Stemmer using affix stripping approach, for Sindhi Devanagari Script.

Abstract: Nowadays the web is multilingual and holds a large amount of information. In order to access information in different languages we require language processing tools such as a stemmer, a part-of-speech tagger, etc. Although plenty of language processing tools are available for some languages, other languages are still not getting the attention of the research community. Sindhi is one of them. In this research paper we propose a lightweight inflectional stemmer using an affix stripping approach for Sindhi in Devanagari script.
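
Affix stripping in general can be sketched as a longest-match suffix removal with a guard on the minimum stem length. The sketch below only illustrates that general technique: the abstract does not list the stemmer's rules, so the function name, the `min_stem_len` guard and the ASCII placeholder suffixes are all assumptions, and the suffixes are deliberately not Sindhi.

```python
def strip_affix(word, suffixes, min_stem_len=2):
    """Remove the longest matching suffix, unless the remaining stem
    would be shorter than min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if (suffix and word.endswith(suffix)
                and len(word) - len(suffix) >= min_stem_len):
            return word[:-len(suffix)]
    return word  # no rule applied; the word is returned unchanged

# Placeholder suffix list for demonstration only (hypothetical, not from the paper).
PLACEHOLDER_SUFFIXES = ["s", "es", "ing"]

for token in ["boxes", "reading", "is"]:
    print(token, "->", strip_affix(token, PLACEHOLDER_SUFFIXES))
```

Trying longer suffixes first avoids stripping only the final "s" from "boxes", and the minimum stem length keeps short words such as "is" intact. A stemmer for Devanagari text would supply its own suffix list as Unicode strings.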
