Showing papers on "Devanagari published in 2019"

PDF

Open Access

Journal Article•DOI•

Character and numeral recognition for non-Indic and Indic scripts: a survey

[...]

Munish Kumar, Manish Kumar Jindal¹, Rajendra Kumar Sharma², Simpel Rani Jindal•Institutions (2)

Panjab University, Chandigarh¹, Thapar University²

01 Dec 2019-Artificial Intelligence Review

TL;DR: A comprehensive survey on character and numeral recognition of non-Indic and Indic scripts is presented and major challenges/issues for character/numeral recognition are examined.

...read moreread less

Abstract: A collection of different scripts is employed in writing languages throughout the world. Character and numeral recognition of a particular script is a key area in the field of pattern recognition. In this paper, we have presented a comprehensive survey on character and numeral recognition of non-Indic and Indic scripts. Many researchers have done work on character and numeral recognition from the most recent couple of years. In perspective of this, few strategies for character/numeral have been developed so far. There are an immense number of frameworks available for printed and handwritten character recognition for non-Indic scripts. But, only a limited number of systems are offered for character/numeral recognition of Indic scripts. However, few endeavors have been made on the recognition of Bangla, Devanagari, Gurmukhi, Kannada, Oriya and Tamil scripts. In this paper, we have additionally examined major challenges/issues for character/numeral recognition. The efforts in two directions (non-Indic and Indic scripts) are reflected in this paper. When compared with non-Indic scripts, the research on character recognition of Indic scripts has not achieved that perfection yet. The techniques used for recognition of non-Indic scripts may be used for recognition of Indic scripts (printed/handwritten text) and vice versa to improve the recognition rates. It is also noticed that the research in this field is quietly thin and still more research is to be done, particularly in the case of handwritten Indic scripts documents.

...read moreread less

58 citations

Journal Article•DOI•

RNN based online handwritten word recognition in Devanagari and Bengali scripts using horizontal zoning

[...]

Rajib Ghosh¹, Chirumavila Vamshi¹, Prabhat Kumar¹•Institutions (1)

National Institute of Technology, Patna¹

01 Aug 2019-Pattern Recognition

TL;DR: Experimental results show that the proposed RNN based system is superior over HMM achieving 99.50% and 95.24% accuracies in Devanagari and Bengali scripts respectively and outperforms existing HMM based systems in the literature as well.

...read moreread less

52 citations

Journal Article•DOI•

Multiobjective optimization for recognition of isolated handwritten Indic scripts

[...]

Anisha Gupta¹, Ritesh Sarkhel², Nibaran Das³, Mahantapas Kundu³•Institutions (3)

Heritage Institute of Technology¹, Ohio State University², Jadavpur University³

01 Dec 2019-Pattern Recognition Letters

TL;DR: A modified opposition-based multiobjective Harmony Search algorithm has been proposed to select the local regions from handwritten character images based on their rankings in a three-dimensional pareto-front based on recognition accuracy and redundancy.

...read moreread less

31 citations

Journal Article•DOI•

An efficient Devanagari character classification in printed and handwritten documents using SVM

[...]

Shalini Puri¹, Satya Prakash Singh¹•Institutions (1)

Birla Institute of Technology, Mesra¹

01 Jan 2019-Procedia Computer Science

TL;DR: An efficient Devanagari character classification model using SVM for printed and handwritten mono-lingual Hindi, Sanskrit and Marathi documents, which first preprocesses the image, segments it through projection profiles, removes shirorekha, extracts features, and then classifies the shirorikha-less characters into pre-defined character categories.

...read moreread less

30 citations

Journal Article•DOI•

Devanagari ancient documents recognition using statistical feature extraction techniques

[...]

Sonika Rani Narang¹, Manish Kumar Jindal², Munish Kumar³•Institutions (3)

DAV College, Chandigarh¹, Panjab University, Chandigarh², Punjab Technical University³

01 Jun 2019-Sadhana-academy Proceedings in Engineering Sciences

TL;DR: Various feature extraction and classification techniques are considered and compared to the recognition of basic characters segmented from Devanagari ancient manuscripts and authors have achieved 88.95% recognition accuracy.

...read moreread less

Abstract: Devanagari ancient document recognition process is drawing a lot of consideration from researchers nowadays. These ancient documents contain a wealth of knowledge. However, these documents are not available to all because of their fragile condition. A Devanagari ancient manuscript recognition system is designed for digital archiving. This system includes image binarization, character segmentation and recognition phases. It incorporates automatic recognition of scanned and segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and various compound characters (characters formed by joining more than one basic characters). In this paper, handwritten Devanagari ancient manuscripts recognition system has been presented using statistical features extraction techniques. In feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered in this work. Various feature extraction and classification techniques are considered and compared to the recognition of basic characters segmented from Devanagari ancient manuscripts. A data set, of 6152 pre-segmented samples of Devanagari ancient documents, is considered for experimental work. Authors have achieved 88.95% recognition accuracy using a combination of all features and a combination of all classifiers considered in this work by a simple majority voting scheme.

...read moreread less

24 citations

Proceedings Article•DOI•

Handwritten Tamil Character Recognition UsingDeep Learning

[...]

M. A. Pragathi¹, K. Priyadarshini¹, S. Saveetha¹, A. Shavar Banu¹, K. O. Mohammed Aarif¹ - Show less +1 more•Institutions (1)

C. Abdul Hakeem College of Engineering & Technology¹

30 Mar 2019

TL;DR: This paper proposes a character recognition system for handwritten Tamil characters using deep learning using VGG 16 approaches and gives efficiency of 94.52% on datasets.

...read moreread less

Abstract: Character recognition is developed for various patterns of handwritten or optical characters to be recognized digitally. There are many Tamil literatures in undigitized form. Using deep learning the undigitized Tamil literatures can be converted into readable format. Many researches were carried on character recognition using deep learning for languages like Arabic, Devanagari, Telugu, etc… Due to the larger category set and confusion in similarities between handwritten characters Tamil character recognition is a challenge. In this paper, we propose a character recognition system for handwritten Tamil characters using deep learning. Here, VGG 16 approaches is carried out. The proposed work gives efficiency of 94.52% on our datasets.

...read moreread less

24 citations

Journal Article•DOI•

Devanagari ancient character recognition using DCT features with adaptive boosting and bootstrap aggregating

[...]

Sonika Rani Narang¹, Manish Kumar Jindal², Munish Kumar³•Institutions (3)

DAV College, Chandigarh¹, Panjab University, Chandigarh², Punjab Technical University³

08 Mar 2019

TL;DR: A system for improvement in recognition of Devanagari ancient manuscripts using AdaBoost and Bagging methodologies and maximum recognition accuracy of 90.70% has been achieved using DCT zigzag features and RBF-SVM classifier.

...read moreread less

Abstract: Devanagari ancient manuscript recognition framework is drawing a lot of considerations from researchers nowadays. Devanagari ancient manuscripts are rare and delicate documents. To exploit the priceless information included in these documents, these documents are being digitized. Optical character recognition process is being used for the recognition of these documents. This paper presents a system for improvement in recognition of Devanagari ancient manuscripts using AdaBoost and Bagging methodologies. Discrete cosine transform (DCT) zigzag is used for feature extraction. Decision tree, Naive Bayes and support vector machine classifiers are used for the recognition of basic characters segmented from Devanagari ancient manuscripts. A dataset of 5484 pre-segmented characters of Devanagari ancient documents is considered for experimental work. Maximum recognition accuracy of 90.70% has been achieved using DCT zigzag features and RBF-SVM classifier. AdaBoost and Bagging ensemble methods are used with the base classifiers to improve the accuracy. Maximum accuracy of 91.70% is achieved for adaptive boosting (AdaBoost) with RBF-SVM. Various parameters for performance measures such as precision, recall, F-measure, false acceptance rate, false rejection rate and RMSE are used for assessing the quality of the ensemble methods.

...read moreread less

19 citations

Proceedings Article•DOI•

Code-Mixed to Monolingual Translation Framework

[...]

Sainik Kumar Mahata¹, Soumil Mandal², Dipankar Das¹, Sivaji Bandyopadhyay¹•Institutions (2)

Jadavpur University¹, SRM University²

12 Dec 2019

TL;DR: This work presents a translation framework that uses a translation-transliteration strategy for translating code-mixed data into their equivalent monolingual instances and reordered using a target language model to convert the output to a more readable form.

...read moreread less

Abstract: The use of multilingualism among the new generation is widespread in the form of code-mixed data on social media, and therefore a robust translation system is required for catering to the novice and monolingual users. In this work, we present a translation framework that uses a translation-transliteration strategy for translating code-mixed data into their equivalent monolingual instances. One of the goals of this work is to translate a code-mixed source (written in Roman script) to a Bengali target (written in Devanagari script), where the source may contain English, along with transliterated Bengali. Finally, to convert the output to a more readable form, it is reordered using a target language model. The decisive advantage of the proposed framework is that it does not require a code-mixed to monolingual parallel corpus for training and decoding. On testing the framework, it achieved BLEU and TER scores of 16.47 and 55.45, respectively. Since the proposed framework deals with various sub-modules, we dive deeper into the importance of each of them, analyze the errors and finally, discuss some improvement strategies.

...read moreread less

15 citations

Journal Article•DOI•

Robust Character Segmentation and Recognition Schemes for Multilingual Indian Document Images

[...]

Parul Sahare¹, Sanjay B. Dhok¹•Institutions (1)

Visvesvaraya National Institute of Technology¹

04 Mar 2019-Iete Technical Review

TL;DR: In this paper, robust methods for character segmentation and recognition for multilingual (Latin and Devanagari) Indian document images are presented, which are used for text segmentation.

...read moreread less

Abstract: This paper presents robust methods for character segmentation and recognition for multilingual (Latin and Devanagari) Indian document images. The documents degraded over the years because of text d...

...read moreread less

15 citations

Book Chapter•DOI•

Deep Learning for Hindi Text Classification: A Comparison

[...]

Ramchandra Joshi¹, Purvi Goel², Raviraj Joshi²•Institutions (2)

Pune Institute of Computer Technology¹, Indian Institute of Technology Madras²

12 Dec 2019

TL;DR: Translated versions of English data-sets are used to evaluate models based on CNN, LSTM and Attention for text classification of Hindi text and serve as a tutorial for popular text classification techniques.

...read moreread less

Abstract: Natural Language Processing (NLP) and especially natural language text analysis have seen great advances in recent times. Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results. Different deep learning architectures like CNN, LSTM, and very recent Transformer have been used to achieve state of the art results variety on NLP tasks. In this work, we survey a host of deep learning architectures for text classification tasks. The work is specifically concerned with the classification of Hindi text. The research in the classification of morphologically rich and low resource Hindi language written in Devanagari script has been limited due to the absence of large labeled corpus. In this work, we used translated versions of English data-sets to evaluate models based on CNN, LSTM and Attention. Multilingual pre-trained sentence embeddings based on BERT and LASER are also compared to evaluate their effectiveness for the Hindi language. The paper also serves as a tutorial for popular text classification techniques.

...read moreread less

14 citations

Proceedings Article•DOI•

A Deep Learning Approach for Optical Character Recognition of Handwritten Devanagari Script

[...]

Brijeshwar Dessai, Amit Patil

05 Jul 2019

TL;DR: Development of Convolutional Neural Network (CNN) based Optical Character Recognition system (OCR) for Handwritten Devanagari Script which is observed to recognize the characters accurately.

...read moreread less

Abstract: Handwritten Character Recognition is one of the most challenging and demanding area of interest for researchers in domains of pattern recognition and image processing. Many researchers have worked with recognition of characters of different languages but there is comparatively less work carried for Devanagari Script. In past few years, however the work carried out in this direction is increasing to a great extent. Handwritten Devanagari Character Recognition is more challenging in comparison to the recognition of the Roman characters. The complexity is mostly due to the presence of a header line known as shirorekha that connects the Devanagari characters to form a word. The presence of this header line makes the segmentation process of characters more difficult. There is uniqueness to the handwriting styles of every individual which adds to the complexity. In this paper, we propose development of Convolutional Neural Network (CNN) based Optical Character Recognition system (OCR) for Handwritten Devanagari Script which is observed to recognize the characters accurately.

...read moreread less

Journal Article•DOI•

Drop flow method: an iterative algorithm for complete segmentation of Devanagari ancient manuscripts

[...]

Sonika Rani Narang¹, Manish Kumar Jindal², Munish Kumar³•Institutions (3)

D.A.V. College, Koraput¹, Panjab University, Chandigarh², Punjab Technical University³

01 Aug 2019-Multimedia Tools and Applications

TL;DR: An iterative character segmentation algorithm is presented for ancient documents in Devanagari script and a new algorithm with the name ‘Drop Flow Method’ is proposed to find the segmentation path between touching components.

...read moreread less

Abstract: One of the major challenges of ancient manuscripts recognition is character segmentation. Because of many distinct features of ancient documents (thick characters, overlapping and touching characters), character segmentation is a very difficult task. Devanagari ancient manuscripts consist of vowels, consonants, modifiers, conjuncts and compound characters. Using existing techniques, segmentation of overlapping and touching characters is problematic. In this paper, an iterative character segmentation algorithm is presented for ancient documents in Devanagari script. At the beginning, the lines are extracted from the ancient documents by dividing the document image into vertical stripes and then using piecewise horizontal projection profiles. After that, these lines are segmented into words using vertical projection profiles and finally, words are segmented in characters using an iterative algorithm. In each iteration, character segmentation is refined. In the present work, we have proposed a new algorithm with the name ‘Drop Flow Method’ to find the segmentation path between touching components. The proposed algorithm can segment touching characters and 96.0% accuracy has been achieved for complete segmentation of Devanagari ancient manuscripts.

...read moreread less

Journal Article•

Devanagari Handwritten Character Recognition Using Neural Network

[...]

Anupama Thakur, Amrit Kaur

25 Oct 2019-International Journal of Scientific & Technology Research

TL;DR: This work proposes a new strategy for recognition of printed Hindi characters in Devanagari script using hybrid technique which contains k-NN along with neural networks, which indicates that the proposed approach is better as compared to the techniques used in existing method.

...read moreread less

Abstract: Optical Character Recognition is a framework which can translate the images from manually handwritten or printed structure to machine-editable structure. Devanagari script is utilized in numerous Indian dialects like Hindi, Nepali, Marathi, Sindhi and so on. This script structures the establishment of the language like Hindi which is the national and most generally communicated language in India. In current scenario, there is a tremendous interest of accumulating the data in advanced configuration accessible in paper archives and after that later reusing this data by a search procedure. In this paper, we propose a new strategy for recognition of printed Hindi characters in Devanagari script. In this research, the main focus is given towards the recognition of the individual consonant and vowel which can be later reached out to perceive complex inferred words. In this undertaking the fundamental accentuation is given towards the recognition of the individual consonant and vowel. In this project, different pre-processing operations like features extraction, segmentations and classification methods have been studied and implemented to design a sophisticated OCR system for Hindi. In previous research, the classification K-NN technique have been implemented, but in proposed work, we have used hybrid technique which contains k-NN along with neural networks. Proposed approach provides 97.4% recognition rate as compared to 94.5% for existing techniques, which indicates that the proposed approach is better as compared to the techniques used in existing method.

...read moreread less

Book Chapter•DOI•

Deep ConvNet with Different Stochastic Optimizations for Handwritten Devanagari Character

[...]

Mahesh Jangid¹, Sumit Srivastava¹•Institutions (1)

Manipal University Jaipur¹

01 Jan 2019

TL;DR: A deep learning model to recognize the handwritten Devanagari characters, which is the most popular language in India, is presented and it is discerned that the proposed model gives a 96.00% recognition accuracy with fifty epochs.

...read moreread less

Abstract: In this paper, we present a deep learning model to recognize the handwritten Devanagari characters, which is the most popular language in India. This model aims to use the deep convolutional neural networks (DCNN) to eliminate the feature extraction process and the extraction process with the automated feature learning by the deep convolutional neural networks. It also aims to use the different optimizers with deep learning where the deep convolution neural network was trained with different optimizers to observe their role in the enhancement of recognition rate. It is discerned that the proposed model gives a 96.00% recognition accuracy with fifty epochs. The proposed model was trained on the standard handwritten Devanagari characters dataset.

...read moreread less

Proceedings Article•DOI•

A Roman to Devanagari Back-Transliteration Algorithm based on Harvard-Kyoto Convention

[...]

Jayashree Nair¹, Anand Sadasivan¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

29 Mar 2019

TL;DR: This study focuses on development of a rule-based, grapheme model character alignment back-transliteration algorithm of Sanskrit script, transcribed ASCII-encoded English to Devanagari, pursuant to the Harvard-Kyoto (HK) convention and appraises the complexity of the pseudo-coded algorithm.

...read moreread less

Abstract: Transliteration is the process to transcribe a script of one language into another, while, backward or back transliteration is converting back the transliterated text into its original script. The highly technical phonetic system of Sanskrit seems to have made the preparation of transliteration scheme quite arduous. This study is focused on development of a rule-based, grapheme model character alignment back-transliteration algorithm of Sanskrit script, transcribed ASCII(American Standard Code for Information Interchange)-encoded English to Devanagari, pursuant to the Harvard-Kyoto (HK) convention. Accordingly, the paper presents the context of the utility for such an algorithm. It also describes the various standard schemes available for transcribing Devanagari into Roman. A survey on the evolution of scripts in India suggests the Brahmi script as the foundation for the origin of variants like Devanagari. Since the nineteenth century, various transliteration schemes based on Roman script have evolved. The International Alphabet of Sanskrit Transliteration (IAST) schemes used diacritics to disambiguate phonetic similarities and seem to have induced much strenuous venture for the non-professionals. The ASCII-based, HK and its variant, Indian Language Transliteration (ITRANS) schemes do not use diacritics and hence accounted to be the simplest. Our rationale for the use of HK scheme, stem from its prime traits of Sanskrit Unicode encoding. We have also explained the Sanskrit alphabet and its classifications, which are incorporated into our proffered process. We appraise the complexity of our pseudo-coded algorithm and finally, we propose an extension of this work in the creation of similar tools, for other Indian languages that use the Devanagari script, such as Hindi and Marathi.

...read moreread less

Posted Content•

Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification

[...]

Vinay Uday Prabhu, Sanghyun Han, Dian Ang Yap, Mihail Douhaniaris, Preethi Seshadri, John Whaley - Show less +2 more

16 May 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: A Seed-Augment-Train/Transfer framework that contains a synthetic seed image dataset generation procedure for languages with different numeral systems using freely available open font file datasets establishes not only an interesting nexus between the font-datasets-world and transfer learning but also provides a recipe for universal-digit classification in any script.

...read moreread less

Abstract: In this paper, we propose a Seed-Augment-Train/Transfer (SAT) framework that contains a synthetic seed image dataset generation procedure for languages with different numeral systems using freely available open font file datasets. This seed dataset of images is then augmented to create a purely synthetic training dataset, which is in turn used to train a deep neural network and test on held-out real world handwritten digits dataset spanning five Indic scripts, Kannada, Tamil, Gujarati, Malayalam, and Devanagari. We showcase the efficacy of this approach both qualitatively, by training a Boundary-seeking GAN (BGAN) that generates realistic digit images in the five languages, and also quantitatively by testing a CNN trained on the synthetic data on the real-world datasets. This establishes not only an interesting nexus between the font-datasets-world and transfer learning but also provides a recipe for universal-digit classification in any script.

...read moreread less

Book Chapter•DOI•

Primitive Feature-Based Optical Character Recognition of the Devanagari Script

[...]

Richa Sharma¹, Tarun Mudgal¹•Institutions (1)

University of Delhi¹

01 Jan 2019

TL;DR: The proposed method gave 93.33% accuracy with 21 fonts used for Hindi, Sanskrit, and Marathi and 72.72% accuracy for handwritten character samples taken from 22 different people from varied age groups for the Ka-Varga—the first five consonants of the Devanagari script.

...read moreread less

Abstract: The Devanagari script forms the backbone of the writing system of several Indian languages including Sanskrit and Hindi This paper proposes a method to recognize a Devanagari character from a digital image using primitive feature information The procedure involves representing each character in terms of the presence and location of primitive features like vertical lines, the frequency, and location of the intersections and the frequency of intersections of character body with Shirorekha (the top horizontal line of a Devanagari character) The classification of the character is done on the basis of the existence and (if present) the location of these features in the glyph (test character) The proposed method gave 9333% accuracy with 21 fonts used for Hindi, Sanskrit, and Marathi and 7272% accuracy for the handwritten character samples taken from 22 different people from varied age groups for the Ka-Varga—the first five consonants of the Devanagari script The method worked better for handwritten samples of younger people (aged 20–25 years) than the older ones (aged 40–50 years)

...read moreread less

Journal Article•DOI•

LSTM Based Lip Reading Approach for Devanagiri Script

[...]

Mahesh S. Patil¹, Satyadhyan Chickerur¹, Anand S. Meti¹, Priyanka M Nabapure¹, Sunaina Mahindrakar¹, Sonali Naik¹, Soumya Kanyal¹ - Show less +3 more•Institutions (1)

KLE Technological University¹

14 Sep 2019

TL;DR: A novel lip-reading solution, which extracts the geometrical shape of lip movement from the video and predicts the words/sentences spoken and is able to predict the words spoken with 77% and 35% accuracy for data set of 3 and 10 words respectively.

...read moreread less

Abstract: Speech Communication in a noisy environment is a difficult and challenging task. Many professionals work in noisy environments like aviation, constructions, or manufacturing, and find it difficult to communicate orally. Such noisy environments need an automated lip-reading system that could be helpful in communicating some instructions and commands. This paper proposes a novel lip-reading solution, which extracts the geometrical shape of lip movement from the video and predicts the words/sentences spoken. An Indian specific language data set is developed which consists of lip movement information captured from 50 persons. This includes students in the age group of 18 to 20 years and faculty in the age group of 25 to 40 years . All have spoken a paragraph of 58 words within 10 sentences in Hindi (Devanagari, spoken in India) language which was recorded under various conditions. The implementation consists of facial parts detection, along with Long short term memory’s. The proposed solution is able to predict the words spoken with 77% and 35% accuracy for data set of 3 and 10 words respectively. The sentences are predicted with 20% accuracy, which is encouraging.

...read moreread less

Book Chapter•DOI•

Discrete Cosine Transform-Based Feature Selection for Marathi Numeral Recognition System

[...]

Madhav V. Vaidya¹, Yashwant Joshi¹, Milind Bhalerao¹, Ganesh Pakle¹•Institutions (1)

Shri Guru Gobind Singhji Institute of Engineering and Technology¹

01 Jan 2019

TL;DR: A novel feature extraction and selection method is proposed for the recognition of isolated handwritten Marathi numbers based on one-dimensional Discrete Cosine Transform (1-D DCT) algorithm for reducing the dimensionality of feature space.

...read moreread less

Abstract: Optical character recognition system is hotcake for the researchers since last four decades. Recognition of handwritten Devanagari characters and digits is comparatively a tough task as compared to recognition other scripts like English or Latin. In this manuscript, a novel feature extraction and selection method is proposed for the recognition of isolated handwritten Marathi numbers based on one-dimensional Discrete Cosine Transform (1-D DCT) algorithm for reducing the dimensionality of feature space. The scanned document is preprocessed and segmented to create isolated numerals. Features for each numeral can be calculated after normalizing the numeral image to 32 × 32 size. Based on these reduced features, the numerals are classified into appropriate groups. Database of 6000 numerals size is used for the proposed work. Neural network is used for classification of numerals based on the extracted and selected features. Experimental results show accuracy observed for the method is 90.30%.

...read moreread less

Book Chapter•DOI•

Frequently Used Devanagari Words in Marathi and Pali Language Documents

[...]

Milind Bhalerao¹, S. V. Bonde¹, Madhav V. Vaidya¹•Institutions (1)

Shri Guru Gobind Singhji Institute of Engineering and Technology¹

01 Jan 2019

TL;DR: The proposed system successfully segments out the lines and words of the Marathi and Pali documents and identifies the language of the script under observation using K-NN classifier.

...read moreread less

Abstract: Optical character recognition (OCR) deals with the recognition of printed or handwritten characters. India being the multilingual country, and possessing the historical information in some of the old languages being practiced in India since ancient time, it is obvious that important information is still to be discovered from these ancient available documents. In this paper, we have devised a method which will identify the language of the script under observation. Character recognition itself is a challenging problem because of the variation in the font and size of the characters. In this paper, a scheme is developed for complete OCR for Marathi and Pali languages. The proposed system successfully segments out the lines and words of the Marathi and Pali documents. The proposed system is evaluated on ten Marathi and ten Pali documents comprised of 552 text lines and 6430 words. We obtained the promising results on the line segmentation with an accuracy of 99.25% and 98.6% and for word segmentation 97.6% and 96.5%, respectively, on Marathi and Pali language documents. Using K-NN classifier, the most frequently used words in Marathi and Pali documents are identified.

...read moreread less

Proceedings Article•DOI•

A multiscript gaze-based assistive virtual keyboard

[...]

Hubert Cecotti¹, Yogesh Kumar Meena², Braj Bhushan², Ashish Dutta², Girijesh Prasad³ - Show less +1 more•Institutions (3)

California State University, Fresno¹, Indian Institutes of Technology², Ulster University³

07 Oct 2019

TL;DR: A multiscript gaze based virtual keyboard that can be accessed for people who communicate with the Latin, Bangla, and/or Devanagari scripts is proposed where it is possible to change the layout of the graphical user interface in relation to the script.

...read moreread less

Abstract: The recent development of inexpensive and accurate eye-trackers allows the creation of gazed based virtual keyboards that can be used by a large population of disabled people in developing countries. Thanks to eye-tracking technology, gaze-based virtual keyboards can be designed in relation to constraints related to the gaze detection accuracy and the considered display device. In this paper, we propose a new multimodal multiscript gaze-based virtual keyboard where it is possible to change the layout of the graphical user interface in relation to the script. Traditionally, virtual keyboards are assessed for a single language (e.g. English). We propose a multiscript gaze based virtual keyboard that can be accessed for people who communicate with the Latin, Bangla, and/or Devanagari scripts. We evaluate the performance of the virtual keyboard with two main groups of participants: 28 people who can communicate with both Bangla and English, and 24 people who can communicate with both Devanagari and English. The performance is assessed in relation to the information transfer rate when participants had to spell a sentence using their gaze for pointing to the command, and a dedicated mouth switch for commands selection. The results support the conclusion that the system is efficient, with no difference in terms of information transfer rate between Bangla and Devanagari. However, the performance is higher with English, despite the fact it was the secondary language of the participants.

...read moreread less

Journal Article•DOI•

An approach based on classifier combination for online handwritten text and non-text classification in Devanagari script

[...]

Rajib Ghosh¹, Saurav Shanu¹, Sugandha Ranjan¹, Khusboo Kumari¹•Institutions (1)

National Institute of Technology, Patna¹

01 Aug 2019-Sadhana-academy Proceedings in Engineering Sciences

TL;DR: A method of analysing features of elliptical regions and combining outcomes of classifiers using Dempster–Shafer Theory (DST) is presented to classify online handwritten text and non-text data of any online handwritten document in the most popular Indic script—Devanagari.

...read moreread less

Abstract: In this article, a method of analysing features of elliptical regions and combining outcomes of classifiers using Dempster–Shafer Theory (DST) is presented to classify online handwritten text and non-text data of any online handwritten document in the most popular Indic script—Devanagari. Although a few works exist in this regard in different non-Indic scripts, to our knowledge, no study is available to classify handwritten text and non-text document in online mode in any Indic script. The present method uses various structural and directional features analysed in elliptical regions to extract feature values from strokes of text and non-text data. The features are then studied separately in classification platforms based on Support Vector Machine (SVM) and Hidden Markov Model (HMM). The probabilistic outcomes of these two classification platforms are then combined using DST to improve the system performance. The efficiency of the present system has been measured on a self-generated dataset and it provides promising result.

...read moreread less

Journal Article•DOI•

Indic script identification from handwritten document images

[...]

Pawan Kumar Singh¹, Ram Sarkar¹, Mita Nasipuri¹•Institutions (1)

Jadavpur University¹

22 Mar 2019-International Journal of Intelligent Systems Technologies and Applications

TL;DR: An encouraging outcome confirms the efficacy of customary textural features to handwritten Indic script identification and computes fractal dimension by using segmentation-based fractal texture analysis (SFTA).

...read moreread less

Abstract: Script identification plays an important role in document image processing especially for multilingual environment. This paper hires two conventional textural methods for recognition of the scripts of the handwritten documents inscribed in different Indic scripts. The first method extracts well-known Haralick features from spatial grey-level dependence matrix (SGLDM) and the second method computes fractal dimension by using segmentation-based fractal texture analysis (SFTA). Finally, a 104-element feature vector is constructed from each page image by these two methods. The proposed technique is then evaluated on a total dataset comprising 360 handwritten document pages written in 12 Indian official scripts namely Bangla, Devanagari, Gujarati, Gurumukhi, Kannada, Malayalam, Manipuri, Oriya, Tamil, Telugu, Urdu and Roman. Experimentations using multiple classifiers reveal that multilayer perceptron (MLP) shows the highest identification accuracy of 96.94%. The encouraging outcome confirms the efficacy of customary textural features to handwritten Indic script identification.

...read moreread less

Book Chapter•DOI•

Cross-Modal Processing of Orthography-Phonology Interface in Hindi-English Bilinguals

[...]

Ramesh Kumar Mishra¹•Institutions (1)

University of Hyderabad¹

01 Jan 2019

TL;DR: The chapter emphasizes the point that much psycholinguistic work needs to be done in Indian languages that link orthographic, phonological and visual processing.

...read moreread less

Abstract: The chapter presents an overview of Devanagari script. Later it presents experimental data from one eye tracking experiment that examined interface between spoken words and orthographic activation. The chapter also touches upon discussion of dyslexia in the Indian setting. The chapter emphasizes the point that much psycholinguistic work needs to be done in Indian languages that link orthographic, phonological and visual processing.

...read moreread less

Book Chapter•DOI•

Challenges in Learning Akshara orthographies for Second language Learners

[...]

Adeetee Bhide¹, Charles A. Perfetti¹•Institutions (1)

University of Pittsburgh¹

01 Jan 2019

TL;DR: This chapter describes Marathi’s orthographic and phonological systems and describes the consequences of inconsistent schwa expression for speakers’ awareness of the vowel and summarizes experiments with both native Marathi speakers and second language learners.

...read moreread less

Abstract: Marathi is a language derived from Sanskrit and spoken in the state of Maharashtra, India. In this chapter, we describe Marathi’s orthographic and phonological systems. The script used to write Marathi is Devanagari. Because Devanagari is described in the Hindi chapter (Singh & Sumathi), we focus our description on how it is implemented in Marathi and how this differs from how Devanagari is implemented in Hindi. In addition to describing the orthographic and phonological systems separately, we explain how Marathi’s orthography codes phonology. Written Marathi is generally transparent. However, there are some exceptions including graphs that code multiple phonemes and an inconsistent expression of the schwa vowel. After describing the language and its writing system, we summarize experiments with both native Marathi speakers and second language learners of Marathi. For native speakers, we explain the consequences of inconsistent schwa expression for speakers’ awareness of the vowel. For second language learners, we review experiments comparing different methods for teaching Marathi’s orthographic and phonological systems.

...read moreread less

Dissertation•DOI•

Deep Learning Based Real Time Devanagari Character Recognition

[...]

Aseem Chhabra

01 Jan 2019

TL;DR: In this article, a deep learning-based model was proposed to recognize Devanagari script characters in real time by analyzing hand movements, which can be used for character recognition.

...read moreread less

Abstract: The revolutionization of the technology behind optical character recognition (OCR) has helped it to become one of those technologies that have found plenty of uses in the entire industrial space. Today, the OCR is available for several languages and have the capability to recognize the characters in real time, but there are some languages for which this technology has not developed much. All these advancements have been possible because of the introduction of concepts like artificial intelligence and deep learning. Deep Neural Networks have proven to be the best choice when it comes to a task involving recognition. There are many algorithms and models that can be used for this purpose. This project tries to implement and optimize a deep learning-based model which will be able to recognize Devanagari script’s characters in real time by analyzing the hand movements.

...read moreread less

Book Chapter•DOI•

Keyword Spotting in Historical Devanagari Manuscripts by Word Matching

[...]

B. Sharada¹, S. N. Sushma¹, Bharathlal¹•Institutions (1)

University of Mysore¹

01 Jan 2019

TL;DR: An efficient keyword spotting approach for handwritten Devanagari documents is presented and experiments are conducted on historical datasets consisting of manuscripts by Oriental Research Institute @ Mysuru.

...read moreread less

Abstract: Huge quantities of ancient manuscripts are there in various National archives. Digitization of these manuscripts and historical documents is very important for making its access efficiently. Handwritten keyword spotting is very challenging task for Devanagari documents due to large variations in writing styles. Keyword spotting of unconstrained offline handwritten documents is performed based on matching scheme of word images. This paper presents an efficient keyword spotting approach for handwritten Devanagari documents. Experiments are conducted on historical datasets consisting of manuscripts by Oriental Research Institute @ Mysuru.

...read moreread less

Book Chapter•DOI•

Offline Handwritten Devanagari Character Identification

[...]

Gita Sinha, Shailja Sharma

19 Dec 2019

TL;DR: Manuscript is written by the use of paper and then converting to an image via scanner, identify handwritten characters as of the image is well-known as off-line handwritten character identification is a demanding work due to the fact that each author will have diverse style of writing and all scripts have their own character set and complexities to write.

...read moreread less

Abstract: Handwritten character identification is a most torrid area of research where countless researchers have presented their work and is still an area less than research to accomplish higher identification accuracy. In earlier period acquisition, storing and exchanging information in type of handwritten script was the well-situated way and is still widespread as a convenient medium in the era of digital equipment. As advanced technology like tablet has been used and many comparable devices that allows humans to key in data in form of handwriting character. Manuscript is written by the use of paper and then converting to an image via scanner, identify handwritten characters as of the image is well-known as off-line handwritten character identification is a demanding work due to the fact that each author will have diverse style of writing and all scripts have their own character set and complexities to write.

...read moreread less

Book Chapter•DOI•

Toward Recognition and Classification of Hindi Handwritten Document Image

[...]

Shalini Puri, Satya Prakash Singh

01 Jan 2019

TL;DR: A new idea of offline Hindi handwritten document classification is proposed, which first recognizes and classifies the character images, and then classifying the document image into the predefined category, putting a step ahead in the direction of automatic document image classification.

...read moreread less

Abstract: With the increased demand of digitization of Indic scripts in today’s world, many Devanagari printed and handwritten text recognition and extraction techniques have been developed and are used in industries, corporate, and institutional domain areas. Because of the script and character structure difficulties, and handwritten content-based criticalities, Hindi handwriting processing is considered as a big bottleneck in recognition systems. This paper introduces NMC handwriting types and complexity evaluators for Hindi language. The inherent challenges of handwriting are discussed further. Although several Hindi-based handwritten character recognizers have been developed, they are limited to the segmentation and identification of character images only. So, this paper proposes a new idea of offline Hindi handwritten document classification, which first recognizes and classifies the character images, and then classifies the document image into the predefined category. In support of this concept, this paper provides a case study using a set of Hindi handwritten documents and shows their segmentation and classification results. The proposed system puts a step ahead in the direction of automatic document image classification.

...read moreread less

Book Chapter•DOI•

Soft Clustering for Segmenting Touching Characters in Printed Scripts

[...]

Keshab Nath¹, Swarup Roy²•Institutions (2)

North Eastern Hill University¹, Sikkim University²

01 Jan 2019

TL;DR: This paper explores the effectiveness of fuzzy, rough, and rough fuzzy k-means clustering to segment touching characters in Devanagari, Assamese, and Bangla printed scripts and reveals that soft k-Means are an effective alternative method for segmenting touching characters.

...read moreread less

Abstract: Segmentation of characters from the printed script is an important preprocessing step in automatic Optical Character Recognition (OCR). The performances of the various machine learning algorithms depend on the results of segmentation of the characters. The situation is more challenging when the scripts contain touching characters. Touching characters are predominant in different Indian scripts like Assamese, Bangla, Devanagari, Oriya, Gurmukhi, and many others. In such cases, the accuracy of an OCR system depends on the quality of segmentation of touching characters. In this paper, we explore the effectiveness of fuzzy, rough, and rough fuzzy k-means clustering to segment touching characters. We use different compound characters dataset from Devanagari, Assamese, and Bangla printed scripts for experimentation. Our results reveal that soft k-means are an effective alternative method for segmenting touching characters.

...read moreread less