
Showing papers on "Handwriting recognition" published in 2021


Journal ArticleDOI
TL;DR: An automatic handwriting recognition model based on convolutional neural networks (CNN) is proposed, achieving accuracies of 97% and 88% on the AHCD dataset and the Hijja dataset, respectively, outperforming other models in the literature.
Abstract: Automatic handwriting recognition is an important component for many applications in various fields. It is a challenging problem that has received a lot of attention in the past three decades. Research has focused on the recognition of Latin languages’ handwriting. Fewer studies have been done for the Arabic language. In this paper, we present a new dataset of Arabic letters written exclusively by children aged 7–12, which we call Hijja. Our dataset contains 47,434 characters written by 591 participants. In addition, we propose an automatic handwriting recognition model based on convolutional neural networks (CNN). We train our model on Hijja, as well as the Arabic Handwritten Character Dataset (AHCD). Results show that our model’s performance is promising, achieving accuracies of 97% and 88% on the AHCD dataset and the Hijja dataset, respectively, outperforming other models in the literature.

93 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, a cross-modal translation pre-text task for self-supervised feature learning is proposed, where vectorization and rasterization are used to map image space to vector coordinates and vector coordinates to image space, respectively.
Abstract: Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pre-text tasks are challenging to design and usually modality specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or temporal data (sound or text) modalities, a common pretext task that benefits both modalities is largely missing. In this paper, we are interested in defining a self-supervised pre-text task for sketches and handwriting data. This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences. We address and exploit this dual representation by proposing two novel cross-modal translation pre-text tasks for self-supervised feature learning: Vectorization and Rasterization. Vectorization learns to map image space to vector coordinates and rasterization maps vector coordinates to image space. We show that our learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data. Empirical evidence shows that our novel pre-text tasks surpass existing single and multi-modal self-supervision methods.

49 citations


Journal ArticleDOI
TL;DR: A critical review of the related significant aspects is provided and an overview of existing applications of deep learning in computational visual perception is included, which shows that there is a significant improvement in the accuracy using dropout and data augmentation.
Abstract: Computational visual perception, also known as computer vision, is a field of artificial intelligence that enables computers to process digital images and videos in a similar way as biological vision does. It involves methods to be developed to replicate the capabilities of biological vision. The computer vision’s goal is to surpass the capabilities of biological vision in extracting useful information from visual data. The massive data generated today is one of the driving factors for the tremendous growth of computer vision. This survey incorporates an overview of existing applications of deep learning in computational visual perception. The survey explores various deep learning techniques adapted to solve computer vision problems using deep convolutional neural networks and deep generative adversarial networks. The pitfalls of deep learning and their solutions are briefly discussed. The solutions discussed were dropout and augmentation. The results show that there is a significant improvement in the accuracy using dropout and data augmentation. Deep convolutional neural networks’ applications, namely, image classification, localization and detection, document analysis, and speech recognition, are discussed in detail. In-depth analysis of deep generative adversarial network applications, namely, image-to-image translation, image denoising, face aging, and facial attribute editing, is done. The deep generative adversarial network is unsupervised learning, but adding a certain number of labels in practical applications can improve its generating ability. However, it is challenging to acquire many data labels, but a small number of data labels can be acquired. Therefore, combining semisupervised learning and generative adversarial networks is one of the future directions. 
This article surveys the recent developments in this direction and provides a critical review of the related significant aspects, investigates the current opportunities and future challenges in all the emerging domains, and discusses the current opportunities in many emerging fields such as handwriting recognition, semantic mapping, webcam-based eye trackers, lumen center detection, query-by-string word, intermittently closed and open lakes and lagoons, and landslides.

34 citations


Journal ArticleDOI
TL;DR: Two types of deep neural networks, a convolutional neural network (CNN) and an auto-encoder, are implemented, and a new combination of CNN layers yields improved results in classifying Farsi digits.
Abstract: Handwriting recognition remains a challenge in the machine vision field, especially in optical character recognition (OCR). The OCR has various applications such as the detection of handwritten Farsi digits and the diagnosis of biomedical science. In expanding and improving quality of the subject, this research focuses on the recognition of Farsi handwritten digits and illustrative applications in biomedical science. The detection of handwritten Farsi digits is widely used in most contexts involving the collection of generic digital numerical information, such as reading checks or digits of postcodes. Selecting an appropriate classifier has become a highlighted issue in the recognition of handwritten digits. The paper aims at identifying handwritten Farsi digits written in different handwriting styles. Digits are classified using several traditional methods, including K-nearest neighbor, artificial neural network (ANN), and support vector machine (SVM) classifiers. New features of digits, namely, geometric and correlation-based features, have been shown to achieve better recognition performance. A novel class of methods, known as deep neural networks (DNNs), is also used to identify handwritten digits through machine vision. Here, two types of DNNs, a convolutional neural network (CNN) and an auto-encoder, are implemented. Moreover, by using a new combination of CNN layers one can obtain improved results in classifying Farsi digits. The performances of the DNN-based and traditional classifiers are compared to investigate the improvements in accuracy and calculation time. The SVM shows the best results among the traditional classifiers, whereas the CNN achieves the best results among the investigated techniques. The ANN offers better execution time than the SVM, but its accuracy is lower.
The best accuracy among the traditional classifiers based on all investigated features is 99.3%, obtained by the SVM, and the CNN achieves the best overall accuracy of 99.45%.

30 citations


Book ChapterDOI
05 Sep 2021
TL;DR: In this article, a Neural Network based Handwritten Text Recognition (HTR) model is proposed to recognize full pages of handwritten or printed text without image segmentation, which can extract text present in an image and then sequence it correctly without imposing any constraints regarding orientation, layout and size of text and nontext.
Abstract: We present a Neural Network based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Being based on Image to Sequence architecture, it can extract text present in an image and then sequence it correctly without imposing any constraints regarding orientation, layout and size of text and non-text. Further, it can also be trained to generate auxiliary markup related to formatting, layout and content. We use character level vocabulary, thereby enabling language and terminology of any subject. The model achieves a new state-of-art in paragraph level recognition on the IAM dataset. When evaluated on scans of real world handwritten free form test answers - beset with curved and slanted lines, drawings, tables, math, chemistry and other symbols - it performs better than all commercially available HTR cloud APIs. It is deployed in production as part of a commercial web application.

26 citations


Journal ArticleDOI
TL;DR: A fully convolution based deep network architecture for cursive handwriting recognition from line level images that has fewer parameters and takes less training and testing time, making it suitable for low-resource and environment-friendly deployment.
Abstract: Recognition of cursive handwritten images has advanced well with recent recurrent architectures and attention mechanisms. Most works focus on improving transcription performance in terms of Character Error Rate (CER) and Word Error Rate (WER). Existing models are slow to train and test. Furthermore, recent studies have recommended that models be not only efficient in terms of task performance but also environmentally friendly in terms of model carbon footprint. Reviewing the recent state-of-the-art models, this work recommends considering model training and retraining time during design. High training time increases costs not only in terms of resources but also in carbon footprint. This becomes challenging for handwriting recognition models with popular recurrent architectures. It is especially critical since line images usually have a very long width, resulting in a longer sequence to decode. In this work, we present a fully convolution based deep network architecture for cursive handwriting recognition from line level images. The architecture is a combination of 2-D convolutions and 1-D dilated non causal convolutions with a Connectionist Temporal Classification (CTC) output layer. This offers high parallelism with a smaller number of parameters. We further demonstrate experiments with various re-scaling factors of the images and how they affect the performance of the proposed model. A data augmentation pipeline is further analyzed during model training. The experiments show our model has comparable performance on CER and WER measures with recurrent architectures. A comparison is done with state-of-the-art models with different architectures based on Recurrent Neural Networks (RNN) and its variants. The analysis shows training performance and network details on three different datasets of English and French handwriting.
This shows our model has fewer parameters and takes less training and testing time, making it suitable for low-resource and environment-friendly deployment.

25 citations
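
For readers unfamiliar with the CER and WER measures named in the entry above, the following is a minimal sketch of how they are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length (in characters for CER, in words for WER). The function names `edit_distance`, `cer`, and `wer` are illustrative assumptions and are not taken from the paper, which may normalize or tokenize differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

With this sketch, a single misspelled word in a four-word line gives a WER of 0.25, while the CER depends on how many characters in that word differ.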


Journal Article
TL;DR: A framework for automatic differentiation with weighted finite-state transducers (WFSTs) allowing them to be used dynamically at training time and a convolutional WFST layer which maps lower-level representations to higher-level representations and can be used as a drop-in replacement for a traditional convolution.
Abstract: We introduce a framework for automatic differentiation with weighted finite-state transducers (WFSTs) allowing them to be used dynamically at training time. Through the separation of graphs from operations on graphs, this framework enables the exploration of new structured loss functions which in turn eases the encoding of prior knowledge into learning algorithms. We show how the framework can combine pruning and back-off in transition models with various sequence-level loss functions. We also show how to learn over the latent decomposition of phrases into word pieces. Finally, to demonstrate that WFSTs can be used in the interior of a deep neural network, we propose a convolutional WFST layer which maps lower-level representations to higher-level representations and can be used as a drop-in replacement for a traditional convolution. We validate these algorithms with experiments in handwriting recognition and speech recognition.

21 citations


Proceedings ArticleDOI
04 Jul 2021
TL;DR: The Computer Science Vocabulary Database (CSVD), as mentioned in this paper, contains 15,000 images of handwritten French words with educational vocabulary written by adolescent learners and is used for the training and testing of word recognition systems.
Abstract: Handwriting recognition is still considered a difficult problem. In this paper, we present a new database containing educational vocabulary written by adolescent learners. 200 Moroccan secondary school students whose age varies between 14 and 16 years participated in the development of the first version of the Computer Science Vocabulary Database (CSVD). Our database contains a set of 15,000 images. Word images are scanned and converted to image format to prepare them for subsequent processing steps. Preprocessing techniques were applied to the images to decrease the variability of colors, contrast, and brightness often present in handwritten word images. This database is designed for the training and testing of handwritten French word recognition systems. Furthermore, the CSVD database can constitute an opportunity for other researchers to evaluate their recognition methods and systems.

19 citations


Journal ArticleDOI
Hang Guo, Ji Wan, Haobin Wang, Hanxiang Wu, Chen Xu, Liming Miao, Mengdi Han, Haixia Zhang
01 Apr 2021
TL;DR: In this article, an intelligent human-machine interaction interface based on a triboelectric nanogenerator is reported. Using a horizontal-vertical symmetrical electrode array, the handwritten triboelectric signal can be recorded without external energy supply. Combined with supervised machine learning methods, it can successfully recognize handwritten English letters, Chinese characters, and Arabic numerals.
Abstract: Handwritten signatures widely exist in our daily lives. The main challenge of signal recognition on handwriting is in the development of approaches to obtain information effectively. External mechanical signals can be easily detected by triboelectric nanogenerators, which can provide immediate opportunities for building new types of active sensors capable of recording handwritten signals. In this work, we report an intelligent human-machine interaction interface based on a triboelectric nanogenerator. Using the horizontal-vertical symmetrical electrode array, the handwritten triboelectric signal can be recorded without external energy supply. Combined with supervised machine learning methods, it can successfully recognize handwritten English letters, Chinese characters, and Arabic numerals. The principal component analysis algorithm preprocesses the triboelectric signal data to reduce the complexity of the neural network in the machine learning process. Further, it can realize the anticounterfeiting recognition of writing habits by controlling the samples input to the neural network. The results show that the intelligent human-computer interaction interface has broad application prospects in signature security and human-computer interaction.

18 citations


Proceedings ArticleDOI
21 May 2021
TL;DR: In this paper, a CNN based on a Keras model was used to classify handwritten digit images, with the RMSprop optimizer for optimizing the model, and achieved 99.06% training accuracy and 98.80% testing accuracy with epoch 10.
Abstract: In the era of research, pattern recognition is one of the most famous and widely used areas in the field of research work. Various types of patterns are available to researchers, such as audio, video, handwritten digit images, and handwritten character images. In this paper, we concentrate on the field of handwritten digit recognition for classification of patterns. We have used the famous handwritten digit dataset named MNIST, which is a collection of 70,000 images. Many machine learning and deep learning techniques have already been used by researchers for handwritten digit recognition, such as Support Vector Machine (SVM), RFC, K-nearest Neighbor (K-NN), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN). In this research work, we have suggested CNN as a deep learning technique on Keras for MNIST handwritten digit recognition and compare the performance of the CNN with SVM and KNN. The proposed CNN, based on a Keras model, is used to classify handwritten digit images, with the RMSprop optimizer for optimizing the model. The main contribution of this research work is to increase the convolutional layers with pooling and dropout and also to tune the model using filters, kernel size, and number of neurons. The proposed CNN model achieves 99.06% training accuracy and 98.80% testing accuracy with epoch 10. Experimental results reveal that the proposed CNN is more effective compared to other techniques.

18 citations


Journal ArticleDOI
TL;DR: This article presents mmWrite, the first high-precision passive handwriting tracking system using a single commodity millimeter-wave (mmWave) radio, and promises ubiquitous handwriting tracking for new applications in the field of human–computer interactions.
Abstract: In the era of pervasively connected and sensed Internet of Things, many of our interactions with machines have been shifted from conventional computer keyboards and mouses to hand gestures and writing in the air. While gesture recognition and handwriting recognition have been well studied, many new methods are being investigated to enable pervasive handwriting tracking. Most of the existing handwriting tracking systems either require cameras and handheld sensors or involve dedicated hardware restricting user convenience and the scale of usage. In this article, we present mmWrite, the first high-precision passive handwriting tracking system using a single commodity millimeter-wave (mmWave) radio. Leveraging the short wavelength and large bandwidth of 60-GHz signals and the radar-like capabilities enabled by the large phased array, mmWrite transforms any flat region into an interactive writing surface that supports handwriting tracking at millimeter accuracy. MmWrite employs an end-to-end pipeline of signal processing to enhance the range and spatial resolution limited by the hardware, boost the coverage, and suppress interference from backgrounds and irrelevant objects. We implement and evaluate mmWrite on a commodity 60-GHz device. The experimental results show that mmWrite can track a finger/pen with a median error of 2.8 mm and thus can reproduce handwritten characters as small as 1 cm × 1 cm, with a coverage of up to 8 m² supported. With minimal infrastructure needed, mmWrite promises ubiquitous handwriting tracking for new applications in the field of human–computer interactions.

Journal ArticleDOI
TL;DR: This research work predicts gender from handwriting using the landmarks of differences between the two genders, using the shape or visual appearance of the handwriting to extract features such as slantness (direction), area (number of pixels occupied by text), perimeter (length of edges), etc.
Abstract: Handwriting recognition is used for the prediction of various demographic traits such as age, gender, nationality, etc. Of all these applications, gender prediction is a particularly popular topic among researchers. The relation between gender and handwriting can be seen from the physical appearance of the handwriting. This research work predicts gender from handwriting using the landmarks of differences between the two genders. We use the shape or visual appearance of the handwriting for extracting features of the handwriting such as slantness (direction), area (number of pixels occupied by text), perimeter (length of edges), etc. Classification is carried out using the Support Vector Machine (SVM), which transforms the nonlinear problem into a linear one using its kernel trick, together with logistic regression and KNN; finally, to enhance the classification rates, we use Majority Voting. The experimental results obtained on a dataset of 282 writers with 2 samples per writer show that the proposed method attains appealing performance on writer detection and text-independent environment.
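
The majority voting step mentioned in the abstract above can be illustrated with a minimal sketch: each classifier (for example SVM, logistic regression, and KNN) produces one label per sample, and the most frequent label for each sample wins. The function name `majority_vote` and the sample labels are hypothetical, chosen for illustration; the paper's actual implementation and tie-breaking rule are not described in the abstract.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list of voted labels.

    predictions: list of lists, one inner list per classifier, all the
    same length (one label per sample). With an odd number of classifiers
    and two classes no ties can occur; otherwise a tie goes to the label
    that appears first among the classifiers (an assumption of this sketch).
    """
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the top label.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three classifiers on three handwriting samples.
svm_pred = ["F", "M", "F"]
logreg_pred = ["F", "F", "M"]
knn_pred = ["M", "M", "F"]

final_pred = majority_vote([svm_pred, logreg_pred, knn_pred])
```

Using three classifiers on a two-class problem, as in this sketch, means every sample has a strict majority, which is one common reason to pick an odd ensemble size.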

Journal ArticleDOI
TL;DR: In this paper, a survey of handwritten mathematical expression recognition and their applications is presented, with a focus on end-to-end approaches based on encoder-decoder architecture and multi-modal input.
Abstract: Handwritten mathematical expressions are an essential part of many domains, including education, engineering, and science. The pervasive availability of computationally powerful touch-screen devices, together with the recent emergence of deep neural networks as high-quality sequence recognition models, has resulted in the widespread adoption of online recognition of handwritten mathematical expressions. Also, a deeper study and improvement of such technologies is necessary to address the current challenges posed by the extensive usage of distance learning and remote work due to the world pandemic. This paper delineates the state-of-the-art recognition methods along with the user’s experience in pen-centric applications for operating with handwritten mathematical expressions. Recognition methods have been categorized into classes, with a description of their merits and limitations. Particular attention is paid to end-to-end approaches based on encoder-decoder architecture and multi-modal input. Evaluation protocols and open benchmark datasets are considered as well as the comparison of the recognition performance, based on open competition results. The use of handwritten math recognition is illustrated by examples of applications for various fields and platforms. A distinctive part of the survey is that we also considered how UI design relies on the use of different recognition approaches, which is aimed at helping potential researchers improve the performance of the introduced approaches toward the best responses in practical applications. Finally, this paper presents the prospective survey of future research directions in handwritten mathematical expression recognition and their applications.

Journal ArticleDOI
TL;DR: This paper proposes a novel sequential relation decoder (SRD) that aims to decode expressions into tree structures for online handwritten mathematical expression recognition and demonstrates how the proposed SRD outperforms state-of-the-art string decoders through a set of experiments on CROHME database.
Abstract: Recently, recognition of online handwritten mathematical expressions has been greatly improved by employing encoder-decoder based methods. Existing encoder-decoder models use string decoders to generate LaTeX strings for mathematical expression recognition. However, in this paper, we importantly argue that string representations might not be the most natural for mathematical expressions – mathematical expressions are inherently tree structures rather than flat strings. For this purpose, we propose a novel sequential relation decoder (SRD) that aims to decode expressions into tree structures for online handwritten mathematical expression recognition. At each step of tree construction, a sub-tree structure composed of a relation node and two symbol nodes is computed based on previous sub-tree structures. This is the first work that builds a tree structure based decoder for encoder-decoder based mathematical expression recognition. Compared with string decoders, a decoder that better understands tree structures is crucial for mathematical expression recognition as it brings a more reasonable learning objective and improves overall generalization ability. We demonstrate how the proposed SRD outperforms state-of-the-art string decoders through a set of experiments on the CROHME database, which is currently the largest benchmark for online handwritten mathematical expression recognition.

Proceedings ArticleDOI
17 May 2021
TL;DR: In this article, the authors proposed an online handwritten recognition system to predict the doctors' handwriting and develop a digital prescription, which achieved 89.5% accuracy which is 16.1% higher than the recognition accuracy with no data expansion.
Abstract: Inability to read doctors’ handwritten prescriptions causes 7,000 deaths a year in a developed country like the US. The situation should be worse in developing countries where more doctors use handwriting prescriptions. In Bangladesh, the writings become more indecipherable as they contain both English and Bangla words with Latin abbreviations of medical terms. As a result, patients and pharmacists find them difficult to read and the pharmacists provide wrong medicines. In order to ease the difficulty of reading doctors’ prescriptions, this paper proposes an online handwritten recognition system to predict the doctors’ handwriting and develop a digital prescription. To build this system, the “Handwritten Medical Term Corpus” dataset is introduced which contains 17,431 data samples of 480 words (360 English and 120 Bangla) from 39 Bangladeshi doctors and medical professionals. A bigger sample size can improve the recognition efficiency. A new data augmentation technique SRP (Stroke Rotation and Parallel shift) method is proposed to widen the variety of handwriting styles and increase the sample size. A sequence of line data is extracted from the augmented image dataset of 1,591,100 samples which is fed to a Bidirectional LSTM model. The proposed method has achieved 89.5% accuracy which is 16.1% higher than the recognition accuracy with no data expansion. This technology can reduce medical errors and save medical cost and ensure healthy living.

Journal ArticleDOI
15 Sep 2021
TL;DR: In this paper, the authors used deep learning architecture convolutional neural network, and obtained above 97% recognition accuracy in data dependent mode of handwriting for online handwritten Gurmukhi word recognition.
Abstract: The recognition of online handwriting is a vital application of pattern recognition, which involves the extraction of spatial and temporal information of handwritten patterns, and understanding the handwritten text while writing on the digital surface. Although online handwriting recognition is a mature but exciting and fast developing field of pattern recognition, the same is not true for many of the Indic scripts. Gurmukhi is one such popular script of India, and online handwriting recognition issues for larger units such as words or sentences have largely remained unexplored for this script to date. The existing study, the first ever attempt at online handwritten Gurmukhi word recognition, relied upon the widely used hidden Markov model. This existing study performed very well on its chosen metrics. However, the available online handwritten Gurmukhi word recognition system could not obtain more than 90% recognition accuracy even in a data dependent environment. The present study provides benchmark results for online handwritten Gurmukhi word recognition using a deep learning architecture, the convolutional neural network, and obtains above 97% recognition accuracy in data dependent mode of handwriting. The previous Gurmukhi word recognition system followed the stroke based class labeling approach, whereas the present study follows the word based class labeling approach. The present online handwritten Gurmukhi word recognition results are quite satisfactory. Moreover, the proposed architecture can be used to improve the benchmark results of online handwriting recognition of several major Indian scripts. Experimental results demonstrate that the deep learning system achieves great results on Gurmukhi script and outperforms existing results in the literature.

Proceedings ArticleDOI
02 Apr 2021
TL;DR: This paper developed a knowledgeable framework for Handwritten Character Recognition (HCR) using a Neural Network, which can effectively recognize characters of a selected type format using the Artificial Neural Network approach.
Abstract: Character recognition techniques associate a symbolic identity with the image of a character. Handwritten human character recognition is a machine's ability to obtain and recognize handwritten information from various sources such as papers, photos, tactile touch devices etc. Recognition of handwriting and computer characters is an evolving field of study and has broad uses in banks, offices and industries. The key objective of this research work is to develop a knowledgeable framework for “Handwritten Character Recognition (HCR) using Neural Network” which can effectively recognize characters of a selected type format using the Artificial Neural Network approach. Neural method is the best method for controlling images, thus style parts are less all around plot as compared to various designs. Neural computers produce results in parallel. Neural computers are run in a manner that is completely different from traditional operation. Neural computers are conditioned (not programmed) in such a way that, given a particular starting state (data input), they either assign the input data to one of a number of categories or allow the initial data to evolve to maximize a particular desirable property. In this research work, a purely handwritten digit recognition machine learning model as well as a character recognition MATLAB model is used. A translator using MATLAB to overcome the barrier of various languages is designed. The proposed design is also used for English, Marathi and Gujarati text to speech conversion into the English language. Input is taken as English, Marathi and Gujarati text entered manually into the interface, or as an image of written or handwritten text, and the output can be translated into the English language by use of the Optical Character Recognition (OCR) technique.
The proposed methodology can also be used to help people who lack the ability of speech or non-native speakers. On the other hand, purely handwritten digit recognition using machine learning algorithms is used to interpret human handwriting for a second person easily and effectively.

Journal ArticleDOI
TL;DR: In this article, a CNN is used for feature extraction of the Arabic character images, which are then passed on to a hybrid model using Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost) classifiers.
Abstract: Handwriting recognition for computer systems has been in research for a long time, with different researchers having an extensive variety of methods at their disposal. The problem is that most of these experiments are done in English, as it is the most spoken language in the world. But other languages such as Arabic, Mandarin, Spanish, French, and Russian also need research done on them since there are millions of people who speak them. In this work, recognizing and developing Arabic handwritten characters is proposed by cleaning the state-of-the-art Arabic dataset called Hijaa and developing a Convolutional Neural Network (CNN) with a hybrid model using Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost) classifiers. The CNN is used for feature extraction of the Arabic character images, which are then passed on to the Machine Learning classifiers. A recognition rate of up to 96.3% for 29 classes is achieved, far surpassing the already state-of-the-art results of the Hijaa dataset.

Journal ArticleDOI
TL;DR: A comprehensive review is reported for online handwriting recognition of non-Indic and Indic scripts, and an effort has been made to provide a list of publicly available online handwriting datasets for various scripts.
Abstract: Handwriting recognition is one of the challenging tasks in the area of pattern recognition and machine learning. Handwriting recognition has two flavors, namely, Offline Handwriting Recognition and Online Handwriting Recognition. A saturation level has been achieved in machine-printed (Offline) character recognition. Presently, due to dramatic development in the IT sector, touch-based devices with efficient processing capabilities are available in the market. With this revolution, research in the area of handwriting recognition has become more popular in real-time (Online) mode. In this paper, a comprehensive review is reported for online handwriting recognition of non-Indic and Indic scripts. Six non-Indic scripts (Arabic, Chinese, Japanese, Persian, Roman, and Thai) and eight Indic scripts (Assamese, Bangla, Devanagari, Gurmukhi, Kannada, Malayalam, Tamil, and Telugu) are considered in this article. This study comprises an introduction to the online handwriting recognition process, its challenges and motivations, and the feature extraction and classification methodologies used for recognizing the various scripts. Moreover, an effort has been made to provide a list of publicly available online handwriting datasets for various scripts. This study also assists novice researchers in the field of handwriting recognition by providing a nutshell overview of the feature extraction strategies and classification techniques used for the recognition of both Indic and non-Indic scripts.

Journal ArticleDOI
TL;DR: This paper presents a metric transfer learning approach entitled “Metric Transfer Learning via Geometric Knowledge Embedding (MTL-GKE)” to actuate metric learning in transfer learning, and learns two projection matrices, one for each domain, to project the source and target domains to a new feature space.
Abstract: The usefulness of metric learning in image classification has been proven and has attracted increasing attention in recent research. Conventional metric learning assumes that the source and target instances are identically distributed; however, real-world problems may not satisfy this assumption. Better classification therefore requires abundant labeled images, which are often unavailable due to the high cost of labeling. In such cases, knowledge transfer can be utilized. In this paper, we present a metric transfer learning approach entitled “Metric Transfer Learning via Geometric Knowledge Embedding (MTL-GKE)” to actuate metric learning in transfer learning. Specifically, we learn two projection matrices, one for each domain, to project the source and target domains to a new feature space. In the new shared sub-space, a Mahalanobis distance metric is learned to maximize inter-class and minimize intra-class distances in the target domain, while a novel instance reweighting scheme based on graph optimization is applied simultaneously to employ the weights of source samples for distribution matching. The results of different experiments on several datasets for object and handwriting recognition tasks indicate the effectiveness of the proposed MTL-GKE compared to other state-of-the-art methods.

Journal ArticleDOI
TL;DR: Attention-based two-pathway Densely Connected Convolutional Networks (ATP-DenseNet) is proposed to identify the gender of handwriting to improve gender identification in forensic techniques and handwriting recognition.
Abstract: Digital forensics plays a vital role in several domains and mainly focuses on reactive measures, especially when facing digital incidents. Gender identification has become an important problem in the realm of forensic techniques and handwriting recognition. In this paper, attention-based two-pathway Densely Connected Convolutional Networks (ATP-DenseNet) is proposed to identify the gender of handwriting. There are two pathways in ATP-DenseNet: a feature pyramid extracts hierarchical page features, and an attention-based DenseNet (A-DenseNet) extracts word features by fusing the Convolutional Block Attention Module (CBAM) with densely connected blocks. Finally, ATP-DenseNet makes the final prediction by combining the two pathways. Experimental results show the efficiency of ATP-DenseNet, and the proposed method performs better than other approaches. In addition, the visualization of the feature maps helps show which part of the image contributes most to the gender identification.

Proceedings ArticleDOI
Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan, Zhi Tang
10 Jan 2021
TL;DR: The authors propose a convolutional sequence modeling network, ConvMath, which converts the mathematical expression description in an image into a LaTeX sequence in an end-to-end way.
Abstract: Despite the recent advances in optical character recognition (OCR), mathematical expressions remain very challenging to recognize due to their two-dimensional graphical layout. In this paper, we propose a convolutional sequence modeling network, ConvMath, which converts the mathematical expression description in an image into a LaTeX sequence in an end-to-end way. The network combines an image encoder for feature extraction and a convolutional decoder for sequence generation. Compared with other Long Short-Term Memory (LSTM) based encoder-decoder models, ConvMath is entirely based on convolution, so it is easy to parallelize. In addition, the network adopts a multi-layer attention mechanism in the decoder, which allows the model to align output symbols with source feature vectors automatically and alleviates the problem of lacking coverage while training the model. The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, which includes 103556 samples. The experimental results demonstrate that the proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.

Proceedings ArticleDOI
10 Jan 2021
TL;DR: In this article, a few-shot object detection method was proposed to detect all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols.
Abstract: Encoded (or ciphered) manuscripts are a special type of historical document that contains encrypted text. The automatic recognition of this kind of document is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpora for training and 3) touching symbols make symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten cipher recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if a few labeled pages with the same alphabet are used for fine-tuning, our method surpasses existing unsupervised and supervised HTR methods for cipher recognition.

Book ChapterDOI
05 Sep 2021
TL;DR: The authors proposed the Simple Predict & Align Network (SPAN), an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage.
Abstract: Unconstrained handwriting recognition is an essential task in document analysis. It is usually carried out in two steps. First, the document is segmented into text lines. Second, an Optical Character Recognition model is applied on these line images. We propose the Simple Predict & Align Network: an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage. The framework is as simple as the one used for the recognition of isolated lines and we achieve competitive results on three popular datasets: RIMES, IAM and READ 2016. The proposed model does not require any dataset adaptation and can be trained without line breaks in the transcription labels. Our code and trained model weights are available at https://github.com/FactoDeepLearning/SPAN.

Journal ArticleDOI
TL;DR: The proposed best CNN model has the simplest architecture, provides a higher accuracy for different datasets, and takes less computational time; its validation accuracy is also higher than those in past works.
Abstract: Background: Handwriting recognition has become an appreciable research area because of its important practical applications, but the variety of writing patterns makes automatic classification a challenging task. Classifying handwritten digits with a higher accuracy is needed to overcome the limitations of past research, which mostly used deep learning approaches. Objective: The two most noteworthy limitations are low accuracy and slow computational speed. The current study aims to model a Convolutional Neural Network (CNN) that is simple yet more accurate in classifying English handwritten digits for different datasets. The novelty of this paper is to explore an efficient CNN architecture that can classify digits of different datasets accurately. Methods: The author proposed five different CNN architectures for training and validation tasks with two datasets. Dataset-1 consists of 12,000 MNIST samples and Dataset-2 consists of 29,400 digit samples from Kaggle. The proposed CNN models extract the features first and then perform the classification tasks. For performance optimization, the models utilized the stochastic gradient descent with momentum optimizer. Results: Among the five models, one was found to be the best performer, with 99.53% and 98.93% validation accuracy for Dataset-1 and Dataset-2, respectively. Compared to the Adam and RMSProp optimizers, stochastic gradient descent with momentum yielded the highest accuracy. Conclusion: The proposed best CNN model has the simplest architecture. It provides a higher accuracy for different datasets and takes less computational time. The validation accuracy of the proposed model is also higher than those in past works.
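
The abstract above credits stochastic gradient descent with momentum for the best accuracy. The update rule can be sketched on a toy quadratic as follows. This is a minimal illustration, not the author's training code: the function names, the learning rate, the momentum value, and the toy objective are all assumptions made for the example.

```python
def sgd_momentum(grad, w0, lr=0.1, momentum=0.9, steps=300):
    """Minimise a function from its gradient using SGD with classical momentum.

    grad: callable returning the gradient (list of floats) at a point.
    lr, momentum, steps: illustrative values, not taken from the paper.
    """
    w = list(w0)
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        # The velocity accumulates a decaying sum of past gradients.
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        w = [wi + v for wi, v in zip(w, velocity)]
    return w


def toy_grad(w):
    """Gradient of the toy objective f(w) = (w[0] - 3)^2 + (w[1] + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_final = sgd_momentum(toy_grad, [0.0, 0.0])
print(w_final)  # close to the minimiser [3, -1]
```

In a real CNN the gradient would come from backpropagation over a mini-batch rather than from a closed-form expression.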

Journal ArticleDOI
TL;DR: An ensemble model for in-air handwriting recognition which is based on convolutional neural network (CNN) and a long short-term memory Neural Network (LSTM-NN) is proposed, which can assimilate more information from the air-writing patterns and hence offer better recognition performance than the state-of-the-art approaches.
Abstract: In-air handwriting is a contemporary human computer interaction (HCI) technique which enables users to write and communicate in free space in a simple and intuitive manner. Air-written characters exhibit wide variations depending on users' writing styles and their speed of articulation, which presents a great challenge for the effective recognition of linguistic characters. In this paper, we propose an ensemble model for in-air handwriting recognition based on a convolutional neural network (CNN) and a long short-term memory neural network (LSTM-NN). The method combines modeling of the overall character trajectory appearance with modeling of temporal trajectory features for efficient recognition of varied types of air-written characters. In contrast to two-dimensional handwriting, in-air handwriting generally involves writing characters interlinked by a continuous stroke, which makes separating the intended writing activity from insignificant connecting motions an intricate task. A two-stage statistical framework is therefore incorporated in the system for automatic detection and extraction of relevant writing segments from air-written characters. Writing events are identified in a continuous stream of air-written data by formulating a Markov Random Field (MRF) model, while the segmentation of writing events into meaningful handwriting segments and redundant parts is performed by a Mahalanobis distance (MD) classifier. The proposed approach is assessed on an air-written character dataset comprising Assamese vowels, consonants and numerals. The experimental results indicate that our hybrid network can assimilate more information from the air-writing patterns and hence offer better recognition performance than the state-of-the-art approaches.
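
The Mahalanobis distance (MD) classifier mentioned above assigns a sample to the class whose mean is nearest once distances are scaled by that class's covariance. A minimal two-feature sketch follows. The class labels, the feature values, and the choice of two features are hypothetical and are not taken from the paper.

```python
def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b  # assumed non-zero (non-degenerate covariance)
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def classify(x, class_stats):
    """Return the label whose class distribution is closest to x."""
    return min(class_stats, key=lambda label: mahalanobis_sq(x, *class_stats[label]))


# Hypothetical 2-D features for the two segment types.
writing = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
redundant = [(6.0, 5.0), (7.0, 6.5), (6.5, 5.5), (7.5, 7.0)]
stats = {"writing": mean_and_cov(writing), "redundant": mean_and_cov(redundant)}

print(classify((2.0, 2.0), stats))
print(classify((7.0, 6.2), stats))
```

Unlike plain Euclidean distance, this accounts for correlated features and for classes with different spreads.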

Journal ArticleDOI
TL;DR: An enhanced method for the recognition of Arabic handwriting words using a directions-based segmentation technique and discrete cosine transform coefficients as structural features and the k-nearest neighbors (KNN) classifier to classify the segmented characters based on the extracted features is proposed.
Abstract: With advances in machine learning techniques, handwriting recognition systems have gained a great deal of importance. Lately, the increasing popularity of handheld computers, digital notebooks, and smartphones has drawn more interest to the field of online handwriting recognition. In this paper, we propose an enhanced method for the recognition of Arabic handwritten words using a directions-based segmentation technique and discrete cosine transform (DCT) coefficients as structural features. The main contribution of this research is combining a total of 18 structural features extracted by DCT coefficients and using the k-nearest neighbors (KNN) classifier to classify the segmented characters based on the extracted features. A dataset consisting of 2500 words in total is used to validate the proposed method. The obtained average accuracy of 99.10% in recognition of handwritten characters shows that the proposed approach, through its multiple phases, is efficient in separating, distinguishing, and classifying Arabic handwritten characters using the KNN classifier. The availability of an online dataset of Arabic handwritten words is the main issue in this field. However, the dataset used will be available for research via the website.
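
The pairing of DCT coefficients with a KNN classifier described above can be sketched as follows. This is only an illustration under stated assumptions: the paper's 18 structural features and its segmentation step are not reproduced, the signals and labels are made up, and the helper names `dct2`, `dct_features`, and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def dct2(signal):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def dct_features(signal, n_coeffs):
    """Keep only the first n_coeffs (low-frequency) DCT coefficients."""
    return dct2(signal)[:n_coeffs]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 1-D signals standing in for segmented character traces.
train = [
    (dct_features([0, 1, 2, 3], 2), "ramp"),
    (dct_features([0, 1, 2, 4], 2), "ramp"),
    (dct_features([2, 2, 2, 2], 2), "flat"),
    (dct_features([2, 2, 2, 3], 2), "flat"),
]
prediction = knn_predict(train, dct_features([0, 1, 2, 3.5], 2), k=3)
print(prediction)
```

Truncating to the low-frequency coefficients gives a short, fixed-length feature vector regardless of how long the original trace is.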

Proceedings ArticleDOI
05 Mar 2021
TL;DR: In this article, two major deep learning algorithms, the Artificial Neural Network and the Convolutional Neural Network, are compared considering their feature extraction and classification stages of recognition; the models were trained using categorical cross-entropy loss and the Adam optimizer on the MNIST dataset.
Abstract: Handwritten digit recognition is an intricate task that is vital for developing applications, and it is one of the major applications of computer vision. Handwritten character recognition has been explored extensively using different deep learning models. Deep learning is rapidly increasing in demand due to its resemblance to the human brain. This paper compares two major deep learning algorithms, the Artificial Neural Network and the Convolutional Neural Network, considering their feature extraction and classification stages of recognition. The models were trained using categorical cross-entropy loss and the Adam optimizer on the MNIST dataset. Backpropagation with gradient descent is used to train the networks, together with ReLU activations in the networks, which perform automatic feature extraction. Among neural networks, the Convolutional Neural Network (ConvNet) is one of the primary classifiers for image recognition and image classification tasks in computer vision.
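
The categorical cross-entropy loss named above is the negative log-probability that a softmax output assigns to the correct class. A small self-contained sketch follows. The logits are illustrative, and the helper names are assumptions rather than anything from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_class):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(probs[true_class])


def loss_gradient(probs, true_class):
    """Gradient of the loss with respect to the logits: probs minus one-hot target."""
    return [p - (1.0 if i == true_class else 0.0) for i, p in enumerate(probs)]


# Illustrative logits for a 3-class problem where class 0 is correct.
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_cross_entropy(probs, 0)
print(probs, loss)
```

The gradient with respect to the logits reduces to the predicted probabilities minus the one-hot target, which is the quantity backpropagation passes down from the output layer.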

Journal ArticleDOI
TL;DR: The authors proposed an end-to-end transformer-based approach to jointly perform text transcription and named entity recognition in handwritten documents, which achieved the state-of-the-art performance in the ICDAR 2017 Information Extraction competition.

Proceedings ArticleDOI
10 Jan 2021
TL;DR: In this paper, a deep neural network-based method was proposed to recover dynamic online trajectories from offline handwritten Japanese kanji character images, which is the first attempt to use online recovered trajectories to help improve offline handwriting recognition performance.
Abstract: We propose a deep neural network-based method to recover dynamic online trajectories from offline handwritten Japanese kanji character images. It is a challenging task since Japanese kanji characters consist of multiple strokes. Our proposed model has three main components: a Convolutional Neural Network-based encoder, a Long Short-Term Memory Network-based decoder with an attention layer, and a Gaussian Mixture Model (GMM). The encoder focuses on feature extraction while the decoder refers to the extracted features and generates time-sequences of GMM parameters. The attention layer is the key component for trajectory recovery. The GMM provides robustness to style variations so that the proposed model does not overfit to training samples. In the experiments, the proposed method is evaluated by both visual verification and handwritten character recognition. This is the first attempt to use online recovered trajectories to help improve offline handwriting recognition performance. Although the visual verification reveals some problems, the recognition experiments demonstrate the effect of trajectory recovery in improving offline handwritten character recognition accuracy when online recognition of the recovered trajectories is combined.
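
One way to turn a time-sequence of GMM parameters into pen positions is to sample one point per time step. The sketch below assumes that each step's mixture models a 2-D pen offset with diagonal covariance and that offsets are accumulated into absolute coordinates. The abstract does not state this parameterisation, so the function names and the data layout here are hypothetical.

```python
import random


def sample_gmm_point(weights, means, sigmas, rng):
    """Draw one 2-D point from a diagonal-covariance Gaussian mixture."""
    # Pick a mixture component in proportion to its weight.
    idx = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    mx, my = means[idx]
    sx, sy = sigmas[idx]
    return (rng.gauss(mx, sx), rng.gauss(my, sy))


def decode_trajectory(gmm_steps, seed=0):
    """Accumulate sampled offsets into absolute pen positions.

    gmm_steps: list of (weights, means, sigmas) tuples, one per time step
    (an assumed layout for illustration).
    """
    rng = random.Random(seed)
    x = y = 0.0
    points = []
    for weights, means, sigmas in gmm_steps:
        dx, dy = sample_gmm_point(weights, means, sigmas, rng)
        x += dx
        y += dy
        points.append((x, y))
    return points


# Two steps with a single zero-variance component each, so sampling is deterministic.
steps = [
    ([1.0], [(1.0, 0.0)], [(0.0, 0.0)]),
    ([1.0], [(0.0, 2.0)], [(0.0, 0.0)]),
]
print(decode_trajectory(steps))
```

With non-zero standard deviations, repeated decoding with different seeds would give slightly different trajectories for the same parameters.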