
Showing papers on "Handwriting recognition" published in 2014


Posted Content
TL;DR: Novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition are presented.
Abstract: Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic connections making them powerful for modeling sequences. They have been successfully used for sequence labeling and sequence prediction tasks, such as handwriting recognition, language modeling, phonetic labeling of acoustic frames. However, in contrast to the deep neural networks, the use of RNNs in speech recognition has been limited to phone recognition in small scale tasks. In this paper, we present novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition. We train and compare LSTM, RNN and DNN models at various numbers of parameters and configurations. We show that LSTM models converge quickly and give state of the art speech recognition performance for relatively small sized models.

843 citations


Proceedings ArticleDOI
01 Sep 2014
TL;DR: In this article, the authors show that RNNs with Long Short-Term Memory (LSTM) cells can be improved using dropout, a recently proposed regularization method for deep architectures.
Abstract: Recurrent neural networks (RNNs) with Long Short-Term memory cells currently hold the best known results in unconstrained handwriting recognition. We show that their performance can be greatly improved using dropout - a recently proposed regularization method for deep architectures. While previous works showed that dropout gave superior performance in the context of convolutional networks, it had never been applied to RNNs. In our approach, dropout is carefully used in the network so that it does not affect the recurrent connections, hence the power of RNNs in modeling sequences is preserved. Extensive experiments on a broad range of handwritten databases confirm the effectiveness of dropout on deep architectures even when the network mainly consists of recurrent and shared connections.
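
The idea of keeping dropout away from the recurrent connections can be illustrated with a minimal sketch. It uses a plain tanh RNN step rather than the LSTM cells of the paper, and the function names (`dropout`, `matvec`, `rnn_step`) and the inverted-dropout rescaling are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    if rate <= 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(x, h_prev, W_in, W_rec, rate, rng, train=True):
    """One step of a plain tanh RNN (hypothetical stand-in for an LSTM layer).

    Dropout is applied only to the feed-forward input x; the recurrent
    path through h_prev is never masked.
    """
    x_used = dropout(x, rate, rng) if train else list(x)
    a = matvec(W_in, x_used)   # feed-forward contribution (dropout applied)
    r = matvec(W_rec, h_prev)  # recurrent contribution (no dropout)
    return [math.tanh(ai + ri) for ai, ri in zip(a, r)]
```

At test time (`train=False`) the step is deterministic, and because the hidden state is never masked, information carried across time steps is not randomly erased during training.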

444 citations


Posted Content
TL;DR: A CNN for processing spatially-sparse inputs, motivated by the problem of online handwriting recognition, and applying a deep convolutional network using sparsity has resulted in a substantial reduction in test error on the CIFAR small picture datasets.
Abstract: Convolutional neural networks (CNNs) perform well on problems such as handwriting recognition and image classification. However, the performance of the networks is often limited by budget and time constraints, particularly when trying to train deep networks. Motivated by the problem of online handwriting recognition, we developed a CNN for processing spatially-sparse inputs; a character drawn with a one-pixel wide pen on a high resolution grid looks like a sparse matrix. Taking advantage of the sparsity allowed us more efficiently to train and test large, deep CNNs. On the CASIA-OLHWDB1.1 dataset containing 3755 character classes we get a test error of 3.82%. Although pictures are not sparse, they can be thought of as sparse by adding padding. Applying a deep convolutional network using sparsity has resulted in a substantial reduction in test error on the CIFAR small picture datasets: 6.28% on CIFAR-10 and 24.30% for CIFAR-100.
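
To see why a pen stroke on a high-resolution grid is sparse, the following sketch rasterizes a polyline into a set of active pixels and measures the fraction of the grid it covers. The helper names (`rasterize_stroke`, `density`), the grid size, and the sample stroke are hypothetical; this illustrates the sparsity observation only, not the paper's network.

```python
def rasterize_stroke(points, size):
    """Return the set of (row, col) pixels touched by a one-pixel-wide polyline.

    `points` is a list of integer (x, y) pen positions on a size-by-size grid.
    Only active pixels are stored, which is itself a sparse representation.
    """
    active = set()
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < size and 0 <= y < size:
                active.add((y, x))
    return active


def density(active, size):
    """Fraction of grid cells that are active."""
    return len(active) / (size * size)


# A rough "7" drawn on a 64x64 grid (hypothetical example stroke).
seven = rasterize_stroke([(10, 10), (50, 10), (20, 60)], 64)
print(f"active pixels: {len(seven)}, density: {density(seven, 64):.4f}")
```

Only a small fraction of the cells are non-zero, so a layer that visits only active sites does far less work than one that scans the whole grid.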

215 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: A modified topology for long short-term memory recurrent neural networks that controls the shape of the squashing functions in gating units is demonstrated and an efficient training framework based on a mini-batch training on sequence level combined with a sequence chunking approach is proposed.
Abstract: In this paper we demonstrate a modified topology for long short-term memory recurrent neural networks that controls the shape of the squashing functions in gating units. We further propose an efficient training framework based on a mini-batch training on sequence level combined with a sequence chunking approach. The framework is evaluated on publicly available data sets containing English and French handwriting by utilizing a GPU based implementation. Speedups of more than 3x are achieved in training recurrent neural network models which outperform state of the art recognition results.

172 citations


Journal ArticleDOI
TL;DR: In the framework of handwriting recognition, a novel GA-based feature selection algorithm in which feature subsets are evaluated by means of a specifically devised separability index that represents an extension of the Fisher Linear Discriminant method and uses covariance matrices for estimating how class probability distributions are spread out in the considered N-dimensional feature space.

115 citations


Journal ArticleDOI
TL;DR: Experimental results on six public data sets demonstrate that the proposed method outperforms the state-of-the-art algorithms.
Abstract: This paper proposes a novel offline text-independent writer identification method based on scale invariant feature transform (SIFT), composed of training, enrollment, and identification stages. In all stages, an isotropic LoG filter is first used to segment the handwriting image into word regions (WRs). Then, the SIFT descriptors (SDs) of WRs and the corresponding scales and orientations (SOs) are extracted. In the training stage, an SD codebook is constructed by clustering the SDs of training samples. In the enrollment stage, the SDs of the input handwriting are adopted to form an SD signature (SDS) by looking up the SD codebook and the SOs are utilized to generate a scale and orientation histogram (SOH). In the identification stage, the SDS and SOH of the input handwriting are extracted and matched with the enrolled ones for identification. Experimental results on six public data sets (including three English data sets, one Chinese data set, and two hybrid-language data sets) demonstrate that the proposed method outperforms the state-of-the-art algorithms.

114 citations


Journal ArticleDOI
TL;DR: In this article, a formal model for the recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models is described.

107 citations


Proceedings ArticleDOI
Chunpeng Wu1, Wei Fan1, Yuan He1, Jun Sun1, Satoshi Naoi1 
15 Dec 2014
TL;DR: The relaxation convolution layer adopted in the R-CNN, unlike a traditional convolutional layer, does not require neurons within a feature map to share the same convolutional kernel, endowing the neural network with more expressive power.
Abstract: Deep learning methods have recently achieved impressive performance in the area of visual recognition and speech recognition. In this paper, we propose a handwriting recognition method based on relaxation convolutional neural network (R-CNN) and alternately trained relaxation convolutional neural network (ATR-CNN). Previous methods regularize CNNs at the fully-connected layer or spatial-pooling layer; however, we focus on the convolutional layer. The relaxation convolution layer adopted in our R-CNN, unlike a traditional convolutional layer, does not require neurons within a feature map to share the same convolutional kernel, endowing the neural network with more expressive power. As relaxation convolution sharply increases the total number of parameters, we adopt alternate training in ATR-CNN to regularize the neural network during the training procedure. Our previous CNN took the 1st place in ICDAR'13 Chinese Handwriting Character Recognition Competition, while our latest ATR-CNN outperforms our previous one and achieves the state-of-the-art accuracy with an error rate of 3.94%, further narrowing the gap between machine and human observers (3.87%).
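
The difference between a shared and an unshared kernel can be sketched in one dimension. This is only one reading of "neurons within a feature map need not share the same kernel"; the paper's layer is two-dimensional and its exact formulation is not given here, so `shared_conv1d` and `unshared_conv1d` are hypothetical illustrations.

```python
def shared_conv1d(x, kernel):
    """Ordinary (valid) 1D convolution-style filter: one kernel for all positions."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def unshared_conv1d(x, kernels):
    """Same sliding window, but each output position has its own kernel."""
    k = len(kernels[0])
    n_out = len(x) - k + 1
    if len(kernels) != n_out:
        raise ValueError("need exactly one kernel per output position")
    return [sum(kernels[i][j] * x[i + j] for j in range(k))
            for i in range(n_out)]


x = [1, 2, 3, 4, 5]
edge = [1, 0, -1]
# With identical kernels the unshared layer reduces to the shared one.
assert unshared_conv1d(x, [edge] * 3) == shared_conv1d(x, edge)
# Parameter count grows from k to k * n_out, hence the need for regularization.
print(len(edge), "shared weights vs", len(edge) * 3, "unshared weights")
```

The jump in parameter count is what motivates the extra regularization the abstract describes.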

105 citations


Journal ArticleDOI
TL;DR: This paper lists the different models proposed in order to characterize the handwriting process and focuses on a representation involving a vectorial summation of lognormal functions: the Sigma-lognormal model.

96 citations


Journal ArticleDOI
01 Jan 2014
TL;DR: It is shown that continuous gesture recognition with inertial sensors is feasible for gesture vocabularies that are several orders of magnitude larger than traditional vocabularies for known systems.
Abstract: We present a wearable input system which enables interaction through 3D handwriting recognition. Users can write text in the air as if they were using an imaginary blackboard. The handwriting gestures are captured wirelessly by motion sensors applying accelerometers and gyroscopes which are attached to the back of the hand. We propose a two-stage approach for spotting and recognition of handwriting gestures. The spotting stage uses a support vector machine to identify those data segments which contain handwriting. The recognition stage uses hidden Markov models (HMMs) to generate a text representation from the motion sensor data. Individual characters are modeled by HMMs and concatenated to word models. Our system can continuously recognize arbitrary sentences, based on a freely definable vocabulary. A statistical language model is used to enhance recognition performance and to restrict the search space. We show that continuous gesture recognition with inertial sensors is feasible for gesture vocabularies that are several orders of magnitude larger than traditional vocabularies for known systems. In a first experiment, we evaluate the spotting algorithm on a realistic data set including everyday activities. In a second experiment, we report the results from a nine-user experiment on handwritten sentence recognition. Finally, we evaluate the end-to-end system on a small but realistic data set.

86 citations


Journal ArticleDOI
TL;DR: This paper presents an online handwritten mathematics expression recognition system that handles mathematical expression recognition as a simultaneous optimization of expression segmentation, symbol recognition, and 2D structure recognition under the restriction of a mathematical expression grammar.

Patent
29 May 2014
TL;DR: In this paper, a handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model.
Abstract: Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition for multi-character handwriting input. In particular, real-time, stroke-order and stroke-direction independent handwriting recognition is provided for multi-character, or sentence level Chinese handwriting recognition. User interfaces for providing the handwriting input functionality are also disclosed.

Journal ArticleDOI
TL;DR: Experimental results obtained on the IAM off-line database demonstrate that consistent word error rate reductions can be achieved with neural network language models when compared with statistical N-gram language models on the three tested systems.

Book ChapterDOI
14 Oct 2014
TL;DR: Bidirectional LSTM-RNNs with DeepMLPs for handwriting recognition yield performance comparable to the state-of-the-art, regardless of the type of features (hand-crafted or pixel values) and the neural network optical model (DeepMLP or RNN).
Abstract: Long Short-Term Memory Recurrent Neural Networks are the current state-of-the-art in handwriting recognition. In speech recognition, Deep Multi-Layer Perceptrons (DeepMLPs) have become the standard acoustic model for Hidden Markov Models (HMMs). Although handwriting and speech recognition systems tend to include similar components and techniques, DeepMLPs are not used as optical model in unconstrained large vocabulary handwriting recognition. In this paper, we compare Bidirectional LSTM-RNNs with DeepMLPs for this task. We carried out experiments on two public databases of multi-line handwritten documents: Rimes and IAM. We show that the proposed hybrid systems yield performance comparable to the state-of-the-art, regardless of the type of features (hand-crafted or pixel values) and the neural network optical model (DeepMLP or RNN).

Proceedings ArticleDOI
15 Dec 2014
TL;DR: A system based on recurrent neural networks and weighted finite state transducers was used both for printed and handwritten recognition, in French, English and Arabic, for multi-lingual text recognition.
Abstract: This paper describes the system submitted by A2iA to the second Maurdor evaluation for multi-lingual text recognition. A system based on recurrent neural networks and weighted finite state transducers was used both for printed and handwritten recognition, in French, English and Arabic. To cope with the difficulty of the documents, multiple text line segmentations were considered. An automatic procedure was used to prepare annotated text lines needed for the training of the neural network. Language models were used to decode sequences of characters or words for French and English and also sequences of part-of-Arabic words (PAWs) in case of Arabic. This system scored first at the second Maurdor evaluation for both printed and handwritten text recognition in French, English and Arabic.

Patent
30 May 2014
TL;DR: In this paper, a handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model.
Abstract: Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition. User interfaces for providing the handwriting input functionality are also disclosed.

Proceedings ArticleDOI
07 Apr 2014
TL;DR: This paper describes the Arabic handwriting recognition systems proposed by A2iA to the NIST OpenHaRT2013 evaluation, based on an optical model using Long Short-Term Memory recurrent neural networks trained to recognize the different forms of the Arabic characters directly from the image, without explicit feature extraction nor segmentation.
Abstract: This paper describes the Arabic handwriting recognition systems proposed by A2iA to the NIST OpenHaRT2013 evaluation. These systems were based on an optical model using Long Short-Term Memory (LSTM) recurrent neural networks, trained to recognize the different forms of the Arabic characters directly from the image, without explicit feature extraction nor segmentation. Large vocabulary selection techniques and n-gram language modeling were used to provide a full paragraph recognition, without explicit word segmentation. Several recognition systems were also combined with the ROVER combination algorithm. The best system exceeded 80% of recognition rate.

Proceedings ArticleDOI
24 Aug 2014
TL;DR: An image database of historical handwritten marriages records stored in the archives of Barcelona cathedral, and the corresponding meta-data addressed to evaluate the performance of document analysis algorithms, which is the first dataset in the emerging area of genealogical document analysis.
Abstract: This paper presents an image database of historical handwritten marriages records stored in the archives of Barcelona cathedral, and the corresponding meta-data addressed to evaluate the performance of document analysis algorithms. The contribution of this paper is twofold. First, it presents a complete ground truth which covers the whole pipeline of handwriting recognition research, from layout analysis to recognition and understanding. Second, it is the first dataset in the emerging area of genealogical document analysis, where documents are manuscripts pseudo-structured with specific lexicons and the interest is beyond pure transcriptions but context dependent.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: This paper presents a technique based on Multi Layer Perceptron (MLP) Neural Network model that detects graphical symbols by identifying lines and characters from the image and analyzes the symbols by training the network using feed forward topology for a set of desired unicode characters.
Abstract: Machine vision researchers are working on the recognition of handwritten or printed text from scanned images for the purpose of digitizing documents and reducing the cost of error-free data entry. The classic difficulty in correctly recognizing language symbols is the complexity and irregularity of the pictorial representation of characters due to variation in writing styles, size of symbols, etc. The character recognition process depends on how the input data is given to the system. Input data may be categorized as online data or offline data. Both forms of data input have their own issues. In this paper, we focus on offline Gurmukhi character recognition from text images. There are many complexities associated with the Gurmukhi script. We present a technique based on a Multi Layer Perceptron (MLP) neural network model. Here we consider isolated handwritten Gurmukhi characters for recognition. MLP is used because it uses generalized delta learning rules and is easily trained in fewer iterations. The proposed method detects graphical symbols by identifying lines and characters from the image. After that, it analyzes the symbols by training the network using feed forward topology for a set of desired Unicode characters. The proposed system achieves a recognition rate of up to 98.96% for symbols using the MLP neural network.

Journal ArticleDOI
TL;DR: The results of the research in handwriting/handwritten character recognition in about the last quarter of a century are reported, illustrating the results presented during the International Workshop on Frontiers in Handwriting Recognition (IWFHR) and the ICFHR.

Journal ArticleDOI
TL;DR: This paper proposes a novel approach to limited vocabulary recognition of unconstrained (mixed cursive) handwriting based on a hidden Markov model (HMM), and implements fully connected non-homogeneous HMMs considering the enormous variability in the present handwriting style.

Proceedings ArticleDOI
07 Apr 2014
TL;DR: A novel handwritten word spotting approach based on graph representation that comprises both topological and morphological signatures of handwriting that outperforms the state-of-the-art structural methods.
Abstract: Effective information retrieval on handwritten document images has always been a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labelled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment result is introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.
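
The DTW alignment mentioned above can be sketched with the textbook dynamic-programming recurrence. The version below aligns two sequences of numbers under an absolute-difference cost; in the paper the sequence elements are graphs compared by an approximate graph edit distance, which this sketch does not attempt. The name `dtw_distance` and the default cost function are assumptions.

```python
def dtw_distance(a, b, cost=lambda p, q: abs(p - q)):
    """Classic dynamic time warping distance between sequences a and b.

    D[i][j] holds the cheapest cost of aligning a[:i] with b[:j], where each
    step may advance in a, in b, or in both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            D[i][j] = step + min(D[i - 1][j],      # a advances, b repeats
                                 D[i][j - 1],      # b advances, a repeats
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]


# A stretched copy of the same pattern aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Passing a different `cost` callable is the hook where a graph-to-graph distance could be substituted for the scalar difference.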

Journal ArticleDOI
TL;DR: Column bit vectors are extended by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image and to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper addresses the problem of personalization in the context of gesture recognition, and proposes a novel and extremely efficient way of doing personalization that learns a set of classifiers during training, one of which is selected for each test subject based on the personalization data.
Abstract: Human gestures, similar to speech and handwriting, are often unique to the individual. Training a generic classifier applicable to everyone can be very difficult and as such, it has become a standard to use personalized classifiers in speech and handwriting recognition. In this paper, we address the problem of personalization in the context of gesture recognition, and propose a novel and extremely efficient way of doing personalization. Unlike conventional personalization methods which learn a single classifier that later gets adapted, our approach learns a set (portfolio) of classifiers during training, one of which is selected for each test subject based on the personalization data. We formulate classifier personalization as a selection problem and propose several algorithms to compute the set of candidate classifiers. Our experiments show that such an approach is much more efficient than adapting the classifier parameters but can still achieve comparable or better results.
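
Personalization as selection can be sketched in a few lines: given a portfolio of already-trained classifiers and a small set of labelled samples from a new user, pick the classifier that scores best on those samples. How the portfolio itself is computed is the subject of the paper and is not shown; the names `accuracy` and `select_classifier` and the threshold classifiers in the example are hypothetical.

```python
def accuracy(clf, samples):
    """Fraction of (input, label) pairs that clf predicts correctly."""
    if not samples:
        raise ValueError("personalization data must not be empty")
    return sum(1 for x, y in samples if clf(x) == y) / len(samples)


def select_classifier(portfolio, personalization_data):
    """Pick the portfolio member with the best accuracy on the user's data.

    No parameters are updated: personalization is a pure selection step.
    Ties are resolved in favour of the earliest classifier in the portfolio.
    """
    return max(portfolio, key=lambda clf: accuracy(clf, personalization_data))


# Hypothetical portfolio: three threshold classifiers on a scalar feature.
portfolio = [lambda x: x > 0.3, lambda x: x > 0.5, lambda x: x > 0.7]
user_data = [(0.4, False), (0.6, True), (0.8, True), (0.2, False)]
best = select_classifier(portfolio, user_data)
print(portfolio.index(best))
```

Selection costs only one evaluation pass per candidate, which is consistent with the abstract's claim that this is cheaper than adapting classifier parameters.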

Proceedings ArticleDOI
15 Dec 2014
TL;DR: A text zone detection followed by a text line segmentation method suitable for historical handwritten documents is proposed and an existing approach based on Hough transform is enhanced in order to better treat cases of vertical connected characters.
Abstract: In order to achieve accurate text recognition performance for historical handwritten document images, robust and efficient page segmentation is necessary. In this paper, we propose a text zone detection followed by a text line segmentation method suitable for historical handwritten documents. Our aim is to handle several challenging cases such as horizontal and vertical rule lines overlapping with the text, two column documents and characters of different text lines touching vertically. For text zone detection, we analyze vertical rule lines, connected components as well as vertical white runs while for text line segmentation, we enhance an existing approach based on Hough transform in order to better treat cases of vertical connected characters. Both methods have been proved very promising after an evaluation using a set of historical handwritten documents.

Proceedings ArticleDOI
07 Apr 2014
TL;DR: The RWTH system for large vocabulary Arabic handwriting recognition is described, based on Hidden Markov Models (HMMs) with state of the art methods for visual/language modeling and decoding, which allows for competitive results in previous handwriting recognition competitions.
Abstract: This paper describes the RWTH system for large vocabulary Arabic handwriting recognition. The recognizer is based on Hidden Markov Models (HMMs) with state of the art methods for visual/language modeling and decoding. The feature extraction is based on Recurrent Neural Networks (RNNs) which estimate the posterior distribution over the character labels for each observation. Discriminative training using the Minimum Phone Error (MPE) criterion is used to train the HMMs. The recognition is done with the help of n-gram Language Models (LMs) trained using in-domain text data. Unsupervised writer adaptation is also performed using the Constrained Maximum Likelihood Linear Regression (CMLLR) feature adaptation. The RWTH Arabic handwriting recognition system gave competitive results in previous handwriting recognition competitions. The techniques used allowed us to improve the performance of the system participating in the OpenHaRT 2013 evaluation.

Proceedings ArticleDOI
15 Dec 2014
TL;DR: A new offline handwriting database that was developed to be employed in performance evaluation, result comparison and development of new methods related to handwriting analysis and recognition and similar related tasks is introduced.
Abstract: This paper introduces a new offline handwriting database that was developed to be employed in performance evaluation, result comparison and development of new methods related to handwriting analysis and recognition. The database can particularly be used for signature verification, writer recognition and writer demographics classification. In addition, the database also supports isolated digit recognition, digit/text segmentation and recognition and similar related tasks. The database comprises 600 Arabic and 600 French text samples, 1300 signatures and 21,000 digits. 100 Algerian individuals coming from different age groups and educational backgrounds contributed to the development of database by providing a total of 1300 forms. The database is also accompanied with ground truth data supporting the evaluation of the aforementioned tasks. The main contribution of the database is providing a multi-script platform where same authors contributed samples in French and Arabic. It would be interesting to explore applications like writer recognition and writer demographics classification in a multi-script environment.

Proceedings ArticleDOI
01 Sep 2014
TL;DR: A real-time recognition-based segmentation technique of on-line Arabic script is proposed, and the feasibility of carrying out the most time consuming tasks, required for the segmentation process, during the course of writing is demonstrated.
Abstract: Real-time performance is necessary in applications involving on-line handwriting recognition. However, conventional approaches usually wait until the entire curve is traced out before starting the analysis, inevitably causing delays in the recognition process. In regards to the Arabic script, the postponed analysis may be attributed to the cursive and unconstrained nature of the Arabic writing system, in both printed and handwritten forms. Nevertheless, this paper proposes a real-time recognition-based segmentation technique of on-line Arabic script. It demonstrates the feasibility of carrying out the most time consuming tasks, required for the segmentation process, during the course of writing. The system has been designed and tested using the ADAB Database, and promising results were obtained.
Keywords: Arabic script segmentation; handwriting recognition; on-line text segmentation.
I. INTRODUCTION: Handwriting remains the most commonly used means of communication and recording of information in daily life; therefore, a growing interest in the handwriting character recognition field has emerged in recent years. Handwriting recognition can be categorized into two main areas: off-line and on-line. In the off-line case, a digital image containing text is fed to the computer, and the system attempts to convert the spatial representation of the letters into digital symbols [1]. In contrast, the process of on-line handwriting recognition is done on a digital representation of the text written on a special digitizer, tablet or smart-phone device, where sensors pick up the pen-tip movements. Research in this field has established two main approaches: the analytic approach, which involves segmentation and classification of each part of the text [2], [3], [4], and the holistic approach, which considers the global properties of the written text and recognizes the input word shape as a whole [5], [6]. While having many advantages, the holistic approach requires the classifier to be trained over the entire dictionary, which is impractical for large dictionaries (containing more than 20,000 words) [7]. The cursiveness of the Arabic script, prima facie, requires delaying the launch of the recognition process until the completion of the word scribing. However, in this paper, we question the necessity of this requirement by demonstrating the feasibility of approximating the position of the

Patent
24 Nov 2014
TL;DR: In this article, a stroke untangler composes handwritten messages from handwritten strokes, representing overlapping letters or partial letter segments, that are drawn on a touchscreen device or touch-sensitive surface; the strokes are automatically untangled and then segmented and combined into one or more letters, words, or phrases.
Abstract: A "Stroke Untangler" composes handwritten messages from handwritten strokes, representing overlapping letters or partial letter segments, that are drawn on a touchscreen device or touch-sensitive surface. These overlapping strokes are automatically untangled and then segmented and combined into one or more letters, words, or phrases. Advantageously, segmentation and composition is performed without requiring user gestures, timeouts, or other inputs to delimit characters within words, and without using handwriting recognition-based techniques to guide untangling and composing of the overlapping strokes to form characters. In other words, the user draws multiple overlapping strokes. Those strokes are then automatically segmented and combined into one or more corresponding characters. Text recognition of the resulting characters is then performed. Further, the segmentation and combination is performed in real-time, thereby enabling real-time rendering of the resulting characters in a user interface window. A related drawing mode enables entry of drawings in combination with the handwritten characters.

Proceedings ArticleDOI
07 Apr 2014
TL;DR: A method to learn structural relations from training patterns without any heuristic decisions by using two SVM models is proposed and stroke order is employed to reduce the complexity of the parsing algorithm.
Abstract: This paper presents a system for recognizing online handwritten mathematical expressions (MEs) and improvement of structure analysis. We represent MEs in Context Free Grammars (CFGs) and employ the Cocke-Younger-Kasami (CYK) algorithm to parse 2D structure of on-line handwritten MEs and select the best interpretation in terms of symbol segmentation, recognition and structure analysis. We propose a method to learn structural relations from training patterns without any heuristic decisions by using two SVM models. We employ stroke order to reduce the complexity of the parsing algorithm. Moreover, we revise structure analysis. Even though CFG does not resolve ambiguities in some cases, our method still gives users a list of candidates that contains the expected result. We evaluate our method on the CROHME 2013 database and demonstrate the improvement of our system in recognition rate as well as processing time.
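
For readers unfamiliar with CYK, the sketch below shows the textbook one-dimensional recognizer for a grammar in Chomsky normal form. The paper applies CYK to the 2D layout of handwritten strokes with learned structural relations, none of which is modelled here; the toy grammar, the rule encoding, and the name `cyk_parse` are assumptions for illustration.

```python
def cyk_parse(tokens, terminal_rules, binary_rules, start="S"):
    """Return True if `tokens` can be derived from `start`.

    terminal_rules: list of (lhs, terminal) pairs, e.g. ("E", "x").
    binary_rules:   list of (lhs, (left, right)) pairs, e.g. ("E", ("E", "R")).
    table[i][l - 1] is the set of nonterminals deriving tokens[i:i + l].
    """
    n = len(tokens)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = {lhs for lhs, t in terminal_rules if t == tok}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


# Toy grammar for sums:  E -> E R | 'x' | 'y',  R -> P E,  P -> '+'
terminals = [("E", "x"), ("E", "y"), ("P", "+")]
binaries = [("E", ("E", "R")), ("R", ("P", "E"))]
print(cyk_parse(["x", "+", "y"], terminals, binaries, start="E"))
```

The triple loop over span length, start position, and split point gives the cubic cost in the number of tokens; reducing the spans that must be considered is one way the complexity of such a parser can be cut.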