
Showing papers on "Handwriting recognition" published in 2003


Proceedings ArticleDOI
18 Jun 2003
TL;DR: This work presents an algorithm for matching handwritten words in noisy historical documents that performs better and is faster than competing matching techniques and presents experimental results on two different data sets from the George Washington collection.
Abstract: Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labor and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed: clusters with occurrences of the same word in a collection are established using image matching. By annotating "interesting" clusters, an index can be built automatically. We present an algorithm for matching handwritten words in noisy historical documents. The segmented word images are preprocessed to create sets of 1-dimensional features, which are then compared using dynamic time warping. We present experimental results on two different data sets from the George Washington collection. Our experiments show that this algorithm performs better and is faster than competing matching techniques.

626 citations
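
The matching step described in this abstract (1-dimensional features compared with dynamic time warping) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the column-sum feature, the absolute-difference local cost, and the names `projection_profile` and `dtw_distance` are assumptions made for the example, and the paper's own features and normalization are not given here.

```python
def projection_profile(image):
    """Hypothetical 1-D feature: ink count per column of a binary word image."""
    return [sum(column) for column in zip(*image)]


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Two toy "word images" whose profiles differ only by a repeated column.
word_a = [[1, 0, 1], [0, 1, 1]]
word_b = [[1, 0, 0, 1], [0, 1, 1, 1]]
print(dtw_distance(projection_profile(word_a), projection_profile(word_b)))
```

Because warping lets one column align with several columns of the other sequence, words written at slightly different widths can still match with low cost.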


Journal ArticleDOI
TL;DR: This article will discuss the methods and principles that have been proposed to handle large vocabularies and identify the key issues affecting their future deployment.
Abstract: Considerable progress has been made in handwriting recognition technology over the last few years. Thus far, handwriting recognition systems have been limited to small and medium vocabulary applications, since most of them often rely on a lexicon during the recognition process. The capability of dealing with large lexicons, however, opens up many more applications. This article will discuss the methods and principles that have been proposed to handle large vocabularies and identify the key issues affecting their future deployment. To illustrate some of the points raised, a large vocabulary off-line handwritten word recognition system will be described.

194 citations


Proceedings ArticleDOI
Horst Bunke1
03 Aug 2003
TL;DR: The state of the art in off-line Roman cursive handwriting recognition is reviewed, recent trends are analyzed, and challenges for future research in this field are identified.
Abstract: This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or - more generally -some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We'll survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.

178 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: A robust scheme to segment unconstrained handwritten Bangla texts into lines, words and characters based on water reservoir principle is proposed to take care of variability involved in the writing style of different individuals.
Abstract: To take care of variability involved in the writing style of different individuals, in this paper we propose a robust scheme to segment unconstrained handwritten Bangla texts into lines, words and characters. For line segmentation, at first, we divide the text into vertical stripes. Stripe width of a document is computed by statistical analysis of the text height in the document. Next we determine horizontal histogram of these stripes and the relationship of the minimal values of the histograms is used to segment text lines. Based on vertical projection profile lines are segmented into words. Segmentation of characters from handwritten word is very tricky as the characters are seldom vertically separable. We use a concept based on water reservoir principle for the purpose. Here we, at first, identify isolated and connected (touching) characters in a word. Next touching characters of the word are segmented based on the reservoir base area points and structural feature of the component.

177 citations
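
The word-segmentation step mentioned in this abstract (splitting a text line using its vertical projection profile) might look roughly like the sketch below. It is an illustrative simplification under stated assumptions: the input is a binary line image given as rows of 0/1, and `segment_words` and its `min_gap` threshold are hypothetical names and parameters, not taken from the paper. The water reservoir character segmentation is not covered.

```python
def segment_words(line_image, min_gap=2):
    """Return (start, end) column spans of words, end exclusive.

    A run of at least `min_gap` empty columns is treated as a word gap;
    shorter runs are assumed to be gaps inside a word.
    """
    profile = [sum(column) for column in zip(*line_image)]
    spans = []
    start = None  # first column of the word currently being read
    gap = 0       # length of the current run of empty columns
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last word, dropping any trailing empty columns.
        spans.append((start, len(profile) - gap))
    return spans


line = [
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
print(segment_words(line))
```

In practice the gap threshold would have to be estimated from the document, since inter-word spacing varies between writers.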


Proceedings ArticleDOI
03 Aug 2003
TL;DR: An offline recognition system for Arabic handwritten words is presented and achieves maximal recognition rates of about 89% on a word level using the new IFN/ENIT-database of handwritten Arabic words.
Abstract: An offline recognition system for Arabic handwritten words is presented. The recognition system is based on a semi-continuous 1-dimensional HMM. From each binary word image normalization parameters were estimated. First height, length, and baseline skew are normalized, then features are collected using a sliding window approach. This paper presents these methods in more detail. Some parameters were modified and the consequent effect on the recognition results are discussed. Significant tests were performed using the new IFN/ENIT-database of handwritten Arabic words. The comprehensive database consists of 26459 Arabic words (Tunisian town/village names) handwritten by 411 different writers and is free for non-commercial research. In the performed tests we achieved maximal recognition rates of about 89% on a word level.

167 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: A methodology for feature selection in unsupervised learning makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters have been used to guide the search toward more discriminant features and the best number of clusters.
Abstract: In this paper a methodology for feature selection in unsupervised learning is proposed. It makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters have been used to guide the search towards the more discriminant features and the best number of clusters. The proposed strategy is evaluated using two synthetic data sets and then it is applied to handwritten month word recognition. Comprehensive experiments demonstrate the feasibility and efficiency of the proposed methodology.

118 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: The purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data by using a perturbation model for generating synthetic text lines from existing cursively handwritten lines of text produced by human writers.
Abstract: A perturbation model for generating synthetic text lines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied, geometrical transformations and thinning/thickening operations. The proposed perturbation model is evaluated under different experimental conditions.

96 citations


Patent
18 Sep 2003
TL;DR: In this paper, a 3D handwriting recognition method and corresponding system is presented, which allows the user to generate 3D motion data by tracking corresponding 3D motions, calculate 3D coordinates, construct corresponding tracks, derive 2D projection plane based on some strokes 3D tracks of on character, and generate 2D image for handwriting recognition by mapping the 3D track onto the said 2D projected plane.
Abstract: The present invention relates to three-dimensional (3D) handwriting recognition methods and systems. The present invention provides a 3D handwriting recognition method and corresponding system which allows to generate 3D motion data by tracking corresponding 3D motion, calculate corresponding 3D coordinates, construct corresponding 3D tracks, derive 2D projection plane based on some strokes 3D tracks of on character, and generate 2D image for handwriting recognition by mapping the 3D tracks onto the said 2D projection plane. The 3D handwriting recognition method according to the present invention can use the processing power of system more efficiently and highly improve the system performance. So that the system can get the final input result in a much shorter time after the user finishes writing a character without a long time waiting between two characters input, thus the user has more pleased and natural input experience.

89 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: It is shown that in general alphabetic characters bear more individuality than numerals and use of a certain number of characters will significantly outperform the global features of handwriting samples in handwriting identification and verification models.
Abstract: Analysis of handwritten characters (allographs) plays an important role in forensic document examination. However, so far there lacks a comprehensive and quantitative study on individuality of handwritten characters. Based on a large number of handwritten characters extracted from handwriting samples of 1000 individuals in US, the individuality of handwritten characters has been quantitatively measured through identification and verification models. Our study shows that in general alphabetic characters bear more individuality than numerals and use of a certain number of characters will significantly outperform the global features of handwriting samples in handwriting identification and verification. Moreover, the quantitative measurement of discriminative powers of characters offers a general guidance for selecting most-informative characters in examining forensic documents.

80 citations


Journal ArticleDOI
TL;DR: This paper compares the current state of the art in online Japanese character recognition with techniques in western handwriting recognition to help develop compact modules for integrated systems supporting many writing systems capable of recognizing multilanguage documents.
Abstract: This paper compares the current state of the art in online Japanese character recognition with techniques in western handwriting recognition. It discusses important developments in preprocessing, classification, and postprocessing for Japanese character recognition in recent years and relates them to the developments in western handwriting recognition. Comparing eastern and western handwriting recognition techniques allows learning from very different approaches and understanding the underlying common foundations of handwriting recognition. This is very important when it comes to developing compact modules for integrated systems supporting many writing systems capable of recognizing multilanguage documents.

75 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: Experimental results show that handwritten words are very effective in differentiating handwriting and carry more individuality than most handwritten characters (allographs).
Abstract: Analysis of allographs (characters) and allograph combinations (words) is the key for obtaining the discriminating elements of handwriting. While allographs usually inhabit in words and segregation of a word into allographs is more subjective than objective, especially for cursive writing, analysis of handwritten words is a natural and better option. In this study, a handwritten word image is characterized by gradient, structural, and concavity features, and individuality of handwritten words is experimented through writership identification and verification on over 12,000 word images extracted from 3000 handwriting samples of 1000 individuals in U.S. Experimental results show that handwritten words are very effective in differentiating handwriting and carry more individuality than most handwritten characters (allographs).

Proceedings ArticleDOI
08 Sep 2003
TL;DR: A number of features that can be extracted from handwritten digits and used for author verification or identification of a person's handwriting are presented and it is indicated that the combined features work well at discriminating writers.
Abstract: The objective of this paper is to present a number of features that can be extracted from handwritten digits and used for author verification or identification of a person's handwriting. The features under consideration are mainly computational features some of which cannot be easily evaluated by humans. On the other hand, these features can be extracted by computer algorithms with a high degree of accuracy. The eleven features used are described. All features were appropriately binarized so that binary feature vectors of constant lengths could be formed. These vectors were then used for author discrimination, using the Hamming distance measure. For this task a writer database consisting of 15 writers was created. Each writer was asked to write random strings of 0 to 9 at least 10 times. The results indicate that the combined features work well at discriminating writers and warrant further detailed investigation. Although the set of features was designed for dealing with handwritten digits (as may be written on cheques), it may also be used for isolated alphabetic characters.
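
The discrimination step described above (constant-length binary feature vectors compared with the Hamming distance) can be illustrated with a short sketch. The nearest-reference decision rule, the function names, and the toy vectors are assumptions for illustration only; the eleven features and the actual decision procedure of the paper are not reproduced here.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(u, v))


def nearest_writer(query, references):
    """Return the writer whose reference vector is closest to the query.

    `references` maps a writer label to one binary feature vector
    (a hypothetical, simplified enrollment format).
    """
    return min(references, key=lambda writer: hamming(query, references[writer]))


references = {
    "writer_A": [1, 0, 1, 1],
    "writer_B": [0, 0, 0, 1],
}
print(nearest_writer([1, 0, 1, 0], references))
```

For verification rather than identification, the same distance could be compared against a threshold instead of taking the minimum.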

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A system for the recognition of on-line handwritten mathematical formulas which is used in the electronic chalkboard (E-chalk), a multimedia system for distance-teaching, is presented.
Abstract: In this article, we present a system for the recognition of on-line handwritten mathematical formulas which is used in the electronic chalkboard (E-chalk), a multimedia system for distance-teaching. We discuss the classification of symbols and the construction of the tree of spatial relationships among them. The classification is based on support vector machines and the construction of formulas is based on baseline structure analysis.

Proceedings ArticleDOI
03 Aug 2003
TL;DR: This paper examines some optimization strategies for an HMM classifier that works with continuous feature values and uses the Baum-Welch training algorithm, and introduces the free parameters of the optimization procedure, which are the number of states of a model, the number of training iterations, and the number of Gaussian mixtures for each state.
Abstract: In off-line handwriting recognition, classifiers based on hidden Markov models (HMMs) have become very popular. However, while there exist well-established training algorithms, such as the Baum-Welch procedure, which optimize the transition and output probabilities of a given HMM architecture, the architecture itself, and in particular the number of states, must be chosen "by hand". Also the number of training iterations and the output distributions need to be defined by the system designer. In this paper we examine some optimization strategies for an HMM classifier that works with continuous feature values and uses the Baum-Welch training algorithm. The free parameters of the optimization procedure introduced in this paper are the number of states of a model, the number of training iterations, and the number of Gaussian mixtures for each state. The proposed optimization strategies are evaluated in the context of a handwritten word recognition task.

Proceedings ArticleDOI
10 Mar 2003
TL;DR: This paper proposes to preprocess the input document images so as to compensate for the variations due to writing style and thereby making them suitable for analysis on the basis of their visual appearances, and applies denoising, thinning, pruning, m-connectivity and text size normalization in sequence.
Abstract: Script-based text document classification is an important field of research in the context of multilingual textual document processing. But, all script identification techniques available in the literature so far do not consider handwritten documents. Variations in the writing style, character size, inter-line and inter-word spacings, etc. make the recognition process difficult and unreliable when these script identification algorithms, more specifically visual appearance based approaches, are applied directly on hand-written documents. Therefore, in this paper, we propose to preprocess the input document images so as to compensate for the variations due to writing style and thereby making them suitable for analysis on the basis of their visual appearances. Accordingly, we apply denoising, thinning, pruning, m-connectivity and text size normalization in sequence. Multi-channel Gabor filtering is used to extract texture features that characterize the visual appearances of the document images. Experimental result proves the potentiality of our proposed method of script identification for hand-written text document classification.

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A new hybrid handwritten signature verification system where the on-line reference data acquired through a digitizing tablet serves as the basis for the segmentation process of the corresponding scanned off-line data.
Abstract: This paper proposes a new hybrid handwritten signature verification system where the on-line reference data acquired through a digitizing tablet serves as the basis for the segmentation process of the corresponding scanned off-line data. Local foci of attention over the image are determined through a self-adjustable learning process in order to pinpoint the feature extraction process. Both local and global primitives are processed and the decision about the authenticity of the specimen is defined through similarity measurements. The global performance of the system is measured using two different classifiers.

Patent
03 Dec 2003
TL;DR: In this paper, a user may select one or more words, which will cause the system to display the original ink corresponding to the selected word(s) and allow the user to select one of the alternatives to make corrections in the recognized text (akin to using a spell checking or handwriting recognition program).
Abstract: Systems, methods, and computer-readable media for processing electronic ink: (a) receive electronic ink input; (b) convert the input to machine-generated objects; and (c) render the objects such that their size substantially corresponds to the input's original size. The input ink may constitute text, and the machine-generated objects may correspond to words, lines, and/or other groupings of text generated by a handwriting recognizer. To enable quick and easy identification of recognizer errors, in at least some systems and methods, a user may select one or more words, which will cause the system to display the original ink corresponding to the selected word(s). Such systems also may display alternative words generated by the recognizer corresponding to the selected original ink and allow the user to select one of the alternatives to make corrections in the recognized text (akin to using a spell-checking or handwriting recognition program).

Patent
Jian Wang1, Jian-Lai Zhou1, Jiang Wu1, Hongyun Yang1, Xianfang Wang1, Wenli Zhu1 
15 Dec 2003
TL;DR: In this paper, a list of machine-generated objects based on the electronic ink input is generated and the list including the first machine generated object and alternative machine generated objects is used as a dictionary for converting the speech input.
Abstract: Systems, methods, and computer-readable media for processing electronic ink receive an electronic ink input; convert the electronic ink input to a first machine-generated object using handwriting recognition; display the first machine-generated object on a display; receive speech input; convert the speech input to a second machine-generated object using speech recognition; generate a list of machine-generated objects based on the electronic ink input, the list including the first machine-generated object and alternative machine-generated objects and functioning as a dictionary for converting the speech input; and replace the first machine-generated object with the second machine-generated object. The machine-generated objects may correspond to words, lines, and/or other groupings of machine-generated text. A user may confirm that the second machine-generated object should replace the first machine-generated object and the system will perform the replacement. The systems and methods may generate a list of alternative machine-generated object candidates to the first machine-generated object based on handwriting recognition of the electronic ink input alone or in combination with a statistical language model.

Journal ArticleDOI
TL;DR: An integrated system for document image preprocessing that is a combination of the projection profile technique and the Wigner–Ville distribution, and can be used as a preprocessing stage to any handwriting character recognition or segmentation system as well as to any writer identification system.
Abstract: In this paper we attempt to face common problems of handwritten documents such as nonparallel text lines in a page, hill and dale writing, slanted and connected characters. Towards this end an integrated system for document image preprocessing is presented. This system consists of the following modules: skew angle estimation and correction, line and word segmentation, slope and slant correction. The skew angle correction, slope correction and slant removing algorithms are based on a novel method that is a combination of the projection profile technique and the Wigner–Ville distribution. Furthermore, the skew angle correction algorithm can cope with pages whose text line skew angles vary, and handle them by areas. Our system can be used as a preprocessing stage to any handwriting character recognition or segmentation system as well as to any writer identification system. It was tested in a wide variety of handwritten document images of unconstrained English and Modern Greek text from about 100 writers. Add...

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A novel contour code feature in conjunction with a rule-based segmentation for cursive handwriting recognition; the proposed rule-based module validates every segmentation point against closed area, average character size, left character and density.
Abstract: The purpose of this paper is to present a novel contour code feature in conjunction with a rule-based segmentation for cursive handwriting recognition. A heuristic segmentation algorithm is initially used to over-segment each word. Then the prospective segmentation points are passed through the rule-based module to discard the incorrect segmentation points and include any missing segmentation points. The proposed rule-based module validates every segmentation point against closed area, average character size, left character and density. During the left character validation, a contour code feature is extracted and checked to determine whether the left of the prospective segmentation point is a character or rubbish (non-char). The neural network used for this validation was trained on a character and non-character database. Following the segmentation, the contour between correct segmentation points is passed through the feature extraction module that extracts the contour code, after which another trained neural network is used for classification. The recognized characters are grouped into words and passed to a variable length lexicon that retrieves the words that have the highest confidence value.

Journal ArticleDOI
TL;DR: It is demonstrated through experiments that ensemble methods have the potential of improving recognition accuracy also in the domain of handwriting recognition.
Abstract: Handwritten text recognition is one of the most difficult problems in the field of pattern recognition. Recently, a number of classifier creation and combination methods, known as ensemble methods, have been proposed in the field of machine learning. They have shown improved recognition performance over single classifiers. In this paper the application of some of those ensemble methods in the domain of offline cursive handwritten word recognition is described. The basic word recognizers are given by hidden Markov models (HMMs). It is demonstrated through experiments that ensemble methods have the potential of improving recognition accuracy also in the domain of handwriting recognition.

Journal ArticleDOI
TL;DR: A handwriting recognition system that deals with unconstrained handwriting and large vocabularies based on the segmentation-recognition paradigm where words are first loosely segmented into characters or pseudocharacters and the final segmentation is obtained during the recognition process, which is carried out with a lexicon.
Abstract: This paper presents a handwriting recognition system that deals with unconstrained handwriting and large vocabularies. The system is based on the segmentation-recognition paradigm where words are first loosely segmented into characters or pseudocharacters and the final segmentation is obtained during the recognition process, which is carried out with a lexicon. Characters are modeled by multiple hidden Markov models (HMMs), which are concatenated to build up word models. The lexicon is organized as a tree structure, and during the decoding words with similar prefixes share the same computation steps. To avoid an explosion of the search space due to the presence of multiple character models, a lexicon-driven level building algorithm (LDLBA) is used to decode the lexical tree and to choose at each level the more likely models. Bigram probabilities related to the variation of writing styles within the words are inserted between the levels of the LDLBA to improve the recognition accuracy. To further speed up the recognition process, some constraints are added to limit the search efforts to the more likely parts of the search space. Experimental results on a dataset of 4674 unconstrained words show that the proposed recognition system achieves recognition rates from 98% for a 10-word vocabulary to 71% for a 30,000-word vocabulary and recognition times from 9 ms to 18.4 s, respectively.
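
The tree-structured lexicon described above, in which words with similar prefixes share computation, can be pictured with a small prefix tree. This sketch only shows the data structure and how much work sharing saves; it is not the LDLBA decoder, and the names `build_trie`, `count_nodes`, and the `END` marker are hypothetical.

```python
END = ""  # end-of-word marker; cannot collide with a single-character key


def build_trie(words):
    """Build a nested-dict prefix tree with one node per distinct prefix."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def count_nodes(trie):
    """Count character nodes, i.e. character-model evaluations if shared."""
    return sum(1 + count_nodes(child) for key, child in trie.items() if key != END)


lexicon = ["hand", "handle", "hard"]
trie = build_trie(lexicon)
flat_cost = sum(len(word) for word in lexicon)  # every word scored separately
print(count_nodes(trie), "shared nodes versus", flat_cost, "characters in a flat list")
```

A decoder walking this tree scores the prefix "ha" once for all three words, which is the saving the abstract attributes to the lexical tree; the benefit grows with vocabulary size.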

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A new character generation method from on-line handwriting recognizers based on Bayesian networks, which generates more natural character shapes than various kinds of hidden Markov models.
Abstract: In this paper, we propose a new character generation method from on-line handwriting recognizers based on Bayesian networks. On-line handwriting recognizers are trained with handwriting samples from many writers. Then, character shapes are generated from given texts by searching the most probable input point sequences. Since Bayesian network based classifiers have large number of parameters for modeling components and their relationships, they generate more natural character shapes than various kinds of hidden Markov models.

Book ChapterDOI
04 Jun 2003
TL;DR: Under the most elaborate synthetic handwriting generation model, a level of performance comparable to, or even slightly better than, the system trained on the writing of humans was observed.
Abstract: Three different methods for the synthetic generation of handwritten text are introduced. These methods are experimentally evaluated in the context of a cursive handwriting recognition task, using an HMM-based recognizer. In the experiments, the performance of a traditional recognizer, which is trained on data produced by human writers, is compared to a system that is trained on synthetic data only. Under the most elaborate synthetic handwriting generation model, a level of performance comparable to, or even slightly better than, the system trained on the writing of humans was observed.


Proceedings ArticleDOI
03 Aug 2003
TL;DR: A prototype system for automatic video-based whiteboard reading is presented and is designed for recognizing unconstrained handwritten text and is further characterized by an incremental processing strategy in order to facilitate recognizing portions of text as soon as they have been written on the board.
Abstract: As whiteboards have become a popular tool in meeting rooms, there has been a growing interest in making use of the whiteboard as a user interface for human computer interaction. Therefore, systems based on electronic whiteboards have been developed in order to serve as meeting assistants for e.g. collaborative working. However, as special pens and erasers are required, the natural interaction is restricted. In order to render this communication method more natural it was proposed to retain ordinary whiteboard and pens and to visually observe the writing process using a video camera by Stafford-Fraser and Robinson (1996). In this paper a prototype system for automatic video-based whiteboard reading is presented. The system is designed for recognizing unconstrained handwritten text and is further characterized by an incremental processing strategy in order to facilitate recognizing portions of text as soon as they have been written on the board. We present the methods employed for extracting text regions, pre-processing, feature extraction, and statistical modeling and recognition. Evaluation results on a writer independent unconstrained handwriting recognition task demonstrate the feasibility of the proposed approach.

Proceedings ArticleDOI
03 Aug 2003
TL;DR: This work describes an online handwriting recognition system working in combination with an offline recognizer, which results in a classifier out-performing both individual systems.
Abstract: This work describes an online handwriting recognition system working in combination with an offline recognizer. The online input data is first transcribed by the online recognizer, then converted into an offline bitmap and recognized by the offline system. The outputs of the two recognizers are then combined probabilistically resulting in a classifier out-performing both individual systems. Experiments were performed over a database of single digits. The error rate of the online recognizer is reduced by 43% when the combination with the offline system is applied.
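
The abstract does not say how the two outputs are combined probabilistically. One common choice, shown here purely as an assumption, is the product rule: multiply the per-class posteriors of the two recognizers and renormalize. The function name `combine_product` and the example scores are hypothetical.

```python
def combine_product(p_online, p_offline):
    """Combine two class-posterior dicts with the product rule (assumed rule)."""
    joint = {c: p_online[c] * p_offline.get(c, 0.0) for c in p_online}
    total = sum(joint.values())
    if total == 0.0:
        # The recognizers share no class with nonzero mass: fall back to online.
        return dict(p_online)
    return {c: value / total for c, value in joint.items()}


# Toy digit posteriors: the online system prefers "3", the offline one "8".
online = {"3": 0.6, "8": 0.4}
offline = {"3": 0.3, "8": 0.7}
combined = combine_product(online, offline)
print(max(combined, key=combined.get))
```

The product rule treats the two recognizers as independent, which is only approximately true when both see the same pen trajectory; weighted sums are another option.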

Proceedings ArticleDOI
03 Aug 2003
TL;DR: This paper presents a system for the offline recognition of cursive handwritten lines of text based on continuous density HMMs and Statistical Language Models, which shows a recognition rate of ~85% with a lexicon containing 50'000 words.
Abstract: This paper presents a system for the offline recognition of cursive handwritten lines of text. The system is based on continuous density HMMs and Statistical Language Models. The system recognizes data produced by a single writer. No a-priori knowledge is used about the content of the text to be recognized. Changes in the experimental setup with respect to the recognition of single words are highlighted. The results show a recognition rate of ~85% with a lexicon containing 50'000 words. The experiments were performed over a publicly available database.

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A novel handwriting recognition interface for wearable computing where users write characters continuously without pauses on a small single writing box and substroke based hidden Markov models and a stochastic bigram language model are employed.
Abstract: This paper proposes a novel handwriting recognition interface for wearable computing where users write characters continuously without pauses on a small single writing box. Since characters are written on the same writing area, they are overlaid with each other. Therefore the task is regarded as a special case of the continuous character recognition problem. In contrast to the conventional continuous character recognition problem, location information of strokes does not help very much in the proposed framework. To tackle the problem, substroke based hidden Markov models (HMMs) and a stochastic bigram language model are employed. Preliminary experiments were carried out on a dataset of 578 handwriting sequences with a character bigram consisting of 1,016 Japanese educational Kanji and 71 Hiragana characters. The proposed method demonstrated promising performance with 69.2% of handwriting sequences being correctly recognized when different stroke order was permitted, and the rate was improved up to 88.0% when characters were written with fixed stroke order.

Proceedings ArticleDOI
03 Aug 2003
TL;DR: An original two-stage recognizer is presented, consisting of a model-based classifier that stores an exhaustive set of character models and a discriminative classifier that separates the most ambiguous pairs of classes.
Abstract: Handwriting recognition is such a complex classification problem that it is now quite usual to have several classification methods co-operate at the pre-processing stage or at the classification stage. In this paper, we present an original two-stage recognizer. The first stage is a model-based classifier that stores an exhaustive set of character models. The second stage is a discriminative classifier that separates the most ambiguous pairs of classes. This hybrid architecture is based on the idea that the correct class almost systematically belongs to the two most relevant classes found by the first classifier. Experiments on the Unipen database show a 30% improvement on a 62 class recognition problem.