Handwritten Character Recognition in English: A Survey


ISSN (Online) 2278-1021
ISSN (Print) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Vol. 4, Issue 2, February 2015
Copyright to IJARCCE DOI 10.17148/IJARCCE.2015.4278 345
Monica Patel¹, Shital P. Thakkar²
¹Department of Electronics and Communication, Dharmsinh Desai University, Nadiad, Gujarat, India
²Associate Professor, Dept. of Electronics and Communication, Dharmsinh Desai University, Nadiad, Gujarat, India
Abstract: This paper presents a comprehensive review of Handwritten Character Recognition (HCR) for the English language. HCR has been applied in a variety of settings, such as the banking sector, the health care industry and other organizations that deal with handwritten documents. Handwritten Character Recognition is the process of converting handwritten text into machine-readable form. Handwritten characters are difficult to recognize because handwriting differs from one writer to another, and even when the same person writes the same character there are differences in its shape, size and position. Recent research in this area has used different methods, classifiers and features to reduce the complexity of recognizing handwritten text.
Keywords: Handwritten database, feature extraction, classifiers, HCR system.
I. INTRODUCTION
Handwritten character recognition (HCR) is the process of converting handwritten text into machine-readable form. The major problem in an HCR system is the variation in handwriting styles, which can be completely different for different writers. The objective of an HCR system is to provide user-friendly, computer-assisted character representation that allows characters to be extracted successfully from handwritten documents, and to digitize and translate the handwritten text into machine-readable text.
Handwritten character recognition systems are divided into two categories:
On-line character recognition.
Recognition is performed while the characters are being written.
Off-line character recognition.
Handwritten documents are first generated, scanned and stored in a computer, and then they are recognized.
An HCR system consists of the following stages:
1) Pre-processing.
2) Segmentation.
3) Feature extraction.
4) Training and recognition.
5) Post processing.
There are four approaches to cursive handwritten word recognition.
Holistic approach
The entire word is recognized without splitting it, by extracting features of the whole word.
Segmentation-based approach
Characters are segmented from the word and then recognized individually.
Recognition-based segmentation approach
Character classification and segmentation are performed simultaneously using an appropriate learning method.
Mixed approach
The system uses a combination of the above methods.
In this paper we present a concise survey of the available HCR techniques for the English language. The techniques are discussed along with their strengths and weaknesses. Different types of features are extracted and different types of classifiers are used to classify the input characters. The study focuses on investigating possible techniques for developing an offline HCR system for the English language, for both separate characters and cursive words.
II. MOTIVATION
Most organizations use documents to acquire information from customers. These documents, such as forms and checks, are generally handwritten. For easier retrieval and information collection, the documents are transformed and stored in digital formats. The common practice is to enter the same data into a computer manually, which is tiresome and time consuming. Hence the need arises for dedicated HCR software that automatically recognizes text from images of documents. HCR software makes it easier to extract data from handwritten documents and store it in electronic formats. It is useful in the banking sector, the health care industry and other organizations where handwritten documents are used regularly. HCR systems also find applications in newly emerging areas where handwriting data entry is required, such as the development of electronic libraries, multimedia databases, etc.
III. STRUCTURE OF HCR SYSTEM
The block diagram of the handwritten character
recognition system is shown in figure 1.

Fig 1: Block diagram of HCR System
The collected database is divided into two parts: training data and testing data. Training data are used to train the system, and the trained system is then used to recognize the test data.
A. Pre-processing
Pre-processing is a sequence of operations performed on the scanned input image. It enhances the image, making it suitable for further processing. The tasks performed on the image in the pre-processing stage include noise removal, binarization, skew correction, etc.
Noise removal
It is the process of removing noise from the scanned image using an appropriate filter, for example a smoothing linear filter or an order-statistic filter. Smoothing blurs the image and reduces noise; it removes small details so that large objects can be extracted.
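As an illustration of an order-statistic filter, the sketch below applies a 3x3 median filter to a small gray-scale image stored as a list of lists. The function name `median_filter` and the border handling (using only the neighbours inside the image) are assumptions made for this example, not details taken from the surveyed papers.

```python
def median_filter(image, size=3):
    """Apply a size x size median (order-statistic) filter to a 2-D list.

    Border pixels use only the neighbours that fall inside the image.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            )
            # Middle element of the sorted neighbourhood.
            out[r][c] = window[len(window) // 2]
    return out


noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],  # isolated "salt" pixel
    [10, 10, 10, 10],
]
cleaned = median_filter(noisy)
```

The isolated bright pixel is replaced by the value of its neighbours, while uniform regions are left unchanged.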
Binarization
It converts a gray-scale image into a binary image using a global thresholding technique such as Otsu's method. Otsu's method selects the threshold that maximizes the between-class variance of the two pixel classes.
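A minimal sketch of Otsu's global threshold is given below, assuming 8-bit gray levels and dark ink on a light background. The helper names `otsu_threshold` and `binarize` are hypothetical and chosen only for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat iterable of integers in [0, levels). Pixels with
    value <= t form one class and pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray levels in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image, threshold):
    """Assumption: ink is dark, so pixels <= threshold become 1, others 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


gray = [
    [200, 210, 205, 198],
    [202, 30, 25, 207],
    [199, 28, 35, 203],
]
t = otsu_threshold([p for row in gray for p in row])
binary = binarize(gray, t)
```

For this toy image the threshold falls between the dark cluster (25 to 35) and the bright cluster (198 to 210), so the four dark pixels are marked as ink.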
Skew correction
It is the removal of skew from the scanned document so that it can be segmented properly. Handwritten documents are not necessarily aligned perfectly horizontally, so skew correction is required. Methods include projection profile analysis, the Hough transform, nearest-neighbour clustering, cross-correlation, piece-wise covering by parallelograms, etc.
B. Segmentation
In the segmentation stage, an image is decomposed into sub-images of individual characters. Segmentation includes:
Line segmentation, which is the separation of lines from a paragraph.
Word segmentation, which is the separation of words from a line.
Character segmentation, which is the separation of characters from words.
Character segmentation is performed if a segmentation-based method is adopted for cursive word recognition; for the holistic method, character segmentation is not performed.
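One common way to realize line and word segmentation is through projection profiles, sketched below under the assumption that lines are separated by blank rows and pieces within a line by blank columns. The function names are hypothetical. Touching cursive characters do not leave blank columns, so this simple rule would not be enough for character segmentation of cursive words.

```python
def split_runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries.

    `end` is exclusive, so profile[start:end] covers one run.
    """
    runs, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """Split a binary page (1 = ink) into text lines using row sums."""
    row_profile = [sum(row) for row in binary]
    return [binary[a:b] for a, b in split_runs(row_profile)]


def segment_columns(line):
    """Split one text line into pieces separated by blank columns."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[a:b] for row in line] for a, b in split_runs(col_profile)]


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
lines = segment_lines(page)
pieces = segment_columns(lines[0])
```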
C. Feature Extraction
In this stage, the features of the characters that are essential for classifying them at the recognition stage are extracted. This is an important stage, as its successful operation improves the recognition rate and reduces misclassification. Features such as binary features, directional features, etc. are extracted and a feature vector is created. Feature extraction methods fall into the following categories.
Statistical features
These are based on probability theory and hypothesis. Statistical features are derived from the statistical distribution of the points (pixels) of an image, which accommodates variations in writing style. Examples are projection histograms, crossings, distances, zoning, etc.
Figure 2: Statistical Features
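Zoning, one of the statistical features listed above, can be sketched as follows: the character image is divided into a grid of zones and the ink density of each zone becomes one element of the feature vector. The function name `zoning_features` and the requirement that the image size be a multiple of the grid size are assumptions made to keep the example short.

```python
def zoning_features(binary, zone_rows, zone_cols):
    """Return the ink density of each zone, scanned row by row.

    The image height and width are assumed to be multiples of the
    zone grid dimensions.
    """
    rows, cols = len(binary), len(binary[0])
    zh, zw = rows // zone_rows, cols // zone_cols
    features = []
    for zr in range(zone_rows):
        for zc in range(zone_cols):
            ink = sum(
                binary[r][c]
                for r in range(zr * zh, (zr + 1) * zh)
                for c in range(zc * zw, (zc + 1) * zw)
            )
            features.append(ink / (zh * zw))
    return features


# A 4x4 character image divided into a 2x2 grid of zones.
char = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zoning_features(char, 2, 2)
```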
Structural features
Structural features give information about the structure of the image. They describe the geometrical and topological properties of a character, such as crossing points, branches, loops, stroke length, stroke width, and up, down, left and right projection profiles.
Figure 3: Structural features
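As an example of a topological feature, the number of loops in a character can be counted by flood-filling the background. The sketch below is one possible way to do this, assuming a binary image with 1 for ink and 4-connected background regions; the name `count_loops` is hypothetical.

```python
from collections import deque


def count_loops(binary):
    """Count enclosed background regions (loops) in a binary character.

    The image is padded with a one-pixel background border, and background
    regions are flood-filled with 4-connectivity. Every background region
    other than the outer one is counted as a loop.
    """
    rows, cols = len(binary) + 2, len(binary[0]) + 2
    grid = [[0] * cols]
    grid += [[0] + list(row) + [0] for row in binary]
    grid += [[0] * cols]
    seen = [[False] * cols for _ in range(rows)]

    def flood(r0, c0):
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not seen[rr][cc] and grid[rr][cc] == 0):
                    seen[rr][cc] = True
                    queue.append((rr, cc))

    flood(0, 0)  # mark the outer background first
    loops = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                loops += 1
                flood(r, c)
    return loops


letter_o = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
digit_8 = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
letter_l = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
```

A loop count of this kind helps separate characters such as "o" or "8" from open shapes such as "L".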

Global transformation features
Global transformation-based features give a good representation of the image shape. They are obtained by transforming the image from the spatial domain to the frequency domain. These features store the information contained in the whole image in a few coefficients, that is, they provide energy compaction. Examples of global transformation-based features are the Discrete Fourier Transform, Discrete Cosine Transform, Discrete Wavelet Transform, etc.
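Energy compaction can be seen with a small Discrete Cosine Transform example. The sketch below computes an orthonormal 2-D DCT-II directly from its definition (slow, but adequate for a tiny block) and keeps the low-frequency corner as the feature vector. The function names and the 4x4 block are assumptions made for illustration.

```python
import math


def dct_2d(block):
    """Compute the orthonormal 2-D DCT-II of a small gray-scale block."""
    n, m = len(block), len(block[0])

    def alpha(k, size):
        return math.sqrt((1 if k == 0 else 2) / size)

    coeffs = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            total = 0.0
            for x in range(n):
                for y in range(m):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * m)))
            coeffs[u][v] = alpha(u, n) * alpha(v, m) * total
    return coeffs


def low_frequency_features(block, keep):
    """Keep the top-left keep x keep coefficients as a feature vector."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# A smooth 4x4 block: most of its energy lands in a few coefficients.
block = [
    [10, 10, 12, 12],
    [10, 10, 12, 12],
    [11, 11, 13, 13],
    [11, 11, 13, 13],
]
coeffs = dct_2d(block)
total_energy = sum(c * c for row in coeffs for c in row)
kept_energy = sum(c * c for c in low_frequency_features(block, 2))
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels, and for this block the four retained coefficients carry more than 99% of it.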
D. Classification
The classification stage is the decision-making part of a recognition system, and it uses the features extracted in the previous stage. The feature vector is denoted X = (f1, f2, ..., fd), where each f denotes a feature and d is the number of features extracted from the character. By comparing feature vectors, characters are assigned to the appropriate class and recognized.
Classifiers are based on two types of learning methods.
Supervised learning
In supervised learning, training data labelled with the correct class are used to train a model; the training data include both the inputs and the desired results. The model undergoes a learning process and, based on this learning, classifies the test data.
For example: SVM, HMM, etc.
Unsupervised learning
In unsupervised learning the model is not provided with labelled training data. It groups the data based on their statistical properties, their spatial grouping and their nearest neighbours.
For example: clustering, k-means, etc.
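A basic k-means loop illustrates how feature vectors can be grouped without class labels. In this sketch the first k points serve as initial centroids, a simplification chosen to keep the example deterministic; the function name `kmeans` and the sample points are assumptions.

```python
def kmeans(points, k, iterations=20):
    """Cluster feature vectors with a basic k-means loop.

    The first k points are used as initial centroids, which is a
    simplification chosen to keep the sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[j])),
            )
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return labels, centroids


# Two visibly separate groups of 2-D feature vectors, no labels given.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8),
          (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
labels, centroids = kmeans(points, 2)
```

The resulting cluster indices carry no class names; mapping clusters to characters would need additional information.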
E. Post-processing
In this stage the recognition accuracy is further increased by connecting a dictionary to the system in order to perform higher-level checks, such as syntax analysis and semantic analysis, on the recognized characters. This stage is not compulsory in an HCR system.
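The dictionary check can be sketched as a nearest-word lookup using edit distance. This covers only the dictionary part of post-processing, not syntax or semantic analysis; the lexicon contents and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb
            ))
        previous = current
    return previous[-1]


def correct_with_lexicon(word, lexicon):
    """Replace a recognized word by the closest lexicon entry."""
    return min(lexicon, key=lambda entry: edit_distance(word, entry))


lexicon = ["bank", "check", "amount", "date"]  # hypothetical dictionary
corrected = correct_with_lexicon("anount", lexicon)
```

Here a single misrecognized character ("n" for "m") is repaired because "amount" is the lexicon entry at the smallest edit distance.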
IV. RELATED WORK
The recognition accuracy depends on the sensitivity of the selected features and on the type of classifier used. Hence, a number of feature extraction and classification methods can be found in the literature.
The following papers perform handwritten character recognition of cursive words.
Radmilo M. Bozinovic and Sargur N. Srihari (1989) [1]. This paper uses a holistic method for cursive word recognition. The approach represents a word through several stages of transformation: points, contours, features, letters and words. A unique feature vector is generated from the image using statistical dependences between letters and features, and partially computed words are recognized by comparison with a lexicon. The lexicon contains 130 words, so only a limited number of words can be recognized. No classifier is used for word recognition; instead, a rating is given to each segment produced by pre-segmentation using letter hypotheses, and the segments are recognized based on the maximum rating.
H. Bunke, M. Roth and E. G. Schukat-Talamazzini (1995) [2]. This paper uses a holistic method for cursive word recognition. Features are extracted from the skeleton of the word. The feature vector is generated from the edge information of the word, which includes the location of an edge relative to four reference lines, its curvature, the degree of the nodes incident to the edge, etc. A 10-dimensional feature vector is generated. An HMM is built for each letter of the alphabet, and the HMM for each dictionary word is built by concatenating these letter HMMs. A dictionary of limited size is used. The HMMs are trained using the Baum-Welch algorithm and recognition is performed using the Viterbi algorithm.
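The Viterbi decoding step can be illustrated on a toy discrete HMM. The two-state left-to-right model, its probabilities and the observation symbols below are invented for the example and are not taken from the paper, which works with 10-dimensional feature vectors and concatenated letter models.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence and its log-probability."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path ending in s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical left-to-right model: state "a" may move on to "b" only.
states = ["a", "b"]
start_p = {"a": 0.9, "b": 0.1}
trans_p = {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.0, "b": 1.0}}
emit_p = {"a": {"x": 0.8, "y": 0.2}, "b": {"x": 0.1, "y": 0.9}}

path, log_p = viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p)
```

In a word recognizer of this kind, the same procedure would be run on each word model and the word with the highest score selected.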
Nafiz Arica (1998) [3]. The author performed recognition of both cursive and isolated handwritten characters using HMMs. A hybrid method is used to make the most of the strengths of the HMM. The features used for character recognition are the medians of the black runs in each scan line. The character image is scanned in four different directions to extract the features, and the medians in each direction represent a sparse directional skeleton of the character. A discrete-density left-to-right HMM is used for recognition. For cursive script, a recognition-based segmentation approach is used. Features are fed to a higher-order HMM and the segmentation paths are finally confirmed. The correct segmentation points are found using a graph search that selects the shortest path with minimum cost. The probabilities of the HMM observation sequences are used for recognition.
Yong Haw Tay, Pierre-Michel Lallican, Marzuki Khalid, Christian Viard-Gaudin, Stefan Knerr (2001) [4]. This paper recognizes handwritten cursive words using a recognition-based segmentation method and compares two systems. The first recognition system uses a combination of a neural network and an HMM (Hidden Markov Model); the second uses a discrete HMM. In the first method, pre-segmentation of the word is performed using a segmentation graph. The neural network calculates the probability of each letter hypothesis in the graph, and the HMM then computes the likelihood of each word in the lexicon by adding the probabilities along each possible path in the graph. In the second method, 140 geometric features are extracted from each segment produced by pre-segmentation. These features are converted to a single symbol by vector quantization (VQ), and the word is finally recognized by calculating the likelihood of each word in the lexicon.
Anshul Gupta, Manisha Srivastava and Chitralekha Mahanta (2011) [5]. The authors use a segmentation-based approach for cursive word recognition. Cursive words are first segmented into individual characters, which are then recognized and merged to produce a meaningful word by comparison with a dictionary. The dictionary used in this paper consists of 26 words, so the scope of the paper is limited to 26 words.
The following papers perform handwritten character recognition of separate characters.
J. Pradeep, E. Srinivasan and S. Himavathi (2012) [6] designed a neural-network-based recognition system. They applied different neural network (NN) topologies (back-propagation neural network, nearest neighbour network and radial basis function network) to the same training dataset. They compared the performance of each network, optimized the number of neurons in the hidden layer, which does not depend on the initial value, and concluded that
combination of standard feature extraction technique with
feed forward back propagation
D. K. Patel, T. Som and M. K Singh (2012) [7] address handwritten English character recognition using a multi-resolution technique with the Discrete Wavelet Transform (DWT) and the Euclidean Distance Metric (EDM). The distances from an unknown input pattern vector to all the class mean vectors are calculated by EDM, and the minimum distance determines the class membership of the input pattern vector. EDM alone gives a recognition accuracy of 90.77%. In case of misclassification, a learning rule through an ANN improves the recognition accuracy to 95.38% by comparing scores, and taking the product of the generated recognition scores with the Euclidean distances further improves the recognition accuracy to 98.46%.
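The minimum-distance step described here can be sketched as follows. The three-element feature vectors merely stand in for DWT coefficients, and the function names `class_means` and `classify` are hypothetical; the ANN refinement reported by the authors is not reproduced.

```python
import math


def class_means(training):
    """Average the feature vectors of each class.

    `training` maps a class label to a list of equal-length vectors.
    """
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in training.items()
    }


def classify(vector, means):
    """Assign the vector to the class whose mean is nearest (Euclidean)."""
    return min(means, key=lambda label: math.dist(vector, means[label]))


# Hypothetical feature vectors standing in for DWT coefficients.
training = {
    "A": [[0.9, 0.1, 0.4], [1.1, 0.2, 0.5]],
    "B": [[0.1, 0.8, 0.9], [0.2, 1.0, 1.1]],
}
means = class_means(training)
predicted = classify([1.0, 0.0, 0.5], means)
```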
M. Blumenstein, B. Verma and H. Basli (2003) [8]. This research describes neural network-based techniques for segmented character recognition. Two neural architectures and two feature extraction techniques were investigated. Directional and transition features are compared using Back-Propagation (BP) and Radial Basis Function (RBF) network classifiers. The feature vector has 100 elements for the transition feature and 81 for the directional feature. Experiments were performed on the CAS dataset, and similarly on the BAC database, using the BP and RBF networks with both feature extraction techniques for lower-case and upper-case characters. With the neural network classifiers, directional features perform better than transition features.
Sumedha B. Hallale, Geeta D. Salunke (2013) [9]. This paper compares conventional and directional feature extraction methods. Twelve directional features are used for recognition of alphabets and numerals. To extract the directional features, the gradient of each pixel is computed and the gradient values are mapped onto 12 direction values, with an angle span of 30 degrees between any two adjacent direction values. The feature vector of each class is obtained by taking the mean of the feature matrix of that class. The similarity between the test feature vector and the feature vectors of all classes is calculated, and the test image is assigned to the class with the highest similarity.
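The mapping of gradients onto 12 directions can be sketched as a direction histogram. The gradient operator (central differences) and the binning of angles into 30-degree sectors are assumptions made for this example, since the summary above does not specify them; the name `direction_histogram` is hypothetical.

```python
import math


def direction_histogram(gray, bins=12):
    """Count gradient directions in `bins` sectors of 360/bins degrees.

    Gradients use central differences on interior pixels; pixels with a
    zero gradient carry no direction and are skipped.
    """
    sector = 360 / bins
    hist = [0] * bins
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            gx = gray[r][c + 1] - gray[r][c - 1]
            gy = gray[r + 1][c] - gray[r - 1][c]
            if gx == 0 and gy == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360
            hist[int(angle // sector) % bins] += 1
    return hist


# A vertical edge: intensity rises from left to right, so every interior
# gradient points along +x (0 degrees) and falls into sector 0.
edge = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
hist = direction_histogram(edge)
```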
Amit Choudhary, Rahul Rishi and Savita Ahlawat (2013) [10]. In this paper, handwritten character recognition of lowercase English alphabets is performed using the binarized pixels of the image as features and a multilayer back-propagation neural network as the classifier. The character image is binarized, filtered and resized to 15X12, giving a feature vector of size 180 for each character, which is passed to the neural network for training. MSE (mean square error) is used as the cost function. Binarization features with the back-propagation neural network classifier give a classification accuracy of 85.62%. The features are simple, as direct pixel values are taken.
Rafael M. O. Cruz, George D. C. Cavalcanti and Tsang Ing Ren (2010) [11]. This paper performs recognition of separate handwritten cursive characters. Several features are extracted, two of which, the modified edge map and multiple zoning, are proposed by the authors. Nine features are extracted in total, so that the drawbacks of each feature are compensated by the others. Each feature is given individually as input to one of nine multilayer perceptron networks, and the outputs of these classifiers are combined using different rules such as the sum rule, product rule, max rule, mean rule, etc. Among them, a trained MLP combiner gives the best result. Among the proposed features, the modified edge map gives the highest result.
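The fixed combination rules mentioned above can be sketched as follows. The three classifier outputs are hypothetical values for a single character image, and the trained MLP combiner that the authors found best is not reproduced here.

```python
import math


def combine(posteriors, rule):
    """Fuse per-classifier class probabilities with a fixed rule.

    `posteriors` is a list with one dict per classifier, each mapping a
    class label to that classifier's estimated probability.
    """
    labels = posteriors[0].keys()
    if rule == "sum":
        fused = {k: sum(p[k] for p in posteriors) for k in labels}
    elif rule == "mean":
        fused = {k: sum(p[k] for p in posteriors) / len(posteriors)
                 for k in labels}
    elif rule == "product":
        fused = {k: math.prod(p[k] for p in posteriors) for k in labels}
    elif rule == "max":
        fused = {k: max(p[k] for p in posteriors) for k in labels}
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(fused, key=fused.get), fused


# Hypothetical outputs of three classifiers for one character image.
outputs = [
    {"a": 0.6, "o": 0.4},
    {"a": 0.3, "o": 0.7},
    {"a": 0.8, "o": 0.2},
]
winner, scores = combine(outputs, "sum")
```

Although the second classifier prefers "o", the fused decision under each of these rules is "a" for this input.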
Table 1 gives concise details of all the papers reviewed.
TABLE 1
Literature Survey

| Author & Year | Classifier | Features | Accuracy and experimental details |
|---|---|---|---|
| R. M. Bozinovic, S. N. Srihari (1989) | Word formation using letter hypothesis | Contour tracing, event construction, letter hypothesis and word hypothesis | 77%; dictionary of 130 words; 66 training, 64 different test set; cursive |
| H. Bunke, M. Roth, E. G. Schukat-Talamazzini (1995) | HMM | Location, curvature of edge and percentage of pixels lying on the edge | 98%; dictionary of 150 words; 2250 training, 750 testing; on ruled paper; cursive |
| Nafiz Arica (1998) | HMM | Medians of black run in vertical, horizontal and diagonal scan lines for getting directional skeleton | 65%; 240 cursive words for names on the bank checks; 19 distinct characters segmented manually; 20 samples for each class used for HMM training; 300 testing; cursive |
| Y. H. Tay, P. M. Lallican, M. Khalid, Christian Viard-Gaudin, Stefan Knerr (2001) | HMM+NN | 140 geometrical features of each pre-segmented frame | 96.1%; 196 word lexicon; 24177 training, 12219 testing; cursive |
| A. Gupta, M. Srivastava, C. Mahanta (2011) | NN+SVM | Fourier descriptors | 62.93% on test data; 26 word lexicon; 260 words each 10 times; 26 testing; cursive |
| J. Pradeep, E. Srinivasan, S. Himavathi (2012) | NN | Character resized into 30X20 pixels taken as feature | 94.15%; 200 database of each 26 characters; capital characters |
| D. K. Patel, T. Som, M. K Singh (2012) | ANN | Discrete Wavelet Transform (DWT) | 98.46%; 100 samples of each character for training, 50 samples for testing; capital characters |
| M. Blumenstein, B. Verma, H. Basli (2003) | BP and RBF networks | Directional and transition features | 85.48% for uppercase and 70.63% for lowercase; CAS and BAC database; small characters |
| S. B. Hallale, G. D. Salunke (2013) | Directional pattern matching | Twelve directional features | 88.29%; 500 training images and 200 testing images; capital characters and numerals |
| A. Choudhary, R. Rishi, S. Ahlawat (2013) | NN | Character image resized to 15X12, feature vector of size 180 is created | 85.62%; database from 10 people, 5 samples from each, thus 10X5X26=1300 character images in total; small characters |
| Rafael M. O. Cruz, George D. C. Cavalcanti, Tsang Ing Ren (2010) | MLP | Modified edge maps and multi zoning | 91.39% for uppercase, 88.45% for lowercase; C-Cube database; separate cursive characters |
V. CONCLUSION
Immense work and research has been done on handwritten separate character recognition, but 100% accuracy has not yet been achieved, which leaves scope for further work in this direction. Separate characters give good accuracy, but word recognition is affected by differences in writing style. Holistic methods eliminate complicated segmentation, but they work with a limited vocabulary. Segmentation-based methods achieve lower accuracy because of their complexity. Good accuracy is observed for classifiers where the scope of words is limited to a fixed number, since they have to deal with a limited number of variations.
REFERENCES
[1] Radmilo M. Bozinovic and Sargur N. Srihari, "Off-line Cursive Script Word Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 1, January 1989.
[2] H. Bunke, M. Roth and E. G. Schukat-Talamazzini, "Offline Cursive Handwriting Recognition Using Hidden Markov Models", Pattern Recognition, Vol. 28, No. 9, 1995, Elsevier Science Ltd.
[3] Nafiz Arica, "An Off-line Character Recognition System for Free Style Handwriting", thesis submitted to the Graduate School of Natural and Applied Sciences of the Middle East Technical University, 1998.
[4] Yong Haw Tay, Pierre-Michel Lallican, Marzuki Khalid, Christian Viard-Gaudin, Stefan Knerr, "An Offline Cursive Handwritten Word Recognition System", IEEE Catalogue No. 01 CH37239, 2001.
[5] Anshul Gupta, Manisha Srivastava and Chitralekha Mahanta, "Offline Handwritten Character Recognition", International Conference on Computer Applications and Industrial Electronics (ICCAIE), 2011.
[6] J. Pradeep, E. Srinivasan and S. Himavathi, "Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten", International Journal of Engineering (IJE) Transactions B: Applications, Vol. 25, No. 2, May 2012, pp. 99-106.
[7] D. K. Patel, T. Som and M. K Singh, "Improving the Recognition of Handwritten Characters using Neural Network through Multiresolution Technique And Euclidean Distance Metric", International Journal of Computer Applications (0975-8887), Volume 45, No. 6, May 2012.
[8] M. Blumenstein, B. Verma and H. Basli, "A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten Characters", Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR'03), 0-7695-1960-1/03 $17.00 © 2003 IEEE.
[9] Sumedha B. Hallale, Geeta D. Salunke, "Twelve Directional Feature Extraction for Handwritten English Character Recognition", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-2, Issue-2, May 2013.
[10] Amit Choudhary, Rahul Rishi and Savita Ahlawat, "Off-Line Handwritten Character Recognition using Features Extracted from Binarization Technique", 2212-6716 © 2013 American Applied Science Research Institute, doi:10.1016/j.aasri.2013.10.045.
[11] Rafael M. O. Cruz, George D. C. Cavalcanti and Tsang Ing Ren, "An Ensemble classifier for offline Cursive character recognition using multiple features Extraction technique", 978-1-4244-8126-2/10/$26.00 © 2010 IEEE.
BIOGRAPHIES
Monica Patel received a Bachelor of Engineering degree in Electronics and Communication Engineering in 2013 from Gujarat Technological University, Ahmedabad, Gujarat, India. She is pursuing a Master of Technology in Electronics and Communication Systems at Dharmsinh Desai University, Nadiad, Gujarat, India. Her area of interest is image processing.
Shital Thakkar obtained her Master's degree in Electronics & Communication Systems Engineering from Dharmsinh Desai University, Nadiad, India. She has one year of experience as an Apprentice at the Space Application Centre (ISRO), Ahmedabad. She joined the Electronics & Communication Department, Dharmsinh Desai University, Nadiad, Gujarat, India, as a Lecturer, and has been working there as an Associate Professor since 2007. Her research areas include Image Processing and Signal Processing.
