Showing papers on "Intelligent word recognition" published in 2015


Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper introduces a new public image dataset for Devanagari script, and proposes a deep learning architecture for recognition of those characters, with highest test accuracy of 98.47% on the dataset.
Abstract: In this paper, we introduce a new public image dataset for Devanagari script: Devanagari Handwritten Character Dataset (DHCD). Our dataset consists of 92 thousand images of 46 different classes of characters of Devanagari script segmented from handwritten documents. We also explore the challenges in recognition of Devanagari characters. Along with the dataset, we also propose a deep learning architecture for recognition of those characters. Deep Convolutional Neural Networks (CNNs) have shown superior results to traditional shallow networks in many recognition tasks. Departing from the regular approach to character recognition with deep CNNs, we focus on the use of Dropout and a dataset increment approach to improve test accuracy. By implementing these techniques in a deep CNN, we were able to increase test accuracy by nearly 1 percent. The proposed architecture scored the highest test accuracy of 98.47% on our dataset.

153 citations
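
The abstract above names Dropout as one of the two techniques behind the accuracy gain but gives no detail. As a rough illustration only, the following sketch shows the common "inverted dropout" formulation; the function name, the drop probability and the rescaling convention are assumptions and are not taken from the paper.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Inverted dropout (assumed variant): zero each unit with probability
    drop_prob and rescale the survivors so the expected value is unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training:
        # At test time the layer is the identity.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

With `drop_prob=0.5`, every surviving activation is doubled during training, so no rescaling is needed at test time.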


Proceedings ArticleDOI
Li Chen1, Song Wang1, Wei Fan1, Jun Sun1, Satoshi Naoi1 
01 Nov 2015
TL;DR: In the experiments, the proposed CNN-based handwritten character recognition framework performed even better than humans on handwritten digit (MNIST) and Chinese character (CASIA) recognition.
Abstract: Because of the variety in appearance (different writers, writing styles, noise, etc.), handwritten character recognition is one of the most challenging tasks in pattern recognition. Through decades of research, the traditional methods have reached their limit, while the emergence of deep learning provides a new way to break this limit. In this paper, a CNN-based handwritten character recognition framework is proposed. In this framework, proper sample generation, training scheme and CNN network structure are employed according to the properties of handwritten characters. In the experiments, the proposed framework performed even better than humans on handwritten digit (MNIST) and Chinese character (CASIA) recognition. These experimental results demonstrate the advantage of the framework.

117 citations


Proceedings ArticleDOI
23 Aug 2015
TL;DR: A convolutional neural network trained on a larger-class recognition problem is used as a feature extractor for samples of several smaller-class numeral recognition problems in English, Devanagari, Bangla, Telugu and Oriya, each of which is an official Indian script.
Abstract: There are many scripts in the world, several of which are used by hundreds of millions of people. Handwritten character recognition studies of several of these scripts are found in the literature. Different hand-crafted feature sets have been used in these recognition studies. However, the convolutional neural network (CNN) has recently been used as an efficient unsupervised feature vector extractor. Although such a network can be used as a unified framework for both feature extraction and classification, it is more efficient as a feature extractor than as a classifier. In the present study, we performed a certain amount of training of a 5-layer CNN for a moderately large class character recognition problem. We then used this CNN, trained for a larger class recognition problem, for feature extraction on samples of several smaller class recognition problems. In each case, a distinct Support Vector Machine (SVM) was used as the corresponding classifier. In particular, the CNN of the present study is trained using samples of a standard 50-class Bangla basic character database, and features have been extracted for 5 different 10-class numeral recognition problems of English, Devanagari, Bangla, Telugu and Oriya, each of which is an official Indian script. Recognition accuracies are comparable with the state-of-the-art.

117 citations


Journal ArticleDOI
TL;DR: The proposed method normalizes the written character images and then employs a CNN to classify individual characters; it shows satisfactory recognition accuracy and outperforms some other prominent existing methods.
Abstract: Handwritten character recognition complexity varies among different languages due to distinct shapes, strokes and number of characters. Numerous works in handwritten character recognition are available for English compared with other major languages such as Bangla. Existing methods use distinct feature extraction techniques and various classification tools in their recognition schemes. Recently, the Convolutional Neural Network (CNN) has been found efficient for English handwritten character recognition. In this paper, a CNN-based Bangla handwritten character recognition method is investigated. The proposed method normalizes the written character images and then employs a CNN to classify individual characters. Unlike other related works, it does not employ any feature extraction method. 20000 handwritten characters with different shapes and variations are used in this study. The proposed method shows satisfactory recognition accuracy and outperforms some other prominent existing methods.

105 citations


Proceedings ArticleDOI
21 May 2015
TL;DR: The proposed OCR system was evaluated on the off-line handwritten Bangla numeral database CMATERdb 3.1.1, and achieved an excellent accuracy of 96.7% character recognition rate.
Abstract: Local Binary Pattern (LBP) is a simple yet robust texture descriptor that has been widely used in many computer vision applications including face recognition. In this paper, we exploit LBP for handwritten Bangla numeral recognition. We classify Bangla digits from their LBP histograms using a K Nearest Neighbors (KNN) classifier. The performance of three different variations of LBP (the basic LBP, the uniform LBP and the simplified LBP) was investigated. The proposed OCR system was evaluated on the off-line handwritten Bangla numeral database CMATERdb 3.1.1, and achieved an excellent accuracy of 96.7% character recognition rate.

63 citations
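
The pipeline described above (LBP histogram per image, then a KNN vote) can be sketched roughly as follows. This covers only the basic 3x3 LBP variant; the neighbour ordering, the `>=` comparison, the Euclidean distance and all function names are illustrative assumptions rather than the paper's exact settings.

```python
from collections import Counter

# Neighbour offsets (row, col), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 LBP codes over interior pixels.

    image is a list of rows of grey levels."""
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def knn_predict(train, query, k=3):
    """Majority vote among the k training histograms nearest to query.

    train is a list of (histogram, label) pairs; distance is squared Euclidean."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

A digit image would be converted with `lbp_histogram` once, and the resulting 256-value vector passed to `knn_predict` together with the labelled training histograms.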


Proceedings ArticleDOI
16 Mar 2015
TL;DR: The proposed DBNN structure for Arabic handwritten character/word recognition is not yet able to deal with high-dimensional data and thus has to be improved.
Abstract: In the handwriting recognition field, deep learning is becoming the new trend thanks to its ability to deal with unlabeled raw data, especially given the huge amount of raw data available nowadays. In this paper, we investigate a Deep Belief Neural Network (DBNN) for Arabic handwritten character/word recognition. The proposed system takes the raw data as input and proceeds with a greedy layer-wise unsupervised learning algorithm. The approach was tested on two different databases. At the character level, the results were promising, with a classification error rate of 2.1% on the HACDB database. Unlike at the character level, the word-level evaluation on the ADAB database shows an error rate that exceeds 40%. Hence, the proposed DBNN structure is not yet able to deal with high-dimensional data and thus has to be improved.

39 citations


Journal ArticleDOI
TL;DR: This work formulates the word segmentation problem as a binary quadratic assignment problem that considers pairwise correlations between the gaps as well as the likelihoods of individual gaps, and estimates all parameters based on the Structured SVM framework so that the proposed method works well regardless of writing styles and written languages without user-defined parameters.
Abstract: Segmentation of handwritten document images into text-lines and words is an essential task for optical character recognition. However, since the features of handwritten documents are irregular and vary from person to person, it is considered a challenging problem. In order to address the problem, we formulate the word segmentation problem as a binary quadratic assignment problem that considers pairwise correlations between the gaps as well as the likelihoods of individual gaps. Even though many parameters are involved in our formulation, we estimate all parameters based on the Structured SVM framework so that the proposed method works well regardless of writing styles and written languages without user-defined parameters. Experimental results on ICDAR 2009/2013 handwriting segmentation databases show that the proposed method achieves state-of-the-art performance on Latin-based and Indian languages.

37 citations


Posted Content
TL;DR: It is reported that the winning entry of text image super-resolution framework has largely improved the OCR performance with low-resolution images used as input, reaching an OCR accuracy score of 77.19%, which is comparable with that of using the original high-resolution images.
Abstract: Text image super-resolution is a challenging yet open research problem in the computer vision community. In particular, low-resolution images hamper the performance of typical optical character recognition (OCR) systems. In this article, we summarize our entry to the ICDAR2015 Competition on Text Image Super-Resolution. Experiments are based on the provided ICDAR2015 TextSR dataset (3) and the released Tesseract-OCR 3.02 system (1). We report that our winning entry of text image super-resolution framework has largely improved the OCR performance with low-resolution images used as input, reaching an OCR accuracy score of 77.19%, which is comparable with that of using the original high-resolution images (78.80%). Index Terms: super resolution; optical character recognition.

33 citations


Proceedings ArticleDOI
19 Mar 2015
TL;DR: This work proposes a combined horizontal and vertical projection feature extraction scheme for recognition of Gurmukhi characters, an Indic script commonly used in the state of Punjab in India.
Abstract: Despite the advancements in Optical Character Recognition (OCR) technologies, the problem of Indic script character recognition remains challenging. The challenges are even greater in the case of handwritten characters. In this work, we focus on off-line recognition of handwritten characters of Gurmukhi, an Indic script commonly used in the state of Punjab in India. As a part of this work, we collected a Gurmukhi character dataset of 3500 images from 10 writers. We propose a combined horizontal and vertical projection feature extraction scheme for recognition of Gurmukhi characters. We have tested our method on the collected dataset and achieved a high character recognition accuracy of 98.06%.

27 citations
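
A combined horizontal and vertical projection feature, as named in the abstract above, can be illustrated with a minimal sketch. It assumes a binarised character image with 1 for ink and 0 for background; the function name and the plain concatenation are assumptions, since the listing does not describe how the paper combines or normalises the two profiles.

```python
def projection_features(binary_image):
    """Concatenate the per-row and per-column ink counts of a binary image.

    binary_image is a list of equal-length rows of 0/1 values."""
    horizontal = [sum(row) for row in binary_image]        # one count per row
    vertical = [sum(col) for col in zip(*binary_image)]    # one count per column
    return horizontal + vertical
```

For an image of height H and width W this yields a feature vector of length H + W, which is why character images are usually resized to a common size first.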


Proceedings ArticleDOI
01 Nov 2015
TL;DR: An algorithm for handwritten digit recognition based on projection histograms is described, with classification performed by 45 carefully tuned support vector machines (SVM) using the One Against One strategy.
Abstract: Higher-level image processing usually involves some kind of recognition. Digit recognition is common in applications, and handwritten digit recognition is an important subfield. Handwritten digits are characterized by large variations, so template matching, in general, is not very efficient. In this paper we describe an algorithm for handwritten digit recognition based on projection histograms. Classification is performed by 45 carefully tuned support vector machines (SVM) using the One Against One strategy. Our proposed algorithm was tested on standard benchmark images from the MNIST database and achieved a remarkable global accuracy of 99.05%, with possibilities for further improvement.

26 citations
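
The figure of 45 classifiers follows from the One Against One strategy: ten digit classes give 10 × 9 / 2 = 45 unordered class pairs, one binary SVM per pair. The sketch below shows the pair enumeration and the usual majority vote; the pairwise classifiers are stand-in callables (hypothetical), not trained SVMs.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; each pair gets its own binary classifier."""
    return list(combinations(classes, 2))


def one_vs_one_predict(pairwise_classifiers, x):
    """Majority vote over pairwise decisions.

    pairwise_classifiers maps (a, b) to a callable that returns a or b for x."""
    votes = Counter(clf(x) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]
```

In a real system each callable would wrap a binary SVM trained only on samples of its two classes.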


Proceedings ArticleDOI
01 Nov 2015
TL;DR: The general architecture of a modern OCR system is discussed with details of each module, and Moore neighborhood tracing is applied for extracting the boundary of characters, followed by chain rule for feature extraction.
Abstract: Artificial intelligence, pattern recognition and computer vision have significant importance in the field of electronics and image processing. Optical character recognition (OCR) is one of the main aspects of pattern recognition and has evolved greatly since its beginning. OCR is a system which recognizes the readable characters from optical data and converts them into digital form. Various methodologies have been developed for this purpose using different approaches. In this paper, the general architecture of a modern OCR system is discussed with details of each module. We applied Moore neighborhood tracing for extracting the boundary of characters and then chain rule for feature extraction. In the classification stage for character recognition, an SVM is trained and applied to a suitable example.
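
Boundary tracing is commonly followed by Freeman chain coding, and the sketch below assumes that this is what the abstract's "chain rule" refers to; that reading is an assumption, not something the listing confirms. The boundary is taken as already traced (for example by Moore neighborhood tracing), and the direction numbering is one common convention.

```python
# Freeman 8-direction codes for (row step, col step), rows increasing downward:
# 0 = east, then counter-clockwise in 45 degree steps.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(boundary):
    """Encode an ordered list of 8-connected (row, col) boundary pixels
    as the sequence of direction codes between consecutive pixels."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return codes
```

A histogram of the resulting codes is one simple fixed-length feature that could be fed to an SVM.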

Proceedings ArticleDOI
21 May 2015
TL;DR: The proposed BHNR-CNN normalizes the written numeral images and then employs a CNN to classify individual numerals; it shows satisfactory recognition accuracy and outperforms other prominent existing methods.
Abstract: Recognition of handwritten numerals has gained much interest in recent years due to its various application potentials. Although Bangla is a major language in the Indian subcontinent and is the first language of Bangladesh, studies regarding Bangla handwritten numeral recognition (BHNR) are very few compared with other major languages such as Roman. The existing BHNR methods use distinct feature extraction techniques and various classification tools in their recognition schemes. Recently, the convolutional neural network (CNN) has been found efficient for image classification with its distinct features. It also automatically provides some degree of translation invariance. In this paper, a CNN-based BHNR is investigated. The proposed BHNR-CNN normalizes the written numeral images and then employs a CNN to classify individual numerals. Unlike other related works, it does not employ any feature extraction method. 17000 handwritten numerals with different shapes, sizes and variations are used in this study. The proposed method shows satisfactory recognition accuracy and outperforms other prominent existing methods.

Book ChapterDOI
17 Jun 2015
TL;DR: In this paper, the authors propose to take existing binarization techniques, in order to retain their advantages, and modify them in such a way that some of the original grayscale information is preserved and considered by the subsequent recognizer.
Abstract: The amount of digitized legacy documents has been rising over the last years, due mainly to the increasing number of on-line digital libraries publishing this kind of document. The vast majority of them are still waiting to be transcribed to provide historians and other researchers with new ways of indexing, consulting and querying them. However, the accuracy of state-of-the-art Handwritten Text Recognition techniques decreases dramatically when they are applied to these historical documents. This is mainly due to typical paper degradation problems. Therefore, robust pre-processing techniques are an important step for helping further recognition steps. This paper proposes to take existing binarization techniques, in order to retain their advantages, and modify them in such a way that some of the original grayscale information is preserved and considered by the subsequent recognizer. Results are reported with the publicly available ESPOSALLES database.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: The main aim of this paper is to propose efficient feature extraction and classification techniques for an OCR system for handwritten Kannada characters and numerals, which involves several phases such as preprocessing, feature extraction and classification.
Abstract: Handwritten character recognition is a frontier area of research in the field of pattern recognition and image processing. This leads to a great demand for OCR systems that handle handwritten documents. In order to recognize the text present in a document, an Optical Character Recognition (OCR) system is developed. In this paper, an OCR system for handwritten Kannada characters and numerals is developed, which involves several phases such as preprocessing, feature extraction and classification. Preprocessing includes the techniques that are suitable to convert the input image into an acceptable form for feature extraction. The main aim of this paper is to propose efficient feature extraction and classification techniques. Suitable features are extracted as structural features, and the wavelet transform is employed for extracting global features. An artificial neural network classifier is used for recognizing the handwritten Kannada characters and numerals. The proposed method was tested on 4800 images of handwritten Kannada characters and obtained an average accuracy of 91.00%. It was also tested on 1000 images of handwritten Kannada numerals and obtained an average accuracy of 97.60%.

Book ChapterDOI
24 Nov 2015
TL;DR: The proposed character segmentation technique can be used as a part of an OCR system for cursive handwritten Hindi language and can cope with high variations in writing style and skewed header lines as input.
Abstract: The proper character-level segmentation of printed or handwritten text is an important preprocessing step for optical character recognition (OCR). Languages that are cursive in nature make the segmentation problem much more complicated. Hindi is one of the well-known languages in India with this cursive nature in its writing style. The main challenge in handwritten character segmentation is to handle the inherent variability in the writing style of different individuals. In this paper, we present an efficient character segmentation method for handwritten Hindi words. Segmentation is performed on the basis of some structural patterns observed in the writing style of this language. The proposed method can cope with high variations in writing style and skewed header lines as input. The method has been tested on our own database for both printed and handwritten words. The average success rate is 96.93%. The method yields fairly good results for this database compared with other existing methods. We foresee that the proposed character segmentation technique can be used as a part of an OCR system for cursive handwritten Hindi language.

Proceedings ArticleDOI
25 Mar 2015
TL;DR: This work provides a comprehensive review of these methods for off-line handwritten Arabic text recognition and presents recognition rates and descriptions of the databases used for the discussed approaches.
Abstract: Research in Arabic handwritten recognition has been of growing interest in the last few decades. This is mainly due to its broad spectrum of applications in different fields such as bank check processing, form data entry, postal mail sorting, automatic processing of old manuscripts, etc. In the literature, numerous techniques have been proposed for feature extraction and applied to various types of images. This work provides a comprehensive review of these methods for off-line handwritten Arabic text recognition. It also presents recognition rates and descriptions of the databases used for the discussed approaches. This paper includes background on the field, discussion of feature extraction methods, and future research directions.

Journal ArticleDOI
25 May 2015
TL;DR: In this paper, concentric rectangles and convex hull-based features are designed in order to classify word images belonging to different classes and a neural network-based classifier is chosen on the basis of the performances of different classifiers and some statistical tests.
Abstract: Holistic word recognition is the current trend for handwritten word recognition. The holistic paradigm in handwritten word recognition considers a word as a single, indivisible entity and attempts to recognise words from their overall shape rather than recognising the individual characters comprising the word. In the present work, concentric rectangles and convex hull-based features are designed in order to classify word images belonging to different classes. For the evaluation of the current technique, 2,754 handwritten Bangla word samples are collected from different sources. A neural network-based classifier is chosen on the basis of the performances of different classifiers and some statistical tests. The recognition performance of the technique is evaluated using a three-fold cross-validation method. From the experimental results, it is observed that the proposed technique correctly recognises 84.74% of word images in the best case.
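
For readers unfamiliar with the evaluation protocol, three-fold cross-validation on 2,754 samples means three rounds, each holding out 918 samples for testing and training on the remaining 1,836. The sketch below shows one way to generate such splits; the function name and the contiguous, unshuffled folds are assumptions (in practice the samples are usually shuffled or stratified by class first).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

The reported accuracy is then typically the mean (or, as in "best case", the maximum) over the k held-out folds.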

Journal ArticleDOI
TL;DR: In order to achieve a better recognition rate, a learning algorithm, Support Vector Machine (SVM) has been implemented and these concepts are experimented on 30 Tamil character sets and achieved an accuracy rate of 88%.
Abstract: Character recognition plays an important role in the field of pattern recognition. Offline character recognition methodology mainly focuses on recognizing the characters irrespective of the difficulties that may arise due to the variations in writing style. Writing style becomes more complex when the characters have a curvy structure. The proposed recognition methodology was applied to 'Tamil', a south Indian language with one of the more complex character structures. The novelty of this process lies in the selection and extraction of the feature sets. Zoning and Chain Code procedures are employed here to select the features, and Sub Line Direction and Bounding box algorithms are used for extracting the features. In order to achieve a better recognition rate, a learning algorithm, Support Vector Machine (SVM), has been implemented. These concepts were tested on 30 Tamil character sets (Vowels and Consonants) and achieved an accuracy rate of 88%.

Proceedings ArticleDOI
02 Mar 2015
TL;DR: A technique of text word recognition based on template matching technique using Correlation coefficient is proposed and overall 76.4 percent word recognition accuracy is achieved which is an encouraging result for off-line word recognition.
Abstract: The recognition of the legal amount present on a bank cheque is a big challenge in automatic bank cheque processing because of the structural complexity of characters and the variability of writing styles. This paper proposes a technique of text word recognition based on template matching using the correlation coefficient. We have developed a database of 61 words, combinations of which can represent any legal amount written in words on an Indian bank cheque. The proposed algorithm is tested on our database, and an overall word recognition accuracy of 76.4 percent is achieved, which is an encouraging result for off-line word recognition.
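
Template matching with a correlation coefficient can be sketched as follows: the query word image is compared with each stored template, and the template with the highest coefficient wins. The sketch assumes the Pearson correlation over pixel values and images already normalised to a common size; the function names and the dictionary of templates are hypothetical.

```python
from math import sqrt


def correlation_coefficient(a, b):
    """Pearson correlation between two equally sized images (lists of rows)."""
    x = [p for row in a for p in row]
    y = [p for row in b for p in row]
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def best_template(image, templates):
    """Return the name of the template most correlated with image.

    templates maps a word label to its template image."""
    return max(templates, key=lambda name: correlation_coefficient(image, templates[name]))
```

With a 61-word vocabulary, `templates` would hold one (or several) reference images per word.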

Proceedings ArticleDOI
01 Sep 2015
TL;DR: An approach to design and implement an off-line OCR system that recognizes Arabic handwritten characters; in this approach Artificial Neural Networks (ANNs) were used as classifiers.
Abstract: Optical Character Recognition (OCR) is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data entry. This paper proposes an approach to design and implement an off-line OCR system that recognizes Arabic handwritten characters; in this approach Artificial Neural Networks (ANNs) were used as classifiers. The ANN was trained based on the Hopfield Algorithm, which was designed using MATLAB. In our system, the image goes through a preprocessing stage, followed by a features extraction stage and a recognition stage. For the recognition to be accurate, certain properties of each of the letters are calculated; these properties, also called features, are extracted from the image. Selection of a relevant feature extraction method is probably the single most important factor in achieving high recognition performance with much better accuracy in character recognition systems. A collection of such features (vectors) define the character uniquely by the means of an ANN. Experimental results showed that the system designed is able to recognize eight Arabic handwritten letters () with a successful recognition rate of (7725). The system designed can be further developed to include the rest of the Arabic Alphabets, and a segmentation stage so that it could recognize words.
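
The abstract does not say how the Hopfield algorithm was configured. As a generic illustration only, the sketch below implements a classic Hopfield network with Hebbian weights and asynchronous updates over bipolar (+1/-1) vectors; treating the paper's classifier as this textbook form, and every name used here, is an assumption.

```python
def train_hopfield(patterns):
    """Hebbian weight matrix for a list of equal-length +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until no unit changes (or max_sweeps is hit)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state
```

Recognition then amounts to storing one feature pattern per letter and checking which stored pattern a noisy input settles into.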

Proceedings ArticleDOI
09 Jul 2015
TL;DR: A page-level script identification technique for eight popular handwritten scripts, namely Bangla, Devanagari, Gurumukhi, Oriya, Tamil, Telugu and Urdu along with Roman, has been proposed, and it yields 95.57% accuracy in identifying the scripts of the documents.
Abstract: Automatic identification of scripts, an imperative research problem during the last few decades, has posed many challenges in any multi-script environment. As India is a multilingual country, text documents containing more than one language are a very familiar phenomenon here. To digitize these multi-lingual documents using any Optical Character Recognition (OCR) engine, it is first required to recognize the scripts in which they are written. In this paper, a page-level script identification technique for eight popular handwritten scripts, namely Bangla, Devanagari, Gurumukhi, Oriya, Tamil, Telugu and Urdu along with Roman, has been proposed. To start with, Modified log-Gabor filter based texture features are designed from each of the document pages. Then the proposed model is evaluated using multiple classifiers and, based on their identification accuracies, it is found that Simple Logistic performs the best. The outcome of the present experiment reveals the usefulness of the Modified log-Gabor filter based features in recognition of handwritten Indic scripts. A total of 240 document pages is used to carry out the present experiment, and it yields 95.57% accuracy in identifying the scripts of the documents. Even though the proposed method is assessed on a limited dataset, considering the intricacies of the scripts, the outcome can be assumed reasonably acceptable.

Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of Handwritten Character Recognition (HCR) in English language.
Abstract: This paper presents a comprehensive review of Handwritten Character Recognition (HCR) in English language. Handwritten character recognition has been applied in a variety of applications such as banking, health care and many other organizations where handwritten documents are dealt with. Handwritten Character Recognition is the process of converting handwritten text into machine-readable form. Handwritten characters are difficult because they differ from one writer to another, and even when the same person writes the same character there is a difference in the shape, size and position of the character. The latest research in this area has used different types of methods, classifiers and features to reduce the complexity of recognizing handwritten text.

Proceedings ArticleDOI
23 Aug 2015
TL;DR: Results from a handwritten Arabic word recognition task show that the approach is promising with good recognition rates; the paper investigates different approaches, including computer-generated text in different typefaces as training data, unsupervised adaptation, and using recognition hypotheses on the test sets as training data.
Abstract: Handwritten text recognition is an active research area in pattern recognition. One of the prerequisites of setting up a handwritten text recognizer is to train it using, mostly, large amounts of labeled training data. In the current paper we report our work on handwritten text recognition using no handwritten training set. We investigate different approaches, including computer-generated text in different typefaces as training data, unsupervised adaptation, and using recognition hypotheses on the test sets as training data. Results from a handwritten Arabic word recognition task show that the approach is promising with good recognition rates.

Journal ArticleDOI
30 Apr 2015
TL;DR: In this paper, the template matching correlation method was used to identify different types of characters with different sizes and shapes, achieving an average recognition success rate of 92.90%.
Abstract: OCR (Optical Character Recognition) is an effective solution to the process of converting printed documents into digital documents. The problem that arises in computer letter recognition is how a recognition technique can identify different types of characters with different sizes and shapes. The recognition method used in this final project is the template matching correlation method. Prior to the recognition process, the input image, in *.bmp or *.jpg format, is first processed in a preprocessing stage, which includes binarization, segmentation, and normalization of images. This system produces an average recognition success rate of 92.90%. The final results showed that the use of the template matching correlation method is effective enough to build an OCR system with good accuracy.

Proceedings ArticleDOI
23 Aug 2015
TL;DR: This paper presents a novel approach to create a synthetic dataset for word recognition systems to improve the performance of off-line handwritten text recognizers by providing them with additional synthetic training data.
Abstract: This paper presents a novel approach to create a synthetic dataset for word recognition systems. Our purpose is to improve the performance of off-line handwritten text recognizers by providing them with additional synthetic training data. Due to the lack of proper datasets for many languages, it becomes hard to train recognition systems. To solve such problems, synthetic handwriting could be used to expand the existing training dataset. Any available digital data from online newspapers and similar sources can be used to generate this synthetic data. The digital data is distorted in such a way that the underlying pattern is conserved for identification of the word by both machine and human user. The images thus produced can be used to train any classification system for handwriting recognition. This data can be used independently to train the system or be combined with natural handwritten data to augment the original dataset and improve the accuracy of the results. We experimented using only synthetic data, obtaining high recognition accuracy in both character and word recognition. The data was tested on 3 Indian scripts for numerals (Hindi, Bengali and Telugu) and 1 script (Hindi) for words; the results achieved are highly promising.

Proceedings ArticleDOI
25 May 2015
TL;DR: Results on a popular handwritten digit recognition benchmark clearly demonstrate that two layers of feature transformations improve generalisation compared to a single layer, and it is shown that the proposed system outperforms several standard Genetic Programming systems.
Abstract: A training protocol for learning deep neural networks, called greedy layer-wise training, is applied to the evolution of a hierarchical, feed-forward Genetic Programming based system for feature construction and object recognition. Results on a popular handwritten digit recognition benchmark clearly demonstrate that two layers of feature transformations improve generalisation compared to a single layer. In addition, we show that the proposed system outperforms several standard Genetic Programming systems, which are based on hand-designed features, and use different program representations and fitness functions.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper compares the performance of the proposed SVMs for AHS recognition with the character recognition rates of state-of-the-art Arabic OCR, with favourable outcomes.
Abstract: Handwriting recognition ranks among the most successful applications in the pattern recognition domain. Despite being a developed field, many open questions remain, and they are particularly challenging for the Arabic Handwritten Script (AHS). Recently, more attention has been given to the Support Vector Machine (SVM) classifier for script recognition. Nevertheless, it has not yet been applied to handwritten Arabic as widely as other methods such as ANN, CNN, RNN and HMM. SVMs for AHS recognition are examined in this paper. The suggested method takes handcrafted features as input and proceeds with a supervised learning algorithm. We chose the Multi-class Support Vector Machine with an RBF kernel and tested it on the Handwritten Arabic Characters Database (HACDB). The simulation results show that the proposed method is effective. We compared the performance of this method with the character recognition rates of state-of-the-art Arabic OCR, with favourable outcomes.

Proceedings ArticleDOI
01 Aug 2015
TL;DR: This paper normalizes images of various sizes and stroke thicknesses in preprocessing to eliminate negative information and keep relevant features, proposes specific feature definitions (structure features, distribution features and projection features), and fuses the multiple features into deep neural networks for semantics recognition.
Abstract: Handwritten digit recognition is an important research topic in computer vision and pattern recognition. This paper proposes an effective handwritten digit recognition approach based on specific multi-feature extraction and deep analysis. First, we normalize images of various sizes and stroke thicknesses in preprocessing to eliminate negative information and keep relevant features. Second, considering that handwritten digit image recognition differs from traditional image semantics recognition, we propose specific feature definitions, including structure features, distribution features and projection features. Moreover, we fuse the multiple features into deep neural networks for semantics recognition. Experimental results on the MNIST benchmark database of handwritten digit images show that the performance of our algorithm is remarkable and demonstrate its superiority over several existing algorithms.
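
As a rough illustration of what projection and distribution features can look like, the sketch below computes row and column pixel counts and per-zone foreground densities for a tiny binary image. The abstract does not give the paper's exact feature definitions, so the function names, the zone-density interpretation of "distribution features" and the 4x4 toy image are all assumptions.

```python
def projection_features(img):
    """Horizontal and vertical projections: foreground count per row and column."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def zone_densities(img, zones=2):
    """Fraction of foreground pixels in each cell of a zones x zones grid.

    Assumes the image height and width are divisible by `zones`.
    """
    h, w = len(img), len(img[0])
    zh, zw = h // zones, w // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            block = [img[r][c]
                     for r in range(zr * zh, (zr + 1) * zh)
                     for c in range(zc * zw, (zc + 1) * zw)]
            feats.append(sum(block) / len(block))
    return feats


# Hypothetical 4x4 binary image loosely resembling the digit "1".
DIGIT_ONE = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]

rows, cols = projection_features(DIGIT_ONE)   # ([2, 1, 1, 3], [0, 2, 4, 1])
zones = zone_densities(DIGIT_ONE)             # [0.25, 0.5, 0.25, 0.75]
feature_vector = rows + cols + zones          # one way to concatenate features
```

Concatenating the vectors is only one simple way to combine features before a classifier; the fusion inside the paper's deep network may work differently.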

Proceedings ArticleDOI
21 Dec 2015
TL;DR: This paper presents a low level stroke feature based method for recognition of online handwritten Gujarati characters and numerals using a nearest neighbor (i.e. K-NN) classifier with k-fold cross validation on the dataset having 4500 samples from 45 different classes.
Abstract: This paper presents a low level stroke feature based method for recognition of online handwritten Gujarati characters and numerals. A reasonably sized database of online handwritten Gujarati characters and numerals has been developed. This is the first such database of online handwritten symbols for Gujarati script. The hierarchical histograms of twelve different low level stroke features and eight directional features were generated to capture the variation in strokes at different levels. Recognition is performed using a nearest neighbor (i.e. K-NN) classifier with k-fold cross validation on the dataset having 4500 samples from 45 different classes (37 characters and 8 numerals). Overall recognition rates achieved are 95%, 93% and 90% for the numerals dataset, the characters dataset and the combined dataset of numerals and characters, respectively.
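
The evaluation scheme named in the abstract above, a nearest neighbor classifier scored with k-fold cross validation, can be sketched in plain Python as follows. This is not the paper's implementation: the Euclidean distance, the interleaved fold assignment, the parameter values and the toy two-class data are assumptions. Note that the `k` in K-NN (number of neighbours) and the number of folds are independent parameters.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to `query`.

    `train` is a list of (feature_tuple, label) pairs.
    """
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(samples, folds=5, k=3):
    """Hold out each fold once, train on the rest, return overall accuracy."""
    correct = 0
    for f in range(folds):
        # Interleaved split: sample i belongs to fold i % folds (an assumption;
        # real experiments usually shuffle or stratify first).
        test = samples[f::folds]
        train = [s for i, s in enumerate(samples) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(samples)


# Hypothetical, well-separated two-class data standing in for stroke features.
SAMPLES = ([((i * 0.1, 0.0), "a") for i in range(5)]
           + [((10 + i * 0.1, 10.0), "b") for i in range(5)])

accuracy = k_fold_accuracy(SAMPLES, folds=5, k=3)
```

Every sample is used for testing exactly once, so the returned accuracy is computed over the whole dataset rather than a single held-out split.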

Proceedings ArticleDOI
01 Sep 2015
TL;DR: The work described in this paper presents the efficiency of Zernike moments over Hu's seven moments with zoning for automatic recognition of handwritten ‘MODI’ characters.
Abstract: HOCR stands for Handwritten Optical Character Recognition. HOCR recognizes handwritten characters from a digital image of documents. Shape identification and feature extraction are a very important part of any OCR. Feature extraction defines the shape of the character as precisely and as uniquely as possible. Zernike moments describe shape; they are orthogonal, and their magnitudes are rotation invariant. ‘MODI’ is an ancient script of India with a cursive and complex representation of characters. The work described in this paper presents the efficiency of Zernike moments over Hu's seven moments with zoning for automatic recognition of handwritten ‘MODI’ characters. A recognition rate of 82.61% was achieved using a zone-based approach with Zernike moments.
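
To make the idea of a rotation-invariant moment feature concrete, the sketch below computes the first two of Hu's seven invariant moments for a small binary shape and for the same shape rotated by 90 degrees. It covers only the Hu baseline mentioned in the abstract; Zernike moments and the paper's zoning scheme are not implemented here, and the helper names and toy shape are assumptions.

```python
def central_moment(img, p, q):
    """Central moment mu_pq of a 2-D intensity grid (list of rows)."""
    m00 = sum(sum(row) for row in img)
    xbar = sum(x * v for row in img for x, v in enumerate(row)) / m00
    ybar = sum(y * v for y, row in enumerate(img) for v in row) / m00
    return sum(((x - xbar) ** p) * ((y - ybar) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_first_two(img):
    """First two Hu invariants, built from normalized central moments eta_pq."""
    mu00 = central_moment(img, 0, 0)

    def eta(p, q):
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2


def rotate90(img):
    """Rotate a grid 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]


# Hypothetical L-shaped stroke used only to demonstrate invariance.
SHAPE = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]

original = hu_first_two(SHAPE)
rotated = hu_first_two(rotate90(SHAPE))  # equal to `original` up to rounding
```

A 90-degree rotation swaps the two second-order moments and flips the sign of the mixed moment, which leaves both invariants unchanged; this is the property that makes such features attractive for handwritten characters written at varying orientations.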