Showing papers on "Intelligent word recognition" published in 2017


Journal ArticleDOI
TL;DR: This paper proposes a novel scene text recognition technique that performs word level recognition without character segmentation and adapts the recurrent neural network with Long Short Term Memory, the technique that has been widely used for handwriting recognition in recent years.

129 citations


Journal ArticleDOI
TL;DR: In the present work, a non-explicit feature based approach, more specifically a multi-column multi-scale convolutional neural network (MMCNN) based architecture, has been proposed for this purpose, and a deep quad-tree based staggered prediction model has been proposed for faster character recognition.

88 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: In this article, the authors proposed a novel algorithm based on deep learning neural networks using appropriate activation function and regularization layer, which shows significantly improved accuracy compared to the existing Arabic numeral recognition methods.
Abstract: Handwritten character recognition is an active area of research with applications in numerous fields. Past and recent works in this field have concentrated on various languages. Arabic is one language where the scope of research is still wide, as it is one of the most popular languages in the world and is syntactically different from other major languages. Das et al. [1] pioneered research on handwritten digit recognition in Arabic. In this paper, we propose a novel algorithm based on deep learning neural networks using an appropriate activation function and regularization layer, which shows significantly improved accuracy compared to the existing Arabic numeral recognition methods. The proposed model gives 97.4 percent accuracy, which is the highest recorded accuracy on the dataset used in the experiment. We also propose a modification of the method described in [1], where our method matches the accuracy of [1], at 93.8 percent.

75 citations


Proceedings ArticleDOI
01 Mar 2017
TL;DR: The proposed system uses a convolutional neural network to extract features and is tested against a newly constructed dataset of six Malayalam characters.
Abstract: Optical Character Recognition is the process of converting an input text image into a machine encoded format. Different methods are used in OCR for different languages. The main steps of optical character recognition are pre-processing, segmentation and recognition. Recognizing handwritten text is harder than recognizing printed text. Convolutional neural networks have shown remarkable improvement in recognizing characters of other languages, but they have not yet been implemented for Malayalam handwritten characters. The proposed system uses a convolutional neural network to extract features. This method differs from the conventional approach, which requires handcrafted features for describing the text. We have tested the network against a newly constructed dataset of six Malayalam characters.

28 citations


Journal ArticleDOI
TL;DR: Considering the complexities of Hindi characters, the technique shows an impressive result using a Multilayer Perceptron (MLP) based classifier and is scale and rotation invariant to a significant extent.
Abstract: Holistic word recognition attempts to recognize the entire word image as a single pattern. In general, it performs better than segmentation based word recognition models for a known, fixed and small sized lexicon. The present work deals with recognition of handwritten words in Hindi in a holistic way. Features like area, aspect ratio, density, pixel ratio, longest run, centroid and projection length are extracted either from the entire word image or from hypothetically generated sub-images of the same. An 89-element feature vector has been designed to represent each word in the feature space, and five different classifiers have been used for measuring recognition performance. Considering the complexities of Hindi characters, the technique shows an impressive result using a Multilayer Perceptron (MLP) based classifier. Moreover, the technique is scale and rotation invariant to a significant extent.

28 citations
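
Several of the holistic features named in the abstract above (area, aspect ratio, density, longest run, centroid, projection length) can be illustrated with a minimal Python sketch. The function name `holistic_features`, the 0/1 list-of-rows image representation, and the exact feature definitions are assumptions made for illustration; the paper's 89-element vector and its sub-image scheme are not reproduced here.

```python
def holistic_features(img):
    """Compute a few whole-word shape features from a binary image.

    img is a list of equal-length rows of 0/1 ints, where 1 marks ink.
    All feature definitions below are illustrative assumptions.
    """
    ink = [(r, c) for r, row in enumerate(img) for c, p in enumerate(row) if p]
    if not ink:
        raise ValueError("image contains no ink pixels")

    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    height = max(rows) - min(rows) + 1   # bounding-box height
    width = max(cols) - min(cols) + 1    # bounding-box width
    area = len(ink)                      # number of ink pixels

    # Longest horizontal run of consecutive ink pixels in any row.
    longest_run = 0
    for row in img:
        run = 0
        for p in row:
            run = run + 1 if p else 0
            longest_run = max(longest_run, run)

    return {
        "area": area,
        "aspect_ratio": width / height,
        "density": area / (width * height),   # ink share of the bounding box
        "longest_run": longest_run,
        "centroid": (sum(rows) / area, sum(cols) / area),
        "row_projection": [sum(row) for row in img],  # ink count per row
    }


# Hypothetical 4x5 "word image" used only to exercise the function.
sample = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
features = holistic_features(sample)
```

In a holistic recognizer, a vector of such values would be fed to a classifier such as an MLP; which features matter most is an empirical question the sketch does not address.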



Journal ArticleDOI
TL;DR: The paper describes the behavior of different neural network models used in OCR, including a multilayer feed-forward network with backpropagation, and some basic algorithms for character segmentation, character normalization and de-skewing.
Abstract: The objective of this paper is to recognize the characters in given scanned documents and to study the effects of changing the ANN model. Today, neural networks are mostly used for pattern recognition tasks. The paper describes the behavior of different neural network models used in OCR, which is a widespread application of neural networks. We have considered parameters such as the number of hidden layers, the size of the hidden layers and the number of epochs. We have used a multilayer feed-forward network with backpropagation. In preprocessing, we have applied some basic algorithms for segmentation of characters, normalization of characters and de-skewing. We have used different neural network models and applied the test set to each to find the accuracy of the respective network.

26 citations


Proceedings ArticleDOI
22 Mar 2017
TL;DR: The general difficulties in Arabic language text, the main process of a typical OCR system and some enhancements to Arabic OCR systems are described, and a novel approach for identifying handwritten isolated Arabic characters using encoded Freeman chain code is described.
Abstract: Optical Character Recognition (OCR) is the process of identifying text in an image and converting it into a digital form. Several approaches have been attempted to accurately recognize characters in printed Arabic. This survey focuses on OCR for handwritten Arabic. We describe the general difficulties of Arabic text, the main process of a typical OCR system and some enhancements to Arabic OCR systems. We also describe a novel approach for identifying handwritten isolated Arabic characters using an encoded Freeman chain code. Several handwritten Arabic characters were trained and tested, and the preliminary experimental results are promising.

24 citations
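
For readers unfamiliar with the Freeman chain code mentioned in the abstract above, the following Python sketch shows the basic 8-direction encoding of a traced contour. The direction numbering (0 = east, counted counter-clockwise, with image rows growing downward), the function names, and the histogram feature are assumptions for illustration; the paper's specific "encoded" variant is not described in the abstract and is not reproduced here.

```python
# Map a (row step, column step) between neighbouring pixels to a direction code.
# Assumed convention: 0 = east, numbered counter-clockwise, rows grow downward.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def freeman_chain_code(points):
    """Encode a list of (row, col) contour points as 8-direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes):
    """Normalised frequency of each of the 8 directions (a simple shape feature)."""
    if not codes:
        return [0.0] * 8
    return [codes.count(d) / len(codes) for d in range(8)]


# Hypothetical contour: a unit square traced east, south, west, north.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = freeman_chain_code(square)
histogram = direction_histogram(codes)
```

The chain code depends on the starting point and on scale, so practical systems usually normalize it before classification.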


Proceedings ArticleDOI
01 Aug 2017
TL;DR: The aim is to develop an efficient method that uses a custom image to train the classifier and extracts distinct features from the input image to classify its contents as characters, specifically letters and digits.
Abstract: The aim is to develop an efficient method which uses a custom image to train the classifier. This OCR extracts distinct features from the input image for classifying its contents as characters, specifically letters and digits. Input to the system is digital images containing the patterns to be classified. The analysis and recognition of patterns in images are becoming more complex, yet more tractable with advances in technology. Therefore it is proposed to develop sophisticated strategies of pattern analysis to cope with these difficulties. The present work involves application of pattern recognition using KNN to recognize handwritten or printed text.

20 citations


Journal ArticleDOI
TL;DR: A new handwritten alphanumeric character database for Odia is created and reported in this paper in order to address the paucity of benchmark Odia database.
Abstract: Character recognition is one of the challenging tasks in pattern recognition and machine learning. Though a level of saturation has been reached in machine-printed character recognition, there still remains a gap in recognizing handwritten scripts. In this paper, we summarize the existing research efforts on the recognition of printed as well as handwritten Odia alphanumeric characters. Odia is a classical and popular language in the Indian subcontinent used by more than 50 million people. In spite of its rich history, popularity and usefulness, few research efforts have been made to achieve human-level accuracy in Odia OCR. This review is expected to serve as a benchmark reference for research on Odia character recognition and inspire OCR research communities to make a tangible impact on its growth. Here, several preprocessing methodologies, segmentation approaches, feature extraction techniques and classifier models, with their respective accuracies reported so far, are critically reviewed, evaluated and compared. The shortcomings and deficiencies in the current state of the art are discussed in detail for each stage of character recognition. A new handwritten alphanumeric character database for Odia is created and reported in this paper in order to address the paucity of benchmark Odia databases. From the existing research work, future research paradigms on Odia character recognition are suggested. We hope that such a comprehensive survey on Odia character recognition will serve its purpose of being a solid reference and help create high-accuracy Odia character recognition systems.

17 citations


Journal ArticleDOI
TL;DR: A new approach for handwritten digit recognition that uses a small number of patterns for the training phase and the Bag of Visual Words (BoVW) technique to construct image feature vectors, improving the performance of isolated Farsi/Arabic handwritten digit recognition.
Abstract: Handwritten digit recognition has long been a challenging problem in the field of optical character recognition and is of great importance in industry. This paper develops a new approach for handwritten digit recognition that uses a small number of patterns for the training phase. To improve the performance of isolated Farsi/Arabic handwritten digit recognition, we use the Bag of Visual Words (BoVW) technique to construct image feature vectors. Each visual word is described by the Scale Invariant Feature Transform (SIFT) method. For learning the feature vectors, a Quantum Neural Network (QNN) classifier is used. Experimental results on a very popular Farsi/Arabic handwritten digit dataset (the HODA dataset) show that the proposed method achieves the highest recognition rate compared to other state-of-the-art methods.

Posted Content
12 Nov 2017
TL;DR: The arbitrary orientation network (AON) is developed to capture the deep features of irregular texts (e.g. arbitrarily-oriented, perspective or curved), which are combined into an attention-based decoder to generate character sequence.
Abstract: Recognizing text from natural images is still a hot research topic in computer vision due to its various applications. Despite the enduring research of several decades on optical character recognition (OCR), recognizing texts from natural images is still a challenging task. This is because scene texts are often in irregular arrangements (curved, arbitrarily-oriented or seriously distorted), which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to capture the deep features of irregular texts (e.g. arbitrarily-oriented, perspective or curved), which are combined into an attention-based decoder to generate character sequence. The whole network can be trained end-to-end by using only images and word-level labels. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method substantially outperforms the existing methods.

Journal ArticleDOI
TL;DR: Principal Component Analysis (PCA) is employed to reduce the feature vector dimensions for Persian handwritten digit recognition, and the experimental results show significant improvement in the accuracy of Persian handwritten OCR compared to previous methods.
Abstract: Feature extraction is one of the most important steps in Optical Character Recognition (OCR) systems and strongly affects recognition accuracy. In this paper, a suitable combination of different features such as zoning, hole size, crossing counts, etc. is proposed for Persian handwritten digit recognition. Because the number of features is high, the feature vector has a high dimension, which increases training time exponentially. To solve this problem, Principal Component Analysis (PCA) is employed to reduce the feature vector dimensions. Finally, the data are classified by a Support Vector Machine (SVM). The proposed method has been executed on the HODA dataset, one of the largest standard datasets of Persian handwritten digits, which includes 60000 training and 20000 test samples. The proposed method reaches 99.07% accuracy on this dataset, and the experimental results show significant improvement in the accuracy of Persian handwritten OCR compared to previous methods.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The work proposed in this paper tries to automate recognition of handwritten Hindi isolated characters using multiple classifiers, with histogram of oriented gradients and profile projection histogram as features.
Abstract: Humans can easily recognize handwritten words after gaining basic knowledge of languages. This knowledge needs to be transferred to computers for automatic character recognition. The work proposed in this paper tries to automate recognition of handwritten Hindi isolated characters using multiple classifiers. For feature extraction, it uses histogram of oriented gradients as one feature and profile projection histogram as another feature. The performance of various classifiers has been evaluated experimentally using these features, and quadratic SVM has been found to produce better results.

Proceedings ArticleDOI
24 Jun 2017
TL;DR: This work proposes a method that combines a deep convolutional neural network and a support vector machine: the network learns and extracts Chinese character features automatically, and the extracted features are then classified and identified by the support vector machine.
Abstract: For the past several decades, offline handwritten character recognition has been widely and deeply studied. The requirements on identification results are constantly increasing in practical applications. However, the recognition rates for similar handwritten Chinese characters are not very high across different writing styles, writing environments and writing modes. We propose a method that combines a deep convolutional neural network and a support vector machine. The deep convolutional neural network learns and extracts Chinese character features automatically, and the extracted features are then classified and identified by the support vector machine. Experiments show that the deep convolutional neural network can extract the features effectively, which avoids the shortcomings of manual feature extraction, and that using the support vector machine for classification and identification further improves the accuracy rate.

Journal ArticleDOI
TL;DR: This paper attempts to combine neural network technology and character recognition technology and proposes an effective new method of handwritten character recognition, offering a feasible new way to solve practical difficulties in handwritten character recognition.
Abstract: Offline handwritten character recognition is an important branch of character recognition, which involves image processing, digital signal processing, artificial intelligence, fuzzy mathematics, information theory, computer science and other disciplines. Off-line handwritten character recognition processes only two-dimensional character dot images, and it faces problems such as too many character classes, complex font structure and large deformation of handwritten characters. Currently, off-line handwritten character recognition technology is still immature and remains in the laboratory research stage. This paper attempts to combine neural network technology and character recognition technology and proposes an effective new method of handwritten character recognition, offering a feasible new way to solve practical difficulties in handwritten character recognition. In this paper, according to the building process of the multi-level...

Proceedings ArticleDOI
01 Feb 2017
TL;DR: A method for handwritten text recognition (HWR) of Comenia script is proposed, comprising preprocessing and normalization of data and optical character recognition based on an SVM classifier.
Abstract: Comenia script is a novel handwritten text introduced at primary schools in the Czech Republic. This paper describes a method for handwritten text recognition (HWR) of this font. In particular, it proposes a method for preprocessing and normalization of data and optical character recognition based on an SVM classifier. We have trained and statistically evaluated several models, focusing on recognition of different styles of writing of the same characters, for forensic purposes and identification of the author of a document. The best model has achieved 92.86% accuracy without any further postprocessing, e.g. a spellchecker. We also propose using more than one classification model for character recognition, which has been shown to increase accuracy when compared to a single-model approach.

Proceedings ArticleDOI
17 Apr 2017
TL;DR: This work investigates the applicability of a Deep Convolutional Neural Network on the recently proposed database referred to as OIHACDB, and trains the model under the Theano framework with dropout as a regularization technique to avoid over-fitting and improve generalization performance.
Abstract: The recognition of Arabic writing is still an important challenge due to its cursive nature and high topological variability. Traditional machine-learning techniques required careful engineering and considerable domain expertise to transform raw data into a feature vector from which the classifier could classify the input pattern. In recent years, the deep learning approach has acquired a reputation for solving many computer vision problems, and its application to the field of Handwritten Arabic Character Recognition (HACR) has been shown to provide significantly better results than traditional methods. In this work, we investigate the applicability of a Deep Convolutional Neural Network on the recently proposed database referred to as OIHACDB. The proposed model normalizes the handwritten character images and then employs a Deep Convolutional Neural Network to classify them. The model was trained under the Theano framework with dropout as a regularization technique to avoid over-fitting and give better generalization performance. Our results show satisfactory recognition accuracy (97.32%) and outperform some other prominent existing methods.

Journal Article
TL;DR: Evaluation of the proposed scheme on the IFN/ENIT database of Arabic handwritten words reveals that combining the classifiers improves recognition rates, which in some cases outperform state-of-the-art recognition systems.
Abstract: This study investigates the combination of different classifiers to improve Arabic handwritten word recognition. Features based on Discrete Cosine Transform (DCT) and Histogram of Oriented Gradients (HOG) are computed to represent the handwritten words. The dimensionality of the HOG features is reduced by applying Principal Component Analysis (PCA). Each set of features is separately fed to two different classifiers, Support Vector Machine (SVM) and Fuzzy K-Nearest Neighbor (FKNN), giving a total of four independent classifiers. A set of different fusion rules is applied to combine the outputs of the classifiers. Evaluation of the proposed scheme on the IFN/ENIT database of Arabic handwritten words reveals that combining the classifiers improves recognition rates, which in some cases outperform state-of-the-art recognition systems.
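
The idea of combining four classifier outputs with fusion rules, as described in the abstract above, can be sketched in Python as follows. The abstract does not say which rules were used; the sum, product, max, min and majority-vote rules shown here are common choices and are assumptions, as are the function names, the class labels and all score values.

```python
import math
from collections import Counter

# Score-level fusion rules: each combines one class's scores across classifiers.
RULES = {"sum": sum, "product": math.prod, "max": max, "min": min}


def fuse_scores(outputs, rule="sum"):
    """Fuse per-classifier score dicts ({class: score}) and return the winner."""
    combine = RULES[rule]
    classes = outputs[0].keys()
    fused = {c: combine([scores[c] for scores in outputs]) for c in classes}
    return max(fused, key=fused.get)


def majority_vote(outputs):
    """Decision-level fusion: each classifier votes for its top-scoring class."""
    votes = Counter(max(scores, key=scores.get) for scores in outputs)
    return votes.most_common(1)[0][0]


# Hypothetical scores from four classifiers (DCT+SVM, DCT+FKNN, HOG+SVM,
# HOG+FKNN) for two candidate words; the numbers are made up.
outputs = [
    {"word_a": 0.90, "word_b": 0.10},
    {"word_a": 0.40, "word_b": 0.60},
    {"word_a": 0.45, "word_b": 0.55},
    {"word_a": 0.40, "word_b": 0.60},
]
by_sum = fuse_scores(outputs, "sum")          # one confident classifier dominates
by_product = fuse_scores(outputs, "product")
by_vote = majority_vote(outputs)              # three of four prefer word_b
```

With these made-up scores the sum rule and the majority vote disagree, which illustrates why a study would compare several fusion rules rather than fix one in advance.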

Proceedings ArticleDOI
01 Apr 2017
TL;DR: This work proposes a tailored dataset and a carefully designed model that can be trained only on machine-generated character images with various typefaces and that achieves an excellent result on machine-generated images as well as decent accuracy in detecting handwritten characters.
Abstract: While the task of Optical Character Recognition is deemed to be a solved problem in many languages, it still requires certain improvements in some languages with more complex script structures, such as Farsi. Furthermore, Deep Convolutional Neural Networks have reached excellent results in various computer vision tasks, including character recognition. However, these networks require a great amount of data to be trained properly and (in some cases) lack generalization. To address this issue, we propose a tailored dataset and a carefully designed model that can be trained only on machine-generated character images with various typefaces and that achieves an excellent result on machine-generated images as well as decent accuracy in detecting handwritten characters.

Proceedings ArticleDOI
07 Mar 2017
TL;DR: An automatic license plate recognition system for the three different Iraqi car license plates is proposed in this paper; the three styles are differentiated by plate size.
Abstract: A license plate recognition (LPR) system is an important system in daily life. LPR is an image processing and character recognition system that is used to distinguish one car from others. An automatic license plate recognition system for the three different Iraqi car license plates is proposed in this paper. The three styles are differentiated by plate size. Optical character recognition (OCR) is used with a correlation approach and template matching for plate recognition by segmenting each number, character and word into sub-images. The software used is MATLAB R2014a. The algorithm was successfully constructed, and a sample of images was correctly identified.
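
The correlation-based template matching step mentioned in the abstract above can be illustrated with a small sketch. The paper used MATLAB; the version below is a Python stand-in that computes a normalized 2-D correlation coefficient (the same quantity MATLAB's `corr2` returns) between a segmented sub-image and each stored template. The function names, the 3x3 templates and the noisy glyph are hypothetical.

```python
import math


def correlation_2d(a, b):
    """Normalized correlation coefficient between two equal-size 2-D arrays."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    if len(fa) != len(fb):
        raise ValueError("images must have the same size")
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in fa) * sum((y - mean_b) ** 2 for y in fb)
    )
    return num / den if den else 0.0  # constant images carry no shape information


def recognize(glyph, templates):
    """Return the label of the template that correlates best with the glyph."""
    return max(templates, key=lambda label: correlation_2d(glyph, templates[label]))


# Hypothetical 3x3 binary templates and a noisy "1" with one extra ink pixel.
templates = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
}
glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
best = recognize(glyph, templates)
```

Real plate characters would first be resized to the template size, since the coefficient is only defined for arrays of equal shape.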

Journal ArticleDOI
TL;DR: The core contribution of this research is the development of a new classification technique that is based on the MLP, which can be identified in handwritten documents as the binary digits ‘0’ and ‘1’.
Abstract: Handwritten digit recognition is an established and significant problem in computer vision and pattern recognition, and a great deal of research has been undertaken in this area. It is not a trivial task because of the large variation in writing styles found in the available data. Therefore, both the features and the classifier need to be efficient. The core contribution of this research is the development of a new classification technique that is based on the MLP, which can be identified in handwritten documents as the binary digits ‘0’ and ‘1’. This technique maps the different sets of various input data onto the MLP output neurons. An experimental evaluation of the technique’s performance is provided. This evaluation is based on the well-known ‘Pen-Based Recognition of Handwritten Digits’ dataset, which is comprised of a total of 250 handwriting samples that are taken from 44 writers. The results obtained are very promising for such an approach in accurate handwriting recognition.

Journal Article
TL;DR: Artificial Neural Network, Support Vector Machine, and Naive Bayes classifier based methods are implemented for handwritten Gujarati character recognition, and results show substantial enhancement over the state of the art.
Abstract: Handwritten character recognition is a challenging area of research. Much research on character recognition has already been done for Indian languages such as Hindi, Bangla, Kannada, Tamil and Telugu. A literature review on handwritten character recognition indicates that, in comparison with other Indian scripts, research on Gujarati handwritten character recognition is scarce. This paper aims to bring Gujarati character recognition to attention. Recognition of isolated Gujarati handwritten characters is proposed using three different kinds of features and their fusion. Chain code based, zone based and projection profile based features are utilized as individual features. One of the significant contributions of the proposed work is the generation of a large and representative dataset of 88,000 handwritten Gujarati characters. Experiments are carried out on this dataset. Artificial Neural Network (ANN), Support Vector Machine (SVM) and Naive Bayes (NB) classifier based methods are implemented for handwritten Gujarati character recognition. Experimental results show substantial enhancement over the state of the art and support our proposals.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A method of aerial handwritten Japanese katakana character recognition using a triaxial accelerometer is described, based on the k-nearest neighbor (k-NN) algorithm with feature vectors extracted by simple signal processing.
Abstract: In this paper, a method of aerial handwritten Japanese katakana character recognition using a triaxial accelerometer is described. The proposed character recognition method is based on the k-nearest neighbor (k-NN) algorithm using feature vectors extracted with simple signal processing. In experiments using a dataset of 46 katakana characters from a specific participant, an average recognition rate of 79.6% was confirmed.
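
The k-NN classification step named in the abstract above can be sketched in a few lines of Python. The function name `knn_predict`, the Euclidean distance, the value k = 3 and the two-dimensional feature vectors are assumptions for illustration; the abstract does not specify the paper's distance measure, its k, or how features are derived from the accelerometer signal.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two katakana classes; real features
# would be extracted from triaxial accelerometer recordings.
train = [
    ((0.0, 0.0), "ア"),
    ((0.1, 0.2), "ア"),
    ((1.0, 1.0), "イ"),
    ((0.9, 1.1), "イ"),
    ((1.2, 0.8), "イ"),
]
first = knn_predict(train, (0.05, 0.1))
second = knn_predict(train, (1.0, 0.9))
```

Because k-NN stores the training set and defers all work to query time, it suits small single-participant datasets like the one described, at the cost of slower prediction as the dataset grows.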

Book ChapterDOI
01 Jan 2017
TL;DR: An online handwritten Arabic text recognition system using an alignment matching theory is presented, which deals with the handwritten words as one block instead of segmenting the words into characters or strokes.
Abstract: Arabic is considered the primary language in most parts of the Arab world. It is spoken as a first language by more than 280 million people and as a second language by more than 250 million. In the pattern recognition field, several studies have focused on the Arabic language using textual or voice methods. In this paper, an online handwritten Arabic text recognition system using an alignment matching theory is presented. The proposed system deals with each handwritten word as one block instead of segmenting the word into characters or strokes. The system started with collecting a dataset of 120 common Quranic words. These words passed through several phases to be ready for use: preprocessing, feature extraction, and recognition. In the first phase, the words go through some steps to be standardized. In the second phase, the features of each word are extracted and saved in the system database. In the third phase, the system uses a matching technique to search the system database for the test word. The system was tested and the results reached up to 97%, which compares favorably with previous works under the same criteria.

Proceedings ArticleDOI
01 Feb 2017
TL;DR: The handwritten character recognition method for Malayalam proposed here uses a hybrid approach: both language dependent and language independent features are taken into consideration, and the basic shape based features of Malayalam characters are also extracted for recognition.
Abstract: Optical Character Recognition can be defined as the process of isolating textual scripts from a scanned document. Much research is under way in this field to make the character recognition process effective and error free. Malayalam handwritten character recognition precision is still limited to around 90% due to the challenges of the Malayalam character set. The presence of two different scripts (old and new), a huge character set, and similarly shaped characters make Malayalam handwritten character recognition more difficult. Feature extraction for each language may vary depending on various characteristics of that language. The shape and structure of characters for each language have some common features. The handwritten character recognition method for Malayalam proposed here uses a hybrid approach. Both language dependent and language independent features are taken into consideration. The basic shape based features of Malayalam characters are also extracted for recognition.

Book ChapterDOI
01 Jan 2017
TL;DR: This paper deals with a stroke-based online Bangla character recognition strategy: constituent strokes are extracted from characters, and commonly used distance based features are then estimated in order to recognize the basic strokes.
Abstract: This paper deals with a stroke-based online Bangla character recognition strategy. In the present work, constituent strokes have been extracted from characters, and commonly used distance based features have then been estimated in order to recognize the basic strokes. Next, a rule-based approach is followed for the recognition of the characters from the previously recognized strokes. A total of 15,000 isolated online handwritten Bangla characters contributing 32,534 stroke samples have been used in this experiment, and a satisfactory result of 89.39% recognition accuracy has been achieved.

Journal ArticleDOI
TL;DR: A new approach for off-line intelligent word recognition based on a fuzzy classification model that segments a word into its single characters and labels each pixel as vertical or horizontal, so that all the pixels can be grouped into vertical or horizontal strokes.

Journal ArticleDOI
TL;DR: This paper reviews various techniques proposed by different researchers for Punjabi character recognition, noting that recognition accuracy depends upon the volume of the training and testing datasets and may be improved by using optimized feature selection techniques.
Abstract: Objectives: A character recognition framework is essentially used to convert a digital image of a character into a machine-coded character. This fundamental capability supports numerous real-life applications. Methods/Statistical Analysis: When classifying handwritten documents, either offline or online, character recognition is strongly influenced by the variety of styles of the same writer in different circumstances and of different writers. Distortion and noise introduced during digitization are an additional noteworthy issue that adversely affects recognition and classification accuracy. Findings: It has been found that recognition of handwritten Gurmukhi characters is an exceptionally difficult task. There are enormous difficulties in handwritten character recognition because of the various writing styles of writers. This paper reviews various techniques proposed by different researchers for Punjabi character recognition. It has also been noticed that recognition accuracy depends upon the volume of the training and testing datasets and may be improved by using optimized feature selection techniques. Application/Improvements: Many research papers have been surveyed, and it is seen that work using different strategies has been attempted.

Journal ArticleDOI
TL;DR: A brief survey and comparison of various methods that recognize English alphabet characters in a given scanned text document.
Abstract: Handwritten character recognition has been one of the active and challenging areas of research in the field of image processing and pattern recognition. It has a number of applications, which include reading aids for the blind and conversion of handwritten documents into structured text form. In this paper we present a brief survey of various methods that recognize English alphabet characters in a given scanned text document. The first step is image acquisition, which acquires the scanned image, followed by noise filtering, smoothing and normalization of the scanned image, rendering the image suitable for segmentation, where the image is decomposed into sub-images. Feature extraction improves the recognition rate and reduces misclassification. We have surveyed and compared various methods in this paper.