
Showing papers on "Optical character recognition published in 2016"


Posted Content
TL;DR: The COCO-Text dataset is described, which contains over 173k text annotations in over 63k images and presents an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on the dataset.
Abstract: This paper describes the COCO-Text dataset. In recent years, large-scale datasets like SUN and ImageNet drove the advancement of scene understanding and object recognition. The goal of COCO-Text is to advance the state of the art in text detection and recognition in natural images. The dataset is based on the MS COCO dataset, which contains images of complex everyday scenes. The images were not collected with text in mind and thus contain a broad variety of text instances. To reflect the diversity of text in natural scenes, we annotate text with (a) location in terms of a bounding box, (b) fine-grained classification into machine-printed text and handwritten text, (c) classification into legible and illegible text, (d) script of the text and (e) transcriptions of legible text. The dataset contains over 173k text annotations in over 63k images. We provide a statistical analysis of the accuracy of our annotations. In addition, we present an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on our dataset. While scene text detection and recognition have enjoyed strong advances in recent years, we identify significant shortcomings that motivate future work.

361 citations
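
The annotation scheme described above lends itself to simple programmatic filtering. Below is a minimal Python sketch of selecting usable training instances from COCO-Text-style records; the field names (bbox, legibility, class, script, utf8_string) are assumptions loosely modeled on the annotation types the abstract lists, not necessarily the dataset's exact schema.

```python
# Hypothetical COCO-Text-style annotation records (field names assumed).
annotations = [
    {"bbox": [12, 40, 200, 32], "legibility": "legible",
     "class": "machine printed", "script": "latin", "utf8_string": "EXIT"},
    {"bbox": [300, 90, 80, 25], "legibility": "illegible",
     "class": "handwritten", "script": "latin", "utf8_string": ""},
]

# Keep only legible instances that carry a transcription, e.g. to build
# a recognition training set.
usable = [a for a in annotations
          if a["legibility"] == "legible" and a["utf8_string"]]
for a in usable:
    x, y, w, h = a["bbox"]
    print(f"{a['utf8_string']!r} at ({x},{y}) size {w}x{h}")
```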


Proceedings ArticleDOI
09 Mar 2016
TL;DR: In this paper, recursive recurrent neural networks with attention modeling (R2AM) were used for lexicon-free optical character recognition in natural scene images, achieving state-of-the-art performance on the Street View Text, IIIT5k, ICDAR, and Synth90k benchmarks.
Abstract: We present recursive recurrent neural networks with attention modeling (R2AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction, (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams, and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.

333 citations


Posted Content
TL;DR: This work presents recursive recurrent neural networks with attention modeling (R2AM) for lexicon-free optical character recognition in natural scene images and validates the method with state-of-the-art performance on challenging benchmark datasets.
Abstract: We present recursive recurrent neural networks with attention modeling (R2AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction; (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams; and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.

327 citations
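
Both versions of the R2AM paper above highlight a soft-attention mechanism over image features. The following is a minimal PyTorch sketch of one additive soft-attention step of the general kind described; the shapes, the scoring function, and all layer names are illustrative assumptions, not the R2AM architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, D, H = 2, 16, 128, 256           # batch, positions, feature dim, hidden dim
features = torch.randn(B, T, D)        # CNN features at T spatial positions
hidden = torch.randn(B, H)             # current RNN decoder state

W_f = nn.Linear(D, H, bias=False)      # project image features
W_h = nn.Linear(H, H, bias=False)      # project decoder state
v = nn.Linear(H, 1, bias=False)        # scalar score per position

# Additive (Bahdanau-style) attention scores, one per feature position.
scores = v(torch.tanh(W_f(features) + W_h(hidden).unsqueeze(1)))  # (B, T, 1)
alpha = F.softmax(scores, dim=1)         # weights sum to 1 over the T positions
context = (alpha * features).sum(dim=1)  # (B, D) attended feature summary
print(context.shape)                     # torch.Size([2, 128])
```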


Journal ArticleDOI
01 Jun 2016
TL;DR: A new model integrating two classifiers, a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), with the dropout technique applied, performs offline Arabic handwriting recognition significantly more efficiently than the CNN-based SVM model without dropout and the standard CNN classifier.
Abstract: In this paper we explore a new model focused on integrating two classifiers, a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), for offline Arabic handwriting recognition (OAHR), to which the dropout technique was applied. The suggested system replaces the trainable classifier of the CNN with an SVM classifier. The convolutional network is beneficial for extracting feature information, and the SVM functions as a recognizer. This model both automatically extracts features from the raw images and performs classification. Additionally, we protected our model against over-fitting thanks to the powerful performance of dropout. In this work, recognition of handwritten Arabic characters was evaluated; the training and test sets were taken from the HACDB and IFN/ENIT databases. Simulation results show that the new SVM-based design of the CNN classifier architecture with dropout performs significantly more efficiently than the CNN-based SVM model without dropout and the standard CNN classifier. The performance of our model is compared with character recognition accuracies obtained from state-of-the-art Arabic Optical Character Recognition systems, producing favorable results.

166 citations
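
The division of labor the abstract describes (CNN as feature extractor, SVM as recognizer) can be sketched in a few lines. In this illustrative snippet, random vectors stand in for activations that would, in practice, be taken from the CNN's penultimate layer; the class count and kernel choice are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 128))      # stand-in CNN features
train_labels = rng.integers(0, 28, size=200)   # e.g. 28 letter-shape classes
test_feats = rng.normal(size=(20, 128))

svm = SVC(kernel="rbf", C=1.0)   # the SVM replaces the CNN's own classifier
svm.fit(train_feats, train_labels)
print(svm.predict(test_feats)[:5])
```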


Journal ArticleDOI
TL;DR: It is proposed that font recognition on a single Chinese character is a sequence classification problem, which can be effectively solved by recurrent neural networks; a principal component convolution layer is integrated with 2-D long short-term memory (2DLSTM) to develop the principal component 2DLSTM (PC-2DLSTM) algorithm.
Abstract: Chinese character font recognition (CCFR) has received increasing attention as intelligent applications based on optical character recognition become popular. However, traditional CCFR systems do not handle noisy data effectively. By analyzing in detail the basic strokes of Chinese characters, we propose that font recognition on a single Chinese character is a sequence classification problem, which can be effectively solved by recurrent neural networks. For robust CCFR, we integrate a principal component convolution layer with the 2-D long short-term memory (2DLSTM) and develop the principal component 2DLSTM (PC-2DLSTM) algorithm. PC-2DLSTM considers two aspects: 1) the principal component layer convolution operation helps remove the noise and obtain rational and complete font information and 2) simultaneously, 2DLSTM handles the long-range contextual processing along scan directions, which helps capture the contrast between character trajectory and background. Experiments using the frequently used CCFR dataset suggest the effectiveness of PC-2DLSTM compared with other state-of-the-art font recognition methods.

105 citations
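
The paper's framing of font recognition as sequence classification can be illustrated by reading a character image column by column. The sketch below uses a plain 1-D LSTM in PyTorch rather than the paper's PC-2DLSTM, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn

H, W, n_fonts = 32, 32, 7
image = torch.rand(1, H, W)                   # one grayscale character image

cols = image.permute(2, 0, 1)                 # (W, 1, H): W columns as a sequence
lstm = nn.LSTM(input_size=H, hidden_size=64)  # reads one column per step
head = nn.Linear(64, n_fonts)

out, _ = lstm(cols)                           # (W, 1, 64)
logits = head(out[-1])                        # classify from the final step
print(logits.argmax(dim=-1))                  # predicted font index
```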


Journal ArticleDOI
TL;DR: An implicit-segmentation-based recognition system for Urdu text lines in Nastaliq script is presented, which slides overlapped windows along lines of text and extracts a set of statistical features.

75 citations
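
A minimal sketch of the sliding-overlapped-window feature extraction the TL;DR describes: the window width, overlap, and the three statistical features computed here are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

line = (np.random.rand(48, 600) > 0.8).astype(np.uint8)  # stand-in binary text line

win_w, step = 8, 4                       # 50% overlap between windows
features = []
for x in range(0, line.shape[1] - win_w + 1, step):
    win = line[:, x:x + win_w]
    ink = int(win.sum())                 # amount of foreground in the window
    ys, _ = np.nonzero(win)
    centroid = float(ys.mean()) if ink else 0.0          # vertical center of mass
    transitions = int(np.abs(np.diff(win.astype(np.int8), axis=0)).sum())
    features.append([ink, centroid, transitions])

features = np.asarray(features)          # (n_windows, 3) feature sequence
print(features.shape)
```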


Journal ArticleDOI
TL;DR: This discussion provides a very comprehensive review of the state-of-the-art of the field of OCR and gives a detailed overview of the challenges that might emerge in OCR stages.
Abstract: In many different fields, there is a high demand for storing information to a computer storage disk from the data available in printed or handwritten documents or images, to later re-utilize this information by means of computers. One simple way to store information from these printed documents to a computer system would be first to scan the documents and then store them as image files. But to re-utilize this information, it would be very difficult to read or query text or other information from these image files. Therefore a technique to automatically retrieve and store information, in particular text, from image files is needed. Optical character recognition is an active research area that attempts to develop a computer system with the ability to extract and process text from images automatically. The objective of OCR is to achieve modification or conversion of any form of text or text-containing documents, such as handwritten text or printed or scanned text images, into an editable digital format for deeper and further processing. Therefore, OCR enables a machine to automatically recognize text in such documents. Some major challenges need to be recognized and handled in order to achieve successful automation. The font characteristics of the characters in paper documents and the quality of the images are only some of the recent challenges. Due to these challenges, characters sometimes may not be recognized correctly by the computer system. In this paper we investigate OCR in four different ways. First, we give a detailed overview of the challenges that might emerge in OCR stages. Second, we review the general phases of an OCR system, such as pre-processing, segmentation, normalization, feature extraction, classification and post-processing. Then, we highlight developments and main applications and uses of OCR, and finally, a brief OCR history is discussed. Therefore, this discussion provides a very comprehensive review of the state-of-the-art of the field.

74 citations
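
The pipeline phases the survey reviews (pre-processing, segmentation, normalization, feature extraction, classification, post-processing) can be made concrete as a skeleton. Each stage below is a stub intended only to show the control flow, not a working recognizer.

```python
def preprocess(image):         # denoise, deskew, binarize
    return image

def segment(image):            # split into line/word/character regions
    return [image]

def normalize(region):         # scale each region to a fixed size
    return region

def extract_features(region):  # e.g. projections, zoning, gradients
    return [0.0]

def classify(features):        # map features to a character label
    return "?"

def postprocess(text):         # dictionary / language-model correction
    return text

def ocr(image):
    image = preprocess(image)
    chars = [classify(extract_features(normalize(r))) for r in segment(image)]
    return postprocess("".join(chars))

print(ocr(object()))
```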



Patent
14 Mar 2016
TL;DR: A system for extracting and monitoring media tags within video content includes at least one server receiving video content from a plurality of content sources, a recorder saving the video content, a detector that detects unknown text within a frame and creates an image and metadata for each detected text, and an optical character recognition engine that converts the images into known text.
Abstract: A system for extracting and monitoring media tags within video content includes at least one server in communication with a plurality of content sources, the server receiving video content from the content sources, a recorder saving the video content, a detector receiving at least one frame of the video content, the detector detecting one or more unknown text within the frame and creating one or more images, each image associated with one of the one or more unknown text, the detector generating metadata associated with the one or more unknown text appearing in the frame, and an optical character recognition engine scanning the one or more images and converting the one or more images into one or more known text. The server further determines that the one or more known text is a media tag.

67 citations
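
The final determination step of the claimed system, deciding whether OCR-recovered text is a media tag, might look like the following sketch; the hashtag/handle patterns are an assumption about what counts as a media tag, since the patent abstract does not define them.

```python
import re

# Assumed media-tag patterns: hashtags and social-media handles.
TAG_RE = re.compile(r"(?:#\w{2,}|@\w{2,})")

def find_media_tags(ocr_text: str):
    """Return all media-tag-like tokens found in OCR output."""
    return TAG_RE.findall(ocr_text)

print(find_media_tags("Follow us @example or tag #OCR2016"))
```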


Patent
28 Mar 2016
TL;DR: A system for processing and extracting content from an image of a driver's license captured using a mobile device is presented, in which the corrected image is further processed by cropping the image, identifying the format and layout of the DL, binarizing the image and extracting the content using optical character recognition (OCR).
Abstract: Systems and methods are provided for processing and extracting content from an image of a driver's license captured using a mobile device. In one embodiment, an image of a driver's license (DL) is captured by a mobile device and corrected to improve the quality of the image. The corrected image is then further processed by cropping the image, identifying the format and layout of the DL, binarizing the image and extracting the content using optical character recognition (OCR). Multiple methods of image cropping may be implemented to accurately assess the borders of the DL, and a secondary layout identification process may be performed to ensure that the content being extracted is properly classified.

66 citations
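
A minimal sketch of the crop-binarize-OCR portion of the described pipeline, using OpenCV and Tesseract; the input file name and the fixed crop coordinates are placeholders, and a real implementation would detect the license borders rather than hard-coding them.

```python
import cv2
import pytesseract

image = cv2.imread("license.jpg")            # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
card = gray[50:350, 40:600]                  # stand-in for detected borders

# Otsu binarization before OCR, as in the described pipeline.
_, binary = cv2.threshold(card, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(binary)
print(text)
```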


Posted Content
TL;DR: A general-purpose, deep learning-based system to decompile an image into presentational markup that employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system.
Abstract: Building on recent advances in image caption generation and optical character recognition (OCR), we present a general-purpose, deep learning-based system to decompile an image into presentational markup. While this task is a well-studied problem in OCR, our method takes an inherently different, data-driven approach. Our model does not require any knowledge of the underlying markup language, and is simply trained end-to-end on real-world example data. The model employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system. To train and evaluate the model, we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup, as well as a synthetic dataset of web pages paired with HTML snippets. Experimental results show that the system is surprisingly effective at generating accurate markup for both datasets. While a standard domain-specific LaTeX OCR system achieves around 25% accuracy, our model reproduces the exact rendered image on 75% of examples.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a novel benchmark composed of a dataset designed to focus specifically on the character segmentation step of the ALPR within an evaluation protocol, which is composed of 2,000 Brazilian license plates consisting of 14,000 alphanumeric symbols and their corresponding bounding box annotations.
Abstract: Automatic License Plate Recognition (ALPR) has been the focus of much research in the past years. In general, ALPR is divided into the following problems: detection of on-track vehicles, license plate detection, segmentation of license plate characters and optical character recognition (OCR). Even though commercial solutions are available for controlled acquisition conditions, e.g., the entrance of a parking lot, ALPR is still an open problem when dealing with data acquired from uncontrolled environments, such as roads and highways, when relying only on imaging sensors. Due to the multiple orientations and scales of the license plates captured by the camera, a very challenging task of ALPR is the License Plate Character Segmentation (LPCS) step, whose effectiveness is required to be (near) optimal to achieve a high recognition rate by the OCR. To tackle the LPCS problem, this work proposes a novel benchmark composed of a dataset designed to focus specifically on the character segmentation step of ALPR within an evaluation protocol. Furthermore, we propose the Jaccard-Centroid coefficient, a new evaluation measure more suitable than the Jaccard coefficient regarding the location of the bounding box within the ground-truth annotation. The dataset is composed of 2,000 Brazilian license plates consisting of 14,000 alphanumeric symbols and their corresponding bounding box annotations. We also present a new straightforward approach to perform LPCS efficiently. Finally, we provide an experimental evaluation for the dataset based on four LPCS approaches and demonstrate the importance of character segmentation for achieving an accurate OCR.
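
For reference, the plain Jaccard coefficient (intersection over union) for axis-aligned boxes is easy to state in code. The abstract does not give the Jaccard-Centroid formula, so the centroid-penalized variant below is only an illustrative guess at the idea, not the paper's definition.

```python
import math

def jaccard(a, b):
    """Plain Jaccard/IoU for boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def jaccard_centroid(a, b):
    """Hypothetical centroid-aware variant: down-weights off-center boxes."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    diag = math.hypot(b[2], b[3])        # normalize by ground-truth diagonal
    return jaccard(a, b) * (1 - min(1.0, math.hypot(dx, dy) / diag))

print(jaccard((0, 0, 10, 20), (2, 0, 10, 20)))  # 0.666...
```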

Journal ArticleDOI
TL;DR: The results obtained at the character level for both cursive Urdu and non-cursive English scripts are significant and suggest that the BLSTM technique is potentially more useful than the existing OCR algorithms.
Abstract: Character recognition has been widely used since its inception in applications involving the processing of scanned or camera-captured documents. There exist multiple scripts in which languages are written; these scripts can broadly be divided into cursive and non-cursive. Recurrent neural networks have been shown to obtain state-of-the-art results for optical character recognition. We present a thorough investigation of the performance of the recurrent neural network (RNN) for cursive and non-cursive scripts. We employ bidirectional long short-term memory (BLSTM) networks, a variant of the standard RNN. The output layer of the architecture used to carry out our investigation is a special layer called connectionist temporal classification (CTC), which performs sequence alignment. The CTC layer takes as input the activations of the LSTM and aligns the target labels with the inputs. The results obtained at the character level for both cursive Urdu and non-cursive English scripts are significant and suggest that the BLSTM technique is potentially more useful than the existing OCR algorithms.
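
A minimal PyTorch sketch of the BLSTM-plus-CTC setup the paper investigates: per-frame features pass through a bidirectional LSTM, and CTC aligns the per-frame outputs with a shorter target label sequence. All sizes and the dummy data are illustrative.

```python
import torch
import torch.nn as nn

T, B, F, n_classes = 50, 2, 64, 40     # time steps, batch, feats, labels incl. blank
x = torch.randn(T, B, F)               # stand-in frame-wise features

blstm = nn.LSTM(F, 128, bidirectional=True)
proj = nn.Linear(2 * 128, n_classes)   # 2x hidden size for both directions
ctc = nn.CTCLoss(blank=0)

out, _ = blstm(x)
log_probs = proj(out).log_softmax(dim=-1)        # (T, B, n_classes)
targets = torch.randint(1, n_classes, (B, 12))   # dummy label sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((B,), T, dtype=torch.long),
           target_lengths=torch.full((B,), 12, dtype=torch.long))
print(loss.item())
```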

Proceedings ArticleDOI
Anisha Priya, Surbhi Mishra, Saloni Raj, Sudarshan Mandal, Sujoy Datta
06 Apr 2016
TL;DR: The architecture, the steps involved, and the various proposed methodologies of offline and online character recognition along with their comparison and few applications are discussed.
Abstract: Handwritten character recognition has been one of the most fascinating research topics in the field of image processing. In handwritten character recognition, the input is scanned from images and documents or captured from real-time devices like tablets, tabloids and digitizers, and is then interpreted into digital text. There are basically two approaches: Online Handwritten Recognition, which takes the input at run time, and Offline Handwritten Recognition, which works on scanned images. In this paper we discuss the architecture, the steps involved, and the various proposed methodologies of offline and online character recognition, along with their comparison and a few applications.

Book ChapterDOI
12 Dec 2016
TL;DR: The present evaluation is expected to advance OCR research, providing new insights and considerations to the research area, and to assist researchers in determining which service is ideal for optical character recognition in an accurate and efficient manner.
Abstract: Optical character recognition (OCR), a classic machine learning challenge, has been a longstanding topic in a variety of applications in the healthcare, education, insurance, and legal industries to convert different types of electronic documents, such as scanned documents, digital images, and PDF files, into fully editable and searchable text data. The rapid generation of digital images on a daily basis prioritizes OCR as an imperative and foundational tool for data analysis. With the help of OCR systems, we have been able to save a reasonable amount of effort in creating, processing, and saving electronic documents, adapting them to different purposes. A set of different OCR platforms are now available which, aside from lending theoretical contributions to other practical fields, have demonstrated successful applications in real-world problems. In this work, several qualitative and quantitative experimental evaluations have been performed using four well-known OCR services: Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. We analyze the accuracy and reliability of the OCR packages employing a dataset including 1227 images from 15 different categories. Furthermore, we review state-of-the-art OCR applications in healthcare informatics. The present evaluation is expected to advance OCR research, providing new insights and considerations to the research area, and to assist researchers in determining which service is ideal for optical character recognition in an accurate and efficient manner.
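
Scoring an OCR engine's output against ground truth, as in the comparison above, can be approximated with the standard library; difflib's similarity ratio below stands in for a proper character-accuracy metric.

```python
import difflib

def char_similarity(recognized: str, truth: str) -> float:
    """Rough character-level accuracy proxy in [0, 1]."""
    return difflib.SequenceMatcher(None, recognized, truth).ratio()

samples = [("0ptical charactor", "Optical character"),
           ("recognition", "recognition")]
scores = [char_similarity(r, t) for r, t in samples]
print(sum(scores) / len(scores))   # mean similarity over the test set
```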

Journal ArticleDOI
TL;DR: In this article, the authors propose a benchmark composed of a dataset designed to focus specifically on the character segmentation step of the ALPR within an evaluation protocol, which is composed of 2000 Brazilian license plates consisting of 14,000 alphanumeric symbols and their corresponding bounding box annotations.
Abstract: Automatic license plate recognition (ALPR) has been the focus of much research in the past years. In general, ALPR is divided into the following problems: detection of on-track vehicles, license plate detection, segmentation of license plate characters, and optical character recognition (OCR). Even though commercial solutions are available for controlled acquisition conditions, e.g., the entrance of a parking lot, ALPR is still an open problem when dealing with data acquired from uncontrolled environments, such as roads and highways, when relying only on imaging sensors. Due to the multiple orientations and scales of the license plates captured by the camera, a very challenging task of ALPR is the license plate character segmentation (LPCS) step, because its effectiveness is required to be (near) optimal to achieve a high recognition rate by the OCR. To tackle the LPCS problem, this work proposes a benchmark composed of a dataset designed to focus specifically on the character segmentation step of ALPR within an evaluation protocol. Furthermore, we propose the Jaccard-centroid coefficient, an evaluation measure more suitable than the Jaccard coefficient regarding the location of the bounding box within the ground-truth annotation. The dataset is composed of 2,000 Brazilian license plates consisting of 14,000 alphanumeric symbols and their corresponding bounding box annotations. We also present a straightforward approach to perform LPCS efficiently. Finally, we provide an experimental evaluation for the dataset based on five LPCS approaches and demonstrate the importance of character segmentation for achieving an accurate OCR.

Proceedings Article
01 May 2016
TL;DR: Experimentation shows that the Machine Translation for Error Correction method is superior to other Language Modelling correction techniques, with nearly 13% relative improvement compared to the initial baseline.
Abstract: A trend to digitize historical paper-based archives has emerged in recent years with the advent of digital optical scanners. A lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into electronic versions that can be manipulated by a computer. For this purpose, Optical Character Recognition (OCR) systems have been developed to transform scanned digital text into editable computer text. However, different kinds of errors can be found in OCR system output text, and Automatic Error Correction tools can help improve the quality of electronic texts by cleaning and removing noise. In this paper, we perform a qualitative and quantitative comparison of several error-correction techniques for historical French documents. Experimentation shows that our Machine Translation for Error Correction method is superior to other Language Modelling correction techniques, with nearly 13% relative improvement compared to the initial baseline.
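
For contrast with the paper's machine-translation approach, a simple dictionary-based baseline corrects each OCR token to the nearest vocabulary word by edit distance; the toy vocabulary here is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic rolling-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

vocab = ["histoire", "France", "document", "ancien"]  # hypothetical lexicon

def correct(token: str) -> str:
    return min(vocab, key=lambda w: edit_distance(token, w))

print(correct("Fraace"))   # -> "France"
```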

Book ChapterDOI
01 Jan 2016
TL;DR: A new high-performance algorithm is presented for segmenting the overlapping Isan Dhamma characters of palm leaf manuscripts for an optical character recognition (OCR) system.
Abstract: Segmentation is the first important step of an optical character recognition (OCR) system. It separates image text documents into lines, words, and characters. The accuracy of the recognition system mainly relies on the segmentation algorithm. A key challenge in segmentation is overlapping characters; therefore, this paper presents a new high-performance algorithm for segmenting the overlapping Isan Dhamma characters of palm leaf manuscripts.

Journal ArticleDOI
TL;DR: This paper proposes a novel approach to the sliding window technique for feature extraction, and presents a framework for the recognition of unseen fonts, which employs font association and HMM adaptation techniques.

01 Jan 2016
TL;DR: In this article, the authors provide an overview of different aspects of optical character recognition and discuss corresponding proposals aimed at resolving issues of OCR, and summarize the research so far done in the field.
Abstract: Optical Character Recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into its constituent characters. Despite decades of intense research, developing OCR with capabilities comparable to those of humans still remains an open challenge. Due to this challenging nature, researchers from industry and academic circles have directed their attention towards Optical Character Recognition. Over the last few years, the number of academic laboratories and companies involved in research on Character Recognition has increased dramatically. This research aims at summarizing the research so far done in the field of OCR. It provides an overview of different aspects of OCR and discusses corresponding proposals aimed at resolving issues of OCR.

Journal Article
TL;DR: This article applied neural network-based Optical Character Recognition (OCR) to scanned images of books printed between 1487 and 1870 by training the OCR engine OCRopus on the RIDGES herbal text corpus.
Abstract: This article describes the results of a case study that applies Neural Network-based Optical Character Recognition (OCR) to scanned images of books printed between 1487 and 1870 by training the OCR engine OCRopus [Breuel et al. 2013] on the RIDGES herbal text corpus [Odebrecht et al. 2017] (in press). Training specific OCR models was possible because the necessary ground truth is available as error-corrected diplomatic transcriptions. The OCR results have been evaluated for accuracy against the ground truth of unseen test sets. Character and word accuracies (percentage of correctly recognized items) for the resulting machine-readable texts of individual documents range from 94% to more than 99% (character level) and from 76% to 97% (word level). This includes the earliest printed books, which were thought to be inaccessible by OCR methods until recently. Furthermore, OCR models trained on one part of the corpus consisting of books with different printing dates and different typesets (mixed models) have been tested for their predictive power on the books from the other part containing yet other fonts, mostly yielding character accuracies well above 90%. It therefore seems possible to construct generalized models trained on a range of fonts that can be applied to a wide variety of historical printings still giving good results. A moderate postcorrection effort of some pages will then enable the training of individual models with even better accuracies. Using this method, diachronic corpora including early printings can be constructed much faster and cheaper than by manual transcription. The OCR methods reported here open up the possibility of transforming our printed textual cultural heritage into electronic text by largely automatic means, which is a prerequisite for the mass conversion of scanned books.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper presents camera based system which will help blind person for reading text patterns printed on hand held objects and the framework to assist visually impaired persons to read text patterns and convert it into the audio output.
Abstract: This paper presents camera based system which will help blind person for reading text patterns printed on hand held objects. This is the framework to assist visually impaired persons to read text patterns and convert it into the audio output. To obtain the object from the background and extract the text pattern from that object, the system first proposes the method that will capture the image from the camera and object region is detected. The text which are maximally stable are detected using Maximally Stable External Regions (MSER) feature. A novel algorithm is evaluated on variety of scenes. The detected text is compared with the template and converted into the speech output. The text patterns are localized and binarized using Optical Character Recognition (OCR). The recognized text is converted to an audio output. The speech output is given to the blind user. Experimental results shows the analysis of MSER and OCR for different text patterns. MSER shows that it is robust algorithm for the text detection. Therefore, this paper deals with analysis of detection and recognition of different text patterns on different objects.
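
A minimal sketch of the MSER-then-OCR flow described above, using OpenCV's MSER detector and Tesseract; the input file name is a placeholder and the size filter is an arbitrary heuristic.

```python
import cv2
import pytesseract

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image
mser = cv2.MSER_create()
regions, boxes = mser.detectRegions(gray)

for (x, y, w, h) in boxes:
    if w < 5 or h < 5:
        continue                          # skip tiny, likely-noise regions
    text = pytesseract.image_to_string(gray[y:y + h, x:x + w])
    if text.strip():
        print((x, y, w, h), text.strip())
```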

Proceedings ArticleDOI
01 Sep 2016
TL;DR: In this approach, a vertical edge detection algorithm is applied, unwanted edges are removed by an image normalization technique, and the LP region is extracted by incorporating statistical and morphological image processing techniques.
Abstract: Automatic License Plate Recognition (ALPR) systems are employed for detection and recognition of vehicle license plates/number plates. The performance of existing systems is well below the desired level. In this perspective, there is a definite need for a system that overcomes the limitations of currently available systems. A new approach is introduced in this paper for fast and efficient implementation of an ALPR system. In this approach, a vertical edge detection algorithm is applied and unwanted edges are removed by an image normalization technique. The LP region is extracted by incorporating statistical and morphological image processing techniques. For character recognition, template matching is employed as the optical character recognition (OCR) method. The algorithm was tested on 500 real-time images acquired under different illumination conditions and from different scenes. The overall efficiency of the proposed method is 84.8% and the execution time is less than 0.5 seconds.
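
Template matching for the OCR stage, as the abstract describes, can be sketched with OpenCV's matchTemplate; here random arrays stand in for the per-character reference templates that would be loaded from disk.

```python
import cv2
import numpy as np

# Stand-in reference templates, one 20x12 image per character class.
templates = {c: np.random.randint(0, 255, (20, 12), np.uint8)
             for c in "0123456789"}

def recognize(char_img):
    """Return the character whose template correlates best."""
    char_img = cv2.resize(char_img, (12, 20))
    scores = {c: cv2.matchTemplate(char_img, t, cv2.TM_CCOEFF_NORMED)[0, 0]
              for c, t in templates.items()}
    return max(scores, key=scores.get)

print(recognize(np.random.randint(0, 255, (30, 18), np.uint8)))
```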

Journal ArticleDOI
TL;DR: An extensive experiment on a diverse set of scanned historical maps is presented to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet).

Proceedings ArticleDOI
13 Sep 2016
TL;DR: The experiments prove that the overall document classification accuracy of a Convolutional Neural Network trained using these text-augmented document images is considerably higher than the one achieved by a similar model trained solely on classic document images, especially when different classes of documents share similar visual characteristics.
Abstract: In this paper we introduce a novel document image classification method based on combined visual and textual information. The proposed algorithm's pipeline is inspired by those of other recent state-of-the-art methods which perform document image classification using Convolutional Neural Networks. The main addition of our work is the introduction of a preprocessing step that embeds additional textual information into the processed document images. To do so we combine Optical Character Recognition and Natural Language Processing algorithms to extract and manipulate relevant text concepts from document images. Such textual information is then visually embedded within each document image to improve the classification results of a Convolutional Neural Network. Our experiments show that the overall document classification accuracy of a Convolutional Neural Network trained using these text-augmented document images is considerably higher than that achieved by a similar model trained solely on classic document images, especially when different classes of documents share similar visual characteristics.
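
The preprocessing idea, visually embedding extracted text concepts back into the document image before CNN training, can be sketched with PIL; where the words are drawn and how they are chosen are illustrative assumptions.

```python
from PIL import Image, ImageDraw

doc = Image.new("RGB", (300, 400), "white")   # stand-in document image
keywords = ["invoice", "total", "2016"]       # e.g. from OCR + NLP extraction

# Render the extracted concepts onto the image itself, so the CNN sees
# both the visual layout and the (drawn) textual cues.
draw = ImageDraw.Draw(doc)
for i, word in enumerate(keywords):
    draw.text((10, 10 + 15 * i), word, fill="black")
doc.save("augmented.png")
```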

Proceedings ArticleDOI
19 Jul 2016
TL;DR: This paper presents an Artificial Neural Network based approach for the recognition of English characters using a feed-forward neural network that gives 99% accuracy for numeric digits, 97% for capital letters, 96% for small letters, and 93% for alphanumeric characters, considering inter-class similarity measurement.
Abstract: This paper presents an Artificial Neural Network based approach for the recognition of English characters using a feed-forward neural network. Noise has been considered one of the major issues that degrade the performance of a character recognition system. Our feed-forward network has one input, one hidden and one output layer. The entire recognition system is divided into two sections, a training section and a recognition section. Both sections include image acquisition, preprocessing and feature extraction; the training and recognition sections also include training of the classifier and simulation of the classifier, respectively. Preprocessing involves digitization, noise removal, binarization, line segmentation and character extraction. After character extraction, the extracted character matrix is normalized into a 12×8 matrix. Features are then extracted from the normalized image matrix and fed to the network, which consists of 96 input neurons and 62 output neurons. We train our network with the proposed training algorithm in a supervised manner. Eventually, we tested the trained network with more than 10 samples per character, obtaining 99% accuracy for numeric digits (0-9), 97% accuracy for capital letters (A-Z), 96% accuracy for small letters (a-z) and 93% accuracy for alphanumeric characters, considering inter-class similarity measurement.
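
A sketch mirroring the network shape the abstract reports (96 input features from a normalized 12×8 character, 62 output classes): scikit-learn's MLPClassifier stands in for the paper's custom feed-forward trainer, and random data replaces the extracted features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((620, 96))           # 96 = 12*8 flattened character features
y = np.repeat(np.arange(62), 10)    # 62 classes: 10 digits + 26 + 26 letters

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
clf.fit(X, y)                       # supervised training, as in the paper
print(clf.predict(X[:3]))
```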

Book ChapterDOI
07 Nov 2016
TL;DR: The effect of the sine cosine algorithm (SCA) on minimizing the compactness of K-means clustering as the objective function is investigated, and the proposed approach achieves higher values than well-known binarization methods.
Abstract: Historic manuscript image binarization is considered an important step because of the effect of different kinds of degradation on optical character recognition (OCR) and word spotting systems. Previous methods have failed to find the optimal threshold for binarization. In this paper, we investigate the sine cosine algorithm (SCA) for minimizing the compactness of K-means clustering as the objective function. The SCA searches for the optimal clustering of the given handwritten manuscript image into compact clusters under some constraints. The proposed approach is evaluated and assessed on a set of selected handwritten Arabic manuscript images. The experimental results show that the proposed approach achieves higher values than well-known binarization methods such as Otsu's and Niblack's in terms of F-measure, pseudo-F-measure, PSNR and geometric accuracy, and lower values on DRD, NRM and MPM.
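
A minimal sketch of searching a binarization threshold with the sine cosine algorithm, using two-cluster compactness (the K-means objective the abstract names) as the fitness; population size, iteration count, and the synthetic pixel data are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic "pixel intensities": dark ink and bright background.
pixels = np.concatenate([rng.normal(60, 15, 500), rng.normal(190, 20, 500)])

def compactness(t):
    """Within-cluster sum of squares for the 2 clusters induced by threshold t."""
    lo, hi = pixels[pixels < t], pixels[pixels >= t]
    return sum(((g - g.mean()) ** 2).sum() for g in (lo, hi) if g.size)

pop = rng.uniform(pixels.min(), pixels.max(), 10)  # candidate thresholds
best = min(pop, key=compactness)
for t in range(50):
    r1 = 2 * (1 - t / 50)                          # step size shrinks over time
    r2 = rng.uniform(0, 2 * np.pi, pop.size)
    r3 = rng.uniform(0, 2, pop.size)
    r4 = rng.random(pop.size)
    step = np.where(r4 < 0.5, np.sin(r2), np.cos(r2))  # SCA sine/cosine branch
    pop = pop + r1 * step * np.abs(r3 * best - pop)    # move toward/around best
    pop = np.clip(pop, pixels.min(), pixels.max())
    cand = min(pop, key=compactness)
    if compactness(cand) < compactness(best):
        best = cand

print(round(float(best), 1))   # threshold between the two intensity modes
```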

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This paper presents the first Pashto text image database for scientific research and thereby the first dataset with complete handwritten and printed text line images which ultimately covers all alphabets of Arabic and Persian languages.
Abstract: This paper presents the first Pashto text image database for scientific research and thereby the first dataset with complete handwritten and printed text line images, which ultimately covers all alphabets of the Arabic and Persian languages. A language like Pashto, written in a complex way by calligraphers, still requires a mature Optical Character Recognition (OCR) system. Although 50 million people use this language for both oral and written communication, no significant effort has been devoted to the recognition of Pashto script. A real dataset of 17,015 images containing Pashto text lines is introduced. The images were acquired by scanning hand-scribed Pashto books. Further, in this work, we evaluate the performance of deep learning based models such as Bidirectional and Multi-Dimensional Long Short Term Memory (BLSTM and MDLSTM) networks for Pashto texts and provide a baseline character error rate of 9.22%.

Journal ArticleDOI
TL;DR: An important step towards the standardization of research on Optical Character Recognition in the Persian language is presented, describing the formation of a standard handwritten database that includes isolated digits, isolated signs, multi-digit numbers, numerical strings, courtesy amounts, and postal codes.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: Experimental results based on the algorithms in the paper illustrate that the accuracy rate of character recognition is very high and that the algorithms can fully meet the practical demands of automatic recognition.
Abstract: In order to give the computer the knowledge needed for Chinese vehicle license plate segmentation and recognition, this paper puts forward a set of algorithms for license plate segmentation and recognition. The algorithms are divided into four parts: image preprocessing, license plate location, license plate segmentation and character recognition. The aim of image preprocessing is to locate the license plate quickly and easily, so the image preprocessing algorithm is one of the important factors that affect total system performance. The license plate location algorithm directly affects the accuracy of character segmentation and character recognition, so a location algorithm is proposed according to the characteristics of Chinese vehicle license plates. The license plate segmentation algorithm uses the vertical projection method on the license plate. From the segmented characters, a training model is generated using a BPNN (back propagation neural network) tool, which is the key to the character recognition algorithm for license plates. Experimental results based on these algorithms illustrate that the accuracy rate of character recognition is very high and that the algorithms can fully meet the practical demands of automatic recognition. The algorithms can take advantage of the training model to recognize license plates reliably, and have application value in real work.
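
The vertical projection segmentation step the abstract mentions reduces to finding near-empty column runs in a binarized plate; the sketch below uses random stand-in data and an arbitrary gap threshold.

```python
import numpy as np

plate = (np.random.rand(20, 120) > 0.5).astype(np.uint8)  # stand-in binary plate

profile = plate.sum(axis=0)              # ink per column (vertical projection)
is_gap = profile <= 1                    # near-empty columns separate characters
cuts, start = [], None
for x, gap in enumerate(is_gap):
    if not gap and start is None:
        start = x                        # a character run begins
    elif gap and start is not None:
        cuts.append((start, x))          # a character run ends
        start = None
if start is not None:
    cuts.append((start, len(is_gap)))
print(cuts)                              # (x_begin, x_end) per character
```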