Showing papers on "Optical character recognition published in 2019"


Journal ArticleDOI
TL;DR: A multi-object rectified attention network (MORAN) for general scene text recognition that can read both regular and irregular scene text and achieves state-of-the-art performance.

291 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: This work presents a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms, and is the first publicly available dataset with comprehensive annotations to address the FoUn task.
Abstract: We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking. To the best of our knowledge, this is the first publicly available dataset with comprehensive annotations to address the FoUn task. We also present a set of baselines and introduce metrics to evaluate performance on the FUNSD dataset, which can be downloaded at https://guillaumejaume.github.io/FUNSD.

190 citations
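
As a concrete illustration of how such annotations might be consumed, here is a minimal Python sketch that loads one FUNSD annotation file and prints the labeled entities and their links. The JSON layout (a top-level "form" list whose items carry "text", "box", "label", and "linking" fields) follows the published dataset; the file path is a placeholder.

    import json

    # Load one FUNSD annotation file (path is a placeholder).
    with open("annotations/0000971160.json", encoding="utf-8") as f:
        page = json.load(f)

    # Each entity carries a semantic label (question/answer/header/other),
    # a bounding box, its constituent words, and entity-linking pairs.
    for entity in page["form"]:
        label = entity["label"]
        box = entity["box"]        # [x0, y0, x1, y1]
        links = entity["linking"]  # pairs of entity ids, e.g. [[0, 1]]
        print(f"{label:>8} {box} {entity['text']!r} links={links}")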


Proceedings ArticleDOI
01 Sep 2019
TL;DR: The ICDAR 2019 Challenge on "Scanned receipts OCR and key information extraction" (SROIE) covers important aspects related to the automated analysis of scanned receipts, and is expected to evolve into a useful resource for the community, drawing further attention and promoting research and development efforts in this field.
Abstract: The ICDAR 2019 Challenge on "Scanned receipts OCR and key information extraction" (SROIE) covers important aspects related to the automated analysis of scanned receipts. The SROIE tasks play a key role in many document analysis systems and hold significant commercial potential. Although a lot of work has been published over the years on administrative document analysis, the community has advanced relatively slowly, as most datasets have been kept private. One of the key contributions of SROIE to the document analysis community is to offer a first, standardized dataset of 1000 whole scanned receipt images and annotations, as well as an evaluation procedure for such tasks. The Challenge is structured around three tasks, namely Scanned Receipt Text Localization (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3). The competition opened on 10th February, 2019 and closed on 5th May, 2019. We received 29, 24 and 18 valid submissions for the three competition tasks, respectively. This report presents the competition datasets, defines the tasks and the evaluation protocols, offers detailed submission statistics, and analyses the performance of the submitted methods. While the tasks of text localization and recognition seem to be relatively easy to tackle, it is interesting to observe the variety of ideas and approaches proposed for the information extraction task. Judging from the submissions' performance, we believe there is still margin for improving information extraction performance, although the current dataset would have to grow substantially in following editions. Given the success of the SROIE competition, evidenced by the wide interest generated and the healthy number of submissions from academia, research institutes and industry across different countries, we consider that the SROIE competition can evolve into a useful resource for the community, drawing further attention and promoting research and development efforts in this field.

143 citations
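
Key information extraction of this kind is typically scored by exact matching of extracted field values against the ground truth. Below is a minimal sketch of such a protocol in Python; the field names follow the SROIE annotations (company, date, address, total), but the official scoring script may differ in detail.

    FIELDS = ("company", "date", "address", "total")

    def score_kie(predictions, ground_truth):
        """Exact-match micro precision/recall/F1 over the four fields.
        Both arguments map a receipt id to a dict of field -> string."""
        tp = fp = fn = 0
        for rid, gt in ground_truth.items():
            pred = predictions.get(rid, {})
            for field in FIELDS:
                if field in pred and pred[field] == gt.get(field):
                    tp += 1
                elif field in pred:
                    fp += 1
                if field in gt and pred.get(field) != gt[field]:
                    fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1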


Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper introduces a novel task of visual question answering by reading text in images, i.e., by optical character recognition or OCR, and introduces a large-scale dataset, namely OCR-VQA-200K, which comprises 207,572 images of book covers and contains more than 1 million question-answer pairs about these images.
Abstract: The problem of answering questions about an image is popularly known as visual question answering (or VQA in short). It is a well-established problem in computer vision. However, none of the VQA methods currently utilize the text often present in the image. These "texts in images" provide additional useful cues and facilitate better understanding of the visual content. In this paper, we introduce a novel task of visual question answering by reading text in images, i.e., by optical character recognition or OCR. We refer to this problem as OCR-VQA. To facilitate a systematic way of studying this new problem, we introduce a large-scale dataset, namely OCR-VQA-200K. This dataset comprises 207,572 images of book covers and contains more than 1 million question-answer pairs about these images. We judiciously combine well-established techniques from the OCR and VQA domains to present a novel baseline for OCR-VQA-200K. The experimental results and rigorous analysis demonstrate various challenges present in this dataset, leaving ample scope for future research. We are optimistic that this new task, along with the compiled dataset, will open up many exciting research avenues both for the document image analysis and the VQA communities.

96 citations
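
To make the flavor of the OCR-plus-VQA baseline concrete, here is a toy rule-based sketch: run any OCR engine over the book cover, then answer by question type. The rules and token heuristics are hypothetical illustrations, not the authors' exact baseline.

    import re

    def answer(question, ocr_tokens):
        """Toy VQA-by-reading baseline over OCR output.
        ocr_tokens: list of (text, confidence) pairs from any OCR engine."""
        q = question.lower()
        words = [t for t, _ in ocr_tokens]
        if "year" in q:
            # Edition-year questions: return a plausible 4-digit token.
            years = [w for w in words if re.fullmatch(r"(19|20)\d{2}", w)]
            return years[0] if years else "unanswerable"
        if "title" in q:
            # Crude heuristic: the longest OCR'd token often belongs to the title.
            return max(words, key=len) if words else "unanswerable"
        return "unanswerable"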


Journal ArticleDOI
TL;DR: In this article, the authors present a new and comprehensive Urdu handwritten offline database named the Urdu-Nasta'liq handwritten dataset (UNHD), which covers commonly used ligatures that were written by 500 writers with their natural handwriting on A4 size paper.
Abstract: The recognition of cursive script is regarded as a subtle task in optical character recognition due to its varied representation. Every cursive script has a different nature and associated challenges. As Urdu is a cursive language derived from Arabic script, it shares nearly the same challenges and complexities, but with greater intensity. Urdu and Arabic can be distinguished on the basis of the script styles they use: Urdu is mostly written in Nasta'liq style, whereas Arabic follows the Naskh style of writing. This paper presents a new and comprehensive Urdu handwritten offline database named the Urdu-Nasta'liq handwritten dataset (UNHD). Currently, there is no standard and comprehensive Urdu handwritten dataset available publicly for researchers. The acquired dataset covers commonly used ligatures that were written by 500 writers with their natural handwriting on A4 size paper. UNHD is publicly available and can be downloaded from https://sites.google.com/site/researchonurdulanguage1/databases. We performed experiments using recurrent neural networks and report a significant accuracy for handwritten Urdu character recognition.

60 citations
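
The paper reports results with recurrent networks; a common choice for unconstrained cursive recognition of this kind is a bidirectional LSTM trained with the CTC loss. The PyTorch sketch below illustrates that setup; the feature size, hidden size, and ligature vocabulary size are illustrative values, not taken from the paper.

    import torch
    import torch.nn as nn

    class BLSTMRecognizer(nn.Module):
        def __init__(self, n_features=48, n_classes=200):  # 200: assumed vocab + blank
            super().__init__()
            self.rnn = nn.LSTM(n_features, 128, num_layers=2,
                               bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * 128, n_classes)

        def forward(self, x):       # x: (batch, width, n_features) column features
            out, _ = self.rnn(x)
            return self.fc(out)     # per-timestep class scores

    model = BLSTMRecognizer()
    ctc = nn.CTCLoss(blank=0)
    x = torch.randn(4, 120, 48)    # dummy batch of text-line feature sequences
    log_probs = model(x).log_softmax(-1).permute(1, 0, 2)   # (T, N, C)
    targets = torch.randint(1, 200, (4, 30))
    loss = ctc(log_probs, targets,
               input_lengths=torch.full((4,), 120),
               target_lengths=torch.full((4,), 30))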


Journal ArticleDOI
TL;DR: This paper provides details of a newly created dataset of Chinese text with about 1 million Chinese characters from 3 850 unique ones annotated by experts in over 30 000 street view images and gives baseline results using state-of-the-art methods.
Abstract: In this paper, we introduce a very large Chinese text dataset in the wild. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, the detection and recognition of text in natural images is still a challenging problem, especially for some more complicated character sets such as Chinese text. Lack of training data has always been a problem, especially for deep learning methods which require massive training data. In this paper, we provide details of a newly created dataset of Chinese text with about 1 million Chinese characters from 3 850 unique ones annotated by experts in over 30 000 street view images. This is a challenging dataset with good diversity containing planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. For each character, the annotation includes its underlying character, bounding box, and six attributes. The attributes indicate the character’s background complexity, appearance, style, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (AP of 70.9%), and text line detection (AED of 22.1). The dataset, source code, and trained models are publicly available.

56 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: In this paper, a robust deep learning based approach is proposed to extract rows and columns from a detected table in document images with high precision, using a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU) followed by a fully connected layer with softmax activation.
Abstract: Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem, not only because of the large variation in table layouts and styles, but also owing to the variations in page layouts and noise contamination levels. A lot of research has been done to identify table structure, most of it based on applying heuristics with the aid of optical character recognition (OCR) to hand-pick layout features of the tables. These methods fail to generalize well because of the variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep learning based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU) followed by a fully-connected layer with softmax activation. The network scans the images from top-to-bottom as well as left-to-right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed the state-of-the-art table structure extraction systems by a significant margin.

53 citations
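
The scanning idea lends itself to a compact sketch: collapse each pixel row of the pre-processed table image into a feature vector, run a bidirectional GRU over the rows, and classify every row as separator or content (a mirrored left-to-right pass handles columns). The PyTorch sketch below uses illustrative sizes, not the paper's configuration.

    import torch
    import torch.nn as nn

    class RowSeparatorNet(nn.Module):
        """Scans a table image top-to-bottom, labeling each pixel row."""
        def __init__(self, row_features=256, hidden=128):
            super().__init__()
            self.gru = nn.GRU(row_features, hidden,
                              bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, 2)   # separator vs. content

        def forward(self, rows):                 # (batch, height, row_features)
            out, _ = self.gru(rows)
            return self.fc(out).log_softmax(-1)  # per-row class scores

    net = RowSeparatorNet()
    scores = net(torch.randn(1, 512, 256))       # one 512-row table image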


Proceedings ArticleDOI
02 Jun 2019
TL;DR: The results of a statistical analysis of OCR errors on four document collections are described and several suggestions related to the design and implementation of effective OCR post-processing approaches are given.
Abstract: Post-OCR is an important processing step that follows optical character recognition (OCR) and is meant to improve the quality of OCR documents by detecting and correcting residual errors. This paper describes the results of a statistical analysis of OCR errors on four document collections. Five aspects related to general OCR errors are studied and compared with human-generated misspellings, including edit operations, length effects, erroneous character positions, real-word vs. non-word errors, and word boundaries. Based on the observations from the analysis we give several suggestions related to the design and implementation of effective OCR post-processing approaches.

51 citations
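
The error categories studied here (insertions, deletions, substitutions) can be reproduced with a standard character alignment. Below is a small sketch using Python's standard-library difflib; the paper's own tooling is not specified, so this is only illustrative.

    from collections import Counter
    from difflib import SequenceMatcher

    def edit_operations(ocr, truth):
        """Count character-level edit operations between an OCR string
        and its ground truth, using difflib's alignment opcodes."""
        ops = Counter()
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, ocr, truth).get_opcodes():
            if tag == "replace":
                ops["substitution"] += max(i2 - i1, j2 - j1)
            elif tag == "delete":    # present in OCR, absent in truth
                ops["insertion"] += i2 - i1
            elif tag == "insert":    # absent in OCR, present in truth
                ops["deletion"] += j2 - j1
        return ops

    print(edit_operations("Tbe quick hrown fox", "The quick brown fox"))
    # Counter({'substitution': 2})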


Journal ArticleDOI
04 Jun 2019-Entropy
TL;DR: An image preprocessing methodology using local image entropy filtering is proposed, allowing for the improvement of various commonly used image thresholding methods, which can also be useful for text recognition purposes.
Abstract: Automatic text recognition from natural images acquired in uncontrolled lighting conditions is a challenging task due to the presence of shadows hindering the shape analysis and classification of individual characters. Since optical character recognition methods require prior image binarization, the application of classical global thresholding methods in such cases makes it impossible to preserve the visibility of all characters. Nevertheless, the use of adaptive binarization does not always lead to satisfactory results for heavily unevenly illuminated document images. In this paper, an image preprocessing methodology using local image entropy filtering is proposed, allowing for the improvement of various commonly used image thresholding methods, which can also be useful for text recognition purposes. The proposed approach was verified using a dataset of 140 differently illuminated document images subjected to further text recognition. Experimental results, expressed as Levenshtein distances and F-Measure values for the obtained text strings, are promising and confirm the usefulness of the proposed approach.

45 citations
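
A minimal sketch of the preprocessing idea: compute a local-entropy map, use it to locate textured (text-bearing) regions regardless of absolute brightness, and only then apply a global threshold such as Otsu's. The filter radius below is a guess, not the paper's tuned parameter.

    from skimage import io, img_as_ubyte
    from skimage.filters import threshold_otsu
    from skimage.filters.rank import entropy
    from skimage.morphology import disk

    gray = img_as_ubyte(io.imread("document.png", as_gray=True))

    # Local entropy is high over text strokes even in shadowed areas.
    ent = entropy(gray, disk(9))           # radius 9 is an assumed value
    text_mask = ent > threshold_otsu(ent)  # likely text-bearing pixels

    # Threshold intensities only within the entropy mask.
    binary = (gray < threshold_otsu(gray[text_mask])) & text_mask
    io.imsave("binarized.png", img_as_ubyte(binary))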


Journal ArticleDOI
TL;DR: This work is, to the best of the authors' knowledge, the first successful attempt at recognition of medieval handwritten Gurmukhi manuscripts, and it can lead to the development of optical character recognition systems for recognizing medieval handwritten documents in other Indic and non-Indic scripts as well.
Abstract: Recognition of medieval handwritten Gurmukhi manuscripts is an essential process for resourceful exploitation of the priceless information contained in them. There are numerous ancient Gurmukhi-script manuscripts from the fifteenth to the twentieth century. In this paper, we consider works written by various scribes from the 18th to the 20th century. For recognition, we use various feature extraction techniques such as zoning, discrete cosine transformations, and gradient features, as well as different combinations of these features. For classification, four classifiers, namely k-NN, SVM, Decision Tree, and Random Forest, have been considered, both individually and in combination with a voting scheme. Adaptive boosting and bagging have been explored for improving the recognition results, achieving a new state of the art for recognition of medieval handwritten Gurmukhi manuscripts. Using this proposed framework, a maximum recognition accuracy of 95.91% has been achieved using the adaptive boosting technique and a combination of the four classifiers considered in this paper. To the best of our knowledge, this work is the first successful attempt at recognition of medieval handwritten Gurmukhi manuscripts, and it can lead to the development of optical character recognition systems for recognizing medieval handwritten documents in other Indic and non-Indic scripts as well.

43 citations
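
The classifier combination described above maps naturally onto scikit-learn. A minimal sketch: the four base classifiers joined by majority voting, with AdaBoost wrapped around the tree learner as in the boosting experiments; all hyperparameters are illustrative.

    from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                                  VotingClassifier)
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Four base classifiers combined with a hard-voting scheme.
    ensemble = VotingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("svm", SVC(kernel="rbf")),
            ("ada_tree", AdaBoostClassifier(
                DecisionTreeClassifier(max_depth=3), n_estimators=100)),
            ("rf", RandomForestClassifier(n_estimators=200)),
        ],
        voting="hard",
    )
    # X: zoning/DCT/gradient feature vectors, y: character labels
    # ensemble.fit(X_train, y_train); ensemble.score(X_test, y_test)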


Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper takes a step toward full end-to-end scientific table recognition by constructing a large dataset consisting of 450K table images paired with corresponding LaTeX sources and applying a popular attentional encoder-decoder model to this dataset.
Abstract: In recent years, end-to-end trained neural models have been applied successfully to various optical character recognition (OCR) tasks. However, the same level of success has not yet been achieved in end-to-end neural scientific table recognition, which involves multi-row/multi-column structures and math formulas in cells. In this paper, we take a step toward full end-to-end scientific table recognition by constructing a large dataset consisting of 450K table images paired with corresponding LaTeX sources. We apply a popular attentional encoder-decoder model to this dataset and show the potential of end-to-end trained neural systems, as well as associated challenges.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work first studies the performance of state-of-the-art text classification approaches when applied to noisy text obtained from OCR, and shows that fusing this textual information with visual CNN methods produces state-of-the-art results on the RVL-CDIP classification dataset.
Abstract: State-of-the-art methods for document image classification rely on visual features extracted by deep convolutional neural networks (CNNs). These methods do not utilize rich semantic information present in the text of the document, which can be extracted using Optical Character Recognition (OCR). We first study the performance of state-of-the-art text classification approaches when applied to noisy text obtained from OCR. We then show that fusing this textual information with visual CNN methods produces state-of-the-art results on the RVL-CDIP classification dataset.
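
One simple way to realize such a fusion is late fusion: average the per-class posteriors of the visual CNN and the OCR-text classifier. The weighting below is an assumption for illustration; the paper does not prescribe this exact rule.

    import numpy as np

    def late_fusion(p_visual, p_text, alpha=0.5):
        """Blend class probabilities from a visual CNN and an
        OCR-text classifier; alpha balances the two modalities."""
        return alpha * np.asarray(p_visual) + (1 - alpha) * np.asarray(p_text)

    p_vis = [0.70, 0.20, 0.10]   # e.g. letter / form / invoice
    p_txt = [0.30, 0.60, 0.10]
    fused = late_fusion(p_vis, p_txt)
    print(fused, fused.argmax())  # [0.5 0.4 0.1] 0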

Book ChapterDOI
01 Jan 2019
TL;DR: The aim of this work is to incorporate machine learning techniques to automate and improve existing banking processes through automatic cheque processing, using a convolutional neural network with image processing techniques on bank cheques.
Abstract: Banking systems worldwide suffer from huge dependencies upon manpower and written documents, thus making conventional banking processes tedious and time-consuming. Existing methods for processing transactions made through cheques cause a delay in processing, as the details have to be entered manually. Optical Character Recognition (OCR) finds usage in various fields for data entry and identification purposes. The aim of this work is to incorporate machine learning techniques to automate and improve the existing banking processes, which can be achieved through automatic cheque processing. The method used is handwritten character recognition, where pattern recognition is combined with machine learning to design an optical character recognizer for digits and capital alphabets, which can be both printed and handwritten. The Extension of Modified National Institute of Standards and Technology (EMNIST) dataset, a standard dataset for alphabets and digits, is used for training the machine learning model. The model used is a 2D convolutional neural network, which achieved a training accuracy of 98% for digits and 97% for letters. Image processing techniques such as segmentation and extraction are applied for cheque processing. Otsu thresholding, a type of global thresholding, is applied to the processed output. The processed, segmented image of each character is fed to the trained model and the predicted results are obtained. On a pool of sample cheques used for testing, an accuracy of 95.71% was achieved. The idea of combining a convolutional neural network with image processing techniques on bank cheques is novel and can be deployed in banking sectors.
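
A compact Keras sketch of the kind of 2D convolutional network described, consuming 28x28 EMNIST glyphs; layer sizes are illustrative since the paper does not give the exact architecture.

    import tensorflow as tf

    num_classes = 36  # 10 digits + 26 capital letters, as in the paper
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=10, validation_split=0.1)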

Journal ArticleDOI
TL;DR: The proposed technique utilizes effective Tamil character recognition by means of an optimized artificial neural network, which is used for recognizing the characters from a scanned input digital image and converting them into machine-editable form.
Abstract: Nowadays, recognition of machine-printed or hand-printed documents is an essential part of many applications. Optical character recognition is one of the techniques used to convert a printed or handwritten file into its corresponding text format. Tamil is the south Indian language spoken widely in Tamil Nadu. It has the longest unbroken literary tradition amongst the Dravidian languages. Tamil character recognition (TCR) is one of the challenging tasks in optical character recognition. It is used for recognizing the characters from a scanned input digital image and converting them into machine-editable form. Recognition of handwritten Tamil characters is very difficult due to variations in size, style, and orientation angle. Editing and reprinting text documents that exist only on paper is time-consuming and of low accuracy. In order to overcome this problem, the proposed technique utilizes effective Tamil character recognition. The proposed method has four main processes: preprocessing, segmentation, feature extraction, and recognition. For preprocessing, the input image is fed to a Gaussian filter, a binarization process, and a skew detection technique. Then the segmentation process is carried out, in which line and character segmentation is done. From the segmented output, the features are extracted. After feature extraction, the Tamil characters are recognized by means of an optimal artificial neural network: the traditional neural network is modified by an optimization algorithm, with the weights optimized using Elephant Herding Optimization. The performance of the proposed method is assessed with the metrics Sensitivity, Specificity, and Accuracy. The proposed approach is experimented with and its results are analyzed to visualize the performance. The approach is implemented in MATLAB.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The goal of this paper is to design a robust technique for License Plate Detection (LPD) in images using deep neural networks, pre-process the detected license plates, and perform License Plate Recognition (LPR) using the LSTM-based Tesseract OCR engine, achieving robust results.
Abstract: In the ranking of the largest road networks in the world, India stands in third position. According to a survey held in 2016, the total number of vehicles in India was 260 million. Therefore, there is a necessity to develop expert Automatic Number Plate Recognition (ANPR) systems in India because of the tremendous rise in the number of automobiles plying on the roads. It would help in proper tracking of vehicles, expert traffic monitoring, tracing stolen vehicles, supervising parking tolls and imposing strict actions against red-light breaching. Implementing an ANPR expert system in real life is a challenging task because of the variety of number plate (NP) formats, designs, shapes, colors, scales, angles and non-uniform lighting conditions during image acquisition. So, we implemented an ANPR system which acts more robustly in different challenging scenarios than previously proposed ANPR systems. The goal of this paper is to design a robust technique for License Plate Detection (LPD) in images using deep neural networks, pre-process the detected license plates, and perform License Plate Recognition (LPR) using the LSTM-based Tesseract OCR engine. According to our experimental results, we have successfully achieved robust results, with LPD accuracy of 99% and LPR accuracy of 95%, on par with commercial ANPR systems, i.e., OpenALPR and Plate Recognizer.
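
The recognition stage maps directly onto Tesseract's LSTM engine via pytesseract. Below is a sketch of OCR on an already-cropped plate; the detection stage and the preprocessing choices are assumptions.

    import cv2
    import pytesseract

    plate = cv2.imread("plate_crop.png")    # output of the LPD stage (assumed)
    gray = cv2.cvtColor(plate, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 0, 255,
                         cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # --oem 1 selects the LSTM engine; --psm 7 treats the crop as one text line.
    config = ("--oem 1 --psm 7 "
              "-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
    print("plate:", pytesseract.image_to_string(gray, config=config).strip())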

Proceedings ArticleDOI
TL;DR: This paper presents a deployed, scalable optical character recognition (OCR) system, which is called Rosetta, designed to process images uploaded daily at Facebook scale, and describes Rosetta 's system architecture.
Abstract: In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale. Sharing of image content has become one of the primary ways to communicate information among internet users within social networks such as Facebook and Instagram, and the understanding of such media, including its textual information, is of paramount importance to facilitate search and recommendation applications. We present modeling techniques for efficient detection and recognition of text in images and describe Rosetta's system architecture. We perform extensive evaluation of presented technologies, explain useful practical approaches to build an OCR system at scale, and provide insightful intuitions as to why and how certain components work based on the lessons learnt during the development and deployment of the system.

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This paper applies federated learning with a deep convolutional network to perform variable-length text string recognition with a large corpus and shows that federated text recognition models can achieve similar or even higher accuracy than models trained in a conventional centralized deep learning framework.
Abstract: Unsegmented text recognition is a crucial component in financial document processing systems. Financial text materials, such as receipts, transcripts, identification documents, etc. often involve critical personal information. In many circumstances, these data reside in protected servers of different institutions and must not be transferred beyond the institutional firewall. The emerging technology of Federated Learning (FL) provides a data-secure way of uniting isolated datasets in model training. Using the FL framework, text recognition models can be trained with a larger collection of image samples. In previous works, federated text recognition models only deal with single-character images and an alpha-numeric corpus. Such models are not competent in industrial applications, especially in Chinese text recognition problems. In this paper, we apply federated learning with a deep convolutional network to perform variable-length text string recognition with a large corpus. In our experiments, we compared two prevalent federated learning frameworks, namely Tensorflow Federated and PySyft. Results show that federated text recognition models can achieve similar or even higher accuracy than models trained in a conventional centralized deep learning framework. On a 5-client distributed dataset, the best character accuracy is achieved by TFF at 49.20%. Extensive experiments are also conducted to evaluate the effect of distributed data storage on the performance of the trained models. TFF again achieved a maximum character precision of 54.33% with a non-distributed dataset.
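
At the core of both frameworks is federated averaging: every client trains on its private data and the server averages the resulting weights. Below is a framework-agnostic sketch in plain PyTorch (not the paper's TFF or PySyft code); equal client weighting is assumed.

    import copy
    import torch

    def federated_round(global_model, client_loaders, local_epochs=1, lr=0.01):
        """One FedAvg round: local SGD on every client, then averaging."""
        states = []
        for loader in client_loaders:          # one loader per institution
            local = copy.deepcopy(global_model)
            opt = torch.optim.SGD(local.parameters(), lr=lr)
            loss_fn = torch.nn.CrossEntropyLoss()
            for _ in range(local_epochs):
                for x, y in loader:            # data never leaves the client
                    opt.zero_grad()
                    loss_fn(local(x), y).backward()
                    opt.step()
            states.append(local.state_dict())
        # Average parameters across clients (equal weighting assumed).
        avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
               for k in states[0]}
        global_model.load_state_dict(avg)
        return global_model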

Proceedings ArticleDOI
01 Sep 2019
TL;DR: In this article, a segmentation-free OCR system that combines deep learning methods, synthetic training data generation, and data augmentation techniques is presented, which surpasses the accuracy of leading commercial and open-source engines on distorted text samples.
Abstract: Contrary to popular belief, Optical Character Recognition (OCR) remains a challenging problem when text occurs in unconstrained environments, like natural scenes, due to geometrical distortions, complex backgrounds, and diverse fonts. In this paper, we present a segmentation-free OCR system that combines deep learning methods, synthetic training data generation, and data augmentation techniques. We render synthetic training data using large text corpora and over 2000 fonts. To simulate text occurring in complex natural scenes, we augment extracted samples with geometric distortions and with a proposed data augmentation technique - alpha-compositing with background textures. Our models employ a convolutional neural network encoder to extract features from text images. Inspired by the recent progress in neural machine translation and language modeling, we examine the capabilities of both recurrent and convolutional neural networks in modeling the interactions between input elements. The proposed OCR system surpasses the accuracy of leading commercial and open-source engines on distorted text samples.
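
The proposed alpha-compositing augmentation can be sketched in a few lines of Pillow: scale the text sample's alpha channel and composite it over a background texture. The blend range and file paths are placeholders, not the paper's settings.

    import random
    from PIL import Image

    def alpha_composite_sample(text_path, texture_path, alpha_range=(0.6, 0.9)):
        """Blend a rendered text sample over a texture, simulating
        text embedded in a complex natural background."""
        text = Image.open(text_path).convert("RGBA")
        texture = Image.open(texture_path).convert("RGBA").resize(text.size)
        r, g, b, a = text.split()
        factor = random.uniform(*alpha_range)
        text.putalpha(a.point(lambda v: int(v * factor)))
        return Image.alpha_composite(texture, text).convert("RGB")

    alpha_composite_sample("word.png", "textures/wood.jpg").save("augmented.png")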

Journal ArticleDOI
TL;DR: A finger-worn device that can be practically applied by visually impaired users for recognizing traditional Chinese characters on a micro internet of things (IoT) processor; the results illustrate that the proposed OCR system is more suitable for the needs of visually impaired people in actual use.
Abstract: This study designed a finger-worn device, named the Chinese FingerReader, that can be practically applied by visually impaired users for recognizing traditional Chinese characters on a micro internet of things (IoT) processor. The device is portable, easy to operate, and designed to be worn on the index finger. The Chinese FingerReader contains a small camera and buttons. The small camera captures images by identifying the relative position of the index finger to the printed text, and the buttons are used by visually impaired users to capture an image and to trigger audio output of the corresponding Chinese character via a voice prompt. To recognize Chinese characters, English letters, and numbers, a robust Chinese optical character recognition (OCR) system was developed using the training strategy of an augmented convolutional neural network algorithm. The proposed Chinese OCR system can segment a single character from the captured image, and it can accurately recognize rotated Chinese characters. The experimental results revealed that, compared with the OCR application programming interfaces of Google and Microsoft, the proposed OCR system obtains a 95% accuracy rate on rotated character images, where the Google and Microsoft OCR APIs obtain only 65% and 34% accuracy rates. These results illustrate that the proposed OCR system is more suitable for the needs of visually impaired people in actual use. Finally, three usage scenarios were simulated, and the accuracy and operational performance of the system were tested. Field tests of this system were conducted with visually impaired users to verify its feasibility.

Journal ArticleDOI
TL;DR: This work addresses the problem of searching and retrieving similar textual images based on the detected text, opening new directions for textual image retrieval, and shows that the detected text is an efficient and valuable cue for image retrieval, specifically for textual images.
Abstract: This work addresses the problem of searching and retrieving similar textual images based on the detected text and opens new directions for textual image retrieval. For image retrieval, several methods have been proposed to extract visual features and social tags; however, extracting embedded and scene text within images and using that text as automatic keywords/tags is still a young research field for text-based and content-based image retrieval applications. Automatic text detection and retrieval is an emerging technology for robotics and artificial intelligence. In this study, the authors propose a novel approach to detect the text in an image and exploit it as keywords and tags for automatic text-based image retrieval. First, text regions are detected using the maximally stable extremal region (MSER) algorithm. Second, unwanted false-positive text regions are eliminated based on geometric properties and the stroke width transform. Next, the true text regions are passed to optical character recognition. Third, keywords are formed using a neural probabilistic language model. Finally, the textual images are indexed and retrieved based on the detected keywords. The experimental results on two benchmark datasets show that the detected text is efficient and valuable for image retrieval, specifically for textual images.
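
The first stage, MSER-based candidate detection, is available directly in OpenCV; the later geometric and stroke-width filtering is omitted here. A brief sketch:

    import cv2

    img = cv2.imread("scene.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(gray)

    # Draw candidate text regions; false positives would next be pruned
    # by geometric properties and the stroke width transform (not shown).
    for (x, y, w, h) in bboxes:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
    cv2.imwrite("mser_candidates.jpg", img)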

Journal ArticleDOI
TL;DR: This paper uses teacher-student learning to transfer the knowledge of a large teacher model to a small, compact student model, followed by Tucker decomposition to further compress the student, reducing model size and runtime latency for a CNN-DBLSTM-based character model for OCR.
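
A generic sketch of the teacher-student transfer step: the student is trained against the teacher's temperature-softened posteriors with a KL term, mixed with the ordinary cross-entropy on hard labels. The temperature and mixing weight below are conventional choices, not values from the paper.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets,
                          T=2.0, alpha=0.7):
        """Blend soft-label KL (teacher -> student) with hard-label CE."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1 - alpha) * hard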

Journal ArticleDOI
20 Dec 2019-Sensors
TL;DR: The development of an intelligent vehicle identification system based on an optical character recognition (OCR) method to be used in intelligent transportation systems is presented.
Abstract: Automatic License Plate Recognition has been a recurrent research topic due to the increasing number of cameras available in cities, where most of them, if not all, are connected to the Internet. The video traffic generated by the cameras can be analyzed to provide useful insights for the transportation segment. This paper presents the development of an intelligent vehicle identification system based on an optical character recognition (OCR) method to be used in intelligent transportation systems. The proposed system makes use of an intelligent parking system named Smart Parking Service (SPANS), which is used to manage public or private spaces. Using computer vision techniques, the SPANS system detects whether parking slots are available or not. The proposed system uses the SPANS framework to capture images of the parking spaces and identifies the license plate numbers of the vehicles moving around the parking area as well as parked in the parking slots. License plate recognition is performed in real time, and the performance of the proposed system is evaluated in real time as well.

Proceedings ArticleDOI
08 May 2019
TL;DR: The background of OCR-D is introduced, the main challenges and shortcomings in the availability of open tools and resources for OCR of historical printed documents are introduced and the various software modules and related components that are being made available through O CR-D are discussed.
Abstract: Various research projects were concerned with the development and adaptation of methods for OCR specifically for historical printed documents (cf. METAe [20], IMPACT [1], eMOP [9]). However, these initiatives ended before the wide adoption of deep neural networks and, despite the projects' achievements, there remains a lack of OCR software that is a) comprehensive with regard to the challenges presented by the wide variety of historical documents and b) available as ready-to-use Free Software. The OCR-D project aims to rectify that. In this paper we introduce the background of OCR-D, the main challenges and shortcomings in the availability of open tools and resources for OCR of historical printed documents, and discuss the various software modules and related components (repositories, workflows) that are being made available through OCR-D. Finally we provide an outlook on a number of remaining challenges that are not addressed by OCR-D and point out several examples of the positive community effects arising through the creation and sharing of open resources for historical German OCR.

Journal ArticleDOI
09 Jan 2019
TL;DR: An enhanced HOG feature extraction method is proposed that makes the OCR-based spam recognition system both resistant to character variations in scale and translation and computationally cost-effective.
Abstract: Generally, a spam image is an unsolicited message electronically sent to a wide group of arbitrary addresses. Due to their attractiveness and more difficult detection, spam images are the most complicated type of spam. One of the ways to counter spam images is an optical character recognition (OCR) method. In this paper, an enhanced HOG feature extraction method is proposed that makes the OCR system for spam both resistant to character variations in scale and translation and computationally cost-effective. For these purposes, two steps, cropped-image and input-image size normalization, have been added to the pre-processing stages. A support vector machine (SVM) was employed for classification. Two heuristic modifications, thickening of thin characters in the pre-processing stage and non-discrimination between uppercase and lowercase letters of the same shape in the classification stage, have also been proposed to increase the system's recognition accuracy. In the first heuristic modification, when all pixels of the output image are empty (the character is eliminated), the original image is made thicker by one layer. In the second modification, when recognizing letters, no differentiation is made between uppercase and lowercase letters with the same shapes. The average recognition accuracy of the modified HOG method with the two heuristic modifications is 91.61% on the Char74K database. An optimum classification threshold was then investigated via the ROC curve: the optimal cutoff point was 0.736 with the highest average accuracy, 94.20%, and the AUC (area under curve) values for the ROC and precision-recall (PR) curves were 0.96 and 0.73, respectively. The proposed method was also examined on the ICDAR2003 database, and the average accuracy and its optimum using the ROC curve were 82.73% and 86.01%, respectively. These results for recognition accuracy and AUC for the ROC and PR curves show an outstanding enhancement in comparison with the best recognition rates of previous methods.
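
A minimal sketch of the modified pipeline: the two added normalization steps (crop to the character's bounding box, then a fixed-size resize) followed by HOG features and a linear SVM. All parameters are illustrative, not the paper's tuned values.

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize
    from sklearn.svm import LinearSVC

    def char_features(img):
        """Crop to the ink's bounding box, resize to a fixed shape, then
        extract HOG: this gives scale and translation robustness."""
        ys, xs = np.nonzero(img < img.mean())   # dark pixels = ink (assumed)
        crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        return hog(resize(crop, (32, 32)), orientations=9,
                   pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    # X = np.stack([char_features(im) for im in char_images])
    # clf = LinearSVC().fit(X, labels)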

Journal ArticleDOI
TL;DR: The proposed automated system for detecting early dyslexia symptoms is able to overcome several drawbacks of current screening methods for dyslexic children.

Proceedings ArticleDOI
22 Oct 2019
TL;DR: This article presented a fully automatic unsupervised way of extracting parallel data for training a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction.
Abstract: A great deal of historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process. Correcting these errors manually is a time-consuming process and a great part of the automatic approaches have been relying on rules or supervised machine learning. We present a fully automatic unsupervised way of extracting parallel data for training a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction.
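
Character-level NMT models of this kind are usually fed space-separated character sequences. Here is a sketch of turning an extracted (OCR, correction) pair into that form; the underscore placeholder for spaces is a common convention assumed here, not a detail from the paper.

    def to_char_tokens(s):
        """Space-separate characters, marking original spaces with '_'
        so the sequence model can predict them as ordinary tokens."""
        return " ".join("_" if c == " " else c for c in s)

    src = to_char_tokens("Tbe Parliament of Fin1and")   # noisy OCR line
    tgt = to_char_tokens("The Parliament of Finland")   # extracted correction
    print(src)  # T b e _ P a r l i a m e n t ...
    # Such (src, tgt) pairs train a character-level attentional
    # encoder-decoder to map OCR output to corrected text.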


Journal ArticleDOI
TL;DR: There are two main techniques to convert written or printed text into digital format; one is to create an image of written/printed text, but images are large in size so they require...
Abstract: There are two main techniques to convert written or printed text into digital format. The first technique is to create an image of written/printed text, but images are large in size so they require...

Journal ArticleDOI
TL;DR: The evaluation experiments confirmed that the performance of mathematical symbol and expression detection by the proposed method is superior to that of InftyReader, which is state-of-the-art software for mathematical OCR.
Abstract: A detection method for mathematical expressions in scientific document images is proposed. Inspired by the promising performance of U-Net, a convolutional network architecture originally proposed for the semantic segmentation of biomedical images, the proposed method uses image conversion by a U-Net framework. The proposed method does not use any information from mathematical and linguistic grammar so that it can be a supplemental bypass in the conventional mathematical optical character recognition (OCR) process pipeline. The evaluation experiments confirmed that (1) the performance of mathematical symbol and expression detection by the proposed method is superior to that of InftyReader, which is state-of-the-art software for mathematical OCR; (2) the coverage of the training dataset to the variation of document style is important; and (3) retraining with small additional training samples will be effective to improve the performance. An additional contribution is the release of a dataset for benchmarking the OCR for scientific documents.
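
A compact PyTorch sketch of the U-Net-style image-to-image idea: map the page image to a per-pixel "math-ness" map from which expression boxes can be read off. Depth and channel counts are reduced for brevity and are not the paper's configuration.

    import torch
    import torch.nn as nn

    def block(c_in, c_out):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

    class TinyUNet(nn.Module):
        """One-level U-Net: encode, bottleneck, decode with a skip link."""
        def __init__(self):
            super().__init__()
            self.enc, self.mid, self.dec = block(1, 16), block(16, 32), block(32, 16)
            self.down = nn.MaxPool2d(2)
            self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
            self.out = nn.Conv2d(16, 1, 1)   # per-pixel math/not-math score

        def forward(self, x):
            e = self.enc(x)
            m = self.mid(self.down(e))
            d = self.dec(torch.cat([self.up(m), e], dim=1))
            return torch.sigmoid(self.out(d))

    mask = TinyUNet()(torch.randn(1, 1, 256, 256))   # (1, 1, 256, 256)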

Book
02 Apr 2019
TL;DR: NPR (Number Plate Recognition) is a system designed to help in the recognition of vehicle number plates; it successfully detects and recognizes vehicle number plates in real images.
Abstract: NPR (Number Plate Recognition) is a system designed to help in the recognition of vehicle number plates. The system is designed for security purposes and is based on image processing. It supports functions such as detecting vehicle number plates, processing them, and using the processed data for further steps such as storing, allowing a vehicle to pass, or rejecting a vehicle. NPR is an image processing technology which uses the number (license) plate to identify the vehicle. The objective is to design an efficient automatic authorized-vehicle identification system using the vehicle number plate. The system can be implemented at the entrance for security control of a highly restricted area like military zones or areas around top government offices, e.g. Parliament, Supreme Court etc. The developed system first captures the vehicle image. The number plate region is extracted using image segmentation. Optical character recognition is used for character recognition. The resulting data is then compared with records in a database. The system is implemented and simulated in Matlab, and its performance is tested on real images. It is observed from the experiments that the developed system successfully detects and recognizes vehicle number plates in real images.