Showing papers on "Optical character recognition published in 2012"

PDF

Open Access

Journal Article•DOI•

The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]

[...]

Li Deng¹•Institutions (1)

18 Oct 2012-IEEE Signal Processing Magazine

TL;DR: “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research.

...read moreread less

1,626 citations

Journal Article•

The MNIST Database of Handwritten Digit Images for Machine Learning Research

[...]

Li Deng

01 Nov 2012-IEEE Signal Processing Magazine

TL;DR: In this article, the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research, are presented.

...read moreread less

Abstract: In this issue, “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research. Handwritten digit recognition is an important problem in optical character recognition, and it has been used as a test case for theories of pattern recognition and machine learning algorithms for many years. Historically, to promote machine learning and pattern recognition research, several standard databases have emerged in which the handwritten digits are preprocessed, including segmentation and normalization, so that researchers can compare recognition results of their techniques on a common basis. The freely available MNIST database of handwritten digits has become a standard for fast-testing machine learning algorithms for this purpose. The simplicity of this task is analogous to the TIDigit (a speech database created by Texas Instruments) task in speech recognition. Just like there is a long list for more complex speech recognition tasks, there are many more difficult and challenging tasks for image recognition and computer vision, which will not be addressed in this column.

...read moreread less

1,466 citations

Proceedings Article•DOI•

Real-time scene text localization and recognition

[...]

Lukas Neumann¹, Jiri Matas¹•Institutions (1)

Czech Technical University in Prague¹

16 Jun 2012

TL;DR: The proposed end-to-end real-time scene text localization and recognition method achieves state-of-the-art text localization results amongst published methods and it is the first one to report results for end- to-end text recognition.

...read moreread less

Abstract: An end-to-end real-time scene text localization and recognition method is presented. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust to blur, illumination, color and texture variation and handles low-contrast text. In the first classification stage, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity per region tested. Only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. A highly efficient exhaustive search with feedback loops is then applied to group ERs into words and to select the most probable character segmentation. Finally, text is recognized in an OCR stage trained using synthetic fonts. The method was evaluated on two public datasets. On the ICDAR 2011 dataset, the method achieves state-of-the-art text localization results amongst published methods and it is the first one to report results for end-to-end text recognition. On the more challenging Street View Text dataset, the method achieves state-of-the-art recall. The robustness of the proposed method against noise and low contrast of characters is demonstrated by “false positives” caused by detected watermark text in the dataset.

...read moreread less

862 citations

Journal Article•DOI•

A Novel Word Spotting Method Based on Recurrent Neural Networks

[...]

Volkmar Frinken¹, Andreas Fischer¹, R. Manmatha², Horst Bunke¹•Institutions (2)

University of Bern¹, University of Massachusetts Amherst²

01 Feb 2012-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel keyword spotting method for handwritten documents is described, derived from a neural network-based system for unconstrained handwriting recognition, that performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set.

...read moreread less

Abstract: Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical dynamic time warping-based approach but also a modern keyword spotting system, based on hidden Markov models. Furthermore, we analyze the performance of the underlying neural networks when using them in a recognition task followed by keyword spotting on the produced transcription. We point out the advantages of keyword spotting when compared to classic text line recognition.

...read moreread less

283 citations

Journal Article•DOI•

Optical music recognition: state-of-the-art and open issues

[...]

Ana Rebelo, Ichiro Fujinaga¹, Filipe Paszkiewicz, André R. S. Marçal, Carlos Guedes, Jaime S. Cardoso - Show less +2 more•Institutions (1)

McGill University¹

02 Mar 2012-International Journal of Multimedia Information Retrieval

TL;DR: An overview of the literature concerning the automatic analysis of images of printed and handwritten musical scores and a reference scheme for any researcher wanting to compare new OMR algorithms against well-known ones is presented.

...read moreread less

Abstract: For centuries, music has been shared and remembered by two traditions: aural transmission and in the form of written documents normally called musical scores. Many of these scores exist in the form of unpublished manuscripts and hence they are in danger of being lost through the normal ravages of time. To preserve the music some form of typesetting or, ideally, a computer system that can automatically decode the symbolic images and create new scores is required. Programs analogous to optical character recognition systems called optical music recognition (OMR) systems have been under intensive development for many years. However, the results to date are far from ideal. Each of the proposed methods emphasizes different properties and therefore makes it difficult to effectively evaluate its competitive advantages. This article provides an overview of the literature concerning the automatic analysis of images of printed and handwritten musical scores. For self-containment and for the benefit of the reader, an introduction to OMR processing systems precedes the literature overview. The following study presents a reference scheme for any researcher wanting to compare new OMR algorithms against well-known ones.

...read moreread less

246 citations

Journal Article•DOI•

Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study

[...]

Chirag Patel, Atul Patel, Dharmendra Patel

20 Oct 2012-International Journal of Computer Applications

TL;DR: A comparative study of this tool with other commercial OCR tool Transym OCR by considering vehicle number plate as input and compared these tools based on various parameters are concluded.

...read moreread less

Abstract: Optical character recognition (OCR) method has been used in converting printed text into editable text. OCR is very useful and popular method in various applications. Accuracy of OCR can be dependent on text preprocessing and segmentation algorithms. Sometimes it is difficult to retrieve text from the image because of different size, style, orientation, complex background of image etc. We begin this paper with an introduction of Optical Character Recognition (OCR) method, History of Open Source OCR tool Tesseract, architecture of it and experiment result of OCR performed by Tesseract on different kinds images are discussed. We conclude this paper by comparative study of this tool with other commercial OCR tool Transym OCR by considering vehicle number plate as input. From vehicle number plate we tried to extract vehicle number by using Tesseract and Transym and compared these tools based on various parameters. explained.Keywords like: Desktop OCR, Server OCR, Web OCR etc.

...read moreread less

223 citations

Journal Article•DOI•

CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image

[...]

Ram Sarkar¹, Nibaran Das¹, Subhadip Basu¹, Mahantapas Kundu¹, Mita Nasipuri¹, Dipak Kumar Basu¹ - Show less +2 more•Institutions (1)

Jadavpur University¹

01 Mar 2012-International Journal on Document Analysis and Recognition

TL;DR: This paper has described the preparation of a benchmark database for research on off-line Optical Character Recognition (OCR) of document images of handwritten Bangla text and Bangle text mixed with English words, which is the first handwritten database in this area available as an open source document.

...read moreread less

Abstract: In this paper, we have described the preparation of a benchmark database for research on off-line Optical Character Recognition (OCR) of document images of handwritten Bangla text and Bangla text mixed with English words. This is the first handwritten database in this area, as mentioned above, available as an open source document. As India is a multi-lingual country and has a colonial past, so multi-script document pages are very much common. The database contains 150 handwritten document pages, among which 100 pages are written purely in Bangla script and rests of the 50 pages are written in Bangla text mixed with English words. This database for off-line-handwritten scripts is collected from different data sources. After collecting the document pages, all the documents have been preprocessed and distributed into two groups, i.e., CMATERdb1.1.1, containing document pages written in Bangla script only, and CMATERdb1.2.1, containing document pages written in Bangla text mixed with English words. Finally, we have also provided the useful ground truth images for the line segmentation purpose. To generate the ground truth images, we have first labeled each line in a document page automatically by applying one of our previously developed line extraction techniques [Khandelwal et al., PReMI 2009, pp. 369–374] and then corrected any possible error by using our developed tool GT Gen 1.1. Line extraction accuracies of 90.6 and 92.38% are achieved on the two databases, respectively, using our algorithm. Both the databases along with the ground truth annotations and the ground truth generating tool are available freely at http://code.google.com/p/cmaterdb.

...read moreread less

119 citations

Patent•

System and method for automatic document management

[...]

Eitan Dub, Adam O. Dub, Alfredo J. Miro

02 Nov 2012

TL;DR: In this article, a system for managing documents, comprising interfaces to a user interface, proving an application programming interface, a database of document images, a remote server, configured to communicate a text representation of the document from the optical character recognition engine to the report server, and to receive from the remote server a classification of the documents, is presented.

...read moreread less

Abstract: A system for managing documents, comprising: interfaces to a user interface, proving an application programming interface, a database of document images, a remote server, configured to communicate a text representation of the document from the optical character recognition engine to the report server, and to receive from the remote server a classification of the document; and logic configured to receive commands from the user interface, and to apply the classifications received from the remote server to the document images through the interface to the database. A corresponding method is also provided.

...read moreread less

95 citations

Posted Content•

OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion

[...]

Youssef Bassil, Mohammad Alwani

01 Apr 2012-arXiv: Computation and Language

TL;DR: This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors, based on Google’s online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web.

...read moreread less

Abstract: With the advent of digital optical scanners, a lot of paper-based books, textbooks, magazines, articles, and documents are being transformed into an electronic version that can be manipulated by a computer. For this purpose, OCR, short for Optical Character Recognition was developed to translate scanned graphical text into editable computer text. Unfortunately, OCR is still imperfect as it occasionally mis-recognizes letters and falsely identifies scanned text, leading to misspellings and linguistics errors in the OCR output text. This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors. The proposed algorithm is based on Google’s online spelling suggestion which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web, convenient to suggest possible replacements for words that have been misspelled during the OCR process. Experiments carried out revealed a significant improvement in OCR error correction rate. Future research can improve upon the proposed algorithm so much so that it can be parallelized and executed over multiprocessing platforms.

...read moreread less

92 citations

Patent•

Photography recognition translation

[...]

Ekaterina Solntseva, Konstantin Tarachyov

15 Oct 2012

TL;DR: In this article, a method for efficient and substantially instant recognition and translation of text in photographs is described, where a user is able to select an area of interest for subsequent processing and optical character recognition (OCR) may be performed on a wider area than that selected for determining the subject domain of the text.

...read moreread less

Abstract: Methods are described for efficient and substantially instant recognition and translation of text in photographs. A user is able to select an area of interest for subsequent processing. Optical character recognition (OCR) may be performed on the wider area than that selected for determining the subject domain of the text. Translation to one or more target languages is performed. Manual corrections may be made at various stages of processing. Variations of translation are presented and made available for substitution of a word or expression in the target language. Translated text is made available for further uses or for immediate access.

...read moreread less

65 citations

Proceedings Article•

Recognition of printed Devanagari text using BLSTM Neural Network

[...]

Naveen Sankaran¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

01 Nov 2012

TL;DR: This paper proposes a recognition scheme for the Indian script of Devanagari using a Recurrent Neural Network known as Bidirectional LongShort Term Memory (BLSTM) and reports a reduction of more than 20% in word error rate and over 9% reduction in character error rate while comparing with the best available OCR system.

...read moreread less

Abstract: In this paper, we propose a recognition scheme for the Indian script of Devanagari. Recognition accuracy of Devanagari script is not yet comparable to its Roman counterparts. This is mainly due to the complexity of the script, writing style etc. Our solution uses a Recurrent Neural Network known as Bidirectional LongShort Term Memory (BLSTM). Our approach does not require word to character segmentation, which is one of the most common reason for high word error rate. We report a reduction of more than 20% in word error rate and over 9% reduction in character error rate while comparing with the best available OCR system.

...read moreread less

Comparison of named entity recognition tools for raw OCR text

[...]

Kepa Joseba Rodriquez, Michael Bryant, Tobias Blanke, Magdalena Luszczynska

19 Sep 2012

TL;DR: An experiment comparing the efficacy of several Named Entity Recognition (NER) tools at extracting entities directly from the output of an optical character recognition (OCR) workflow is analyzed.

...read moreread less

Abstract: This short paper analyses an experiment comparing the efficacy of several Named Entity Recognition (NER) tools at extracting entities directly from the output of an optical character recognition (OCR) workflow. The authors present how they first created a set of test data, consisting of raw and corrected OCR output manually annotated with people, locations, and organizations. They then ran each of the NER tools against both raw and corrected OCR output, comparing the precision, recall, and F1 score against the manually annotated data.

...read moreread less

Journal Article•DOI•

A Survey of OCR Applications

[...]

Amarjot Singh, Ketan Bacchuwar, Akshay Bhasin

01 Jan 2012-International Journal of Machine Learning and Computing

TL;DR: The paper presents a survey of applications of OCR in different fields and further presents the experimentation for three important applications such as Captcha, Institutional Repository and Optical Music Character Recognition.

...read moreread less

Abstract: Optical Character Recognition or OCR is the electronic translation of handwritten, typewritten or printed text into machine translated images. It is widely used to recognize and search text from electronic documents or to publish the text on a website. The paper presents a survey of applications of OCR in different fields and further presents the experimentation for three important applications such as Captcha, Institutional Repository and Optical Music Character Recognition. We make use of an enhanced image segmentation algorithm based on histogram equalization using genetic algorithms for optical character recognition. The paper will act as a good literature survey for researchers starting to work in the field of optical character recognition.

...read moreread less

Proceedings Article•DOI•

An Efficient Framework for Searching Text in Noisy Document Images

[...]

Ismet Zeki Yalniz¹, R. Manmatha¹•Institutions (1)

University of Massachusetts Amherst¹

27 Mar 2012

TL;DR: An efficient word spotting framework is proposed to search text in scanned books allowing one to search for words when optical character recognition (OCR) fails due to noise or for languages where there is no OCR.

...read moreread less

Abstract: An efficient word spotting framework is proposed to search text in scanned books. The proposed method allows one to search for words when optical character recognition (OCR) fails due to noise or for languages where there is no OCR. Given a query word image, the aim is to retrieve matching words in the book sorted by the similarity. In the offline stage, SIFT descriptors are extracted over the corner points of each word image. Those features are quantized into visual terms (visterms) using hierarchical K-Means algorithm and indexed using an inverted file. In the query resolution stage, the candidate matches are efficiently identified using the inverted index. These word images are then forwarded to the next stage where the configuration of visterms on the image plane are tested. Configuration matching is efficiently performed by projecting the visterms on the horizontal axis and searching for the Longest Common Subsequence (LCS) between the sequences of visterms. The proposed framework is tested on one English and two Telugu books. It is shown that the proposed method resolves a typical user query under 10 milliseconds providing very high retrieval accuracy (Mean Average Precision 0.93). The search accuracy for the English book is comparable to searching text in the high accuracy output of a commercial OCR engine.

...read moreread less

Patent•

License plate optical character recognition method and system

[...]

Aaron Michael Burry¹, Vladimir Kozitsky¹, Peter Paul¹•Institutions (1)

Xerox¹

18 Jan 2012

TL;DR: In this article, a method and system for recognizing a license plate character utilizing a machine learning classifier is presented. But the method is limited to a single image and cannot handle a large number of images.

...read moreread less

Abstract: A method and system for recognizing a license plate character utilizing a machine learning classifier. A license plate image with respect to a vehicle can be captured by an image capturing unit and the license plate image can be segmented into license plate character images. The character image can be preprocessed to remove a local background variation in the image and to define a local feature utilizing a quantization transformation. A classification margin for each character image can be identified utilizing a set of machine learning classifiers each binary in nature, for the character image. Each binary classifier can be trained utilizing a character sample as a positive class and all other characters as well as non-character images as a negative class. The character type associated with the classifier with a largest classification margin can be determined and the OCR result can be declared.

...read moreread less

Journal Article•DOI•

Development of an optical character recognition pipeline for handwritten form fields from an electronic health record.

[...]

Luke V. Rasmussen¹, Peggy L. Peissig¹, Catherine A. McCarty, Justin Starren², Justin Starren¹ - Show less +1 more•Institutions (2)

Marshfield Clinic¹, Northwestern University²

01 Jun 2012-Journal of the American Medical Informatics Association

TL;DR: Preliminary experience from this project yielded insights on the generalizability and applicability of integrating multiple, inexpensive general-purpose third-party optical character recognition engines in a modular pipeline.

...read moreread less

Proceedings Article•DOI•

OCR-based neural network for ANPR

[...]

Xiaojun Zhai¹, Faycal Bensaali¹, R. Sotudeh¹•Institutions (1)

University of Hertfordshire¹

16 Jul 2012

TL;DR: An Artificial Neural Network (ANN) based OCR algorithm for ANPR application is presented that can meet the real-time requirement of an ANPR system and can averagely process a character image in 8.4ms with 97.3% successful recognition rate.

...read moreread less

Abstract: Optical Character Recognition (OCR) is the last stage in an Automatic Number Plate Recognition System (ANPRs). In this stage the number plate characters on the number plate image are converted into encoded texts. In this paper, an Artificial Neural Network (ANN) based OCR algorithm for ANPR application is presented. A database of 3700 UK binary character images have been used for testing the performance of the proposed algorithm. Results achieved have shown that the proposed algorithm can meet the real-time requirement of an ANPR system and can averagely process a character image in 8.4ms with 97.3% successful recognition rate.

...read moreread less

Proceedings Article•DOI•

Enhanced continuous sign language recognition using PCA and neural network features

[...]

Yannick Gweth¹, Christian Plahl¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

16 Jun 2012

TL;DR: In this work a Gaussian Hidden Markov Model (GHMM) based automatic sign language recognition system is built on the SIGNUM database and could improve the word error rate of this system by more than 8% relative and outperform the best published results on this database by about 6% relative.

...read moreread less

Abstract: In this work a Gaussian Hidden Markov Model (GHMM) based automatic sign language recognition system is built on the SIGNUM database. The system is trained on appearance-based features as well as on features derived from a multilayer perceptron (MLP). Appearance-based features are directly extracted from the original images without any colored gloves or sensors. The posterior estimates are derived from a neural network. Whereas MLP based features are well-known in speech and optical character recognition, this is the first time that these features are used in a sign language system. The MLP based features improve the word error rate (WER) of the system from 16% to 13% compared to the appearance-based features. In order to benefit from the different feature types we investigate a combination technique. The models trained on each feature set are combined during the recognition step. By means of the combination technique, we could improve the word error rate of our best system by more than 8% relative and outperform the best published results on this database by about 6% relative.

...read moreread less

Patent•

Apparatus, systems and methods for presenting text identified in a video image

[...]

Dale Mountain

18 Jan 2012

TL;DR: In this article, a complete video frame that is associated with a presented video image of a video content event is presented, where a region of text is identified in the video frame and an optical character recognition (OCR) algorithm is used to translate the text.

...read moreread less

Abstract: Systems and methods are operable to present text identified in a presented video image of a media content event. An exemplary embodiment receives a complete video frame that is associated with a presented video image of a video content event, wherein the presented video image includes a region of text; finds the text in the complete video frame; uses an optical character recognition (OCR) algorithm to translate the found text; and presents the translated text. The translated text may be presented on a display concurrently with the video image that is presented on the display. Alternatively, or additionally, the translated text may be presented as audible speech emitted from at least one speaker.

...read moreread less

Proceedings Article•DOI•

Robust Recognition of Degraded Documents Using Character N-Grams

[...]

Shrey Dutta¹, Naveen Sankaran¹, K. Pramod Sankar², C. V. Jawahar¹•Institutions (2)

International Institute of Information Technology, Hyderabad¹, Xerox²

27 Mar 2012

TL;DR: A novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images by exploiting the additional context present in the character n-gram images, which enables better disambiguation between confusing characters in the recognition phase.

...read moreread less

Abstract: In this paper we present a novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images. OCRs have considerably good performance on good quality documents, but fail easily in presence of degradations. Also, classical OCR approaches perform poorly over complex scripts such as those for Indian languages. We address these issues by proposing to recognize character n-gram images, which are basically groupings of consecutive character/component segments. Our approach is unique, since we use the character n-grams as a primitive for recognition rather than for post processing. By exploiting the additional context present in the character n-gram images, we enable better disambiguation between confusing characters in the recognition phase. The labels obtained from recognizing the constituent n-grams are then fused to obtain a label for the word that emitted them. Our method is inherently robust to degradations such as cuts and merges which are common in digital libraries of scanned documents. We also present a reliable and scalable scheme for recognizing character n-gram images. Tests on English and Malayalam document images show considerable improvement in recognition in the case of heavily degraded documents.

...read moreread less

Proceedings Article•

Learning features for predicting OCR accuracy

[...]

Peng Ye¹, David Doermann¹•Institutions (1)

University of Maryland, College Park¹

01 Nov 2012

TL;DR: Results show that the proposed method outperforms a baseline method which combines features from previous works, and an unsupervised feature learning framework to learn effective and efficient features for predicting OCR accuracy is explored.

...read moreread less

Abstract: In this paper, we present a new method for assessing the quality of degraded document images using unsupervised feature learning The goal is to build a computational model to automatically predict OCR accuracy of a degraded document image without a reference image Current approaches for this problem typically rely on hand-crafted features whose design is based on heuristic rules that may not be generalizable In contrast, we explore an unsupervised feature learning framework to learn effective and efficient features for predicting OCR accuracy Our experimental results, on a set of historic newspaper images, show that the proposed method outperforms a baseline method which combines features from previous works

...read moreread less

Patent•

System and method for processing receipts and other records of users

[...]

David M. Barrett, Kevin Michael Kuchta

10 May 2012

TL;DR: In this paper, a service can perform optical character recognition (OCR) on an image of a record to determine a first set of information items about the record and a second set of items that are likely part of the record but not determinable from performing OCR on the image.

...read moreread less

Abstract: A service can perform optical character recognition (OCR) on an image of a record to determine a first set of information items about the record. A second set of information items can be identified that are likely part of the record but not determinable from performing OCR on the image. Another resource can be utilized to determine the second set of information items. A classification for the record can be determined based on first and second sets of information items. The record can be associated with a financial resource of the user based at least in part on the classification.

...read moreread less

Optical character recognition using artificial neural network

[...]

Sameeksha Barve

01 Jan 2012

TL;DR: An Optical character recognition based on Artificial Neural Networks (ANNs) is presented, trained using the Back Propagation algorithm.

...read moreread less

Abstract: Optical character recognition refers to the process of translat ing images of hand-written, typewritten, or printed text into a format understood by machines for the purpose of editing, indexing/searching, and a reduction in storage size. Optical character recognition is the mechanical or electronic translation of images of handwritten, typewritten or printed text into machine-editable text. Artificial neural networks are commonly used to perform character recognition due to their high noise tolerance. In this paper, an Optical character recognition based on Artificial Neural Networks (ANNs). The ANN is trained using the Back Propagation algorithm.

...read moreread less

Journal Article•DOI•

Character-Based Automated Human Perception Quality Assessment in Document Images

[...]

Tayo Obafemi-Ajayi¹, Gady Agam²•Institutions (2)

University of Missouri¹, Illinois Institute of Technology²

01 May 2012

TL;DR: The goal of this paper is to design a system that enables the learning and estimation of human perception of document IQ, a metric that can be used to compare existing document enhancement methods and guide automated document enhancement.

...read moreread less

Abstract: Large degradations in document images impede their readability and deteriorate the performance of automated document processing systems. Document image quality (IQ) metrics have been defined through optical character recognition (OCR) accuracy. Such metrics, however, do not always correlate with human perception of IQ. When enhancing document images with the goal of improving readability, e.g., in historical documents where OCR performance is low and/or where it is necessary to preserve the original context, it is important to understand human perception of quality. The goal of this paper is to design a system that enables the learning and estimation of human perception of document IQ. Such a metric can be used to compare existing document enhancement methods and guide automated document enhancement. Moreover, the proposed methodology is designed as a general framework that can be applied in a wide range of applications.

...read moreread less

Journal Article•DOI•

Morphological Operations and Projection Profiles based Segmentation of Handwritten Kannada Document

[...]

Mamatha H R, Srikantamurthy K

10 Oct 2012-International Journal of Applied Information Systems

TL;DR: A segmentation scheme for segmenting handwritten Kannada scripts into lines, words and characters using morphological operations and projection profiles is proposed and tested on totally unconstrained handwritten Kanni scripts, which pays more challenge and difficulty due to the complexity involved in the script.

...read moreread less

Abstract: Segmentation is an important task of any Optical Character Recognition (OCR) system. It separates the image text documents into lines, words and characters. The accuracy of OCR system mainly depends on the segmentation algorithm being used. Segmentation of handwritten text of some Indian languages like Kannada, Telugu, Assamese is difficult when compared with Latin based languages because of its structural complexity and increased character set. It contains vowels, consonants and compound characters. Some of the characters may overlap together. Despite several successful works in OCR all over the world, development of OCR tools in Indian languages is still an ongoing process. Character segmentation plays an important role in character recognition because incorrectly segmented characters are unlikely to be recognized correctly. In this paper, a segmentation scheme for segmenting handwritten Kannada scripts into lines, words and characters using morphological operations and projection profiles is proposed. The method was tested on totally unconstrained handwritten Kannada scripts, which pays more challenge and difficulty due to the complexity involved in the script. Usage of the morphology made extracting text lines efficient by an average extraction rate of 94.5% .Because of the varying inter and intra word gaps an average segmentation rate of 82.35% and 73.08% for words and characters respectively is obtained.

...read moreread less

Proceedings Article•DOI•

New Spatial-Gradient-Features for Video Script Identification

[...]

Danni Zhao¹, Palaiahnakote Shivakumara¹, Shijian Lu², Chew Lim Tan¹•Institutions (2)

National University of Singapore¹, Institute for Infocomm Research Singapore²

27 Mar 2012

TL;DR: New features based on Spatial-Gradient-Features (SGF) at block level for identifying six video scripts namely, Arabic, Chinese, English, Japanese, Korean and Tamil are presented, which helps in enhancing the capability of the current OCR on video text recognition by choosing an appropriate OCR engine when video contains multi-script frames.

...read moreread less

Abstract: In this paper, we present new features based on Spatial-Gradient-Features (SGF) at block level for identifying six video scripts namely, Arabic, Chinese, English, Japanese, Korean and Tamil This works helps in enhancing the capability of the current OCR on video text recognition by choosing an appropriate OCR engine when video contains multi-script frames The input for script identification is the text blocks obtained by our text frame classification method For each text block, we obtain horizontal and vertical gradient information to enhance the contrast of the text pixels We divide the horizontal gradient block into two equal parts as upper and lower at the centroid in the horizontal direction Histogram on the horizontal gradient values of the upper and the lower part is performed to select dominant text pixels In the same way, the method selects dominant pixels from the right and the left parts obtained by dividing the vertical gradient block vertically The method combines the horizontal and the vertical dominant pixels to obtain text components Skeleton concept is used to reduce pixel width to a single pixel to extract spatial features We extract four features based on proximity between end points, junction points, intersection points and pixels The method is evaluated on 770 frames of six scripts in terms of classification rate and is compared with an existing method We have achieved 821% average classification rate

...read moreread less

Posted Content•

OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

[...]

Youssef Bassil, Mohammad Alwani

01 Apr 2012-arXiv: Computation and Language

TL;DR: A post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors and the cornerstone of this proposed approach is the use of Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text.

...read moreread less

Abstract: Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely being published at that time; and hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition was conceived to translate paper-based books into digital e-books. Regrettably, OCR systems are still erroneous and inaccurate as they produce misspellings in the recognized text, especially when the source document is of low printing quality. This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors. The cornerstone of this proposed approach is the use of Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text. The Google data set incorporates a very large vocabulary and word statistics entirely reaped from the Internet, making it a reliable source to perform dictionary-based error correction. The core of the proposed solution is a combination of three algorithms: The error detection, candidate spellings generator, and error correction algorithms, which all exploit information extracted from Google Web 1T 5-gram data set. Experiments conducted on scanned images written in different languages showed a substantial improvement in the OCR error correction rate. As future developments, the proposed algorithm is to be parallelised so as to support parallel and distributed computing architectures.

...read moreread less

Digitised Historical Text: Does it have to be mediOCRe?

[...]

Beatrice Alex, Claire Grover, Ewan Klein, Richard Tobin

19 Sep 2012

TL;DR: The quality of ocred text compared to a gold standard is analysed and it can be improved by performing two automatic correction steps and the impact this can have on named entity recognition in a preliminary extrinsic evaluation is demonstrated.

...read moreread less

Abstract: This paper reports on experiments to improve the Optical Character Recognition (ocr) quality of historical text as a preliminary step in text mining. We analyse the quality of ocred text compared to a gold standard and show how it can be improved by performing two automatic correction steps. We also demonstrate the impact this can have on named entity recognition in a preliminary extrinsic evaluation. This work was performed as part of the Trading Consequences project which is focussed on text mining of historical documents for the study of nineteenth century trade in the British Empire.

...read moreread less

Book Chapter•DOI•

RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

[...]

Philippe Dreuw¹, David Rybach¹, Georg Heigold¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

01 Jan 2012

TL;DR: A novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM-based recognition system to handle multiple fonts, different handwriting styles, and their variations.

...read moreread less

Abstract: We present a novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM-based recognition system to handle multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems which are maximum likelihood (ML) trained and which try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria are used instead. For model adaptation during decoding, an unsupervised confidence-based discriminative training within a two-pass decoding process is proposed. Additionally, we use neural network-based features extracted by a hierarchical multi-layer perceptron (MLP) network either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian HMM system in a tandem approach. The proposed framework and methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT-database Arabic handwriting database, where the word error rate is decreased by more than 50 % relative to an ML trained baseline system. Preliminary results for large vocabulary Arabic machine-printed text recognition tasks are presented on a novel publicly available newspaper database.

...read moreread less

Proceedings Article•DOI•

Accuracy of automatic number plate recognition (ANPR) and real world UK number plate problems

[...]

Mke Rhead¹, Robert Gurney¹, Soodamani Ramalingam¹, Neil Cohen²•Institutions (2)

University of Hertfordshire¹, Home Office²

31 Dec 2012

TL;DR: This paper considers real world UK number plates and relates these to ANPR and the varied fixing methodologies and fixing locations are discussed as well as the impact on image capture.

...read moreread less

Abstract: This paper considers real world UK number plates and relates these to ANPR. It considers aspects of the relevant legislation and standards when applying them to real world number plates. The varied manufacturing techniques and varied specifications of component parts are also noted. The varied fixing methodologies and fixing locations are discussed as well as the impact on image capture.

...read moreread less

Collapse