Book

An introduction to digital image processing

01 Jan 1986
About: The book was published on 1986-01-01 and is currently open access. It has received 1745 citations to date. It focuses on the topics: Digital image processing & Image processing.
Citations
Journal ArticleDOI
TL;DR: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example, and defines the performance of the character recognition module as the objective measure.
Abstract: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example. Binarization of scanned gray scale images is the first step in most document image analysis systems. Selection of an appropriate binarization method for an input image domain is a difficult problem. Typically, a human expert evaluates the binarized images according to his/her visual criteria. However, to conduct an objective evaluation, one needs to investigate how well the subsequent image analysis steps will perform on the binarized image. We call this approach goal-directed evaluation, and it can be used to evaluate other low-level image processing methods as well. Our evaluation of binarization methods is in the context of digit recognition, so we define the performance of the character recognition module as the objective measure. Eleven different locally adaptive binarization methods were evaluated, and Niblack's method gave the best performance.
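The winning method in this evaluation, Niblack's locally adaptive thresholding, computes a per-pixel threshold T(x, y) = m(x, y) + k·s(x, y) from the local mean m and standard deviation s over a sliding window. A minimal NumPy/SciPy sketch follows; the window size of 15 and k = -0.2 are commonly cited defaults, not values taken from this abstract:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_threshold(image, window=15, k=-0.2):
    """Binarize with Niblack's rule: pixel is foreground if it exceeds
    the local threshold T = mean + k * std over a window x window patch."""
    img = image.astype(np.float64)
    mean = uniform_filter(img, window)            # local mean m(x, y)
    sq_mean = uniform_filter(img * img, window)   # local mean of squares
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))  # local std s(x, y)
    return img > (mean + k * std)                 # boolean foreground mask
```

With a negative k, the threshold dips below the local mean, which helps preserve thin dark strokes on scanned gray-scale documents; the window size and k typically need tuning per input domain, which is exactly the selection problem the paper's goal-directed evaluation addresses.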

700 citations

Proceedings ArticleDOI
27 Jun 2004
TL;DR: The overall algorithm has a success rate of over 90% on the test set (evaluated by complete detection and reading of the text), and the unread text is typically small and distant from the viewer.
Abstract: This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e. feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set and the unread text is typically small and distant from the viewer.
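The boosting step described above — combining many thresholded weak classifiers into a weighted strong classifier — can be sketched with generic discrete AdaBoost over decision stumps. This is an illustrative reconstruction, not the authors' 79-feature, 4-stage cascade; the data shapes and round count are assumptions:

```python
import numpy as np

def adaboost_train(X, y, n_rounds=10):
    """X: (n, d) feature responses; y: labels in {-1, +1}.
    Each round picks the stump (feature, threshold, sign) with the
    lowest weighted error, then reweights the training examples."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # uniform initial example weights
    learners = []
    for _ in range(n_rounds):
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = max(err, 1e-10)                      # avoid log(0) / div by 0
        alpha = 0.5 * np.log((1 - err) / err)      # weak learner's vote weight
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)             # upweight mistakes
        w /= w.sum()
        learners.append((alpha, j, thr, sign))
    return learners

def adaboost_predict(learners, X):
    """Strong classifier: sign of the weighted sum of stump votes."""
    score = sum(a * s * np.where(X[:, j] > t, 1, -1)
                for a, j, t, s in learners)
    return np.sign(score)
```

The cascade structure in the paper then chains several such strong classifiers, so that easy non-text regions are rejected early and only promising candidates reach the binarization and OCR stages.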

686 citations


Cites background or methods from "An introduction to digital image pr..."

  • ...We start by applying adaptive binarization [12] to the text regions detected by the AdaBoost strong classifier....

  • ...The second component is an extension and binarization [12] algorithm that acts on the text region candidates....

  • ...Our approach is a variant of Niblack’s adaptive binarization algorithm [12], which was reported by Wolf [22] to be the most successful binarization algorithm (jointly with Yanowitz-Bruckstein’s method [25])....


Journal ArticleDOI
TL;DR: The proposed method does not require any parameter tuning by the user and can deal with degradations which occur due to shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain.

585 citations

Journal ArticleDOI
01 May 2001
TL;DR: The historical evolution of CR systems is presented; the available CR techniques, with their strengths and weaknesses, are reviewed; and directions for future research are suggested.
Abstract: Character recognition (CR) has been extensively studied in the last half century and has progressed to a level that is sufficient to produce technology-driven applications. Now, rapidly growing computational power is enabling the implementation of the present CR methodologies and is creating an increasing demand in many emerging application domains which require more advanced methodologies. This paper serves as a guide and update for readers working in the CR area. First, the historical evolution of CR systems is presented. Then, the available CR techniques, with their strengths and weaknesses, are reviewed. Finally, the current status of CR is discussed and directions for future research are suggested. Special attention is given to off-line handwriting recognition, since this area requires more research in order to reach the ultimate goal of machine simulation of human reading.

517 citations

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work describes Photo OCR, a system for text extraction from images that is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions.
Abstract: We describe Photo OCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern data-center-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency: mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.

499 citations