
Showing papers on "Optical character recognition published in 2004"


Proceedings ArticleDOI
27 Jun 2004
TL;DR: The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set and the unread text is typically small and distant from the viewer.
Abstract: This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e. feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set and the unread text is typically small and distant from the viewer.
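As a sketch of the boosting step described above, here is a minimal AdaBoost with threshold "decision stumps" as weak learners; this is a generic illustration on made-up one-dimensional feature values, not the authors' 79-feature cascade.

```python
import numpy as np

def train_stump(x, y, w):
    """Pick the threshold/polarity decision stump with minimum weighted error.
    Assumes x is sorted ascending (thresholds are taken at midpoints)."""
    best = None
    for t in (x[:-1] + x[1:]) / 2.0:
        for p in (+1, -1):
            pred = np.where(x > t, p, -p)
            err = np.sum(w[pred != y])
            if best is None or err < best[0]:
                best = (err, t, p)
    return best

def adaboost(x, y, rounds=10):
    """Train an ensemble of stumps; returns (alpha, threshold, polarity) triples."""
    w = np.full(len(x), 1.0 / len(x))
    ensemble = []
    for _ in range(rounds):
        err, t, p = train_stump(x, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak classifier
        pred = np.where(x > t, p, -p)
        w *= np.exp(-alpha * y * pred)          # re-weight: mistakes gain weight
        w /= w.sum()
        ensemble.append((alpha, t, p))
    return ensemble

def predict(ensemble, x):
    score = sum(a * np.where(x > t, p, -p) for a, t, p in ensemble)
    return np.sign(score)
```

In the paper the weak classifiers come from joint feature-response probabilities and the strong classifiers are arranged in a cascade; the reweighting loop above is the part all AdaBoost variants share.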

686 citations


Journal ArticleDOI
TL;DR: A review of the OCR work done on Indian language scripts is presented, along with the scope of future work and further steps needed for Indian script OCR development.

592 citations


Journal ArticleDOI
TL;DR: This paper proposes a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection.
Abstract: In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle the text in different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. The procedure can significantly improve text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera, and translate the recognized text into English.
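The normalization and Gabor steps can be sketched in a few lines of numpy; the parameter values below are illustrative and the LDA stage is omitted.

```python
import numpy as np

def normalize_patch(patch, eps=1e-8):
    """Local intensity normalization: zero mean, unit variance per patch."""
    return (patch - patch.mean()) / (patch.std() + eps)

def gabor_kernel(size, theta, wavelength, sigma):
    """Real (cosine-phase) Gabor kernel at orientation theta."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)   # rotated coordinates
    gauss = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    return gauss * np.cos(2 * np.pi * xr / wavelength)

def gabor_response(patch, kernel):
    """Filter response of one patch (same size as the kernel)."""
    return float(np.sum(normalize_patch(patch) * kernel))
```

A stroke-like pattern responds strongly to the kernel whose orientation matches it and weakly to the orthogonal one, which is what makes a bank of such responses usable as local features.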

305 citations


Journal ArticleDOI
TL;DR: A new method for localizing and recognizing text in complex images and videos is presented; it shows good performance when integrated in a sports video annotation system and in a video indexing system within the framework of two European projects.

291 citations


Journal ArticleDOI
TL;DR: An efficient face recognition scheme which has two features: representation of face images by two-dimensional wavelet subband coefficients and recognition by a modular, personalised classification method based on kernel associative memory models.
Abstract: In this paper, we propose an efficient face recognition scheme which has two features: 1) representation of face images by two-dimensional (2D) wavelet subband coefficients and 2) recognition by a modular, personalised classification method based on kernel associative memory models. Compared to PCA projections and low resolution "thumb-nail" image representations, wavelet subband coefficients can efficiently capture substantial facial features while keeping computational complexity low. As there are usually very limited samples, we constructed an associative memory (AM) model for each person and proposed to improve the performance of AM models by kernel methods. Specifically, we first applied kernel transforms to each possible training pair of face samples and then mapped the high-dimensional feature space back to input space. Our scheme using modular autoassociative memory for face recognition is inspired by the same motivation as using autoencoders for optical character recognition (OCR), for which the advantages have been demonstrated. By associative memory, all the prototypical faces of one particular person are used to reconstruct themselves, and the reconstruction error for a probe face image is used to decide if the probe face is from the corresponding person. We carried out extensive experiments on three standard face recognition datasets: the FERET data, the XM2VTS data, and the ORL data. Detailed comparisons with earlier published results are provided, and our proposed scheme offers better recognition accuracy on all of the face datasets.
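A minimal, non-kernel version of the per-person associative memory idea can be sketched as a linear projection onto the span of a person's prototypes, with the reconstruction error driving the decision; the kernel transform and real face data are omitted, and all vectors here are synthetic.

```python
import numpy as np

class AssociativeMemory:
    """Linear autoassociative memory: stores the prototypes of one person and
    reconstructs a probe as its orthogonal projection onto their span."""
    def __init__(self, prototypes):
        P = np.asarray(prototypes, dtype=float)   # k x d matrix of prototypes
        self.proj = np.linalg.pinv(P) @ P         # d x d projector onto row space

    def error(self, x):
        """Reconstruction error: small if x lies near the stored subspace."""
        return float(np.linalg.norm(x - x @ self.proj))

def classify(memories, probe):
    """Assign the probe to the person whose memory reconstructs it best."""
    return int(np.argmin([m.error(probe) for m in memories]))
```

A stored prototype reconstructs itself almost exactly, while a probe far from the stored subspace leaves a large residual; that asymmetry is the accept/reject signal described in the abstract.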

268 citations


Proceedings ArticleDOI
23 Aug 2004
TL;DR: A robust text localization approach is presented, which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages and is demonstrated by presenting experimental results for a set of video frames taken from the MPEG-7 video test set.
Abstract: Text localization and recognition in images is important for searching information in digital photo archives, video databases and Web sites. However, since text is often printed against a complex background, it is often difficult to detect. In this paper, a robust text localization approach is presented, which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages. First, a wavelet transform is applied to the image and the distribution of high-frequency wavelet coefficients is considered to statistically characterize text and non-text areas. Then, the k-means algorithm is used to classify text areas in the image. The detected text areas undergo a projection analysis in order to refine their localization. Finally, a binary segmented text image is generated, to be used as input to an OCR engine. The detection performance of our approach is demonstrated by presenting experimental results for a set of video frames taken from the MPEG-7 video test set.
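The high-frequency-statistics-plus-k-means idea can be sketched with a one-level Haar transform per block and a two-cluster 1-D k-means; the block size and the synthetic image in the usage are illustrative only.

```python
import numpy as np

def haar_detail_energy(img, block=8):
    """Mean absolute high-frequency Haar coefficient per image block."""
    feats = []
    H, W = img.shape
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            b = img[r:r + block, c:c + block].astype(float)
            a, bb, cc, d = b[0::2, 0::2], b[0::2, 1::2], b[1::2, 0::2], b[1::2, 1::2]
            lh = (a - bb + cc - d) / 2.0   # horizontal detail
            hl = (a + bb - cc - d) / 2.0   # vertical detail
            hh = (a - bb - cc + d) / 2.0   # diagonal detail
            feats.append(np.mean(np.abs(lh)) + np.mean(np.abs(hl)) + np.mean(np.abs(hh)))
    return np.array(feats)

def kmeans_1d(x, iters=20):
    """Two-cluster 1-D k-means; centroids seeded at min and max for determinism."""
    c = np.array([x.min(), x.max()], dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        labels = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = x[labels == k].mean()
    return labels
```

Text-like (high-variance) blocks carry large detail energy and flat background blocks carry almost none, so the two clusters separate them cleanly.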

146 citations


Patent
20 May 2004
TL;DR: This paper proposes a global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process.
Abstract: A global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process. The framework includes a machine learning approach trained on a large amount of data. A convolutional neural network can be employed to compute a classification function at multiple positions and take grey-level input which eliminates binarization. The framework utilizes preprocessing, layout analysis, character recognition, and word recognition to output high recognition rates. The framework also employs dynamic programming and language models to arrive at the desired output.

123 citations


Journal ArticleDOI
TL;DR: A novel approach to restoring digital document images, viewing the problem as one of separating overlapped texts and then reformulating it as a blind source separation problem, approached through independent component analysis techniques, which have the advantage that no models are required for the background.
Abstract: We propose a novel approach to restoring digital document images, with the aim of improving text legibility and OCR performance. These are often compromised by the presence of artifacts in the background, derived from many kinds of degradations, such as spots, underwritings, and show-through or bleed-through effects. So far, background removal techniques have been based on local, adaptive filters and morphological-structural operators to cope with frequent low-contrast situations. For the specific problem of bleed-through/show-through, most work has been based on the comparison between the front and back pages. This, however, requires a preliminary registration of the two images. Our approach is based on viewing the problem as one of separating overlapped texts and then reformulating it as a blind source separation problem, approached through independent component analysis techniques. These methods have the advantage that no models are required for the background. In addition, we use the spectral components of the image at different bands, so that there is no need for registration. Examples of bleed-through cancellation and recovery of underwriting from palimpsests are provided.
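A toy version of the blind-source-separation step: a deflation FastICA with a tanh nonlinearity, applied to two synthetically mixed 1-D signals standing in for the overlapped text layers. This is a generic ICA sketch, not the authors' multispectral pipeline.

```python
import numpy as np

def whiten(X):
    """Center and whiten; rows of X are the mixed observations."""
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(cov)
    return (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ X

def fastica(X, n_iter=200, seed=0):
    """Deflation FastICA with tanh nonlinearity; returns estimated sources."""
    Z = whiten(X)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    Ws = []
    for _ in range(n):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wx = w @ Z
            # fixed-point update: w <- E[z g(w'z)] - E[g'(w'z)] w
            w_new = (Z * np.tanh(wx)).mean(axis=1) - (1 - np.tanh(wx) ** 2).mean() * w
            for u in Ws:                      # Gram-Schmidt deflation
                w_new -= (w_new @ u) * u
            w_new /= np.linalg.norm(w_new)
            w = w_new
        Ws.append(w)
    return np.vstack(Ws) @ Z
```

The recovered components match the original sources up to sign and order, which is exactly the indeterminacy ICA leaves; for document restoration the component carrying the bleed-through layer is then discarded.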

110 citations


Patent
02 Mar 2004
TL;DR: In this article, a method and apparatus for reading and decoding information extracted from a form is disclosed, where packages are randomly placed on a conveyor belt, with their labels facing a two-camera subassembly.
Abstract: A method and apparatus are disclosed for reading and decoding information extracted from a form. In the system of the present invention, packages are randomly placed on a conveyor belt, with their labels facing a two-camera subassembly. As the conveyor belt moves, the two-camera subassembly continuously takes images of the belt underneath the overhead camera. The design of the camera permits it to take a high resolution image of a non-singulated, unjustified package flow. A digital image of the packages within the field of view of the camera is then transferred to the processing system for analysis. The processing system identifies individual packages in the image, extracts them and then analyzes the information written on the package labels. The analysis process utilizes conventional Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR) techniques to evaluate the information written on the package label. Once the information is decoded, the system either accesses a database record associated with the decoded machine-readable code, or creates a new record. When an unknown word image is encountered, the field-specific recognition process is aided by use of lexicon information, optimized based on installation-specific or user-specific criteria. The lexicon information is continuously revised based on processed form information. In a preferred embodiment, verified destination addresses associated with a user are alphabetized or rank-ordered based on frequency of occurrence. Only after the system determines that the originating user is not stored in the database does it resort to the ZIP+4 or similar database to verify a destination address.

102 citations


Journal ArticleDOI
TL;DR: An improved method for binarizing document images by adaptively exploiting the local image contrast is proposed, which aims to overcome the common problems encountered in low quality images, such as uneven illumination, low contrast, and random noise.
Abstract: We propose in this paper an improved method for binarizing document images by adaptively exploiting the local image contrast. The proposed method aims to overcome the common problems encountered in low quality images, such as uneven illumination, low contrast, and random noise. Experiments have been conducted and the results are presented to show the effectiveness of the proposed method.
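The abstract does not spell out the algorithm, but a simple Bernsen-style local-contrast binarization illustrates the general idea of adaptively exploiting local image contrast; the window size and contrast threshold below are assumptions, not the authors' values.

```python
import numpy as np

def binarize_local_contrast(img, win=7, min_contrast=30):
    """Bernsen-style adaptive binarization: a pixel is foreground (1) when its
    neighbourhood has enough contrast and the pixel is darker than the local
    midpoint; low-contrast neighbourhoods are treated as background (0)."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint8)
    r = win // 2
    for y in range(H):
        for x in range(W):
            w = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            lo, hi = int(w.min()), int(w.max())
            if hi - lo >= min_contrast and img[y, x] < (lo + hi) / 2:
                out[y, x] = 1
    return out
```

Because the threshold is recomputed per window, a slowly varying background (uneven illumination) never crosses the contrast gate, while dark strokes next to any background level do.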

88 citations


Proceedings ArticleDOI
26 Oct 2004
TL;DR: A system for online recognition of handwritten Tamil characters is presented and a structure- or shape-based representation of a stroke is used in which a stroke is represented as a string of shape features.
Abstract: A system for online recognition of handwritten Tamil characters is presented. A handwritten character is constructed by executing a sequence of strokes. A structure- or shape-based representation of a stroke is used in which a stroke is represented as a string of shape features. Using this string representation, an unknown stroke is identified by comparing it with a database of strokes using a flexible string matching procedure. A full character is recognized by identifying all the component strokes. Character termination is determined using a finite state automaton. Development of similar systems for other Indian scripts is outlined.
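The flexible string matching step can be sketched as an edit distance over shape-feature strings; the feature alphabet below ("line", "curve", "loop") is invented for illustration.

```python
def edit_distance(a, b, sub_cost=1.0):
    """Levenshtein distance between two shape-feature sequences."""
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i
    for j in range(1, n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + cost) # match / substitution
    return D[m][n]

def nearest_stroke(database, unknown):
    """Identify an unknown stroke by its nearest entry in a stroke database."""
    return min(database, key=lambda label: edit_distance(database[label], unknown))
```

A "flexible" matcher in the paper's sense would additionally use a shape-aware substitution cost (e.g. curve vs. line cheaper than curve vs. loop), which the `sub_cost` parameter is a placeholder for.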

Journal ArticleDOI
01 Aug 2004
TL;DR: A novel scheme, mainly based on the concept of water reservoir analogy, to extract individual text lines from printed Indian documents containing multioriented and/or curved text lines is proposed.
Abstract: There are printed artistic documents where the text lines of a single page may not be parallel to each other. These text lines may have different orientations, or may be curved. For the optical character recognition (OCR) of these documents, we need to extract such lines properly. In this paper, we propose a novel scheme, mainly based on the concept of a water reservoir analogy, to extract individual text lines from printed Indian documents containing multioriented and/or curved text lines. A reservoir is a metaphor to illustrate the cavity region of a character where water can be stored. In the proposed scheme, at first, connected components are labeled and identified as either isolated or touching. Next, each touching component is classified as either straight type (S-type) or curve type (C-type), depending on the reservoir base-area and envelope points of the component. Based on the type (S-type or C-type) of a component, two candidate points are computed from each touching component. Finally, candidate regions (neighborhoods of the candidate points) of each component are detected, and after analyzing these candidate regions, components are grouped to get individual text lines.

Patent
12 Nov 2004
TL;DR: In this article, a method and apparatus for analyzing an electronic communication containing imagery, e.g., to determine whether or not the electronic communication is a spam communication, is provided, which includes detecting one or more regions of imagery in a received electronic communication and applying pre-processing techniques to locate regions (e.g. blocks or lines) of text in the imagery that may be distorted.
Abstract: A method and apparatus are provided for analyzing an electronic communication containing imagery, e.g., to determine whether or not the electronic communication is a spam communication. In one embodiment, an inventive method includes detecting one or more regions of imagery in a received electronic communication and applying pre-processing techniques to locate regions (e.g., blocks or lines) of text in the imagery that may be distorted. The method then analyzes the regions of text to determine whether the content of the text indicates that the electronic communication is spam. In one embodiment, specialized extraction and rectification of embedded text followed by optical character recognition processing is applied to the regions of text to extract their content therefrom. In another embodiment, keyword recognition or shape-matching processing is applied to detect the presence or absence of spam-indicative words from the regions of text. In another embodiment, other attributes of extracted text regions, such as size, location, color and complexity are used to build evidence for or against the presence of spam.

Patent
25 Aug 2004
TL;DR: In this paper, a digital display on the face of a meter is read and an image on a face of the meter is captured, and optical character recognition of digits is performed.
Abstract: A meter is read. An image on a face of the meter is captured. Optical character recognition of digits of a digital display on the face of the meter is performed. At least one stored template is used to perform the optical character recognition of digits.

Patent
07 Sep 2004
TL;DR: This article converted characters and a writing sample into mathematical graphs and used optical character recognition (OCR) techniques to identify these features in the handwriting sample so that drafts from two different samples can be aligned to compare to determine if the feature in the writing sample correlate with each other.
Abstract: A biometric handwriting identification system converts characters and a writing sample into mathematical graphs. The graphs comprise enough information to capture the features of handwriting that are unique to each individual. Optical character recognition (OCR) techniques can then be used to identify these features in the handwriting sample so that drafts from two different samples can be aligned to compare to determine if the features in the writing sample correlate with each other.

Proceedings ArticleDOI
24 Aug 2004
TL;DR: A biologically inspired, multi-channel filtering scheme for page layout analysis is presented; it has been shown to be computationally viable for commercial OCR system development.
Abstract: Reasonable success has been achieved at developing monolingual OCR systems in Indian scripts. Scientists, optimistically, have started to look beyond. Development of bilingual OCR systems and OCR systems with the capability to identify text areas are some of the pointers to future activities in the Indian scenario. The separation of text and non-text regions before considering the document image for OCR is an important task. In this paper, we present a biologically inspired, multi-channel filtering scheme for page layout analysis. The same scheme has been used for script recognition as well. Parameter tuning is mostly done heuristically. It has also been seen to be computationally viable for commercial OCR system development.

Journal ArticleDOI
23 Aug 2004
TL;DR: Categorization experiments performed over noisy texts show that the performance loss is acceptable for recall values up to 60-70 percent depending on the noise sources, and new measures of the extraction process performance are proposed.
Abstract: This work presents categorization experiments performed over noisy texts. By noisy, we mean any text obtained through an extraction process (affected by errors) from media other than digital texts (e.g., transcriptions of speech recordings extracted with a recognition system). The performance of a categorization system over the clean and noisy (word error rate between ~10 and ~50 percent) versions of the same documents is compared. The noisy texts are obtained through handwriting recognition and simulation of optical character recognition. The results show that the performance loss is acceptable for recall values up to 60-70 percent, depending on the noise sources. New measures of the extraction process performance, allowing a better explanation of the categorization results, are proposed.
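The word error rates quoted above are conventionally computed with a token-level edit distance; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a token-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    D = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        D[i][0] = i                      # deleting all reference tokens
    for j in range(len(hyp) + 1):
        D[0][j] = j                      # inserting all hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = D[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            D[i][j] = min(sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D[len(ref)][len(hyp)] / max(len(ref), 1)
```

Note that WER can exceed 1 when the hypothesis contains many insertions, which is why noisy-OCR corpora are usually reported alongside the error type breakdown.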

Journal ArticleDOI
TL;DR: An attempt to develop a commercially viable and robust character recognizer for Telugu texts is described; the recognizer exploits the inherent characteristics of the Telugu script using wavelet multiresolution analysis and a Hopfield-based dynamic neural network.

Book ChapterDOI
09 Jun 2004
TL;DR: The out-voting problem, the new algorithm and its promising results on two benchmark datasets as well as on one real world application are presented.
Abstract: An ensemble of classifiers based algorithm, Learn++, was recently introduced that is capable of incrementally learning new information from datasets that consecutively become available, even if the new data introduce additional classes that were not formerly seen. The algorithm does not require access to previously used datasets, yet it is capable of largely retaining the previously acquired knowledge. However, Learn++ suffers from the inherent "out-voting" problem when asked to learn new classes, which causes it to generate an unnecessarily large number of classifiers. This paper proposes a modified version of this algorithm, called Learn++.MT, which not only reduces the number of classifiers generated, but also provides performance improvements. The out-voting problem, the new algorithm and its promising results on two benchmark datasets as well as on one real world application are presented.

Proceedings ArticleDOI
Xi-Ping Luo1, Jun Li1, Li-Xin Zhen1
23 Aug 2004
TL;DR: This paper introduces the design and implementation of a business card reader based on a built-in camera and proposes a new method based on multi-resolution analysis of document images that improves computation speed and reduces the memory requirement of the image-processing step.
Abstract: With the availability of high-resolution cameras and increased computation power, it has become possible to implement OCR applications such as business card readers in mobile devices. In this paper, we introduce the design and implementation of a business card reader based on a built-in camera. In order to deal with the challenge of limited resources in a mobile device, we propose a new method based on multi-resolution analysis of document images. This method improves computation speed and reduces the memory requirement of the image-processing step by detecting the text areas in the downscaled image and then analyzing each detected area in the original image. For the OCR engine, we used a two-layer classifier to improve speed. Our experiments give satisfactory results.

Proceedings ArticleDOI
Li Zhuang1, Ta Bao1, Xioyan Zhu1, Chunheng Wang, S. Naoi 
10 Oct 2004
TL;DR: An effective spelling check approach for Chinese OCR with a new multi-knowledge based statistical language model that combines the conventional n-gram language model and the new LSA (latent semantic analysis) language model, so both local information and global information are utilized.
Abstract: This work describes an effective spelling check approach for Chinese OCR with a new multi-knowledge based statistical language model. This language model combines the conventional n-gram language model and the new LSA (latent semantic analysis) language model, so both local information (syntax) and global information (semantic) are utilized. Furthermore, Chinese similar characters are used in Viterbi search process to expand the candidate list in order to add more possible correct results. With our approach, the best recognition accuracy rate increases from 79.3% to 91.9%, which means 60.9% error reduction.
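A stripped-down version of the n-gram component can be sketched as a bigram model with add-one smoothing used to rescore OCR candidates; the corpus and candidates below are invented, and the LSA component is omitted.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with add-one smoothing for rescoring
    OCR candidate sentences."""
    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for sentence in corpus:
            toks = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(toks)
            self.unigrams.update(toks[:-1])          # context counts
            self.bigrams.update(zip(toks, toks[1:]))

    def log_prob(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        V = len(self.vocab) + 1                      # +1 for unseen words
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + V))
            for a, b in zip(toks, toks[1:])
        )

def best_candidate(lm, candidates):
    """Pick the candidate the language model finds most plausible."""
    return max(candidates, key=lm.log_prob)
```

In the paper the candidate list is expanded with visually similar Chinese characters before this rescoring step; here the candidates are given directly.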

Proceedings ArticleDOI
23 Jan 2004
TL;DR: A Gabor function based filter bank is used to separate the text and the nontext areas of comparable size and the technique is shown to work efficiently on different kinds of scanned document images, camera captured document images and sometimes on scenic images.
Abstract: Extraction of text areas is a necessary first step in processing a complex document image for a character recognition task. In digital libraries, such OCR'ed text facilitates access to the image of a document page through keyword search. Gabor filters, known to simulate certain characteristics of the human visual system (HVS), have been employed for this task by a large number of scientists for scanned document images. Adapting such a scheme for camera-based document images is a relatively new approach. Moreover, design of the appropriate filters to separate text areas, which are assumed to be rich in high frequency components, from nontext areas is a difficult task. The difficulty increases if the clutter is also rich in high frequency components. Other reported works on separating text from nontext areas have used geometrical/structural information like shape and size of the regions in binarized document images. In this work, we have used a combination of the above mentioned approaches for the purpose. We have used connected component analysis (CCA), in binarized images, to segment nontext areas based on the size information of the connected regions. A Gabor function based filter bank is used to separate the text and the nontext areas of comparable size. The technique is shown to work efficiently on different kinds of scanned document images, camera-captured document images and sometimes on scenic images.

Journal ArticleDOI
TL;DR: The algorithms designed exploit special characteristics of Telugu script for processing the document images efficiently, and are implemented to create a Telugu OCR system for printed text (TOSP).
Abstract: Telugu is one of the oldest and popular languages of India, spoken by more than 66 million people, especially in South India. Not much work has been reported on the development of optical character recognition (OCR) systems for Telugu text. Therefore, it is an area of current research. Some characters in Telugu are made up of more than one connected symbol. Compound characters are written by associating modifiers with consonants, resulting in a huge number of possible combinations, running into hundreds of thousands. A compound character may contain one or more connected symbols. Therefore, systems developed for documents of other scripts, like Roman, cannot be used directly for the Telugu language. The individual connected portions of a character or a compound character are defined as basic symbols in this paper and treated as a unit of recognition. The algorithms designed exploit special characteristics of Telugu script for processing the document images efficiently. The algorithms have been implemented to create a Telugu OCR system for printed text (TOSP). The output of TOSP is in phonetic English that can be transliterated to generate editable Telugu text. A special feature of TOSP is that it is designed to handle a large variety of sizes and multiple fonts, and still provides raw OCR accuracy of nearly 98%. The phonetic English representation can be also used to develop a Telugu text-to-speech system; work is in progress in this regard.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: A software solution prototype to optically recognise single sided embossed Braille documents using a simple image processing algorithm and probabilistic neural network is proposed.
Abstract: Braille is a tactile format of written communication for sight-impaired people worldwide. This paper proposes a software solution prototype to optically recognise single-sided embossed Braille documents using a simple image processing algorithm and a probabilistic neural network. The output is a Braille text file formatted to preserve the layout of the original document, which can be sent to an electronic embosser for reproduction. Preliminary experiments show an excellent recognition rate, with transcription accuracy at 99%.
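After dot detection, mapping a recovered 6-dot cell to a character reduces to a table lookup (standard Braille dot numbering: 1-2-3 down the left column, 4-5-6 down the right); a sketch with a deliberately partial alphabet:

```python
# Partial Grade-1 Braille table: frozenset of raised dot numbers -> letter.
BRAILLE = {
    frozenset({1}): "a",
    frozenset({1, 2}): "b",
    frozenset({1, 4}): "c",
    frozenset({1, 4, 5}): "d",
    frozenset({1, 5}): "e",
}

def decode_cell(cell):
    """cell: 3x2 nested list of 0/1 dot detections,
    rows top-to-bottom, columns left-to-right."""
    dot_numbers = [[1, 4], [2, 5], [3, 6]]
    raised = frozenset(
        dot_numbers[r][c]
        for r in range(3) for c in range(2)
        if cell[r][c]
    )
    return BRAILLE.get(raised, "?")
```

A full transcriber extends the table to the complete alphabet, digits and contractions; the hard part of the paper is the image-processing and neural-network stage that produces the 0/1 dot grid reliably.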

Patent
30 Dec 2004
TL;DR: In this paper, an exception processing system for MICR documents is described, in which an exception does not prevent the routing of the document if it is not related to the routing/transit field and an optical character recognition (OCR) process (300, 414) is performed on the stored, electronic image of a document to correct digit errors in the stored data read from the documents.
Abstract: System and method for exception processing of MICR documents. MICR documents are read and sorted to a destination pocket for processing subject to a determination that an exception does not prevent the routing of the document. In example embodiments, an error does not prevent the routing of the document if it is not related to the routing/transit field. In the case of digit errors, an optical character recognition (OCR) process (300, 414) is performed on the stored, electronic image of the document to correct digit errors in the stored data read from the documents. If a determination is made that correction or other exception processing cannot be handled through the OCR process, the image and corresponding MICR data is displayed on a user terminal (528), for manual verification or correction by reference to an image of the document, rather than the document itself.

Proceedings ArticleDOI
23 Jan 2004
TL;DR: A system developed to retrieve information from digitized books and journals belonging to digital libraries is described; its main feature is the ability to combine two principal retrieval strategies in several ways, and the effectiveness of the integrated retrieval is demonstrated.
Abstract: Large collections of scanned documents (books and journals) are now available in digital libraries. The most common method for retrieving relevant information from these collections is image browsing, but this approach is not feasible for books with more than a few dozen pages. The recognition of printed text can be made on the images by OCR systems, and in this case a retrieval by textual content can be performed. However, the results heavily depend on the quality of original documents. More sophisticated navigation can be performed when an electronic table of contents of the book is available with links to the corresponding pages. An opposite approach relies on the reduction of the amount of symbolic information to be extracted at the storage time. This approach is taken into account by document image retrieval systems. We describe a system that we developed in order to retrieve information from digitized books and journals belonging to digital libraries. The main feature of the system is the ability to combine two principal retrieval strategies in several ways. The first strategy allows a user to find pages with a layout similar to a query page. The second strategy is used in order to retrieve words in the collection matching a user-defined query, without performing OCR. The combination of these basic strategies allows users to retrieve meaningful pages with a low effort during the indexing phase. We describe the basic tools used in the system (layout analysis, layout retrieval, word retrieval) and the integration of these tools for answering complex queries. The experiments were conducted on 1287 pages and show the effectiveness of the integrated retrieval.

Proceedings ArticleDOI
26 Oct 2004
TL;DR: A set of simple structural characteristics that capture the differences between machine-printed and handwritten text-lines is presented and preliminary experiments on document images taken from databases of different languages and characteristics show a remarkable performance.
Abstract: This paper deals with the discrimination between machine-printed and handwritten text, a prerequisite for many OCR applications. An easy-to-follow approach is proposed based on an integrated system able to localize text areas and split them into text-lines. A set of simple structural characteristics that capture the differences between machine-printed and handwritten text-lines is presented, and preliminary experiments on document images taken from databases of different languages and characteristics show a remarkable performance.

PatentDOI
17 May 2004
TL;DR: A system and method for recognition of images may include the use of alignment markers as mentioned in this paper, which may be used with identification markings, biosensors, micro-fluidic arrays, and/or optical character recognition systems.
Abstract: A system and method for recognition of images may include the use of alignment markers. The image recognized may be a pattern from an array, a character, a number, a shape, and/or irregular shapes. The pattern may be formed by elements in an array such as an identification marking and/or a sensor array. More particularly, the system and method relate to discriminating between images by accounting for the orientation of the image. The size and/or location of alignment markers may provide information about the orientation of an image. Information about the orientation of an image may reduce false recognitions. The system and method of image recognition may be used with identification markings, biosensors, micro-fluidic arrays, and/or optical character recognition systems.

Journal Article
TL;DR: The most popular words in Arabic writing were identified for the first time using an associated program, and an innovative, simple yet powerful, in-place tagging procedure for the database was designed.
Abstract: In this paper we present a new database for off-line Arabic handwriting recognition, together with associated preprocessing procedures. We have developed a new database for the collection, storage and retrieval of Arabic handwritten text (AHDB). This is an advance both in terms of the size of the database and the number of different writers involved. We further designed an innovative, simple yet powerful, in-place tagging procedure for our database. It enables us to easily extract the bitmaps of words. We also constructed a preprocessing class, which contains some useful preprocessing operations. In this paper the most popular words in Arabic writing were identified for the first time, using an associated program.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with large vocabulary.
Abstract: Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with large vocabulary. Handwritten short paragraphs are to be classified into a small number of predefined classes. The paragraphs involve a wide variety of writing styles and contain many non-textual artifacts. HMMs and n-grams are used for text recognition and n-grams are also used for text classification. Experimental results are reported which, given the extreme difficulty of the task, are encouraging.