
Showing papers on "Optical character recognition published in 2003"


Proceedings ArticleDOI
18 Jun 2003
TL;DR: Efficient methods based on shape context matching are developed that can identify the word in an EZ-Gimpy image with a success rate of 92%, and the requisite 3 words in a Gimpy image 33% of the time.
Abstract: In this paper we explore object recognition in clutter. We test our object recognition techniques on Gimpy and EZ-Gimpy, examples of visual CAPTCHAs. A CAPTCHA ("Completely Automated Public Turing test to Tell Computers and Humans Apart") is a program that can generate and grade tests that most humans can pass, yet current computer programs can't pass. EZ-Gimpy, currently used by Yahoo, and Gimpy are CAPTCHAs based on word recognition in the presence of clutter. These CAPTCHAs provide excellent test sets since the clutter they contain is adversarial; it is designed to confuse computer programs. We have developed efficient methods based on shape context matching that can identify the word in an EZ-Gimpy image with a success rate of 92%, and the requisite 3 words in a Gimpy image 33% of the time. The problem of identifying words in such severe clutter provides valuable insight into the more general problem of object recognition in scenes. The methods that we present are instances of a framework designed to tackle this general problem.
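The shape context descriptor at the heart of this attack is simple to sketch: for each sampled edge point, build a log-polar histogram of where the other edge points lie relative to it. The following is a minimal, generic NumPy sketch, not the authors' code; the bin counts and normalization choices are illustrative assumptions.

```python
import numpy as np

def shape_context(points, n_radial=5, n_angular=12):
    """Compute a log-polar shape context histogram for each 2D point.

    points: (N, 2) array of edge/sample point coordinates.
    Returns an (N, n_radial * n_angular) array of normalized histograms.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise offsets, distances and angles between points.
    diff = points[None, :, :] - points[:, None, :]          # (N, N, 2)
    dist = np.hypot(diff[..., 0], diff[..., 1])              # (N, N)
    ang = np.arctan2(diff[..., 1], diff[..., 0])             # (N, N), in [-pi, pi]

    # Normalize distances by the mean pairwise distance for scale invariance.
    dist = dist / dist[dist > 0].mean()

    # Log-spaced radial bin edges and uniform angular bin edges.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_radial + 1)
    a_edges = np.linspace(-np.pi, np.pi, n_angular + 1)

    descriptors = np.zeros((n, n_radial * n_angular))
    for i in range(n):
        others = np.arange(n) != i                            # exclude the point itself
        r_bin = np.digitize(dist[i, others], r_edges) - 1
        a_bin = np.clip(np.digitize(ang[i, others], a_edges) - 1, 0, n_angular - 1)
        valid = (r_bin >= 0) & (r_bin < n_radial)             # drop points outside radial range
        hist = np.zeros((n_radial, n_angular))
        np.add.at(hist, (r_bin[valid], a_bin[valid]), 1)
        descriptors[i] = hist.ravel() / max(int(valid.sum()), 1)
    return descriptors

# Matching then proceeds by comparing histograms, e.g. with a chi-squared statistic.
```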

681 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: A survey of application domains, technical challenges, and solutions for recognizing documents captured by digital cameras is presented, along with sample applications under development and feasible ideas for future development.
Abstract: The increasing availability of high performance, low priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly mobile and easy to use; they can capture images of any kind of document including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there is clearly a demand from many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.

295 citations


Journal ArticleDOI
TL;DR: This work addresses two problems that are often encountered in object recognition: object segmentation, for which a distance-sets shape filter is formulated, and shape matching; the approach is illustrated on printed and handwritten character recognition and on detection of traffic signs in complex scenes.
Abstract: We introduce a novel rich local descriptor of an image point, we call the (labeled) distance set, which is determined by the spatial arrangement of image features around that point. We describe a two-dimensional (2D) visual object by the set of (labeled) distance sets associated with the feature points of that object. Based on a dissimilarity measure between (labeled) distance sets and a dissimilarity measure between sets of (labeled) distance sets, we address two problems that are often encountered in object recognition: object segmentation, for which we formulate a distance sets shape filter, and shape matching. The use of the shape filter is illustrated on printed and handwritten character recognition and detection of traffic signs in complex scenes. The shape comparison procedure is illustrated on handwritten character classification, COIL-20 database object recognition and MPEG-7 silhouette database retrieval.
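A minimal sketch of the unlabeled distance-set idea, under the assumption that a feature point is described by the sorted vector of distances to its N nearest neighbouring feature points and that two such sets are compared elementwise; the labeled variant in the paper additionally attaches a feature label to each distance.

```python
import numpy as np

def distance_sets(points, n_neighbors=8):
    """For each feature point, return the sorted distances to its n_neighbors
    nearest other feature points (a simple, unlabeled distance set)."""
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # ignore self-distances
    return np.sort(d, axis=1)[:, :n_neighbors]      # (N, n_neighbors)

def distance_set_dissimilarity(ds_a, ds_b):
    """Dissimilarity between two distance sets: mean relative difference of the
    sorted distances (an illustrative choice, not the paper's exact measure)."""
    return float(np.mean(np.abs(ds_a - ds_b) / (ds_a + ds_b + 1e-9)))
```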

256 citations


Proceedings ArticleDOI
20 Nov 2003
TL;DR: An integrated OCR system for mathematical documents, called INFTY, is presented, which shows high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.
Abstract: An integrated OCR system for mathematical documents, called INFTY, is presented. INFTY consists of four procedures, i.e., layout analysis, character recognition, structure analysis of mathematical expressions, and manual error correction. In those procedures, several novel techniques are utilized for better recognition performance. Experimental results on about 500 pages of mathematical documents showed high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.

182 citations


Proceedings ArticleDOI
Horst Bunke1
03 Aug 2003
TL;DR: The state of the art in off-line Roman cursive handwriting recognition is reviewed, recent trends are analyzed, and challenges for future research in this field are identified.
Abstract: This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or, more generally, some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We'll survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.

178 citations


Journal ArticleDOI
TL;DR: This paper explicitly reviews the field of multiple classifier decision combination strategies for character recognition, from some of its early roots to the present day and illustrates explicitly how the principles underlying the application of multi-classifier approaches to character recognition can easily generalise to a wide variety of different task domains.
Abstract: Two research strands, each identifying an area of markedly increasing importance in the current development of pattern analysis technology, underlie the review covered by this paper, and are drawn together to offer both a task-oriented and a fundamentally generic perspective on the discipline of pattern recognition. The first of these is the concept of decision fusion for high-performance pattern recognition, where (often very diverse) classification technologies, each providing complementary sources of information about class membership, can be integrated to provide more accurate, robust and reliable classification decisions. The second is the rapid expansion in technology for the automated analysis of (especially) handwritten data for OCR applications including document and form processing, pen-based computing, forensic analysis, biometrics and security, and many other areas, especially those which seek to provide online or offline processing of data which is available in a human-oriented medium. Classifier combination/multiple expert processing has a long history, but the sheer volume and diversity of possible strategies now available suggest that it is timely to consider a structured review of the field. Handwritten character processing provides an ideal context for such a review, both allowing engagement with a problem area which lends itself ideally to the performance enhancements offered by multi-classifier configurations, but also allowing a clearer focus to what otherwise, because of the unlimited application horizons, would be a task of unmanageable proportions. Hence, this paper explicitly reviews the field of multiple classifier decision combination strategies for character recognition, from some of its early roots to the present day. In order to give structure and a sense of direction to the review, a new taxonomy for categorising approaches is defined and explored, and this both imposes a discipline on the presentation of the material available and helps to clarify the mechanisms by which multi-classifier configurations deliver performance enhancements. The review incorporates a discussion both of processing structures themselves and a range of important related topics which are essential to maximise an understanding of the potential of such structures. Most importantly, the paper illustrates explicitly how the principles underlying the application of multi-classifier approaches to character recognition can easily generalise to a wide variety of different task domains.
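As a concrete illustration of the simplest combination strategies such a review covers, the sketch below shows majority voting over hard class labels and the sum rule over posterior estimates; it is a generic example, not tied to any specific scheme in the paper.

```python
import numpy as np

def majority_vote(labels):
    """Combine hard decisions: labels is a list of per-classifier predicted
    class indices for one sample; ties resolve to the smallest index."""
    return int(np.bincount(np.asarray(labels)).argmax())

def sum_rule(posteriors):
    """Combine soft decisions: posteriors is a (n_classifiers, n_classes) array
    of class-probability estimates; the sum rule picks the class with the
    largest summed (equivalently, averaged) posterior."""
    return int(np.asarray(posteriors).sum(axis=0).argmax())

# Example: three classifiers deciding among four character classes.
print(majority_vote([2, 2, 3]))                      # -> 2
print(sum_rule([[0.1, 0.2, 0.6, 0.1],
                [0.0, 0.1, 0.7, 0.2],
                [0.3, 0.3, 0.2, 0.2]]))              # -> 2
```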

138 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: An automatic scheme is presented to identify text lines of different Indian scripts in a document, based on the water reservoir principle, contour tracing, profiles, etc., with an overall accuracy of about 97.52%.
Abstract: A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a document page, it is necessary to separate the different scripts before feeding them to their individual OCR systems. In this paper an automatic scheme is presented to identify text lines of different Indian scripts from a document. For the separation task, the scripts are first grouped into a few classes according to script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify them without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.

133 citations


Patent
Lie Lu1, Yan-Feng Sun1, Mingjing Li, Xian-Sheng Hua, Hong-Jiang Zhang 
19 Feb 2003
TL;DR: In this article, a music video parser automatically detects and segments music videos in a combined audio-video media stream by integrating shot boundary detection, video text detection and audio analysis to automatically detect temporal boundaries of each music video in the media stream.
Abstract: A “music video parser” automatically detects and segments music videos in a combined audio-video media stream. Automatic detection and segmentation is achieved by integrating shot boundary detection, video text detection and audio analysis to automatically detect temporal boundaries of each music video in the media stream. In one embodiment, song identification information, such as, for example, a song name, artist name, album name, etc., is automatically extracted from the media stream using video optical character recognition (OCR). This information is then used in alternate embodiments for cataloging, indexing and selecting particular music videos, and in maintaining statistics such as the times particular music videos were played, and the number of times each music video was played.

131 citations


Journal ArticleDOI
Kongqiao Wang1, Jari Kangas1
TL;DR: A robust, connected-component-based character locating method that uses an aligning-and-merging-analysis (AMA) scheme to locate all potential characters based on the bounding boxes of connected components in all color layers.

117 citations


01 Jan 2003
TL;DR: This paper summarizes research in document layout analysis carried out over the last few years in the author's laboratory, which has developed a number of novel geometric algorithms and statistical methods applicable to a wide variety of languages and layouts.
Abstract: In this paper, I summarize research in document layout analysis carried out over the last few years in our laboratory. Correct document layout analysis is a key step in document capture conversions into electronic formats, optical character recognition (OCR), information retrieval from scanned documents, appearance-based document retrieval, and reformatting of documents for on-screen display. We have developed a number of novel geometric algorithms and statistical methods. Layout analysis systems built from these algorithms are applicable to a wide variety of languages and layouts, and have proven to be robust to the presence of noise and spurious features in a page image. The system itself consists of reusable and independent software modules that can be reconfigured to be adapted to different languages and applications. Currently, we are using them for electronic book and document capture applications. If there is commercial or government demand, we are interested in adapting these tools to information retrieval and intelligence applications.

114 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: This paper describes the character recognition process from printed documents containing Hindi and Telugu text using a bilingual recognizer based on Principal Component Analysis followed by support vector classification.
Abstract: This paper describes the character recognition process from printed documents containing Hindi and Telugu text. Hindi and Telugu are among the most popular languages in India. The bilingual recognizer is based on Principal Component Analysis followed by support vector classification. This attains an overall accuracy of approximately 96.7%. Extensive experimentation is carried out on an independent test set of approximately 200000 characters. Applications based on this OCR are sketched.
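The recognizer's core, PCA for dimensionality reduction followed by support vector classification, can be sketched directly with scikit-learn; the glyph data, component count, and kernel parameters below are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: flattened character glyph images, y: class labels (random placeholders here).
rng = np.random.default_rng(0)
X = rng.random((1000, 32 * 32))
y = rng.integers(0, 50, size=1000)

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),        # project glyphs onto the leading principal components
    SVC(kernel="rbf", C=10.0),   # support vector classification in the reduced space
)
clf.fit(X[:800], y[:800])
print("held-out accuracy:", clf.score(X[800:], y[800:]))
```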

Proceedings ArticleDOI
18 Sep 2003
TL;DR: An efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented.
Abstract: Text detection in images or videos is an important step to achieve multimedia content retrieval. In this paper, an efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties. The output of the algorithm is a set of text boxes with a simplified background, ready to be fed into an OCR engine for subsequent character recognition. Our proposal is robust with respect to different font sizes, font colors, languages and background complexities. The performance of the approach is demonstrated by presenting promising experimental results for a set of images taken from different types of video sequences.
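A minimal sketch of the projection-profile step described above: rows of an edge map with high edge density are grouped into candidate horizontal text bands. OpenCV's Canny detector stands in for the paper's edge-detection method, and the thresholds are illustrative.

```python
import cv2
import numpy as np

def horizontal_text_bands(gray, density_thresh=0.05, min_height=8):
    """Return (top, bottom) row ranges of candidate horizontal text regions,
    found by thresholding the row-wise edge density of a Canny edge map."""
    edges = cv2.Canny(gray, 100, 200)
    row_density = (edges > 0).mean(axis=1)          # fraction of edge pixels per row
    active = row_density > density_thresh

    bands, start = [], None
    for y, on in enumerate(active):
        if on and start is None:
            start = y
        elif not on and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    if start is not None and len(active) - start >= min_height:
        bands.append((start, len(active)))
    return bands

# gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# print(horizontal_text_bands(gray))
```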

Proceedings ArticleDOI
27 May 2003
TL;DR: A statistical model for attribute extraction that represents both the syntactical structure of references and OCR error patterns is proposed and it is shown that the proposed model has advantages in reducing the cost of preparing training data.
Abstract: In this paper, we propose a method for extracting bibliographic attributes from reference strings captured using Optical Character Recognition (OCR) and an extended hidden Markov model. Bibliographic attribute extraction can be used in two ways. One is reference parsing in which attribute values are extracted from OCR-processed references for bibliographic matching. The other is reference alignment in which attribute values are aligned to the bibliographic record to enrich the vocabulary of the bibliographic database. In this paper, we first propose a statistical model for attribute extraction that represents both the syntactical structure of references and OCR error patterns. Then, we perform experiments using bibliographic references obtained from scanned images of papers in journals and transactions and show that useful attribute values are extracted from OCR-processed references. We also show that the proposed model has advantages in reducing the cost of preparing training data, a critical problem in rule-based systems.
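The attribute-extraction step can be pictured as hidden-state decoding over the token sequence of a reference string. The sketch below is a generic Viterbi decoder over toy attribute states (author, title, year); the paper's extended model additionally encodes OCR error patterns in its emission probabilities, which is not reproduced here.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence.

    obs: list of observation indices; start_p: (S,); trans_p: (S, S); emit_p: (S, V).
    Works in log space to avoid underflow."""
    logp = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logp[:, None] + np.log(trans_p)          # (S, S): previous -> next state
        back.append(scores.argmax(axis=0))
        logp = scores.max(axis=0) + np.log(emit_p[:, o])
    path = [int(logp.argmax())]                           # backtrack from the best final state
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

# Toy example: states 0=AUTHOR, 1=TITLE, 2=YEAR over a 3-symbol token vocabulary.
start = np.array([0.8, 0.15, 0.05])
trans = np.array([[0.60, 0.35, 0.05],
                  [0.05, 0.80, 0.15],
                  [0.10, 0.10, 0.80]])
emit = np.array([[0.70, 0.20, 0.10],
                 [0.20, 0.70, 0.10],
                 [0.05, 0.05, 0.90]])
print(viterbi([0, 0, 1, 1, 2], start, trans, emit))       # e.g. [0, 0, 1, 1, 2]
```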

Proceedings ArticleDOI
13 Jan 2003
TL;DR: Several dissimilarity measures for binary vectors are formulated and examined for their recognition capability in handwriting identification, where binary micro-features are used to characterize handwritten character shapes; conclusions are made on how to choose a dissimilarity measure and how to combine hybrid features.
Abstract: Several dissimilarity measures for binary vectors are formulated and examined for their recognition capability in handwriting identification, for which binary micro-features are used to characterize handwritten character shapes. Pertaining to eight dissimilarity measures, i.e., Jaccard-Needham, Dice, Correlation, Yule, Russell-Rao, Sokal-Michener, Rogers-Tanimoto and Kulzinsky, the discriminatory power of ten individual characters and their combination is exhaustively studied. Conclusions are made on how to choose a dissimilarity measure and how to combine hybrid features.
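For reference, most of the measures listed above can be written directly in terms of the 2x2 agreement counts of two binary vectors. The sketch below gives common textbook formulations; exact definitions vary slightly across papers, so treat these as illustrative rather than as the paper's precise versions (the correlation and Kulzinsky measures are omitted because their formulations differ more across sources).

```python
import numpy as np

def binary_dissimilarities(x, y):
    """Dissimilarity measures between two binary vectors, expressed via the
    agreement counts s11, s00 and disagreement counts s10, s01."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    s11 = np.sum(x & y)
    s00 = np.sum(~x & ~y)
    s10 = np.sum(x & ~y)
    s01 = np.sum(~x & y)
    n = s11 + s00 + s10 + s01
    return {
        "jaccard_needham": (s10 + s01) / (s11 + s10 + s01),
        "dice":            (s10 + s01) / (2 * s11 + s10 + s01),
        "yule":            (2 * s10 * s01) / (s11 * s00 + s10 * s01),
        "russell_rao":     (n - s11) / n,
        "sokal_michener":  (s10 + s01) / n,                 # simple mismatch fraction
        "rogers_tanimoto": 2 * (s10 + s01) / (n + s10 + s01),
    }

print(binary_dissimilarities([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```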

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A simpler connected component analysis and regression technique is proposed for OCR accuracy improvement that is computationally less expensive and is resolution independent too.
Abstract: Image warping is a common problem when one scans or photocopies a document page from a thick bound volume, resulting in shading and curved text lines in the spine area of the bound volume. This will not only impair readability, but will also reduce the OCR accuracy. Further to our earlier attempt to correct such images, this paper proposes a simpler connected component analysis and regression technique. Compared to our earlier method, the present system is computationally less expensive and is resolution independent too. The implementation of the new system and improvement of OCR accuracy are presented in this paper.
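The regression idea can be sketched as fitting a smooth curve to the connected-component positions along a warped text line and using the vertical deviation of that curve to straighten the line. Component extraction and the quadratic model below are generic assumptions, not the paper's exact procedure.

```python
import numpy as np

def estimate_line_warp(centroids, degree=2):
    """Fit a polynomial y(x) through component centroids of one text line and
    return the per-component vertical shift needed to flatten the line."""
    centroids = np.asarray(centroids, dtype=float)    # (N, 2) array of (x, y)
    x, y = centroids[:, 0], centroids[:, 1]
    coeffs = np.polyfit(x, y, degree)                 # least-squares polynomial fit
    fitted = np.polyval(coeffs, x)
    baseline = fitted.min()                           # flattest point as the reference row
    return fitted - baseline                          # shift each column up by this amount

# Example: components on a line that curves downwards near the spine of the volume.
cents = [(10, 100), (60, 101), (110, 104), (160, 110), (210, 121)]
print(np.round(estimate_line_warp(cents), 1))
```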

Proceedings ArticleDOI
03 Aug 2003
TL;DR: The approach is to create a visual challenge that is easy for humans but difficult for a computer: recognizing a string of random distorted characters, which presents hard segmentation problems that humans are particularly adept at solving.
Abstract: How do you tell a computer from a human? The situation arises often on the Internet, when online polls are conducted, accounts are requested, undesired email is received, and chat-rooms are spammed. The approach we use is to create a visual challenge that is easy for humans but difficult for a computer. More specifically, our challenge is to recognize a string of random distorted characters. To pass the challenge, the subject must type in the correct corresponding ASCII string. From an OCR point of view, this problem is interesting because our goal is to use the vast amount of accumulated knowledge to defeat state-of-the-art OCR algorithms. This is a role reversal from traditional OCR research. Unlike many other systems, our algorithm is based on the assumption that segmentation is much more difficult than recognition. Our image challenges present hard segmentation problems that humans are particularly adept at solving. The technology is currently being used in MSN's Hotmail registration system, where it has significantly reduced the daily registration rate with minimal Consumer Support impact.
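A toy sketch of the kind of challenge described above: render a random character string and distort it, here with a simple affine shear plus pixel noise using Pillow. The font, parameters, and distortion are placeholders and bear no relation to the production HIP.

```python
import random
import string

import numpy as np
from PIL import Image, ImageDraw, ImageFont

def make_challenge(n_chars=6, size=(200, 60)):
    """Render a random string and distort it with a shear and salt-and-pepper noise."""
    text = "".join(random.choices(string.ascii_uppercase + string.digits, k=n_chars))
    img = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(img)
    draw.text((10, 20), text, fill=0, font=ImageFont.load_default())

    # Affine shear: output pixel (x, y) samples input (x + 0.3*y - 10, y).
    img = img.transform(size, Image.Transform.AFFINE, (1, 0.3, -10, 0, 1, 0), fillcolor=255)

    # Sprinkle noise pixels to make segmentation harder.
    arr = np.array(img)
    noise = np.random.default_rng().random(arr.shape) < 0.03
    arr[noise] = 0
    return text, Image.fromarray(arr)

answer, challenge = make_challenge()
challenge.save("challenge.png")
print("expected answer:", answer)
```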

Proceedings ArticleDOI
27 May 2003
TL;DR: A generative probabilistic optical character recognition model is introduced that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system.
Abstract: In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks. We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
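The noisy-channel decision rule itself is compact: choose the true word w maximizing P(w) * P(observed | w). The sketch below approximates the channel model with a per-edit penalty derived from Levenshtein distance and uses a toy unigram prior; the paper's finite-state implementation is considerably richer.

```python
import math

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(observed, lexicon, edit_log_penalty=-2.0):
    """Noisy-channel correction: argmax over candidate words of
    log P(word) + (edit distance) * per-edit log penalty."""
    def score(word, prior):
        return math.log(prior) + edit_log_penalty * levenshtein(observed, word)
    return max(lexicon, key=lambda w: score(w, lexicon[w]))

# Toy unigram prior over a tiny lexicon.
lexicon = {"recognition": 0.4, "recondition": 0.1, "resignation": 0.5}
print(correct("recogn1tion", lexicon))   # -> "recognition"
```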

Journal ArticleDOI
TL;DR: It is shown that in order to improve classification results obtained with single classifiers, it is necessary to combine several sources of information either at the level of feature extraction/description, or at the classification stage, or at both levels.
Abstract: In this paper, we present a review of the state of the art in the current classification techniques used in the optical character recognition of the Arabic script (AOCR). We consider multiple sources of information-based hybrid approaches and multiple classifiers. We show that in order to improve classification results obtained with single classifiers, it is necessary to combine several sources of information either at the level of feature extraction/description, or at the classification stage, or at both levels. We provide a qualitative comparison and discuss the strengths and weaknesses of these approaches.

Journal ArticleDOI
TL;DR: Handwritten Chinese characters can be recognized by first extracting the basic shapes (radicals) of which they are composed, where radicals are described by nonlinear active shape models with optimal parameters found using the chamfer distance transform and a dynamic tunneling algorithm.
Abstract: Handwritten Chinese characters can be recognized by first extracting the basic shapes (radicals) of which they are composed. Radicals are described by nonlinear active shape models and optimal parameters found using the chamfer distance transform and a dynamic tunneling algorithm. The radical recognition rate is 96.5 percent correct (writer-independent) on 280,000 characters containing 98 radical classes.
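The chamfer matching score used to fit the radical models can be sketched in a few lines: compute the distance transform of the target edge map and average it over the model's edge points. This generic sketch uses SciPy's Euclidean distance transform; the nonlinear active shape models and tunneling search are not reproduced here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(target_edges, model_points):
    """Mean distance from each model edge point to the nearest target edge pixel.

    target_edges: boolean 2D array (True where the image has an edge).
    model_points: (N, 2) integer array of (row, col) model edge coordinates.
    Lower scores mean a better fit of the model to the image."""
    dt = distance_transform_edt(~target_edges)   # distance of every pixel to the nearest edge
    rows, cols = np.asarray(model_points).T
    return float(dt[rows, cols].mean())

# Toy example: a vertical stroke in the target, and a slightly shifted model stroke.
target = np.zeros((20, 20), dtype=bool)
target[5:15, 10] = True
model = np.array([(r, 12) for r in range(5, 15)])
print(chamfer_score(target, model))   # -> 2.0 (each model point is 2 pixels from the stroke)
```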

Journal ArticleDOI
TL;DR: It is demonstrated that trained SVMs with a radial basis function kernel satisfactorily segment (unseen) ultrasound B-mode images as well as clinical ultrasonic images.

Journal ArticleDOI
TL;DR: An original hybrid MLP-SVM method for unconstrained handwritten digits recognition, based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP substitutions (errors).
Abstract: This paper presents an original hybrid MLP-SVM method for unconstrained handwritten digit recognition. Specialized Support Vector Machines (SVMs) are introduced to significantly improve the multilayer perceptron (MLP) performance in local areas around the separating surfaces between each pair of digit classes in the input pattern space. This hybrid architecture is based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP substitutions (errors). Specialized local SVMs are introduced to detect the correct class among these two classification hypotheses. The hybrid MLP-SVM recognizer achieves a recognition rate of 98.01% on a real mail zipcode digit recognition task. By introducing a rejection mechanism based on the distances provided by the local SVMs, the error/reject trade-off performance of our recognition system is better than that of several classifiers reported in recent research.
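The routing logic of the hybrid can be sketched independently of the actual networks: take the two highest MLP outputs and, when the decision is ambiguous, defer to a classifier trained on just that pair of classes. The confidence margin and the interface of the pairwise classifiers below are illustrative assumptions, not the authors' trained models.

```python
import numpy as np

def hybrid_predict(mlp_probs, pair_classifiers, features, margin=0.2):
    """Decide a digit class from MLP posteriors, deferring ambiguous cases
    to a specialized pairwise classifier (e.g. an SVM).

    mlp_probs:        (n_classes,) MLP output vector for one sample.
    pair_classifiers: dict mapping a sorted class pair (a, b) to a fitted binary
                      classifier whose .predict returns the original class label.
    features:         the sample's feature vector, fed to the pairwise classifier."""
    order = np.argsort(mlp_probs)[::-1]
    top1, top2 = int(order[0]), int(order[1])
    if mlp_probs[top1] - mlp_probs[top2] >= margin:
        return top1                                   # MLP is confident enough on its own
    clf = pair_classifiers.get(tuple(sorted((top1, top2))))
    if clf is None:
        return top1                                   # no specialist trained for this pair
    return int(clf.predict(np.asarray(features).reshape(1, -1))[0])
```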

Proceedings ArticleDOI
03 Aug 2003
TL;DR: This paper examines some optimization strategies for an HMM classifier that works with continuous feature values and uses the Baum-Welch training algorithm, and introduces the free parameters of the optimization procedure, which are the number of states of a model, the number of training iterations, and the number of Gaussian mixtures for each state.
Abstract: In off-line handwriting recognition, classifiers based on hidden Markov models (HMMs) have become very popular. However, while there exist well-established training algorithms, such as the Baum-Welch procedure, which optimize the transition and output probabilities of a given HMM architecture, the architecture itself, and in particular the number of states, must be chosen "by hand". Also the number of training iterations and the output distributions need to be defined by the system designer. In this paper we examine some optimization strategies for an HMM classifier that works with continuous feature values and uses the Baum-Welch training algorithm. The free parameters of the optimization procedure introduced in this paper are the number of states of a model, the number of training iterations, and the number of Gaussian mixtures for each state. The proposed optimization strategies are evaluated in the context of a handwritten word recognition task.
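A sketch of the kind of search the paper describes, using the hmmlearn library (not the authors' recognizer) to score held-out feature sequences for each combination of state count, mixture count, and iteration budget; the data shapes and parameter grids are placeholders.

```python
import itertools

import numpy as np
from hmmlearn.hmm import GMMHMM

# Placeholder data: each word sample is a sequence of 9-dimensional feature vectors;
# hmmlearn takes all frames stacked, together with a list of per-sequence lengths.
rng = np.random.default_rng(0)
train_X, train_lengths = rng.normal(size=(500, 9)), [50] * 10
val_X, val_lengths = rng.normal(size=(200, 9)), [50] * 4

best = None
for n_states, n_mix, n_iter in itertools.product([4, 6, 8], [1, 2, 4], [10, 20]):
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=n_iter, random_state=0)
    model.fit(train_X, train_lengths)          # Baum-Welch (EM) training
    score = model.score(val_X, val_lengths)    # validation log-likelihood
    if best is None or score > best[0]:
        best = (score, n_states, n_mix, n_iter)

print("best (n_states, n_mix, n_iter):", best[1:])
```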

Proceedings ArticleDOI
16 Jul 2003
TL;DR: This paper presents a technique for the automatic recognition of Arabic printed text using artificial neural networks; the main features of the system are preprocessing of the text, segmentation of the text into individual characters, feature extraction using the moment invariant technique, and recognition using an RBF network.
Abstract: Optical character recognition (OCR) systems provide human-machine interaction and are widely used in many applications. Much research has already been done on the recognition of Latin, Chinese and Japanese characters. Against this background, comparatively few papers have specifically addressed the problem of recognizing Arabic text and languages that use the Arabic script, such as Urdu and Farsi. This is due in part to a lack of interest in this field and in part to the complex nature of the Arabic language. This paper presents a technique for the automatic recognition of Arabic printed text using artificial neural networks. The main features of the system are preprocessing of the text, segmentation of the text into individual characters, feature extraction using the moment invariant technique, and recognition using an RBF network.
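The moment-invariant feature extraction step can be illustrated with OpenCV's Hu moments, which are invariant to translation, scale, and rotation; the signed log transform below is a common normalization choice, not necessarily the paper's.

```python
import cv2
import numpy as np

def moment_invariant_features(binary_glyph):
    """Compute the seven Hu moment invariants of a segmented character image
    and compress their dynamic range with a signed log transform."""
    hu = cv2.HuMoments(cv2.moments(binary_glyph, binaryImage=True)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# glyph = cv2.threshold(cv2.imread("char.png", cv2.IMREAD_GRAYSCALE), 0, 255,
#                       cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# features = moment_invariant_features(glyph)   # 7-dimensional feature vector per character
```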

Patent
17 Apr 2003
TL;DR: In this paper, a method and system for translating written text from a first (foreign) language to a second (native) language is provided, where an image containing the text is first captured at the request of the user, and text zones are identified in the image and the zones are converted to text characters using optical character recognition.
Abstract: A method and system for translating written text from a first (foreign) language to a second (native) language is provided. An image containing the text is first captured at the request of the user. Text zones are identified in the image and the zones are converted to text characters using optical character recognition. The text characters, which are in the first language, are translated to the second language. The translated text is then output to the user. The text may be converted to an image that can be displayed on a display or, alternatively, the text may be synthesized into speech that may be played over a speaker accessible to the user such as an earpiece. Data can be provided to the user as text, audio or text and audio combined.

Proceedings ArticleDOI
15 Oct 2003
TL;DR: It has been found from experimental results that the success rate of this OCR system for Bengali characters is approximately 98% for isolated characters and 96% for continuous characters.
Abstract: This paper is concerned with a complete optical character recognition (OCR) system for Bengali characters. Recognition is done for both isolated and continuous printed multi-font Bengali characters. Preprocessing steps include segmentation at various levels, noise removal and scaling. The Freeman chain code is calculated from the scaled character and further processed to obtain a discriminating set of feature vectors for the recognizer. The unknown samples are classified using a feed-forward neural network based recognition scheme. It has been found from experimental results that the success rate is approximately 98% for isolated characters and 96% for continuous characters.
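The Freeman chain code mentioned above encodes a contour as a sequence of 8-direction moves between successive boundary pixels. A minimal sketch, assuming the boundary is already available as an ordered list of (row, col) points:

```python
# Map from (d_row, d_col) between successive boundary pixels to the standard
# 8-direction Freeman code (0 = east, increasing counter-clockwise in image coordinates).
FREEMAN = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def freeman_chain_code(boundary):
    """Encode an ordered, 8-connected boundary (list of (row, col) pixels,
    open or closed) as its Freeman chain code."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(FREEMAN[(r1 - r0, c1 - c0)])
    return codes

# Example: the four corners of a unit square traversed clockwise in image coordinates.
print(freeman_chain_code([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]))   # [0, 6, 4, 2]
```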

Proceedings ArticleDOI
TL;DR: An evaluation of the effects of different types of information used for video retrieval from a video collection found that image matching and video OCR were the deciding aspects of video information retrieval.
Abstract: Video contains multiple types of audio and visual information, which are difficult to extract, combine or trade off in general video information retrieval. This paper provides an evaluation of the effects of different types of information used for video retrieval from a video collection. A number of different sources of information are present in most typical broadcast video collections and can be exploited for information retrieval. We will discuss the contributions of automatically recognized speech transcripts, image similarity matching, face detection and video OCR in the context of experiments performed as part of the 2001 TREC Video Retrieval Track evaluation conducted by the National Institute of Standards and Technology. For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.

Proceedings ArticleDOI
01 Jan 2003
TL;DR: By structuring video content, this work can support both topic indexing and semantic querying of multimedia documents; the two major techniques in the proposed approach are video text analysis and speech recognition.
Abstract: We present an automatic and novel approach to structuring and indexing lecture videos for distance learning applications. By structuring video content, we can support both topic indexing and semantic querying of multimedia documents. Our aim is to link the discussion topics extracted from the electronic slides with their associated video and audio segments. Two major techniques in our proposed approach are video text analysis and speech recognition. Initially, a video is partitioned into shots based on slide transitions. For each shot, the embedded video texts are detected, reconstructed and segmented as high-resolution foreground texts for commercial OCR recognition. The recognized texts can then be matched with their associated slides for video indexing. Meanwhile, both phrases (title) and keywords (content) are also extracted from the electronic slides to spot the speech signals. The spotted phrases and keywords are further utilized as queries to retrieve the most similar slide for speech indexing.
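The slide-matching step, linking recognized video text to the most similar electronic slide, can be sketched as a cosine similarity search over TF-IDF vectors; scikit-learn is used here purely for illustration and the strings are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_shot_to_slide(ocr_text, slide_texts):
    """Return the index of the electronic slide whose text is most similar
    to the OCR output of one video shot."""
    vectorizer = TfidfVectorizer()
    slide_vecs = vectorizer.fit_transform(slide_texts)
    shot_vec = vectorizer.transform([ocr_text])
    return int(cosine_similarity(shot_vec, slide_vecs).argmax())

slides = ["introduction to optical character recognition",
          "hidden markov models for sequence decoding",
          "support vector machines and kernels"]
print(match_shot_to_slide("hidden marko models sequence", slides))   # -> 1
```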

Patent
31 Dec 2003
TL;DR: In this article, an optical character recognition (OCR) process is performed on the stored, electronic image of the document to correct digit errors in the stored data read from the documents.
Abstract: System and method for the processing of MICR documents that produce read errors. MICR documents are read and sorted to a destination pocket for processing, subject to a determination that existing digit errors do not prevent the routing of the document. In example embodiments, an error does not prevent the routing of the document if it is not related to the routing/transit field. An optical character recognition (OCR) process is performed on the stored electronic image of the document to correct digit errors in the stored data read from the documents. If it is determined that the correction cannot be made through the OCR process, the image and corresponding MICR data are displayed on a user terminal for manual correction by reference to an image of the document, rather than the document itself.

Book ChapterDOI
20 Oct 2003
TL;DR: This study attempts to develop a method for embedding watermark in the text that is as successful as the frequency-domain methods have been for image and audio.
Abstract: Numerous schemes have been designed for watermarking multimedia contents. Many of these schemes are vulnerable to watermark erasing attacks. Naturally, such methods are ineffective on text unless the text is represented as a bitmap image, but in that case, the watermark can be erased easily by using Optical Character Recognition (OCR) to change the representation of the text from a bitmap to ASCII or EBCDIC. This study attempts to develop a method for embedding watermark in the text that is as successful as the frequency-domain methods have been for image and audio. The novel method embeds the watermark in original text, creating ciphertext, which preserves the meaning of the original text via various semantic replacements.

Proceedings ArticleDOI
27 May 2003
TL;DR: A technique based on graph combinatorics is used to rejoin the appropriate connected components of degraded historical printed documents and has been applied to real data with successful results.
Abstract: This paper presents a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.