
Showing papers on "Optical character recognition published in 2003"


Proceedings ArticleDOI
18 Jun 2003
TL;DR: Efficient methods based on shape context matching are developed that can identify the word in an EZ-Gimpy image with a success rate of 92%, and the requisite 3 words in a Gimpy image 33% of the time.
Abstract: In this paper we explore object recognition in clutter. We test our object recognition techniques on Gimpy and EZ-Gimpy, examples of visual CAPTCHAs. A CAPTCHA ("Completely Automated Public Turing test to Tell Computers and Humans Apart") is a program that can generate and grade tests that most humans can pass, yet current computer programs can't pass. EZ-Gimpy, currently used by Yahoo, and Gimpy are CAPTCHAs based on word recognition in the presence of clutter. These CAPTCHAs provide excellent test sets since the clutter they contain is adversarial; it is designed to confuse computer programs. We have developed efficient methods based on shape context matching that can identify the word in an EZ-Gimpy image with a success rate of 92%, and the requisite 3 words in a Gimpy image 33% of the time. The problem of identifying words in such severe clutter provides valuable insight into the more general problem of object recognition in scenes. The methods that we present are instances of a framework designed to tackle this general problem.
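The shape context descriptor at the heart of this attack is simple to sketch: for each sampled edge point, build a log-polar histogram of where the other edge points lie relative to it. The following is a minimal, generic NumPy sketch, not the authors' code; the bin counts and normalization choices are illustrative assumptions.

```python
import numpy as np

def shape_context(points, n_radial=5, n_angular=12):
    """Compute a log-polar shape context histogram for each 2D point.

    points: (N, 2) array of edge/sample point coordinates.
    Returns an (N, n_radial * n_angular) array of normalized histograms.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise offsets, distances and angles between points.
    diff = points[None, :, :] - points[:, None, :]          # (N, N, 2)
    dist = np.hypot(diff[..., 0], diff[..., 1])              # (N, N)
    ang = np.arctan2(diff[..., 1], diff[..., 0])             # (N, N), in [-pi, pi]

    # Normalize distances by the mean pairwise distance for scale invariance.
    dist = dist / dist[dist > 0].mean()

    # Log-spaced radial bin edges and uniform angular bin edges.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_radial + 1)
    a_edges = np.linspace(-np.pi, np.pi, n_angular + 1)

    descriptors = np.zeros((n, n_radial * n_angular))
    for i in range(n):
        others = np.arange(n) != i                            # exclude the point itself
        r_bin = np.digitize(dist[i, others], r_edges) - 1
        a_bin = np.clip(np.digitize(ang[i, others], a_edges) - 1, 0, n_angular - 1)
        valid = (r_bin >= 0) & (r_bin < n_radial)             # drop points outside radial range
        hist = np.zeros((n_radial, n_angular))
        np.add.at(hist, (r_bin[valid], a_bin[valid]), 1)
        descriptors[i] = hist.ravel() / max(int(valid.sum()), 1)
    return descriptors

# Matching then proceeds by comparing histograms, e.g. with a chi-squared statistic.
```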

681 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: A survey of application domains, technical challenges, and solutions for recognizing documents captured by digital cameras is presented, along with sample applications under development and feasible ideas for future development.
Abstract: The increasing availability of high performance, low priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly mobile and easy to use; they can capture images of any kind of document including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there is clearly a demand from many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.

295 citations


Journal ArticleDOI
TL;DR: This work addresses two problems that are often encountered in object recognition: object segmentation, for which a distance-sets shape filter is formulated, and shape matching; the approach is illustrated on printed and handwritten character recognition and on detection of traffic signs in complex scenes.
Abstract: We introduce a novel rich local descriptor of an image point, we call the (labeled) distance set, which is determined by the spatial arrangement of image features around that point. We describe a two-dimensional (2D) visual object by the set of (labeled) distance sets associated with the feature points of that object. Based on a dissimilarity measure between (labeled) distance sets and a dissimilarity measure between sets of (labeled) distance sets, we address two problems that are often encountered in object recognition: object segmentation, for which we formulate a distance sets shape filter, and shape matching. The use of the shape filter is illustrated on printed and handwritten character recognition and detection of traffic signs in complex scenes. The shape comparison procedure is illustrated on handwritten character classification, COIL-20 database object recognition and MPEG-7 silhouette database retrieval.
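A minimal sketch of the unlabeled distance-set idea, under the assumption that a feature point is described by the sorted vector of distances to its N nearest neighbouring feature points and that two such sets are compared elementwise; the labeled variant in the paper additionally attaches a feature label to each distance.

```python
import numpy as np

def distance_sets(points, n_neighbors=8):
    """For each feature point, return the sorted distances to its n_neighbors
    nearest other feature points (a simple, unlabeled distance set)."""
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # ignore self-distances
    return np.sort(d, axis=1)[:, :n_neighbors]      # (N, n_neighbors)

def distance_set_dissimilarity(ds_a, ds_b):
    """Dissimilarity between two distance sets: mean relative difference of the
    sorted distances (an illustrative choice, not the paper's exact measure)."""
    return float(np.mean(np.abs(ds_a - ds_b) / (ds_a + ds_b + 1e-9)))
```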

256 citations


Proceedings ArticleDOI
20 Nov 2003
TL;DR: An integrated OCR system for mathematical documents, called INFTY, is presented, which shows high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.
Abstract: An integrated OCR system for mathematical documents, called INFTY, is presented. INFTY consists of four procedures, i.e., layout analysis, character recognition, structure analysis of mathematical expressions, and manual error correction. In those procedures, several novel techniques are utilized for better recognition performance. Experimental results on about 500 pages of mathematical documents showed high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.

182 citations


Proceedings ArticleDOI
Horst Bunke1
03 Aug 2003
TL;DR: The state of the art in off-line Roman cursive handwriting recognition is reviewed, recent trends are analyzed, and challenges for future research in this field are identified.
Abstract: This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or, more generally, some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We'll survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.

178 citations


Journal ArticleDOI
TL;DR: This paper explicitly reviews the field of multiple classifier decision combination strategies for character recognition, from some of its early roots to the present day and illustrates explicitly how the principles underlying the application of multi-classifier approaches to character recognition can easily generalise to a wide variety of different task domains.
Abstract: Two research strands, each identifying an area of markedly increasing importance in the current development of pattern analysis technology, underlie the review covered by this paper, and are drawn together to offer both a task-oriented and a fundamentally generic perspective on the discipline of pattern recognition. The first of these is the concept of decision fusion for high-performance pattern recognition, where (often very diverse) classification technologies, each providing complementary sources of information about class membership, can be integrated to provide more accurate, robust and reliable classification decisions. The second is the rapid expansion in technology for the automated analysis of (especially) handwritten data for OCR applications including document and form processing, pen-based computing, forensic analysis, biometrics and security, and many other areas, especially those which seek to provide online or offline processing of data which is available in a human-oriented medium. Classifier combination/multiple expert processing has a long history, but the sheer volume and diversity of possible strategies now available suggest that it is timely to consider a structured review of the field. Handwritten character processing provides an ideal context for such a review, both allowing engagement with a problem area which lends itself ideally to the performance enhancements offered by multi-classifier configurations, but also allowing a clearer focus to what otherwise, because of the unlimited application horizons, would be a task of unmanageable proportions. Hence, this paper explicitly reviews the field of multiple classifier decision combination strategies for character recognition, from some of its early roots to the present day. In order to give structure and a sense of direction to the review, a new taxonomy for categorising approaches is defined and explored, and this both imposes a discipline on the presentation of the material available and helps to clarify the mechanisms by which multi-classifier configurations deliver performance enhancements. The review incorporates a discussion both of processing structures themselves and a range of important related topics which are essential to maximise an understanding of the potential of such structures. Most importantly, the paper illustrates explicitly how the principles underlying the application of multi-classifier approaches to character recognition can easily generalise to a wide variety of different task domains.
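As a concrete illustration of the simplest combination strategies such a review covers, the sketch below shows majority voting over hard class labels and the sum rule over posterior estimates; it is a generic example, not tied to any specific scheme in the paper.

```python
import numpy as np

def majority_vote(labels):
    """Combine hard decisions: labels is a list of per-classifier predicted
    class indices for one sample; ties resolve to the smallest index."""
    return int(np.bincount(np.asarray(labels)).argmax())

def sum_rule(posteriors):
    """Combine soft decisions: posteriors is a (n_classifiers, n_classes) array
    of class-probability estimates; the sum rule picks the class with the
    largest summed (equivalently, averaged) posterior."""
    return int(np.asarray(posteriors).sum(axis=0).argmax())

# Example: three classifiers deciding among four character classes.
print(majority_vote([2, 2, 3]))                      # -> 2
print(sum_rule([[0.1, 0.2, 0.6, 0.1],
                [0.0, 0.1, 0.7, 0.2],
                [0.3, 0.3, 0.2, 0.2]]))              # -> 2
```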

138 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: An automatic scheme is presented to identify text lines of different Indian scripts in a document, based on the water reservoir principle, contour tracing, profiles, etc., with an overall accuracy of about 97.52%.
Abstract: A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a document page, it is necessary to separate the different scripts before feeding them to their individual OCR systems. In this paper an automatic scheme is presented to identify text lines of different Indian scripts from a document. For the separation task, the scripts are first grouped into a few classes according to script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify them without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.

133 citations


Patent
Lie Lu1, Yan-Feng Sun1, Mingjing Li, Xian-Sheng Hua, Hong-Jiang Zhang 
19 Feb 2003
TL;DR: In this article, a music video parser automatically detects and segments music videos in a combined audio-video media stream by integrating shot boundary detection, video text detection and audio analysis to automatically detect temporal boundaries of each music video in the media stream.
Abstract: A “music video parser” automatically detects and segments music videos in a combined audio-video media stream. Automatic detection and segmentation is achieved by integrating shot boundary detection, video text detection and audio analysis to automatically detect temporal boundaries of each music video in the media stream. In one embodiment, song identification information, such as, for example, a song name, artist name, album name, etc., is automatically extracted from the media stream using video optical character recognition (OCR). This information is then used in alternate embodiments for cataloging, indexing and selecting particular music videos, and in maintaining statistics such as the times particular music videos were played, and the number of times each music video was played.

131 citations


Journal ArticleDOI
Kongqiao Wang1, Jari Kangas1
TL;DR: A robust, connected-component-based character locating method that uses an aligning-and-merging-analysis (AMA) scheme to locate all potential characters based on the bounding boxes of connected components in all color layers.

117 citations


01 Jan 2003
TL;DR: This paper summarizes research in document layout analysis carried out over the last few years in the author's laboratory, which has developed a number of novel geometric algorithms and statistical methods applicable to a wide variety of languages and layouts.
Abstract: In this paper, I summarize research in document layout analysis carried out over the last few years in our laboratory. Correct document layout analysis is a key step in document capture conversions into electronic formats, optical character recognition (OCR), information retrieval from scanned documents, appearance-based document retrieval, and reformatting of documents for on-screen display. We have developed a number of novel geometric algorithms and statistical methods. Layout analysis systems built from these algorithms are applicable to a wide variety of languages and layouts, and have proven to be robust to the presence of noise and spurious features in a page image. The system itself consists of reusable and independent software modules that can be reconfigured to be adapted to different languages and applications. Currently, we are using them for electronic book and document capture applications. If there is commercial or government demand, we are interested in adapting these tools to information retrieval and intelligence applications.

114 citations


Proceedings ArticleDOI
03 Aug 2003
TL;DR: This paper describes the character recognition process from printed documents containing Hindi and Telugu text using a bilingual recognizer based on Principal Component Analysis followed by support vector classification.
Abstract: This paper describes the character recognition process from printed documents containing Hindi and Telugu text. Hindi and Telugu are among the most popular languages in India. The bilingual recognizer is based on Principal Component Analysis followed by support vector classification. This attains an overall accuracy of approximately 96.7%. Extensive experimentation is carried out on an independent test set of approximately 200000 characters. Applications based on this OCR are sketched.
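The recognizer's core, PCA for dimensionality reduction followed by support vector classification, can be sketched directly with scikit-learn; the glyph data, component count, and kernel parameters below are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: flattened character glyph images, y: class labels (random placeholders here).
rng = np.random.default_rng(0)
X = rng.random((1000, 32 * 32))
y = rng.integers(0, 50, size=1000)

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),        # project glyphs onto the leading principal components
    SVC(kernel="rbf", C=10.0),   # support vector classification in the reduced space
)
clf.fit(X[:800], y[:800])
print("held-out accuracy:", clf.score(X[800:], y[800:]))
```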

Proceedings ArticleDOI
18 Sep 2003
TL;DR: An efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented.
Abstract: Text detection in images or videos is an important step to achieve multimedia content retrieval. In this paper, an efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties. The output of the algorithm is a set of text boxes with a simplified background, ready to be fed into an OCR engine for subsequent character recognition. Our proposal is robust with respect to different font sizes, font colors, languages and background complexities. The performance of the approach is demonstrated by presenting promising experimental results for a set of images taken from different types of video sequences.
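A minimal sketch of the projection-profile step described above: rows of an edge map with high edge density are grouped into candidate horizontal text bands. OpenCV's Canny detector stands in for the paper's edge-detection method, and the thresholds are illustrative.

```python
import cv2
import numpy as np

def horizontal_text_bands(gray, density_thresh=0.05, min_height=8):
    """Return (top, bottom) row ranges of candidate horizontal text regions,
    found by thresholding the row-wise edge density of a Canny edge map."""
    edges = cv2.Canny(gray, 100, 200)
    row_density = (edges > 0).mean(axis=1)          # fraction of edge pixels per row
    active = row_density > density_thresh

    bands, start = [], None
    for y, on in enumerate(active):
        if on and start is None:
            start = y
        elif not on and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    if start is not None and len(active) - start >= min_height:
        bands.append((start, len(active)))
    return bands

# gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# print(horizontal_text_bands(gray))
```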

Proceedings ArticleDOI
27 May 2003
TL;DR: A statistical model for attribute extraction that represents both the syntactical structure of references and OCR error patterns is proposed and it is shown that the proposed model has advantages in reducing the cost of preparing training data.
Abstract: In this paper, we propose a method for extracting bibliographic attributes from reference strings captured using Optical Character Recognition (OCR) and an extended hidden Markov model. Bibliographic attribute extraction can be used in two ways. One is reference parsing in which attribute values are extracted from OCR-processed references for bibliographic matching. The other is reference alignment in which attribute values are aligned to the bibliographic record to enrich the vocabulary of the bibliographic database. In this paper, we first propose a statistical model for attribute extraction that represents both the syntactical structure of references and OCR error patterns. Then, we perform experiments using bibliographic references obtained from scanned images of papers in journals and transactions and show that useful attribute values are extracted from OCR-processed references. We also show that the proposed model has advantages in reducing the cost of preparing training data, a critical problem in rule-based systems.
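The attribute-extraction step can be pictured as hidden-state decoding over the token sequence of a reference string. The sketch below is a generic Viterbi decoder over toy attribute states (author, title, year); the paper's extended model additionally encodes OCR error patterns in its emission probabilities, which is not reproduced here.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence.

    obs: list of observation indices; start_p: (S,); trans_p: (S, S); emit_p: (S, V).
    Works in log space to avoid underflow."""
    logp = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logp[:, None] + np.log(trans_p)          # (S, S): previous -> next state
        back.append(scores.argmax(axis=0))
        logp = scores.max(axis=0) + np.log(emit_p[:, o])
    path = [int(logp.argmax())]                           # backtrack from the best final state
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

# Toy example: states 0=AUTHOR, 1=TITLE, 2=YEAR over a 3-symbol token vocabulary.
start = np.array([0.8, 0.15, 0.05])
trans = np.array([[0.60, 0.35, 0.05],
                  [0.05, 0.80, 0.15],
                  [0.10, 0.10, 0.80]])
emit = np.array([[0.70, 0.20, 0.10],
                 [0.20, 0.70, 0.10],
                 [0.05, 0.05, 0.90]])
print(viterbi([0, 0, 1, 1, 2], start, trans, emit))       # e.g. [0, 0, 1, 1, 2]
```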

Proceedings ArticleDOI
13 Jan 2003
TL;DR: Several dissimilarity measures for binary vectors are formulated and examined for their recognition capability in handwriting identification, where binary micro-features are used to characterize handwritten character shapes; conclusions are made on how to choose a dissimilarity measure and how to combine hybrid features.
Abstract: Several dissimilarity measures for binary vectors are formulated and examined for their recognition capability in handwriting identification, for which binary micro-features are used to characterize handwritten character shapes. Pertaining to eight dissimilarity measures, i.e., Jaccard-Needham, Dice, Correlation, Yule, Russell-Rao, Sokal-Michener, Rogers-Tanimoto and Kulzinsky, the discriminatory power of ten individual characters and their combination is exhaustively studied. Conclusions are made on how to choose a dissimilarity measure and how to combine hybrid features.
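For reference, most of the measures listed above can be written directly in terms of the 2x2 agreement counts of two binary vectors. The sketch below gives common textbook formulations; exact definitions vary slightly across papers, so treat these as illustrative rather than as the paper's precise versions (the correlation and Kulzinsky measures are omitted because their formulations differ more across sources).

```python
import numpy as np

def binary_dissimilarities(x, y):
    """Dissimilarity measures between two binary vectors, expressed via the
    agreement counts s11, s00 and disagreement counts s10, s01."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    s11 = np.sum(x & y)
    s00 = np.sum(~x & ~y)
    s10 = np.sum(x & ~y)
    s01 = np.sum(~x & y)
    n = s11 + s00 + s10 + s01
    return {
        "jaccard_needham": (s10 + s01) / (s11 + s10 + s01),
        "dice":            (s10 + s01) / (2 * s11 + s10 + s01),
        "yule":            (2 * s10 * s01) / (s11 * s00 + s10 * s01),
        "russell_rao":     (n - s11) / n,
        "sokal_michener":  (s10 + s01) / n,                 # simple mismatch fraction
        "rogers_tanimoto": 2 * (s10 + s01) / (n + s10 + s01),
    }

print(binary_dissimilarities([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```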

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A simpler connected component analysis and regression technique is proposed for OCR accuracy improvement that is computationally less expensive and is resolution independent too.
Abstract: Image warping is a common problem when one scans or photocopies a document page from a thick bound volume, resulting in shading and curved text lines in the spine area of the bound volume. This will not only impair readability, but will also reduce the OCR accuracy. Further to our earlier attempt to correct such images, this paper proposes a simpler connected component analysis and regression technique. Compared to our earlier method, the present system is computationally less expensive and is resolution independent too. The implementation of the new system and improvement of OCR accuracy are presented in this paper.
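The regression idea can be sketched as fitting a smooth curve to the connected-component positions along a warped text line and using the vertical deviation of that curve to straighten the line. Component extraction and the quadratic model below are generic assumptions, not the paper's exact procedure.

```python
import numpy as np

def estimate_line_warp(centroids, degree=2):
    """Fit a polynomial y(x) through component centroids of one text line and
    return the per-component vertical shift needed to flatten the line."""
    centroids = np.asarray(centroids, dtype=float)    # (N, 2) array of (x, y)
    x, y = centroids[:, 0], centroids[:, 1]
    coeffs = np.polyfit(x, y, degree)                 # least-squares polynomial fit
    fitted = np.polyval(coeffs, x)
    baseline = fitted.min()                           # flattest point as the reference row
    return fitted - baseline                          # shift each column up by this amount

# Example: components on a line that curves downwards near the spine of the volume.
cents = [(10, 100), (60, 101), (110, 104), (160, 110), (210, 121)]
print(np.round(estimate_line_warp(cents), 1))
```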

Proceedings ArticleDOI
03 Aug 2003
TL;DR: The approach is to create a visual challenge that is easy for humans but difficult for a computer: recognizing a string of random distorted characters, which presents hard segmentation problems that humans are particularly adept at solving.
Abstract: How do you tell a computer from a human? The situation arises often on the Internet, when online polls are conducted, accounts are requested, undesired email is received, and chat-rooms are spammed. The approach we use is to create a visual challenge that is easy for humans but difficult for a computer. More specifically, our challenge is to recognize a string of random distorted characters. To pass the challenge, the subject must type in the correct corresponding ASCII string. From an OCR point of view, this problem is interesting because our goal is to use the vast amount of accumulated knowledge to defeat state-of-the-art OCR algorithms. This is a role reversal from traditional OCR research. Unlike many other systems, our algorithm is based on the assumption that segmentation is much more difficult than recognition. Our image challenges present hard segmentation problems that humans are particularly adept at solving. The technology is currently being used in MSN's Hotmail registration system, where it has significantly reduced the daily registration rate with minimal Consumer Support impact.
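A toy sketch of the kind of challenge described above: render a random character string and distort it, here with a simple affine shear plus pixel noise using Pillow. The font, parameters, and distortion are placeholders and bear no relation to the production HIP.

```python
import random
import string

import numpy as np
from PIL import Image, ImageDraw, ImageFont

def make_challenge(n_chars=6, size=(200, 60)):
    """Render a random string and distort it with a shear and salt-and-pepper noise."""
    text = "".join(random.choices(string.ascii_uppercase + string.digits, k=n_chars))
    img = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(img)
    draw.text((10, 20), text, fill=0, font=ImageFont.load_default())

    # Affine shear: output pixel (x, y) samples input (x + 0.3*y - 10, y).
    img = img.transform(size, Image.Transform.AFFINE, (1, 0.3, -10, 0, 1, 0), fillcolor=255)

    # Sprinkle noise pixels to make segmentation harder.
    arr = np.array(img)
    noise = np.random.default_rng().random(arr.shape) < 0.03
    arr[noise] = 0
    return text, Image.fromarray(arr)

answer, challenge = make_challenge()
challenge.save("challenge.png")
print("expected answer:", answer)
```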

Proceedings ArticleDOI
27 May 2003
TL;DR: A generative probabilistic optical character recognition model is introduced that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system.
Abstract: In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks. We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
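The noisy-channel decision rule itself is compact: choose the true word w maximizing P(w) * P(observed | w). The sketch below approximates the channel model with a per-edit penalty derived from Levenshtein distance and uses a toy unigram prior; the paper's finite-state implementation is considerably richer.

```python
import math

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(observed, lexicon, edit_log_penalty=-2.0):
    """Noisy-channel correction: argmax over candidate words of
    log P(word) + (edit distance) * per-edit log penalty."""
    def score(word, prior):
        return math.log(prior) + edit_log_penalty * levenshtein(observed, word)
    return max(lexicon, key=lambda w: score(w, lexicon[w]))

# Toy unigram prior over a tiny lexicon.
lexicon = {"recognition": 0.4, "recondition": 0.1, "resignation": 0.5}
print(correct("recogn1tion", lexicon))   # -> "recognition"
```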

Journal ArticleDOI
TL;DR: It is shown that in order to improve classification results obtained with single classifiers, it is necessary to combine several sources of information either at the level of feature extraction/description, or at the classification stage, or at both levels.
Abstract: In this paper, we present a review of the state of the art in the current classification techniques used in the optical character recognition of the Arabic script (AOCR). We consider multiple sources of information-based hybrid approaches and multiple classifiers. We show that in order to improve classification results obtained with single classifiers, it is necessary to combine several sources of information either at the level of feature extraction/description, or at the classification stage, or at both levels. We provide a qualitative comparison and discuss the strengths and weaknesses of these approaches.

Journal ArticleDOI
TL;DR: Handwritten Chinese characters can be recognized by first extracting the basic shapes (radicals) of which they are composed, where radicals are described by nonlinear active shape models with optimal parameters found using the chamfer distance transform and a dynamic tunneling algorithm.
Abstract: Handwritten Chinese characters can be recognized by first extracting the basic shapes (radicals) of which they are composed. Radicals are described by nonlinear active shape models and optimal parameters found using the chamfer distance transform and a dynamic tunneling algorithm. The radical recognition rate is 96.5 percent correct (writer-independent) on 280,000 characters containing 98 radical classes.
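The chamfer matching score used to fit the radical models can be sketched in a few lines: compute the distance transform of the target edge map and average it over the model's edge points. This generic sketch uses SciPy's Euclidean distance transform; the nonlinear active shape models and tunneling search are not reproduced here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(target_edges, model_points):
    """Mean distance from each model edge point to the nearest target edge pixel.

    target_edges: boolean 2D array (True where the image has an edge).
    model_points: (N, 2) integer array of (row, col) model edge coordinates.
    Lower scores mean a better fit of the model to the image."""
    dt = distance_transform_edt(~target_edges)   # distance of every pixel to the nearest edge
    rows, cols = np.asarray(model_points).T
    return float(dt[rows, cols].mean())

# Toy example: a vertical stroke in the target, and a slightly shifted model stroke.
target = np.zeros((20, 20), dtype=bool)
target[5:15, 10] = True
model = np.array([(r, 12) for r in range(5, 15)])
print(chamfer_score(target, model))   # -> 2.0 (each model point is 2 pixels from the stroke)
```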

Journal ArticleDOI
TL;DR: It is demonstrated that trained SVMs with a radial basis function kernel satisfactorily segment (unseen) ultrasound B-mode images as well as clinical ultrasonic images.

Journal ArticleDOI
TL;DR: An original hybrid MLP-SVM method for unconstrained handwritten digits recognition, based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP substitutions (errors).
Abstract: This paper presents an original hybrid MLP-SVM method for unconstrained handwritten digit recognition. Specialized Support Vector Machines (SVMs) are introduced to significantly improve the multilayer perceptron (MLP) performance in local areas around the separating surfaces between each pair of digit classes in the input pattern space. This hybrid architecture is based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP substitutions (errors). Specialized local SVMs are introduced to detect the correct class among these two classification hypotheses. The hybrid MLP-SVM recognizer achieves a recognition rate of 98.01% on a real mail zipcode digit recognition task. By introducing a rejection mechanism based on the distances provided by the local SVMs, the error/reject trade-off performance of our recognition system is better than that of several classifiers reported in recent research.
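The routing logic of the hybrid can be sketched independently of the actual networks: take the two highest MLP outputs and, when the decision is ambiguous, defer to a classifier trained on just that pair of classes. The confidence margin and the interface of the pairwise classifiers below are illustrative assumptions, not the authors' trained models.

```python
import numpy as np

def hybrid_predict(mlp_probs, pair_classifiers, features, margin=0.2):
    """Decide a digit class from MLP posteriors, deferring ambiguous cases
    to a specialized pairwise classifier (e.g. an SVM).

    mlp_probs:        (n_classes,) MLP output vector for one sample.
    pair_classifiers: dict mapping a sorted class pair (a, b) to a fitted binary
                      classifier whose .predict returns the original class label.
    features:         the sample's feature vector, fed to the pairwise classifier."""
    order = np.argsort(mlp_probs)[::-1]
    top1, top2 = int(order[0]), int(order[1])
    if mlp_probs[top1] - mlp_probs[top2] >= margin:
        return top1                                   # MLP is confident enough on its own
    clf = pair_classifiers.get(tuple(sorted((top1, top2))))
    if clf is None:
        return top1                                   # no specialist trained for this pair
    return int(clf.predict(np.asarray(features).reshape(1, -1))[0])
```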

Proceedings ArticleDOI
03 Aug 2003
TL;DR: This paper examines some optimization strategies for an HMM classifier that works with continuous feature values and uses the Baum-Welch training algorithm, and introduces the free parameters of the optimization procedure, which are the number of states of a model, the number of training iterations, and the number of Gaussian mixtures for each state.
Abstract: In off-line handwriting recognition, classifiers based on hidden Markov models (HMMs) have become very popular. However, while there exist well-established training algorithms, such as the Baum-Welch procedure, which optimize the transition and output probabilities of a given HMM architecture, the architecture itself, and in particular the number of states, must be chosen "by hand". Also the number of training iterations and the output distributions need to be defined by the system designer. In this paper we examine some optimization strategies for an HMM classifier that works with continuous feature values and uses the Baum-Welch training algorithm. The free parameters of the optimization procedure introduced in this paper are the number of states of a model, the number of training iterations, and the number of Gaussian mixtures for each state. The proposed optimization strategies are evaluated in the context of a handwritten word recognition task.
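A sketch of the kind of search the paper describes, using the hmmlearn library (not the authors' recognizer) to score held-out feature sequences for each combination of state count, mixture count, and iteration budget; the data shapes and parameter grids are placeholders.

```python
import itertools

import numpy as np
from hmmlearn.hmm import GMMHMM

# Placeholder data: each word sample is a sequence of 9-dimensional feature vectors;
# hmmlearn takes all frames stacked, together with a list of per-sequence lengths.
rng = np.random.default_rng(0)
train_X, train_lengths = rng.normal(size=(500, 9)), [50] * 10
val_X, val_lengths = rng.normal(size=(200, 9)), [50] * 4

best = None
for n_states, n_mix, n_iter in itertools.product([4, 6, 8], [1, 2, 4], [10, 20]):
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=n_iter, random_state=0)
    model.fit(train_X, train_lengths)          # Baum-Welch (EM) training
    score = model.score(val_X, val_lengths)    # validation log-likelihood
    if best is None or score > best[0]:
        best = (score, n_states, n_mix, n_iter)

print("best (n_states, n_mix, n_iter):", best[1:])
```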

Proceedings ArticleDOI
16 Jul 2003
TL;DR: This paper presents a technique for the automatic recognition of Arabic printed text using artificial neural networks; the main features of the system are preprocessing of the text, segmentation of the text into individual characters, feature extraction using the moment invariant technique, and recognition using an RBF network.
Abstract: Optical character recognition (OCR) systems provide human-machine interaction and are widely used in many applications. Much research has already been done on the recognition of Latin, Chinese and Japanese characters. Against this background, comparatively few papers have specifically addressed the problem of recognizing Arabic text and languages that use the Arabic script, such as Urdu and Farsi. This is due in part to a lack of interest in this field and in part to the complex nature of the Arabic language. This paper presents a technique for the automatic recognition of Arabic printed text using artificial neural networks. The main features of the system are preprocessing of the text, segmentation of the text into individual characters, feature extraction using the moment invariant technique, and recognition using an RBF network.
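The moment-invariant feature extraction step can be illustrated with OpenCV's Hu moments, which are invariant to translation, scale, and rotation; the signed log transform below is a common normalization choice, not necessarily the paper's.

```python
import cv2
import numpy as np

def moment_invariant_features(binary_glyph):
    """Compute the seven Hu moment invariants of a segmented character image
    and compress their dynamic range with a signed log transform."""
    hu = cv2.HuMoments(cv2.moments(binary_glyph, binaryImage=True)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# glyph = cv2.threshold(cv2.imread("char.png", cv2.IMREAD_GRAYSCALE), 0, 255,
#                       cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# features = moment_invariant_features(glyph)   # 7-dimensional feature vector per character
```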

Patent
17 Apr 2003
TL;DR: In this paper, a method and system for translating written text from a first (foreign) language to a second (native) language is provided, where an image containing the text is first captured at the request of the user, and text zones are identified in the image and the zones are converted to text characters using optical character recognition.
Abstract: A method and system for translating written text from a first (foreign) language to a second (native) language is provided. An image containing the text is first captured at the request of the user. Text zones are identified in the image and the zones are converted to text characters using optical character recognition. The text characters, which are in the first language, are translated to the second language. The translated text is then output to the user. The text may be converted to an image that can be displayed on a display or, alternatively, the text may be synthesized into speech that may be played over a speaker accessible to the user such as an earpiece. Data can be provided to the user as text, audio or text and audio combined.

Proceedings ArticleDOI
15 Oct 2003
TL;DR: It has been found from experimental results that the success rate of this OCR system for Bengali characters is approximately 98% for isolated characters and 96% for continuous characters.
Abstract: This paper is concerned with a complete optical character recognition (OCR) system for Bengali characters. Recognition is done for both isolated and continuous printed multi-font Bengali characters. Preprocessing steps include segmentation at various levels, noise removal and scaling. The Freeman chain code is calculated from the scaled character and further processed to obtain a discriminating set of feature vectors for the recognizer. The unknown samples are classified using a feed-forward neural network based recognition scheme. It has been found from experimental results that the success rate is approximately 98% for isolated characters and 96% for continuous characters.
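The Freeman chain code mentioned above encodes a contour as a sequence of 8-direction moves between successive boundary pixels. A minimal sketch, assuming the boundary is already available as an ordered list of (row, col) points:

```python
# Map from (d_row, d_col) between successive boundary pixels to the standard
# 8-direction Freeman code (0 = east, increasing counter-clockwise in image coordinates).
FREEMAN = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def freeman_chain_code(boundary):
    """Encode an ordered, 8-connected boundary (list of (row, col) pixels,
    open or closed) as its Freeman chain code."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(FREEMAN[(r1 - r0, c1 - c0)])
    return codes

# Example: the four corners of a unit square traversed clockwise in image coordinates.
print(freeman_chain_code([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]))   # [0, 6, 4, 2]
```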

Proceedings ArticleDOI
TL;DR: An evaluation of the effects of different types of information used for video retrieval from a video collection found that image matching and video OCR were the deciding aspects of video information retrieval.
Abstract: Video contains multiple types of audio and visual information, which are difficult to extract, combine or trade off in general video information retrieval. This paper provides an evaluation of the effects of different types of information used for video retrieval from a video collection. A number of different sources of information are present in most typical broadcast video collections and can be exploited for information retrieval. We will discuss the contributions of automatically recognized speech transcripts, image similarity matching, face detection and video OCR in the context of experiments performed as part of the 2001 TREC Video Retrieval Track evaluation conducted by the National Institute of Standards and Technology. For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.

Proceedings ArticleDOI
01 Jan 2003
TL;DR: By structuring video content, this work can support both topic indexing and semantic querying of multimedia documents; the two major techniques in the proposed approach are video text analysis and speech recognition.
Abstract: We present an automatic and novel approach to structuring and indexing lecture videos for distance learning applications. By structuring video content, we can support both topic indexing and semantic querying of multimedia documents. Our aim is to link the discussion topics extracted from the electronic slides with their associated video and audio segments. Two major techniques in our proposed approach are video text analysis and speech recognition. Initially, a video is partitioned into shots based on slide transitions. For each shot, the embedded video texts are detected, reconstructed and segmented as high-resolution foreground texts for commercial OCR recognition. The recognized texts can then be matched with their associated slides for video indexing. Meanwhile, both phrases (title) and keywords (content) are also extracted from the electronic slides to spot the speech signals. The spotted phrases and keywords are further utilized as queries to retrieve the most similar slide for speech indexing.
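The slide-matching step, linking recognized video text to the most similar electronic slide, can be sketched as a cosine similarity search over TF-IDF vectors; scikit-learn is used here purely for illustration and the strings are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_shot_to_slide(ocr_text, slide_texts):
    """Return the index of the electronic slide whose text is most similar
    to the OCR output of one video shot."""
    vectorizer = TfidfVectorizer()
    slide_vecs = vectorizer.fit_transform(slide_texts)
    shot_vec = vectorizer.transform([ocr_text])
    return int(cosine_similarity(shot_vec, slide_vecs).argmax())

slides = ["introduction to optical character recognition",
          "hidden markov models for sequence decoding",
          "support vector machines and kernels"]
print(match_shot_to_slide("hidden marko models sequence", slides))   # -> 1
```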

Patent
31 Dec 2003
TL;DR: In this article, an optical character recognition (OCR) process is performed on the stored, electronic image of the document to correct digit errors in the stored data read from the documents.
Abstract: System and method for the processing of MICR documents that produce read errors. MICR documents are read and sorted to a destination pocket for processing, subject to a determination that existing digit errors do not prevent the routing of the document. In example embodiments, an error does not prevent the routing of the document if it is not related to the routing/transit field. An optical character recognition (OCR) process is performed on the stored electronic image of the document to correct digit errors in the stored data read from the documents. If it is determined that the correction cannot be made through the OCR process, the image and corresponding MICR data are displayed on a user terminal for manual correction by reference to an image of the document, rather than the document itself.

Book ChapterDOI
20 Oct 2003
TL;DR: This study attempts to develop a method for embedding watermark in the text that is as successful as the frequency-domain methods have been for image and audio.
Abstract: Numerous schemes have been designed for watermarking multimedia contents. Many of these schemes are vulnerable to watermark erasing attacks. Naturally, such methods are ineffective on text unless the text is represented as a bitmap image, but in that case, the watermark can be erased easily by using Optical Character Recognition (OCR) to change the representation of the text from a bitmap to ASCII or EBCDIC. This study attempts to develop a method for embedding watermark in the text that is as successful as the frequency-domain methods have been for image and audio. The novel method embeds the watermark in original text, creating ciphertext, which preserves the meaning of the original text via various semantic replacements.

Proceedings ArticleDOI
27 May 2003
TL;DR: A technique based on graph combinatorics is used to rejoin the appropriate connected components of degraded historical printed documents and has been applied to real data with successful results.
Abstract: This paper presents a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.