
Showing papers on "Optical character recognition published in 2005"


Book
24 Nov 2005
TL;DR: This 2005 book reviews the needed background material and develops signal processing theory, pattern recognition metrics, and practical application know-how from basic premises, showing both digital and optical implementations.
Abstract: Correlation is a robust and general technique for pattern recognition and is used in many applications, such as automatic target recognition, biometric recognition and optical character recognition. The design, analysis and use of correlation pattern recognition algorithms requires background information, including linear systems theory, random variables and processes, matrix/vector methods, detection and estimation theory, digital signal processing and optical processing. This 2005 book provides a needed review of this diverse background material and develops the signal processing theory, the pattern recognition metrics, and the practical application know-how from basic premises. It shows both digital and optical implementations. It also contains technology presented by the team that developed it and includes case studies of significant interest, such as face and fingerprint recognition. It is suitable for graduate students taking courses in pattern recognition theory, whilst reaching technical levels of interest to the professional practitioner.

366 citations
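
As a rough illustration of the core operation the book builds on, here is a minimal Python sketch of FFT-based cross-correlation for locating a character template in a page image. This is only the basic matched-filter idea; the book's correlation filter designs go well beyond it, and all names and parameters below are illustrative.

```python
import numpy as np

def correlate(image, template):
    """Cross-correlation surface (crudely normalized), same size as `image`."""
    h, w = image.shape
    t = template - template.mean()          # zero-mean so flat regions don't respond
    T = np.fft.fft2(t, s=(h, w))            # template zero-padded to page size
    I = np.fft.fft2(image - image.mean())
    corr = np.real(np.fft.ifft2(I * np.conj(T)))
    return corr / (np.linalg.norm(t) * np.linalg.norm(image) + 1e-9)

# The correlation peak marks the best match location.
rng = np.random.default_rng(0)
image = rng.random((128, 128))
template = image[40:52, 60:70].copy()       # plant a known patch
peak = np.unravel_index(np.argmax(correlate(image, template)), image.shape)
print(peak)                                  # -> (40, 60)
```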


Journal ArticleDOI
TL;DR: This paper breaks the robust reading problem down into three subproblems, runs a competition for each stage as well as one for the best overall system, and describes an algorithm for combining the outputs of the individual text locators, showing how the combination scheme improves on any of the individual systems.
Abstract: This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets and gain a clear understanding of the current state of the art. We use the term 'robust reading' to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three subproblems and run competitions for each stage, and also a competition for the best overall system. The subproblems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hoped to gain a better understanding of the state of the art in each of the subproblems. Furthermore, our methodology involved storing detailed results of applying each algorithm to each image in the datasets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text-locating contest was the only one to have any entries. We give a brief description of each entry and present the results of this contest, showing cases where the leading entries succeed and fail. We also describe an algorithm for combining the outputs of the individual text locators and show how the combination scheme improves on any of the individual systems.

266 citations


Patent
10 Jun 2005
TL;DR: An intelligent document recognition-based document management system as discussed by the authors includes modules for image capture, image enhancement, image identification, optical character recognition (OCR), data extraction, and quality assurance.
Abstract: An intelligent document recognition-based document management system (Fig. 2) includes modules for image capture (32), image enhancement (32), image identification (34), optical character recognition (36), data extraction (37) and quality assurance (42). The system captures data from electronic documents as diverse as facsimile images, scanned images and images from document management systems. It processes these images and presents the data in, for example, a standard XML format. The document management system processes both structured document images (40) (ones which have a standard format) and unstructured document images (38) (ones which do not have a standard format). The system can extract images directly from a facsimile machine, a scanner or a document management system for processing.

233 citations


Proceedings ArticleDOI
31 Aug 2005
TL;DR: This paper describes the Arabic handwriting recognition competition held at ICDAR 2007, which again uses the IFN/ENIT-database of handwritten Arabic Tunisian town names; 8 groups with 14 systems participated in the competition.
Abstract: This paper describes the Arabic handwriting recognition competition held at ICDAR 2007. This second competition (the first was at ICDAR 2005) again uses the IFN/ENIT-database with Arabic handwritten Tunisian town names. Today, more than 54 research groups from universities, research centers, and industry are working with this database worldwide. This year, 8 groups with 14 systems are participating in the competition. The systems were tested on known data and on two datasets which are unknown to the participants. The systems are compared on the most important characteristic, the recognition rate. Additionally, the relative speeds of the different systems were compared. A short description of the participating groups, their systems, and the results achieved is finally presented.

220 citations


Journal ArticleDOI
TL;DR: A novel scale space algorithm for automatically segmenting handwritten (historical) documents into words is described and it is shown that the technique outperforms a state-of-the-art gap metrics word-segmentation algorithm on this collection.
Abstract: Many libraries, museums, and other organizations contain large collections of handwritten historical documents, for example, the papers of early presidents like George Washington at the Library of Congress. The first step in providing recognition/retrieval tools is to automatically segment handwritten pages into words. State-of-the-art segmentation techniques like the gap metrics algorithm have mostly been developed and tested on highly constrained documents like bank checks and postal addresses. There has been little work on full handwritten pages, and what work there is has usually been tested on clean artificial documents created for research purposes. Historical manuscript images, on the other hand, contain a great deal of noise and are much more challenging. Here, a novel scale space algorithm for automatically segmenting handwritten (historical) documents into words is described. First, the page is cleaned to remove margins. This is followed by a gray-level projection profile algorithm for finding lines in images. Each line image is then filtered with an anisotropic Laplacian at several scales. This procedure produces blobs which correspond to portions of characters at small scales and to words at larger scales. Crucial to the algorithm is scale selection, that is, finding the optimum scale at which blobs correspond to words. This is done by finding the maximum over scale of the extent or area of the blobs. This scale maximum is estimated using three different approaches. The blobs recovered at the optimum scale are then bounded with a rectangular box to recover the words. A post-processing filtering step eliminates boxes of unusual size which are unlikely to correspond to words. The approach is tested on a number of different data sets, and it is shown that, on 100 sampled documents from the George Washington corpus of handwritten document images, a total error rate of 17 percent is observed. The technique outperforms a state-of-the-art gap metrics word-segmentation algorithm on this collection.

199 citations
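
The two central steps, projection-profile line finding and word blobbing with an anisotropic Laplacian of Gaussian at an automatically selected scale, can be sketched as follows. This is a simplified reading of the algorithm (no margin removal, box filtering, or the paper's three scale estimators), assuming a grayscale page in [0, 1] with dark ink; the thresholds and scale range are illustrative, not the paper's.

```python
import numpy as np
from scipy import ndimage

def find_lines(page, thresh=0.05):
    """Split a grayscale page in [0, 1] (ink = dark) into line images."""
    profile = (1.0 - page).sum(axis=1)                 # ink per row
    on = profile > thresh * profile.max()
    d = np.diff(on.astype(int))
    starts, ends = np.flatnonzero(d == 1) + 1, np.flatnonzero(d == -1) + 1
    return [page[a:b] for a, b in zip(starts, ends)]

def word_boxes(line, scales=(2, 3, 4, 5, 6)):
    """Blob words with an anisotropic LoG; keep the scale maximizing blob extent."""
    ink = 1.0 - line
    best, best_area = None, -1
    for s in scales:
        # Wider smoothing in x so adjacent letters fuse into word blobs.
        resp = ndimage.gaussian_laplace(ink, sigma=(s, 3 * s))
        blobs = resp < 0.3 * resp.min()                # strong (negative) responses
        if blobs.sum() > best_area:
            best, best_area = blobs, blobs.sum()
    labels, n = ndimage.label(best)
    return ndimage.find_objects(labels)                # one bounding box per word
```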


Journal ArticleDOI
TL;DR: In this article, the authors investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture, and provide a qualitative measure of which texture features are most appropriate for this task.
Abstract: The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.

168 citations
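
One commonly used family of texture features for this kind of script classification is Gabor filter energy. The sketch below computes a small texture signature for a text block that could be fed to any classifier; the filter parameters are illustrative stand-ins, not the feature set evaluated in the paper.

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(theta, freq, sigma=4.0, size=21):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)        # coordinate along the wave
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * rot)

def texture_signature(block):
    """Mean absolute Gabor response at 4 orientations x 2 frequencies."""
    feats = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        for freq in (0.1, 0.2):
            resp = ndimage.convolve(block.astype(float), gabor_kernel(theta, freq))
            feats.append(np.abs(resp).mean())
    return np.array(feats)   # 8-dimensional signature, ready for any classifier
```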


Proceedings ArticleDOI
20 Jun 2005
TL;DR: This paper models the page surface as a developable surface and exploits the properties of the printed textual content on the page to recover the surface shape; it is the first reported method able to process general curved documents in images without camera calibration.
Abstract: Compared to scanned images, document pictures captured by camera can suffer from distortions due to perspective and page warping. It is necessary to restore a frontal planar view of the page before other OCR techniques can be applied. In this paper we describe a novel approach for flattening a curved document in a single picture captured by an uncalibrated camera. To our knowledge this is the first reported method able to process general curved documents in images without camera calibration. We propose to model the page surface by a developable surface, and exploit the properties (parallelism and equal line spacing) of the printed textual content on the page to recover the surface shape. Experiments show that the output images are much more OCR friendly than the original ones. While our method is designed to work with any general developable surfaces, it can be adapted for typical special cases including planar pages, scans of thick books, and opened books.

153 citations


Book ChapterDOI
22 Aug 2005
TL;DR: The efficiency of the proposed method is demonstrated by using a performance evaluation scheme which considers a great variety of documents such as forms, newspapers/magazines, scientific journals, tickets/bank cheques, certificates and handwritten documents.
Abstract: In this paper, we propose a novel technique for automatic table detection in document images. Lines and tables are among the most frequent graphic, non-textual entities in documents and their detection is directly related to the OCR performance as well as to the document layout description. We propose a workflow for table detection that comprises three distinct steps: (i) image pre-processing; (ii) horizontal and vertical line detection and (iii) table detection. The efficiency of the proposed method is demonstrated by using a performance evaluation scheme which considers a great variety of documents such as forms, newspapers/magazines, scientific journals, tickets/bank cheques, certificates and handwritten documents.

125 citations
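
Step (ii), horizontal and vertical line detection, is commonly implemented with morphological opening using long, thin structuring elements. The OpenCV sketch below follows that standard approach; the kernel lengths and area threshold are illustrative assumptions, not the paper's parameters.

```python
import cv2
import numpy as np

def detect_lines(binary, min_len=40):
    """`binary`: 0/255 image, ink as white. Returns (horizontal, vertical) line maps."""
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (min_len, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, min_len))
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    return horizontal, vertical

def table_candidates(binary, min_area=500):
    horizontal, vertical = detect_lines(binary)
    # Large connected components of the combined line map are table candidates;
    # crossings of horizontal and vertical lines give further grid evidence.
    line_map = cv2.bitwise_or(horizontal, vertical)
    n, _, stats, _ = cv2.connectedComponentsWithStats(line_map)
    return [tuple(stats[i, :4]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] > min_area]   # (x, y, w, h) boxes
```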


Proceedings ArticleDOI
17 Jan 2005
TL;DR: The implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text is presented and it is shown that this type of Information Extraction task seems to be affected negatively by the presence of OCRtext.
Abstract: This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.

118 citations
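
The inference step of such a tagger can be sketched with standard Viterbi decoding over a two-state (address / non-address) token HMM. The probabilities and the toy emission model below are illustrative stand-ins; the paper estimates its model from training data.

```python
import numpy as np

states = ["OTHER", "ADDRESS"]
log_start = np.log([0.9, 0.1])
log_trans = np.log([[0.95, 0.05],      # OTHER   -> OTHER / ADDRESS
                    [0.30, 0.70]])     # ADDRESS -> OTHER / ADDRESS

def log_emit(token):
    """Crude emission model: digits, title case, and street words favor ADDRESS."""
    addressy = token.isdigit() or token.istitle() or token.lower() in {"st", "ave", "rd", "box"}
    return np.log([0.2, 0.8]) if addressy else np.log([0.8, 0.2])

def viterbi(tokens):
    v = log_start + log_emit(tokens[0])
    back = []
    for tok in tokens[1:]:
        scores = v[:, None] + log_trans            # prev state x next state
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + log_emit(tok)
    path = [int(v.argmax())]
    for ptr in reversed(back):                     # backtrack best predecessors
        path.append(int(ptr[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi("contact us at 221 Baker St London".split()))
# -> ['OTHER', 'OTHER', 'OTHER', 'ADDRESS', 'ADDRESS', 'ADDRESS', 'ADDRESS']
```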


Patent
Bret C. Taylor, Luc Vincent
16 Dec 2005
TL;DR: In this paper, a digital mapping database is used as prior information or constraints for an OCR engine that is interpreting the corresponding street scene image, resulting in much greater accuracy of the digital map data provided to the user.
Abstract: Optical character recognition (OCR) for images such as a street scene image is generally a difficult problem because of the variety of fonts, styles, colors, sizes, orientations, occlusions and partial occlusions that can be observed in the textual content of such scenes. However, a database query can provide useful information that can assist the OCR process. For instance, a query to a digital mapping database can provide information such as one or more businesses in a vicinity, the street name, and a range of possible addresses. In accordance with an embodiment of the present invention, this mapping information is used as prior information or constraints for an OCR engine that is interpreting the corresponding street scene image, resulting in much greater accuracy of the digital map data provided to the user.

105 citations
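
The underlying idea, snapping a noisy OCR reading to a lexicon assembled from a map-database query, can be sketched with plain edit distance. The function names and the example lexicon below are hypothetical; the patent describes richer constraints such as address ranges and priors.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[-1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def constrain_ocr(raw, lexicon):
    """Return the lexicon entry closest to the raw OCR string."""
    return min(lexicon, key=lambda w: edit_distance(raw.lower(), w.lower()))

# `nearby` would come from the mapping-database query in practice.
nearby = ["Market Street", "Main Street", "Joe's Diner"]
print(constrain_ocr("MA1N STREE7", nearby))          # -> "Main Street"
```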


Patent
21 Nov 2005
TL;DR: In this article, an ontology is used to resolve ambiguities in an input string of characters, where the values of some of the characters in the language elements are uncertain.
Abstract: Systems, and associated apparatus, methods, or computer program products, may use ontologies to provide improved word recognition. The ontologies may be applied in word recognition processes to resolve ambiguities in language elements (e.g., words) where the values of some of the characters in the language elements are uncertain. Implementations of the method may use an ontology to resolve ambiguities in an input string of characters, for example. In some implementations, the input string may be received from a language conversion source such as, for example, an optical character recognition (OCR) device that generates a string of characters in electronic form from visible character images, or a voice recognition (VR) device that generates a string of characters in electronic form from speech input. Some implementations may process the generated character strings by using an ontology in combination with syntactic and/or grammatical analysis engines to further improve word recognition accuracy.
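
At its simplest, the disambiguation amounts to enumerating the strings consistent with the uncertain character positions and keeping those sanctioned by a knowledge source. In the sketch below a flat word set stands in for the ontology, and the example is contrived; the patent combines this lookup with syntactic and grammatical analysis.

```python
from itertools import product

def resolve(candidates, vocabulary):
    """`candidates[i]` is a tuple of plausible characters at position i."""
    options = ("".join(chars) for chars in product(*candidates))
    return [word for word in options if word in vocabulary]

# OCR could not decide between l/1 and o/0 in a scanned word.
uncertain = [("c",), ("l", "1"), ("o", "0"), ("c",), ("k",)]
print(resolve(uncertain, {"clock", "check"}))        # -> ['clock']
```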

Journal ArticleDOI
TL;DR: A new technique is proposed that removes perspective distortion and recovers the fronto-parallel view of text from a single image, using character stroke boundaries and tip points extracted with multiple fuzzy sets and morphological operators.

Patent
01 Apr 2005
TL;DR: In this article, a portable reading machine that operates in several modes and performs image preprocessing prior to optical character recognition is presented; the machine receives a low-resolution image and a high-resolution image of a scene, processing the low-resolution image to recognize a user-initiated gesture made with a gesturing item.
Abstract: A portable reading machine that operates in several modes and performs image preprocessing prior to optical character recognition. The portable reading machine receives a low-resolution image and a high-resolution image of a scene. It processes the low-resolution image to recognize a user-initiated gesture, made with a gesturing item, that indicates a command from the user to the reading machine, and processes the high-resolution image to recognize text in the image of the scene according to that command.

Patent
11 Jul 2005
TL;DR: In this article, a method, system and apparatus for facilitating transcription and captioning of multi-media content are presented, which include automatic multi-modal analysis operations that produce information which is presented to an operator as suggestions for spoken words, spoken word timing, caption segmentation, caption playback timing and caption placement.
Abstract: A method, system and apparatus for facilitating transcription and captioning of multi-media content are presented. The method, system, and apparatus include automatic multi-media analysis operations that produce information which is presented to an operator as suggestions for spoken words, spoken word timing, caption segmentation, caption playback timing, caption mark-up such as non-spoken cues or speaker identification, caption formatting, and caption placement. Spoken word suggestions are primarily created through an automatic speech recognition operation, but may be enhanced by leveraging other elements of the multi-media content, such as correlated text and imagery by using text extracted with an optical character recognition operation. Also included is an operator interface that allows the operator to efficiently correct any of the aforementioned suggestions. In the case of word suggestions, in addition to best hypothesis word choices being presented to the operator, alternate word choices are presented for quick selection via the operator interface. Ongoing operator corrections can be leveraged to improve the remaining suggestions. Additionally, an automatic multi-media playback control capability further assists the operator during the correction process.

Patent
01 Apr 2005
TL;DR: A reading machine that operates in various modes and includes image correction processing is described in this article; the reading device preprocesses an image prior to optical character recognition by detecting distortion in an image of a page, measuring the extent to which page boundaries in the image deviate from a simple rectangular shape, and correcting for the optical distortion by transforming the image to restore the page to a rectangular shape.
Abstract: A reading machine that operates in various modes and includes image correction processing is described. The reading device pre-processes an image prior to optical character recognition by detecting distortion in an image of a page: it measures the extent to which page boundaries in the image deviate from a simple rectangular shape and corrects for the optical distortion by transforming the image to restore the page to a rectangular shape.

Patent
14 Feb 2005
TL;DR: In this paper, a cellular phone is provided with a media scanning capability, which enables image or text scanning, facsimile, text-to-speech conversion, and language translation.
Abstract: A cellular phone is provided with a media scanning capability. Scanner optics, an optional light source and related scanning circuitry are integrated within a cellular phone to enable image or text scanning, facsimile, text-to-speech conversion, and language translation. Position sensors provide position data as the scanner is manually moved, in one or more passes across the scanned media, to enable a bit-mapped image of the strip to be created in a data buffer. Image data from the strips is processed to remove redundant overlap data and skew position errors, to give a bit-mapped final image of the entire scanned item. Image compression is provided to compress the image into standard JPEG format for storage or transmission, or into facsimile format for transmission of the document to any fax machine. Optical character recognition (OCR) is provided to convert image data to text which may be sent as email, locally displayed, stored for later use, or further processed. Further processing of text data includes language translation and text-to-speech conversion of either the original or translated text. The resulting speech audio can be heard locally or transmitted over the cellular network.

Journal ArticleDOI
TL;DR: This work proposes an approach that reliably rectifies and subsequently recognizes individual lines of text in real-world text that has been rigorously tested on still imagery as well as on MPEG-2 video clips in real time.
Abstract: Real-world text on street signs, nameplates, etc. often lies in an oblique plane and hence cannot be recognized by traditional OCR systems due to perspective distortion. Furthermore, such text often comprises only one or two lines, preventing the use of existing perspective rectification methods that were primarily designed for images of document pages. We propose an approach that reliably rectifies and subsequently recognizes individual lines of text. Our system, which includes novel algorithms for extraction of text from real-world scenery, perspective rectification, and binarization, has been rigorously tested on still imagery as well as on MPEG-2 video clips in real time.
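
Once the corners of the text region are known, the rectification itself is a single homography. The OpenCV sketch below shows only that step, assuming the extraction stage has already produced four corner points; it is not the paper's full pipeline.

```python
import cv2
import numpy as np

def rectify(image, corners, out_w, out_h):
    """`corners`: 4x2 array ordered top-left, top-right, bottom-right, bottom-left."""
    target = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    H = cv2.getPerspectiveTransform(np.float32(corners), target)
    return cv2.warpPerspective(image, H, (out_w, out_h))

# The rectified crop can then be binarized and handed to a standard OCR engine.
```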

ReportDOI
02 May 2005
TL;DR: A system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives; the resulting images are used for training and evaluating Optical Character Recognition (OCR) systems.
Abstract: The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to be able to rapidly generate data in new languages and scripts, without the need to develop specialized systems. We have developed a system which uses the language support of the MS Windows operating system combined with custom print drivers to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth including location, font information and content in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules, and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the system's effectiveness.
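
A toy version of page/pixel-level degradation in the spirit of the report, optical blur followed by a sensitivity threshold and random pixel flips, is sketched below. The parameters are illustrative, not the report's calibrated degradation model.

```python
import numpy as np
from scipy import ndimage

def degrade(clean, blur_sigma=1.0, thresh=0.5, flip_prob=0.002, seed=0):
    """`clean`: float page in [0, 1], ink = 0. Returns a degraded binary page."""
    rng = np.random.default_rng(seed)
    blurred = ndimage.gaussian_filter(clean.astype(float), blur_sigma)  # optics
    binary = (blurred > thresh).astype(np.uint8)      # scanner sensitivity threshold
    flips = rng.random(binary.shape) < flip_prob      # pixel-level salt-and-pepper
    return np.where(flips, 1 - binary, binary)
```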

Journal ArticleDOI
TL;DR: Optical Character Recognition (OCR) refers to the process of converting printed Tamil text documents into software translated Unicode Tamil Text.
Abstract: Optical Character Recognition (OCR) refers to the process of converting printed Tamil text documents into software translated Unicode Tamil Text. The printed documents available in the form of books, papers, magazines, etc. are scanned using standard scanners which produce an image of the scanned document. As part of the preprocessing phase the image file is checked for skewing. If the image is skewed, it is corrected by a simple rotation technique in the appropriate direction. Then the image is passed through a noise elimination phase and is binarized. The preprocessed image is segmented using an algorithm which decomposes the scanned text into paragraphs using a special space detection technique, then the paragraphs into lines using vertical histograms, lines into words using horizontal histograms, and words into character image glyphs using horizontal histograms. Each image glyph consists of 32×32 pixels. Thus a database of character image glyphs is created out of the segmentation phase. Then all the image glyphs are considered for recognition using Unicode mapping. Each image glyph is passed through various routines which extract the features of the glyph. The various features that are considered for classification are the character height, character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the horizontally oriented curves, the vertically oriented curves, the number of circles, the number of slope lines, the image centroid and special dots. The glyphs are then ready for classification based on these features. The extracted features are passed to a Support Vector Machine (SVM) where the characters are classified by a supervised learning algorithm. These classes are mapped onto Unicode for recognition. Then the text is reconstructed using Unicode fonts.
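
The final classification stage, hand-crafted glyph features fed to an SVM whose predicted classes map to Unicode code points, can be sketched as follows. The feature extractor below is a deliberately crude stand-in for the structural features listed in the abstract.

```python
import numpy as np
from sklearn.svm import SVC

def glyph_features(glyph):
    """A few cheap structural features of a 32x32 binary glyph (ink = 1)."""
    ys, xs = np.nonzero(glyph)
    if ys.size == 0:
        return np.zeros(5)
    height = ys.max() - ys.min() + 1                 # character height
    width = xs.max() - xs.min() + 1                  # character width
    return np.array([height, width, ys.mean(), xs.mean(), glyph.sum()])

def train(glyphs, codepoints):
    """`glyphs`: list of 32x32 binary arrays; `codepoints`: Unicode ints."""
    X = np.stack([glyph_features(g) for g in glyphs])
    return SVC(kernel="rbf").fit(X, codepoints)

def recognize(clf, glyph):
    return chr(int(clf.predict(glyph_features(glyph)[None, :])[0]))
```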

Proceedings ArticleDOI
31 Aug 2005
TL;DR: This paper explores an analytical method that uses a formal printer/scanner degradation model to identify the similarity between groups of degraded characters and this similarity is shown to improve the recognition accuracy of a classifier through model directed choice of training set data.
Abstract: Printing and scanning of text documents introduces degradations to the characters which can be modeled. Interestingly, certain combinations of the parameters that govern the degradations introduced by the printing and scanning process affect characters in such a way that the degraded characters have a similar appearance, while other degradations leave the characters with an appearance that is very different. It is well known that, generally speaking, a test set that more closely matches a training set is recognized with higher accuracy than one that matches the training set less well. Likewise, classifiers tend to perform better on data sets that have lower variance. This paper explores an analytical method that uses a formal printer/scanner degradation model to identify the similarity between groups of degraded characters. This similarity is shown to improve the recognition accuracy of a classifier through model-directed choice of training set data.

Book ChapterDOI
20 Dec 2005
TL;DR: The work presents an application of Dempster-Shafer technique for combination of classification decisions obtained from two Multi Layer Perceptron (MLP) based classifiers for optical character recognition (OCR) of handwritten Bangla digits using two different feature sets.
Abstract: The work presents an application of Dempster-Shafer (DS) technique for combination of classification decisions obtained from two Multi Layer Perceptron (MLP) based classifiers for optical character recognition (OCR) of handwritten Bangla digits using two different feature sets. Bangla is the second most popular script in the Indian subcontinent and the fifth most popular language in the world. The two feature sets used for the work are so designed that they can supply complementary information, at least to some extent, about the classes of digit patterns to the MLP classifiers. On experimentation with a database of 6000 samples, the technique is found to improve recognition performances by a minimum of 1.2% and a maximum of 2.32% compared to the average recognition rate of the individual MLP classifiers after 3-fold cross validation of results. The overall recognition rate as observed for the same is 95.1% on average.
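
When each classifier's score vector is treated as a mass function over singleton classes only, Dempster's rule of combination reduces to the simple form below. The example uses three classes for brevity (the paper has ten digit classes) and ignores compound hypotheses, which a full Dempster-Shafer treatment would allow.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Combine two mass vectors defined over the same singleton hypotheses."""
    joint = m1 * m2                     # agreement on each class
    conflict = 1.0 - joint.sum()        # mass assigned to incompatible pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return joint / joint.sum()          # normalize away the conflict

scores_a = np.array([0.1, 0.6, 0.3])   # MLP 1 (feature set 1)
scores_b = np.array([0.2, 0.7, 0.1])   # MLP 2 (feature set 2)
print(dempster_combine(scores_a, scores_b))  # -> sharper consensus on class 1
```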

Patent
01 Apr 2005
TL;DR: A reading machine that operates in various modes and includes image correction processing is described in this article; the reading device pre-processes an image for optical character recognition by determining whether text in the image is too large or too small for OCR processing, i.e., whether the text height falls outside the range in which OCR software will recognize text in a digitized image.
Abstract: A reading machine that operates in various modes and includes image correction processing is described. The reading device pre-processes an image for optical character recognition by receiving the image and determining whether text in the image is too large or too small for optical character recognition processing, i.e., whether the text height falls outside of a range in which optical character recognition software will recognize text in a digitized image. If necessary, the image is resized according to whether the text is too large or too small.
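
A sketch of the described check, estimating text height from connected components and rescaling when it falls outside the range the OCR engine accepts, follows. The height bounds are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

MIN_H, MAX_H = 15, 60            # pixel heights the OCR engine handles well

def fit_text_size(binary):
    """`binary`: 0/255 image, ink = 255. Returns a possibly rescaled image."""
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    heights = stats[1:, cv2.CC_STAT_HEIGHT]
    heights = heights[(heights > 3) & (heights < binary.shape[0] // 2)]  # drop noise
    if heights.size == 0:
        return binary
    med = float(np.median(heights))               # typical text height
    if med < MIN_H:
        scale = MIN_H / med
    elif med > MAX_H:
        scale = MAX_H / med
    else:
        return binary
    return cv2.resize(binary, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC)
```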

Proceedings ArticleDOI
31 Aug 2005
TL;DR: This work proposes a novel algorithm that over-segments each word, and then removes extra breakpoints using knowledge of letter shapes, and annotates each detected letter with shape information, to be used for recognition in future work.
Abstract: We propose a novel algorithm for the segmentation and prerecognition of offline handwritten Arabic text. Our character segmentation method over-segments each word, and then removes extra breakpoints using knowledge of letter shapes. On a test set of 200 images, 92.3% of the segmentation points were detected correctly, with 5.1% instances of over-segmentation. The prerecognition component annotates each detected letter with shape information, to be used for recognition in future work.
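
The over-segment-then-prune strategy can be illustrated on a binary word image: propose breakpoints at minima of the vertical ink projection, then discard those that would create fragments too narrow to be letters. The paper's shape-based pruning rules are considerably richer than this width heuristic.

```python
import numpy as np

def segment_word(word, min_width=6):
    """`word`: binary image, ink = 1. Returns column indices of kept cuts."""
    profile = word.sum(axis=0)
    # Over-segment: every local minimum of the ink profile is a candidate cut.
    cand = [x for x in range(1, word.shape[1] - 1)
            if profile[x] <= profile[x - 1] and profile[x] <= profile[x + 1]]
    # Prune: enforce a minimum fragment width between kept breakpoints.
    kept, last = [], -min_width
    for x in cand:
        if x - last >= min_width:
            kept.append(x)
            last = x
    return kept
```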

Patent
Khalid M. Rabb
08 Jun 2005
TL;DR: In this paper, the authors provide/acquire a document in electronic form (e.g., by receiving, copying, retrieving from storage, scanning combined with optical character recognition, etc.) and receive user input regarding visual impairment.
Abstract: Embodiments herein provide/acquire a document in electronic form (e.g., by receiving, copying, retrieving from storage, scanning combined with optical character recognition, etc.) and receive user input regarding visual impairment. In response to one or more levels of user visual impairment, embodiments herein automatically change (for example, immediately after scanning text) an appearance of the document, without requiring any user input, other than the visual impairment input. More specifically, when changing the appearance of the document, embodiments herein can increase the size of characters in the document, change the contrast or coloring of the text and/or background, and provide text-to-speech conversion of the document, thereby (in one embodiment) producing audio output of the text-to-speech conversion in coordination with a corresponding portion of the document being displayed. When changing the appearance of the document, embodiments herein also reformat the document (e.g., around graphic elements) to accommodate the increased size of the characters.

Patent
12 Dec 2005
Abstract: In a system for updating a contacts database (42, 46), a portable imager (12) acquires a digital business card image (10). An image segmenter (16) extracts text image segments from the digital business card image. An optical character recognizer (OCR) (26) generates one or more textual content candidates for each text image segment. A scoring processor (36) scores each textual content candidate based on results of database queries respective to the textual content candidates. A content selector (38) selects a textual content candidate for each text image segment based at least on the assigned scores. An interface (50) is configured to update the contacts list based on the selected textual content candidates.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A Bayesian super-resolution algorithm that uses a text-specific bimodal prior improved the readability of 4- to 7-pixel-high scene text significantly better than bicubic interpolation, and increased the accuracy of OCR results better than the piecewise smoothness prior.
Abstract: To increase the range of sizes of video scene text recognizable by optical character recognition (OCR), we developed a Bayesian super-resolution algorithm that uses a text-specific bimodal prior. We evaluated the effectiveness of the bimodal prior, compared with and in conjunction with a piecewise smoothness prior, visually and by measuring the accuracy of the OCR results on the variously super-resolved images. The bimodal prior improved the readability of 4- to 7-pixel-high scene text significantly better than bicubic interpolation, and increased the accuracy of OCR results better than the piecewise smoothness prior.

Proceedings ArticleDOI
31 Aug 2005
TL;DR: A camera-based optical character reader for Japanese Kanji characters was implemented on a mobile phone, and recognition accuracy of over 95% was obtained under the best conditions, which shows the potential of the prototype as a new type of electronic dictionary.
Abstract: A camera-based optical character reader (OCR) for Japanese Kanji characters was implemented on a mobile phone. This OCR has three key features. The first is discriminative feature extraction (DFE), which enables a character classifier that requires only a small amount of memory. The second is a word segmentation method specially designed for looking up Japanese words in a dictionary. The third feature is a GUI suitable for a mobile phone. A prototype mobile phone Kanji OCR was constructed and experimentally tested. Recognition accuracy of over 95% was obtained under the best conditions, which shows the potential of our prototype as a new type of electronic dictionary.

Proceedings ArticleDOI
31 Aug 2005
TL;DR: A novel segmentation-free approach for keyword search in historical typewritten documents combining image preprocessing, synthetic data creation, word spotting and user feedback technologies is proposed.
Abstract: In this paper, we propose a novel segmentation-free approach for keyword search in historical typewritten documents, combining image preprocessing, synthetic data creation, word spotting and user feedback technologies. Our aim is to search for keywords typed by the user in a large collection of digitized typewritten historical documents. The proposed method is based on: (i) image preprocessing for image binarization and enhancement, noisy border and frame removal, and orientation and skew correction; (ii) creation of synthetic image words from keywords typed by the user; (iii) word segmentation using dynamic parameters; (iv) efficient feature extraction for each image word; and (v) a retrieval procedure that is optimized by user feedback. Experimental results demonstrate the efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: This paper presents a method for determining the up/down orientation of text in a scanned document of unknown orientation, so that it can be appropriately rotated and processed by an optical character recognition (OCR) engine.