
Showing papers on "Optical character recognition published in 2000"


Journal ArticleDOI
TL;DR: This work presents algorithms for detecting and tracking text in digital video; the system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks.
Abstract: Text that appears in a scene or is graphically added to video can provide an important supplemental source of index information as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum of squared difference (SSD) based module to find the initial position and a contour-based module to refine the position. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.
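The SSD module described above lends itself to a compact illustration. Below is a minimal sketch of SSD-based template tracking over a small search window, assuming gray-scale frames as NumPy arrays; the function names and the exhaustive search strategy are illustrative, not the authors' implementation, and the contour-based refinement step is not shown.

```python
import numpy as np

def ssd(patch, template):
    """Sum of squared differences between two equally sized gray patches."""
    d = patch.astype(np.float64) - template.astype(np.float64)
    return float(np.sum(d * d))

def track_ssd(frame, template, prev_xy, search_radius=8):
    """Locate `template` in `frame` near the previous (x, y) position by
    exhaustively minimising the SSD inside a small search window."""
    h, w = template.shape
    x0, y0 = prev_xy
    best_score, best_xy = np.inf, prev_xy
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                continue
            score = ssd(frame[y:y + h, x:x + w], template)
            if score < best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score
```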

635 citations


Proceedings ArticleDOI
11 Dec 2000
TL;DR: Presents a learning-based approach for the construction of a license-plate recognition system that has shown the following performances on average: car detection rate 100%, segmentation rate 97.5%, and character recognition rate about 97.2%.
Abstract: Presents a learning-based approach for the construction of a license-plate recognition system. The system consists of three modules. They are, respectively, the car detection module, the license-plate segmentation module and the recognition module. The car detection module detects a car in a given image sequence obtained from a camera with a simple color-based approach. The segmentation module extracts the license plate in the detected car image using neural networks as filters for analyzing the color and texture properties of the license plate. The recognition module then reads the characters on the detected license plate with a support vector machine (SVM)-based character recognizer. The system has been tested with 1000 video sequences obtained from toll-gates, parking lots, etc., and has shown the following performances on average: car detection rate 100%, segmentation rate 97.5%, and character recognition rate about 97.2%.

222 citations


Proceedings ArticleDOI
Mei Yu, Yong Deak Kim
08 Oct 2000
TL;DR: A vertical edge matching based algorithm to recognize a Korean license plate from an input gray-scale image is proposed and is able to recognize license plates in normal shape, as well as plates that are out of shape due to the angle of view.
Abstract: License plate recognition (LPR) has many applications in traffic monitoring systems. In this paper, a vertical edge matching based algorithm to recognize a Korean license plate from an input gray-scale image is proposed. The algorithm is able to recognize license plates in normal shape, as well as plates that are out of shape due to the angle of view. The proposed algorithm is fast enough that the recognition unit of the LPR system can be implemented in software alone, reducing the cost of the system.
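As a rough illustration of what vertical edge matching builds on, here is a sketch of vertical-edge extraction from a gray-scale image with a Sobel-style kernel; the kernel, the threshold, and the omission of the subsequent plate-matching stage are assumptions for illustration, not details from the paper.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

def vertical_edges(gray, thresh=80.0):
    """Return a boolean map of strong vertical edges in a gray-scale image."""
    g = gray.astype(np.float64)
    h, w = g.shape
    out = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            resp = np.sum(SOBEL_X * g[y - 1:y + 2, x - 1:x + 2])
            out[y, x] = abs(resp) > thresh
    return out
```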

169 citations


Journal ArticleDOI
01 Jul 2000
TL;DR: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role, which is the underlying philosophy of the Devanagari document recognition system described in this work.
Abstract: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.

132 citations


Journal ArticleDOI
TL;DR: A new algorithm for skew detection is described and the performance and results of this skew detection algorithm are compared to other published methods from O'Gorman, Hinds, Le, Baird, Postl and Akiyama.
Abstract: Document image processing has become an increasingly important technology in the automation of office documentation tasks. Automatic document scanners such as text readers and OCR (Optical Character Recognition) systems are an essential component of systems capable of those tasks. One of the problems in this field is that the document to be read is not always placed correctly on a flatbed scanner. This means that the document may be skewed on the scanner bed, resulting in a skewed image. This skew has a detrimental effect on document analysis, document understanding, and character segmentation and recognition. Consequently, detecting the skew of a document image and correcting it are important issues in realising a practical document reader. In this paper we describe a new algorithm for skew detection. We then compare the performance and results of this skew detection algorithm to other published methods from O'Gorman, Hinds, Le, Baird, Postl and Akiyama. Finally, we discuss the theory of skew detection and the different approaches taken to solve the problem of skew in documents. The skew correction algorithm we propose has been shown to be extremely fast, with run times averaging under 0.25 CPU seconds to calculate the angle on the DEC 5000/20 workstation.
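The abstract does not spell out the new algorithm itself, so the sketch below shows a classic projection-profile approach to skew estimation for context only; the search range, the angle step, and the variance criterion are illustrative assumptions, not the paper's method.

```python
import numpy as np

def profile_variance(binary, angle_deg):
    """Variance of the row projection of foreground pixels after a rotation
    by `angle_deg`; text lines give a peaky profile at the right angle."""
    ys, xs = np.nonzero(binary)                  # foreground pixel coordinates
    theta = np.deg2rad(angle_deg)
    y_rot = xs * np.sin(theta) + ys * np.cos(theta)
    hist, _ = np.histogram(y_rot, bins=binary.shape[0])
    return float(np.var(hist))

def estimate_skew(binary, search=(-10.0, 10.0), step=0.25):
    """Candidate angle (degrees) at which text lines align best with rows."""
    angles = np.arange(search[0], search[1] + step, step)
    scores = [profile_variance(binary, a) for a in angles]
    return float(angles[int(np.argmax(scores))])
```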

127 citations


PatentDOI
TL;DR: In this paper, an intelligent camera system and method for recognizing license plates, in accordance with the invention, includes a camera adapted to independently capture a license plate image and recognize the image.
Abstract: An intelligent camera system and method for recognizing license plates, in accordance with the invention, includes a camera adapted to independently capture a license plate image and recognize the license plate image. The camera includes a processor for managing image data and executing a license plate recognition program device. The license plate recognition program device includes a program for detecting the orientation, position, illumination conditions and blurring of the image and accounting for them to obtain a baseline image of the license plate. A segmenting program segments the characters depicted in the baseline image by employing a projection along a horizontal axis of the baseline image to identify the positions of the characters. A statistical classifier is adapted for classifying the characters. The classifier recognizes the characters and returns a confidence score based on the probability of properly identifying each character. A memory is included for storing the license plate recognition program and the license plate images taken by an image capture device of the camera.
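The segmentation-by-projection step in the claim can be pictured with a short sketch: column sums of a binarised plate image separate the characters at low-ink gaps. The function name, the 0/1 image convention, and the minimum-width threshold are assumptions for illustration.

```python
import numpy as np

def segment_by_projection(binary_plate, min_width=2):
    """Return (start, end) column ranges of character candidates in a plate
    image where 1 marks ink and 0 marks background."""
    profile = binary_plate.sum(axis=0)           # ink count per column
    in_char, start, spans = False, 0, []
    for x, count in enumerate(profile):
        if count > 0 and not in_char:
            in_char, start = True, x
        elif count == 0 and in_char:
            in_char = False
            if x - start >= min_width:
                spans.append((start, x))
    if in_char and len(profile) - start >= min_width:
        spans.append((start, len(profile)))
    return spans
```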

115 citations


Proceedings ArticleDOI
01 Sep 2000
TL;DR: A system for recognition of machine-printed Gurmukhi script operates at the sub-character level; a recognition rate of 96.6% at a processing speed of 175 characters per second was achieved on clean images of text without employing any post-processing technique.
Abstract: A system for recognition of machine-printed Gurmukhi script is presented. The recognition system presented operates at the sub-character level. The segmentation process breaks a word into sub-characters and the recognition phase consists of classifying these sub-characters and combining them to form Gurmukhi characters. A set of very simple and easy-to-compute features is used and a hybrid classification scheme consisting of binary decision trees and nearest neighbours is employed. A recognition rate of 96.6% at a processing speed of 175 characters per second was achieved on clean images of text without employing any post-processing technique.

114 citations


Journal ArticleDOI
TL;DR: This paper provides an update on Doermann's comprehensive survey of research results in the broad area of document-based information retrieval, and focuses on methods that manipulate document images directly, and perform various information processing tasks such as retrieval, categorization, and summarization, without attempting to completely recognize the textual content of the document.
Abstract: Given the phenomenal growth in the variety and quantity of data available to users through electronic media, there is a great demand for efficient and effective ways to organize and search through all this information. Besides speech, our principal means of communication is through visual media, and in particular, through documents. In this paper, we provide an update on Doermann's comprehensive survey (1998) of research results in the broad area of document-based information retrieval. The scope of this survey is also somewhat broader, and there is a greater emphasis on relating document image analysis methods to conventional IR methods. Documents are available in a wide variety of formats. Technical papers are often available as ASCII files of clean, correct text. Other documents may only be available as hardcopies. These documents have to be scanned and stored as images so that they may be processed by a computer. The textual content of these documents may also be extracted and recognized using OCR methods. Our survey covers the broad spectrum of methods that are required to handle different formats like text and images. The core of the paper focuses on methods that manipulate document images directly, and perform various information processing tasks such as retrieval, categorization, and summarization, without attempting to completely recognize the textual content of the document. We start, however, with a brief overview of traditional IR techniques that operate on clean text. We also discuss research dealing with text that is generated by running OCR on document images. Finally, we also briefly touch on the related problem of content-based image retrieval.

112 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: A new text detection and segmentation algorithm is proposed that is especially designed for color images with complicated backgrounds, and that binarizes the detected text areas efficiently so that they can be processed by standard OCR software.
Abstract: Text is a very powerful index in content-based image and video indexing. We propose a new text detection and segmentation algorithm that is especially designed to be applied to color images with complicated backgrounds. Our goal is to minimize the number of false alarms and to binarize the detected text areas efficiently so that they can be processed by standard OCR software. First, potential areas of text are detected by enhancement and clustering processes, considering most of the constraints related to the texture of words. Then, classification and binarization of potential text areas are achieved in a single scheme performing color quantization and character periodicity analysis. We report a high rate of good detection results with very few false alarms and reliable text binarization.

109 citations


Journal Article
TL;DR: This paper presents a few approaches that enable large-scale information retrieval for the TELLTALE system and compares several different types of query methods such as tf.idf and incremental similarity to the original technique of centroid subtraction.
Abstract: Information retrieval has become more and more important due to the rapid growth of all kinds of information. However, there are few suitable systems available. This paper presents a few approaches that enable large-scale information retrieval for the TELLTALE system. TELLTALE is an information retrieval environment that provides full-text search for text corpora that may be garbled by OCR (optical character recognition) or transmission errors, and that may contain multiple languages. It can find similar documents against a 1 kB query from 1 GB of text data in 45 seconds. This remarkable performance is achieved by integrating new data structures and gamma compression into the TELLTALE framework. This paper also compares several different types of query methods such as tf.idf and incremental similarity to the original technique of centroid subtraction. The new similarity techniques give better performance but less accuracy.
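The gamma compression mentioned above is, in IR indexes, commonly Elias gamma coding of the gaps in postings lists; the sketch below shows that flavour under that assumption and is not the TELLTALE source code.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer as a bit string."""
    assert n >= 1
    binary = bin(n)[2:]                 # e.g. 9 -> '1001'
    return "0" * (len(binary) - 1) + binary

def encode_postings(doc_ids):
    """Encode a sorted postings list as gap-coded gamma bits."""
    bits, prev = [], 0
    for d in sorted(doc_ids):
        bits.append(gamma_encode(d - prev))
        prev = d
    return "".join(bits)

# Example: the gaps of [3, 7, 8, 20] are [3, 4, 1, 12].
print(encode_postings([3, 7, 8, 20]))
```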

101 citations


Proceedings ArticleDOI
Wei Qi, Lie Gu, Hao Jiang, Xiang-Rong Chen, Hong-Jiang Zhang
10 Sep 2000
TL;DR: Two advanced video browsers for home users are developed: an intelligent highlight player and an HTML-based video browser that perform automated categorization of news stories based on the texts obtained from closed captions or the video OCR process.
Abstract: We present a system developed for content-based broadcast news video browsing for home users. There are three main factors that distinguish our work from other similar ones. First, we have integrated the image and audio analysis results in identifying news segments. Second, we use video OCR technology to detect text from frames, which provides a good source of textual information for story classification when transcripts and closed captions are not available. Finally, natural language processing (NLP) technologies are used to perform automated categorization of news stories based on the texts obtained from closed captions or the video OCR process. Based on these video structure and content analysis technologies, we have developed two advanced video browsers for home users: an intelligent highlight player and an HTML-based video browser.

Journal ArticleDOI
TL;DR: By recovering a drawing order of a handwritten script, the temporal information can be recovered from a static 2D image and this method will be used as a bridge from the offline handwriting character recognition problem to the online one.
Abstract: Describes a method to recover a drawing order of a handwritten script from a static 2D image. The script should be written in a single stroke and may include double-traced lines. After the script is scanned in and preprocessed, we apply our recovery method which consists of two phases. In the first phase, we globally analyze the graph constructed from the skeletal image and label the graph by determining the types of each edge. In the second phase, we trace the graph from the start vertex to the end vertex using the labeling information. This method does not enumerate the possible cases, for example, by solving the traveling salesman problem and, therefore, does not cause a combinatorial explosion even if the script is very complex. By recovering a drawing order of a handwritten script, the temporal information can be recovered from a static 2D image. Hence, this method will be used as a bridge from the offline handwriting character recognition problem to the online one.
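A toy sketch of the tracing phase described above: once the skeleton graph has been built and labelled, recovering the stroke amounts to walking a path that consumes every edge, here with a Hierholzer-style traversal. The graph representation and the simplification to a plain Eulerian walk, ignoring the edge-type labels, are my assumptions rather than the paper's procedure.

```python
from collections import defaultdict

def trace_stroke(edges, start):
    """Hierholzer-style walk that consumes every edge exactly once.
    `edges` is a list of (u, v) pairs; double-traced lines simply appear twice."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:
            u = adj[v].pop()
            adj[u].remove(v)          # consume the undirected edge once
            stack.append(u)
        else:
            path.append(stack.pop())
    return path[::-1]                 # vertex sequence of the drawing order

# Example: a small single-stroke figure traced from vertex "a".
print(trace_stroke([("a", "b"), ("b", "c"), ("c", "a"),
                    ("a", "d"), ("d", "e"), ("e", "a")], "a"))
```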

Proceedings ArticleDOI
29 Apr 2000
TL;DR: The effects of word error rate from ASR and OCR, performance as a function of the amount of training data, and for speech, the effect of out-of-vocabulary errors and the loss of punctuation and mixed case are explored.
Abstract: In this paper, we analyze the performance of name finding in the context of a variety of automatic speech recognition (ASR) systems and in the context of one optical character recognition (OCR) system. We explore the effects of word error rate from ASR and OCR, performance as a function of the amount of training data, and, for speech, the effect of out-of-vocabulary errors and the loss of punctuation and mixed case.

Journal ArticleDOI
TL;DR: The hierarchical OCR dynamically adapts to factors such as the quality of the input pattern, its intrinsic similarities and differences from patterns of other classes it is being compared against, and the processing time available, which leads to optimal use of computational resources.
Abstract: This paper describes hierarchical OCR, a character recognition methodology that achieves high speed and accuracy by using a multiresolution and hierarchical feature space. Features at different resolutions, from coarse to fine-grained, are implemented by means of a recursive classification scheme. Typically, recognizers have to balance the use of features at many resolutions (which yields a high accuracy), with the burden on computational resources in terms of storage space and processing time. We present in this paper a method that adaptively determines the degree of resolution necessary in order to classify an input pattern. This leads to optimal use of computational resources. The hierarchical OCR dynamically adapts to factors such as the quality of the input pattern, its intrinsic similarities and differences from patterns of other classes it is being compared against, and the processing time available. Furthermore, the finer resolution is accorded to only certain "zones" of the input pattern which are deemed important given the classes that are being discriminated. Experimental results support the methodology presented. When tested on standard NIST data sets, the hierarchical OCR proves to be 300 times faster than a traditional K-nearest-neighbor classification method, and 10 times faster than a neural network method. The comparison uses the same feature set for all methods. A recognition rate of about 96 percent is achieved by the hierarchical OCR. This is on par with the other two traditional methods.
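A minimal sketch of the coarse-to-fine idea: classify on cheap, low-resolution features and move to finer resolutions only while the top two classes remain close. The nearest-prototype classifier, the pooling factors, and the confidence margin are placeholders, not the paper's feature set or decision rule.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a gray image by an integer factor."""
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def hierarchical_classify(img, prototypes, factors=(8, 4, 1), margin=0.1):
    """`prototypes` maps class label -> full-resolution prototype image of the
    same size as `img`. Returns the label decided at the coarsest resolution
    that is unambiguous, otherwise the finest-resolution decision."""
    for f in factors:
        query = downsample(img, f).ravel()
        dists = {c: np.linalg.norm(query - downsample(p, f).ravel())
                 for c, p in prototypes.items()}
        ranked = sorted(dists, key=dists.get)
        best, second = ranked[0], ranked[1]
        if dists[second] - dists[best] > margin * dists[second]:
            return best                     # confident enough, stop early
    return best                             # finest resolution decides
```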

Journal ArticleDOI
TL;DR: A procedure based on clustering in color space followed by a connected-components analysis that seems promising for locating text in Web images and techniques using polynomial surface fitting and “fuzzy” n-tuple classifiers are described.
Abstract: The explosive growth of the World Wide Web has resulted in a distributed database consisting of hundreds of millions of documents. While existing search engines index a page based on the text that is readily extracted from its HTML encoding, an increasing amount of the information on the Web is embedded in images. This situation presents a new and exciting challenge for the fields of document analysis and information retrieval, as WWW image text is typically rendered in color and at very low spatial resolutions. In this paper, we survey the results of several years of our work in the area. For the problem of locating text in Web images, we describe a procedure based on clustering in color space followed by a connected-components analysis that seems promising. For character recognition, we discuss techniques using polynomial surface fitting and “fuzzy” n-tuple classifiers. Also presented are the results of several experiments that demonstrate where our methods perform well and where more work needs to be done. We conclude with a discussion of topics for further research.
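The connected-components step that follows colour clustering can be sketched in a few lines as 4-connected labelling of a binary mask; this pure-Python BFS is illustrative rather than the authors' implementation.

```python
from collections import deque
import numpy as np

def connected_components(mask):
    """Label 4-connected True regions of a 2-D boolean array. Returns
    (labels, count) where labels is an int array and 0 means background."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        current += 1
        queue = deque([(sy, sx)])
        labels[sy, sx] = current
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current
```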

Patent
18 Dec 2000
TL;DR: In this article, a scaling factor is calculated to match a typeface rendering of a word to the width of the word in the originally scanned image, and a cluster analysis is performed to identify close clusters of scaling factors for a type face, indicative of a good typeface fit at a constant scaling factor.
Abstract: Following scanning of a document image, and optical character recognition (OCR) processing, the outputted OCR text is processed to determine a text format (typeface and font size) to match the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for a typeface, indicative of a good typeface fit at a constant scaling factor (font size).
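A sketch of the patent's core computation under simplifying assumptions: one scaling factor per word per candidate typeface, then a spread measure per typeface, with the tightest-clustering typeface winning. Width measurement and rendering are abstracted into precomputed inputs, and the spread statistic is my stand-in for the patent's cluster analysis.

```python
import statistics

def best_typeface(scanned_widths, rendered_widths_by_face):
    """`scanned_widths[i]` is the pixel width of word i in the scanned image;
    `rendered_widths_by_face[face][i]` is its width rendered in `face` at a
    reference size. Returns (best face, median scaling factor = font size cue)."""
    best_face, best_spread, best_scale = None, float("inf"), None
    for face, rendered in rendered_widths_by_face.items():
        factors = [s / r for s, r in zip(scanned_widths, rendered) if r > 0]
        spread = statistics.pstdev(factors) / statistics.median(factors)
        if spread < best_spread:
            best_face, best_spread = face, spread
            best_scale = statistics.median(factors)
    return best_face, best_scale
```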

Proceedings ArticleDOI
03 Sep 2000
TL;DR: An update of the system for detecting and extracting an unconstrained variety of text from general-purpose video is presented; the system takes advantage of the temporal redundancy in video, resulting in good text segmentation.
Abstract: Despite advances in the archiving of digital video, we are still unable to efficiently search and retrieve the portions that interest us. Video indexing by shot segmentation has been a proposed solution and several research efforts are seen in the literature. Shot segmentation alone cannot solve the problem of content-based access to video. Recognition of text in video has been proposed as an additional feature. Several research efforts are found in the literature for text extraction from complex images and video, with applications for video indexing. We present an update of our system for detection and extraction of an unconstrained variety of text from general-purpose video. The text detection results from a variety of methods are fused, and each text instance is segmented so that it can be passed to OCR. Problems in segmenting text from video are similar to those faced in the detection and localization phases. Video has low resolution and the text often has poor contrast with a changing background. The proposed system applies a variety of methods and takes advantage of the temporal redundancy in video, resulting in good text segmentation.

Patent
28 Dec 2000
TL;DR: In this article, a digital video or still camera including optical character recognition and translator functions is used to translate text included in captured images, where the user can identify the desired text with the image.
Abstract: A digital video or still camera including optical character recognition and translator functions to translate text included in captured images. Cursor control allows the user to identify the desired text with the image. Translation is also possible on text included in images replayed from the camera's memory.

Patent
06 Mar 2000
TL;DR: In this paper, an attribute recognition program such as an optical character recognition (OCR) program is used on the scanned product label which generates text strings from alphanumeric label information and graphics maps/images from graphics/logos.
Abstract: The present invention provides a system, method and apparatus for identifying a product through reading of the product label by a retail terminal. The product/product label is scanned by an imager of a retail terminal. An attribute recognition program such as an optical character recognition (OCR) program is used on the scanned product label which generates text strings from alphanumeric label information and graphics maps/images from graphics/logos. Text strings and/or graphics data are then compared to various text strings and graphics data in a database or look-up table to return information relative to the scanned text string(s)/graphic(s). In one form, kiosks, incorporating an imager and the necessary hardware and software to scan a product label and process the scanned information in accordance with the present principles, may provide printouts of product information, instructions, order forms or the like for the scanned product. Additionally, standard queries or user-generated queries may be answered relative to the scanned product label. Data, stored either locally or at a remote site accessible via a network or the like, is correlated to a plurality of text strings/graphics that correspond to alphanumeric text/graphics on a plurality of product labels.

Proceedings ArticleDOI
21 Dec 2000
TL;DR: The Medical Article Record System (MARS) as discussed by the authors employs document image analysis and understanding techniques and optical character recognition (OCR) to produce bibliographic records for NLM's MEDLINE database.
Abstract: The National Library of Medicine (NLM) is developing an automated system to produce bibliographic records for its MEDLINE database. This system, named the Medical Article Record System (MARS), employs document image analysis and understanding techniques and optical character recognition (OCR). This paper describes a key module in MARS called the Automated Labeling (AL) module, which labels all zones of interest (title, author, affiliation, and abstract) automatically. The AL algorithm is based on 120 rules that are derived from an analysis of journal page layouts and features extracted from OCR output. Experiments carried out on more than 11,000 articles in over 1,000 biomedical journals show the accuracy of this rule-based algorithm to exceed 96%.

Proceedings ArticleDOI
01 Jan 2000
TL;DR: Two complementary methods are proposed for characterizing the spatial structure of digitized technical documents and labelling various logical components without using optical character recognition.
Abstract: Two complementary methods are proposed for characterizing the spatial structure of digitized technical documents and labelling various logical components without using optical character recognition. The top-down method segments and labels the page image simultaneously using publication-specific information in the form of a page grammar. The bottom-up method naively segments the document into rectangles that contain individual connected components, combines blocks using knowledge about generic layout objects, and identifies logical objects using publication-specific knowledge. Both methods are based on the X-Y tree representation of a page image. The procedures are demonstrated on scanned and synthesized bit-maps of the title pages of technical articles.
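Both methods rest on the X-Y tree, which is built by recursive X-Y cuts; the sketch below splits a binary page image at its widest whitespace valley, alternating direction, until no sufficiently wide gap remains. The gap threshold and the decision to cut only at the single widest gap are illustrative simplifications.

```python
import numpy as np

def xy_cut(binary, y0, y1, x0, x1, horizontal=True, min_gap=10, leaves=None):
    """Recursively split `binary[y0:y1, x0:x1]` at whitespace gaps and append
    the resulting leaf blocks (y0, y1, x0, x1) of the X-Y tree to `leaves`."""
    if leaves is None:
        leaves = []
    block = binary[y0:y1, x0:x1]
    if block.size == 0 or block.sum() == 0:
        return leaves                                    # nothing but whitespace
    profile = block.sum(axis=1 if horizontal else 0)     # ink per row or column
    gaps, start = [], None
    for i, v in enumerate(profile):
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            gaps.append((start, i))
            start = None
    gaps = [(a, b) for a, b in gaps if b - a >= min_gap]
    if not gaps:
        leaves.append((y0, y1, x0, x1))
        return leaves
    a, b = max(gaps, key=lambda g: g[1] - g[0])          # widest whitespace valley
    mid = (a + b) // 2
    if horizontal:
        xy_cut(binary, y0, y0 + mid, x0, x1, False, min_gap, leaves)
        xy_cut(binary, y0 + mid, y1, x0, x1, False, min_gap, leaves)
    else:
        xy_cut(binary, y0, y1, x0, x0 + mid, True, min_gap, leaves)
        xy_cut(binary, y0, y1, x0 + mid, x1, True, min_gap, leaves)
    return leaves
```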

Proceedings ArticleDOI
01 Sep 2000
TL;DR: This research differs from earlier attempts to apply cipher decoding to OCR in using real data, a more appropriate clustering algorithm, and decoding a many-to-many instead of a one-to-one mapping between clusters and letters.
Abstract: We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a "clump" metric, typically yields several hundred clusters with highly skewed populations. Letter identities are assigned to each cluster by maximizing matches with a lexicon of English words. We found that for 2/3 of the pages, we can identify almost 80% of the words included in the lexicon, without any shape training. Residual errors are caused by mis-segmentation, including missed lines and punctuation. This research differs from earlier attempts to apply cipher decoding to OCR in: (1) using real data; (2) a more appropriate clustering algorithm; and (3) decoding a many-to-many instead of a one-to-one mapping between clusters and letters.
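A toy sketch of the cipher-decoding idea: words arrive as sequences of cluster ids, and letters are assigned to clusters so that as many decoded words as possible stay consistent with an English lexicon. The greedy one-letter-per-cluster search below is a strong simplification of my own; the paper decodes a many-to-many mapping.

```python
from string import ascii_lowercase

def consistent(mapping, word, entry):
    """True if the cluster-id `word` can still decode to the lexicon `entry`."""
    return len(word) == len(entry) and all(
        mapping.get(c, entry[i]) == entry[i] for i, c in enumerate(word))

def score(mapping, cluster_words, lexicon):
    """Count cluster-id words consistent with at least one lexicon entry."""
    return sum(any(consistent(mapping, w, e) for e in lexicon)
               for w in cluster_words)

def greedy_decode(cluster_words, lexicon, clusters):
    """Assign one letter per cluster id, one cluster at a time."""
    mapping = {}
    for c in clusters:
        mapping[c] = max(ascii_lowercase,
                         key=lambda ch: score({**mapping, c: ch},
                                              cluster_words, lexicon))
    return mapping

# Toy example: two cluster-id words that decode to "the" and "he".
words = [(0, 1, 2), (1, 2)]
print(greedy_decode(words, lexicon={"the", "he", "who"}, clusters=[0, 1, 2]))
```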

Patent
11 Jul 2000
TL;DR: In this article, a document image that is the source of Optical Character Recognition (OCR) output is displayed, and recognition confidence parameters are determined for regions of the document image corresponding to words in the OCR output.
Abstract: A document image that is the source of Optical Character Recognition (OCR) output is displayed. Recognition confidence parameters are determined for regions of the document image corresponding to words in the OCR output. The regions are displayed in a manner (e.g., highlighted in various colors) that is indicative of the respective recognition confidence parameter. Preferably, a user can select a region of the displayed document image. When the region is selected, text of the OCR output corresponding to the selected region is displayed in a pop-up menu.

Proceedings Article
01 Jan 2000
TL;DR: A method based on statistical properties of local image neighbourhoods for the location of text in real-scene images, which has applications in robot vision, and desktop and wearable computing, and the possibility of recovery of the text for optical character recognition.
Abstract: We present a method based on statistical properties of local image neighbourhoods for the location of text in real-scene images. This has applications in robot vision, and desktop and wearable computing. The statistical measures we describe extract properties of the image which characterise text, invariant to a large degree to the orientation, scale or colour of the text in the scene. The measures are employed by a neural network to classify regions of an image as text or non-text. We thus avoid the use of different thresholds for the various situations we expect, including when text is too small to read, or when the text plane is not fronto-parallel to the camera. We briefly discuss applications and the possibility of recovery of the text for optical character recognition.

Proceedings Article
01 Oct 2000
TL;DR: An adaptive optical music recognition system is being developed as part of an experiment in creating a comprehensive framework of tools to manage the workflow of large-scale digitization projects, and will support the path from physical object and/or digitized material into a digital library repository, and offer effective tools for incorporating metadata and perusing the content of the resulting multimedia objects.
Abstract: An adaptive optical music recognition system is being developed as part of an experiment in creating a comprehensive framework of tools to manage the workflow of large-scale digitization projects. This framework will support the path from physical object and/or digitized material into a digital library repository, and offer effective tools for incorporating metadata and perusing the content of the resulting multimedia objects. The project involves digitization of the Lester S. Levy Collection of Sheet Music (Milton S. Eisenhower Library, Johns Hopkins University). In Phase One, images of the music and lyrics, and color images of the covers of the Levy Collection were digitized and a database of text index records was created. Phase Two consists of converting the digitized music to computer-readable music notation format along with full-text lyrics, generating sound renditions, and creating metadata to enhance search capabilities. During Phase One, the researchers at the Eisenhower Library created a database of text index records, images of the music and lyrics and color images of the cover sheets from the Levy Collection. This database is available to the general public at http://levysheetmusic.mse.jhu.edu. Currently, the Collection can be searched in three modes. First, users can search by subject, a keyword search on the text record. Each of the pieces has been indexed for the subject of the song and/or cover image. Users may also browse the Collection by the topical arrangement of the physical collection. In Phase Two, adaptive optical music recognition (AOMR) software (Fujinaga 1997) is used to convert the TIFF image of scanned sheet music into computer-readable formats, which include GUIDO and MIDI files along with the full text of the lyrics. These digital objects will be deposited into the data repository along with the scanned sheet music TIFF, JPEG and thumbnail, and associated metadata. The AOMR software offers five important advantages over similar commercial offerings. First, it can be run in batch processing mode, an essential feature for the Levy Collection given its large number of music sheets. It is important to note that most commercial software is intended for the casual user and does not scale for a large number of objects. Second, the software is written in C and therefore is portable across platforms. Third, the software can "learn" to recognize different music symbols, an issue considering the diversity of the Levy Collection and the universe of notated music in general. Fourth, the software is open-sourced. Finally, this software can separate full-text lyrics that can be further processed using optical character recognition (OCR) technology. The AOMR process is divided into two major sections: symbol classification and musical semantic interpretation. The first step in the interpretation phase is to connect all inter-related symbols. In addition, many rhythmic errors can also be corrected by adjusting the metric placement of notes relative to their vertical alignment with notes in other parts. An interactive graphic editor suitable to be interfaced with the AOMR program is being developed jointly with the group working on the GUIDO editor (Renz 2000). The purpose of this editor is to correct any errors generated by the AOMR so that the corrected version can then be converted to GUIDO format. To enable powerful search and retrieval as well as a user-friendly navigational mechanism, Phase Two of the Levy Project will include a strong metadata component.
Commonly defined as “data about data,” metadata is structured representational information. The kinds of metadata important for Levy include descriptive (to enable searching, browsing and identification of items), structural (to enable the creation of an interface for optimum browsing and navigation), and administrative (to manage the digital components of the collection and aid users in identification of items). To further enhance the scholarly value of the Levy Collection, a web interface will be developed for a music research toolkit, for example, Humdrum (Huron 1997). These toolkits are software tools intended to assist in music research and are suitable for use in a wide variety of computer-based musical investigations, such as motivic, stylistic, and melodic analysis and concordance studies. We also propose to extend plans for developing automated means of mining authoritative name information and creating even richer name indexes. The entire project is an experiment in developing a comprehensive framework of tools to manage the workflow of large-scale digitization projects. This framework will support the path from physical object and/or digitized material into a digital library repository, and offer effective tools for incorporating metadata and perusing the content of the resulting multimedia objects. The Levy Collection, with its large size and availability in digital format, is an ideal subject for development and evaluation of this proposed framework.

Patent
15 Sep 2000
TL;DR: In this article, a system and method for indexing and searching textual archives using semantic units such as syllables and morphemes is presented, where the string of semantic units that result from a decoding process are stored in a semantic unit database and indexed with pointers to the corresponding textual data in the textual archive.
Abstract: A system and method for indexing and searching textual archives using semantic units such as syllables and morphemes. In one aspect, a system for indexing a textual archive comprises an AHR (automatic handwriting recognition) system and/or OCR (optical character recognition) system for transcribing (decoding) textual input data (handwritten or typed text) into a string of semantic units (e.g., syllables or morphemes) using a statistical language model and vocabulary based on semantic units (such as syllables or morphemes). The string of semantic units that result from a decoding process are stored in a semantic unit database and indexed with pointers to the corresponding textual data in the textual archive. In another aspect, a system for searching a textual archive is provided, wherein a word (or words) to be searched is rendered into a string of semantic units (e.g., syllables or morphemes) depending on the application. A search engine then compares the string of semantic units (resulting from the input query) against the decoded semantic unit database, and then identifies textual data stored in the textual archive using the indexes that were generated during a semantic unit-based indexing process.
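A hedged sketch of the indexing side: each decoded document becomes a sequence of semantic units, an inverted index maps every unit to (document id, position) pointers back into the archive, and a query is matched as a consecutive run of units. The data structures and the toy syllable split are assumptions, not the patent's design.

```python
from collections import defaultdict

def build_index(decoded_docs):
    """`decoded_docs` maps doc_id -> list of semantic units (e.g. syllables)."""
    index = defaultdict(list)
    for doc_id, units in decoded_docs.items():
        for pos, unit in enumerate(units):
            index[unit].append((doc_id, pos))
    return index

def search(index, query_units):
    """Return doc ids containing the query units consecutively."""
    hits = set()
    for doc_id, pos in index.get(query_units[0], []):
        if all((doc_id, pos + i) in index.get(u, [])
               for i, u in enumerate(query_units[1:], start=1)):
            hits.add(doc_id)
    return hits

# Example with a hypothetical syllable split of "reading order" and "order form".
idx = build_index({1: ["read", "ing", "or", "der"], 2: ["or", "der", "form"]})
print(search(idx, ["or", "der"]))   # {1, 2}
```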

Proceedings ArticleDOI
01 Sep 2000
TL;DR: Stochastic error-correcting parsing is proposed as a powerful and flexible method to post-process the results of an optical character recognizer (OCR).
Abstract: In this paper, stochastic error-correcting parsing is proposed as a powerful and flexible method to post-process the results of an optical character recognizer (OCR). Deterministic and nondeterministic approaches are possible under the proposed setting. The basic units of the model can be words or complete sentences, and the lexicons or the language databases can be simple enumerations or may convey probabilistic information from the application domain.
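As a much simpler stand-in for the word-level case of the proposed post-processing, the sketch below corrects an OCR token to the lexicon entry minimising edit cost minus a log prior; the full method uses stochastic error-correcting parsing over grammars, which this sketch does not attempt.

```python
import math

def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

def correct(token, lexicon_probs, channel_weight=1.0):
    """Pick the lexicon word minimising edit cost minus log prior probability."""
    return min(lexicon_probs,
               key=lambda w: channel_weight * edit_distance(token, w)
                             - math.log(lexicon_probs[w]))

# Example with a tiny hypothetical lexicon.
print(correct("recogmtion", {"recognition": 0.7, "reception": 0.3}))
```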

Proceedings ArticleDOI
01 Sep 2000
TL;DR: A video text detection system based on automated neural network training that can detect both graphical text and scene text located in complex backgrounds, can detect text in any orientation, and can perform multilingual text detection.
Abstract: In this paper we present a video text detection system based on automated neural network training. Compared with previous work which detects only graphical text with fixed parameters, our system (1) provides a training mechanism so the parameters of the system can be adapted to changing environments, (2) can detect both graphical text and scene text located in complex backgrounds, (3) can detect text in any orientation and (4) can perform multilingual text detection. Experiments show the effectiveness of our system in various text detection tasks.

Proceedings ArticleDOI
01 Sep 2000
TL;DR: The nature of the problem, state of the art of handwriting recognition at the turn of the new millennium, and the results of CENPARMI researchers in automatic recognition of handwritten digits, touching numerals, cursive scripts, and dates formed by a mixture of the former 3 categories are summarized.
Abstract: The last frontiers of handwriting recognition are considered to have started in the last decade of the second millennium. The paper summarizes (a) the nature of the problem of handwriting recognition, (b) the state of the art of handwriting recognition at the turn of the new millennium, and (c) the results of CENPARMI researchers in automatic recognition of handwritten digits, touching numerals, cursive scripts, and dates formed by a mixture of the former 3 categories. Wherever possible, comparable results have been tabulated according to techniques used, databases, and performance. Aspects related to human generation and perception of handwriting are discussed. The extraction and usage of human knowledge, and their incorporation into handwriting recognition systems are presented. Challenges, aims, trends, efforts and possible rewards, and suggestions for future investigations are also included.

Proceedings ArticleDOI
01 Sep 2000
TL;DR: A gray-scale character recognition method for video indexing that directly extracts Gabor features (called Gabor jets) from video contents and provides robustness under character deformation caused by variation of font types or imprecise segmentation.
Abstract: We propose a gray-scale character recognition method for video indexing. It is robust against the problems of binarization against a complex background and low resolution. Unlike a traditional character recognition scheme through image binarization, we directly extract Gabor features (called Gabor jets) from video contents. The use of the Gabor filters contributes to freeing a tricky binarization process for cluttered images, and furthermore provides localized directional edge features, which have phase-shift invariance to edge positions. To form a feature vector to be classified, we accumulate the extracted Gabor features along projection lines in local regions, and then categorize them with a standard LVQ classifier. The projective accumulation provides robustness under character deformation caused by variation of font types or imprecise segmentation. We compare the proposed method by experiments with a typical OCR method, for which correct binarization is advantageously given. The proposed method shows similar or superior performance to the other method in understanding video captions.