scispace - formally typeset

Showing papers on "Intelligent word recognition" published in 1997


Book
02 May 1997
TL;DR: Chapters cover Arabic character recognition (A. Amin), automatic reading of braille documents (A. Antonacopoulos), and techniques for improving OCR results (A. Dengel), among other topics.
Abstract: Arabic character recognition, A. Amin; automatic reading of braille documents, A. Antonacopoulos; techniques for improving OCR results, A. Dengel; offline handwritten word recognition using hidden Markov models, A. Kundu; combinations of multiple classifier decisions for OCR, L. Lam and C.Y. Suen; classification techniques (statistical pattern recognition, neural networks and their relations), J. Schurmann; cursive handwriting recognition (contextual and context-free techniques), M. Shridhar and Kimura; multilingual document recognition, L. Spitz; information retrieval and OCR, K. Taghva; technical drawing analysis, including vectorization, D. Dori and K. Tombre; reading of music notation, N. Carter and D. Bainbridge; benchmarking, T. Nartker et al.; automatic signature verification, S. Impedovo. (Part contents.)

387 citations


Journal Article
TL;DR: Experimental results prove that the approach using the variable duration outperforms the method using fixed duration in terms of both accuracy and speed.
Abstract: A fast method of handwritten word recognition suitable for real-time applications is presented in this paper. Preprocessing, segmentation and feature extraction are implemented using a chain code representation of the word contour. Dynamic matching between the characters of a lexicon entry and segment(s) of the input word image is used to rank the lexicon entries in order of best match. A variable duration is defined for each character and used during the matching. Experimental results show that our approach using variable duration outperforms the method using fixed duration in terms of both accuracy and speed. The entire recognition process takes about 200 ms on a single SPARC-10 platform, and a recognition accuracy of 96.8 percent is achieved for a lexicon size of 10 on a database of postal words captured at 212 dpi.

286 citations
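The variable-duration matching described above can be sketched as a dynamic program that gives each lexicon character a run of consecutive segments whose length falls within that character's duration range, then ranks lexicon entries by best total score. This is a minimal sketch, not the authors' implementation: `char_score`, the `duration` table and the toy segment labels are hypothetical stand-ins for the paper's chain-code features and per-character durations.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def match_score(word: str, n_segments: int,
                char_score: Callable[[str, int, int], float],
                duration: Dict[str, Tuple[int, int]]) -> float:
    """Best total score for aligning `word` with segments 0..n_segments-1.

    Character c may absorb between duration[c][0] and duration[c][1]
    consecutive segments (its variable duration); every segment must be used.
    char_score(c, i, j) scores c against segments i..j-1 (hypothetical).
    """

    @lru_cache(maxsize=None)
    def best(k: int, i: int) -> float:
        # k characters matched so far, i segments consumed so far
        if k == len(word):
            return 0.0 if i == n_segments else NEG_INF
        lo, hi = duration[word[k]]
        result = NEG_INF
        for j in range(i + lo, min(i + hi, n_segments) + 1):
            rest = best(k + 1, j)
            if rest != NEG_INF:
                result = max(result, char_score(word[k], i, j) + rest)
        return result

    return best(0, 0)


def rank_lexicon(lexicon: Sequence[str], n_segments: int,
                 char_score, duration) -> List[str]:
    """Lexicon entries that admit an alignment, best match first."""
    scored = [(match_score(w, n_segments, char_score, duration), w) for w in lexicon]
    feasible = [(s, w) for s, w in scored if s != NEG_INF]
    return [w for s, w in sorted(feasible, key=lambda t: t[0], reverse=True)]


# Toy input: each segment is labelled with the letter it most resembles,
# and the "a" has been over-segmented into two pieces.
segments = ["c", "a", "a", "t"]


def demo_score(c: str, i: int, j: int) -> int:
    return sum(1 if s == c else -1 for s in segments[i:j])


durations = {ch: (1, 2) for ch in "abcdefghijklmnopqrstuvwxyz"}
print(rank_lexicon(["cot", "at", "cat"], len(segments), demo_score, durations))
# ['cat', 'cot', 'at']
```

Because "a" may span up to two segments, "cat" absorbs the split letter; with a fixed duration of one segment per character, no three-letter entry could consume all four segments.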


Journal Article
01 Feb 1997
TL;DR: An off-line handwritten word recognition system is described in which neural networks assign confidence that pairs of segments are compatible with character confidence assignments, and this confidence is integrated into the dynamic programming.
Abstract: An off-line handwritten word recognition system is described. Images of handwritten words are matched to lexicons of candidate strings. A word image is segmented into primitives. The best match between sequences of unions of primitives and a lexicon string is found using dynamic programming. Neural networks assign match scores between characters and segments. Two distinctive features are that neural networks assign confidence that pairs of segments are compatible with character confidence assignments and that this confidence is integrated into the dynamic programming. Experimental results are provided on data from the U.S. Postal Service.

151 citations
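A rough sketch of dynamic programming over unions of primitives, with a pairwise compatibility term added for each pair of adjacent segments, might look like the following. It is an illustration under stated assumptions, not the published system: `char_conf` and `compat` are hypothetical callables standing in for the trained neural networks, and `max_union` (the largest number of primitives merged into one segment) is an assumed parameter.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, int]  # primitives i..j-1 merged into one segment
NEG_INF = float("-inf")


def best_match(word: str, n_prims: int,
               char_conf: Callable[[str, Segment], float],
               compat: Callable[[Segment, Segment], float],
               max_union: int = 3) -> float:
    """Best score for matching `word` to a sequence of unions of primitives.

    score = sum of char_conf(character, segment) over the word
          + sum of compat(previous_segment, segment) over adjacent segments.
    """
    if not word:
        return 0.0 if n_prims == 0 else NEG_INF
    # table[k] maps the segment holding character k-1 to the best score so far
    table: List[Dict[Segment, float]] = [dict() for _ in range(len(word) + 1)]
    for end in range(1, min(max_union, n_prims) + 1):
        table[1][(0, end)] = char_conf(word[0], (0, end))
    for k in range(1, len(word)):
        for prev, score in table[k].items():
            start = prev[1]
            for end in range(start + 1, min(start + max_union, n_prims) + 1):
                seg = (start, end)
                cand = score + compat(prev, seg) + char_conf(word[k], seg)
                if cand > table[k + 1].get(seg, NEG_INF):
                    table[k + 1][seg] = cand
    finals = [s for seg, s in table[len(word)].items() if seg[1] == n_prims]
    return max(finals, default=NEG_INF)


# Hypothetical scores: "a" fits primitive 0 alone, "b" fits primitives 1-2 merged.
truth = {("a", (0, 1)): 1.0, ("b", (1, 3)): 1.0}


def demo_conf(c: str, seg: Segment) -> float:
    return truth.get((c, seg), 0.0)


def no_compat(prev: Segment, seg: Segment) -> float:
    return 0.0


def penalize_double(prev: Segment, seg: Segment) -> float:
    # pretend the compatibility model distrusts a two-primitive segment here
    return -5.0 if seg[1] - seg[0] == 2 else 0.0


print(best_match("ab", 3, demo_conf, no_compat))        # 2.0
print(best_match("ab", 3, demo_conf, penalize_double))  # 0.0
```

The second call shows the point of folding compatibility into the dynamic program: a low pairwise confidence can overturn a segmentation that the per-character scores alone would prefer.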


Patent
Shinji Wakisaka, Kazuyoshi Ishiwatari, Kouji Ito, Tetsuji Toge, Makoto Tanaka
TL;DR: A speech recognition system comprises a dictionary change-over section that switches between the dictionaries used for speech recognition according to dictionary change-over information, a first memory storing a plurality of dictionaries, a second memory storing the one dictionary currently used for recognition, and a speech recognition section, so that recognition is performed while switching dictionaries as required.
Abstract: A speech recognition system that realizes large-vocabulary speech recognition at low cost without degrading the recognition rate or recognition speed is provided with a dictionary change-over section for switching between the dictionaries used for speech recognition in accordance with dictionary change-over information, a first memory for storing a plurality of dictionaries, a second memory for storing the one dictionary that is the current object of recognition, and a speech recognition section for performing speech recognition processing, whereby speech recognition is performed while switching between dictionaries as required. For example, in a car navigation speech recognition system, the dictionaries are switched for each area in accordance with position information.

83 citations
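The area-based dictionary change-over in the car navigation example can be sketched as follows. This is a hypothetical software model, not the patented apparatus: the two memories become Python containers, `area_of` is an assumed position-to-area lookup, and recognition is reduced to picking the best-scoring hypothesis that belongs to the active dictionary.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Position = Tuple[float, float]


class DictionarySwitcher:
    """Switch the active recognition dictionary by area.

    `dictionaries` models the first memory (all dictionaries); `active`
    models the second memory (the one dictionary currently recognized against).
    """

    def __init__(self, dictionaries: Dict[str, Iterable[str]],
                 area_of: Callable[[Position], str]):
        self.dictionaries = dictionaries
        self.area_of = area_of          # hypothetical position -> area lookup
        self.active_area: Optional[str] = None
        self.active = frozenset()

    def update_position(self, position: Position) -> None:
        area = self.area_of(position)
        if area != self.active_area:    # change over only when the area changes
            self.active_area = area
            self.active = frozenset(self.dictionaries.get(area, ()))

    def recognize(self, hypotheses: List[Tuple[str, float]]) -> Optional[str]:
        """Best-scoring (word, score) hypothesis found in the active dictionary."""
        in_vocab = [(score, word) for word, score in hypotheses if word in self.active]
        return max(in_vocab)[1] if in_vocab else None


switcher = DictionarySwitcher(
    {"area_a": ["city hall", "harbor"], "area_b": ["airport", "museum"]},
    area_of=lambda pos: "area_a" if pos[0] < 0 else "area_b",
)
hyps = [("airport", 0.9), ("harbor", 0.6)]
switcher.update_position((-1.0, 0.0))
print(switcher.recognize(hyps))  # harbor
switcher.update_position((1.0, 0.0))
print(switcher.recognize(hyps))  # airport
```

Keeping only one area's vocabulary active is what lets a small, cheap recognizer cover a large total vocabulary.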


Patent
13 Jan 1997
TL;DR: An improved method and apparatus for an optical character recognition (OCR) system that recognizes a character and produces an indication of the confidence with which the character has been recognized.
Abstract: In an optical character recognition (OCR) system, an improved method and apparatus for recognizing a character and producing an indication of the confidence with which the character has been recognized. The system employs a plurality of different OCR devices, each of which outputs an indicated (or recognized) character along with the device's own determination of how confident it is in the indication. The OCR system uses the data output by each of the different OCR devices, along with other attributes of the indicated character such as the relative accuracy of the particular OCR device indicating it, to select the character recognized by the system and to produce a combined confidence indication of how confident the system is in its recognition.

80 citations
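One simple way to fuse several OCR devices' outputs is a weighted vote. The weighting rule below (device confidence multiplied by the device's historical accuracy, with the winner's share of the total weight as the combined confidence) is an assumption for illustration; the patent abstract does not give a formula.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def combine_ocr(outputs: List[Tuple[str, str, float]],
                device_accuracy: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Fuse (device_id, character, confidence) outputs from several OCR devices.

    Each vote is weighted by the device's own confidence times its historical
    accuracy (assumed rule). Returns the winning character and its share of
    the total weight as the combined confidence.
    """
    support: Dict[str, float] = defaultdict(float)
    for device, char, conf in outputs:
        support[char] += conf * device_accuracy[device]
    total = sum(support.values())
    if total == 0:
        return None, 0.0
    best = max(support, key=support.get)
    return best, support[best] / total


outputs = [("A", "e", 0.9), ("B", "c", 0.8), ("C", "e", 0.6)]
accuracy = {"A": 0.95, "B": 0.99, "C": 0.80}
print(combine_ocr(outputs, accuracy))
```

Here "e" wins with a combined confidence of about 0.63: two devices agree on it, even though device B is the most accurate one and votes for "c".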


Journal Article
TL;DR: In this paper, an approach to on-line handwritten alphanumeric character recognition based on sequential handwriting signals is presented and the issue of reference (or template) set evolution is also addressed.

78 citations


Patent
Tanveer Syeda-Mahmood
29 Sep 1997
TL;DR: In this paper, a method and system of recognizing handwritten words in scanned documents is presented, wherein by processing a document containing handwriting, features for word localization are extracted from handwritten words contained in said document through basis points taken from a single curve of text lines.
Abstract: A method and system for recognizing handwritten words in scanned documents, wherein, by processing a document containing handwriting, features for word localization are extracted from the handwritten words in the document through basis points taken from a single curve of text lines. The method is independent of page orientation, does not assume that the individual lines of handwritten text are parallel, and does not require word regions to be aligned with the text line orientation; intra-word statistics are derived from sample pages rather than from a fixed threshold. The method has applications in digital libraries, handwriting tokenization, document management and OCR systems.

78 citations
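The idea of deriving word-break statistics from sample pages rather than fixing a threshold can be illustrated with a 1-D two-means split of observed gap widths into intra-word and inter-word clusters. This is a swapped-in technique chosen for illustration, not the patent's basis-point method; it also assumes component extents have already been projected onto the local direction of their text line.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (start, end) of a component along its text line


def learn_gap_threshold(sample_gaps: Sequence[float], iterations: int = 20) -> float:
    """Word-break threshold from gaps measured on sample pages, found by
    1-D two-means clustering into intra-word and inter-word gaps."""
    if not sample_gaps:
        raise ValueError("need at least one sample gap")
    gaps = sorted(sample_gaps)
    small, large = gaps[0], gaps[-1]  # initial cluster centres
    for _ in range(iterations):
        threshold = (small + large) / 2
        lo = [g for g in gaps if g <= threshold]
        hi = [g for g in gaps if g > threshold]
        if not lo or not hi:
            break
        small, large = sum(lo) / len(lo), sum(hi) / len(hi)
    return (small + large) / 2


def group_words(boxes: Sequence[Box], threshold: float) -> List[List[Box]]:
    """Group one line's components into words: a gap wider than
    `threshold` starts a new word."""
    ordered = sorted(boxes)
    if not ordered:
        return []
    words = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] - prev[1] > threshold:
            words.append([cur])
        else:
            words[-1].append(cur)
    return words


t = learn_gap_threshold([2, 3, 2, 4, 3, 15, 18, 16, 2, 3])   # about 9.5
print(len(group_words([(0, 10), (12, 20), (40, 50), (53, 60)], t)))  # 2
```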


Proceedings Article
18 Aug 1997
TL;DR: Font recognition and contextual processing are developed as two components that enhance the recognition accuracy of a text recognition system presented in a previous paper.
Abstract: Font recognition and contextual processing are developed as two components that enhance the recognition accuracy of a text recognition system presented in a previous paper (H. Shi and T. Pavlidis, 1996). Font information is extracted from two sources: the global page properties, and the graph matching results of recognized short words such as "a", "it" and "of". Contextual processing is done by first composing word candidates from the recognition results and then checking each candidate against a dictionary through a spelling checker. Positional binary trigrams and word affixes are used to prune the search for word candidates.

64 citations
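Pruning candidate composition with positional binary trigrams can be sketched as follows: a binary table records which letter triples occur at each position triple in dictionary words of each length, and a partial candidate is abandoned as soon as it violates the table. This is one reading of the technique under stated assumptions; the paper's actual tables, affix handling and spelling checker are not reproduced.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Trigram = Tuple[int, int, int, int, str, str, str]  # (length, i, j, k, ci, cj, ck)


def build_positional_trigrams(dictionary: Sequence[str]) -> Set[Trigram]:
    """Binary table: which letter triples occur at positions (i, j, k)
    in some dictionary word of a given length."""
    table: Set[Trigram] = set()
    for word in dictionary:
        for i, j, k in combinations(range(len(word)), 3):
            table.add((len(word), i, j, k, word[i], word[j], word[k]))
    return table


def word_candidates(alternatives: Sequence[Sequence[str]],
                    trigrams: Set[Trigram], dictionary: Sequence[str]) -> List[str]:
    """Compose candidates from per-position recognizer alternatives, pruning
    any prefix that violates a positional trigram, then dictionary-check."""
    n = len(alternatives)
    prefixes = [""]
    for pos, choices in enumerate(alternatives):
        extended = []
        for prefix in prefixes:
            for ch in choices:
                # only the triples whose last position is `pos` are new here
                if all((n, i, j, pos, prefix[i], prefix[j], ch) in trigrams
                       for i, j in combinations(range(pos), 2)):
                    extended.append(prefix + ch)
        prefixes = extended
    words = set(dictionary)
    return [w for w in prefixes if w in words]


lexicon = ["cat", "car", "cot", "dog"]
table = build_positional_trigrams(lexicon)
print(word_candidates([["c", "e"], ["a", "o"], ["t", "r", "l"]], table, lexicon))
# ['cat', 'car', 'cot']
```

For longer words the trigram test is only a necessary condition, which is why the final dictionary check is still needed.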


Patent
Randy G. Goldberg
11 Aug 1997
TL;DR: In this article, a method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition ("OCR") technique is presented.
Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition ("OCR") technique. If an incorrect word is found in the electronic document, the present invention generates at least one reference word and selects the reference word that is the most likely correct replacement for the incorrect word. This selection is accomplished by performing a probabilistic determination that assigns to each reference word a replacement word recognition probability. The probabilistic determination is carried out on the basis of a pre-stored confusion matrix that stores a plurality of probability values. The confusion matrix is used to associate each character of a recognized word in the electronic document with a corresponding character of a word in the original document on the basis of these probability values.

63 citations
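A confusion-matrix replacement probability can be sketched as the product of per-character probabilities P(recognized char | true char), computed in log space. This assumes equal-length words and independent character errors; the probability values in the example are made up for illustration and do not come from the patent.

```python
import math
from typing import Dict, Sequence, Tuple

Confusion = Dict[Tuple[str, str], float]  # (true char, recognized char) -> probability


def replacement_log_prob(recognized: str, reference: str, confusion: Confusion) -> float:
    """log P(recognized | reference), assuming equal lengths and independent
    character confusions; unseen pairs get probability 0."""
    if len(recognized) != len(reference):
        return float("-inf")
    total = 0.0
    for true_ch, seen_ch in zip(reference, recognized):
        p = confusion.get((true_ch, seen_ch), 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def best_replacement(recognized: str, references: Sequence[str], confusion: Confusion) -> str:
    """Reference word with the highest replacement probability."""
    return max(references, key=lambda ref: replacement_log_prob(recognized, ref, confusion))


confusion = {("c", "c"): 0.95, ("a", "a"): 0.95, ("t", "t"): 0.8,
             ("t", "l"): 0.2, ("r", "r"): 0.9, ("r", "l"): 0.05}
print(best_replacement("cal", ["car", "cat"], confusion))  # cat
```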


Proceedings Article
18 Aug 1997
TL;DR: A database of on-line handwritten character patterns sampled in a sequence of sentences without any instructions is presented, describing the characteristics of this database as well as several tools to collect patterns.
Abstract: The paper presents a database of on-line handwritten character patterns sampled from a sequence of sentences written without any instructions. The sentences used to collect the character patterns were taken from newspapers so as to include 1227 frequently appearing character categories; as a result, they comprise about 10000 characters and include 1537 JIS first-level character categories. The remaining 1808 JIS first-level categories were added at the end of this text and written one by one. The same total text was used to collect script patterns from a number of people. Submitted patterns were inspected, and missing or incorrect patterns were rewritten. The authors collected data from 80 people and made the 12000 × 80 patterns available from February 1996. More patterns are being collected. The paper describes the characteristics of this database as well as several tools for collecting patterns.

50 citations


Patent
Randy G. Goldberg
11 Aug 1997
TL;DR: In this paper, a method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition (OCR) technique is presented.
Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition (“OCR”) technique. Each recognized word is generated by first producing, for each character position of the corresponding word in the original document, the N-best characters for occupying that character position. If an incorrect word is found in the electronic document, the present invention generates a plurality of reference words from which one is selected for replacing the incorrect word. This selected reference word is determined by the present invention to be the reference word that is the most likely correct replacement for the incorrect recognized word. This selection is accomplished by computing for each reference word a replacement word value. The reference word that is selected to replace the incorrect recognized word corresponds to the highest replacement word value.
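Selecting a replacement from per-position N-best lists can be sketched as follows. The abstract says only that the reference word with the highest replacement word value is chosen; the additive scoring and the zero score for characters missing from an N-best list are assumptions made for this sketch.

```python
from typing import Optional, Sequence, Tuple

NBest = Sequence[Sequence[Tuple[str, float]]]  # per position: (char, score), best first


def replacement_value(nbest: NBest, reference: str) -> Optional[float]:
    """Sum over positions of the score the reference's character received in
    that position's N-best list (0 if absent); None if lengths differ."""
    if len(nbest) != len(reference):
        return None
    return sum(dict(choices).get(ch, 0.0) for choices, ch in zip(nbest, reference))


def choose_replacement(nbest: NBest, references: Sequence[str]) -> Optional[str]:
    """Reference word with the highest replacement value."""
    scored = [(replacement_value(nbest, r), r) for r in references]
    valid = [(v, r) for v, r in scored if v is not None]
    return max(valid)[1] if valid else None


nbest = [[("h", 0.7), ("b", 0.2)], [("e", 0.6), ("c", 0.3)],
         [("l", 0.5), ("i", 0.4)], [("l", 0.5), ("i", 0.4)], [("o", 0.9)]]
print(choose_replacement(nbest, ["hallo", "hello", "bell"]))  # hello
```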

Journal Article
01 Oct 1997
TL;DR: A new off-line word recognition system that is able to recognize unconstrained handwritten words using grey-scale images based on structural and relational information in the handwritten word is presented.
Abstract: In this paper, we present a new off-line word recognition system that is able to recognize unconstrained handwritten words using grey-scale images. This is based on structural and relational information in the handwritten word. We use Gabor filters to extract features from the words, and then use an evidence-based approach for word classification. A solution to the Gabor filter parameter estimation problem is given, enabling the Gabor filter to be automatically tuned to the word image properties. We also developed two new methods for correcting the slope of the handwritten words. Our experiments show that the proposed method achieves good recognition rates compared to standard classification methods.
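For reference, the real part of a standard 2-D Gabor filter can be sampled as follows. This is the common textbook parameterization (Gaussian envelope times an oriented cosine), not the paper's automatic tuning procedure; here `sigma`, `theta`, `wavelength`, `gamma` and `psi` are simply chosen by hand.

```python
import math
from typing import List


def gabor_kernel(size: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2-D Gabor filter on a size x size grid (size odd).

    sigma: width of the Gaussian envelope; theta: orientation in radians;
    wavelength: period of the cosine; gamma: spatial aspect ratio;
    psi: phase offset.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates so the cosine varies along direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


k = gabor_kernel(7, sigma=2.0, theta=0.0, wavelength=4.0)
print(round(k[3][3], 6))  # 1.0 at the centre
```

Convolving a word image with a bank of such kernels at several orientations gives orientation-selective responses that can serve as features.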


Proceedings Article
18 Aug 1997
TL;DR: Experimental results show that when the characters are segmented from words and are randomly presented, the accuracy of the machine recognition is comparable with the average human recognition accuracy.
Abstract: Handwritten character recognition by human readers, a statistical classifier, and a neural network is compared in order to determine the accuracy required for handwritten word recognition. Sample characters extracted from postal address words on mail pieces collected by the USPS were used to evaluate human and machine performance. Experimental results show that: 1) when the characters are segmented from words and presented in random order, the accuracy of machine recognition is comparable to the average human recognition accuracy; 2) the neural network employing a feature vector of size 64 outperforms the statistical classifier employing the same feature vector; and 3) the statistical classifier employing a feature vector of size 400 achieves a recognition rate comparable to that of the best human reader.

Proceedings Article
18 Aug 1997
TL;DR: A method for the recovery of the stroke order from static handwritten images is presented, tested by classifying the words of an off-line database with a state-of-the-art on-line recognition system.
Abstract: On-line recognition differs from off-line recognition in that additional information about the drawing order of the strokes is available. This temporal information makes it easier to recognize handwritten texts with an on-line recognition system. In this paper we present a method for the recovery of the stroke order from static handwritten images. The algorithm was tested by classifying the words of an off-line database with a state-of-the-art on-line recognition system. On this database with 150 different words, written by four cooperative writers, a recognition rate of 97.4% was obtained.

Journal Article
TL;DR: Two methods of combining character recognition with techniques for retrieving Japanese documents are presented and it is shown how these methods can be applied to textual image retrieval.

Patent
18 Feb 1997
TL;DR: For a plurality of handwritten characters extracted from an input image, a character category is first determined for each character by a character recognition process; a clustering process then determines the similarity of character forms among the extracted characters, and the category results from the first character recognition process are modified on that basis.
Abstract: For a plurality of handwritten characters extracted from an input image, a character category for each character is first determined by a character recognition process. Second, according to a clustering process, similarity levels of character-forms among extracted characters are determined, and based on the determination result, the character category determination result from the first character recognition process is modified.
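The two-stage idea (recognize each character, then correct labels using form similarity among characters on the same page) can be sketched with single-linkage clustering and a majority vote per cluster. Both the Euclidean threshold linkage and the majority rule are assumptions for illustration; the abstract does not specify how the first results are modified.

```python
from collections import Counter
from typing import List, Sequence


def refine_by_clustering(features: Sequence[Sequence[float]], labels: Sequence[str],
                         threshold: float) -> List[str]:
    """Link characters whose feature vectors lie within `threshold`
    (Euclidean, single linkage) and relabel each cluster by majority vote."""
    n = len(features)
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            dist = sum((p - q) ** 2 for p, q in zip(features[i], features[j])) ** 0.5
            if dist <= threshold:
                parent[find(i)] = find(j)

    members = {}
    for i in range(n):
        members.setdefault(find(i), []).append(i)
    refined = list(labels)
    for idx in members.values():
        majority = Counter(labels[i] for i in idx).most_common(1)[0][0]
        for i in idx:
            refined[i] = majority
    return refined


feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
first_pass = ["a", "a", "o", "b", "b"]
print(refine_by_clustering(feats, first_pass, 0.5))  # ['a', 'a', 'a', 'b', 'b']
```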

Proceedings Article
18 Aug 1997
TL;DR: A Bayesian method of isolating character bitmaps from paragraph-length samples of heavily degraded text images is demonstrated and is sufficiently robust to tolerate errors in transcripts obtained from multifont commercial OCR software.
Abstract: A Bayesian method of isolating character bitmaps from paragraph-length samples of heavily degraded text images is demonstrated. The method requires a transcript of the text, but it is sufficiently robust to tolerate errors in transcripts obtained from multifont commercial OCR software. The resulting prototypes (labeled character images) are used to recognize additional text on the same document.

Proceedings Article
01 Jan 1997
TL;DR: This paper discusses models of confusion which may be used in the identification of confused words, shows how significant contexts may be identified and condensed into Differential Grammars, and compares the performance of the implementation with two commercial checkers which purport to handle the confused word problem.
Abstract: We examine the Differential Grammar, a representation designed to discriminate which of a set of confusable alternatives is most likely in the context it occurs in. This approach is useful wherever uncertainty may exist about the identity of a token or sequence of tokens, including in speech recognition, optical character recognition and machine translation. In this paper our application is word processing: we discuss multiple models of confusion which may be used in the identification of confused words, we show how significant contexts may be identified and condensed into Differential Grammars, and we contrast the performance of our implementation with that of two commercial grammar checkers which purport to handle the confused word problem.
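To make the confused-word problem concrete, the sketch below picks among confusable alternatives ("their"/"there") from the surrounding context using an add-one-smoothed naive Bayes model over nearby words. This is a simple baseline swapped in for illustration; it is not the Differential Grammar representation, and the tiny training corpus is made up.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def context_of(tokens: Sequence[str], pos: int, window: int) -> List[str]:
    """Tokens within `window` positions of tokens[pos], excluding it."""
    return list(tokens[max(0, pos - window):pos]) + list(tokens[pos + 1:pos + 1 + window])


def train(sentences: Sequence[Sequence[str]], confusion_set: Set[str],
          window: int = 2) -> Tuple[Dict[str, Counter], Counter]:
    """Count context words seen around each member of the confusion set."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for tokens in sentences:
        for pos, tok in enumerate(tokens):
            if tok in confusion_set:
                totals[tok] += 1
                counts[tok].update(context_of(tokens, pos, window))
    return counts, totals


def choose(tokens: Sequence[str], pos: int, confusion_set: Set[str],
           counts: Dict[str, Counter], totals: Counter, window: int = 2) -> str:
    """Alternative with the highest add-one-smoothed naive Bayes score
    for the context around tokens[pos]."""
    context = context_of(tokens, pos, window)
    vocab = {w for c in counts.values() for w in c} | set(context)

    def score(alt: str) -> float:
        seen = counts.get(alt, Counter())
        denom = sum(seen.values()) + len(vocab)
        return math.log(totals[alt] + 1) + sum(
            math.log((seen[w] + 1) / denom) for w in context)

    return max(sorted(confusion_set), key=score)


corpus = [s.split() for s in ["i went to their house", "they left their car",
                              "put it over there now", "go there now"]]
pair = {"their", "there"}
counts, totals = train(corpus, pair)
print(choose("sit over there please".split(), 2, pair, counts, totals))   # there
print(choose("we went to there house".split(), 3, pair, counts, totals))  # their
```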

Proceedings Article
18 Aug 1997
TL;DR: New moment features for Chinese character recognition are proposed that provide significant improvements in terms of Chinese character Recognition, especially for those characters that are very close in shapes.
Abstract: Moment descriptors have been developed as features in pattern recognition since the moment method was first introduced. In this paper, new moment features for Chinese character recognition are proposed. These provide significant improvements in Chinese character recognition, especially for characters that are very similar in shape.
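As background for moment features, the classical normalized central moments of a binary character image can be computed as below. These are the standard geometric moments, not the new moment features proposed in the paper. They are invariant to translation and, approximately for sampled images, to scale.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # rows of 0/1 pixels


def central_moment(image: Image, p: int, q: int) -> float:
    """mu_pq of the ink pixels, taken about the centroid."""
    pts = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("image contains no ink pixels")
    xc = sum(x for x, _ in pts) / len(pts)
    yc = sum(y for _, y in pts) / len(pts)
    return sum((x - xc) ** p * (y - yc) ** q for x, y in pts)


def normalized_moment(image: Image, p: int, q: int) -> float:
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2)."""
    mu00 = central_moment(image, 0, 0)
    return central_moment(image, p, q) / mu00 ** (1 + (p + q) / 2)


def moment_features(image: Image, order: int = 3) -> List[float]:
    """All normalized central moments with 2 <= p + q <= order."""
    return [normalized_moment(image, p, q)
            for p in range(order + 1) for q in range(order + 1 - p)
            if p + q >= 2]


img = [[0, 1, 1, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0]]
shifted = [[0, 0, 0, 0, 0],
           [0, 0, 1, 1, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0]]
print(moment_features(img) == moment_features(shifted))  # translation does not change the features
```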


Book Chapter
08 Oct 1997
TL;DR: The objective of this work is the application of the spatio-temporal multilayer perceptron (ST-MLP) developed in the laboratory to the recognition of on-line handwritten characters.
Abstract: The objective of this work is the application of the spatio-temporal multilayer perceptron (ST-MLP) developed in our laboratory to the recognition of on-line handwritten characters. The ST-MLP integrates a spatio-temporal data coding defined in the complex domain. Starting from the stroke of a character produced by a digitizing tablet, we conduct the recognition process in two steps. This procedure, which is classic in this domain, consists of a preprocessing step and a recognition step. The first step (segmentation step) identifies some elementary (basic) lines, called primitives, from the stroke of the character. Then we utilise the ST-MLP to recognize the traced character from the primitives provided.

Journal Article
TL;DR: A new method for clustering the words in a dictionary into word groups so that a Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy.

Proceedings Article
18 Aug 1997
TL;DR: The issues of determining the upper and lower contours of the word, the significant local extrema on the contour, and the reference lines from contour representations of handwritten words are discussed.
Abstract: The one-dimensional nature of contour representations presents interesting challenges for processing of images for handwritten word recognition. In this paper, we discuss the issues of determination of upper and lower contours of the word, determination of significant local extrema on the contour, and determination of reference lines from contour representations of handwritten words.
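A much-simplified bitmap version of the three steps discussed above might look like this: column-wise upper and lower profiles stand in for the contours, strict local extrema mark ascender peaks and descender valleys, and medians give rough reference lines. This is an assumption-laden sketch; the paper works on chain-code contour representations, and the median heuristic is chosen here only because ascenders and descenders usually affect few columns.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[Sequence[int]]   # rows of 0/1 pixels, row 0 at the top
Profile = Dict[int, int]          # column -> row


def contour_profiles(image: Image) -> Tuple[Profile, Profile]:
    """Upper and lower profile: topmost and bottommost ink row per column."""
    upper: Profile = {}
    lower: Profile = {}
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                upper.setdefault(x, y)  # first ink pixel seen is the topmost
                lower[x] = y            # last ink pixel seen is the bottommost
    return upper, lower


def strict_extrema(profile: Profile, beats: Callable[[int, int], bool]) -> List[int]:
    """Columns whose value beats both neighbouring inked columns' values."""
    xs = sorted(profile)
    return [b for a, b, c in zip(xs, xs[1:], xs[2:])
            if beats(profile[b], profile[a]) and beats(profile[b], profile[c])]


def reference_lines(image: Image) -> Tuple[int, int]:
    """Median upper and lower profile rows as rough upper-baseline and
    baseline estimates."""
    upper, lower = contour_profiles(image)
    top, bottom = sorted(upper.values()), sorted(lower.values())
    return top[len(top) // 2], bottom[len(bottom) // 2]


word = [[0, 1, 0, 0, 0, 0],   # ascender in column 1
        [0, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [1, 0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 0]]   # descender in column 4
upper, lower = contour_profiles(word)
print(strict_extrema(upper, lambda a, b: a < b))  # [1]
print(strict_extrema(lower, lambda a, b: a > b))  # [4]
print(reference_lines(word))                      # (2, 4)
```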

Patent
Satomi Sakai, Kengo Osawa
16 Jan 1997
TL;DR: In an area of a tablet where a handwritten character is written for entry, the recognition result replaces the handwritten character written there; at the same time, the recognition result is also displayed in a display field that can show more characters at one time than the handwritten character entry area.
Abstract: In an area of a tablet where a handwritten character is written for entry, the result of the recognition of the handwritten character is displayed by replacing the handwritten character written therein; at the same time, the recognition result is also displayed in a display field that can display more characters than can be shown in the handwritten character entry area at one time.

Proceedings Article
18 Aug 1997
TL;DR: A new method to extract crossing line features for off-line handwritten Chinese character recognition is proposed, in which the input pattern is nonlinearly normalized in order to compensate for shape variations.
Abstract: A new method to extract crossing line features for off-line handwritten Chinese character recognition is proposed in this paper. Firstly, the input pattern is nonlinearly normalized in order to compensate for shape variations. Secondly, the normalized pattern is separated into four subpatterns according to the four kinds of elementary strokes. Thirdly, the four subpatterns are each uniformly divided into M × M cells. In every cell, the crossing lines are counted. Then a 4M²-dimensional feature vector is generated. An off-line handwritten Chinese character recognition system is built based on this feature. Our experiments have demonstrated the effectiveness of the method proposed in this paper.
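The feature layout described above (four stroke subpatterns, each split into M × M cells, one count per cell, giving 4M² values) can be sketched as follows. The crossing count used here, background-to-ink transitions along horizontal scan lines within each cell, is an assumed interpretation; the paper may scan in other directions per stroke type, and the stroke separation and nonlinear normalization are taken as already done.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # rows of 0/1 pixels


def crossing_features(subpatterns: Sequence[Image], m: int) -> List[int]:
    """4 * m * m vector: for each subpattern and each cell of an m x m grid,
    the number of 0 -> 1 transitions along horizontal scan lines."""
    features = []
    for img in subpatterns:
        h, w = len(img), len(img[0])
        for ci in range(m):
            for cj in range(m):
                y0, y1 = ci * h // m, (ci + 1) * h // m
                x0, x1 = cj * w // m, (cj + 1) * w // m
                count = 0
                for y in range(y0, y1):
                    prev = 0
                    for x in range(x0, x1):
                        if img[y][x] and not prev:
                            count += 1  # entered a stroke on this scan line
                        prev = img[y][x]
                features.append(count)
    return features


blank = [[0] * 4 for _ in range(4)]
vertical = [[1, 0, 1, 0]] + [[0] * 4 for _ in range(3)]  # two stroke pieces in row 0
vec = crossing_features([vertical, blank, blank, blank], m=2)
print(len(vec), vec[:4])  # 16 [1, 1, 0, 0]
```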

Proceedings Article
18 Aug 1997
TL;DR: In this paper, a character decomposition approach based on deformable templates (DTs) has been used to extract radical sub-images from Chinese characters and feed the extracted radical images to an adopted structural based Chinese character recognizer whose outputs are then combined to produce the class label of the input character.
Abstract: Despite the fact that Chinese characters are composed of radicals and that Chinese people usually formulate their knowledge of Chinese characters as a combination of radicals, very few studies have focused on a character decomposition approach to recognition, i.e., recognizing a character by first extracting and recognizing its radicals. Such an approach is adopted and the problem of how to extract radical sub-images from character images is particularly addressed. A radical extraction algorithm based on deformable templates (DTs) has been developed. The advantage of the character decomposition approach is demonstrated by feeding the extracted radical images to an adopted structural based Chinese character recognizer whose outputs are then combined to produce the class label of the input character. Simulation results show that the performance of the adopted Chinese character recognition system can be improved significantly when the character decomposition approach is used.


Proceedings Article
12 Oct 1997
TL;DR: A method for the recognition of handwritten Hindi numerals is proposed based on structural descriptors of numeral shapes; early experimental results suggest that the system tolerates a high variability of numeral shapes.
Abstract: A method for the recognition of handwritten Hindi numerals is proposed based on structural descriptors of numeral shapes. The method consists of three major steps: 1) preprocessing, in which a handwritten numeral is scanned, normalized, and then thinned; 2) segmentation of the scanned numeral image into stroke(s) by a robust algorithm based on feature points; and 3) identification of cavity features. The output of this algorithm is a syntactic representation (that is, one or more syntactic terms) of the scanned numeral. Finally, the syntactic representation is matched against a set of syntactic representation prototypes of handwritten numerals, and the recognition result is reported. Early experimental results are encouraging and demonstrate that the proposed system tolerates a high variability of numeral shapes.
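The final matching step, comparing a numeral's syntactic representation with a set of prototypes, can be sketched as nearest-prototype classification under edit distance over term sequences. The abstract does not say whether matching is exact or approximate, so the edit-distance criterion is an assumption, and the term names and prototype labels below are hypothetical placeholders rather than real descriptors of Hindi numerals.

```python
from typing import Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of syntactic terms."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def classify(terms: Sequence[str],
             prototypes: Sequence[Tuple[str, Sequence[str]]]) -> str:
    """Label of the prototype whose term sequence is closest to `terms`."""
    return min(prototypes, key=lambda p: edit_distance(terms, p[1]))[0]


# Hypothetical prototypes: (numeral label, sequence of placeholder terms).
prototypes = [("1", ("vline",)),
              ("2", ("curve", "hline")),
              ("3", ("curve", "curve", "hline"))]
print(classify(("curve", "hline", "hook"), prototypes))  # 2
```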

Journal Article
TL;DR: Experimental results demonstrate the ability of the proposed algorithm to correctly recognize words in the presence of noise that could not be overcome by conventional character recognition or post-processing algorithms.