
Showing papers in "International Journal on Document Analysis and Recognition in 1999"


Journal ArticleDOI
TL;DR: This work mainly deals with the various methods that were proposed to realize the core of recognition in a word recognition system, and classifies the field into three categories: segmentation-free methods, which compare a sequence of observations derived from a word image with similar references of words in the lexicon; segmentation-based methods, which look for the best match between consecutive sequences of primitive segments and letters of a possible word; and perception-oriented methods, which perform a human-like reading technique in which anchor features found all over the word bootstrap a few candidates for a final evaluation phase.
Abstract: We review the field of offline cursive word recognition. We mainly deal with the various methods that were proposed to realize the core of recognition in a word recognition system. These methods are discussed in view of the two most important properties of such a system: the size and nature of the lexicon involved, and whether or not a segmentation stage is present. We classify the field into three categories: segmentation-free methods, which compare a sequence of observations derived from a word image with similar references of words in the lexicon; segmentation-based methods, which look for the best match between consecutive sequences of primitive segments and letters of a possible word; and the perception-oriented approach, which covers methods that perform a human-like reading technique, in which anchor features found all over the word are used to bootstrap a few candidates for a final evaluation phase.

184 citations


Journal ArticleDOI
TL;DR: Key ideas employed in each functional module, which have been developed for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper.
Abstract: This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using that representation; (ii) line separation, concerning text line detection and extraction of images of text lines from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a text line image in an efficient and intelligent manner; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed to deal with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy.

143 citations


Journal ArticleDOI
TL;DR: In this article, an automatic method for finding acronyms and their definitions in free text is introduced based on an inexact pattern matching algorithm applied to text surrounding the possible acronym.
Abstract: This paper introduces an automatic method for finding acronyms and their definitions in free text. The method is based on an inexact pattern matching algorithm applied to text surrounding the possible acronym. Evaluation shows both high recall and precision for a set of documents randomly selected from a larger set of full text documents.
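
To make the idea concrete, here is a minimal sketch of acronym/definition pairing in this spirit: candidate acronyms are uppercase tokens, and the preceding words are searched for a phrase whose initial letters loosely contain the acronym's letters. The matching rule below is a crude stand-in for the paper's inexact pattern matching algorithm, not the authors' implementation.

```python
import re

def find_acronym_definitions(text, window=8):
    """Toy acronym/definition finder: for each candidate acronym, scan the
    preceding `window` words for a phrase whose initials contain the acronym's
    letters in order (a loose stand-in for inexact pattern matching)."""
    words = text.split()
    results = {}
    for i, w in enumerate(words):
        token = w.strip("().,;")
        if not re.fullmatch(r"[A-Z]{2,}", token):
            continue
        for start in range(max(0, i - window), i):
            phrase = words[start:i]
            if not phrase or phrase[0][0].upper() != token[0]:
                continue                       # definition should start on the first letter
            initials = "".join(p[0].upper() for p in phrase if p[0].isalpha())
            it = iter(initials)
            if all(ch in it for ch in token):  # acronym letters appear in order
                results[token] = " ".join(phrase)
                break
    return results

print(find_acronym_definitions(
    "The hidden Markov model (HMM) and optical character recognition (OCR) are common."))
```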

116 citations


Journal ArticleDOI
TL;DR: A system for automatically identifying the script used in a handwritten document image is described, developed using a 496-document dataset representing six scripts, eight languages, and 279 writers.
Abstract: A system for automatically identifying the script used in a handwritten document image is described. The system was developed using a 496-document dataset representing six scripts, eight languages, and 279 writers. Documents were characterized by the mean, standard deviation, and skew of five connected component features. A linear discriminant analysis was used to classify new documents, and tested using writer-sensitive cross-validation. Classification accuracy averaged 88% across the six scripts. The same method, applied within the Roman subcorpus, discriminated English and German documents with 85% accuracy.
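
As a rough illustration of the feature/classifier combination described (per-document connected component features summarized by their mean, standard deviation, and skew, then classified with linear discriminant analysis), the sketch below uses synthetic component features; it is not the authors' code or data.

```python
# Hedged sketch: summarize per-component features into a 15-dim document
# descriptor (mean, std, skew of 5 features) and classify with LDA.
import numpy as np
from scipy.stats import skew
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def document_vector(component_features):
    """component_features: (n_components, 5) array of per-component measures."""
    return np.concatenate([component_features.mean(axis=0),
                           component_features.std(axis=0),
                           skew(component_features, axis=0)])

rng = np.random.default_rng(0)
# Fake "documents": 40 per class, each with ~200 connected components.
X = np.array([document_vector(rng.normal(loc=c, size=(200, 5)))
              for c in (0.0, 1.0) for _ in range(40)])
y = np.repeat([0, 1], 40)

clf = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", clf.score(X, y))
```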

103 citations


Journal ArticleDOI
TL;DR: A useful method for assessing the quality of a typewritten document image and automatically selecting an optimal restoration method based on that assessment, which reduced the OCR character error rate on a 139-document corpus from 20.27% to 12.60%.
Abstract: We present a useful method for assessing the quality of a typewritten document image and automatically selecting an optimal restoration method based on that assessment. We use five quality measures that assess the severity of background speckle, touching characters, and broken characters. A linear classifier uses these measures to select a restoration method. On a 139-document corpus, our methodology reduced the corpus OCR character error rate from 20.27% to 12.60%.
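
A minimal sketch of the overall pattern (compute simple quality measures from a binarized page, then let a linear rule pick a restoration step) might look as follows; the specific measures, weights, and threshold are assumptions for illustration, not the measures or classifier used in the paper.

```python
# Illustrative only: crude speckle/fragmentation measures plus a linear rule.
import numpy as np
from scipy import ndimage

def quality_measures(binary_img):
    labels, n = ndimage.label(binary_img)
    sizes = ndimage.sum(binary_img, labels, range(1, n + 1))
    speckle = np.mean(sizes < 5) if n else 0.0        # fraction of tiny blobs
    fragmentation = n / max(binary_img.sum(), 1)      # blobs per ink pixel
    return np.array([speckle, fragmentation])

def choose_restoration(measures, w=np.array([1.0, 5.0]), threshold=0.6):
    # Stand-in for a linear classifier over the quality measures.
    return "despeckle / close gaps" if measures @ w > threshold else "no restoration"

page = (np.random.default_rng(1).random((200, 200)) > 0.995).astype(int)  # noisy page
print(choose_restoration(quality_measures(page)))
```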

75 citations


Journal ArticleDOI
TL;DR: A word recognition process with high tolerance for poor image quality, tunability to the lexical content of the documents to which it is applied, and high speed of operation; recognition performance is shown to be enhanced by the application of an appropriate lexicon.
Abstract: We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font, face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection and specificity of the lexicon.
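
The character-shape-code idea can be illustrated with a toy coder that collapses each letter into a coarse shape class, so that a lexicon can be indexed by word shape rather than exact characters; the particular code alphabet below is an assumption for illustration and not the coding used in the paper.

```python
# Toy shape coder: ascender, descender, and x-height classes (assumed alphabet).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape_code(word):
    codes = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            codes.append("A")          # tall glyphs: capitals, digits, ascenders
        elif ch in DESCENDERS:
            codes.append("D")          # glyphs reaching below the baseline
        elif ch.isalpha():
            codes.append("x")          # x-height glyphs
        else:
            codes.append(ch)           # punctuation passes through
    return "".join(codes)

# Words that collide under the shape code must be disambiguated later,
# e.g. by template matching against exemplars from the surrounding text.
print(shape_code("recognition"))       # -> "xxxxDxxAxxx"
```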

57 citations


Journal ArticleDOI
Guy Lorette1
TL;DR: This paper proposes the use of handwriting invariants, a physical model for a first segmentation, a logical model for segmentation and recognition, a fundamental equation of handwriting, and the integration of several sources of perception and knowledge in order to design Handwriting Reading Systems (HRS) that would be more universal than current systems.
Abstract: During the last forty years, Human Handwriting Processing (HHP) has most often been investigated under the frameworks of character recognition (OCR) and pattern recognition. In recent years considerable progress has been made, and to date HHP can be viewed much more as an automatic Handwriting Reading (HR) task for the machine. In this paper we propose the use of handwriting invariants, a physical model for a first segmentation, a logical model for segmentation and recognition, a fundamental equation of handwriting, and the integration of several sources of perception and of knowledge in order to design Handwriting Reading Systems (HRS), which would be more universal systems than is currently the case. At the dawn of the 3rd millennium, we expect that HHP will be considered more as a perceptual and interpretation task requiring knowledge gained from studies on human language. This paper gives some guidelines and presents examples for designing systems able to perceive and interpret, i.e., read, handwriting automatically.

56 citations


Journal ArticleDOI
TL;DR: Tests on synthesized data examine QNN's fuzzy decision boundary to illustrate its mechanism and characteristics, while studies on real data demonstrate its potential as a handwritten numeral classifier and the special role it plays in multi-expert systems.
Abstract: This paper describes a new kind of neural network – Quantum Neural Network (QNN) – and its application to the recognition of handwritten numerals. QNN combines the advantages of neural modelling and fuzzy theoretic principles. Novel experiments have been designed for in-depth studies of applying the QNN to both real data and confusing images synthesized by morphing. Tests on synthesized data examine QNN's fuzzy decision boundary with the intention to illustrate its mechanism and characteristics, while studies on real data prove its great potential as a handwritten numeral classifier and the special role it plays in multi-expert systems. An effective decision-fusion system is proposed and a high reliability of 99.10% has been achieved.

49 citations


Journal ArticleDOI
TL;DR: A method is proposed to uncover more detailed information about geometrical features which human readers use in the reading of Western script, and high hit rates on ascenders, descenders, crossings, and points of high curvature in the handwriting pattern are confirmed.
Abstract: This paper first summarizes a number of findings in the human reading of handwriting. A method is proposed to uncover more detailed information about the geometrical features which human readers use in the reading of Western script. The results of an earlier experiment on the use of ascender/descender features were used for a second experiment aimed at more detailed features within words. A convenient experimental setup was developed, based on image enhancement by local mouse clicks under time pressure. The readers had to develop a cost-effective strategy to identify the letters in the word. Results revealed a left-to-right strategy over time, with extra attention to the initial (leftmost) and final (rightmost) parts of words across a range of word lengths. The results confirm high hit rates on ascenders, descenders, crossings, and points of high curvature in the handwriting pattern.

42 citations


Journal ArticleDOI
TL;DR: A goal-directed evaluation of the extraction approaches is proposed, and both qualitative and quantitative analyses show noticeable advantages of the proposed approach over the existing approaches.
Abstract: This paper presents a technique for extracting the user-entered information from bankcheck images based on a layout-driven item extraction method. The baselines of checks are detected and eliminated by using gray-level mathematical morphology. A priori information about the positions of data is integrated into a combination of top-down and bottom-up analyses of check images. The handwritten information is extracted by a local thresholding technique and the information lost during baseline elimination is restored by mathematical morphology with dynamic kernels. A goal-directed evaluation of the extraction approaches is proposed, and both qualitative and quantitative analyses show noticeable advantages of the proposed approach over the existing approaches.
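
As a rough sketch of the baseline-removal step, long horizontal structures can be located by morphological opening and removed; the paper works on gray-level images with dynamic kernels, whereas this toy version uses binary morphology, a fixed kernel, and a simple closing as the restoration step.

```python
# Simplified baseline removal on a binarized check image (illustrative only).
import numpy as np
from scipy import ndimage

def remove_baselines(ink, line_len=51):
    """ink: 2-D binary array, 1 = dark pixel."""
    horiz = np.ones((1, line_len), dtype=bool)
    baselines = ndimage.binary_opening(ink.astype(bool), structure=horiz)
    cleaned = ink.astype(bool) & ~baselines
    # Crude stand-in for the paper's dynamic-kernel restoration: close small
    # vertical gaps left where handwriting crossed a removed baseline.
    restored = ndimage.binary_closing(cleaned, structure=np.ones((3, 1), dtype=bool))
    return restored.astype(np.uint8)

demo = np.zeros((40, 200), dtype=np.uint8)
demo[20, :] = 1                                   # a printed baseline
demo[5:35, 100] = 1                               # a handwritten stroke crossing it
print(remove_baselines(demo)[5:35, 100].all())    # True: the crossing stroke survives
```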

37 citations


Journal ArticleDOI
TL;DR: A new method called Extended Linear Segment Linking (ELSL) is proposed that is able to extract text lines in arbitrary orientations as well as curved text lines, producing text line candidates for multiple orientations.
Abstract: In order to enhance the ability of document analysis systems, we need a text line extraction method which can handle not only straight text lines but also text lines in various shapes. This paper proposes a new method called Extended Linear Segment Linking (ELSL for short), which is able to extract text lines in arbitrary orientations and curved text lines. We also consider the existence of both horizontally and vertically printed text lines on the same page. The new method can produce text line candidates for multiple orientations. We also verify the ability of the method through experiments.

Journal ArticleDOI
TL;DR: This work presents a more generic solution for skew estimation based on determination of the first eigenvector of the data covariance matrix; the solution comprises image resolution reduction, connected component analysis, component classification using a fuzzy approach, and skew estimation.
Abstract: The existing skew estimation techniques usually assume that the input image is of high resolution and that the detectable angle range is limited. We present a more generic solution for this task that overcomes these restrictions. Our method is based on determination of the first eigenvector of the data covariance matrix. The solution comprises image resolution reduction, connected component analysis, component classification using a fuzzy approach, and skew estimation. Experiments on a large set of various document images and performance comparison with two Hough transform-based methods show a good accuracy and robustness for our method.
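
The core of such an estimator can be sketched in a few lines: collect foreground coordinates, take the first eigenvector of their covariance matrix, and read off the angle. The resolution reduction, connected component analysis, and fuzzy component classification described in the paper are omitted here, so this is only a minimal illustration of the principle.

```python
import numpy as np

def estimate_skew_degrees(binary_img):
    """Skew angle from the first eigenvector of the coordinate covariance matrix."""
    ys, xs = np.nonzero(binary_img)
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, np.argmax(eigvals)]            # first (dominant) eigenvector
    angle = np.degrees(np.arctan2(v[1], v[0]))
    return (angle + 90.0) % 180.0 - 90.0          # fold into [-90, 90)

# Synthetic test: a thin horizontal band rotated by about 5 degrees.
img = np.zeros((400, 400), dtype=np.uint8)
xs = np.arange(50, 350)
ys = (200 + np.tan(np.radians(5.0)) * (xs - 200)).astype(int)
for dy in range(-3, 4):
    img[ys + dy, xs] = 1
print(round(estimate_skew_degrees(img), 1))       # approximately 5.0
```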

Journal ArticleDOI
TL;DR: A two-dimensional stochastic method for the recognition of unconstrained handwritten words in a small lexicon based on an efficient combination of hidden Markov models and causal Markov random fields that operates in a holistic manner, at the pixel level, on scaled binary word images.
Abstract: In this paper we present a two-dimensional stochastic method for the recognition of unconstrained handwritten words in a small lexicon. The method is based on an efficient combination of hidden Markov models (HMMs) and causal Markov random fields (MRFs). It operates in a holistic manner, at the pixel level, on scaled binary word images which are assumed to be random field realizations. The state-related random fields act as smooth local estimators of specific writing strokes by merging conditional pixel probabilities along the columns of the image. The HMM component of our model provides an optimal switching mechanism between sets of MRF distributions in order to dynamically adapt to the features encountered during the left-to-right image scan. Experiments performed on a French omni-scriptor, omni-bank database of handwritten legal check amounts provided by the A2iA company are described in great detail.


Journal ArticleDOI
TL;DR: This paper studies the different hypertext structures one encounters in a document, presents methods for analyzing paper documents to find these structures, and uses the structures as the basis for presenting the content of the document to the user.
Abstract: When archives of paper documents are to be accessed via the Internet, the implicit hypertext structure of the original documents should be employed. In this paper we study the different hypertext structures one encounters in a document. Methods for analyzing paper documents to find these structures are presented. The structures also form the basis for the presentation of the content of the document to the user. Results are presented.

Journal ArticleDOI
TL;DR: Two methods for stroke segmentation from a global point of view are presented and compared, one based on thinning and the other based on contour curve fitting, and experimental results are shown for some difficult cases.
Abstract: Two methods for stroke segmentation from a global point of view are presented and compared. One is based on thinning methods and the other is based on contour curve fitting. For both cases an input image is binarized. For the former, Hilditch's method is used, then crossing points are sought, around which a domain is constructed. Outside the domain, a set of line segments is identified. These lines are connected and approximated by cubic B-spline curves. Smoothly connected lines are selected as segmented curves. This method works well for a limited class of crossing lines, which is shown experimentally. For the latter, a contour line is approximated by a cubic B-spline curve, along which curvature is measured. The contour line is segmented at the extreme points of the curvature graph, and the line segments are obtained on that basis. Experimental results are shown for some difficult cases.
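
A minimal sketch of the contour-fitting variant follows, using a synthetic closed contour in place of one traced from a binarized character: a cubic B-spline is fit to the contour, curvature is computed along it, and curvature extrema serve as candidate segmentation points.

```python
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.signal import argrelextrema

# Synthetic closed contour: a rounded square has high curvature at its corners.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
x = np.sign(np.cos(t)) * np.abs(np.cos(t)) ** 0.4
y = np.sign(np.sin(t)) * np.abs(np.sin(t)) ** 0.4

tck, _ = splprep([x, y], s=0.001, per=True)        # periodic cubic B-spline fit
uu = np.linspace(0, 1, 400)
dx, dy = splev(uu, tck, der=1)
ddx, ddy = splev(uu, tck, der=2)
curvature = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# Candidate segmentation points = local maxima of curvature (the four corners).
peaks = argrelextrema(curvature, np.greater, order=20)[0]
print(len(peaks), "curvature extrema found")
```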

Journal ArticleDOI
TL;DR: A novel method for extracting text from document pages of mixed content by detecting pieces of text lines in small overlapping columns of width \(w'\), shifted with respect to each other by \(\epsilon < w'\) image elements, and merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines.
Abstract: This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting pieces of text lines in small overlapping columns of width \(w^{'}\), shifted with respect to each other by \(\epsilon < w^{'}\) image elements (good default values are: \(\epsilon=1\%\) of the image width, \(w^{'}=2\epsilon\)) and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC with a Pentium II CPU, 300 MHz, MotherBoard Intel440LX. The algorithm is largely independent of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation accuracy achieved by the algorithm as a function of noise and skew has been carried out.
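
A stripped-down sketch of the column-wise first stage is given below: runs of ink rows are found inside narrow, overlapping vertical columns, yielding text line pieces. The bottom-up merging of pieces into complete text lines and blocks is omitted, and only the default column parameters quoted above are mirrored.

```python
import numpy as np

def line_pieces(binary_img, eps=None):
    """Detect text line pieces as runs of ink rows inside overlapping columns."""
    h, w = binary_img.shape
    eps = eps or max(1, w // 100)          # default: 1% of the image width
    col_width = 2 * eps
    pieces = []                            # (x_start, y_top, y_bottom) triples
    for x in range(0, w - col_width + 1, eps):
        ink_rows = binary_img[:, x:x + col_width].any(axis=1)
        y = 0
        while y < h:
            if ink_rows[y]:
                top = y
                while y < h and ink_rows[y]:
                    y += 1
                pieces.append((x, top, y - 1))
            y += 1
    return pieces

page = np.zeros((100, 300), dtype=np.uint8)
page[10:18, 20:280] = 1                    # two synthetic "text lines"
page[40:48, 20:280] = 1
print(len(line_pieces(page)), "line pieces detected")
```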

PatentDOI
TL;DR: Handwritten ink is traversed to generate a list of diacriticals (19), and a data structure (50) is added to the theory to ensure that all handwritten ink is used and is used only once.
Abstract: Handwritten ink is scanned to identify potential diacriticals. A list of diacriticals (19) is generated by traversing the ink. Potential diacritical-containing characters are processed by scoring them with and without a diacritical to generate a first and a second score. The first score is compared to the second score in order to decide which variant of the potential diacritical-containing character produced the highest score. The highest score is used as the score for a theory and the decision is recorded. A data structure (50) is added to the theory. Each data unit in the data structure (50) corresponds to an entry in the list of diacriticals (19). As a new theory is created by propagation, the contents of the data structure (50) are copied into the new theory. Thus, the data structure (50) is used to ensure that all handwritten ink is used and is used only once.
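
A toy sketch of the with/without-diacritical comparison and the bookkeeping structure described above is shown below; the scoring function is a placeholder, since in the patent the scores would come from the handwriting recognizer itself.

```python
from dataclasses import dataclass, field

@dataclass
class Theory:
    text: str = ""
    score: float = 0.0
    used_diacriticals: set = field(default_factory=set)   # stand-in for the "data structure (50)"

def score_char(strokes, char, with_diacritical):
    # Placeholder score: pretend 'i' and 'j' score better with a dot attached.
    base = 0.5
    if char in "ij":
        return base + (0.4 if with_diacritical else -0.2)
    return base - (0.4 if with_diacritical else 0.0)

def extend_theory(theory, strokes, char, diacritical_id):
    s_with = score_char(strokes, char, True)
    s_without = score_char(strokes, char, False)
    new = Theory(theory.text + char, theory.score, set(theory.used_diacriticals))
    if s_with > s_without and diacritical_id not in new.used_diacriticals:
        new.score += s_with
        new.used_diacriticals.add(diacritical_id)          # each diacritical used only once
    else:
        new.score += s_without
    return new

t = extend_theory(Theory(), strokes=None, char="i", diacritical_id=0)
print(t.text, round(t.score, 2), t.used_diacriticals)
```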

Journal ArticleDOI
TL;DR: A complete system is presented that interprets 2D paper-based engineering drawings and outputs 3D models that can be displayed as wireframes, using a technique based on evidential reasoning and a wide range of rules and heuristics.
Abstract: Converting paper-based engineering drawings into CAD model files is a tedious process. Therefore, automating the conversion of such drawings represents tremendous time and labor savings. We present a complete system which interprets such 2D paper-based engineering drawings, and outputs 3D models that can be displayed as wireframes. The system performs the detection of dimension sets, the extraction of object lines, and the assembly of 3D objects from the extracted object lines. A knowledge-based method is used to remove dimension sets and text from ANSI engineering drawings, a graphics recognition procedure is used to extract complete object lines, and an evidential rule-based method is utilized to identify view relationships. While these methods are the subject of several of our previous papers, this paper focuses on the 3D interpretation of the object. This is accomplished using a technique based on evidential reasoning and a wide range of rules and heuristics. The system is limited to the interpretation of objects composed of planar, spherical, and cylindrical surfaces. Experimental results are presented.