Showing papers on "Document processing" published in 2004

Patent•

Low resolution OCR for camera acquired documents

Charles E. Jacobs¹, James Russell Rinker¹, Patrice Y. Simard¹, Paul A. Viola¹•Institutions (1)

20 May 2004

TL;DR: This patent describes a global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process.

Abstract: A global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process. The framework includes a machine learning approach trained on a large amount of data. A convolutional neural network can be employed to compute a classification function at multiple positions and take grey-level input which eliminates binarization. The framework utilizes preprocessing, layout analysis, character recognition, and word recognition to output high recognition rates. The framework also employs dynamic programming and language models to arrive at the desired output.

123 citations

Patent•

Agent architecture employed within an integrated message, document and communication system

Boban Mathew, Thomas John, Dagny Evans

12 Jul 2004

TL;DR: A system for managing messages, communications, and documents is described. It comprises a plurality of task agents, each performing one or more specified message, document, and/or communications processing actions on different types of messages, documents, and/or communication channels, respectively.

Abstract: A system is described for managing messages, communications and/or documents comprising: a plurality of task agents, each task agent to perform one or more specified message processing actions, document processing actions, and/or communications processing actions on different types of messages, documents and/or communication channels, respectively; and one or more manager agents to coordinate the actions of the plurality of task agents responsive to a plurality of message and/or document processing rules.

105 citations

Proceedings Article•DOI•

Machine learning methods for automatically processing historical documents: from paper acquisition to XML transformation

Floriana Esposito, Donato Malerba, Giovanni Semeraro, Stefano Ferilli, O. Altamura, Teresa Maria Altomare Basile, Margherita Berardi, Michelangelo Ceci, N. Di Mauro

23 Jan 2004

TL;DR: This work proposes the use of a document processing system, WISDOM++, which relies heavily on machine learning techniques to transform printed documents into a Web-accessible form such as XML, and reports promising results obtained in preliminary experiments.

Abstract: One of the aims of the EU project COLLATE is to design and implement a Web-based collaboratory for archives, scientists and end-users working with digitized cultural material. Since the originals of such material are often unique and scattered across various archives, making them widely accessible is a severe problem. A solution would be to develop intelligent document processing tools that automatically transform printed documents into a Web-accessible form such as XML. Here, we propose the use of a document processing system, WISDOM++, which relies heavily on machine learning techniques to perform such a task, and report promising results obtained in preliminary experiments.

84 citations

Patent•

Document processing system with improved image quality assurance

Robert Klein¹, Craig F. Lapan¹, George E. Reasoner¹•Institutions (1)

Unisys¹

30 Apr 2004

TL;DR: A document processing system is described whose image capture subsystem captures selected image metrics and at least one image rendition from a plurality of documents, and determines whether any selected image metric fails to compare successfully against preselected image quality metric threshold values.

Abstract: A document processing system comprising an image capture subsystem for capturing selected image metrics and at least one image rendition from a plurality of documents and for determining if at least one of the selected image metrics for any of the at least one image rendition does not successfully compare against preselected image quality metric threshold values. An image quality flag is generated for any of the at least one image rendition if it does not successfully compare, and a record entry for each imaged document having at least one flagged image rendition is created in an image quality flag file. An image index file for individually accessing the image renditions is modified to include a reference to the corresponding image quality flag file record entry. The document processing system may optionally compare selected document metrics against preselected document metrics in a similar manner. Image defects in the plurality of documents can be identified by examining the record entries in the image quality flag file.

73 citations
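
The flagging scheme in the image quality assurance abstract above can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `flag_image_quality`, the metric names, the dictionary layout, and the use of (low, high) ranges as thresholds are all assumptions made for illustration.

```python
def flag_image_quality(documents, thresholds):
    """Build a hypothetical image quality flag file and image index.

    documents:  {doc_id: {rendition_name: {metric_name: value}}}
    thresholds: {metric_name: (low, high)}  # assumed acceptable range
    """
    flag_file = []  # one record per document with at least one flagged rendition
    index = {}      # per-document index entry, pointing at its flag record if any
    for doc_id, renditions in documents.items():
        flagged = {}
        for name, metrics in renditions.items():
            # A metric fails when it falls outside its acceptable range.
            failed = sorted(
                metric
                for metric, value in metrics.items()
                if metric in thresholds
                and not (thresholds[metric][0] <= value <= thresholds[metric][1])
            )
            if failed:
                flagged[name] = failed
        record_no = None
        if flagged:
            record_no = len(flag_file)
            flag_file.append({"doc_id": doc_id, "flagged": flagged})
        # The index entry carries a reference to the flag file record.
        index[doc_id] = {"renditions": sorted(renditions), "flag_record": record_no}
    return flag_file, index


if __name__ == "__main__":
    docs = {
        "d1": {"front_bw": {"brightness": 0.5, "skew_deg": 0.2}},
        "d2": {
            "front_bw": {"brightness": 0.95, "skew_deg": 0.1},
            "rear_bw": {"brightness": 0.5, "skew_deg": 3.0},
        },
    }
    limits = {"brightness": (0.2, 0.9), "skew_deg": (0.0, 1.0)}
    flags, idx = flag_image_quality(docs, limits)
    print(flags)
    print(idx)
```

Scanning only the flag file, rather than every image, is what lets defects be located by examining record entries, as the abstract describes.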

Patent•

System and method for role based access control of a document processing device

Marianne Kodimer¹, Michael Yeung¹, Amir Shahindoust, Girish R. Krishna•Institutions (1)

Toshiba¹

04 Feb 2004

TL;DR: A system and method for controlling access to a document processing device based on roles assigned to user groups is presented. Each group of users is authorized to use the document processing device for certain functions.

Abstract: A system and method for controlling access to a document processing device based on roles assigned to user groups. Each group of users has certain functions for which they are authorized to use the document processing device. The device compares a username and password with correlating information stored in an authentication server. The server transmits a list of functions for which the user is authorized to employ the device. The device then compares the requested function with the authorized functions to determine if the user is allowed to utilize the document processing device for the requested function. The document processing device then performs the authorized requested function.

62 citations
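
The authorization flow in the role-based access control abstract above can be sketched as follows. This is a simplified illustration, not the patented design: the `AuthServer` class, its in-memory dictionaries, and the function names are hypothetical, and passwords are compared in plain text only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class AuthServer:
    """Hypothetical in-memory stand-in for the authentication server."""
    credentials: dict = field(default_factory=dict)     # username -> password
    user_roles: dict = field(default_factory=dict)      # username -> role (user group)
    role_functions: dict = field(default_factory=dict)  # role -> set of device functions

    def authorized_functions(self, username, password):
        # Return the functions the user may employ, or an empty set
        # when the credentials do not match the stored information.
        if self.credentials.get(username) != password:
            return set()
        role = self.user_roles.get(username)
        return set(self.role_functions.get(role, set()))


def device_request(server, username, password, requested_function):
    """Device side: fetch the authorized list, then check the requested function."""
    allowed = server.authorized_functions(username, password)
    return requested_function in allowed


if __name__ == "__main__":
    server = AuthServer(
        credentials={"ana": "pw"},
        user_roles={"ana": "clerk"},
        role_functions={"clerk": {"copy", "scan"}},
    )
    print(device_request(server, "ana", "pw", "scan"))
    print(device_request(server, "ana", "pw", "fax"))
```

Keeping the role-to-function mapping on the server means a group's permissions can change without touching any device.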

Proceedings Article•DOI•

Text line segmentation in handwritten document using a production system

Stéphane Nicolas¹, Thierry Paquet¹, Laurent Heutte¹•Institutions (1)

Centre national de la recherche scientifique¹

26 Oct 2004

TL;DR: Considering the drawbacks of traditional methods for text line extraction in handwritten documents, a new approach for handwritten page segmentation is proposed, based on a traditional problem solving framework used in artificial intelligence.

Abstract: We present in this paper a digitization project of cultural heritage manuscripts and we discuss the underlying problems, particularly those relative to document analysis. Considering the drawbacks of traditional methods for text line extraction in handwritten documents, we propose to adopt a new approach for handwritten page segmentation, based on a traditional problem solving framework used in artificial intelligence.

61 citations

Proceedings Article•DOI•

Gabor filters for document analysis in Indian bilingual documents

Peeta Basa Pati¹, S. Sabari Raju¹, Nishikanta Pati¹, A. G. Ramakrishnan¹•Institutions (1)

Indian Institute of Science¹

24 Aug 2004

TL;DR: A biologically inspired, multi-channel filtering scheme for page layout analysis is presented; the same scheme is used for script recognition and has been seen to be computationally viable for commercial OCR system development.

Abstract: Reasonable success has been achieved at developing monolingual OCR systems in Indian scripts. Scientists, optimistically, have started to look beyond. Development of bilingual OCR systems and OCR systems with capability to identify the text areas are some of the pointers to future activities in Indian scenario. The separation of text and non-text regions before considering the document image for OCR is an important task. In this paper, we present a biologically inspired, multi-channel filtering scheme for page layout analysis. The same scheme has been used for script recognition as well. Parameter tuning is mostly done heuristically. It has also been seen to be computationally viable for commercial OCR system development.

59 citations

Patent•

Document processing apparatus and document processing method

Kazufumi Kobashi¹•Institutions (1)

Canon Inc.¹

12 Nov 2004

TL;DR: A registration unit registers data indicative of a post-processing setting, which has been performed on a sheet document in page units, as a document setting of the generated electronic document.

Abstract: Document processing apparatus which reflects post-processing data, which is recognized by a scanner, on document setting of a generated electronic document. The document processing apparatus comprises: a registration unit which registers data indicative of post-processing setting, which has been performed on a sheet document in page unit, as document setting of the generated electronic document; and a generation unit which generates printing data by reflecting the registered document setting on the generated electronic document.

57 citations

Patent•

Document retrieving method and apparatus

Eiichiro Toshima¹•Institutions (1)

Canon Inc.¹

26 Apr 2004

TL;DR: Text feature data based on the text data included in a document and image feature data based on a document image are stored in a memory; the memory is then searched, and a document corresponding to the search document is retrieved from plural documents.

Abstract: In the proposed document retrieving apparatus, text feature data based on text data included in a document and image feature data based on a document image are stored in a memory. Image data of a search document is subjected to character recognition processing, text feature data is acquired based on the obtained text data, and image feature data (layout data) is acquired based on the image data of the search document. Using the text feature data and image feature data acquired with respect to the search document, a memory is searched, and a document corresponding to the search document is retrieved from plural documents.

50 citations

Patent•

Document search method and apparatus

Eiichiro Toshima¹•Institutions (1)

Canon Inc.¹

19 May 2004

TL;DR: A character recognition process is applied to an image of a search document, and text data estimated to be correctly recognized is extracted from the text data obtained by the character recognition process.

Abstract: In a document search method for searching for a document, a character recognition process is applied to an image of a search document, and text data which is estimated to be correctly recognized is extracted from the text data obtained by the character recognition process. Text feature information is generated based on the extracted text data, and a plurality of documents are searched for a document corresponding to the search document using the generated text feature information as a query.

49 citations

Patent•

Combined speech and handwriting recognition

Daniel L. Roth¹, Edward W. Porter•Institutions (1)

Nuance Communications¹

05 Dec 2004

TL;DR: Speech recognition is combined with handwriting and/or character recognition; one or more best-scoring recognition candidates are selected as a function of the recognition of both handwritten and spoken representations of a sequence of one or more words to be recognized.

Abstract: The invention relates to the combination of speech recognition with handwriting and/or character recognition. This includes the innovation of selecting one or more best-scoring recognition candidates as a function of recognition of both handwritten and spoken representations of a sequence of one or more words to be recognized. It also includes the innovation of using character or handwriting recognition of one or more letters to alphabetically filter speech recognition of one or more words. It also includes the innovations of using speech recognition of one or more letter-identifying words to alphabetically filter handwriting recognition, and of using speech recognition to correct handwriting recognition of one or more words.
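
Two of the ideas in the abstract above, alphabetic filtering and joint candidate selection, can be illustrated with a small Python sketch. The abstract does not say how scores are combined, so multiplying per-word probabilities is an assumption made here, and both function names are hypothetical.

```python
def alphabetic_filter(speech_candidates, letters):
    """Keep only speech candidates that start with the recognized letters.

    speech_candidates: list of (word, score) pairs from a speech recognizer
    letters: one or more letters obtained from character or handwriting recognition
    """
    prefix = letters.lower()
    return [(word, score) for word, score in speech_candidates
            if word.lower().startswith(prefix)]


def best_combined(speech_scores, handwriting_scores):
    """Pick the word that scores best under both recognizers.

    Both arguments map candidate words to probabilities. Multiplying the
    two scores is an assumed combination rule, not taken from the patent.
    """
    common = set(speech_scores) & set(handwriting_scores)
    if not common:
        return None
    return max(common, key=lambda w: speech_scores[w] * handwriting_scores[w])


if __name__ == "__main__":
    speech = {"write": 0.5, "right": 0.4, "rite": 0.1}
    handwriting = {"right": 0.6, "night": 0.3, "write": 0.1}
    print(alphabetic_filter(list(speech.items()), "r"))
    print(best_combined(speech, handwriting))
```

In this toy example the speech recognizer alone would prefer "write", but the handwriting evidence tips the joint choice to "right".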

Patent•

Claim data and document processing system

Thomas Prendergast

20 Feb 2004

TL;DR: A system that processes claim data related to the provision of healthcare to a patient is described. An interface processor receives claim data for the provision of a service to a particular patient, and an attachment processor automatically applies predetermined claim submission requirements to identify (a) whether an attachment document must be submitted together with the claim to a payer for claim reimbursement and (b) which particular document is to be provided together with the claim.

Abstract: A system which processes claim data related to provision of healthcare to a patient includes the following. An interface processor receives claim data related to a claim for provision of a service to a particular patient. An attachment processor automatically applies predetermined claim submission requirements in processing the claim data to identify: (a) whether an attachment document is required to be submitted together with the claim to a payer for claim reimbursement; and (b) which particular document is to be provided together with the claim to said payer for claim reimbursement. A document processor retrieves the particular document from storage for provision to said payer for claim reimbursement.

Patent•

Electronic document processing system, electronic document processing method, and storage medium storing therein program for executing the method

Imai Satoshi¹•Institutions (1)

Canon Inc.¹

27 Apr 2004

TL;DR: A document processing system that can form, for example, an encrypted electronic document is described. It has setting means for setting access permission attributes, which restrict a predetermined function request, and ID information per function into the electronic document; storing means for storing the electronic document in which the access permission attributes are set so that it can be updated; and an access information management table for managing the access permission attributes and the ID information set into each electronic document.

Abstract: A document processing system which can form, for example, an electronic document that is encrypted, having: setting means for setting access permission attributes for restricting a predetermined function request and ID information per function into the electronic document; storing means for storing the electronic document in which the access permission attributes are set so that the electronic document can be updated; an access information management table for managing the access permission attributes of each electronic document and the ID information which are set into the electronic document by the setting means; and electronic document managing means for updating the access permission attributes and the ID information set in the electronic document in response to a change request for the access permission attributes and the ID information after the electronic document was registered into the storing means and updating contents in the access information management table on the basis of the changed access permission attributes and the changed ID information so as to be matched.

Patent•

Apparatus and method for document processing

Timothy Underwood, Benjamin Farrow, Joseph LaBonty

16 Mar 2004

TL;DR: A universal document package is described that combines the loan data, rules, forms, and form data into a single complete package, along with the tools to support various electronic and manual mortgage processing activities.

Abstract: The apparatus and methods of the present invention implement a universal document package which combines the loan data, rules, forms and form data into a single complete universal document package along with the tools to support various electronic and manual mortgage processing activities. In the most preferred embodiments of the present invention, the universal document package is implemented as an extensible markup language (XML) document. Additionally, the methods for computer-based procurement, implementation, and use of the universal document package are disclosed.

Patent•

System and method for digital payment of document processing services

Cozianu Costin¹, George Koppich¹•Institutions (1)

Toshiba¹

28 Sep 2004

TL;DR: A system and method for the digital payment of document processing services is described, which uses a pre-paid or digital payment mechanism to charge for those services.

Abstract: This invention is directed to a system and method for the digital payment of document processing services. More particularly, this invention is directed to a system and method which uses a pre-paid or digital payment mechanism to charge for document processing services.

Patent•

Document processor, document processing method, and document processing program

Yoshihisa Oguro

25 Feb 2004

TL;DR: A document processor is proposed that extracts a characteristic of a character line, without recognizing characters, to grasp the contents of the line. It extracts characteristics showing the arrangement state of the in-line rectangles of a character line image and quantizes them in a fixed stage to generate a symbol.

Abstract: PROBLEM TO BE SOLVED: To extract a characteristic of a character line without recognizing a character to grasp contents of the character line by extracting characteristics showing an arrangement state of an in-line rectangle of a character line image, and quantizing them in a fixed stage to generate a symbol. SOLUTION: This document processor includes: an image input part 201 inputting a document image of an identification target; a rectangle extraction part 202 extracting a rectangle from the document image; a line cutout part 203 performing a cutout process of the in-line rectangle from the rectangle; a symbol generation part 204 extracting the characteristics showing the arrangement state of the in-line rectangle, and quantizing them to generate the symbol; an appearance frequency totaling part 205 performing a prescribed process to a symbol series, and calculating and totaling appearance probability of the symbol series by languages; and decision part 206 deciding that the language showing the highest appearance probability is a language to which a collation target line belongs, from a totaling result by the appearance frequency totaling part 205. COPYRIGHT: (C)2005,JPO&NCIPI

Book Chapter•DOI•

Techniques for Efficient Query Expansion

Bodo Billerbeck¹, Justin Zobel¹•Institutions (1)

RMIT University¹

05 Oct 2004

TL;DR: This work explores alternative methods for reducing query-evaluation costs, and proposes a new method based on keeping a brief summary of each document in memory that allows query expansion to proceed three times faster than previously, while approximating the effectiveness of standard expansion.

Abstract: Query expansion is a well-known method for improving average effectiveness in information retrieval. However, the most effective query expansion methods rely on costly retrieval and processing of feedback documents. We explore alternative methods for reducing query-evaluation costs, and propose a new method based on keeping a brief summary of each document in memory. This method allows query expansion to proceed three times faster than previously, while approximating the effectiveness of standard expansion.

Proceedings Article•DOI•

Discrimination of machine-printed from handwritten text using simple structural characteristics

Ergina Kavallieratou¹, S. Stamatatos¹•Institutions (1)

American Hotel & Lodging Educational Institute¹

23 Aug 2004

TL;DR: Experiments on document images taken from the IAM-DB and GRUHD databases show remarkable performance for a trainable approach that discriminates between machine-printed and handwritten text and requires minimal training data.

Abstract: In this paper, we present a trainable approach to discriminate between machine-printed and handwritten text. An integrated system able to localize text areas and split them in text-lines is used. A set of simple and easy-to-compute structural characteristics that capture the differences between machine-printed and handwritten text-lines is introduced. Experiments on document images taken from IAM-DB and GRUHD databases show a remarkable performance of the proposed approach that requires minimal training data.

Patent•

Document processing apparatus and document processing method

Makoto Tomita¹•Institutions (1)

Canon Inc.¹

08 Dec 2004

TL;DR: A document processing apparatus includes a first determination unit for determining, as an image processing option, an object related to a predetermined print setting included in image data corresponding to a page of a source document read by an image reading unit for reading the source document as image data.

Abstract: A document processing apparatus includes a first determination unit for determining, as an image processing option, an object related to a predetermined print setting included in image data corresponding to a page of a source document read by an image reading unit for reading the source document as image data and an output unit for outputting the option determined by the first determination unit.

Patent•

System and method for verifying and searching documents

Curtis W. Hallowell, Robert B. Fitzgerald

12 Aug 2004

TL;DR: A method of processing barcoded tickets in a document processing device is described, which includes receiving a stack of barcoded tickets in an input receptacle (2700) of the device.

Abstract: A method of processing barcoded tickets in a document processing device including receiving a stack of barcoded tickets in an input receptacle (2700) of a document processing device. Each barcoded ticket includes a document-identifier or ticket number that identifies the barcoded ticket. At least one specific document-identifier is inputted by the operator to search for a specific document in a stack of documents (2710, 2720). Each of the documents is transported, one at a time, past a detector, which detects the document-identifier of each ticket. A determination is made whether a detected document-identifier matches the specific document-identifier requested by the operator, and if so, the ticket in question is directed to a pre-programmed or user-specified output receptacle (2730).
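
The search-and-route step in the abstract above amounts to a single pass over the stack. The sketch below is illustrative only: the function name `sort_tickets`, the pocket names, and the representation of tickets as identifier strings are assumptions, and the abstract does not say where non-matching tickets go, so routing them to a default pocket is also an assumption.

```python
def sort_tickets(stack, wanted_ids, match_pocket="pocket_1", default_pocket="pocket_2"):
    """Route each ticket to an output receptacle based on its identifier.

    stack:      ticket identifiers in the order they pass the detector
    wanted_ids: identifiers the operator asked to search for
    """
    wanted = set(wanted_ids)  # set membership keeps each lookup constant-time
    pockets = {match_pocket: [], default_pocket: []}
    for ticket_id in stack:   # tickets are transported one at a time
        pocket = match_pocket if ticket_id in wanted else default_pocket
        pockets[pocket].append(ticket_id)
    return pockets


if __name__ == "__main__":
    result = sort_tickets(["T100", "T205", "T317", "T205"], ["T205"])
    print(result)
```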

Journal Article•DOI•

An optical character recognition system for printed Telugu text

C. Vasantha Lakshmi¹, C. Patvardhan¹•Institutions (1)

Dayalbagh Educational Institute¹

01 Jul 2004-Pattern Analysis and Applications

TL;DR: The algorithms designed exploit special characteristics of Telugu script for processing the document images efficiently, and are implemented to create a Telugu OCR system for printed text (TOSP).

Abstract: Telugu is one of the oldest and popular languages of India, spoken by more than 66 million people, especially in South India. Not much work has been reported on the development of optical character recognition (OCR) systems for Telugu text. Therefore, it is an area of current research. Some characters in Telugu are made up of more than one connected symbol. Compound characters are written by associating modifiers with consonants, resulting in a huge number of possible combinations, running into hundreds of thousands. A compound character may contain one or more connected symbols. Therefore, systems developed for documents of other scripts, like Roman, cannot be used directly for the Telugu language. The individual connected portions of a character or a compound character are defined as basic symbols in this paper and treated as a unit of recognition. The algorithms designed exploit special characteristics of Telugu script for processing the document images efficiently. The algorithms have been implemented to create a Telugu OCR system for printed text (TOSP). The output of TOSP is in phonetic English that can be transliterated to generate editable Telugu text. A special feature of TOSP is that it is designed to handle a large variety of sizes and multiple fonts, and still provides raw OCR accuracy of nearly 98%. The phonetic English representation can be also used to develop a Telugu text-to-speech system; work is in progress in this regard.

Journal Article•DOI•

Unsupervised writer adaptation applied to handwritten text recognition

A. Nosary¹, Laurent Heutte¹, Thierry Paquet¹•Institutions (1)

University of Rouen¹

01 Feb 2004-Pattern Recognition

TL;DR: Tests carried out on a sample of 15 writers show the value of the proposed adaptation scheme: recognition rates improve over the iterations at both the letter and the word levels.

Proceedings Article•DOI•

Machine-printed from handwritten text discrimination

Ergina Kavallieratou, S. Stamatatos, Hera Antonopoulou¹•Institutions (1)

Research Academic Computer Technology Institute¹

26 Oct 2004

TL;DR: A set of simple structural characteristics that capture the differences between machine-printed and handwritten text-lines is presented and preliminary experiments on document images taken from databases of different languages and characteristics show a remarkable performance.

Abstract: This paper deals with the discrimination between machine-printed and handwritten text, a prerequisite for many OCR applications. An easy-to-follow approach is proposed based on an integrated system able to localize text areas and split them in text-lines. A set of simple structural characteristics that capture the differences between machine-printed and handwritten text-lines is presented and preliminary experiments on document images taken from databases of different languages and characteristics show a remarkable performance.

Patent•

Document processing method and apparatus

Koji Nakagiri¹•Institutions (1)

Canon Inc.¹

02 Nov 2004

TL;DR: A document processing method is proposed for documenting image data obtained by double-sided scanning of an original that contains a single-sided printed part and a double-sided printed part.

Abstract: According to this invention, a document processing method of documenting image data obtained by double-sided scanning of an original containing a single-sided printed part and a double-sided printed part includes a blank determination step of determining on the basis of the image data whether the lower surface of an original is blank, and a documentation step of, when the lower surface is determined in the blank determination step to be blank, saving image data corresponding to the upper surface of the original as document information together with a single-sided printing attribute, and when the lower surface is determined in the blank determination step not to be blank, saving image data corresponding to the two surfaces of the original as document information together with a double-sided printing attribute.
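
The blank determination and documentation steps in the abstract above can be sketched in a few lines of Python. The abstract does not state how blankness is decided, so the dark-pixel-fraction test, its default thresholds, and the names `is_blank` and `document_sheets` are assumptions made for illustration.

```python
def is_blank(page, dark_level=128, max_dark_fraction=0.01):
    """Treat a page as blank when very few pixels are dark.

    page: 2-D list of grayscale values (0 = black, 255 = white).
    Both threshold defaults are illustrative assumptions.
    """
    pixels = [value for row in page for value in row]
    if not pixels:
        return True
    dark = sum(1 for value in pixels if value < dark_level)
    return dark / len(pixels) <= max_dark_fraction


def document_sheets(scans):
    """scans: list of (upper_surface, lower_surface) image pairs, one per sheet."""
    sheets = []
    for upper, lower in scans:
        if is_blank(lower):
            # Lower surface is blank: keep only the upper surface.
            sheets.append({"pages": [upper], "print_attribute": "single-sided"})
        else:
            # Both surfaces carry content: keep both.
            sheets.append({"pages": [upper, lower], "print_attribute": "double-sided"})
    return sheets


if __name__ == "__main__":
    white = [[255] * 10 for _ in range(10)]
    text = [[0] * 10 for _ in range(5)] + [[255] * 10 for _ in range(5)]
    for sheet in document_sheets([(text, white), (text, text)]):
        print(sheet["print_attribute"], len(sheet["pages"]))
```

A real scanner would also need to tolerate noise such as show-through from the other side, which a single global threshold handles only roughly.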

Journal Article•DOI•

Adaptive topological tree structure for document organisation and visualisation

Richard T. Freeman¹, Hujun Yin¹•Institutions (1)

University of Manchester¹

01 Oct 2004-Neural Networks

TL;DR: The Adaptive Topological Tree Structure (ATTS) generates a taxonomy of underlying topics from a set of unclassified, unstructured documents; the taxonomy can be browsed like a content hierarchy and reflects the connections between related topics at each level.

Patent•

Document processing system

Kazuhiko Abe, Katsumi Tezuka

15 Dec 2004

TL;DR: A document processing system is proposed in which a client terminal MT can easily and quickly connect to a document processing apparatus it has selected and execute document processing in an environment where a plurality of document processing apparatuses MFD exist.

Abstract: PROBLEM TO BE SOLVED: To provide a document processing system whereby a client terminal MT can easily and quickly connect to a document processing apparatus selected by the client terminal MT and executes document processing under an environment wherein a plurality of document processing apparatuses MFD exist. SOLUTION: Each of the client terminal MT and the document processing apparatuses MFD (A) to MFD (C) respectively includes a radio communication function, the document processing apparatus MFD (A) uses a beacon signal to transmit setting information required for its radio connection, the client terminal MT uses the received setting information to connect itself to the document processing apparatus MFD (A), the document processing apparatus MFD (A) transmits setting information for adhoc radio connection to the other document processing apparatuses MFD (B), MFD (C), and the client terminal MT makes the adhoc connection to the other document processing apparatus MFD (B) on the basis of the received setting information to transmit a document processing job to the other document processing apparatus MFD (B). COPYRIGHT: (C)2006,JPO&NCIPI

Proceedings Article•DOI•

Making handwritten archives documents accessible to public with a generic system of document image analysis

Bertrand Coüasnon¹, Jean Camillerapp, Ivan Leplumey•Institutions (1)

French Institute for Research in Computer Science and Automation¹

23 Jan 2004

TL;DR: A platform for managing annotations needed for handwritten archive document retrieval by content is presented as well as examples of automatic annotations on civil status registers, military forms and naturalization decrees, using a generic document recognition method.

Abstract: We present annotations needed for handwritten archive document retrieval by content. We propose two complementary ways of producing those annotations: automatically by using document image analysis and collectively by using Internet and a manual input by users. A platform for managing those annotations is presented as well as examples of automatic annotations on civil status registers, military forms (tested on 60000 pages) and naturalization decrees, using a generic document recognition method. Examples of collective annotations built on automatic annotations are also given. This platform will be officially open to public on Internet and inside the new building of the Archives departementales des Yvelines in December 2003. 1200000 images of civil status registers will be available for collective annotation as well as 35000 pages of military forms with automatic annotation of handwritten names.

Journal Article•

A Data Base for Arabic Handwritten Text Recognition Research.

Somaya Al-Maadeed, Dave Elliman, Colin Higgins

01 Jan 2004-The International Arab Journal of Information Technology

TL;DR: A new database for off-line Arabic handwriting recognition (AHDB) is presented, together with an innovative, simple yet powerful in-place tagging procedure; the most popular words in Arabic writing are identified for the first time using an associated program.

Abstract: In this paper we present a new database for off-line Arabic handwriting recognition, together with associated preprocessing procedures. We have developed a new database for the collection, storage and retrieval of Arabic handwritten text (AHDB). This is an advance both in terms of the size of the database as well as the number of different writers involved. We further designed an innovative, simple yet powerful, in place tagging procedure for our database. It enables us to easily extract the bitmaps of words. We also constructed a preprocessing class, which contains some useful preprocessing operations. In this paper the most popular words in Arabic writing were identified for the first time, using an associated program.

Proceedings Article•DOI•

Spontaneous handwriting recognition and classification

Alejandro Héctor Toselli, Alfons Juan, Enrique Vidal

23 Aug 2004

TL;DR: Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with large vocabulary.

Abstract: Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with large vocabulary. Handwritten short paragraphs are to be classified into a small number of predefined classes. The paragraphs involve a wide variety of writing styles and contain many non-textual artifacts. HMMs and n-grams are used for text recognition and n-grams are also used for text classification. Experimental results are reported which, given the extreme difficulty of the task, are encouraging.

Patent•

Document processing method and system

John E. Jones, Paul A. Jones, William J. Jones, Douglas U. Mennie

09 Jan 2004
