
Showing papers on "Document processing" published in 1988


Patent
21 Dec 1988
TL;DR: In this article, a document processing system for processing documents having data printed thereon including pre-printed markings visible to the eye and machine-readable characters includes a scanner for capturing the color image of the document being processed.
Abstract: A document processing system for processing documents having data printed thereon including pre-printed markings visible to the eye and machine-readable characters includes a scanner for capturing the color image of the document being processed. Circuitry is provided for reducing the contrast between the pre-printed marking and the background surrounding a machine-readable character based upon the hue of the pre-printed markings, such that the pre-printed markings and the background surrounding a character are not distinguishable with respect to light reflected from the document, thereby generating a filtered image of the document. An optical character recognition device receives the filtered image of the document representing character information only for identifying the characters appearing on the document.

110 citations
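
The hue-based filtering described in the abstract above can be illustrated with a minimal software sketch. It is not the patent's circuitry: the function name `drop_out_hue`, the hue tolerance, and the saturation threshold are all assumptions chosen for illustration. Pixels whose hue is close to that of the pre-printed markings are replaced by the background color, so that only the machine-readable characters remain for OCR.

```python
import colorsys

def drop_out_hue(pixels, marking_hue, background=(255, 255, 255),
                 hue_tol=0.05, min_saturation=0.25):
    """Replace pixels near `marking_hue` (0..1) with the background color.

    `pixels` is a list of rows of (r, g, b) tuples with 0..255 channels.
    All thresholds are illustrative assumptions, not values from the patent.
    """
    filtered = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # Hue is circular, so measure the distance around the wheel.
            dist = min(abs(h - marking_hue), 1 - abs(h - marking_hue))
            if s >= min_saturation and dist <= hue_tol:
                new_row.append(background)   # pre-printed marking: blend out
            else:
                new_row.append((r, g, b))    # character or background: keep
        filtered.append(new_row)
    return filtered
```

Unsaturated pixels (black characters, white paper) are left alone because their hue is not meaningful; only strongly colored pixels near the marking hue are dropped.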


Proceedings ArticleDOI
14 Nov 1988
TL;DR: The author assesses the current status of the field and places the problem of Chinese recognition into perspective with other areas of optical character recognition.
Abstract: The author assesses the current status of the field and places the problem of Chinese recognition into perspective with other areas of optical character recognition. Early experiments are briefly reviewed, and sources of more up-to-date information, including review articles, are indicated. Advances in computer technology that have had a significant impact on the problem are discussed, and a sampling of relatively recent research on the classification of both printed and handprinted ideographs is presented. Included in the discussion are techniques of preprocessing (character location and segmentation) and hierarchical classification.

59 citations


Patent
31 Mar 1988
TL;DR: A document processing system includes an input section (1), a memory section (2, 3), a text analyzing section (4, 5, 6), an image identifying section (8, 5, 6), an image size identifying section (9), a layout processing section (10, 11), and an output section (12).
Abstract: A document processing system includes an input section (1), a memory section (2, 3), a text analyzing section (4, 5, 6), an image identifying section (8, 5, 6), an image size identifying section (9), a layout processing section (10, 11), and an output section (12). Document data is constituted by text data and image data. The text data includes key information corresponding to the image data, and the image data is laid out in the document data. The text data and image data input through the input section (1) are stored in the memory section (2). The text analyzing section (4, 5, 6) identifies a position in the document data at which the image data is to be laid out, based on a position of key information in the text data. The image identifying section (8, 5, 6) identifies image data corresponding to the key information. The image size identifying section (9) identifies an image size of the image data identified by the image identifying section (8, 5, 6). The layout processing section (10, 11) lays out the identified image data at the identified image layout position in accordance with a predetermined layout rule.

58 citations


Proceedings ArticleDOI
14 Nov 1988
TL;DR: A methodology for recognizing ZIP codes in handwritten addresses is presented that uses many diverse pattern recognition and image processing algorithms and takes the form of a blackboard architecture that opportunistically invokes routines as needed.
Abstract: A methodology for recognizing ZIP codes (US postal codes) in handwritten addresses is presented that uses many diverse pattern recognition and image processing algorithms. Given a high-resolution image of a handwritten address block, the solution invokes routines capable of hypothesizing the location of the ZIP code, segmenting and recognizing ZIP code digits, locating and recognizing city and state names, and looking up the results in a dictionary. The control structure is not strictly sequential, but rather in the form of a blackboard architecture that opportunistically invokes routines as needed. An implementation of the methodology is described as well as results with a database of grey-level images of handwritten addresses (taken from live mail in a US Postal Service mail processing facility). Future extensions of the approach are discussed.

42 citations
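
The blackboard control structure mentioned in the abstract above can be sketched in a few lines. This is a toy under stated assumptions, not the paper's implementation: it works on a text string rather than a grey-level image, and the three knowledge sources in `sources` are hypothetical stand-ins for the real location, segmentation, and recognition routines. The point is only the control loop, which fires whichever routine is currently applicable instead of following a fixed sequence.

```python
def run_blackboard(blackboard, sources, max_steps=100):
    """Repeatedly fire the first knowledge source whose precondition holds."""
    for _ in range(max_steps):
        for can_run, action in sources:
            if can_run(blackboard):
                action(blackboard)
                break
        else:
            break   # no source is applicable any more: stop
    return blackboard

# Hypothetical knowledge sources: (precondition, action) pairs over a shared dict.
sources = [
    # Hypothesize where the ZIP code is: here, simply the last token.
    (lambda bb: "address" in bb and "zip_region" not in bb,
     lambda bb: bb.update(zip_region=bb["address"].split()[-1])),
    # Segment the region into individual digit characters.
    (lambda bb: "zip_region" in bb and "digits" not in bb,
     lambda bb: bb.update(digits=[c for c in bb["zip_region"] if c.isdigit()])),
    # Accept the hypothesis only if it has exactly five digits.
    (lambda bb: "digits" in bb and "zip" not in bb and len(bb["digits"]) == 5,
     lambda bb: bb.update(zip="".join(bb["digits"]))),
]

result = run_blackboard({"address": "Buffalo NY 14260"}, sources)
```

Because each source declares its own precondition, new routines (for example a city-name lookup) could be added to the list without changing the loop.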


Patent
Takaaki Nomura
03 Aug 1988
TL;DR: A document editing device for editing a document having a hierarchical structure by sets of elements which share definition information on the existence and format of the serial number and heading title of a text that includes chapters and sections comprises a document data memory for storing text data, with a control character indicating the modification of the hierarchical level of the set and a control character indicating the end of each element added at the position in the text data corresponding to the heading.
Abstract: A document editing device for editing a document having a hierarchical structure by sets of elements which share definition information on the existence and format of the serial number and heading title of a text that includes chapters and sections comprises a document data memory for storing text data, with a control character indicating the modification of the hierarchical level of the set and a control character indicating the end of each element added at the position in the text data corresponding to the heading, and a management information memory for storing management information for each set and each element separately from the text data. When the text data is to be displayed or printed, the control character is detected in the text read from the document data memory, and the serial number and title corresponding to the detected control character are read from the management information memory and output in place of the control character.

35 citations
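
The idea in the abstract above, storing only control characters in the text and substituting serial numbers and titles at output time, can be sketched as follows. The three control characters and the `render` function are hypothetical simplifications (the patent distinguishes a level-change code and an end-of-element code; this sketch uses one marker per heading), and the `titles` list stands in for the separately stored management information.

```python
LEVEL_DOWN = "\x0e"   # hypothetical control character: enter a deeper level
LEVEL_UP = "\x0f"     # hypothetical control character: return to the parent level
HEADING = "\x1e"      # hypothetical control character: a heading belongs here

def render(text, titles):
    """Expand control characters into numbered headings.

    `titles` plays the role of the management information:
    one heading title per HEADING marker, in document order.
    """
    counters = [0]            # one running serial number per open level
    remaining = iter(titles)
    out = []
    for ch in text:
        if ch == LEVEL_DOWN:
            counters.append(0)
        elif ch == LEVEL_UP:
            if len(counters) > 1:
                counters.pop()
        elif ch == HEADING:
            counters[-1] += 1
            number = ".".join(str(n) for n in counters)
            out.append(f"{number} {next(remaining)}\n")
        else:
            out.append(ch)
    return "".join(out)
```

Since the numbers are computed only when the text is rendered, inserting or removing a section renumbers every later heading without touching the stored text.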


Journal ArticleDOI
TL;DR: An experimental office system currently being developed at Olivetti research integrates two major requirements of office work, content based document retrieval and mail distribution; its classification system closes the gap between electronic document entry systems and processing of (semi-) structured document content.
Abstract: An experimental office system currently being developed at Olivetti research integrates two major requirements of office work: content based document retrieval and mail distribution. In this system documents are described and classified by their semantic structure that provides access to abstract concepts contained in the document. The derivation of the semantic structure of a document supports both an efficient retrieval by content and an intelligent mail filtering through document semantics. A knowledge based classification system automatically generates the conceptual description of a document to be inserted into the system by means of content analysis, and associates the document to an appropriate predefined type. The classification system closes the gap between electronic document entry systems and processing of (semi-) structured document content.

34 citations


Patent
21 Jan 1988
TL;DR: In this paper, a synthesizer is used for generating software for a computer which is programmed for controlling a physical system, and the software generated by the synthesizer represents a new function to be incorporated in the existing system.
Abstract: A synthesizer means for generating software for a computer which is programmed for controlling a physical system. The software generated by the synthesizer represents a new function to be incorporated in the existing system. The synthesizer includes a device for receiving a formal description representative of the new function in a specification language and for translating the specification into a base document. The base document is further processed by document processing devices for handling the static, interface and dynamic parts of the description to produce an error-free base document. The complete base document is translated by an information processing device into an internal code document which is used by a check device and a simulation device. A compiling device translates the internal code document into an intermediate code document suitable for input to said computer.

31 citations


Proceedings ArticleDOI
14 Nov 1988
TL;DR: Preliminary results are presented to show how the initial stages of syntactic verification can improve character recognition performance.
Abstract: An optical character recognition (OCR) system is developed for recognizing handwritten and handprinted addresses which include a British postcode written within character boxes. The system makes use of syntactic information concerning postcodes and a postcode database which interacts with the character recognition process to ensure that only valid postcodes are recognized. Postulated valid postcodes are then verified using semantic features of the remainder of the address, to produce a final postcode which both matches the input characters and is compatible with the remainder of the address. Preliminary results are presented to show how the initial stages of syntactic verification can improve character recognition performance.

17 citations
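
The interaction between character recognition, postcode syntax, and a postcode database described in the abstract above might look roughly like the sketch below. It is an illustration only: the regular expression is a simplified pattern rather than the full postcode rules, the per-box `(char, score)` candidates come from a hypothetical classifier, and `known_postcodes` is a stand-in for the database. The semantic check against the rest of the address is not modelled.

```python
import itertools
import re

# Simplified outward + inward pattern (an assumption; the real rules are stricter).
POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}")

def best_valid_postcode(box_candidates, known_postcodes):
    """Pick the highest-scoring reading that is syntactically valid and in the database.

    `box_candidates` holds, per character box, a list of (char, score) pairs.
    Returns None when no combination survives both checks.
    """
    best, best_score = None, float("-inf")
    for combo in itertools.product(*box_candidates):
        reading = "".join(ch for ch, _ in combo)
        if not POSTCODE_RE.fullmatch(reading):
            continue                      # fails the syntactic check
        if reading not in known_postcodes:
            continue                      # not a postcode in the database
        score = sum(s for _, s in combo)
        if score > best_score:
            best, best_score = reading, score
    return best
```

In the usage below the classifier's top choice for the first box is the digit `5`, but syntax forces a letter there, so the lower-ranked `S` is selected instead.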


Proceedings ArticleDOI
Koichi Kise, K. Yamada, N. Tanaka, Noboru Babaguchi, Yoshikazu Tezuka
14 Nov 1988
TL;DR: The authors present the visiting card understanding system, whose output is suitable for the input of a visiting-card database and is applicable to many kinds of documents.
Abstract: The authors present the visiting card understanding system, whose output is suitable for the input of a visiting-card database. The system consists of two modules. One is a document model which represents the hierarchical knowledge about the layout structure of visiting cards. The other is an understanding module which interprets the document model to generate and test hierarchical hypotheses about the contents of a visiting card. Since the understanding module is fundamentally independent of document type, the system is applicable to many kinds of documents.

12 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: The concept of model driven segmentation allows quick focussing of the analysis on important regions of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document.
Abstract: The task of document recognition requires the scanning of a paper document and the analysis of its content and structure. The resulting electronic representation has to capture the content as well as the logic and layout structure of the document. The first step in the recognition process is scanning, filtering and binarization of the paper document. Based on the preprocessing results we delineate key areas like address or signature for a letter, or the abstract for a report. This segmentation procedure uses a specific document layout model. The validity of this segmentation can be verified in a second step by using the results of more time-consuming procedures like text/graphic classification, optical character recognition (OCR) and the comparison with more elaborate models for specific document parts. Thus our concept of model driven segmentation allows quick focussing of the analysis on important regions. The segmentation is able to operate directly on the raster image of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document. A test version for the analysis of simple business letters has been implemented.

9 citations


Patent
01 Jul 1988
TL;DR: In this paper, a document is managed as a tree structure whose units are content items, and the document so managed can be edited into a printing output form item by item and then printed and outputted.
Abstract: PURPOSE: To form and edit a document in units of content items by managing the document in the form of a tree structure whose units are those items. CONSTITUTION: A document managing means 1 manages the document formed and edited by a document forming means 2 in units of the items 6 formed by a contents editing means 4. The means 2 forms and edits the document based on words inputted by an input means 5, and the means 4 forms the items 6, in each of which an item name 61 is specified, to associate the document with the tree structure based on the contents of the plural items 6. The document managed in the tree structure can be edited into a printing output form in units of the items 6 constituting the document by a printing, outputting and editing means 3, and then printed and outputted.


Proceedings ArticleDOI
14 Nov 1988
TL;DR: A text recognition system for Japanese documents is described, consisting of a personal computer, which is used as a controller; an image scanner; and a recognition unit.
Abstract: A text recognition system for Japanese documents is described. The system is composed of a personal computer, which is used as a controller; an image scanner; and a recognition unit. There are four processing stages: text-line segmentation, character segmentation, character recognition, and postprocessing using the Japanese dictionary. Experimental results of the tests for Japanese handwritten technical reports are presented.

Patent
14 Sep 1988
TL;DR: In this paper, a document object operating means for processing the data in the document is incorporated in a document processing system itself, the data can be accessed through said operating means, processed, and edited, and the data from the document can be exchanged with an external application program or data base program.
Abstract: PURPOSE: To interchange data outside a document and the data in the document freely and to access the data in the document efficiently from outside the document by using a logical name by providing document processing itself with intelligence. CONSTITUTION: A document object operating means for processing the data in the document is incorporated in a document processing system itself, the data in the document is accessed through said operating means, processed, and edited, and the data in the document is exchanged with an external application program or data base program. The data in the document which is registered as a document object is identified with a group name as the set name of a logical document object and an element name which is added to an individual object. Said operation definition is described by using a language and a group of function units is controlled as an operation definition object. The external application program actuates said operation definition or said definition object to perform necessary processing. COPYRIGHT: (C)1990,JPO&Japio

Patent
19 Apr 1988
TL;DR: In this article, the number of character type changing-over operations is reduced by converting a character to be inputted using the attribute of the already inputted character string nearest the input position.
Abstract: PURPOSE: To decrease the changing-over operations of a character type by character-converting a character string to be inputted using, as its attribute, the attribute of the already inputted character string at the position near the input position. CONSTITUTION: When a character key input exists and an automatic shifting changing-over flag is on, an automatic character type flag is referred to; when the flag is off, a manual character type flag is referred to, and a key code is converted to a character code in accordance with the type code set in the flag. After the input position is moved, the automatic shifting changing-over flag is checked, and when the flag is on, the inputted character type at the input position is determined. When no inputted character is present, the character type closest to the input position is investigated and the obtained character type is set in the automatic character type flag. When the character after the input is not of the character type expected by the user under the automatic shifting changing-over, a means for simultaneously pressing a control key and a shifting key changes the character type of the character inputted immediately before, so an input error can be corrected. Thus, when correcting an inputted character string, the character type changing-over operation can be omitted and operability is improved.


Patent
24 May 1988
TL;DR: In this article, a 2nd control system B serving as a document compiling memory system is connected to an input/output system serving as the 1st control system A of a document processing system via a data transmission/reception means.
Abstract: PURPOSE: To automatically recognize the performance of a printer via a recording system and to attain easy connection among various printers of different characteristics, by transcribing the printer proper data and the print format data to a document compiling memory system from an input/output system for each text. CONSTITUTION: A 2nd control system B serving as a document compiling memory system is connected to an input/output system serving as a 1st control system A of a document processing system via a data transmission/reception means (f). An electronic typewriter main body containing a keyboard as an input means and a typewriter as a print means is applied to the system A. At the same time, a display containing a memory part is added to the system B and a function part different from the typewriter main body is connected to the display for indication of document compilation. Then the proper data and the print format data on the typewriter main body are transferred and stored into the system B from the system A for each text. Thus the performance of the typewriter main body is automatically recognized by the system B. In such a way, the operation of a document processing system is facilitated.

Patent
07 Nov 1988
TL;DR: In this article, an edited document is compared with the document before editing and the extracted points of difference are output distinctively, so that the printed document makes clear which parts were deleted from or inserted into the original document.
Abstract: PURPOSE:To clarify a character string edited with the editing processing by editing stored characters to compare them with a document before editing and distinctively outputting extracted points of difference. CONSTITUTION:In case of the editing processing where an original document generated and printed beforehand is read from a magnetic disk device 4 to a RAM 5 and a part of this document is corrected, a deletion or insertion start code and a deletion or insertion end code are inserted before and after the edited and corrected character string. These codes are discriminated at the time of reading out document information from the RAM 5 by an MPU 3 and are distributed to picture areas for black, red, and blue in an image memory 6, and the deleted or inserted character string is printed in a color different from that of the original document by a printer 8. Thus, the printed edited document is seen to clarify which part is deleted from or inserted to the original document.
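
The effect described in the abstract above, making deleted and inserted strings visually distinct from the original text, can be approximated in software as sketched below. Note the swap: the patent records start and end codes as the edit is made, whereas this sketch recovers the differences after the fact with Python's standard `difflib`. The bracket markers are illustrative stand-ins for the different print colors.

```python
import difflib

def mark_edits(original, edited):
    """Return the edited text with deletions and insertions wrapped in markers.

    "[-...-]" marks text deleted from `original`; "{+...+}" marks text
    inserted in `edited`. Both marker styles are illustrative only.
    """
    out = []
    matcher = difflib.SequenceMatcher(a=original, b=edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out.append("[-" + original[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + edited[j1:j2] + "+}")
        if tag == "equal":
            out.append(original[i1:i2])
    return "".join(out)
```

Stripping the insert markers and their contents from the result gives back the original document, and stripping the delete markers and their contents gives the edited one, which is the property a reviewer relies on.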

Patent
18 Nov 1988
TL;DR: In this paper, index information obtained from the row number being edited by referring to a correspondence table is displayed on a display, and the index information is updated when a row of a character string is added or deleted.
Abstract: PURPOSE: To reduce the trouble and time required to create and edit a document by displaying index information obtained from the row number being edited by referring to a correspondence table, and updating the index information when a row of a character string is added or deleted. CONSTITUTION: The index information consisting of a chapter 13, a clause 14, and a paragraph 15, which is based upon a column number and a row number of a created document, their start column positions, etc., is stored in a correspondence table means 6 in a rewritable state. Then the index information obtained from the row number being edited by referring to the correspondence table means 6 is displayed on a display means 2, and when a row of a character string is added or deleted, the row number is increased or decreased to update the index information. Consequently, the editing position in the whole document is reliably grasped, and updated information on the chapter 13, clause 14, paragraph 15, etc., in the whole document after the addition or deletion of the character string is known automatically.
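
A correspondence table of the kind described in the abstract above can be sketched as a sorted list of starting rows searched with binary search. The class name `IndexTable`, its methods, and the label strings are hypothetical; the patent does not specify a data structure. The sketch also ignores the case where the deleted row is itself a heading row.

```python
import bisect

class IndexTable:
    """Maps starting row numbers to index labels (hypothetical structure)."""

    def __init__(self, entries):
        # entries: iterable of (start_row, label), e.g. (1, "Chapter 1")
        self.entries = sorted(entries)

    def label_for(self, row):
        """Return the label of the section containing `row`, or None."""
        starts = [start for start, _ in self.entries]
        pos = bisect.bisect_right(starts, row) - 1
        return self.entries[pos][1] if pos >= 0 else None

    def shift_rows(self, at_row, delta):
        """Update the table after `delta` rows are added (+) or deleted (-) at `at_row`."""
        self.entries = [
            (start + delta if start > at_row else start, label)
            for start, label in self.entries
        ]
```

After three rows are inserted at row 5, every section that started below that point moves down by three, so the label shown for a given row stays correct without rescanning the document.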

Patent
16 Aug 1988
TL;DR: In this paper, the authors propose a method to quickly and simply put an optional image into an optional position of a document view by carrying out the image processing based on a command selected by an image processing menu bar and evolving the processed image into a document page.
Abstract: PURPOSE:To quickly and simply put an optional image into an optional position of a document view by carrying out the image processing based on a command selected by an image processing menu bar and evolving the processed image into a document page. CONSTITUTION:When a document editing job is carried out, an image is processed by a command selected by an image processing menu bar 502 and this processed image is evolved into a document page. In other words, when an optional image is put into an optional position of a document view, a menu bar is switched to the image processing menu bar with selection of a specific command of a document processing menu bar 502 and the relevant image can be evolved into a set area in a document. Thus the stored image information needed for a document editing job can be extracted quickly and simply and then put into an optional position within a document page.

01 Jan 1988
TL;DR: This dissertation describes an approach to a solution to this problem using artificial intelligence and expert system concepts coupled with distributed computer networking to distribute the documents in a large corporation.
Abstract: Document distribution in a large corporation requires a set of routing procedures for each type of document. Documents may include memorandums, payroll reports, technical reports, external correspondence, and internal mail. Some of these documents may require managerial review and signature release authority to leave the organization. The document must be routed through the different levels of the organization according to the document procedures. The availability of the signers and reviewers becomes a delay factor in the routing of the document. This dissertation describes an approach to a solution to this problem using artificial intelligence and expert system concepts coupled with distributed computer networking to distribute the documents. A prototype system has been demonstrated. A document is originated as an "electronic file" on a user workstation (WS), called the Writer. The document is processed by an inference engine in the WS which also appends the list of Signers and Reviewers. The document is then sent to a Knowledge Base Server (KBS) which adds additional information regarding the distribution of the document. Each document contains headers for the communications network in the organization, distribution control header, and the document text body. The KBS stores the document according to the user profiles in the organizations. Activity of reviewing and signing the documents is originated at the user WS. The document is retrieved from the KBS, reviewed by the user, signed and returned to the KBS for intermediate storage. When the KBS has determined that the document has all the required signatures (Signwords), the document is sent to the final destination. The automated document distribution system summarized above has been demonstrated using a C language implementation on PC workstations and a UNIX-based KBS. The PCs are AT&T 6300 systems and the KBS is an AT&T 3B2/310 system. The communications network is a Sytek LocalNet 20 broadband local area network. Knowledge about document processing and distribution is distributed between local workstations' knowledge bases and the KBS. The second phase of the project involves implementing the system using AI and expert systems tools in the PCs and KBS.
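
The hold-until-signed behavior of the Knowledge Base Server described in the abstract above can be modelled with a small in-memory sketch. Everything here is an assumption for illustration: the prototype was written in C on networked machines, while this toy uses Python, and the class and field names (`Document`, `KnowledgeBaseServer`, `required_signers`) are hypothetical. Inference-engine routing rules and reviewer handling are not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    destination: str
    required_signers: set
    signatures: set = field(default_factory=set)

class KnowledgeBaseServer:
    """Holds documents until every required signer has signed (toy model)."""

    def __init__(self):
        self.pending = {}      # doc_id -> Document awaiting signatures
        self.delivered = []    # (destination, doc_id) pairs, in delivery order

    def submit(self, doc_id, doc):
        self.pending[doc_id] = doc
        self._release_if_complete(doc_id)

    def sign(self, doc_id, user):
        doc = self.pending[doc_id]
        if user not in doc.required_signers:
            raise PermissionError(f"{user} is not a signer for {doc_id}")
        doc.signatures.add(user)
        self._release_if_complete(doc_id)

    def _release_if_complete(self, doc_id):
        doc = self.pending[doc_id]
        # Release only when the collected signatures cover all required signers.
        if doc.required_signers <= doc.signatures:
            del self.pending[doc_id]
            self.delivered.append((doc.destination, doc_id))
```

Because the server, not the sender, decides when a document is complete, signers can act in any order and at any time they are available.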

Journal Article
TL;DR: A new thinning method is described and some of its properties are given and a study to assess the feasibility of thinning methods for document processing is performed.
Abstract: The aim of this paper is to present the results of a study which was performed to assess the feasibility of thinning methods for document processing. As a result of these investigations, both theoretical and numerical, a new thinning method is described and some of its properties are given.

Patent
25 Feb 1988
TL;DR: In this paper, a hand-written table is prepared in an official format by making even the widths of columns and characters on lines in the table and generating a table with one line if its original is of one line.
Abstract: PURPOSE: To prepare a hand-written table in an official format by making even the widths of columns and characters on lines in a table and generating a table with one line if its original is of one line. CONSTITUTION: A document processing part in a document recognizing and processing device is constituted of an in-column character count means 11, a maximum and minimum column character number extraction part 12, a maximum and minimum combination adding means 13, an in-line character number comparison and decision means 14, 1st-3rd memories M1-M3, etc. The maximum and minimum combination adding means 13 executes prescribed operations according to the numbers of characters in the maximum and minimum columns, which said extraction part 12 extracts. The means 14 compares the results that said adding means 13 calculates with the number of characters permissible on one line in a converted document format, and decides the number of characters in the column of each line.

01 Jan 1988
TL;DR: The integration of document processing technology is described to develop a system architecture, based on a page description language, to provide network-wide capabilities in a distributed computing environment.
Abstract: At Los Alamos National Laboratory, we have developed an integrated document preparation system that produces publication-quality documents. This system combines text formatters and computer graphics capabilities that have been adapted to meet the needs of users in a large scientific research laboratory. This paper describes the integration of document processing technology to develop a system architecture, based on a page description language, to provide network-wide capabilities in a distributed computing environment. We describe the Laboratory requirements, the integration and implementation issues, and the challenges we faced developing this system.

Patent
Naoki Shimada
29 Apr 1988
TL;DR: In this article, a document processing apparatus includes a pitch control circuit for varying the character pitch in accordance with the attribute imparted by the attribute imparting circuitry, causing the printer to print the characters at a varied character pitch.
Abstract: A document processing apparatus has a printer for printing characters; a keyboard used for inputting characters such as alphabetical characters, symbols and numerical characters; attribute imparting circuitry for imparting an attribute to characters input by the keyboard so as to vary the character spacing of the characters; and a pitch control circuit for varying the character pitch in accordance with the attribute imparted by the attribute imparting circuitry to cause the printer to print the characters at a varied character pitch. Accordingly, it is possible to print a numerical expression while providing suitable spacings between adjacent alphanumerical characters and between one of such characters and any adjacent symbol.

Patent
06 Jun 1988
TL;DR: In this paper, a processor gives an inquiry to an external memory 4 when the document producing/editing processes are started, and when the memory 4 is not available, the documents are produced and edited for the main memory 3 only.
Abstract: PURPOSE:To perform the producing/editing processes of documents regardless of the state of an external memory by giving an answer to the inquiry received from a processor which performs the document producing/editing processes in terms of propriety for use of an external memory. CONSTITUTION:A processor 2 gives an inquiry to an external memory 4 when the document producing/editing processes are started. When the memory 4 is available, the document producing/editing jobs are carried out for the document files stored in a main memory 3 and the memory 4. If the memory 4 is not available, the documents are produced and edited for the memory 3 only. Thus the document processing/editing processes are possible even in case the memory 4 is not available.

Patent
09 Dec 1988
TL;DR: In this paper, a character for which a recognition error is generated is corrected by simple processing, by communicating the processing progress condition and the recognition candidate characters between a recognizing task and a correcting task through a communicating means.
Abstract: PURPOSE: To correct a character, for which a recognition error is generated, by simple processing by executing the communication of a processing progress condition and a recognition candidate character between a recognizing task and a correcting task by using a communicating means. CONSTITUTION: The recognizing task is provided to extract the recognition candidate character by segmenting a recognition object character from a picture, which includes the recognition object character, and collating the character with a dictionary which stores character information. Then, the correcting task is provided to execute the correction and editing of the recognition candidate character. The communication of the processing progress condition and recognition candidate character is executed between the recognizing task and correcting task by using the communicating means. Thus, by using the communicating means while executing recognition processing in the recognizing task, or by using a shared memory and the communicating means, the communication of the processing progress condition and recognition candidate character is executed. Then, the correction of the recognition candidate character can be simultaneously executed by the correcting task. COPYRIGHT: (C)1990,JPO&Japio



Proceedings ArticleDOI
01 Jun 1988
TL;DR: A decision making method capable of dealing with the problems faced in real-life applications is developed and its performance on 2759 totally unconstrained handwritten numerals is measured.
Abstract: A decision making method capable of dealing with the problems faced in real-life applications is developed. Its performance on 2759 totally unconstrained handwritten numerals is measured. The estimated recognition reliability of this method for the training set samples is 99.8% and for the testing set is 99.06%.