Patent

Title extracting apparatus for extracting title from document image and method thereof

TLDR
A title extracting apparatus scores the character string rectangles in a document image by their likelihood of being a title and extracts the highest-scoring one as the title rectangle, which may lie inside a table in a tabulated document. Characters recognized from the title rectangle are then used as keywords for the document image.
Abstract
A title extracting apparatus scans the black pixels in a document image and extracts the rectangular regions circumscribing connected regions of black pixels as character rectangles. It then unifies adjoining character rectangles and extracts the rectangular regions circumscribing them as character string rectangles. Next, for each character string rectangle it calculates points expressing the likelihood of being a title, based on attributes such as an underline attribute, a frame attribute, and a ruled line attribute, on the rectangle's position in the document image, and on its positional relationship to the other rectangles, and it extracts the character string rectangle with the highest points as the title rectangle. For a tabulated document, the apparatus can extract a title rectangle from inside the table. Characters recognized from the title rectangle by the character recognition process are used as keywords for the document image.
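The three stages in the abstract (character rectangles from connected black pixels, unification into character string rectangles, and point-based title selection) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented scoring scheme: the image is a list of rows with 1 for black, connectivity is 8-neighbour, strings are merged in a single greedy pass using a hypothetical `max_gap`, and the point values (`ATTRIBUTE_POINTS`, the height, upper-third, and centring bonuses) are invented for illustration. Detecting underlines, frames, and ruled lines, and scoring mutual positions, are not implemented; attributes are supplied by the caller through the hypothetical `attrs` mapping.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical point values; the patent names these attributes, but these
# numbers are illustrative assumptions, not its actual scoring rules.
ATTRIBUTE_POINTS = {"underline": 4.0, "frame": 4.0, "ruled_line": 2.0}


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with inclusive pixel bounds (hashable)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0 + 1

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))


def character_rectangles(image):
    """Circumscribe each 8-connected region of black (1) pixels."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append(Rect(x0, y0, x1, y1))
    return rects


def unify_strings(rects, max_gap=2):
    """Single greedy pass: merge vertically overlapping rectangles whose
    horizontal gap is at most max_gap pixels (hypothetical threshold)."""
    strings = []
    for r in sorted(rects, key=lambda c: (c.x0, c.y0)):
        for i, s in enumerate(strings):
            overlaps = min(s.y1, r.y1) >= max(s.y0, r.y0)
            if overlaps and r.x0 - s.x1 - 1 <= max_gap:
                strings[i] = s.union(r)
                break
        else:
            strings.append(r)
    return strings


def title_score(s, page_w, page_h, attributes=()):
    """Hypothetical title-likelihood points for one string rectangle."""
    score = float(s.height)                                 # larger type
    if s.y0 < page_h / 3:                                   # upper third
        score += 5.0
    if abs((s.x0 + s.x1) / 2 - page_w / 2) < page_w / 8:    # roughly centred
        score += 3.0
    score += sum(ATTRIBUTE_POINTS.get(a, 0.0) for a in attributes)
    return score


def extract_title(image, attrs=None):
    """Return the highest-scoring string rectangle, or None if the page is blank."""
    attrs = attrs or {}
    page_h = len(image)
    page_w = len(image[0]) if page_h else 0
    strings = unify_strings(character_rectangles(image))
    if not strings:
        return None
    return max(strings, key=lambda s: title_score(s, page_w, page_h, attrs.get(s, ())))
```

For example, on a 20×12 page with two tall glyphs centred near the top and three short glyphs at the bottom left, `extract_title` returns the merged top rectangle, because it collects the height, upper-third, and centring points. Passing `attrs={rect: {"underline"}}` lets an external underline detector raise a candidate's score without changing the geometry code.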

