Title extracting apparatus for extracting title from document image and method thereof

Patent

TLDR

This patent proposes a method for extracting a title rectangle from a document image, including from inside a table in a tabulated document; the characters extracted from the title rectangle are then used as keywords for the document image in the character recognition process.

Abstract:

A title extracting apparatus scans the black pixels in a document image and extracts, as character rectangles, the rectangular regions that circumscribe connected regions of black pixels. The apparatus then unifies adjoining character rectangles and extracts the rectangular regions that circumscribe them as character string rectangles. Next, it assigns each character string rectangle points reflecting its likelihood of being a title, based on the rectangle's attributes (such as an underline attribute, a frame attribute, and a ruled line attribute), its position in the document image, and its position relative to the other rectangles, and extracts the character string rectangle with the highest points as the title rectangle. For a tabulated document, the apparatus can extract a title rectangle from inside the table. The characters extracted from the title rectangle are recognized by the character recognition process and used as keywords for the document image.
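The three steps in the abstract can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation. Character rectangles are taken to be bounding boxes of 8-connected black-pixel regions found by a flood fill; rectangles that overlap vertically and are at most `max_gap` columns apart are unified into character string rectangles (horizontal text only). The abstract gives no scoring rule, so the point values in `ATTRIBUTE_POINTS`, the upper-third position bonus, and the relative-height term are hypothetical. Detecting underlines, frames, and ruled lines is not shown: a hypothetical caller-supplied `attributes_of` function reports them. Title extraction from inside tables is also omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical points per attribute; the abstract does not state actual weights.
ATTRIBUTE_POINTS = {"underline": 3, "frame": 3, "ruled_line": 1}


@dataclass
class Rect:
    left: int
    top: int
    right: int   # inclusive
    bottom: int  # inclusive

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1


def character_rectangles(image: List[List[int]]) -> List[Rect]:
    """Circumscribe each 8-connected region of black pixels (value 1)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    rects = []
    for y in range(rows):
        for x in range(cols):
            if image[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            box = Rect(x, y, x, y)
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one connected region
                cy, cx = queue.popleft()
                box.left, box.right = min(box.left, cx), max(box.right, cx)
                box.top, box.bottom = min(box.top, cy), max(box.bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append(box)
    return rects


def adjoins(a: Rect, b: Rect, max_gap: int) -> bool:
    """True if a and b overlap vertically and are at most max_gap columns apart."""
    overlaps_vertically = min(a.bottom, b.bottom) >= max(a.top, b.top)
    gap = max(a.left, b.left) - min(a.right, b.right) - 1
    return overlaps_vertically and gap <= max_gap


def string_rectangles(chars: List[Rect], max_gap: int = 2) -> List[Rect]:
    """Repeatedly unify adjoining rectangles until no pair adjoins."""
    strings = list(chars)
    merged = True
    while merged:
        merged = False
        for i in range(len(strings)):
            for j in range(i + 1, len(strings)):
                a, b = strings[i], strings[j]
                if adjoins(a, b, max_gap):
                    strings[i] = Rect(min(a.left, b.left), min(a.top, b.top),
                                      max(a.right, b.right), max(a.bottom, b.bottom))
                    del strings[j]
                    merged = True
                    break
            if merged:
                break
    return strings


def title_points(rect: Rect, strings: List[Rect], page_height: int,
                 attributes: Iterable[str]) -> float:
    """Hypothetical score: attribute points + page position + relative height."""
    points = float(sum(ATTRIBUTE_POINTS.get(a, 0) for a in attributes))
    if rect.top < page_height / 3:  # position in the image: upper third of the page
        points += 2
    tallest = max(s.height for s in strings)
    points += 2 * rect.height / tallest  # relation to the other strings
    return points


def extract_title(image: List[List[int]],
                  attributes_of: Callable[[Rect], Iterable[str]] = lambda r: (),
                  max_gap: int = 2) -> Optional[Rect]:
    """Return the character string rectangle with the highest points, if any."""
    strings = string_rectangles(character_rectangles(image), max_gap)
    if not strings:
        return None
    return max(strings,
               key=lambda s: title_points(s, strings, len(image), attributes_of(s)))
```

With no attributes reported, a tall string near the top of the page wins. Reporting `{"underline", "frame"}` for a lower string can outweigh the position and size terms, which loosely mirrors how the abstract combines attributes with position when scoring candidates.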

Citations

Patent

Apparatus and method for extracting management information from image

Yutaka Katsuyama, +2 more

TL;DR: In this article, a management information extraction apparatus learns the structure of the ruled lines of a document and the position of user-specified management information such as a title, etc., during a form learning process, and stores them in a layout dictionary.

Patent

Reading device with hierarchal navigation

Gretchen Anderson, +3 more

TL;DR: In this paper, a reading device consisting of a camera, at least one processor, and a user interface is described, where the camera scans at least a portion of a document having text to generate a raster file.

Patent

Text capture and presentation device

Lea Kobeli, +5 more

TL;DR: In this article, the authors proposed a method for capturing text found in a variety of sources, such as a magazine or a book, and transforming it into a different user-accessible format or medium.

Patent

Section extraction tool for PDF documents

Hui Chao, +1 more

TL;DR: In this paper, a method of extracting a section of a page from a portable document format file (pdf) is proposed, which includes receiving an indication of a user-defined region on a pdf file page, designating an extraction region including all elements determined to be within the user defined region, and placing the extraction region into a new file.

Patent

Document processing device, document processing method, and storage medium recording program therefor

Atsushi Itoh, +7 more

TL;DR: In this paper, a document processing device is described that includes a specifying unit that specifies character strings which have a common property across documents, from among character strings included in plural documents represented by plural corresponding document data, and a rewriting unit that rewrites, among the character strings specified by the specifying unit, character strings expressed in formats different from a defined format into character strings expressed in the defined format.

References

Journal Article

Automated entry system for printed documents

T. Akiyama, +1 more

01 Oct 1990

Pattern Recognition

TL;DR: Recognition experiments with a prototype system on a variety of complex printed documents show that the proposed system can read different types of printed documents at an accuracy rate of 94.8–97.2%.

Patent

Document image processing apparatus

Shuichi Tsujimoto

TL;DR: In this article, a document image processing apparatus is provided, which comprises a structure analyzing unit for analytically dividing an input document image into sub-blocks having a specified physical positional relationship to each other.

Patent

Area discrimination system for text image

Takashi Saito

TL;DR: In this article, an area discrimination system for binary images is proposed, which consists of a reducing unit for reducing a binary image supplied from an external unit, a skew detector for detecting skew of the binary image with respect to a predetermined direction, an extracting unit for extracting black connected components from the reduced image, a block forming unit for forming blocks, each of which includes a plurality of black connected components close to each other, a first merging unit for merging blocks satisfying first conditions into a character string by using the skew detected by said skew detector, and a second merging unit

Patent

Method and apparatus for recognizing table area formed in binary image of document

Goroh Bessho

TL;DR: In this paper, a method for recognizing a table area in a document is presented, which includes the steps of extracting image data on a table area from binary image data, extracting a line segment extending in a first direction from the image data on the table area, and extracting an extension of the line segment in a second direction perpendicular to the first direction.

Proceedings Article

Qualitative/fuzzy approach to document recognition

H. Fujihara, +1 more

TL;DR: It is shown that using qualitative reasoning in conjunction with fuzzy logic permits the production of a model that is capable of handling a wider class of documents despite variable locations and sizes of the blocks in the document.

Related Papers (5)

Title extracting device and its method for extracting title from file images

Header extracting device and method for extracting header from file image

Device and method for extracting title from document image

Method for extracting title from document image

Title extraction device, image reading device, title extraction method, and title extraction program