Detecting and extracting image document components to create flow document

Patent

TL;DR

Text, paths, and images are extracted from a binarized image document, stored in a data store, and retrieved to create a flow document that adapts better to a variety of reading experiences and can be edited.

Abstract:

One or more components of an image document may be detected and extracted in order to create a flow document from the image document. Components of an image document may include text, one or more paths, and one or more images. The text may be detected using optical character recognition (OCR) and the image document may be binarized. The detected text may be extracted from the binarized image document to enable detection of the paths, which may then be extracted from the binarized image document to enable detection of the images. In some examples, the images, similar to the text and paths, may be extracted from the binarized image document. The extracted text, paths, and/or images may be stored in a data store, and may be retrieved in order to create a flow document that may provide better adaption to a variety of reading experiences and provide editable documents.
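The extraction order described in the abstract (text first, then paths, then images) can be sketched on a toy grayscale grid. This is only an illustration under stated assumptions, not the patented method: the OCR result is passed in as a hypothetical `text_boxes` argument rather than computed, the fixed threshold and the "thin bounding box means path" heuristic are stand-ins, and every function name here is invented for the example.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 for ink (dark) and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def erase(binary, box):
    """Clear all pixels inside box = (top, left, bottom, right), inclusive."""
    top, left, bottom, right = box
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            binary[r][c] = 0

def components(binary):
    """Return bounding boxes of 4-connected ink regions, in scan order."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def is_path(box, max_thickness=1):
    """Assumed heuristic: a region whose bounding box is thin counts as a path."""
    top, left, bottom, right = box
    return min(bottom - top + 1, right - left + 1) <= max_thickness

def extract_components(gray, text_boxes):
    """text_boxes stands in for OCR output: a list of (string, box) pairs."""
    binary = binarize(gray)
    store = {"text": [], "paths": [], "images": []}
    # Step 1: record the text and remove it so it cannot be mistaken for a path.
    for content, box in text_boxes:
        store["text"].append({"box": box, "content": content})
        erase(binary, box)
    # Step 2: detect paths in what remains, then remove them as well.
    for box in components(binary):
        if is_path(box):
            store["paths"].append({"box": box})
    for item in store["paths"]:
        erase(binary, item["box"])
    # Step 3: whatever ink is left is treated as image regions.
    for box in components(binary):
        store["images"].append({"box": box})
    return store

def build_flow_document(store):
    """Order the stored components top-to-bottom, then left-to-right."""
    items = [(kind, item) for kind, group in store.items() for item in group]
    return sorted(items, key=lambda pair: pair[1]["box"][:2])

# Toy 8x8 page: a text glyph on row 0, a rule on row 2, a block on rows 4-6.
gray = [[255] * 8 for _ in range(8)]
for c in (0, 1, 3):
    gray[0][c] = 0
for c in range(8):
    gray[2][c] = 0
for r in range(4, 7):
    for c in range(1, 4):
        gray[r][c] = 0

store = extract_components(gray, [("Hi", (0, 0, 0, 3))])
flow = build_flow_document(store)
for kind, item in flow:
    print(kind, item)
```

Removing each component class before detecting the next keeps the later, cruder detectors from misclassifying ink that has already been accounted for, which is the ordering the abstract describes. A real system would replace the threshold, the OCR stand-in, and the thinness test with far more robust techniques.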

Citations

Patent

Determining the direction of rows of text

Zagaynov Ivan Germanovich, +1 more

TL;DR: The page orientation component of an image processing device receives an image of a document, transforms the image into a binarized image by performing a binarization operation on it, and identifies a portion of the binarized image that comprises one or more rows of textual content.

Patent

Method of scanning document and image forming apparatus for performing the same

Kyung-hoon Kang, +4 more

TL;DR: A method of scanning a document includes obtaining an original image by scanning the document, detecting at least one pair of marks disposed on the original image, and extracting from the original image an image of the area defined by the detected marks.

Patent

Componentized Data Storage

James A. Malone

TL;DR: A system that includes a hardware processor, a system memory, and a data componentization unit stored in the system memory, the unit including a data resolution module and a data archiving module.

Patent

Automated methods and systems of identifying image fragments in document-containing images to facilitate extraction of information from identificated document-containing image fragments

Zagaynov Ivan Germanovich, +1 more

TL;DR: Each feature detector creates, from the image, a set of features associated with that detector; then, for each of one or more document type models, the document type model is applied to the resulting image.

Patent

Method for recognizing table, flowchart and text in document images

Wei Ming

TL;DR: A method for recognizing a binary document image as a table, pure text, or flowchart. The method computes side profiles of the image for each of its four sides, calculates a boundary removal size N for each side based on the widths of the lines or strokes closest to that side, removes a boundary of size N from each side of the document image, and recalculates the side profile for each side after the removal.

References

Journal Article

Document representation and its application to page decomposition

Anil K. Jain, +1 more

01 Mar 1998

IEEE Transactions on Pattern Analysis an...

TL;DR: A new document model that preserves top-down generation information is proposed; based on this model, a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis.

Patent

Method for inset detection in document layout analysis

Robert Cooperman

TL;DR: A method for detecting insets in the structure of a document page, so as to complement the document layout and textual information provided by an optical character recognition system.

Patent

Camera-based document imaging

Martin Hunt, +9 more

TL;DR: A process and system for transforming a digital photograph of a text document into a scan-quality image. The system is limited to text documents and cannot handle images with text lines.

Proceedings Article

Document layout structure extraction using bounding boxes of different entities

Jisheng Liang, +3 more

TL;DR: An efficient technique for extracting and classifying document page layout structure by analyzing the spatial configuration of the bounding boxes of different entities on the given image; it segments an image into a list of homogeneous zones.

Patent

Systems and methods for automatically reducing data search space and improving data extraction accuracy using known constraints in a layout of extracted data elements

Girish Welling, +4 more

TL;DR: A method of automatically narrowing the data search space and improving the accuracy of data extraction, using known constraints in the layout of extracted data elements for classified documents. The method includes analyzing each document to classify it within a document category, each category having a corresponding set of expected layouts.

Related Papers (5)

System and method for extracting structured information from image documents

Mukhopadhyay Abhisek, +1 more

Document image database retrieval method, image feature vector extraction method, document image perusal system, medium which can be machine-read and image display method

Jiyon Efu Karen, +2 more

Method for processing document image captured by camera

Yu Nam Kim, +3 more

Automated methods and systems of identifying image fragments in document-containing images to facilitate extraction of information from identificated document-containing image fragments

Zagaynov Ivan Germanovich, +1 more

Document image output method and apparatus, tampering judging method and system, and program for controlling tampering judging system

Masahiko Suzaki, +1 more