Patent

Detecting and extracting image document components to create flow document

TLDR
Text, paths, and images are extracted from a binarized image document, stored in a data store, and retrieved to create a flow document that may adapt better to a variety of reading experiences and provide editable documents.
Abstract
One or more components of an image document may be detected and extracted in order to create a flow document from the image document. Components of an image document may include text, one or more paths, and one or more images. The text may be detected using optical character recognition (OCR) and the image document may be binarized. The detected text may be extracted from the binarized image document to enable detection of the paths, which may then be extracted from the binarized image document to enable detection of the images. In some examples, the images, similar to the text and paths, may be extracted from the binarized image document. The extracted text, paths, and/or images may be stored in a data store, and may be retrieved in order to create a flow document that may provide better adaption to a variety of reading experiences and provide editable documents.
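The ordering described in the abstract (text first, then paths, then images) can be illustrated with a minimal Python sketch. It is not the patented method: the fixed threshold, the "one pixel thick" rule for paths, the connected-component step, and all function names are assumptions made for illustration, and the text boxes are assumed to come from a separate OCR engine that is not shown.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def erase_boxes(binary, boxes):
    """Return a copy with each inclusive (top, left, bottom, right) box cleared."""
    out = [row[:] for row in binary]
    for top, left, bottom, right in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                out[y][x] = 0
    return out

def connected_components(binary):
    """Return bounding boxes of 4-connected ink regions, in scan order."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] == 0 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, left, bottom, right = y0, x0, y0, x0
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def is_path(box, max_thickness=1):
    """Assumed heuristic: a region is a path if its bounding box is very thin."""
    top, left, bottom, right = box
    return min(bottom - top + 1, right - left + 1) <= max_thickness

def extract_components(gray, text_boxes):
    """Extract text, then paths, then images; a dict stands in for the data store."""
    binary = binarize(gray)
    no_text = erase_boxes(binary, text_boxes)       # text removed so paths can be detected
    paths = [b for b in connected_components(no_text) if is_path(b)]
    no_paths = erase_boxes(no_text, paths)          # paths removed so images can be detected
    images = connected_components(no_paths)
    return {"text": list(text_boxes), "paths": paths, "images": images}

# Toy 8x10 page: a line of "text" strokes, a horizontal rule, and a solid block.
page = [[255] * 10 for _ in range(8)]
for x in (1, 2, 4):
    page[1][x] = 0
for x in range(10):
    page[3][x] = 0
for y in (5, 6):
    for x in (2, 3, 4):
        page[y][x] = 0

result = extract_components(page, text_boxes=[(1, 1, 1, 4)])
print(result)
```

In this toy page the text strokes are themselves thin, so they would be mistaken for paths if the text boxes were not erased first, which is one reason an ordering like the one in the abstract can be useful.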

Citations
Patent

Determining the direction of rows of text

TL;DR: The page orientation component of an image processing device receives an image of a document, transforms the image to a binarized image by performing a binarization operation on the image, and identifies a portion of the binarized image that comprises one or more rows of textual content.
Patent

Method of scanning document and image forming apparatus for performing the same

TL;DR: A method of scanning a document includes obtaining an original image by scanning the document, detecting at least one pair of marks disposed on the original image, and extracting from the original image an image of an area that is defined by the detected at least one pair of marks.
Patent

Componentized Data Storage

TL;DR: A system that includes a hardware processor, a system memory, and a data componentization unit including a data resolution module and a data archiving module stored in the system memory.
Patent

Automated methods and systems of identifying image fragments in document-containing images to facilitate extraction of information from identificated document-containing image fragments

TL;DR: Each feature detector creates, from the image, a set of features associated with that detector; for each of one or more document type models, the document type model is applied to the resulting image.
Patent

Method for recognizing table, flowchart and text in document images

Wei Ming
TL;DR: A method for recognizing a binary document image as a table, pure text, or flowchart is proposed. The method is based on side profiles of the image for each of the four sides: for each side, a boundary removal size N is calculated from the widths of lines or strokes closest to that side, a boundary of size N is removed from the document image, and the side profile is re-calculated after the removal.
References
Journal ArticleDOI

Document representation and its application to page decomposition

TL;DR: A new document model that preserves top-down generation information is proposed; based on this model, a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis.
Patent

Method for inset detection in document layout analysis

TL;DR: A method is presented for detecting insets in the structure of a document page so as to further complement the document layout and textual information provided in an optical character recognition system.
Patent

Camera-based document imaging

TL;DR: A process and system to transform a digital photograph of a text document into a scan-quality image are described. But the system is limited to text documents and cannot handle images with text lines.
Proceedings ArticleDOI

Document layout structure extraction using bounding boxes of different entitles

TL;DR: An efficient technique for document page layout structure extraction and classification that analyzes the spatial configuration of the bounding boxes of different entities on the given image and segments the image into a list of homogeneous zones.
Patent

Systems and methods for automatically reducing data search space and improving data extraction accuracy using known constraints in a layout of extracted data elements

TL;DR: A method is provided for automatically narrowing data search space and improving accuracy of data extraction using known constraints in a layout of extracted data elements for classified documents. The method includes analyzing each document to classify it within a document category, each category having a corresponding set of expected layouts.