Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published on this topic, receiving 34,021 citations.


Papers
Patent
24 Feb 2011
TL;DR: A document image generating apparatus that preserves the layout of the original text in the original text image while improving the readability of both the original text and its annotation (e.g., a translation).
Abstract: The aim is to provide a document image generating apparatus, a document image generating method, and a computer program that preserve the layout of the original text in the original text image and improve the readability of both the original text and the annotation corresponding to it (e.g., a translation). A translation 421 of original text 411 is placed in the interline space between the original text 411 on the first line and the original text 412 on the second line. When the interline space is narrow, as shown in FIG. 4B, the original text 411 overlays the translation 421. In that case, the color of the original text 411 is changed to a low-visibility color, and the color of the translation 421 is changed to a high-visibility color.

12 citations

Patent
21 Jul 2009
TL;DR: A method of storing a document together with one or more related images of alterations made to it: images of the document and of its altered version are captured and compared, and an image of the extracted differences is stored in memory.
Abstract: Methods for storing and managing hard copy documents and their modified versions are disclosed. Specifically, a method of storing a document and one or more related images of alterations made to the document, comprising capturing an image of the document; storing the image of the document in memory; capturing an image of an altered version of the document; comparing the image of the document to the image of the altered version of the document; extracting the differences between the image of the document and the image of the altered version of the document; creating an image of the extracted differences between the image of the document and the image of the altered version of the document; and storing the image of the extracted differences in memory.

12 citations
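
The capture, compare, extract, and store steps in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented method: it assumes both images are grayscale, already aligned, and the same size, and the names `extract_differences`, `threshold`, and the in-memory `store` dictionary are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels (0-255), stored row by row

BACKGROUND = 255  # assumed "blank paper" value used for unchanged pixels


def extract_differences(original: Image, altered: Image, threshold: int = 30) -> Image:
    """Return an image holding only the pixels of `altered` that differ from `original`.

    Assumes the two images are already aligned and have identical dimensions.
    """
    if len(original) != len(altered) or any(
        len(row_o) != len(row_a) for row_o, row_a in zip(original, altered)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [a if abs(a - o) > threshold else BACKGROUND for o, a in zip(row_o, row_a)]
        for row_o, row_a in zip(original, altered)
    ]


# Hypothetical in-memory store: the original image plus one difference image.
store: Dict[str, Image] = {}
original = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
altered = [[255, 255, 255], [255, 0, 255], [255, 0, 255]]  # one added dark pixel
store["document"] = original
store["document.diff1"] = extract_differences(original, altered)
```

Storing only the difference image keeps each alteration separate from the original. A real system would also need image registration before the comparison, which this sketch leaves out.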

Patent
09 Nov 1990
TL;DR: A text file represented as linear character strings is converted into a hierarchical tree structure by analyzing the index character strings that correspond to the chapters, paragraphs, and clauses in the main body of a document and automatically generating the tree-shaped logical structure.
Abstract: PURPOSE: To convert a text file which is represented with linear character strings into a hierarchical tree structure by analyzing index character strings corresponding to the chapters, paragraphs, and clauses in the main body of a document and automatically generating the tree-shaped logical structure. CONSTITUTION: A document read part 101 recognizes the characters of inputted document image data, and the recognized document data are stored, document by document, in a document data storage part 103. An index symbol analytic part 102 extracts index symbols and generates the logical structures of the documents from the meaning of the index symbols, and the generated logical structures are stored in the logical structure data storage part 104. A display control part 105 displays the logical structure of a document on a terminal device 106 with a screen according to the stored logical structure data. Consequently, the document file which is represented with linear character strings can be converted into the hierarchical tree structure. COPYRIGHT: (C)1992,JPO&Japio

12 citations
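
As a rough illustration of the index-symbol analysis described above, the sketch below turns a flat list of lines into a tree. It assumes the index symbols are dotted decimals such as "1", "1.2", or "1.2.3"; the patent does not specify this format, and `Node`, `INDEX_RE`, and `build_tree` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    index: str
    title: str
    children: List["Node"] = field(default_factory=list)


# Assumed index format: dotted decimals, optionally followed by a period.
INDEX_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")


def build_tree(lines: List[str]) -> Node:
    """Build a logical-structure tree from lines that begin with index symbols."""
    root = Node(index="", title="document")
    stack = [(0, root)]  # (depth, node) pairs along the current branch
    for line in lines:
        match = INDEX_RE.match(line.strip())
        if match is None:
            continue  # body text is ignored in this sketch
        index, title = match.groups()
        depth = index.count(".") + 1
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1][0] >= depth:
            stack.pop()
        node = Node(index, title)
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root


tree = build_tree([
    "1 Introduction",
    "Some body text.",
    "1.1 Background",
    "1.2 Scope",
    "2 Method",
    "2.1 Setup",
])
```

The depth of each heading is read directly from its index symbol, so a stack of open ancestors is enough to attach every node to the right parent. Body lines that happen to start with a number would be misread as headings; a real analyzer would need stricter rules.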

Journal Article
TL;DR: Proposes a supervised learning approach for segmenting text and illustrations in digitized old documents, using a texture feature based on local correlation that detects the repeating patterns of text regions and differentiates them from pictorial elements.
Abstract: In this paper we describe a system for automatically analyzing old documents and creating hyperlinks between different epochs, thus opening ancient documents to young people and making them available on the web alongside both old and current content. We propose a supervised learning approach to segment text and illustrations in digitized old documents using a texture feature based on local correlation, aimed at detecting the repeating patterns of text regions and differentiating them from pictorial elements. Moreover, we present a solution to help the user find contemporary content connected to what is automatically extracted from the ancient documents.

12 citations
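
The abstract above does not define its texture feature, so the following is only a loose illustration of why correlation can reveal repeating text patterns. It computes the normalized autocorrelation of a row projection profile (the amount of ink in each pixel row), which is an assumed stand-in and not the paper's feature; `autocorrelation` and the sample profile are hypothetical.

```python
from typing import Sequence


def autocorrelation(profile: Sequence[float], lag: int) -> float:
    """Normalized autocorrelation of a 1-D profile at the given lag.

    Values near 1 mean the profile repeats with that period; a constant
    profile has no variance and returns 0.0.
    """
    n = len(profile)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(profile) - 1")
    mean = sum(profile) / n
    centered = [p - mean for p in profile]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return 0.0
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance


# Toy profile: ink rows alternating with blank rows, like evenly spaced text lines.
text_like = [1, 0] * 8
period_score = autocorrelation(text_like, lag=2)       # lag equal to the line pitch
half_period_score = autocorrelation(text_like, lag=1)  # lag out of phase with it
```

On this toy profile the score is high at the line pitch and negative half a period away. Regularly spaced text lines tend to produce such peaks, while pictorial regions usually lack them. A supervised classifier, as in the paper, would consume features of this kind rather than a fixed threshold.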

Patent
Hiromi Oda
07 Oct 2005
TL;DR: A document classifying device including a vector creating element that creates a document feature vector from an input document to be classified, based on the frequencies with which predetermined collocations occur in the input document, and a classifying element that assigns the document to one of a number of categories using that vector.
Abstract: A document classifying device, including (a) a vector creating element for creating a document feature vector from an input document to be classified, based upon frequencies with which predetermined collocations occur in the input document; and (b) a classifying element for classifying the input document into one of a number of categories using the document feature vector.

12 citations
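
The two elements named in the abstract above, a vector creating element and a classifying element, can be sketched as follows. The sketch treats a collocation as an adjacent word pair and classifies by cosine similarity to one prototype vector per category. Both choices are assumptions, since the abstract does not say how collocations are matched or which classifier is used, and all names and sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Collocation = Tuple[str, str]  # assumed here to be an adjacent word pair


def feature_vector(text: str, collocations: Sequence[Collocation]) -> List[int]:
    """Count how often each predetermined collocation occurs in the text."""
    tokens = text.lower().split()  # naive whitespace tokenization
    bigrams = list(zip(tokens, tokens[1:]))
    return [bigrams.count(c) for c in collocations]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(text: str, collocations: Sequence[Collocation],
             prototypes: Dict[str, Sequence[float]]) -> str:
    """Assign the text to the category whose prototype vector is most similar."""
    vec = feature_vector(text, collocations)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


# Hypothetical collocations and category prototypes.
collocations = [("interest", "rate"), ("penalty", "kick")]
prototypes = {"finance": [1, 0], "sports": [0, 1]}
label = classify("The interest rate rose after the interest rate cap was lifted",
                 collocations, prototypes)
```

In practice the prototypes would be learned from labeled documents, and tokenization would need to handle punctuation and inflection.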


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9