Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.


Papers
Patent
20 May 2003
TL;DR: This patent proposes a method of automated document structure identification based on visual cues, which can be applied in the generation of extensible mark-up language (XML) files, natural language parsing, and search engine ranking mechanisms.
Abstract: A method of automated document structure identification based on visual cues is disclosed herein. The two-dimensional layout of the document is analyzed to discern visual cues related to the structure of the document, and the text of the document is tokenized so that similarly structured elements are treated similarly. The method can be applied in the generation of extensible mark-up language files, natural language parsing, and search engine ranking mechanisms.

91 citations
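
As a rough illustration of the idea in the abstract above (treating similarly formatted elements alike), the following minimal sketch assigns one structural token per distinct visual signature. It is not the patented method: the `Line` record, the choice of cues (font size, bold, indent), and the rule that the most frequent signature is body text are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Hypothetical record for one text line and its visual cues."""
    text: str
    font_size: float
    bold: bool
    indent: int  # left offset in arbitrary layout units


def structure_tokens(lines):
    """Give lines that share the same visual cues the same structural token."""
    signatures = [(l.font_size, l.bold, l.indent) for l in lines]
    # Assumption: the most frequent signature corresponds to body text.
    body = Counter(signatures).most_common(1)[0][0]
    # Remaining signatures are ranked as heading levels, largest font first.
    others = sorted(
        {s for s in signatures if s != body},
        key=lambda s: (-s[0], not s[1], s[2]),
    )
    names = {body: "BODY"}
    for level, sig in enumerate(others, start=1):
        names[sig] = f"H{level}"
    return [(names[sig], l.text) for sig, l in zip(signatures, lines)]
```

The function expects a non-empty list of lines. Because the tokens depend only on the visual signature, two headings set in the same style always receive the same label, whatever their text says.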

Patent
Robert Cooperman1
23 May 1996
TL;DR: This patent presents a system that provides information on the structure of a document page to complement the textual information from an optical character recognition system. It can be used to produce a file editable in a native word-processing environment from input data including the content and characteristics of regions of at least one page of the document.
Abstract: The present invention is a system for providing information on the structure of a document page so as to complement the textual information provided in an optical character recognition system. The system employs a method that can be used to produce a file editable in a native word-processing environment from input data including the content and characteristics of regions of at least one page forming the document. The method includes the steps of: (a) identifying sections within the page; (b) identifying captions; (c) determining boundaries of at least one column on the page, and optionally (d) resizing at least one element of the page of the document so that all pages of the document are of a common size.

91 citations
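
Step (c) of the abstract above, determining column boundaries, could be approximated as follows. This is a minimal sketch under stated assumptions, not the patent's procedure: regions are taken to be `(x0, y0, x1, y1)` bounding boxes, and a column is any maximal run of regions whose horizontal extents are separated by less than a hypothetical `min_gap`.

```python
def column_boundaries(regions, min_gap=10):
    """Merge the x-extents of regions into columns split at gaps >= min_gap.

    regions: iterable of (x0, y0, x1, y1) bounding boxes (assumed format).
    Returns a list of (left, right) column extents, ordered left to right.
    """
    extents = sorted((x0, x1) for x0, _y0, x1, _y1 in regions)
    columns = []
    for x0, x1 in extents:
        if columns and x0 - columns[-1][1] < min_gap:
            # Overlaps or nearly touches the current column: widen it.
            columns[-1][1] = max(columns[-1][1], x1)
        else:
            # A sufficiently wide horizontal gap starts a new column.
            columns.append([x0, x1])
    return [tuple(c) for c in columns]
```

One limitation of this simple merge is that a full-width region, such as a page title, bridges every gap and collapses the result into a single column, so such regions would need to be filtered out beforehand.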

Patent
06 Mar 2006
TL;DR: This patent describes a method and apparatus for automated document layout creation. The method receives a first layout of document image objects and creates a second layout subject to placement constraints, at least one of which is based on the content of the objects.
Abstract: A method and apparatus for automated document layout creation is disclosed. In one embodiment, the method comprises receiving a first layout of document image objects and creating a second layout of document image objects subject to placement constraints corresponding to placement of document image objects, at least one of the placement constraints being based on object content in one or more of the document image objects.

90 citations

Patent
14 Jan 2004
TL;DR: This patent proposes a method that generates a minimum set of simplified and navigable web contents from a single web document that is oversized for targeted smaller devices, while preserving text, image, transactional, and embedded presentation constraint information.
Abstract: A method is disclosed to generate, while preserving text, image, transactional and embedded presentation constraint information, a minimum set of simplified and navigable web contents from a single web document that is oversized for targeted smaller devices. The method includes a parser, a content tree builder, a document tree builder, a document simplifier, a virtual layout engine, a document partitioner, a content scalar and a markup generator. The parser generates markup and data tags from an HTML source document. The builder constructs a content tree. The simplifier transforms the document tree into an intermediate one defined by a subset of XHTML tags and attributes. Layout constraints, including size, area, placement order, and column/row relationships, are calculated for partitioning and scaling the document tree into sub document trees with assigned navigation order and hierarchical hyperlinks. A simplified HTML document is then generated with the markup generator.

90 citations
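
The partitioning step described in the abstract above (splitting a document tree into sub-document trees with an assigned navigation order) can be pictured with a small greedy sketch. Everything here is an assumption for illustration: the `Node` class, the single integer `size` standing in for the layout constraints, and the `budget` representing what fits on the target device. Only leaf nodes are assumed to carry a size.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical document-tree node; only leaves are assumed to have a size."""
    tag: str
    size: int = 0
    children: list = field(default_factory=list)

    def total(self):
        return self.size + sum(c.total() for c in self.children)


def partition(node, budget):
    """Split a tree into ordered parts whose total size fits within budget.

    Document order is preserved, so the index of each part can serve as its
    navigation order. A single leaf larger than budget gets a part of its own.
    """
    if node.total() <= budget or not node.children:
        return [[node]]
    parts, current, used = [], [], 0
    for child in node.children:
        for piece in partition(child, budget):
            cost = sum(n.total() for n in piece)
            if current and used + cost > budget:
                # The current part is full: close it and start the next one.
                parts.append(current)
                current, used = [], 0
            current.extend(piece)
            used += cost
    if current:
        parts.append(current)
    return parts
```

Each returned part could then be rendered as one simplified page, with links between consecutive parts providing the navigation.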

Book
01 Jan 1995
TL;DR: This book presents a conceptual framework for document analysis, that is, the conversion of a document's pixel representation into an equivalent knowledge network representation holding the document's content and layout.
Abstract: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting on the pixel level, the formation of elementary geometric objects on which layout analysis as well as the definition of character objects is based is described. Character recognition accomplishes the mapping from geometric object to character meaning in ASCII representation. On the next level of abstraction words are formed and verified by contextual processing. Modeled knowledge about complete documents and about how their constituents are related to the application forms the highest level of abstraction. The various problems arising at each stage are discussed. The dependencies between the different levels are exemplified and technical solutions put forward.

89 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9