Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have appeared within this topic, receiving 34,021 citations.


Papers
Patent
01 Mar 1996
TL;DR: This patent describes a declarative specification for generating copyright notices, button bars, or other text to be included in most or all document fragments sent over the network.
Abstract: The ability to provide certain text in most, if not all, of the documents sent over the network by a server is provided by a mechanism which combines context information with a document, or document fragment, to be sent. The context information can be made dependent on the document type and defined by the style sheet. In particular, it may be defined by the style definition for a header or footer style for the particular document type. This feature is particularly useful for generating copyright notices, button bars, or other text to be included in most or all document fragments sent over the network. The context information is defined by a declarative specification that operates on document structure, which reduces document management effort.

95 citations

Proceedings ArticleDOI
Lawrence O'Gorman
30 Aug 1992
TL;DR: Three techniques are described for noise reduction from binary document pages to improve page appearance and subsequent optical character recognition and compression, and for subsampling the text image to fit on the computer screen while maintaining readability.
Abstract: Describes some of the document processing techniques used in the RightPages electronic library system. Since the system deals with scanned images of document pages, these techniques are critical to the use and appearance of the system. The author describes three techniques: (1) for noise reduction from binary document pages to improve page appearance and subsequent optical character recognition and compression; (2) for subsampling the text image to fit on the computer screen while maintaining readability; and (3) a document layout analysis technique to determine text blocks.

95 citations

Patent
28 Aug 1996
TL;DR: A method and system use a processor, video display and relational database to format, define, generate, maintain, distribute and analyze sets of related documents; a computer-controlled video display guides the document author through the creation and maintenance of the documents, while enforcing the document structure and dependency rules.
Abstract: A method and system use a processor, video display and relational database to format, define, generate, maintain, distribute and analyze sets of related documents. The process stores document-specific data in a relational database. A computer-controlled video display guides the document author through the creation and maintenance of the documents, while enforcing the document structure and dependency rules. The creation of documents that relate to activities that require multiple actions at specified time points is driven by a time/action electronic matrix that serves as a central control mechanism for each document set; an example is a document set for use in clinical research studies of drugs. Document sets may be electronically transferred between computers by initial storage at each computer of assembly instructions and formatting information and content common to all documents in the document set, and subsequently by transmitting only document-specific information for each document, with assembly of the document being performed by the receiving computer.

94 citations

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper proposes an open-source implementation of a CNN-based pixel-wise predictor coupled with task dependent post-processing blocks and shows that a single CNN-architecture can be used across tasks with competitive results.
Abstract: In recent years there have been multiple successful attempts at tackling document processing problems separately by designing task-specific hand-tuned strategies. We argue that the diversity of historical document processing tasks makes it impractical to solve them one at a time and shows a need for designing generic approaches in order to handle the variability of historical series. In this paper, we address multiple tasks simultaneously, such as page extraction, baseline extraction, layout analysis, or multiple typologies of illustration and photograph extraction. We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple and standard reusable operations, adding to the flexibility of our approach.
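The abstract above does not name the reusable post-processing operations. As a rough illustration only, the sketch below assumes two common ones, thresholding a per-pixel probability map and grouping foreground pixels into connected components, and returns bounding boxes. The function names, the 0.5 cutoff, and the minimum-area filter are hypothetical and are not taken from the paper's implementation.

```python
from collections import deque


def threshold(prob_map, cutoff=0.5):
    """Binarize a 2D list of probabilities (assumed operation, cutoff is illustrative)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in prob_map]


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def extract_regions(prob_map, cutoff=0.5, min_area=2):
    """Chain the two steps and drop boxes whose bounding-box area is below min_area."""
    boxes = connected_components(threshold(prob_map, cutoff))
    return [b for b in boxes
            if (b[2] - b[0] + 1) * (b[3] - b[1] + 1) >= min_area]


if __name__ == "__main__":
    # Toy probability map standing in for a network's pixel-wise output.
    toy = [
        [0.9, 0.8, 0.1, 0.0],
        [0.7, 0.6, 0.1, 0.0],
        [0.0, 0.1, 0.2, 0.9],
    ]
    print(extract_regions(toy))
```

Because each step takes and returns plain data, the same two functions could in principle be recombined with different cutoffs or filters for different tasks, which is the kind of reuse the abstract points to.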

92 citations

Proceedings ArticleDOI
01 Mar 1990
TL;DR: This paper shows how user-specified layout constraints may be easily added to many automatic graph layout algorithms and allows a continuum between manual and automatic layout by allowing the user to specify how stable the graph's layout should be.
Abstract: Automatic layout algorithms are commonly used when displaying graphs on the screen because they provide a “nice” drawing of the graph without user intervention. There are, however, a couple of disadvantages to automatic layout. Without user intervention, an automatic layout algorithm is only capable of producing an aesthetically pleasing drawing of the graph. User- or application-specified layout constraints (often concerning the semantics of a graph) are difficult or impossible to specify. A second problem is that automatic layout algorithms seldom make use of information in the current layout when calculating the new layout. This can also be frustrating to the user because whenever a new layout is done, the user's orientation in the graph is lost. This paper suggests using layout constraints to solve both of these problems. We show how user-specified layout constraints may be easily added to many automatic graph layout algorithms. Additionally, the constraints specified by the current layout are used when calculating the new layout to achieve a more stable layout. This approach allows a continuum between manual and automatic layout by allowing the user to specify how stable the graph's layout should be.
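The idea of a user-chosen stability level can be illustrated with a deliberately simplified stand-in. The paper adds constraints to the layout algorithm itself; the sketch below instead assumes a plain linear blend between each node's current position and the position an automatic algorithm proposes. The function name and the `stability` parameter in [0, 1] are hypothetical.

```python
def stabilized_layout(current, proposed, stability):
    """Blend current and proposed node positions.

    stability = 1.0 keeps existing nodes where they are (manual end of the
    continuum); stability = 0.0 accepts the automatic layout unchanged.
    This linear blend is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must lie in [0, 1]")
    result = {}
    for node, (px, py) in proposed.items():
        if node in current:
            cx, cy = current[node]
            result[node] = (
                stability * cx + (1.0 - stability) * px,
                stability * cy + (1.0 - stability) * py,
            )
        else:
            # New nodes have no previous position, so take the proposal as-is.
            result[node] = (px, py)
    return result


if __name__ == "__main__":
    current = {"a": (0.0, 0.0)}
    proposed = {"a": (10.0, 20.0), "b": (5.0, 5.0)}
    print(stabilized_layout(current, proposed, 0.5))
```

A higher `stability` value preserves more of the user's orientation in the graph at the cost of a less tidy automatic drawing, which mirrors the trade-off the abstract describes.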

92 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Object detection
46.1K papers, 1.3M citations
81% related
Image segmentation
79.6K papers, 1.8M citations
80% related
Convolutional neural network
74.7K papers, 2M citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  5
2022  19
2021  34
2020  19
2019  14
2018  9