Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers

Patent

Method and apparatus for editing documents

Mika Fukui¹, Isamu Iwai¹, Miwako Doi¹, Yoichi Takebayashi¹

Toshiba¹

31 Mar 1992

TL;DR: An apparatus and method for editing a document to automatically produce a well-ordered layout by extracting characteristic quantities that characterize the different elements of the document, deriving relationships among the elements from those quantities, determining a layout of the elements from the relationships, and processing the document in accordance with the layout.

Abstract: An apparatus and method for editing a document to automatically produce a satisfactory, well ordered layout which includes the steps of (a) extracting characteristic quantities which characterize different elements of the document; (b) deriving relationships among the different elements of the document in accordance with the characteristic quantities; (c) determining a layout of the different elements of the document in accordance with the relationships; and (d) processing the document in accordance with the layout.

40 citations

Patent

Identifying key images in a document in correspondence to document text

Chinmoy Panda¹

Adobe Systems¹

23 Mar 2000

TL;DR: A computer-implemented method and system for identifying key images in a document, which includes extracting one or more document keywords considered important in describing the document, collecting one or more images associated with the document including information describing each image, generating a proximity factor for each image and document keyword that reflects the degree of correlation between them, and determining the importance of each image according to an image metric that combines the proximity factors for each document keyword and image pair.

Abstract: A computer-implemented method and system for identifying key images in a document is provided. The operations used include extracting one or more document keywords from the document considered important in describing the document, collecting one or more images associated with the document including information describing each image, generating a proximity factor for each image collected from the document and each document keyword that reflects the degree of correlation between the image and the document keyword, and determining the importance of each image according to an image metric that combines the proximity factors for each document keyword and image pair. In addition, the operations may also include ordering the document keywords according to an ordering criterion and weighting the proximity factor associated with each document keyword and image pair based on the order of the document keyword.
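
The keyword and image scoring described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Image` class, the word-frequency `proximity` function, and the `1 / (rank + 1)` keyword weighting are all assumptions chosen only to show how per-pair proximity factors might be combined into one image metric.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    description: str  # assumed stand-in for "information describing each image"


def proximity(image: Image, keyword: str) -> float:
    """Hypothetical proximity factor: share of description words equal to the keyword."""
    words = image.description.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def rank_images(images, keywords):
    """Score each image by summing weighted proximity factors over all keywords.

    `keywords` is assumed to be ordered most important first; the weight
    1 / (rank + 1) is an illustrative choice, not taken from the patent.
    """
    scores = {}
    for img in images:
        scores[img.name] = sum(
            proximity(img, kw) / (rank + 1) for rank, kw in enumerate(keywords)
        )
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    imgs = [
        Image("fig1", "layout analysis of a page"),
        Image("fig2", "a cat photo"),
    ]
    print(rank_images(imgs, ["layout", "page"]))
```

Any correlation measure (caption overlap, distance between the image and keyword occurrences, and so on) could replace `proximity` without changing the ranking structure.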

40 citations

Patent

Semantics-based indexing in a distributed data processing system

Brandon Brockway¹, Tiffany Durham¹, Cheryl Malatras¹, Gregory Roberts¹

IBM¹

05 Jun 2003

TL;DR: Information is indexed in a distributed data processing system by providing document structure templates comprising model document structures and semantics for the model document structures, identifying the structure of a document, selecting a document structure template in dependence upon the structure of the document and the model document structures in the templates, and storing search keywords from the document in records in a semantics-based search index according to the semantics from the selected template.

Abstract: Indexing information in a distributed data processing system, including providing document structure templates comprising model document structures and semantics for the model document structures; identifying the structure of a document; selecting a document structure template in dependence upon the structure of the document and the model document structures in the document structure templates; and storing search keywords from the document in records in a semantics-based search index according to the semantics from the selected document structure template. Selecting a document structure template in dependence upon the structure of the document and the model document structures in the document structure templates typically further comprises comparing the structure of the document and the model document structures in the templates; and selecting a template whose model document structure matches the structure of the document.
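
As a rough illustration of the template selection and indexing steps in this abstract, the Python sketch below matches a document's structure against model structures and files its keywords under the matching template's semantics. The template contents, the exact-match rule, and the `(field, word)` index layout are hypothetical simplifications, not details from the patent.

```python
from collections import defaultdict

# Hypothetical templates: a model structure plus semantics for each element.
TEMPLATES = [
    {
        "name": "memo",
        "structure": ("to", "from", "body"),
        "semantics": {"to": "recipient", "from": "sender", "body": "content"},
    },
    {
        "name": "article",
        "structure": ("title", "abstract", "body"),
        "semantics": {"title": "title", "abstract": "summary", "body": "content"},
    },
]


def select_template(doc_structure, templates):
    """Return the first template whose model structure equals the document's."""
    for template in templates:
        if tuple(template["structure"]) == tuple(doc_structure):
            return template
    return None


def index_document(doc_id, doc, templates, index):
    """Index a document given as an ordered list of (element, text) pairs.

    Keywords are stored under (semantic field, word) keys. Returns False
    when no template matches, in which case nothing is indexed.
    """
    structure = tuple(element for element, _ in doc)
    template = select_template(structure, templates)
    if template is None:
        return False
    for element, text in doc:
        field = template["semantics"][element]
        for word in text.lower().split():
            index[(field, word)].add(doc_id)
    return True


if __name__ == "__main__":
    index = defaultdict(set)
    doc = [("title", "Layout Analysis"), ("abstract", "page segmentation"), ("body", "text")]
    print(index_document("d1", doc, TEMPLATES, index), dict(index))
```

A search for "layout" restricted to titles would then look up `index[("title", "layout")]`, which is the kind of semantics-aware query such an index is meant to support.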

40 citations

Patent

Automated markup language layout

Brian D. Hanechak

16 Aug 2006

TL;DR: The layout is based on the text elements that have user text content, while text elements without text content are disregarded; the position of the text elements is determined from the height of the text elements, defined text element spacing distances, and a defined positioning order.

Abstract: Methods and computer programs for automatically creating a text layout in a markup language design for a product to be printed. A number of defined text elements are available for user text. The layout is based on the text elements having user text content. Text elements without text content are disregarded. Positioning of the text elements is determined based on the height of the text elements, defined text element spacing distances, and a defined positioning order. Creating a layout may include positioning design elements relative to the text elements. Font sizes and spacing distances are automatically reduced if necessary to create a suitable layout.
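
The positioning rules in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name, the single uniform `scale` applied to both heights and spacing, and the `min_scale` and `step` parameters are hypothetical and stand in for the font size and spacing reduction the abstract mentions.

```python
def layout_text(elements, order, spacing, available_height, min_scale=0.5, step=0.1):
    """Stack non-empty text elements vertically in the given order.

    elements: dict mapping element name -> (text, height at full size).
    Returns (positions, scale), where positions maps each placed element
    to its top y coordinate. If the stack does not fit, heights and
    spacing are scaled down in steps until it fits or min_scale is
    reached (in which case the layout may still overflow).
    """
    # Elements without user text are disregarded.
    active = [name for name in order if elements.get(name, ("", 0))[0].strip()]
    scale = 1.0
    while True:
        y = 0.0
        positions = {}
        for i, name in enumerate(active):
            if i > 0:
                y += spacing * scale
            positions[name] = y
            y += elements[name][1] * scale
        if y <= available_height or scale <= min_scale:
            return positions, scale
        scale = max(min_scale, round(scale - step, 10))


if __name__ == "__main__":
    card = {"name": ("Ada Lovelace", 10), "title": ("", 10), "phone": ("555-0100", 10)}
    print(layout_text(card, ["name", "title", "phone"], spacing=5, available_height=25))
```

In the example, the empty `title` element is skipped, so `phone` is placed directly below `name` plus one spacing distance.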

39 citations

Document Image Noises and Removal Methods

Atena Farahmand, Hossein Sarrafzadeh, Jamshid Shanbehzadeh

01 Jan 2013

TL;DR: This paper reviews the noise that can appear in scanned document images, which reduces the accuracy of subsequent OCR (Optical Character Recognition) tasks, and discusses some noise removal methods.

Abstract:  Abstract- document images may be contaminated with noise during transmission, scanning or conversion to digital form. We can categorize noises by identifying their features and can search for similar patterns in a document image to choose appropriate methods for their removal. After a brief introduction, this paper reviews noises that might appear in scanned document images and discusses some noise removal methods. owadays, with the increase in computer use in everybody's lives, the ability for people to convert documents to digital and readable formats has become a necessity. Scanning documents is a way of changing printed documents into digital format. A common problem encountered when scanning documents is 'noise' which can occur in an image because of paper quality, the typing machine used, or it can be created by scanners during the scanning process. Noise removal is one of the steps in pre- processing. Among other things, noise reduces the accuracy of subsequent tasks of OCR (Optical character Recognition) systems. It can appear in the foreground or background of an image and can be generated before or after scanning. Examples of noise in scanned document images are as follows. The page rule line is a source of noise which interferes with text objects. The marginal noise usually appears in a large dark region around the document image and can be textual or non-textual. Some forms of clutter noise appear in an image because of document skew while scanning or are from holes punched in the document, or background noise, such as uneven contrast, show through effects, interfering strokes, and background spots, etc. Next, we will discuss each type in detail.

39 citations

Performance

Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
