Topic: Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers

Patent

Translation device, image processing device, translation method, and recording medium

Toshiya Koyama¹, Teruka Saito¹, Masakazu Tateno¹, Kei Tanaka¹, Takashi Nagao¹, Masayoshi Sakakibara¹, Xinyu Peng¹, Kotaro Nakamura¹, Atsushi Itoh¹, Masatoshi Tagawa¹, Michihiro Tamune¹, Hiroshi Masuichi¹, Sato Naoko¹, Kiyoshi Tashiro¹

Fuji Xerox¹

04 Aug 2005

TL;DR: This patent describes a translation device consisting of a character recognition unit that recognizes text data in a text region of an input image, a translator that translates the text data in that region, and a layout configuration processor that generates data containing the translated text and the graphics of the input image while preserving the original layout.

Abstract: A translation device comprises a character recognition unit that recognizes text data in a text region of an input image; a translator that translates the text data in the text region; and a layout configuration processor that generates data containing the translated text data in the text region and graphics in the input image, wherein a layout of the input image is maintained in a layout of the image of the data generated by the layout configuration processor.

32 citations
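The layout-preserving idea in the abstract above can be illustrated with a minimal Python sketch. It is not the patented device: the `Region` type, the "text"/"graphic" labels, and the caller-supplied `translate` function are hypothetical stand-ins for the character recognition unit, translator, and layout configuration processor. What it shows is that only the text of each text region changes, while every bounding box and every graphic passes through unchanged.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# Hypothetical region model (not the patent's format): (x0, y0, x1, y1) in page pixels.
BBox = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Region:
    kind: str  # hypothetical labels: "text" or "graphic"
    bbox: BBox
    text: str = ""


def translate_page(regions: List[Region], translate: Callable[[str], str]) -> List[Region]:
    """Translate text regions; keep every bounding box and every graphic unchanged."""
    result = []
    for region in regions:
        if region.kind == "text":
            # Only the text changes; replace() copies the bbox as-is.
            result.append(replace(region, text=translate(region.text)))
        else:
            result.append(region)  # graphics pass through untouched
    return result


page = [
    Region("text", (10, 10, 200, 30), "Hello"),
    Region("graphic", (10, 40, 200, 160)),
]
toy_dictionary = {"Hello": "Bonjour"}  # hypothetical stand-in for a real translator
translated = translate_page(page, lambda s: toy_dictionary.get(s, s))
print([(r.kind, r.bbox, r.text) for r in translated])
```

A real system would also have to fit translated strings, which are often longer than the source text, into the original boxes; this sketch ignores that problem.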

Patent

System and method for capturing document style by example

David K. McKnight¹, Eduardus A. T. Merks¹

IBM¹

24 Jan 2000

TL;DR: The style of an example document is determined by examining the example file for syntax patterns required in a document of that type; each pattern is used to create a section template (a sub-template of a larger template).

Abstract: A system and method of using an example document to create another document with the same style. The style is determined by examining the example file for syntax patterns that are required in a document of this type. Each pattern is used to create a section template (a sub-template for a larger template). Once all the required sub-templates have been defined by examining the example, the result is a document template that can be used to format new documents. A document generator combines the captured document template with user-specific content to generate the sections of a new document: when a section is generated, the corresponding sub-template is populated with user-specific content. The generated file ends up with the same text spacing and positioning, ordering of sections, presence of annotations, and other nonfunctional attributes as the example.

32 citations
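The capture-then-fill workflow described above can be sketched in Python under a strong simplifying assumption: that the example document uses a hypothetical markup in which each section begins with a `== Title ==` line. `capture_template` and `generate` are illustrative names, not part of the IBM system; the sketch only shows how heading syntax, section order, and trailing blank-line spacing from the example can be carried into a new document.

```python
import re
from string import Template
from typing import Dict, List, Tuple

# Hypothetical example-document syntax: each section starts with a "== Title ==" line.
SECTION = re.compile(r"^(== (.+?) ==\n)(.*?)(?=^== |\Z)", re.MULTILINE | re.DOTALL)


def capture_template(example: str) -> List[Tuple[str, Template]]:
    """Build one sub-template per section of the example, in the example's order."""
    subtemplates = []
    for match in SECTION.finditer(example):
        heading, title, body = match.group(1), match.group(2), match.group(3)
        # Keep the example's trailing blank-line spacing after the section body.
        trailing = body[len(body.rstrip("\n")):]
        # Escape "$" in the literal heading so Template treats it as plain text.
        pattern = heading.replace("$", "$$") + "$content" + trailing
        subtemplates.append((title, Template(pattern)))
    return subtemplates


def generate(subtemplates: List[Tuple[str, Template]], content: Dict[str, str]) -> str:
    """Fill each captured sub-template with user-specific content for that section."""
    return "".join(t.substitute(content=content[title]) for title, t in subtemplates)


example = "== Intro ==\nHello world.\n\n== Usage ==\nRun it.\n"
template = capture_template(example)
new_doc = generate(template, {"Intro": "Hi.", "Usage": "Call main()."})
print(new_doc)
```

The new document keeps the blank line after the first section and the section order of the example, even though none of the example's body text survives.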

Patent

Document Content Reconstruction

Joshua Richardson, Vincent Le Chevalier, Ashit Joshi, Dax Eckenberg, Rahul Ravindra Mutalik Desai, Brent S. Tworetzky, Charles F. Geiger

06 Jul 2012

TL;DR: A method, a storage medium and a system for document content reconstruction are provided in a digital content delivery and online education services platform to enable delivery of textbooks and other copyrighted material to multi-platform web browser applications.

Abstract: A method, a storage medium and a system for document content reconstruction are provided in a digital content delivery and online education services platform to enable delivery of textbooks and other copyrighted material to multi-platform web browser applications. The method comprises ingesting a document page in an unstructured document format. The method further comprises extracting one or more images and metadata associated with the images and text and fonts associated with the texts from the document page. In addition, the method comprises coalescing text into paragraphs and creating a structured document page in a markup language format using the extracted images, text and fonts rendered with layout fidelity to the original ingested document page.

32 citations
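Coalescing text into paragraphs, one step of the method above, is often approached with a vertical-gap rule; the following Python sketch uses that rule as an assumption, not as the patent's technique. The `Line` type and the `gap_factor` threshold of half a line height are hypothetical choices: a line starts a new paragraph when the space above it exceeds that fraction of the previous line's height.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    top: float     # y of the line's top edge (page units, y grows downward)
    height: float  # line height in the same units
    text: str


def coalesce(lines: List[Line], gap_factor: float = 0.5) -> List[str]:
    """Group lines into paragraphs by vertical gap (hypothetical threshold rule)."""
    paragraphs: List[List[str]] = []
    prev: Optional[Line] = None
    for line in sorted(lines, key=lambda l: l.top):
        # Start a new paragraph for the first line or when the gap above it is large.
        if prev is None or line.top - (prev.top + prev.height) > gap_factor * prev.height:
            paragraphs.append([])
        paragraphs[-1].append(line.text)
        prev = line
    return [" ".join(parts) for parts in paragraphs]


lines = [
    Line(100, 12, "Document layout"),
    Line(114, 12, "analysis is hard."),
    Line(150, 12, "A new paragraph."),
]
paragraphs = coalesce(lines)
print(paragraphs)
```

Real pages also need column detection, indentation cues, and de-hyphenation before a gap rule like this gives usable paragraphs.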

Patent

User interface for adaptive document layout via manifold content

David Salesin¹, Charles E. Jacobs¹, Wilmot Li¹

Microsoft¹

04 Dec 2004

TL;DR: Manifold representations of content are multiple versions of anything that might appear in a document, from text, to images, to even such things as stylistic conventions; a layout engine selects among them to adapt the layout to a given viewing situation.

Abstract: A user interface for a system and method for improving document layout on arbitrary devices of different resolutions and sizes using manifold representations of content. Manifold representations of content are multiple versions of anything that might appear in a document, from text, to images, to even such things as stylistic conventions. The specific content is selected and formatted dynamically by a layout engine to best adapt to a given viewing situation.

32 citations
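Dynamic selection among manifold versions can be illustrated with a toy Python selector. This is a hypothetical simplification, not the Microsoft layout engine: each version is assumed to declare a `min_width` at which it still reads well and a `richness` score, and the selector picks the richest version that fits the viewport.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    name: str
    min_width: int  # hypothetical: narrowest viewport (px) this version still reads well at
    richness: int   # hypothetical: higher means more detailed, preferred when space allows


def choose_version(versions: Sequence[Version], viewport_width: int) -> Optional[Version]:
    """Pick the richest version whose minimum width fits the viewport, or None if none fit."""
    fitting = [v for v in versions if v.min_width <= viewport_width]
    return max(fitting, key=lambda v: v.richness, default=None)


caption_versions = [
    Version("full caption", 800, 3),
    Version("short caption", 400, 2),
    Version("icon only", 0, 1),
]
for width in (1024, 500, 320):
    print(width, choose_version(caption_versions, width).name)
```

A full layout engine would choose versions for all content items jointly, since one item's choice changes the space left for the others; this per-item selector does not attempt that.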

Journal Article

Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology

Tuan Anh Tran¹, In Seop Na¹, Soo-Hyung Kim¹

Chonnam National University¹

01 Sep 2016, International Journal on Document Analysis and Recognition

TL;DR: A hybrid three-stage method for document layout analysis (page segmentation) that classifies text and non-text elements with a minimum homogeneity algorithm, which combines connected component analysis with a multilevel homogeneity structure, and reports higher accuracy than the compared methods.

Abstract: Document layout analysis, or page segmentation, is the task of decomposing document images into different regions such as text, images, separators, and tables. It remains a challenging problem because of the variety of document layouts. In this paper, we propose a novel hybrid method with three main stages. In the first stage, text and non-text elements are classified using a minimum homogeneity algorithm, which combines connected component analysis with a multilevel homogeneity structure. In the second stage, a new homogeneity structure is combined with adaptive mathematical morphology in the text document to obtain a set of text regions, while in the non-text document, non-text elements are further classified into separator regions, table regions, image regions, and so on. In the final stage, a region refinement and noise detection process refines all regions in both the text and non-text documents to eliminate noise and obtain the geometric layout of each region. The proposed method has been tested on the ICDAR2009 page segmentation competition dataset and on many other databases in different languages. In these tests, our method achieved higher accuracy than the compared methods, which we take as evidence of its effectiveness and superiority.

31 citations
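The first stage of the method above starts from connected component analysis. The Python sketch below labels 8-connected foreground pixels in a binary image with breadth-first search and returns their bounding boxes. The text/non-text split that follows is deliberately a crude stand-in, not the paper's minimum homogeneity algorithm: it assumes that components much taller than the median component height (here `max_ratio = 3.0`, a hypothetical value) are non-text, such as separators or images.

```python
from collections import deque
from statistics import median
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (row0, col0, row1, col1), inclusive


def connected_components(img: List[List[int]]) -> List[BBox]:
    """Label 8-connected foreground (1) pixels with BFS; return each component's bounding box."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[BBox] = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                r0, c0, r1, c1 = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                    # Visit all 8 neighbours that are in bounds, foreground, and unvisited.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes


def split_text_nontext(boxes: List[BBox], max_ratio: float = 3.0) -> Tuple[List[BBox], List[BBox]]:
    """Crude stand-in heuristic (not the paper's): components much taller than the median are non-text."""
    if not boxes:
        return [], []
    heights = [b[2] - b[0] + 1 for b in boxes]
    limit = max_ratio * median(heights)
    text = [b for b, hgt in zip(boxes, heights) if hgt <= limit]
    nontext = [b for b, hgt in zip(boxes, heights) if hgt > limit]
    return text, nontext


# Toy binary page: three 2x2 "characters" and one tall vertical rule.
page = [[0] * 12 for _ in range(12)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (0, 4), (0, 5), (1, 4), (1, 5),
             (4, 0), (4, 1), (5, 0), (5, 1)]:
    page[r][c] = 1
for r in range(2, 12):
    page[r][11] = 1

text_boxes, nontext_boxes = split_text_nontext(connected_components(page))
print(len(text_boxes), nontext_boxes)
```

A single height threshold fails on large headings and small images alike, which is the kind of ambiguity the paper's multilevel homogeneity structure is designed to address.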

Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
