Topic

Document layout analysis

About: Document layout analysis is a research topic. Over the lifetime, 1462 publications have been published within this topic receiving 34021 citations.

Papers

Journal Article

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan

19 Jan 2021-Journal of Data Mining and Digital Humanities

TL;DR: This paper proposes a multimodal approach to the semantic segmentation of historical newspapers that combines visual and textual features. Experiments on diachronic Swiss and Luxembourgish newspapers investigate the predictive power of both feature types and their capacity to generalize across time and sources.

Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

14 citations

Patent

Detection and Reconstruction of East Asian Layout Features in a Fixed Format Document

Drazen Zaric¹, Milan Sesum¹, Milos Lazarevic¹, Milos Raskovic¹

Microsoft¹

28 Feb 2014

TL;DR: Vertically written text in a fixed format document is detected and rotated for layout analysis, then rotated back and restructured in a flow format document, so that East Asian layout features such as horizontal-in-vertical text and ruby text can be detected and reconstructed.

Abstract: Detection of East Asian layout features and reconstruction of East Asian layout features is provided. Vertically written text in the fixed format document is detected and rotated for layout analysis. After layout analysis, the rotated text is rotated back and restructured in a flow format document. When a plurality of characters is written horizontally in a vertical line of text, vertically overlapping text runs are detected, designated as horizontal-in-vertical text, and are restructured as horizontal-in-vertical text in a flow format document. Lines of text are analyzed for attributes of a ruby line and are designated as ruby text, associated with corresponding text in a ruby base line, and restructured as ruby text in a flow format document. Text in a fixed format document is analyzed for detection of a particular East Asian language so that a font for the language is designated in a flow format document.

14 citations

Patent

PDF document recognition method

樊孝龙

19 Mar 2013

TL;DR: A PDF document recognition method comprising three steps: S1: analyzing path objects in a PDF document and recognizing forms in it; S2: analyzing text objects outside table areas and recognizing the text content; S3: writing the recognition results into a temporary file, or into a PDF file in the form of an attachment.

Abstract: The invention discloses a PDF document recognition method. The method comprises the following steps: S1: analyzing path objects in a PDF document and recognizing forms in the PDF document; S2: analyzing text objects outside table areas in the PDF document and recognizing text content in the PDF document; S3: writing the recognition results into a temporary file, or into a PDF file in the form of an attachment. With this method, objects such as forms, paragraphs, titles, and lists in the PDF document can be recognized, so that the PDF document can be edited paragraph by paragraph, labels can be added conveniently, the reading order can be determined, and persons with visual impairments can read the document conveniently. Documents in other formats can also be exported from the recognition results, so that users can read and edit the PDF document conveniently.

14 citations

Patent

Form layout method and system

Jonathan E. Peters¹, Matthew R. Foster¹

Accenture¹

30 Mar 2012

TL;DR: A form layout tool provides a flexible way to lay out forms on a web page: it configures a web configuration file with the location of form layout styles and applies those styles to a generated page layout.

Abstract: A form layout system includes a form layout tool that provides a flexible way to lay out forms on a web page. The form layout tool configures a web configuration file with the location of form layout styles, and uses the form layout styles, a number of columns, a number of fields, and a “size” of each field to include in the component of a page layout to create a page layout for a target application. The form layout tool generates a revised application page with the created page layout by applying the form layout style to the created page layout.

14 citations

Patent

Image processing system for transferring electronic document and paper document as single mail

Ogaki Takeshi¹, Takeda Yoshiko¹, Shiro Takagi¹, Akinori Iwase¹

Toshiba¹

06 Jun 1996

TL;DR: An image processing system scans a sheet that carries the image of a paper document together with two instructions. On the basis of the second instruction, it starts a program at a second terminal to create the image information of an electronic document; on the basis of the first instruction, it transfers the paper document image and the electronic document image to a first terminal as a single document.

Abstract: An image processing system has a scanner for reading image information of a sheet, on which a first instruction for transferring image information of an electronic document and image information of a paper document to an arbitrary program at a first terminal as a single transferred document and a second instruction for starting an arbitrary second terminal for creating the electronic document, together with the image information of the paper document, a function for starting the program at the second terminal to create the image information of the electronic document on the basis of the second instruction, and a function for transferring the image information of the paper document and the image information of the electronic document as a single transferred document to the first terminal on the basis of the first instruction.

14 citations

Performance Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
