Topic: Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 papers have been published on this topic, receiving 34,021 citations.

Papers

Proceedings Article

An Automatic Word-spotting Framework for Medieval Manuscripts

Ruggero Pintus, Ying Yang¹, Enrico Gobbetti, Holly Rushmeier¹ (¹Yale University)

01 Sep 2015

TL;DR: A completely automatic and scalable framework for query-by-example word-spotting on medieval manuscripts; it requires no human intervention to produce large amounts of annotated training data and gives Computer Vision researchers and Cultural Heritage practitioners a compact, efficient system for document analysis.

Abstract: We present a completely automatic and scalable framework to perform query-by-example word-spotting on medieval manuscripts. Our system does not require any human intervention to produce a large amount of annotated training data, and it provides Computer Vision researchers and Cultural Heritage practitioners with a compact and efficient system for document analysis. We have executed the pipeline in both single-manuscript and cross-manuscript setups, and we have tested it on a heterogeneous set of medieval manuscripts that includes a variety of writing styles, languages, image resolutions, levels of conservation, noise, and amounts of illumination and ornamentation. We also present a precision/recall-based analysis to quantitatively assess the quality of the proposed algorithm.

7 citations
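
The abstract mentions a precision/recall analysis of retrieval quality. As a rough illustration only, and not the authors' evaluation protocol, the sketch below computes precision and recall at each rank of one query's ranked result list, plus average precision. The function names and the word-image IDs are hypothetical.

```python
def precision_recall_curve(ranked_ids, relevant_ids):
    """Return (precision, recall) after each position of a ranked retrieval list."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = 0
    curve = []
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
        # Precision: fraction of the top-k results that are relevant.
        # Recall: fraction of all relevant items found in the top k.
        curve.append((hits / k, hits / len(relevant)))
    return curve


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at the ranks where relevant items appear."""
    relevant = set(relevant_ids)
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0
```

For example, with the ranked list `["w3", "w7", "w1", "w9"]` and relevant set `{"w3", "w1"}`, the average precision is (1/1 + 2/3) / 2 = 5/6.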

Proceedings Article

Table form document synthesis by grammar-based structure analysis

Akira Amano¹, Naoki Asada, T. Motoyama, T. Sumiyoshi, K. Suzuki (¹Hiroshima City University)

10 Sep 2001

TL;DR: This paper presents a computer-assisted document synthesis system, based on grammar-based structure analysis, in which a user and a computer cooperatively analyze and synthesize table-form documents.

Abstract: Document structure is an important issue not only for document analysis but also for document synthesis. This paper presents a computer-assisted document synthesis system based on grammar-based structure analysis. The system is designed to analyze and synthesize table-form documents through cooperation between a user and the computer: the user interprets the meaning of the document and supplies the entry data to be filled in, while the computer detects the boxes formed by horizontal and vertical rules and determines the logical relations between adjacent boxes. First, the document is decomposed into a set of boxes, which are classified semi-automatically into four types: blank, insertion, indication, and explanation. Then the relations between each indication box and its associated entry box are analyzed based on the semantic and geometric knowledge defined in the document structure grammar. Finally, the system generates LaTeX code for the synthesized documents, whose blank and insertion boxes are filled with the text and image data given by the user. Experimental results show that the system successfully analyzed several kinds of table forms and produced the expected synthesized documents.

7 citations
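
The paper pairs each indication box with its entry box using semantic and geometric knowledge from a document structure grammar. That grammar is not given here, so the following is only a sketch of the geometric side under an assumed rule of thumb: an entry (blank or insertion) box sits directly to the right of its label on the same row, otherwise directly below it in the same column. The `Box` class, the image-space coordinates (y grows downward), the tolerance, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ENTRY_KINDS = {"blank", "insertion"}


@dataclass(frozen=True)
class Box:
    name: str
    kind: str  # "blank", "insertion", "indication" or "explanation"
    x0: float
    y0: float
    x1: float
    y1: float


def _overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1] (negative if disjoint)."""
    return min(a1, b1) - max(a0, b0)


def find_entry_box(label, boxes, tol=1.0) -> Optional[Box]:
    """Hypothetical rule: prefer an entry box touching the label's right edge on the
    same row; otherwise take one touching its bottom edge in the same column."""
    entries = [b for b in boxes if b.kind in ENTRY_KINDS]
    right = [b for b in entries
             if abs(b.x0 - label.x1) <= tol and _overlap(label.y0, label.y1, b.y0, b.y1) > 0]
    if right:
        return min(right, key=lambda b: abs(b.y0 - label.y0))
    below = [b for b in entries
             if abs(b.y0 - label.y1) <= tol and _overlap(label.x0, label.x1, b.x0, b.x1) > 0]
    if below:
        return min(below, key=lambda b: abs(b.x0 - label.x0))
    return None


def pair_boxes(boxes):
    """Map each indication box's name to the name of its entry box (or None)."""
    pairs = {}
    for box in boxes:
        if box.kind == "indication":
            entry = find_entry_box(box, boxes)
            pairs[box.name] = entry.name if entry else None
    return pairs
```

A real grammar would also use semantics (for example, the label text), which this purely geometric rule ignores.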

Proceedings Article

Logical labeling of Arabic newspapers using artificial neural nets

Karim Hadjar, Rolf Ingold

31 Aug 2005

TL;DR: This paper proposes LUNET, a learning-based method driven by artificial neural nets for labeling logical components in Arabic newspaper documents.

Abstract: Logical structure analysis is an important phase in the process of document image understanding. In this paper we propose a learning-based method to label logical components in Arabic newspaper documents. The labeling is driven by artificial neural nets, each specialized for one document class. The first prototype of LUNET has been tested on a set of Arabic newspapers from three document classes, and promising experimental results are reported.

7 citations

Patent

Computer execution method for operating document archiving system, automatic document archiving system, document collation system, method for operating digital copying machine, and digital copying machine

John F Cullen, Pierce Mark, フランシスカレンジョン, ピアースマーク

19 Aug 1997

TL;DR: To estimate the preceding version of a new document being archived, a matching process searches for the stored document that has the most descriptors similar to those of the new document; when that best-matching document is the latest version, the newly archived document is recorded as its unique succeeding version.

Abstract: PROBLEM TO BE SOLVED: To facilitate the filing and comparison of electronic documents by storing descriptors of archived documents and comparing them with descriptors in a database. SOLUTION: The descriptor database 304 stores multiple descriptors that identify the features of each document. The documents in the document database 302 are stored for each descriptor. To estimate the preceding version of a new document to be archived, a matching process searches for the document that has the most descriptors similar to those of the new document. When this best-matching document is the latest version, the newly archived document is identified as its unique succeeding version. Specifically, the hypertext database 312 is updated so that, on the HTML pages 314 and 316 provided for the respective document version groups, the title of the newly archived document appears directly below the title of the best-matching document.

7 citations
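
The matching step can be sketched with an inverted index from descriptors to documents. This is a simplified illustration, not the patented method: it assumes descriptors are hashable tokens compared by exact equality (the patent speaks of similar descriptors), it omits the HTML page update, and the `VersionArchive` class and its method names are hypothetical. The patent abstract only covers the case where the best match is the latest version; starting a new group otherwise is an assumption of this sketch.

```python
from collections import Counter, defaultdict


class VersionArchive:
    """Toy archive: documents are indexed by descriptor (an inverted index), and
    each version group keeps its document ids in archiving order."""

    def __init__(self):
        self.index = defaultdict(set)  # descriptor -> ids of documents having it
        self.group_of = {}             # document id -> version-group id
        self.groups = {}               # version-group id -> list of document ids

    def best_match(self, descriptors):
        """Return the archived document sharing the most descriptors, or None."""
        votes = Counter()
        for d in set(descriptors):
            for doc_id in self.index.get(d, ()):
                votes[doc_id] += 1
        return votes.most_common(1)[0][0] if votes else None

    def archive(self, doc_id, descriptors):
        """Archive a document and return the id of its estimated predecessor (or None)."""
        match = self.best_match(descriptors)
        if match is not None and self.groups[self.group_of[match]][-1] == match:
            # The best match is the latest version: the new document succeeds it.
            group = self.group_of[match]
            self.groups[group].append(doc_id)
        else:
            # Assumption: otherwise the new document starts its own version group.
            group = doc_id
            self.groups[group] = [doc_id]
        self.group_of[doc_id] = group
        for d in set(descriptors):
            self.index[d].add(doc_id)
        return match
```

For instance, archiving `report_v2` with descriptors `["title:q3", "author:lee", "fig:costs"]` after `report_v1` with `["title:q3", "author:lee", "fig:sales"]` links the two as consecutive versions, because they share two descriptors.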

Journal Article

BINYAS: a complex document layout analysis system

Showmik Bhowmik, Soumyadeep Kundu¹, Ram Sarkar¹ (¹Jadavpur University)

01 Mar 2021, Multimedia Tools and Applications

TL;DR: BINYAS, a DLA system based on connected component (CC) and pixel analysis, is proposed; it performs significantly better than state-of-the-art methods on the evaluation metrics used by the research community in this domain.

Abstract: Document layout analysis (DLA) is an indispensable prerequisite for the development of a comprehensive document image processing and analysis system. The main purpose of DLA is to segment an input document image into its constituent, coherent regions and identify their classes. In this paper, we propose a competent DLA system, named BINYAS, based on connected component (CC) and pixel analysis. We first identify the regions and then classify them into classes such as paragraph, separator, graphic, image, table, chart, and inverted text. The proposed system is evaluated on four publicly available standard datasets, namely the ICDAR 2009, 2015, 2017, and 2019 page segmentation competition datasets, and its performance is compared with many contemporary methods, including some well-known software products and deep learning-based methods. Experimental results show that our method performs significantly better than state-of-the-art methods in terms of the evaluation metrics used by the research community in this domain.

7 citations
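
BINYAS builds on connected component (CC) analysis. Its region formation and classification steps are not detailed here, so the sketch below covers only the generic first step common to CC-based layout analysis: extracting the 8-connected components of a binarized page together with their bounding boxes. The function name, the 0/1 list-of-lists image format, and the choice of 8-connectivity are assumptions for illustration.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected foreground pixels (value 1) and return one bounding box
    (min_row, min_col, max_row, max_col) per component, in raster-scan order."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes
```

In a full DLA pipeline these component boxes would then be grouped into regions and classified; that part is specific to each system and is not sketched here.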

Metrics

1,488 papers

35,779 citations

Number of papers on this topic in recent years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
