Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published on this topic, receiving 34,021 citations.

Papers

Proceedings Article

Combining Qualitative and Quantitative Keyword Extraction Methods with Document Layout Analysis.

Stefano Ferilli¹, Marenglen Biba¹, Teresa Maria Altomare Basile¹, Floriana Esposito¹

University of Bari¹

01 Jan 2009

TL;DR: This work aims to introduce into the DOMINUS document processing framework qualitative techniques for text categorization and keyword extraction, based on the lexical taxonomy WordNet and its extension WordNet Domains, that can support the currently embedded techniques based on quantitative approaches.

Abstract: The large availability of documents in digital format has posed the problem of efficient and effective retrieval mechanisms. This involves the ability to process natural language, which is a significantly complex task. Traditional algorithms based on term matching between the document and the query, although efficient, are not able to capture the intended meaning of either, and hence cannot ensure effectiveness. To move toward semantics, problems such as polysemy and synonymy must be tackled automatically by text processing systems. This work aims to introduce into the DOMINUS document processing framework qualitative techniques for text categorization and keyword extraction, based on the lexical taxonomy WordNet and its extension WordNet Domains, that can support the currently embedded techniques based on quantitative approaches. In particular, a density function is exploited to assign the proper importance to the involved concepts and domains. Preliminary results on texts of different subjects confirm its effectiveness.
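The synonymy problem named in this abstract can be illustrated with a minimal sketch. It is not the DOMINUS method and it does not use WordNet: `TOY_SYNSETS` is a hand-made, hypothetical stand-in for a lexical taxonomy, and `term_overlap` and `concept_overlap` are names invented for this example. The sketch only shows why literal term matching can miss a document that a concept-level comparison would find.

```python
# Toy illustration only. TOY_SYNSETS is a hypothetical, hand-made stand-in
# for a lexical taxonomy such as WordNet; it contains no WordNet data.
TOY_SYNSETS = {
    "car": "motor_vehicle",
    "automobile": "motor_vehicle",
}


def tokens(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def term_overlap(query, document):
    """Count query terms that literally appear in the document."""
    doc_terms = set(tokens(document))
    return sum(1 for t in tokens(query) if t in doc_terms)


def concept_overlap(query, document):
    """Count query terms whose toy concept appears in the document."""
    def to_concept(term):
        # Terms without an entry are treated as their own concept.
        return TOY_SYNSETS.get(term, term)

    doc_concepts = {to_concept(t) for t in tokens(document)}
    return sum(1 for t in tokens(query) if to_concept(t) in doc_concepts)


query = "automobile repair"
document = "How to fix a car engine."

print(term_overlap(query, document))     # literal matching
print(concept_overlap(query, document))  # matching via the toy concept map
```

With this toy data, literal matching finds no shared term, while the concept map links "automobile" to "car". Polysemy, which the abstract also mentions, is not handled here.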

8 citations

Patent

Apparatus, method and computer program product for processing documents

Kosei Fume¹

Toshiba¹

22 Jan 2009

TL;DR: A document processing apparatus includes an extracting unit that extracts text document information from document data; an analyzing unit that analyzes a modification relation of a character string included in the text document information; and an attribute unit that assigns an attribute indicating details of the modification relation to the character string and embeds the attribute in the text document information.

Abstract: A document processing apparatus includes an extracting unit that extracts text document information from a document data; an analyzing unit that analyzes a modification relation of a character string included in the text document information; an attribute unit that assigns an attribute indicating details of the modification relation to the character string, and embeds the attribute in the text document information; a document specifying unit that specifies a document-specifying character string that specifies other text document information, using the text document information in which the attribute is embedded by the attribute unit; and a document-identification unit that assigns document identification information to the document-specifying character string, and embeds the document identification information in the text document information.

8 citations

Patent

Formatting computer generated documents for output

Buis Roger Lee, Reinhard Heinrich Hohensee, Mcelrafth Susan Cheryl, Middendorf Alan Lee, Jamsie R. Treppendahl, and 1 more

27 Sep 2000

TL;DR: A method and apparatus for formatting a computer-generated document for output, such as printing. Information necessary to generate the document is extracted from a database, and a layout program assigns specific layout parameters to each layout identifier; these parameters specify the placement of the associated print data record within the document.

Abstract: A method and apparatus for formatting a computer-generated document for output, such as printing, is provided. Information necessary to generate a document is extracted from a database. The extraction program assigns a layout identifier to each data record retrieved from the database based on the type of information contained within the data record and how the information is to be formatted in the document. A layout program assigns specific layout parameters to each layout identifier, which specify the placement of an associated print data record within a document. Next, a formatting program applies the set of layout parameters to a data stream containing a plurality of data records to create a formatted document. The various elements of the invention, such as the data extraction program, the database, the layout program and the formatter, may be integrated into a single software program, co-resident on a single computer system, or distributed across various computer systems on a network. It is also contemplated that one or more of the various elements of the invention, such as the formatter, the extraction program, or the layout program, could be embodied as hardware instead of software.
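The three steps described in this abstract (tagging records with layout identifiers, mapping identifiers to layout parameters, and applying those parameters to a record stream) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record kinds, the identifiers `L1` and `L2`, the (row, column) parameters, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    kind: str  # hypothetical record type, e.g. "address" or "total"
    text: str


# Hypothetical mapping from record type to layout identifier.
LAYOUT_ID_BY_KIND = {"address": "L1", "total": "L2"}

# Hypothetical layout parameters per identifier: (row, column) on a text page.
LAYOUT_PARAMS = {"L1": (0, 0), "L2": (2, 10)}


def extract(rows):
    """Extraction step: tag each record with a layout identifier."""
    return [(LAYOUT_ID_BY_KIND[r.kind], r) for r in rows]


def format_document(tagged, height=3, width=24):
    """Formatting step: place each record's text at its layout position."""
    page = [[" "] * width for _ in range(height)]
    for layout_id, record in tagged:
        row, col = LAYOUT_PARAMS[layout_id]
        # Truncate text that would run past the right edge of the page.
        for offset, ch in enumerate(record.text[: width - col]):
            page[row][col + offset] = ch
    return ["".join(line).rstrip() for line in page]


rows = [Record("address", "12 Example St"), Record("total", "Total: 42")]
lines = format_document(extract(rows))
print("\n".join(lines))
```

Keeping the identifier-to-parameter table separate from the extraction step means the layout can change without touching the code that reads the records, which appears to be the separation the abstract describes.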

8 citations

Book Chapter

Learning to Segment Document Images

K. S. Sesh Kumar¹, Anoop M. Namboodiri¹, C. V. Jawahar¹

International Institute of Information Technology, Hyderabad¹

20 Dec 2005

TL;DR: A hierarchical framework for document segmentation is proposed as an optimization problem; the novelty of the approach lies in learning the segmentation parameters in the absence of groundtruth.

Abstract: A hierarchical framework for document segmentation is proposed as an optimization problem. The model incorporates the dependencies between various levels of the hierarchy, unlike traditional document segmentation algorithms. This framework is applied to learn the parameters of the document segmentation algorithm using optimization methods like gradient descent and Q-learning. The novelty of our approach lies in learning the segmentation parameters in the absence of groundtruth.

8 citations

Journal Article

Illustrations Segmentation in Digitized Documents Using Local Correlation Features

Dalia Coppi¹, Costantino Grana¹, Rita Cucchiara¹

University of Modena and Reggio Emilia¹

01 Jan 2014

TL;DR: An approach for Document Layout Analysis based on local correlation features that identifies and extracts illustrations in digitized documents by learning the discriminative patterns of textual and pictorial regions.

Abstract: In this paper we propose an approach for Document Layout Analysis based on local correlation features. We identify and extract illustrations in digitized documents by learning the discriminative patterns of textual and pictorial regions. The proposal has been demonstrated to be effective on historical datasets and to outperform the state of the art in the presence of challenging documents with a large variety of pictorial elements.

8 citations

Performance metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9