Topic
Document layout analysis
About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published on this topic, receiving 34,021 citations.
Papers
20 Sep 2019
TL;DR: This paper studies layout analysis, the first step in automatically reading the ancient Chu Nôm characters that are written in columns from top to bottom on historical Vietnamese steles.
Abstract: Stone engravings on historical Vietnamese steles allow historians to study the life of common people in the villages. Only recently has a large number of images of such engravings become available. To support historians, automatic document analysis systems are needed for reading the ancient Chu Nôm characters, which are written in columns from top to bottom. In this paper, we study the problem of layout analysis, which is the first step of automatic reading. Semantic segmentation is applied at the pixel level, using deep convolutional neural networks, to find the title, main text, label, and reference number on the page. Afterwards, seam carving is used to segment the text columns within the main text. We present baseline results for one hundred exemplary pages, discuss error cases, and outline directions for future research.
10 citations
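The entry above separates the text columns of the main text with seam carving. The following is a minimal sketch of that general technique, not the authors' implementation. It assumes a binarized page given as rows of 0/1 values (1 = ink) and a column band `lo..hi-1` that is assumed to contain the gap between two text columns; dynamic programming then finds the top-to-bottom path with the least accumulated ink. The function name `find_vertical_seam` and the use of raw ink as the energy are illustrative choices.

```python
from typing import List


def find_vertical_seam(ink: List[List[int]], lo: int, hi: int) -> List[int]:
    """Return, for each row, the column of a minimum-ink top-to-bottom seam
    restricted to columns lo..hi-1 (hypothetical helper; energy = ink value)."""
    rows = len(ink)
    width = hi - lo
    # cost[r][c]: minimal accumulated ink from row 0 down to pixel (r, lo + c)
    cost = [[0] * width for _ in range(rows)]
    back = [[0] * width for _ in range(rows)]
    for c in range(width):
        cost[0][c] = ink[0][lo + c]
    for r in range(1, rows):
        for c in range(width):
            # The seam may move at most one column left or right per row.
            best = c
            for cand in (c - 1, c + 1):
                if 0 <= cand < width and cost[r - 1][cand] < cost[r - 1][best]:
                    best = cand
            cost[r][c] = cost[r - 1][best] + ink[r][lo + c]
            back[r][c] = best
    # Backtrack from the cheapest cell in the last row.
    c = min(range(width), key=lambda k: cost[rows - 1][k])
    seam = [0] * rows
    for r in range(rows - 1, -1, -1):
        seam[r] = lo + c
        c = back[r][c]
    return seam


# Usage: two ink columns separated by a slightly wandering background gap.
page = [
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1],
]
seam = find_vertical_seam(page, lo=1, hi=6)
```

Because the seam may shift by one column per row, it can follow a gap that wanders slightly, which a straight vertical cut would cross. For `page`, every pixel on the returned seam is background.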
01 Jan 2014
TL;DR: This paper presents a comparative study and performance evaluation of various text extraction techniques, noting that the goal of text extraction without character recognition is to extract the regions that contain text.
Abstract: Text extraction is one of the key tasks in document image analysis. Automatic text extraction without character recognition aims to extract the regions that contain text. The text extraction process includes detection, localization, segmentation and enhancement of the text in the given input image. In this paper we present a comparative study and performance evaluation of various text extraction techniques.
10 citations
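The entry above names detection and localization as early stages of text extraction but does not fix a specific method. One classic way to localize horizontal text lines, shown here purely as an illustration and without implying it is among the techniques compared in the paper, is a horizontal projection profile: count the ink pixels in each row and report runs of rows that reach a threshold. The input format (rows of 0/1 values) and the name `text_line_bands` are assumptions.

```python
from typing import List, Optional, Tuple


def text_line_bands(ink: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Localize horizontal text bands with a horizontal projection profile.

    ink: binarized image as rows of 0/1 values (1 = ink).
    Returns half-open (start_row, end_row) ranges of consecutive rows that
    each contain at least min_ink ink pixels.
    """
    profile = [sum(row) for row in ink]  # ink count per row
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for r, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = r  # a band begins at this row
        elif count < min_ink and start is not None:
            bands.append((start, r))  # the band ended just before row r
            start = None
    if start is not None:
        bands.append((start, len(profile)))  # band runs to the bottom edge
    return bands


# Usage: a small page with two text bands separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
bands = text_line_bands(page)
```

Raising `min_ink` suppresses sparse rows such as isolated noise specks, at the cost of possibly dropping thin strokes; a real system would choose the threshold relative to page width.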
06 Sep 2005
TL;DR: A novel algorithm that corrects document image warping based on the detection of distorted text lines is presented; the proposed solution is used in a recent project digitizing old, poor-quality manuscripts.
Abstract: Image warping caused by scanning, photocopying or photographing a document is a common problem in the field of document processing and understanding. Distortion within text documents impairs OCR accuracy and thus strongly decreases the usability of the results. This is one of the major obstacles to automating the digitization of printed documents.
In this paper we present a novel algorithm that corrects document image warping based on the detection of distorted text lines. The proposed solution is used in a recent project digitizing old, poor-quality manuscripts. The algorithm is compared to other published approaches. Experiments with various document samples and the resulting improvements in the text recognition rate achieved by a commercial OCR engine are also presented.
10 citations
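The algorithm above corrects warping from detected, distorted text lines. Its exact procedure is not given here, so the following is only a simplified sketch of the underlying idea. It assumes that one text line's baseline has already been detected as (x, y) sample points with strictly increasing x, and that the warp can be modeled as a purely vertical displacement of each pixel column. Both helper names, `interpolate_baseline` and `dewarp_columns`, are hypothetical.

```python
from typing import List, Sequence, Tuple


def interpolate_baseline(points: Sequence[Tuple[int, float]], width: int) -> List[float]:
    """Linearly interpolate the baseline y for every column 0..width-1 from
    (x, y) samples with strictly increasing x; columns outside the sampled
    range reuse the nearest end value."""
    ys: List[float] = []
    for x in range(width):
        if x <= points[0][0]:
            ys.append(points[0][1])
        elif x >= points[-1][0]:
            ys.append(points[-1][1])
        else:
            for (x0, y0), (x1, y1) in zip(points, points[1:]):
                if x0 <= x <= x1:
                    t = (x - x0) / (x1 - x0)
                    ys.append(y0 + t * (y1 - y0))
                    break
    return ys


def dewarp_columns(ink: List[List[int]], baseline: List[float], target_y: int) -> List[List[int]]:
    """Shift each column vertically so its baseline lands on target_y.
    Pixels shifted outside the image are dropped; uncovered pixels become 0."""
    rows, cols = len(ink), len(ink[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(cols):
        shift = target_y - round(baseline[x])
        for y in range(rows):
            ny = y + shift
            if 0 <= ny < rows:
                out[ny][x] = ink[y][x]
    return out


# Usage: a one-pixel "text line" that descends diagonally across the page.
page = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
baseline = interpolate_baseline([(0, 0.0), (3, 3.0)], width=4)
flat = dewarp_columns(page, baseline, target_y=1)
```

After the shift, the diagonal line lies flat on row 1. Real pages contain many lines with differing curvature, so a practical method has to combine displacement estimates from several lines rather than rely on a single baseline as this sketch does.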
TL;DR: An efficient method for extracting the logical structure of a Web document. Because it uses both the visual structure produced by the visual grouping phase and a document model that describes the logical structure of a document type, it supports sophisticated structure analysis.
10 citations
01 Jul 2019
TL;DR: A system previously developed for layout analysis of handwritten text documents is extended to also address complex handwritten music scores; it performs region segmentation, region classification and baseline detection in an integrated manner.
Abstract: Document Layout Analysis (DLA) must be performed before a modern automatic or semi-automatic system can attempt to recognize the content of handwritten musical scores. DLA should segment the document image into semantically useful region types such as staff, lyrics, etc. In this paper we extend our previous work on DLA of handwritten text documents to also address complex handwritten music scores. The system performs region segmentation, region classification and baseline detection in an integrated manner.
10 citations
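The system above segments a score into semantically labeled regions such as staff and lyrics. As a rough illustration of one common post-processing step, not necessarily the one used in the paper, the sketch below turns a pixel-level class map into regions by flood-filling 4-connected pixels of the same class and recording each region's bounding box. The class names, the `"bg"` background label and the function `class_regions` are assumptions.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def class_regions(labels: List[List[str]], background: str = "bg") -> List[Tuple[str, Box]]:
    """Group 4-connected pixels of the same class into regions.

    Returns (class, bounding box) pairs in row-major order of each
    region's first pixel; background pixels are skipped."""
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    regions: List[Tuple[str, Box]] = []
    for r in range(rows):
        for c in range(cols):
            cls = labels[r][c]
            if cls == background or seen[r][c]:
                continue
            # Breadth-first flood fill over neighbours with the same class.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((cls, (top, left, bottom, right)))
    return regions


# Usage: a tiny label map with one staff region and two lyrics fragments.
label_map = [
    ["staff", "staff", "staff", "bg"],
    ["bg", "bg", "bg", "bg"],
    ["lyrics", "lyrics", "bg", "lyrics"],
]
regions = class_regions(label_map)
```

Here the two lyrics fragments come out as separate regions because a background pixel splits them; merging nearby fragments of the same class would be a further, application-specific step.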