Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have appeared within this topic, receiving 34,021 citations.


Papers
Patent
16 Nov 2011
TL;DR: An apparatus comprises a difference component that determines a set of differences between an old layout and a new layout of a document, and an animation layer generation component that generates a set of animation layers from that set of differences.
Abstract: Techniques for the automatic animation of document content are described. An apparatus may comprise a difference component operative to receive an old layout of a document and a new layout of the document, the new layout corresponding to an application of one or more changes to the old layout of the document, the difference component operative to determine a set of differences between the old layout and the new layout, and an animation layer generation component operative to generate a set of animation layers from the set of differences. Other embodiments are described and claimed.

12 citations
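
The two components named in the patent abstract above can be illustrated with a minimal sketch. It assumes a layout is a mapping from element ID to a bounding box, and that one animation layer is built per kind of change; the names `Box`, `diff_layouts`, and `build_animation_layers` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical element geometry: position and size in page units."""
    x: float
    y: float
    width: float
    height: float


def diff_layouts(old, new):
    """Return (element_id, kind, old_box, new_box) for every element that differs."""
    differences = []
    for element_id in sorted(old.keys() | new.keys()):
        before, after = old.get(element_id), new.get(element_id)
        if before is None:
            differences.append((element_id, "added", None, after))
        elif after is None:
            differences.append((element_id, "removed", before, None))
        elif before != after:
            differences.append((element_id, "changed", before, after))
    return differences


def build_animation_layers(differences):
    """Group differences by kind so each group can be animated as one layer."""
    layers = {}
    for element_id, kind, before, after in differences:
        layers.setdefault(kind, []).append(
            {"id": element_id, "from": before, "to": after}
        )
    return layers


old_layout = {
    "title": Box(0, 0, 100, 20),
    "para1": Box(0, 30, 100, 50),
    "figure": Box(0, 90, 100, 40),
}
new_layout = {
    "title": Box(0, 0, 100, 20),
    "para1": Box(0, 30, 100, 80),   # paragraph grew after an edit
    "caption": Box(0, 120, 100, 10),  # new element
}

differences = diff_layouts(old_layout, new_layout)
layers = build_animation_layers(differences)
for kind, items in layers.items():
    print(kind, [item["id"] for item in items])
```

Grouping by change kind is only one possible design; a renderer could equally build one layer per element or per page region.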

Journal ArticleDOI
TL;DR: This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) as a dataset and OCR services, and that gives access to different document layout analysis methods through a platform where they can be shared and handled without any setup requirements.
Abstract: This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) as a dataset and OCR services. Different services that support document analysis, retrieval, and processing, including dataset preparation and recognition, are discussed. ADIR services provide general Arabic OCR functions from which many other services in the OCR domain can be composed. Furthermore, the proposed work gives access to different methods of document layout analysis through a platform where such methods (services) can be shared and handled without any setup requirements. One of the datasets used is composed of 16,800 Arabic letters written by 60 writers. Each writer wrote each letter from Alif to Ya 10 times in two forms. The forms were scanned at 300 DPI resolution and are segmented into two sets: a training set with 13,440 letters (480 images per class label) and a testing set with 3,360 letters (120 images per class label). A convolutional neural network (CNN) is used and adapted for Arabic handwritten letter classification. In an experimental test, we showed that our results outperform 100% classification accuracy rate on testing images. The ADIR services provide a “service description”, which includes an interface and a server’s URL. The interface allows communication between clients and services. In this article, we also evaluate IR results and compare them with their corrected equivalents.

12 citations

Proceedings ArticleDOI
10 Sep 2001
TL;DR: The objective is to map a low level logical structure, which consists of a set of logical labels, on the extracted layout structure components, and proposes a probabilistic model represented by a Bayesian Network, which is a graphical model used in the problem as a classifier.
Abstract: This paper discusses logical labeling in documents, which is one basic step in logical structure recognition. Logical labels have to be attributed to text blocks composing the layout structure. Our study is based on physical characteristics having a visual aspect: typographic, geometric and/or topologic attributes. Our objective is to map a low level logical structure, which consists of a set of logical labels, on the extracted layout structure components. We have to build a model that allows this mapping. However, the documents we consider have various layout and logical structures, thus, we chose to perform this task by supervised learning on the basis of a set of training documents. This allows us to define a generic method to solve this problem, without imposing any constraint on document structure. We propose a probabilistic model represented by a Bayesian Network (BN), which is a graphical model used in our problem as a classifier. A prototype has been implemented, and applied to tables of contents in periodicals.

12 citations
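
As a rough illustration of the approach in the abstract above, the sketch below trains a naive Bayes classifier, which is the simplest Bayesian network structure (the label node is the only parent of every feature node). This is a swapped-in simplification, not the network proposed in the paper; the feature names (`font`, `align`), the labels, and the training samples are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Count label priors and per-label feature values from (features, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    known_values = defaultdict(set)      # feature name -> values seen in training
    for features, label in samples:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            known_values[name].add(value)
    return {
        "labels": label_counts,
        "counts": value_counts,
        "values": known_values,
        "alpha": alpha,
        "total": len(samples),
    }


def predict(model, features):
    """Return the label with the highest log posterior, using Laplace smoothing."""
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["labels"].items():
        score = math.log(count / model["total"])
        for name, value in features.items():
            seen = model["counts"][(label, name)][value]
            n_values = len(model["values"][name])
            score += math.log((seen + alpha) / (count + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical text blocks from a table of contents, described by
# typographic and geometric attributes.
training_blocks = [
    ({"font": "bold", "align": "left"}, "title"),
    ({"font": "bold", "align": "left"}, "title"),
    ({"font": "italic", "align": "left"}, "author"),
    ({"font": "italic", "align": "left"}, "author"),
    ({"font": "regular", "align": "right"}, "page_number"),
    ({"font": "regular", "align": "right"}, "page_number"),
]

model = train(training_blocks)
print(predict(model, {"font": "bold", "align": "left"}))
print(predict(model, {"font": "regular", "align": "right"}))
```

A fuller Bayesian network could add edges between features (for example, between font and position), which naive Bayes deliberately omits.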

Patent
26 Apr 2006
TL;DR: A test document is parsed into components, which may include bounding boxes, segments, and points, and test code makes calls to properties and methods of those components in order to verify document layout.
Abstract: A test document is parsed into components which may include bounding boxes, segments, and points. Test code makes calls to properties and methods of components in order to verify document layout. Rather than take absolute measurements of component placement, components are evaluated relative to each other. Layout verification logic may be part of a larger software test system.

12 citations
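
The idea of evaluating components relative to each other, rather than by absolute position, can be sketched as follows. The `BoundingBox` class, its methods, and `verify_reading_order` are hypothetical names chosen for this example, and the sketch assumes page coordinates in which y grows downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical parsed component; y grows downward."""
    left: float
    top: float
    right: float
    bottom: float

    def is_above(self, other):
        return self.bottom <= other.top

    def is_left_aligned_with(self, other, tolerance=1.0):
        return abs(self.left - other.left) <= tolerance


def verify_reading_order(blocks):
    """Check that each block sits above the next one and shares its left margin."""
    failures = []
    for (name_a, box_a), (name_b, box_b) in zip(blocks, blocks[1:]):
        if not box_a.is_above(box_b):
            failures.append(f"{name_a} is not above {name_b}")
        if not box_a.is_left_aligned_with(box_b):
            failures.append(f"{name_a} and {name_b} have different left margins")
    return failures


good_page = [
    ("heading", BoundingBox(72, 50, 540, 80)),
    ("paragraph", BoundingBox(72.4, 90, 540, 300)),
    ("footer", BoundingBox(72, 700, 540, 720)),
]
# The same page shifted by (100, 40): absolute positions change,
# relative checks still pass.
shifted_page = [
    (name, BoundingBox(b.left + 100, b.top + 40, b.right + 100, b.bottom + 40))
    for name, b in good_page
]
# A broken page where the paragraph starts inside the heading.
bad_page = [
    ("heading", BoundingBox(72, 50, 540, 80)),
    ("paragraph", BoundingBox(72, 70, 540, 300)),
]

print(verify_reading_order(good_page))
print(verify_reading_order(shifted_page))
print(verify_reading_order(bad_page))
```

Because every check compares one component with another, the same test passes after a uniform shift of the whole page, which an absolute-coordinate assertion would not tolerate.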

Proceedings ArticleDOI
29 Jun 2016
TL;DR: This paper shows that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document, and investigates the performance of state-of-the-art document annotation tools.
Abstract: Millions of individuals in the Arab world have significant visual impairments that make it difficult for them to access printed text. Assistive technologies such as scanners and screen readers often fail to turn text into speech because optical character recognition (OCR) software has difficulty interpreting the textual content of Arabic documents. In this paper, we show that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document. Arabic document layout analysis (DLA) is therefore an urgent research topic, motivated by the goal of providing assistive technology that serves people with visual impairments. We announce the launch of a large annotated dataset of Arabic document images, called BCE-Arabic-v1, to be used as a benchmark for DLA, OCR, and text-to-speech research. Our dataset contains 1,833 images of pages scanned from 180 books and represents a variety of page content and layout, in particular, Arabic text in various fonts and sizes, photographs, tables, diagrams, and charts in single or multiple columns. We report the results of a formative study that investigated the performance of state-of-the-art document annotation tools. We found significant differences and limitations in the functionality and labeling speed of these tools, and selected the best-performing tool for annotating our benchmark BCE-Arabic-v1.

12 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9