Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1462 publications have been published within this topic, receiving 34021 citations.


Papers
Patent
25 Jan 2000
TL;DR: This patent proposes synthesizing documents by extracting document components from a structured document and inserting them into, or using them to replace components of, a model document, without a separately written procedural script.
Abstract: PROBLEM TO BE SOLVED: To perform document synthesis by extracting document components from a structured document and inserting or replacing the respective components in a model document, without using a script that describes the procedure. SOLUTION: Instructions for extracting document components, for repetitive copying, and for insertion/replacement are embedded in the structured documents themselves. The instructions specifying component extraction, repetitive copying, and the document components (parts) to be inserted or replaced are taken out of the multiple input structured documents and dynamically combined into a document processing description, which eliminates the need for a separate document processing script. The time and labor of managing the script separately from the original documents are thus saved.

10 citations

Journal Article
TL;DR: The methodology is based on recognizing text characters and words in order to efficiently separate text paragraphs from images, while keeping their relationships so that the original page can be reconstructed.

10 citations

Patent
Boris Chidlovskii
02 Jul 2010
TL;DR: In this paper, a method and a system are disclosed for querying a document collection based on the layout of only a fragment of the content of a document, specified as a query zone.
Abstract: A method and a system are disclosed for querying a document collection based on the layout of only a fragment of the content of a document, specified as a query zone. The method includes providing an index for a collection of documents. In the index, content of a document page in the collection that has been decomposed into layout blocks is indexed according to representations of the blocks and one or more geometric relations between the blocks. A query is generated which is based on representations of blocks determined to be within the query zone and geometric relations between them. This is used to query the index to retrieve pages of documents in the collection which can each be expected to include a layout zone somewhere in the page that is similar in layout to the query zone.

10 citations
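
The indexing idea in the abstract above (pages decomposed into layout blocks, indexed by block representations and geometric relations, then queried with only the blocks inside a query zone) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patent's actual method: the `Block` type, the four coarse relations, the `(kind, relation, kind)` index keys, and the count-based ranking are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    kind: str   # hypothetical block representation, e.g. "text", "image"
    x: float    # top-left corner; y grows downward, as in image coordinates
    y: float
    w: float
    h: float


def relation(a, b):
    """Coarse geometric relation of block a to block b, from their centers."""
    ax, ay = a.x + a.w / 2, a.y + a.h / 2
    bx, by = b.x + b.w / 2, b.y + b.h / 2
    if abs(ay - by) >= abs(ax - bx):
        return "above" if ay < by else "below"
    return "left_of" if ax < bx else "right_of"


def layout_keys(blocks):
    """All (kind, relation, kind) triples over ordered pairs of blocks."""
    keys = set()
    for a, b in combinations(blocks, 2):
        keys.add((a.kind, relation(a, b), b.kind))
        keys.add((b.kind, relation(b, a), a.kind))
    return keys


def build_index(pages):
    """Inverted index: layout key -> set of page ids containing it."""
    index = {}
    for page_id, blocks in pages.items():
        for key in layout_keys(blocks):
            index.setdefault(key, set()).add(page_id)
    return index


def in_zone(block, zone):
    """True if the block lies entirely inside zone = (x, y, w, h)."""
    zx, zy, zw, zh = zone
    return (block.x >= zx and block.y >= zy
            and block.x + block.w <= zx + zw
            and block.y + block.h <= zy + zh)


def query(index, page_blocks, zone):
    """Rank indexed pages by how many layout keys of the query zone they share."""
    keys = layout_keys([b for b in page_blocks if in_zone(b, zone)])
    scores = {}
    for key in keys:
        for page_id in index.get(key, ()):
            scores[page_id] = scores.get(page_id, 0) + 1
    return sorted(scores, key=lambda p: (-scores[p], p))
```

Because the keys encode only relative positions, a matching layout is found wherever it sits on an indexed page, which is the property the abstract describes as a similar layout zone "somewhere in the page".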

Journal Article
TL;DR: The results of the proposed method for paragraph structure recognition are comparable to those of the referenced methods, which offer segmentation only.
Abstract: The paper presents a complete solution for recognizing textual and graphic structures in various types of documents acquired from the Internet. In the proposed approach, the document structure recognition problem is divided into two sub-problems. The first is localizing logical structure elements within the document. The second is recognizing the segmented logical structure elements. The input to the method is an image of a document page; the output is an XML file containing all graphic and textual elements included in the document, preserving the reading order of document blocks. This file contains information about the identity and position of all logical elements in the document image. The paper describes the proposed method in detail and shows the results of experiments validating its effectiveness. The results of the proposed method for paragraph structure recognition are comparable to those of the referenced methods, which offer segmentation only.

10 citations
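
The output format described in the abstract above (an XML file listing each logical element with its identity and position, in reading order) might look like the following minimal sketch. The element and attribute names are hypothetical, and sorting top-to-bottom then left-to-right is only a stand-in for reading order in a single-column page; the paper's own ordering method is not given in this listing.

```python
import xml.etree.ElementTree as ET


def blocks_to_xml(blocks):
    """Serialize recognized blocks to XML in an assumed reading order.

    Each block is a dict with "type", "x", "y", "w", "h" and an optional
    "text" entry (all names are illustrative assumptions).
    """
    page = ET.Element("page")
    # Assumed reading order: top-to-bottom, then left-to-right.
    ordered = sorted(blocks, key=lambda b: (b["y"], b["x"]))
    for order, b in enumerate(ordered, start=1):
        el = ET.SubElement(page, "block", {
            "order": str(order),
            "type": b["type"],
            "x": str(b["x"]),
            "y": str(b["y"]),
            "width": str(b["w"]),
            "height": str(b["h"]),
        })
        if b.get("text"):
            el.text = b["text"]
    return ET.tostring(page, encoding="unicode")
```

Multi-column pages would need a column-aware ordering step before serialization; the simple sort here would interleave blocks from different columns.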

Patent
13 Nov 1999
TL;DR: This patent describes security documents with multiple fields, each carrying information perceptible in more than one way, together with an automatic validation system that reads multiple fields on the document and detects information about the user.
Abstract: Security documents which have multiple fields, each of which contains information that is perceptible in more than one way. One field can contain a visually perceptible image (23, 24, 25) and a digital watermark (22) that can be detected when the image is scanned (302) and processed, another field can contain machine-readable OCR text (24) that can be read by both a human and a programmed computer, and still another field can contain watermark data (22). Documents are produced by beginning with a template (21) which defines the placement of elements on the document and the interrelationships between hidden and visual information on the document. Pictures, graphics and digital data are extracted from a data bank, and watermark data is embedded (27) in the pictures and graphics as appropriate. An automatic validation system (312) of the present invention reads multiple fields on the document, and it also automatically detects information about the user.

10 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  5
2022  19
2021  34
2020  19
2019  14
2018  9