Topic

Document layout analysis

About: Document layout analysis is a research topic. To date, 1462 publications on this topic have received a total of 34021 citations.


Papers
Patent
Pramod Sankar Kompalli
28 Apr 2016
TL;DR: This patent discloses devices and methods for processing an image document in a client-server environment so that the privacy of the text it contains is preserved: the client scrambles the word images before sending the document, and the server recognizes the words without reconstructing the original image.
Abstract: Disclosed are devices and methods for processing an image document in a client-server environment such that privacy of text information contained in the image document is preserved. Specifically, in a client-server environment, an image document can be processed using a local computerized device of a client to create an obfuscated document by identifying word images in the image document and scrambling those word images. The obfuscated document can be received by a server of a service provider over a network (e.g., the Internet) and processed by previously trained software (e.g., a previously trained convolutional neural network (CNN)) to recognize specific words represented by the scrambled images in the obfuscated document without having to reconstruct the image document. Since the image document is neither communicated over the network, nor reconstructed and stored on the server, privacy concerns are minimized.

21 citations
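
The client-side obfuscation step in the abstract above can be illustrated with a minimal sketch. The abstract does not say how word images are scrambled, so this example assumes that scrambling means shuffling the order of the cropped word images. The names `obfuscate` and `restore_labels`, the seed parameter, and the idea that the client re-orders the recognized words locally are all hypothetical and are not taken from the patent.

```python
import random


def obfuscate(word_images, seed):
    """Shuffle word images (assumed scrambling scheme, not the patent's).

    Returns the scrambled list plus the permutation, which stays on the client.
    """
    rng = random.Random(seed)
    order = list(range(len(word_images)))
    rng.shuffle(order)
    scrambled = [word_images[i] for i in order]
    return scrambled, order


def restore_labels(labels, order):
    """Put per-image labels returned for the scrambled list back in reading order."""
    restored = [None] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = labels[position]
    return restored


if __name__ == "__main__":
    # Strings stand in for cropped word images in this sketch.
    crops = ["img_the", "img_quick", "img_fox"]
    scrambled, order = obfuscate(crops, seed=7)
    # A real server would run a trained recognizer; here the label is faked.
    labels = [crop.replace("img_", "") for crop in scrambled]
    print(restore_labels(labels, order))
```

Only the scrambled list would leave the client in this sketch; the permutation needed to recover reading order never crosses the network.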

Patent
15 May 2007
TL;DR: This patent describes automated localization and internationalization of document resources: pre-built textual content is first translated into one or more target languages, and document settings such as page size, margins, and reading direction are then adjusted for each target user group.
Abstract: Automated localization (translation) and internationalization of document resources may be provided for use by various target user groups requiring different text languages and/or document settings. A document resource including pre-built textual components and document settings and properties is first passed through a translation process for translating any pre-built textual content to one or more target languages. Text strings in the document resource may be extracted, translated and replaced to the document resource. Internationalization processing may then be accomplished wherein default page sizes, margin settings, language reading direction, and other document settings and properties are modified according to each target user group for the document resource. For initial document resource assembly, source files are identified for each component of a given document resource. The source files may be localized and internationalized and then may be used to compile a document resource for each of one or more target user groups.

21 citations

Proceedings Article
18 May 2015
TL;DR: Different techniques for recognizing types of partly very similar identity documents using state-of-the-art visual recognition approaches including feature representations based on recent achievements with convolutional neural networks are developed and evaluated.
Abstract: In this paper, we tackle the task of recognizing types of partly very similar identity documents using state-of-the-art visual recognition approaches. Given a scanned document, the goal is to identify the country of issue, the type of document, and its version. Whereas recognizing the individual parts of a document with known standardized layout can be done reliably, identifying the type of a document and therefore also its layout is a challenging problem due to the large variety of documents. In our paper, we develop and evaluate different techniques for this application including feature representations based on recent achievements with convolutional neural networks. On a dataset with 74 different classes and using only one training image per class, our best approach achieves a mean class-wise accuracy of 97.7%.

21 citations

Patent
Shixia Liu, Liping Yang
22 Oct 2004
TL;DR: This patent presents a method and system for automatically generating a summary of a textual document: the document is segmented into words, sentences, or paragraphs; a number of segments are extracted to form an initial summary; and neighboring segments with higher correlation degrees are then added to that summary.
Abstract: According to the present invention, there is provided a method and system for automatically generating a summary for a textual document, and relevant applications. The method and system includes segmenting a given textual document into document segments, wherein the document segments are words, sentences or paragraphs in the textual document; extracting a number of the document segments to form an initial summary for the document; for each of the document segments in the initial summary, calculating the correlation degrees between the document segment and its neighboring document segments, wherein the neighboring document segments of a given document segment refer to the document segments within a predefined distance from the given document segment; and adding the document segments with higher correlation degrees into the initial summary.

21 citations
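
The summary-expansion step described in the abstract above can be sketched as follows. The abstract does not define the correlation degree, how the initial summary is chosen, or what counts as "higher", so this example assumes Jaccard word overlap as the correlation, takes the initial summary as given indices, and uses a fixed threshold. The function names and the `distance` and `threshold` parameters are hypothetical.

```python
import re


def tokenize(segment):
    """Lowercased word set of a segment."""
    return set(re.findall(r"[a-z0-9]+", segment.lower()))


def correlation(a, b):
    """Assumed correlation degree: Jaccard overlap of the two word sets."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def expand_summary(segments, initial, distance=1, threshold=0.2):
    """Add neighbors of each initial-summary segment that correlate with it.

    Neighbors are segments at most `distance` positions away. Returns the
    sorted indices of the expanded summary.
    """
    selected = set(initial)
    for i in initial:
        first = max(0, i - distance)
        last = min(len(segments) - 1, i + distance)
        for j in range(first, last + 1):
            if j != i and correlation(segments[i], segments[j]) >= threshold:
                selected.add(j)
    return sorted(selected)


if __name__ == "__main__":
    sentences = [
        "The scanner reads the page image.",
        "The page image is split into text blocks.",
        "Lunch was served at noon.",
    ]
    print(expand_summary(sentences, initial=[1]))
```

In this sketch the first sentence shares enough words with the initial-summary sentence to be added, while the unrelated third sentence is left out.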


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9