scispace - formally typeset
Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications on this topic have received 34,021 citations.


Papers
Patent
10 Jun 2005
TL;DR: An intelligent document recognition-based document management system includes modules for image capture, image enhancement, image identification, optical character recognition (OCR), data extraction, and quality assurance.
Abstract: An intelligent document recognition-based document management system (Fig. 2) includes modules for image capture (32), image enhancement (32), image identification (34), optical character recognition (36), data extraction (37) and quality assurance (42). The system captures data from electronic documents as diverse as facsimile images, scanned images and images from document management systems. It processes these images and presents the data in, for example, a standard XML format. The document management system processes both structured document images (40) (ones which have a standard format) and unstructured document images (38) (ones which do not have a standard format). The system can extract images directly from a facsimile machine, a scanner or a document management system for processing.

233 citations

Patent
02 Aug 2000
TL;DR: A shape is defined by a set of associated edges in a specified configuration; a catalog of shapes is defined, and layout processing actions are associated with the various shapes.
Abstract: Layout processing can be applied to an integrated circuit (IC) layout using a shape-based system. A shape can be defined by a set of associated edges in a specified configuration. A catalog of shapes is defined and layout processing actions are associated with the various shapes. Each layout processing action applies a specified layout modification to its associated shape. A shape-based rule system advantageously enables efficient formulation and precise application of layout modifications. Shapes/actions can be provided as defaults, can be retrieved from a remote source, or can be defined by the user. The layout processing actions can be compiled in a bias table. The bias table can include both rule-based and model-based actions, and can also include single-edge shapes for completeness. The scanning of the IC layout can be performed in order of increasing or decreasing complexity, or can be specified by the user. The appropriate layout processing actions are applied to matching portions of the IC layout to form the corrected photomask layout. This process can run sequentially or in batch mode. Shape and action conflicts can be resolved by marking identified/modified elements or by designing rules for orderly resolution of any inconsistencies or overlaps.

224 citations
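The shape-based rule idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the edge model (an edge reduced to its length), the two example shapes, the length threshold, and the bias values are all hypothetical. It shows a bias table of shape/action pairs, matching in order of decreasing complexity, and conflict resolution by marking edges that have already been modified, which is one of the two options the abstract mentions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Edge:
    length: float        # hypothetical model: an edge reduced to its length
    offset: float = 0.0  # total bias applied to this edge so far


# A matcher looks for its shape starting at edge i and returns the matched edge indices.
Matcher = Callable[[list, int], Optional[list]]


@dataclass
class Rule:
    name: str
    complexity: int      # number of edges in the shape
    matches: Matcher
    bias: float          # hypothetical outward bias applied to each matched edge


def apply_bias_table(edges: list[Edge], rules: list[Rule]) -> list[Edge]:
    modified: set[int] = set()
    # Try the most complex shapes first so simpler rules cannot pre-empt them.
    for rule in sorted(rules, key=lambda r: r.complexity, reverse=True):
        for i in range(len(edges)):
            matched = rule.matches(edges, i)
            # Skip non-matches and conflicts with edges an earlier rule already corrected.
            if matched is None or any(j in modified for j in matched):
                continue
            for j in matched:
                edges[j].offset += rule.bias
                modified.add(j)
    return edges


SHORT = 1.0  # hypothetical length threshold for a "short" edge


def short_pair(edges, i):
    # Two-edge shape: two consecutive short edges.
    if i + 1 < len(edges) and edges[i].length < SHORT and edges[i + 1].length < SHORT:
        return [i, i + 1]
    return None


def short_single(edges, i):
    # Single-edge shape, included "for completeness" as in the abstract.
    return [i] if edges[i].length < SHORT else None


bias_table = [
    Rule("short_single", 1, short_single, 0.05),
    Rule("short_pair", 2, short_pair, 0.10),
]
```

For example, `apply_bias_table([Edge(0.5), Edge(0.5), Edge(2.0), Edge(0.5)], bias_table)` leaves offsets of `[0.1, 0.1, 0.0, 0.05]`: the first two edges are claimed by the two-edge shape, so the single-edge rule only touches the last one.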

Patent
28 Jul 2000
TL;DR: A system and method for text-based document retrieval uses statistics of word relationships (context) in the document collection to facilitate the specification of search queries and document comparison.
Abstract: A system and method for document retrieval is disclosed. The invention addresses a major problem in text-based document retrieval: rapidly finding a small subset of documents in a large document collection (e.g., Web pages on the Internet) that are relevant to a limited set of query terms supplied by the user. The invention is based on utilizing information contained in the document collection about the statistics of word relationships (“context”) to facilitate the specification of search queries and document comparison. The method consists of first compiling word relationships into a context database that captures the statistics of word proximity and occurrence throughout the document collection. At retrieval time, a search matrix is computed from a set of user-supplied keywords and the context database. For each document in the collection, a similar matrix is computed using the contents of the document and the context database. Document relevance is determined by comparing the similarity of the search and document matrices. The disclosed system therefore retrieves documents with contextual similarity rather than word frequency similarity, simplifying search specification while allowing greater search precision.

221 citations
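The retrieval steps in this abstract can be sketched in Python, assuming details the abstract leaves open: the context database counts word co-occurrences within a fixed token window, both matrices are sparse maps keyed by (keyword, context word), a document matrix keeps only context words that occur in that document, and relevance is cosine similarity. All function names and the window size are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_context_db(docs, window=2):
    """Context database: co-occurrence counts of word pairs within `window` tokens."""
    db = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    db[word][tokens[j]] += 1
    return db


def search_matrix(keywords, db):
    # Entry (t, w): how strongly context word w is associated with keyword t collection-wide.
    return {(t, w): c for t in keywords for w, c in db[t].items()}


def document_matrix(keywords, doc, db):
    # Same entries, restricted to context words that occur in this document.
    words = set(doc.lower().split())
    return {(t, w): c for t in keywords for w, c in db[t].items() if w in words}


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(keywords, docs, window=2):
    """Return (score, document) pairs, most relevant first."""
    db = build_context_db(docs, window)
    keywords = [k.lower() for k in keywords]
    query = search_matrix(keywords, db)
    scored = [(cosine(query, document_matrix(keywords, d, db)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because scores depend on the context words around each keyword rather than on keyword counts, a document can score above zero without containing the keyword itself, which is the sense of "contextual similarity rather than word frequency similarity" in the abstract.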

Patent
31 May 1989
TL;DR: The digitized image of a document is segmented into physical and logical areas of significance, which are labeled by the type of information they contain and matched, using global document features, against a knowledge base of known document types.
Abstract: This invention relates to an automatic identification method for scanned documents in an electronic document capture and storage system. The invention uses the technique of recognition of global document features compared to a knowledge base of known document types. The system first segments the digitized image of a document into physical and logical areas of significance and attempts to label these areas by determining the type of information they contain, without using OCR techniques. The system then attempts to match the areas segmented to objects described in the knowledge base. The system labels the successfully matched areas and then selects the most probable document type based on the areas found within the document. Using computer learning methods, the system is capable of improving its knowledge of the documents it is supposed to recognize by dynamically modifying the characteristics of its knowledge base, thus sharpening its decision-making capability.

217 citations
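The matching, selection, and learning steps can be sketched as a small knowledge base in Python. The assumptions are substantial: segmentation and labeling (done without OCR in the patent) are taken as given, each area is reduced to a hypothetical (label, zone) pair, a document type is scored by the weighted fraction of its expected objects that were found, and "learning" simply reinforces the weights of objects seen in confirmed documents. The class and method names are hypothetical.

```python
from collections import defaultdict


class KnowledgeBase:
    """Hypothetical knowledge base: document type -> {(label, zone): weight}."""

    def __init__(self):
        self.types = defaultdict(dict)

    def add(self, doc_type, objects):
        # Each object is a (label, zone) pair such as ("logo", "top").
        for obj in objects:
            self.types[doc_type][obj] = self.types[doc_type].get(obj, 0) + 1

    def classify(self, areas):
        """Return (most probable type, score) for the labeled areas of one page."""
        found = set(areas)
        best, best_score = None, 0.0
        for doc_type, objects in self.types.items():
            total = sum(objects.values())
            # Weighted share of this type's expected objects that were found on the page.
            matched = sum(w for obj, w in objects.items() if obj in found)
            score = matched / total if total else 0.0
            if score > best_score:
                best, best_score = doc_type, score
        return best, best_score

    def learn(self, doc_type, areas):
        # Reinforce the objects observed in a document whose type was confirmed.
        self.add(doc_type, areas)
```

With an "invoice" type expecting a top logo, a middle table, and a bottom total, and a "letter" type expecting a top logo, a middle paragraph, and a bottom signature, a page showing all three invoice objects scores 1.0 for "invoice" and only 1/3 for "letter".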

Patent
16 Aug 2007
TL;DR: A method and apparatus for reformatting electronic documents performs layout analysis on an electronic version of a document to locate text zones, assigns attributes for scale and importance to those text zones, and reformats the text based on the attributes to create an image.
Abstract: A method and apparatus for reformatting electronic documents is disclosed. In one embodiment, the method comprises performing layout analysis on an electronic version of a document to locate text zones, assigning attributes for scale and importance to text zones in the electronic version of the document, and reformatting text in the electronic version of the document based on the attributes to create an image.

216 citations
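Attribute-driven reformatting can be sketched in Python under hypothetical assumptions: each text zone found by layout analysis carries an importance score in [0, 1], scale grows linearly with importance, zones are laid out from most to least important, and the final rasterization into an image is replaced by a list of (scale, line) pairs that a renderer would draw. The names, the linear scale mapping, and the character-width model are all illustrative.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class TextZone:
    text: str
    importance: float  # hypothetical score in [0, 1] assigned after layout analysis


def assign_scale(zone, min_scale=1.0, max_scale=2.0):
    # More important zones are rendered larger (linear mapping is an assumption).
    return min_scale + (max_scale - min_scale) * zone.importance


def reformat(zones, width_chars=40, max_lines=10):
    """Return (scale, line) pairs that a renderer would draw into the output image."""
    out = []
    for zone in sorted(zones, key=lambda z: z.importance, reverse=True):
        scale = assign_scale(zone)
        # Larger text fits fewer characters per line on the target display.
        chars = max(1, int(width_chars / scale))
        for line in textwrap.wrap(zone.text, chars):
            if len(out) >= max_lines:
                return out  # the target display is full
            out.append((scale, line))
    return out
```

On a 20-character, 3-line display, a headline with importance 1.0 is drawn at scale 2.0 and wrapped to 10 characters per line, so it takes two lines and leaves room for only the first line of the body text.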


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  5
2022  19
2021  34
2020  19
2019  14
2018  9