Showing papers on "Document layout analysis" published in 2002

Patent•

System and method for adaptive document layout via manifold content

David Salesin¹, Charles E. Jacobs¹, Wilmot Li¹•Institutions (1)

30 May 2002

TL;DR: In this paper, a system and method for improving document layout on arbitrary devices of different resolutions and size using manifold representations of content is presented, where multiple versions of anything that might appear in a document, from text, to images, to even such things as stylistic conventions are selected and formatted dynamically, on the fly, by a layout engine.

Abstract: A system and method for improving document layout on arbitrary devices of different resolutions and size using manifold representations of content. Manifold representations of content are: multiple versions of anything that might appear in a document, from text, to images, to even such things as stylistic conventions. The specific content is selected and formatted dynamically, on the fly, by a layout engine in order to best adapt to a given viewing situation. A user interface for authoring and editing such manifold content is disclosed.

176 citations

Journal Article•DOI•

Document image analysis: A primer

Rangachar Kasturi¹, Lawrence O'Gorman², Venu Govindaraju³•Institutions (3)

Pennsylvania State University¹, Avaya², University at Buffalo³

01 Feb 2002-Sadhana-academy Proceedings in Engineering Sciences

TL;DR: This paper briefly describes various components of a document analysis system and provides the background necessary to understand the detailed descriptions of specific techniques presented in other papers in this issue.

Abstract: Document image analysis refers to algorithms and techniques that are applied to images of documents to obtain a computer-readable description from pixel data. A well-known document image analysis product is the Optical Character Recognition (OCR) software that recognizes characters in a scanned document. OCR makes it possible for the user to edit or search the document’s contents. In this paper we briefly describe various components of a document analysis system. Many of these basic building blocks are found in most document analysis systems, irrespective of the particular domain or language to which they are applied. We hope that this paper will help the reader by providing the background necessary to understand the detailed descriptions of specific techniques presented in other papers in this issue.

143 citations

Patent•

Computer based summarization of natural language documents

Leonid Batchilo, Valery Tsourikov, Igor Sovpel

31 Jul 2002

TL;DR: In this article, a system and method for summarizing the contents of a natural language document provided in electronic or digital form includes preformatting the document, performing linguistic analysis, weighting each sentence in the document as a function of quantitative importance, and generating one or more document summaries, from a plurality of selectable document summary types, as a function of the sentence weights.

Abstract: A system and method for summarizing the contents of a natural language document provided in electronic or digital form includes preformatting the document, performing linguistic analysis, weighting each sentence in the document as a function of quantitative importance, and generating one or more document summaries, from a plurality of selectable document summary types, as a function of the sentence weights.
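
The pipeline in this abstract (preformat, weight each sentence, build a summary from the weights) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the stop-word list, the frequency-based weight, and the function names are all assumptions, and the patent's linguistic analysis is far richer than a word count.

```python
import re
from collections import Counter

# Hypothetical stop-word list; an assumption made for this sketch only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}

def split_sentences(text):
    """Naive preformatting: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    """Lower-cased words of the sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def sentence_weights(sentences):
    """Assumed weight: mean document-wide frequency of a sentence's content words."""
    freq = Counter(w for s in sentences for w in content_words(s))
    weights = []
    for s in sentences:
        words = content_words(s)
        weights.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return weights

def summarize(text, k=1):
    """Return the k highest-weighted sentences, kept in their original order."""
    sentences = split_sentences(text)
    weights = sentence_weights(sentences)
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Different summary types could be produced by swapping the weighting function or the selection rule while keeping the same outer pipeline.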

120 citations

Patent•

Systems and methods for authenticating and verifying documents

Jan Matthias Ruhl¹, David E. Goldberg, Marshall W. Bern•Institutions (1)

Xerox¹

19 Dec 2002

TL;DR: A document is verified by regenerating a set of features from its image, using information in the assist channel appended to the document, and comparing the result with the signed set of features appended to the document.

Abstract: Document authentication is accomplished by acquiring document image data, generating a set of features of the document, and generating an assist channel that includes information on how to generate the set of features. The set of features and the assist channel are digitally signed and then appended to the document. Document verification is accomplished by acquiring document image data and verifying the signature. If the signature is valid, a set of features of the document is generated using information contained in the assist channel appended to the document. The generated set of features is then compared to the set of features appended on the document. If the sets do not match, the document is determined to have been altered sometime after the assist channel was appended to the document, i.e., the document is not genuine. Otherwise, the document can be considered to be genuine.
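
The sign-then-verify flow described here might look roughly like the sketch below. Several things are assumptions made for illustration: the feature set (mean intensity per block of rows), the assist channel contents (just a block size), and the use of an HMAC as a stand-in for a digital signature. An HMAC is a symmetric message authentication code, not a public-key signature, so a real system would use something else.

```python
import hashlib
import hmac
import json

def extract_features(pixels, block):
    """Hypothetical features: mean intensity of each full block of `block` rows."""
    width = len(pixels[0])
    return [sum(map(sum, pixels[i:i + block])) // (block * width)
            for i in range(0, len(pixels) - block + 1, block)]

def _payload(assist, features):
    # Canonical byte encoding of what gets signed.
    return json.dumps({"assist": assist, "features": features}, sort_keys=True).encode()

def sign(payload, key):
    """HMAC-SHA256 used here as a stand-in for a digital signature."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def authenticate(pixels, key, block=2):
    """Build the data to append: assist channel, features, and their signature."""
    assist = {"block": block}  # tells the verifier how to regenerate the features
    features = extract_features(pixels, block)
    return {"assist": assist, "features": features,
            "signature": sign(_payload(assist, features), key)}

def verify(pixels, appended, key):
    """Check the signature first, then compare regenerated and appended features."""
    expected = sign(_payload(appended["assist"], appended["features"]), key)
    if not hmac.compare_digest(expected, appended["signature"]):
        return False
    regenerated = extract_features(pixels, appended["assist"]["block"])
    return regenerated == appended["features"]
```

Checking the signature before regenerating features mirrors the order given in the abstract: an invalid signature means the appended data cannot be trusted at all.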

102 citations

Journal Article•DOI•

Imaged document text retrieval without OCR

Chew Lim Tan¹, Weihua Huang¹, Zhaohui Yu, Yi Xu²•Institutions (2)

National University of Singapore¹, Agilent Technologies²

01 Jun 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Testing with seven corpora of imaged textual documents in English and Chinese as well as images from the UW1 (University of Washington 1) database confirms the validity of the proposed method for text retrieval from document images without the use of OCR.

Abstract: We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the vertical traverse density (VTD) and horizontal traverse density (HTD), are extracted. An n-gram-based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese as well as images from the UW1 (University of Washington 1) database confirms the validity of the proposed method.
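
As a rough illustration of the n-gram document vector and dot-product similarity mentioned above, the sketch below assumes each character object has already been reduced to an integer class code (a hypothetical quantization of its VTD/HTD features, which is not shown). The length normalization is also an assumption added so that scores fall between 0 and 1; the abstract only mentions a dot product.

```python
import math
from collections import Counter

def ngram_vector(codes, n=3):
    """Count n-grams over a sequence of character-object codes, then L2-normalize."""
    grams = Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))
    norm = math.sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()} if norm else {}

def similarity(u, v):
    """Dot product of two sparse document vectors stored as dicts."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())
```

Because the vectors are built from image-derived codes rather than recognized characters, no OCR step is needed to compare two documents.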

98 citations

Patent•

Apparatus and method for automatically highlighting text in an electronic document

Brian John Cragun¹, Paul Reuben Day¹•Institutions (1)

IBM¹

29 Oct 2002

TL;DR: In this paper, an apparatus and method helps a user to determine parts of an electronic document that are of interest by allowing the user to define preferences for processing electronic documents, and by automatically highlighting one or more portions of the document according to the user preferences.

Abstract: An apparatus and method helps a user to determine parts of an electronic document that are of interest by allowing the user to define preferences for processing an electronic document, and by automatically highlighting one or more portions of the document according to the user preferences. Highlighting includes any way to enhance or alter the appearance of text, including bold, italics, underlining, change in font style, change in font size, change in color, change in background color, etc. The automatic highlighting of portions of the document attracts the user's eyes to that portion of the document, which helps the user to discern whether or not the highlighted portion is relevant or interesting. The preferred embodiments also include a document generator that takes an input document and generates therefrom an output document that has one or more highlighted portions that are hard-coded into the document according to the user preferences.

79 citations

Patent•

Document classification and labeling using layout graph matching

Yue Ma, Jinhong Guo, David Doermann, Jian Liang

13 Nov 2002

TL;DR: In this paper, a document processing system for use in identifying a segmented document includes a data store of layout graph models that are classified and/or labeled, and a matching module makes a determination of a match between a layout graph sample for the segmented document and a particular layout graph model.

Abstract: A document processing system for use in identifying a segmented document includes a data store of layout graph models that are classified and/or labeled. A matching module makes a determination of a match between a layout graph sample for the segmented document and a particular layout graph model. The matching module uses a correlator to generate an identified, segmented document that is classified and/or labeled based on the segmented document, the layout graph model, and the determination of a match.

76 citations

Book Chapter•DOI•

Document Image De-warping for Text/Graphics Recognition

Changhua Wu¹, Gady Agam¹•Institutions (1)

Illinois Institute of Technology¹

06 Aug 2002-Lecture Notes in Computer Science

TL;DR: The proposed approach uses the texture of a document image so as to infer the document structure distortion and a two-pass image warping algorithm is then used to correct the images.

Abstract: Document analysis and graphics recognition algorithms are normally applied to the processing of images of 2D documents scanned when flattened against a planar surface. Technological advancements in recent years have led to a situation in which digital cameras with high resolution are widely available. Consequently, traditional graphics recognition tasks may be updated to accommodate document images captured through a hand-held camera in an uncontrolled environment. In this paper the problem of perspective and geometric deformations correction in document images is discussed. The proposed approach uses the texture of a document image so as to infer the document structure distortion. A two-pass image warping algorithm is then used to correct the images. In addition to being language independent, the proposed approach may handle document images that include multiple fonts, math notations, and graphics. The de-warped images contain fewer distortions and so are better suited for existing text/graphics recognition techniques.

55 citations

Journal Article•DOI•

A semi-structured document model for text mining

Yang Jian-wu¹, Chen Xiaoou¹•Institutions (1)

Peking University¹

01 Aug 2002-Journal of Computer Science and Technology

TL;DR: A structured link vector model (SLVM) is presented in this paper, where a vector represents a document, and vectors' elements are determined by terms, document structure and neighboring documents.

Abstract: A semi-structured document has more structured information compared to an ordinary document, and the relation among semi-structured documents can be fully utilized. In order to take advantage of the structure and link information in a semi-structured document for better mining, a structured link vector model (SLVM) is presented in this paper, where a vector represents a document, and vectors' elements are determined by terms, document structure and neighboring documents. Text mining based on SLVM is described in the procedure of K-means for briefness and clarity: calculating document similarity and calculating cluster center. The clustering based on SLVM performs significantly better than that based on a conventional vector space model in the experiments, and its F value increases from 0.65-0.73 to 0.82-0.86.

49 citations

Patent•

Method for constraint-based document generation

Lisa S. Purvis¹•Institutions (1)

Xerox¹

23 Jul 2002

TL;DR: A system and method specify a custom document as a constraint satisfaction problem, treating the document, its content components, and its layout requirements as elements of the problem; solving it with existing constraint-solving algorithms yields an automated document layout for the set of content components.

Abstract: A system and method specify a custom document as a constraint satisfaction problem to create the specified document using existing constraint solving algorithms, wherein the document, its content components, and its layout requirements are treated as elements of a constraint satisfaction problem which, when solved, results in an automated document layout for the set of content components. The system and method enable an automated custom document creation process, providing a wider array of output documents.
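
To make the idea of layout as constraint satisfaction concrete, here is a toy sketch. Everything in it is a hypothetical stand-in: components are reduced to a required height, slots to a `(y, height)` pair, and the solver is a brute-force search over assignments rather than any of the existing constraint-solving algorithms the patent refers to.

```python
from itertools import permutations

def fits(assignment, components, slots):
    """Constraint: every component is no taller than the slot it is placed in."""
    return all(components[c] <= slots[s][1] for c, s in assignment.items())

def title_above_body(assignment, components, slots):
    """Constraint: the title's slot starts higher on the page than the body's."""
    return slots[assignment["title"]][0] < slots[assignment["body"]][0]

def solve_layout(components, slots, constraints):
    """Try every one-to-one assignment of components to slots; return the first valid one."""
    names = list(components)
    for chosen in permutations(slots, len(names)):
        assignment = dict(zip(names, chosen))
        if all(check(assignment, components, slots) for check in constraints):
            return assignment
    return None  # no assignment satisfies all constraints

components = {"title": 1, "body": 5}                           # required heights
slots = {"top": (0, 2), "middle": (2, 6), "bottom": (8, 1)}    # (y, height)
layout = solve_layout(components, slots, [fits, title_above_body])
```

Exhaustive search is only workable for a handful of components; the point of the sketch is the problem formulation (variables, domains, constraints), not the solving strategy.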

42 citations

Patent•

Document identification device, document definition method and document identification method

Kazuaki Yokota

27 Nov 2002

TL;DR: In this paper, a plurality of document definition information for identifying documents, and format control information for recognizing a character recorded on a document corresponding to each of the plurality of definition information are held beforehand.

Abstract: A plurality of document definition information for identifying documents, and format control information for recognizing a character recorded on a document corresponding to each of the plurality of document definition information are held beforehand, documents targeted for character recognition are identified as specific documents based on document images of the entered documents targeted for character recognition and the document definition information and, based on a result of the identification, character recognition is executed by using corresponding format control information. A document definition device adds a plane area of each of documents to be identified to the document definition. An OCR device checks the plane area on the document by using the document definition before check of a preprint accompanied by character recognition.

Book Chapter•DOI•

Logical Labeling of Document Images Using Layout Graph Matching with Adaptive Learning

Jian Liang¹, David Doermann¹•Institutions (1)

University of Maryland, College Park¹

19 Aug 2002

TL;DR: This system is able to learn a model for a document class, use this model to label document images through graph matching, and adaptively improve the model with error feedback.

Abstract: Logical structure analysis of document images is an important problem in document image understanding. In this paper, we propose a graph matching approach to label logical components on a document page. Our system is able to learn a model for a document class, use this model to label document images through graph matching, and adaptively improve the model with error feedback. We tested our method on journal/proceeding article title pages. The experimental results show promising accuracy, and confirm the ability of adaptive learning.

Patent•

Digital document browsing system and method thereof

Takenori Kohda¹, Moriyoshi Ohara¹, Katashi Nagao¹•Institutions (1)

IBM¹

15 Feb 2002

TL;DR: A digital document browsing system includes a layout engine for determining the layout of a digital document based on previously obtained historical data for a display form of the digital document, a summarization engine for preparing a summary of the document's sentences, and a view generator for arranging the summary obtained by the summarization engine in accordance with the layout.

Abstract: A digital document browsing system includes: a layout engine for determining the layout of a digital document based on previously obtained historical data for a display form of the digital document, a summarization engine for preparing a summary for the sentences of the digital document based on the historical data for the digital document. Further included is a view generator for arranging the summary obtained by the summarization engine in accordance with the layout, and for generating data relating to the display form of the digital document. A user interface for displaying the digital document on a display device based on the data related to the display form is still further included.

Journal Article•DOI•

Page segmentation of Chinese newspapers

Jie Xi¹, Jianming Hu¹, Lide Wu¹•Institutions (1)

Fudan University¹

01 Dec 2002-Pattern Recognition

TL;DR: This paper describes a new bottom-up method for page segmentation of Chinese document images based on run-length smoothing algorithm and minimal spanning tree clustering that can resolve the problems of segmenting Chinese documents that differ from English documents.
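
For readers unfamiliar with the run-length smoothing algorithm (RLSA) named here, a minimal generic sketch follows. It is not the paper's implementation: it assumes a binary image stored as lists of 0 (white) and 1 (black), fills only white runs that lie between two black pixels (edge handling varies between implementations), and leaves out the minimal spanning tree clustering step entirely.

```python
def smooth_row(row, threshold):
    """Turn white runs of length <= threshold that sit between black pixels to black."""
    out = list(row)
    last_black = None
    for i, v in enumerate(row):
        if v == 1:
            if last_black is not None and 0 < i - last_black - 1 <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out

def rlsa(image, h_threshold, v_threshold):
    """Smooth rows and columns independently, then AND the two results."""
    horizontal = [smooth_row(r, h_threshold) for r in image]
    columns = [smooth_row(list(col), v_threshold) for col in zip(*image)]
    vertical = [list(r) for r in zip(*columns)]  # transpose back to row-major
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

With suitable thresholds, smoothing merges neighboring characters into solid blocks, which a bottom-up method can then group into larger regions.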

Proceedings Article•DOI•

Bayesian networks classifiers applied to documents

S. Souafi-Bensafi¹, M. Parizeau, Frank Lebourgeois, Hubert Emptoz•Institutions (1)

Institut national des sciences Appliquées de Lyon¹

11 Aug 2002

TL;DR: The use of the Bayesian network model is discussed for a classification problem related to the document image understanding field, focused on logical labeling in documents, which consists in assigning logical labels to text blocks.

Abstract: This paper discusses the use of the Bayesian network model for a classification problem related to the document image understanding field. Our application is focused on logical labeling in documents, which consists in assigning logical labels to text blocks. The objective is to map a set of logical tags, composing the document logical structure, to the physical text components. We build a Bayesian network model that allows this mapping using supervised learning, and without imposing a priori constraints on the document structure. The learning strategy is based partly on genetic programming tools. A prototype has been implemented, and tested on tables of contents found in periodicals and magazines.

Journal Article•DOI•

Model-based Information Extraction Method Tolerant of OCR Errors for Document Images

Y. Ishitani

01 Jun 2002-International Journal of Computer Processing of Languages

TL;DR: A new method for information extraction from document images is proposed in this paper as the basis for a document reader which can extract the required keywords and their logical relationship from various printed documents and is robust and effective for various document structures.

Abstract: A new method for information extraction from document images is proposed in this paper as the basis for a document reader which can extract the required keywords and their logical relationship from various printed documents. The proposed method consists of robust keyword matching, global document matching, and postprocessing for matching errors. First, robust keyword matching between two-dimensional OCR results consisting of a set of possible character candidate lists and a set of keywords defined in the keyword dictionary is carried out. This keyword dictionary includes incorrect words with typical OCR errors and segments of words in order to deal with OCR errors. Next, document matching is invoked between keyword matching results in an input document and document models. Each document model consists of a set of word models with their logical relationship described in terms of a tree structure. This model matching extracts the required keywords and their logical relationship from the input document and determines the most suitable model for the input document. Finally, postprocessing using heuristic rules defined in the model is applied to document matching results to recover keyword matching errors and to modify keyword matching results. This comprehensive approach solves word segmentation problems accurately even if a document has unknown words, compound words, or incorrect words due to OCR errors. Experimental results obtained for 100 documents show that the method is robust and effective for various document structures.

Patent•

Object layout device, image layout device, object layout program, image layout program, object layout method, and image layout method

Atsuji Nagahara, Michihiro Nagaishi, 敦示永原, 道博長石

14 Mar 2002

TL;DR: An object layout device includes an image feature information extraction part 120 that extracts feature information for each of a plurality of candidate images, an evaluation value calculation part 140 that calculates evaluation values of images from that information, an image selection part 150 that selects an image from the candidates, and an image layout part 170 that determines the layout of the selected image on the basis of the calculated evaluation value.

Abstract: PROBLEM TO BE SOLVED: To provide an object layout device which reduces the time and labor required for processing and is suitable to realize a well-balanced layout in accordance with the contents of images. SOLUTION: The object layout device is provided with an image feature information extraction part 120 for extracting image feature information representing the features of each of a plurality of candidate images, an evaluation value calculation part 140 for calculating evaluation values of images on the basis of image feature information extracted by the image feature information extraction part 120, an image selection part 150 for selecting an image from a plurality of candidate images, and an image layout part 170 for determining the layout of the image selected by the image selection part 150 on the basis of the evaluation value calculated by the evaluation value calculation part 140. COPYRIGHT: (C)2003,JPO

Patent•

Document reading system, document reading method and program therefor

Naohiro Furukawa, Ryuji Mine, Yutaka Sako, 直広古川, 竜治嶺, 裕酒匂

12 Apr 2002

TL;DR: In this article, a document reading system scans a profile preparation sheet under each scanning environment, and extracts scanning characteristics by analyzing a read image, then records a profile of the scanning environment defining the characteristic or link information to the characteristic in each document definition, compares the profiles for definition preparation and for the book reading apparatus while reading the document, and recognizes and verifies a character string based on the comparison result.

Abstract: PROBLEM TO BE SOLVED: To provide a method for automatically extracting characteristics of each scanner provided on a document processing system, and a method for preparing and using a single document definition database even when scanning environments at defining a document and at reading a document are different from each other or a plurality of scanners are used for defining or reading the document, thereby facilitating a document definition preparation operation and preventing degradation in accuracy of a document reading. SOLUTION: The document reading system scans a profile preparation sheet under each scanning environment, and extracts scanning characteristics by analyzing a read image. The document reading system then records a profile of the scanning environment defining the characteristic or link information to the characteristic in each document definition, compares the profiles for definition preparation and for the book reading apparatus while reading the document, and recognizes and verifies a character string based on the comparison result. COPYRIGHT: (C)2004,JPO

Patent•

Document analysis system and method

Nathan Joel McDonald

14 Aug 2002

TL;DR: A document mapping system includes a set of element classes, each with an associated set of document elements and an associated set of format and mapping rules; identifying means for identifying one or more document elements within an original document; and mapping means for creating and displaying a map of document sections linked by labels representing the respective document elements associated with those sections.

Abstract: A document mapping system including a set of element classes, each element class having an associated set of document elements and an associated set of format and mapping rules, identifying means for identifying one or more document elements within an original document, and mapping means for creating and displaying a map of document sections linked by labels representing the respective document elements associated with those document sections.

Patent•

Automatic validation method for multimedia product manuals

Liang H. Hsu¹, Peiya Liu²•Institutions (2)

Siemens¹, Princeton University²

09 Sep 2002

TL;DR: In this paper, a Document Constraint Analyzer (DCA) takes as input a set of document files together with a document constraint specification file, extracts and examines the contents, attributes, and relationships associated with the document objects, and evaluates the logical expressions specified in the document constraints.

Abstract: A Product Document Constraint Specification Language (PDCSL) is provided for a document author to represent various types of documentation guidelines that must be enforced within documents or across different documents. A Document Constraint Analyzer (DCA) takes as input a set of document files together with a document constraint specification file, extracts and examines the contents, attributes, and relationships associated with the document objects, and evaluates the logical expressions specified in the document constraints. If a document constraint is not satisfied, an action can be taken to correct the documents or provide an explanation to the document author.

Proceedings Article•DOI•

Hierarchical content classification and script determination for automatic document image processing

Qing Wang¹, Zheru Chi², Rongchun Zhao¹•Institutions (2)

Northwestern Polytechnical University¹, Hong Kong Polytechnic University²

11 Aug 2002

TL;DR: This paper proposes an enhanced background-thinning based page segmentation algorithm that processes document images rapidly and eliminates some small regions embedded in other regions, and a hierarchical approach, combining the cross correlation measure, the Kolmogorov complexity measure, and a neural network, that classifies sub-images into halftones and texts.

Abstract: Page segmentation and image content classification play an important role in automatic document image processing with applications to mixed-type document image compression, form and check reading, and automatic mail sorting. We propose an enhanced background-thinning based page segmentation algorithm to process document images rapidly and eliminate some small regions embedded in other regions. We then present a hierarchical approach, which combines the cross correlation measure, Kolmogorov complexity measure, and a neural network, to classify sub-images into halftones and texts. The approach also achieves high accuracy in text determination using a three-layer feed-forward network where the text region can be classified into Chinese or alphabetic characters. Experimental results on a number of mixed-type document images show the efficiency and effectiveness of our approach.

Patent•

Layout-based page capture

Robert S. Murata¹•Institutions (1)

Adobe Systems¹

31 May 2002

TL;DR: A user performs a selection defining an area of interest based on the layout of an electronic document; the selection is performed visually on a rendering of the document, giving the user visual feedback as to what content has been selected, and can be applied across multiple electronic documents.

Abstract: Techniques for layout-based page capture. A user performs a selection defining an area of interest based on the layout of an electronic document. The program retrieves electronic documents based on the defined area of interest. The selection can be performed visually on a rendering of the electronic document, thereby providing the user with visual feedback as to what content has been selected. The selection can be applied across multiple electronic documents.

Journal Article•DOI•

An empirical measure of the performance of a document image segmentation algorithm

Amit Kumar Das¹, Sanjoy Kumar Saha¹, Bhabatosh Chanda²•Institutions (2)

AMIT¹, Indian Statistical Institute²

01 Mar 2002-International Journal on Document Analysis and Recognition

TL;DR: A new document model in terms of a bounding box representation of its constituent parts is described and an empirical measure of performance of a segmentation algorithm based on this new graph-like model of the document is suggested.

Abstract: Document image segmentation is the first step in document image analysis and understanding. One major problem centres on the performance analysis of the evolving segmentation algorithms. The use of a standard document database maintained at the Universities/Research Laboratories helps to solve the problem of getting authentic data sources and other information, but some methodologies have to be used for performance analysis of the segmentation. We describe a new document model in terms of a bounding box representation of its constituent parts and suggest an empirical measure of performance of a segmentation algorithm based on this new graph-like model of the document. Besides the global error measures, the proposed method also produces segment-wise details of common segmentation problems such as horizontal and vertical split and merge as well as invalid and mismatched regions.

Patent•

Apparatus and method for market-based document content and layout selection

Scott H. Clearwater¹•Institutions (1)

Hewlett-Packard¹

23 Dec 2002

TL;DR: In this paper, a content and layout coordination system for market-based document content selection is presented, which takes simple criteria from a user and automatically constructs virtual documents from a much larger underlying database of content and layouts.

Abstract: A method and corresponding apparatus for market-based document content and layout selection use an automated auction or bartering system, i.e., an automated content and layout coordination system, to automatically coordinate content and layout selection for document presentation. The system takes simple criteria from a user and automatically constructs virtual documents from a much larger underlying database of content and layout. By trading among the virtual documents, the automated content and layout coordination system affords a flexible and scalable method for coordinating the selection of high-value content and layout and generating a high value document.

Proceedings Article•DOI•

Robust text detection from binarized document images

Oleg Okun¹, Y. Yan¹, Matti Pietikäinen¹•Institutions (1)

University of Oulu¹

11 Aug 2002

TL;DR: A new method for text detection using a binary image alone is proposed, which includes detection of both normal and inverted text and robustness to various font types, styles and sizes and small skew angles, combined with a moderate number of free parameters.

Abstract: Many document images are rich in color and have complex background. To detect text from them, a standard approach utilizes both color and binary information. This often leads to time-consuming processing and requires a lot of parameters to be tuned. In contrast, we propose a new method for text detection using a binary image alone. The main virtues of our method include detection of both normal and inverted text and robustness to various font types, styles and sizes and small skew angles, combined with a moderate number of free parameters.

Patent•

Method and computer system for separating and processing layout information and data of a document

[...]

Dirk Ahlert, Gunther Liebich, Wolfgang Koch

16 Apr 2002

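TL;DR: A method for separating and processing the layout information and data of a document is presented: a predefined document description is decomposed, preferably using style sheet language transformations, into a layout template and a data description, and a data instance merged with the template yields an individual document description that a conventional browser can render.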

Abstract: Computer-implemented method, computer system and computer program product for separating and processing layout information and data of a document. The computer system provides a predefined document description (120, 130). The document description (120, 130) is decomposed (420) into a layout template (140-1) and a data description (140-2). In a preferred embodiment of the invention, decomposition (420) is achieved by using style sheet language transformations. Optionally, the computer system instantiates (460) a data instance (150) from the data description (140-2) and merges (470) the data instance (150) with the layout template (140-1) into an individual document description (160). The individual document description (160) can be rendered by a conventional browser.
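
The decompose-then-merge flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract names style sheet language transformations for the decomposition step, whereas this sketch swaps in Python's `string.Template`, and the document description format, the field names, and the `decompose` and `merge` helpers are all hypothetical.

```python
from string import Template

# Hypothetical predefined document description: fixed layout fragments
# interleaved with named data fields (format invented for this sketch).
DOCUMENT_DESCRIPTION = [
    ("text", "<h1>Invoice</h1><p>Customer: "),
    ("field", "customer"),
    ("text", "</p><p>Total: "),
    ("field", "total"),
    ("text", "</p>"),
]


def decompose(description):
    """Split a description into a layout template and a data description."""
    layout_parts = []
    data_description = []
    for kind, value in description:
        if kind == "field":
            # The layout keeps only a placeholder; the field name goes
            # into the data description.
            layout_parts.append("${" + value + "}")
            data_description.append(value)
        else:
            layout_parts.append(value)
    return Template("".join(layout_parts)), data_description


def merge(layout_template, data_description, data_instance):
    """Merge one data instance with the layout template."""
    missing = [name for name in data_description if name not in data_instance]
    if missing:
        raise KeyError(f"data instance lacks fields: {missing}")
    return layout_template.substitute(data_instance)


layout, fields = decompose(DOCUMENT_DESCRIPTION)
page = merge(layout, fields, {"customer": "ACME", "total": "42.00"})
print(page)
```

Keeping the template and the field list separate means the same layout can be reused for any number of data instances, which is the separation the abstract describes.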

Patent•

Structured document mapping apparatus and method

[...]

Ayane Suzuki¹, Katsuhiko Yuura¹, Taiki Sakata¹•Institutions (1)

Hitachi¹

21 Nov 2002

TL;DR: A structured document mapping apparatus is presented that correlates items in input text document data with items of a hierarchical document having attributes and a structure; it automatically maps the input items according to rules stored in a search database and creates an XBRL document that corresponds to the attributes and structure of a standard taxonomy or a corporation-unique taxonomy.

Abstract: A structured document mapping apparatus that correlates items included in input text document data with items of a hierarchical document having attributes and a structure. The structured document mapping apparatus refers to at least one of a standard taxonomy and a corporation-unique taxonomy, automatically maps the items of the input text document according to the rules stored in a search database, and creates an XBRL document that corresponds to the attributes and structure of the standard taxonomy or the corporation-unique taxonomy.

Patent•

Composing unique document layout for document differentiation

[...]

Hui Chao¹, Henry Sang¹•Institutions (1)

Hewlett-Packard¹

28 Feb 2002

TL;DR: A computer-implemented document composition device uses a distance modifier routine to modify the separation distance between two text clusters in an electronic document, so that the document can be discriminated from other, similar documents.

Abstract: A computer-implemented document composition device includes a processor and a memory communicating with the processor. The memory includes a document storage area storing one or more electronic documents and a distance modifier routine. The processor uses the distance modifier routine to modify a separation distance between two particular text clusters in the electronic document. A document thus differentiated may be easily discriminated from other, similar documents. The differentiation may result in a difference in the document layout that is insignificant to the human eye but recognizable by a computer.
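
As a rough illustration of the idea in this abstract, the Python sketch below nudges the gap between two text clusters by a small multiple of a fixed step and later reads that multiple back. The abstract does not say how the distance modifier routine chooses or decodes the distance, so the `TextCluster` type, the 0.25-point step, and the notion of a numeric copy id are assumptions made only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextCluster:
    """A block of text on a page; coordinates in points (hypothetical model)."""
    name: str
    top: float
    height: float


def separation(upper, lower):
    """Vertical gap between the bottom of `upper` and the top of `lower`."""
    return lower.top - (upper.top + upper.height)


def differentiate(lower, copy_id, step=0.25):
    """Return `lower` shifted down so the gap above it grows by copy_id * step."""
    return replace(lower, top=lower.top + copy_id * step)


def recover_copy_id(upper, lower, base_gap, step=0.25):
    """Read the copy id back from the measured gap."""
    return round((separation(upper, lower) - base_gap) / step)


title = TextCluster("title", top=72.0, height=24.0)
body = TextCluster("body", top=108.0, height=400.0)
base_gap = separation(title, body)

marked = differentiate(body, copy_id=3)
print(separation(title, marked), recover_copy_id(title, marked, base_gap))
```

A shift of under one point is unlikely to be noticed by a reader, while a program that knows the base gap and the step can still recover which copy it is looking at.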

Book Chapter•DOI•

Adaptive Layout Analysis of Document Images

[...]

Donato Malerba¹, Floriana Esposito¹, O. Altamura¹•Institutions (1)

University of Bari¹

27 Jun 2002

TL;DR: This paper investigates supporting the user in correcting the results of the global analysis in the document processing system WISDOM++, by letting the user correct those results and then learning rules for layout correction from the sequence of user actions.

Abstract: Layout analysis is the process of extracting a hierarchical structure describing the layout of a page. In the document processing system WISDOM++ the layout analysis is performed in two steps: firstly, the global analysis determines possible areas containing paragraphs, sections, columns, figures and tables, and secondly, the local analysis groups together blocks that possibly fall within the same area. The result of the local analysis process strongly depends on the quality of the results of the first step. In this paper we investigate the possibility of supporting the user during the correction of the results of the global analysis. This is done by allowing the user to correct the results of the global analysis and then by learning rules for layout correction from the sequence of user actions. Experimental results on a set of multipage documents are reported.

Patent•

Method and apparatus for forward annotating documents and for generating a summary from a document image

[...]

Ercan E. Kuruoglu¹, Alex S. Taylor¹•Institutions (1)

Xerox¹

11 Oct 2002

TL;DR: A target document in a document processing system is annotated on the basis of annotations made previously to a source document: keywords extracted from those annotations are searched for in the target document and annotated where they are found.

Abstract: A target document in a document processing system is annotated on the basis of annotations made previously to a source document. A source document (either a scanned image of a paper document or an electronic document) is annotated by a user to identify words or phrases of interest. The annotated words are extracted for use as keywords or phrases to search for in future documents. When a target document is processed, the target document is searched to locate any of the keywords of interest to the user. If any of the keywords are located, electronic annotations are applied to these in the target document for display or printing out and/or registered as keywords to the project. The automatically annotated words or phrases enable the user to locate regions of interest more quickly. A summary of a captured document image is produced on the basis of detected annotations made to a document prior to image capture. The scanned (or otherwise captured) image is processed to detect annotations made to the document prior to scanning. The detected annotations can be used to identify features, or text, for use in summarizing that document. Additionally, or alternatively, the detected annotations in one document can be used to identify features, or text, for use in summarizing a different document. The summary may be displayed in expandable detail levels.
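
The forward-annotation step (search a target document for keywords taken from an earlier, annotated source document and mark the hits) can be sketched in Python as follows. This is an assumption-laden illustration, not the patented method: it works on plain text only, it takes the keyword set as given instead of extracting it from a scanned image, and the `forward_annotate` helper and the `[[...]]` marker are hypothetical.

```python
import re


def forward_annotate(target_text, keywords, mark="[[{}]]"):
    """Mark every keyword occurrence in target_text; return text and hits."""
    if not keywords:
        return target_text, []
    # Try longer phrases first so a phrase wins over a word it contains.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    found = []

    def _mark(match):
        found.append(match.group(0))
        return mark.format(match.group(0))

    return pattern.sub(_mark, target_text), found


# Keywords a user is assumed to have marked in an earlier source document.
source_keywords = {"layout analysis", "skew"}
annotated, hits = forward_annotate(
    "Layout analysis fails when skew is large.", source_keywords
)
print(annotated)
```

The returned list of hits could also seed a summary of the target document, in the spirit of the second half of the abstract.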
