Proceedings Article

Image and document processing techniques for the RightPages electronic library system

Lawrence O'Gorman
- pp 260-263
TLDR
Three techniques are described: noise reduction for binary document pages, to improve page appearance and subsequent optical character recognition and compression; subsampling of the text image to fit on the computer screen while maintaining readability; and document layout analysis to determine text blocks.
Abstract
Describes some of the document processing techniques used in the RightPages electronic library system. Since the system deals with scanned images of document pages, these techniques are critical to the use and appearance of the system. The author describes three techniques: (1) noise reduction for binary document pages, to improve page appearance and subsequent optical character recognition and compression; (2) subsampling of the text image to fit on the computer screen while maintaining readability; and (3) document layout analysis to determine text blocks.

Citations
Book

Algorithms for image processing and computer vision

TL;DR: Algorithms for Image Processing and Computer Vision, 2nd Edition provides the tools to speed development of image processing applications.
Journal Article

The document spectrum for page layout analysis

TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields an accurate measure of skew and of within-line and between-line spacings, and it locates text lines and text blocks.
Book

The document spectrum for page layout analysis

TL;DR: The document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.
Proceedings Article

Electronic marking and identification techniques to discourage document copying

TL;DR: Three coding methods are proposed that discourage illicit distribution by embedding each document with a unique codeword, yet enable one to identify the sanctioned recipient of a document by examination of a recovered document.
Patent

Automatically providing content associated with captured information, such as information captured in real-time

TL;DR: In this paper, a system and method for automatically providing content associated with captured information is described, in which the system receives input by a user, and automatically provides content or links to the information associated with the input.
References
Journal Article

The document spectrum for page layout analysis

TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields an accurate measure of skew and of within-line and between-line spacings, and it locates text lines and text blocks.
Book

The document spectrum for page layout analysis

TL;DR: The document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.
Journal Article

The RightPages image-based electronic library for alerting and browsing

TL;DR: The RightPages electronic library prototype system, which gives users full online library services, is described, and the system's image and document processing, including noise reduction, document layout analysis, text processing, and display processing are discussed.
Journal Article

k × k thinning

TL;DR: Criteria are given by which the k × k method thins to minimally 8-connected lines while retaining connectivity and endpoints, and a procedure to obtain line widths in the course of thinning is described.