Proceedings Article
Image and document processing techniques for the RightPages electronic library system
Lawrence O'Gorman
pp. 260-263
TL;DR: Three techniques are described: noise reduction for binary document pages, to improve page appearance and subsequent optical character recognition and compression; subsampling of the text image to fit on the computer screen while maintaining readability; and document layout analysis to determine text blocks.
Abstract:
This paper describes some of the document processing techniques used in the RightPages electronic library system. Because the system deals with scanned images of document pages, these techniques are critical to its usability and appearance. The author describes three techniques: (1) noise reduction for binary document pages, to improve page appearance and subsequent optical character recognition and compression; (2) subsampling of the text image to fit on the computer screen while maintaining readability; and (3) a document layout analysis technique to determine text blocks.
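The abstract does not say which filters RightPages uses, so the following is only an illustration of the first two steps, under stated assumptions. It uses a binary page stored as a list of rows, with 1 = black ink. Noise reduction is shown as a generic 3 × 3 isolated-pixel rule. Screen subsampling is shown as block averaging to grey levels. Both are hypothetical stand-ins chosen here, not necessarily the methods in the paper.

```python
from typing import List

Bitmap = List[List[int]]  # assumption: 1 = black (ink), 0 = white


def remove_salt_and_pepper(img: Bitmap) -> Bitmap:
    """Return a copy with single-pixel noise removed: a black pixel with no
    black 8-neighbours is cleared, and a white pixel whose in-bounds
    8-neighbours are all black is filled. (Generic 3x3 rule, illustrative.)"""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # In-bounds 8-neighbourhood of (x, y), excluding the pixel itself.
            nbrs = [img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]
            if img[y][x] == 1 and not any(nbrs):
                out[y][x] = 0          # isolated black speck ("pepper")
            elif img[y][x] == 0 and nbrs and all(nbrs):
                out[y][x] = 1          # isolated white hole ("salt")
    return out


def subsample_to_grey(img: Bitmap, factor: int) -> List[List[int]]:
    """Shrink by `factor` in each direction. Each output pixel is a grey level
    (0 = black, 255 = white) proportional to the ink in its factor x factor
    block. Partial blocks at the right and bottom edges are dropped."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h - factor + 1, factor):
        row = []
        for bx in range(0, w - factor + 1, factor):
            ink = sum(img[y][x]
                      for y in range(by, by + factor)
                      for x in range(bx, bx + factor))
            row.append(255 - round(255 * ink / (factor * factor)))
        out.append(row)
    return out
```

Mapping ink coverage to grey, rather than keeping every `factor`-th pixel, is one common way to keep small type legible at screen resolution. A thin stroke that a point sampler could skip entirely still darkens its block.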
Citations
Book
Algorithms for image processing and computer vision
TL;DR: Algorithms for Image Processing and Computer Vision, 2nd Edition provides the tools to speed development of image processing applications.
Journal Article
The document spectrum for page layout analysis
TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields accurate measures of skew and of within-line and between-line spacing, and it locates text lines and text blocks.
Book
The document spectrum for page layout analysis
TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields accurate measures of skew and of within-line and between-line spacing, and it locates text lines and text blocks.
Proceedings Article
Electronic marking and identification techniques to discourage document copying
TL;DR: Three coding methods are proposed that discourage illicit distribution by embedding a unique codeword in each document. They also make it possible to identify the sanctioned recipient of a document by examining a recovered copy.
Patent
Automatically providing content associated with captured information, such as information captured in real-time
Martin T. King, Redwood Stephens, Claes-Fredrik Mannby, Jesse Peterson, Mark Sanvitale, Michael John Sebastian Smith, Christopher J. Daley-Watson, et al.
TL;DR: A system and method are described for automatically providing content associated with captured information. The system receives input from a user and automatically provides content, or links to information, associated with that input.
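The electronic-marking entry above does not spell out its three coding methods. One simple way to hide a per-recipient codeword in page layout is line-shift coding. It moves selected text lines up or down slightly and keeps their neighbours fixed as references. The sketch below is illustrative only and is not taken from the cited paper. `SHIFT`, the alternating marked/reference line layout, integer baseline positions with y growing downward, and evenly spaced original lines are all hypothetical assumptions.

```python
from typing import List

SHIFT = 1  # hypothetical shift in pixels; a real system would tune this


def embed_codeword(baselines: List[int], bits: List[int]) -> List[int]:
    """Return new baseline y-positions with one bit hidden per odd-indexed line.
    Even-indexed lines are left untouched and act as references."""
    if 2 * len(bits) + 1 > len(baselines):
        raise ValueError("document has too few text lines for this codeword")
    marked = list(baselines)
    for n, bit in enumerate(bits):
        # y grows downward: +SHIFT moves the line down (bit 1), -SHIFT up (bit 0).
        marked[2 * n + 1] += SHIFT if bit else -SHIFT
    return marked


def extract_codeword(baselines: List[int], n_bits: int) -> List[int]:
    """Recover bits by comparing each marked line's gap to the line above
    with its gap to the line below (assumes evenly spaced original lines)."""
    bits = []
    for n in range(n_bits):
        i = 2 * n + 1
        gap_above = baselines[i] - baselines[i - 1]
        gap_below = baselines[i + 1] - baselines[i]
        bits.append(1 if gap_above > gap_below else 0)
    return bits
```

Decoding compares relative gaps rather than absolute positions, so a copy that has been translated on the page still decodes. Real photocopies would also need baseline detection and tolerance to noise, which this sketch omits.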
References
Journal Article
The document spectrum for page layout analysis
TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields accurate measures of skew and of within-line and between-line spacing, and it locates text lines and text blocks.
Book
The document spectrum for page layout analysis
TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. It yields accurate measures of skew and of within-line and between-line spacing, and it locates text lines and text blocks.
Journal Article
The RightPages image-based electronic library for alerting and browsing
TL;DR: The RightPages electronic library prototype system, which gives users full online library services, is described, along with the system's image and document processing: noise reduction, document layout analysis, text processing, and display processing.
Journal Article
k × k thinning
TL;DR: Criteria are given by which the k × k method thins to minimally 8-connected lines while retaining connectivity and endpoints, and a procedure for obtaining line widths in the course of thinning is described.
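The docstrum entries in both lists describe reading skew and spacing from nearest-neighbour statistics of page components. Below is a rough sketch of that idea, assuming the connected-component centroids have already been extracted. The neighbour count `k`, the 1-degree histogram bins, and the ±10° window are illustrative choices, not the published parameters. The full method, which also clusters components into text lines and blocks, is not reproduced.

```python
import math
from collections import Counter
from typing import Iterator, List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of one connected component


def neighbour_pairs(points: List[Point], k: int = 5) -> Iterator[Tuple[float, float]]:
    """Yield (distance, angle in degrees) from each point to its k nearest
    neighbours. Angles are folded into [-90, 90) so that a neighbour on the
    left and one on the right of the same text line give the same angle."""
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            (math.hypot(xj - xi, yj - yi), xj - xi, yj - yi)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for dist, dx, dy in others[:k]:
            angle = math.degrees(math.atan2(dy, dx))
            yield dist, (angle + 90.0) % 180.0 - 90.0


def skew_and_spacing(points: List[Point], k: int = 5, window: float = 10.0):
    """Estimate skew (degrees) as the peak of the neighbour-angle histogram and
    within-line spacing as the median distance of pairs near that angle."""
    pairs = list(neighbour_pairs(points, k))
    histogram = Counter(round(angle) for _, angle in pairs)  # 1-degree bins
    skew = histogram.most_common(1)[0][0]
    near = sorted(dist for dist, angle in pairs if abs(angle - skew) <= window)
    return skew, near[len(near) // 2]
```

In image coordinates, with y growing downward, a positive skew means the text lines descend to the right. Because most nearest neighbours of a character lie on its own line, the within-line direction dominates the angle histogram.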