Journal Article

Integrating diverse knowledge sources in text recognition

Abstract
The capabilities of present commercial machines for producing correct text by recognizing words in print, handwriting and speech are very limited. For example, most optical character recognition (OCR) machines are limited to a few fonts of machine print, or to text that is handprinted under certain constraints; any deviation from these constraints produces highly garbled text. This paper describes an algorithm for text recognition/correction that merges two processes: a bottom-up refinement process, the Viterbi algorithm (VA), which uses transitional probabilities and letter confusion probabilities, and a top-down process that searches a trie representation of a lexicon. The algorithm is applicable to text containing an arbitrary number of character substitution errors, such as that produced by OCR machines.
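
The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`build_trie`, `correct`, `toy_log_confusion`, `toy_log_transition`), the nested-dict trie, the toy scores, and the choice to keep one surviving path per letter at each position are all assumptions made for illustration. The sketch handles substitution errors only, so the corrected word always has the same length as the observed string.

```python
import math

def build_trie(words):
    """Nested-dict trie; the key "$" marks the end of a lexicon word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

# Hypothetical toy scores, not normalized probabilities.
# Pairs are (observed letter, true letter).
CONFUSABLE = {("e", "o"), ("o", "e")}

def toy_log_confusion(obs, true):
    """Log score for seeing `obs` when the true letter is `true`."""
    if obs == true:
        return math.log(0.9)
    if (obs, true) in CONFUSABLE:
        return math.log(0.09)
    return math.log(0.01)

def toy_log_transition(prev, ch):
    """Log score for `ch` following `prev`; uniform in this sketch."""
    return 0.0

def correct(observed, trie, log_confusion=toy_log_confusion,
            log_transition=toy_log_transition):
    """Viterbi-style search restricted to prefixes present in the trie."""
    # survivors: last letter -> (score, trie node, prefix); "^" = word start
    survivors = {"^": (0.0, trie, "")}
    for obs in observed:
        nxt = {}
        for prev, (score, node, prefix) in survivors.items():
            # Top-down constraint: only letters that extend a lexicon prefix.
            for ch, child in node.items():
                if ch == "$":
                    continue
                s = score + log_transition(prev, ch) + log_confusion(obs, ch)
                # Bottom-up step: keep the best-scoring path per letter.
                if ch not in nxt or s > nxt[ch][0]:
                    nxt[ch] = (s, child, prefix + ch)
        survivors = nxt
    # Accept only paths that end exactly on a complete lexicon word.
    finals = [(s, w) for s, node, w in survivors.values() if "$" in node]
    return max(finals)[1] if finals else None

lexicon = build_trie(["cat", "cot", "dog"])
print(correct("cet", lexicon))  # cot
```

Because only one path survives per letter, this sketch can discard a prefix that would later lead to the best lexicon word; it trades exactness for the small state space of the Viterbi recursion. When no lexicon word has the observed length, `correct` returns `None`.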

