Topic

Intelligent word recognition

About: Intelligent word recognition is a research topic. Over its lifetime, 2,480 publications have appeared within this topic, receiving 45,813 citations.


Papers
Proceedings Article
16 Aug 2005
TL;DR: This paper attempts to use the fuzzy concept on handwritten numerals and Tamil characters to classify them as one among the prototype characters using a feature called distance from the frame and a suitable membership function.
Abstract: Fuzzy set theory provides an approximate but effective means of describing the behavior of ill-defined systems. Patterns of human origin, such as handwritten characters, are to some extent fuzzy in nature. In this paper, we apply a fuzzy approach to handwritten numerals and Tamil characters, classifying each as one of the prototype characters using a feature called distance from the frame and a suitable membership function. Both the unknown and the prototype characters are preprocessed before recognition. The algorithm is tested on about 250 samples of numerals and seven chosen Tamil characters, and the success rate obtained varies from 76% to 94%.

13 citations
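
The idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: the abstract does not define "distance from the frame" or the membership function, so the sketch assumes the feature is the normalized distance from each frame edge to the first ink pixel in every row and column, and assumes a triangular membership function with a hypothetical `spread` parameter. The function names and the toy 5x5 glyphs are invented for the example.

```python
import numpy as np

def frame_distances(img):
    """Normalized distance from each frame edge to the first ink pixel.

    Assumed reading of "distance from the frame"; the paper may define it differently.
    """
    img = np.asarray(img, dtype=bool)
    h, w = img.shape

    def first_hit(a, empty):
        # Index of the first ink pixel along each line, or `empty` if the line is blank.
        hit = a.any(axis=1)
        return np.where(hit, a.argmax(axis=1), empty)

    left = first_hit(img, w)
    right = first_hit(img[:, ::-1], w)
    top = first_hit(img.T, h)
    bottom = first_hit(img.T[:, ::-1], h)
    return np.concatenate([left / w, right / w, top / h, bottom / h])

def membership(x, prototype, spread=0.25):
    """Triangular membership (an assumption): 1 at the prototype value, 0 beyond `spread`."""
    return np.clip(1.0 - np.abs(x - prototype) / spread, 0.0, 1.0)

def classify(img, prototypes, spread=0.25):
    """Return the prototype label with the highest mean membership, plus all scores."""
    feats = frame_distances(img)
    scores = {
        label: float(membership(feats, frame_distances(proto), spread).mean())
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores

# Toy 5x5 glyphs, invented for illustration only (not data from the paper).
bar = np.zeros((5, 5), dtype=int)
bar[:, 2] = 1                      # vertical stroke, standing in for "1"
box = np.ones((5, 5), dtype=int)
box[1:4, 1:4] = 0                  # hollow square, standing in for "0"
unknown = bar.copy()
unknown[4, 3] = 1                  # the stroke with one stray pixel

label, scores = classify(unknown, {"1": bar, "0": box})
print(label, scores)               # the stray-pixel stroke is assigned to "1"
```

Averaging the per-feature memberships is one simple way to combine them; a minimum or a weighted sum would be equally plausible choices.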

Proceedings Article
10 Sep 2001
TL;DR: The Mutual Information criterion is applied to validate feature sets extracted from handwritten words in Brazilian legal amounts; the results show the need to improve the perceptual feature set with complementary geometric features and to model the prefix and suffix of the words.
Abstract: The paper presents the application of the Mutual Information criterion (T.M. Cover and J.A. Thomas, 1991) to validate feature sets extracted from handwritten words in Brazilian legal amounts. The lexicon includes a subset of short words without ascenders/descenders and subsets of words with the same prefix or suffix. These particularities of the Brazilian lexicon show that it is necessary to improve the perceptual feature set with complementary geometric features and to model the prefix and suffix of the words. Finally, the experiments show the viability of our approach.

13 citations
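
As a rough illustration of the criterion named in the abstract above, the mutual information between a discrete feature and the word class can be estimated from co-occurrence counts. This is a minimal sketch of the textbook definition, I(F;C) = sum over (f, c) of p(f,c) log2(p(f,c) / (p(f) p(c))); it is not the authors' feature pipeline, and the function name and toy inputs are assumptions made for the example.

```python
from collections import Counter
from math import log2

def mutual_information(features, labels):
    """Empirical mutual information I(F;C) in bits between two discrete sequences."""
    n = len(features)
    joint = Counter(zip(features, labels))   # counts of (feature value, class) pairs
    f_counts = Counter(features)
    c_counts = Counter(labels)
    mi = 0.0
    for (f, c), k in joint.items():
        # p(f,c) / (p(f) p(c)) simplifies to k * n / (count_f * count_c)
        mi += (k / n) * log2(k * n / (f_counts[f] * c_counts[c]))
    return mi

# Toy values invented for illustration (not data from the paper).
informative = mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1])
uninformative = mutual_information(["a", "b", "a", "b"], [0, 0, 1, 1])
print(informative, uninformative)  # 1.0 0.0
```

A feature that separates the two balanced classes perfectly carries 1 bit, while a feature independent of the class carries 0 bits; a feature set that scores low on confusable word groups would be a candidate for the kind of complementary features the abstract mentions.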

Proceedings Article
26 Jul 2009
TL;DR: A document level OCR which incorporates information from the entire document to reduce word error rates and demonstrates a relative improvement of 28% for long words and 12% for all words which appear at least twice in the corpus for Telugu.
Abstract: The word accuracy of any optical character recognition (OCR) system is usually substantially below its component or character accuracy. This is especially true of Indic languages, in which a word consists of many components. Current OCRs recognize each character or word separately and do not take advantage of document level constraints. We propose a document level OCR which incorporates information from the entire document to reduce word error rates. Word images are first clustered using a locality sensitive hashing technique. Individual words are then recognized using a (regular) OCR. The OCR outputs of word images in a cluster are then corrected probabilistically by comparing them with the OCR outputs of other members of the same cluster. The approach may be applied to improve the accuracy of any OCR run on documents in any language. In particular, we demonstrate it for Telugu, where the use of language models for post-processing is not promising. We show a relative improvement of 28% for long words and 12% for all words which appear at least twice in the corpus.

13 citations
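
The correction step described in the abstract above can be sketched as follows. This is a deliberately simplified stand-in: the paper corrects outputs probabilistically, whereas this sketch uses a plain majority vote within each cluster, and it assumes the cluster assignments (which the paper obtains with locality sensitive hashing on word images) are already given. The function name and the sample strings are hypothetical.

```python
from collections import Counter, defaultdict

def correct_by_cluster(ocr_outputs, cluster_ids):
    """Replace each OCR output with the most frequent output in its cluster.

    Majority voting is an assumed simplification of the paper's probabilistic correction.
    """
    groups = defaultdict(list)
    for text, cid in zip(ocr_outputs, cluster_ids):
        groups[cid].append(text)
    # One consensus string per cluster of visually similar word images.
    consensus = {
        cid: Counter(texts).most_common(1)[0][0] for cid, texts in groups.items()
    }
    return [consensus[cid] for cid in cluster_ids]

# Invented example: three images of the same word, one misrecognized, plus a singleton.
corrected = correct_by_cluster(["word", "word", "w0rd", "cat"], [0, 0, 0, 1])
print(corrected)  # ['word', 'word', 'word', 'cat']
```

A singleton cluster is left unchanged, which is consistent with the abstract reporting gains only for words that appear at least twice in the corpus.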

Proceedings Article
12 Oct 1997
TL;DR: A method for the recognition of handwritten Hindi numerals is proposed based on structural descriptors of numeral shapes; early results indicate that the system tolerates a high variability of numeral shapes.
Abstract: A method for the recognition of handwritten Hindi numerals is proposed based on structural descriptors of numeral shapes. The method consists of three major steps: 1) preprocessing, where a handwritten numeral is scanned, normalized, and then thinned; 2) segmentation, where a robust algorithm splits the scanned numeral image into stroke(s) based on feature points; and 3) identification of cavity features. The output of this algorithm is a syntactic representation (that is, one or more syntactic terms) of the scanned numeral. Finally, the syntactic representation is matched against a set of syntactic representation prototypes of handwritten numerals and the recognition result is reported. Early experimental results are encouraging and indicate that the proposed system tolerates a high variability of numeral shapes.

13 citations

Proceedings Article
26 Jul 2009
TL;DR: This paper presents a unified approach for multi-lingual recognition of alphabetic scripts using the multi-stream paradigm and shows interesting recognition performances with only 1.5% of script confusion and an overall word recognition rate of 84.5%.
Abstract: Generally, handwritten word recognition systems use script-specific methodologies. In this paper, we present a unified approach for multi-lingual recognition of alphabetic scripts. The proposed system operates independently of the nature of the script using the multi-stream paradigm. The experiments have been carried out on a multi-script database composed of Arabic and Latin handwritten words from the IFN/ENIT and the IRONOFF public databases and show interesting recognition performances, with only 1.5% of script confusion and an overall word recognition rate of 84.5% using a multi-script lexicon of 1142 words.

13 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Feature (computer vision): 128.2K papers, 1.7M citations, 85% related
Image segmentation: 79.6K papers, 1.8M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Object detection: 46.1K papers, 1.3M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  14
2022  41
2020  1
2019  2
2018  9
2017  51