Topic
Intelligent word recognition
About: Intelligent word recognition is a research topic. Over its lifetime, 2,480 publications have appeared on this topic, receiving 45,813 citations.
Papers
TL;DR: Describes “Shortest Path Segmentation” (SPS), a method that combines dynamic programming with a neural net recognizer to segment and recognize character strings, and applies it to handwritten ZIP Codes and handwritten words.
Abstract: We describe a method, “Shortest Path Segmentation” (SPS), which combines dynamic programming and a neural net recognizer for segmenting and recognizing character strings. We describe the application of this method to two problems: recognition of handwritten ZIP Codes, and recognition of handwritten words. For the ZIP Codes, we also used the method to automatically segment the images during training: the dynamic programming stage both performs the segmentation and provides inputs and desired outputs to the neural network. Results are reported for a test set of 2642 unsegmented handwritten 212 dpi binary ZIP Code (5- and 9-digit) images. For handwritten word recognition, we combined SPS with a “Space Displacement Neural Network” approach, in which a single-character-recognition network is extended over the entire word image, and in which SPS techniques are then used to rank order a given lexicon. We report results on a test set of 3000 300 ppi gray scale word images, extracted from images of live mail pieces, for lexicons of size 10, 100, and 1000. Representing the problem as a graph as proposed in this paper has advantages beyond the efficient finding of the final optimal segmentation, or the automatic segmentation of images during training. We can also easily extend the technique to generate K “runner up” answers (for example, by finding the K shortest paths). This paper will also describe applications of some of these ideas.
28 citations
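A minimal sketch of the graph formulation described in the SPS abstract above, assuming candidate cut points are graph nodes and each candidate segment is an edge whose cost is the negative log of a recognizer score. The `toy_cost` table stands in for the neural net recognizer and is purely hypothetical; the function names, the `max_span` limit, and the scores are illustrative assumptions, not the authors' implementation.

```python
import math


def shortest_path_segmentation(n_cuts, segment_cost, max_span=3):
    """Find the cheapest left-to-right path through cut points 0..n_cuts-1.

    segment_cost(i, j) returns (cost, label) for the candidate segment
    between cut i and cut j, or None if that segment is not allowed.
    """
    best = [math.inf] * n_cuts   # best[j]: cheapest cost to reach cut j
    back = [None] * n_cuts       # back[j]: (previous cut, label) on that path
    best[0] = 0.0
    for j in range(1, n_cuts):
        for i in range(max(0, j - max_span), j):
            scored = segment_cost(i, j)
            if scored is None or best[i] == math.inf:
                continue
            cost, label = scored
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = (i, label)
    if best[-1] == math.inf:
        return math.inf, []
    # Walk the back-pointers from the last cut to recover the segments.
    path, j = [], n_cuts - 1
    while j > 0:
        i, label = back[j]
        path.append((i, j, label))
        j = i
    return best[-1], path[::-1]


# Hypothetical recognizer output: (cut i, cut j) -> (best label, probability).
TOY_PROBS = {
    (0, 1): ("1", 0.9),
    (1, 2): ("1", 0.3),
    (2, 3): ("7", 0.8),
    (0, 2): ("4", 0.2),
    (1, 3): ("0", 0.6),
}


def toy_cost(i, j):
    """Turn a toy probability into an additive cost (negative log)."""
    hit = TOY_PROBS.get((i, j))
    if hit is None:
        return None
    label, prob = hit
    return -math.log(prob), label


cost, path = shortest_path_segmentation(4, toy_cost)
decoded = "".join(label for _, _, label in path)
print(decoded, round(cost, 3))
```

With these toy scores the cheapest path uses the segments (0, 1) and (1, 3). Runner-up answers of the kind the abstract mentions would come from extending the same search to the K shortest paths, which this sketch does not do.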
11 Apr 2008
TL;DR: Reports research on Urdu Nastaliq OCR, discusses its challenges, and suggests a new approach to its implementation.
Abstract: Character recognition in cursive scripts or handwritten Latin script has attracted researchers’ attention recently and some research has been done in this area. Optical character recognition is the translation of optically-scanned bitmaps of printed or written text into digitally editable data files. OCRs developed for many world languages are already in use but none exists for Urdu Nastaliq – a calligraphic adaptation of the Arabic script, just as Jawi is for Malay. Urdu Nastaliq has 39 characters against Arabic’s 28. Each character then has 2-4 different shapes according to its position in the word: initial, medial, final and isolated. In Nastaliq, inter-word and intra-word overlapping makes optical recognition more complex. Character recognition of the Latin script is relatively easier. This paper reports research on Urdu Nastaliq OCR, discusses challenges and suggests a new solution for its implementation.
27 citations
04 Nov 2002
TL;DR: A handwritten character recognition apparatus recognizes a handwritten input pattern as one pictorial symbol formed of a plurality of characters that are similar in shape to the pattern, and inputs the corresponding character codes.
Abstract: A handwritten character recognition apparatus performs a recognition process for a handwritten input pattern to input character codes. The handwritten character recognition apparatus recognizes a handwritten input pattern as one pictorial symbol formed of a plurality of characters. The plurality of characters are similar in shape to the handwritten input pattern.
27 citations
23 Aug 2004
TL;DR: Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with large vocabulary.
Abstract: Finite-state models are used to implement a handwritten text recognition and classification system for a real application entailing casual, spontaneous writing with large vocabulary. Handwritten short paragraphs are to be classified into a small number of predefined classes. The paragraphs involve a wide variety of writing styles and contain many non-textual artifacts. HMMs and n-grams are used for text recognition and n-grams are also used for text classification. Experimental results are reported which, given the extreme difficulty of the task, are encouraging.
27 citations
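The abstract above says n-grams are used for text classification but gives no details. One common reading is a separate n-gram language model per class, with a text assigned to the class whose model gives it the highest likelihood. The sketch below assumes word bigrams with add-one smoothing; the class names, training sentences, and the `BigramClassifier` name are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


class BigramClassifier:
    """One add-one-smoothed word-bigram model per class."""

    def __init__(self):
        self.bigrams = {}    # class label -> Counter of (prev, word) pairs
        self.contexts = {}   # class label -> Counter of prev words
        self.vocab = set()

    @staticmethod
    def _tokens(text):
        # "<s>" marks the start of the text so the first word has a context.
        return ["<s>"] + text.lower().split()

    def fit(self, samples):
        for text, label in samples:
            words = self._tokens(text)
            self.vocab.update(words)
            bi = self.bigrams.setdefault(label, Counter())
            ctx = self.contexts.setdefault(label, Counter())
            for prev, word in zip(words, words[1:]):
                bi[(prev, word)] += 1
                ctx[prev] += 1
        return self

    def log_likelihood(self, text, label):
        words = self._tokens(text)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        bi, ctx = self.bigrams[label], self.contexts[label]
        return sum(
            math.log((bi[(prev, word)] + 1) / (ctx[prev] + v))
            for prev, word in zip(words, words[1:])
        )

    def predict(self, text):
        # Pick the class whose bigram model scores the text highest.
        return max(self.bigrams, key=lambda c: self.log_likelihood(text, c))


# Hypothetical training data standing in for recognized paragraphs.
samples = [
    ("please cancel my phone line", "cancel"),
    ("i want to cancel the service", "cancel"),
    ("my bill is too high", "billing"),
    ("the bill shows a wrong charge", "billing"),
]
clf = BigramClassifier().fit(samples)
print(clf.predict("cancel my line"), clf.predict("the bill is high"))
```

In a full system the input text would come from the HMM-based recognizer, so the classifier has to tolerate recognition errors; smoothing is what keeps unseen word pairs from driving a class score to zero.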
27 Sep 2006
TL;DR: Proposes a quadratic classifier based scheme for recognizing off-line handwritten characters of three popular south Indian scripts (Kannada, Telugu, and Tamil), using 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition.
Abstract: India is a multi-lingual, multi-script country. Considerably less work has been done towards handwritten character recognition of Indian languages than for other languages. In this paper we propose a quadratic classifier based scheme for the recognition of off-line handwritten characters of three popular south Indian scripts: Kannada, Telugu, and Tamil. The features used here are mainly obtained from the directional information. For feature computation, the bounding box of a character is segmented into blocks, and the directional features are computed in each block. These blocks are then down-sampled by a Gaussian filter, and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Here, we used two sets of features. We used 64-dimensional features for high speed recognition and 400-dimensional features for high accuracy recognition. A five-fold cross validation technique was used for result computation, and we obtained 90.34%, 90.90%, and 96.73% accuracy rates for Kannada, Telugu, and Tamil characters, respectively, using the 400-dimensional features.
27 citations
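The feature pipeline in the abstract above (blocks, per-block directional features, Gaussian down-sampling) can be sketched roughly as follows. The block grid, the number of directions, the central-difference gradient, and the 3x3 binomial kernel used as a Gaussian approximation are all assumptions chosen so that the output has 64 dimensions; the abstract does not specify them. The modified quadratic classifier itself is not shown.

```python
import math


def directional_features(img, grid=7, out=4, n_dirs=4):
    """Return out*out*n_dirs directional features for a 2-D list of pixels.

    Defaults assume grid == 2 * out - 1, so every second block is a
    down-sampling centre (7x7 blocks -> 4x4 cells, 4 directions -> 64 values).
    """
    h, w = len(img), len(img[0])
    # hist[d][by][bx]: summed gradient magnitude per direction bin and block.
    hist = [[[0.0] * grid for _ in range(grid)] for _ in range(n_dirs)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Fold the gradient angle into [0, pi) and quantize it.
            theta = math.atan2(gy, gx) % math.pi
            d = int(round(theta / (math.pi / n_dirs))) % n_dirs
            hist[d][y * grid // h][x * grid // w] += mag
    # Gaussian-like down-sampling: 3x3 binomial kernel at every second block.
    kernel = [1, 2, 1]
    feats = []
    for d in range(n_dirs):
        for oy in range(out):
            for ox in range(out):
                cy, cx = 2 * oy, 2 * ox
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = cy + dy, cx + dx
                        if 0 <= yy < grid and 0 <= xx < grid:
                            weight = kernel[dy + 1] * kernel[dx + 1] / 16.0
                            acc += weight * hist[d][yy][xx]
                feats.append(acc)
    return feats


# Toy 14x14 binary image containing a single vertical stroke.
image = [[0] * 14 for _ in range(14)]
for row in range(2, 12):
    image[row][7] = 1

feats = directional_features(image)
per_direction = [sum(feats[d * 16:(d + 1) * 16]) for d in range(4)]
print(len(feats), per_direction)
```

For the vertical stroke, most of the energy falls in the first direction bin (horizontal gradient), with a small amount in the perpendicular bin from the stroke ends. A vector like `feats` is what would then be passed to the classifier.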