Showing papers on "Intelligent word recognition" published in 2008

Journal Article•DOI•

Semi-continuous HMMs with explicit state duration for unconstrained Arabic word modeling and recognition

Abdallah Benouareth¹, Abdellatif Ennaji², Mokhtar Sellami¹•Institutions (2)

University of Paris-Sud¹, University of Rouen²

01 Sep 2008-Pattern Recognition Letters

TL;DR: It is shown experimentally that explicit state duration modeling in the SCHMM framework can significantly improve the discriminating capacity of the SCHMMs to deal with very difficult pattern recognition tasks such as unconstrained handwritten Arabic recognition.

67 citations

Journal Article•DOI•

Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis

V. N. Manjunath Aradhya¹, G. Hemantha Kumar¹, S. Noushath¹•Institutions (1)

University of Mysore¹

01 Jun 2008-Engineering Applications of Artificial Intelligence

TL;DR: This paper presents a multilingual character recognition system for printed South Indian scripts (Kannada, Telugu, Tamil and Malayalam) and English documents based on Fourier transform and principal component analysis (PCA), which are two commonly used techniques of image processing and recognition.

60 citations

Proceedings Article•DOI•

Writer identification using edge-based directional probability distribution features for arabic words

Somaya Al-Maadeed¹, E. Mohammed¹, D. Al Kassis¹•Institutions (1)

Qatar University¹

31 Mar 2008

TL;DR: A new database of off-line Arabic handwriting text is built to be used for writer identification research and the performance of edge-based directional probability distributions as features and other features in Arabic writer identification is evaluated.

Abstract: A system for writer identification based on Arabic handwritten words was built. First, a database of words was gathered and used as a test base. Then, feature vectors were extracted from writers' word images. Prior to feature extraction, normalization operations were applied to a word or text line. In this research, we studied the effect of feature extraction and recognition operations on Arabic text on the identification rate of writers. Since there is no well-known database of Arabic handwritten words for researchers to test on, we built a new database of off-line Arabic handwriting text for writer identification research. The proposed database is meant to provide training and testing sets for this research. Arabic handwritten words were collected from 100 writers. We evaluated the performance of edge-based directional probability distributions and other features in Arabic writer identification.

55 citations

Proceedings Article•DOI•

Off-line Cursive Handwritten Tamil Character Recognition

R.J. Kannan¹, R. Prabhakar, R. M. Suresh¹•Institutions (1)

RMK Engineering College¹

13 Dec 2008

TL;DR: In this paper, a system for offline recognition of handwritten Tamil characters using Hidden Markov Models (HMMs) is presented; it uses a combination of time-domain and frequency-domain features.

Abstract: Handwriting has persisted as a means of communication and of recording information in day-to-day life, even with the introduction of new technologies, and so remains relevant to optical character recognition. Hidden Markov Models (HMMs) have long been a popular choice for Western cursive handwriting recognition following their success in speech recognition. However, published work employing HMMs for Indic script recognition is limited and generally focused on isolated character recognition. A system for offline recognition of cursive handwritten Tamil characters is presented. The proposed system is based on HMMs and uses a combination of time-domain and frequency-domain features. The tolerance of the system is evident, as it can overcome the complexities arising from font variations, and it proves to be flexible and robust. A higher degree of accuracy has been obtained by applying this approach to a comprehensive database. These initial results are promising and warrant further research in this direction. They also encourage exploring the adoption of the approach for other Indic scripts.

45 citations

Patent•

OCR of books by word recognition

Asaf Tzadok¹, Eugeniusz Walach¹•Institutions (1)

IBM¹

16 Apr 2008

TL;DR: A document-specific database containing an exhaustive listing of the words in a document of interest is created from an OCR scan of that document; images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation.

Abstract: Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.

41 citations

Proceedings Article•DOI•

A Segmentation Based Approach to Offline Handwritten Devanagari Word Recognition

Bikash Shaw, S.K. Parui, Malayappan Shridhar¹•Institutions (1)

University of Michigan¹

17 Dec 2008

TL;DR: A segmentation-based approach to handwritten Devanagari word recognition is proposed in which a word image is segmented into pseudo-characters on the basis of the head line.

Abstract: The present paper proposes a segmentation-based approach to handwritten Devanagari word recognition. On the basis of the head line, a word image is segmented into pseudo-characters. Hidden Markov models are proposed to recognize the pseudo-characters. Word-level recognition is done on the basis of a string edit distance.
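The word-level step described in this abstract can be illustrated with a minimal sketch. It assumes that the HMM stage outputs a sequence of pseudo-character labels and that each lexicon word is stored as its expected label sequence; the labels (`p1`, `p2`, ...), the word names and the unit edit costs are hypothetical, since the abstract does not specify them.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, with unit costs (an assumption)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def closest_word(recognized, lexicon):
    """Return the lexicon word whose label sequence is nearest to `recognized`."""
    return min(lexicon, key=lambda word: edit_distance(recognized, lexicon[word]))


# Hypothetical lexicon: word -> expected pseudo-character label sequence.
lexicon = {
    "word_a": ("p1", "p2", "p3"),
    "word_b": ("p1", "p4", "p5", "p6"),
    "word_c": ("p7", "p2"),
}
recognized = ("p1", "p4", "p3", "p6")  # one label misrecognized by the HMM stage
best = closest_word(recognized, lexicon)
```

Here `best` is `"word_b"`, because a single substitution turns the recognized sequence into that entry, while the other entries need more edits.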

38 citations

Proceedings Article•DOI•

Neural network based handwritten numeral recognition of Kannada and Telugu scripts

S.V. Rajashekararadhya¹, Prashant Ranjan¹•Institutions (1)

Anna University¹

01 Nov 2008

TL;DR: A zone and distance metric based feature extraction system is presented, and recognition rates of 98% and 96% are obtained for Kannada and Telugu numerals, respectively.

Abstract: Character recognition is an important area in the fields of image processing and pattern recognition. Handwritten character recognition has received extensive attention in academia and industry. A recognition system can be either on-line or off-line. Off-line handwriting recognition is a subfield of optical character recognition. India is a multilingual and multi-script country with eighteen accepted official scripts and over a hundred regional languages. In this paper we present a zone and distance metric based feature extraction system. The character centroid is computed and the image is divided into n equal zones. The average distance from the character centroid to each pixel present in the zone is computed. This procedure is repeated for all the zones in the numeral image. Finally, n such features are extracted for classification and recognition. A feed-forward back-propagation neural network is designed for the subsequent classification and recognition. We obtained recognition rates of 98% and 96% for Kannada and Telugu numerals, respectively.
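The feature extraction described in this abstract might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the image is taken to be a binary grid, "each pixel present in the zone" is read as each foreground pixel, the zones form a rectangular grid, and an empty zone contributes 0. The function name and parameters are hypothetical.

```python
import math


def zone_centroid_features(image, zone_rows, zone_cols):
    """Average centroid-to-pixel distance per zone for a binary image.

    `image` is a list of rows of 0/1 values; the result has
    zone_rows * zone_cols features (the n of the abstract).
    """
    height, width = len(image), len(image[0])
    foreground = [(r, c) for r in range(height) for c in range(width) if image[r][c]]
    n_zones = zone_rows * zone_cols
    if not foreground:
        return [0.0] * n_zones

    # Centroid of the character (mean position of the foreground pixels).
    cy = sum(r for r, _ in foreground) / len(foreground)
    cx = sum(c for _, c in foreground) / len(foreground)

    zone_h = height / zone_rows
    zone_w = width / zone_cols
    sums = [0.0] * n_zones
    counts = [0] * n_zones
    for r, c in foreground:
        zr = min(int(r / zone_h), zone_rows - 1)
        zc = min(int(c / zone_w), zone_cols - 1)
        idx = zr * zone_cols + zc
        sums[idx] += math.hypot(r - cy, c - cx)
        counts[idx] += 1

    # Empty zones get 0.0 (an assumption; the abstract does not cover this case).
    return [s / k if k else 0.0 for s, k in zip(sums, counts)]


# Toy 4x4 image with one foreground pixel in each corner, split into 2x2 zones.
toy = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
features = zone_centroid_features(toy, 2, 2)
```

The resulting vector would then be passed to a classifier; the abstract uses a feed-forward back-propagation neural network for that stage.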

37 citations

Proceedings Article•

Offline handwritten Devanagari word recognition: A segmentation based

Bikash Shaw¹, S.K. Parui¹, Malayappan Shridhar•Institutions (1)

Indian Statistical Institute¹

01 Jan 2008

37 citations

Proceedings Article•DOI•

Isolated Handwritten Kannada and Tamil Numeral Recognition: A Novel Approach

S. V. Rajashekararadhya¹, P. Vanaja Ranjan¹, V. N. Manjunath Aradhya²•Institutions (2)

Anna University¹, University of Mysore²

16 Jul 2008

TL;DR: A projection distance metric and zoning based scheme is proposed for numeral recognition; with a nearest neighbor classifier, it gives recognition accuracy of around 93% and 90% for Kannada and Tamil numerals, respectively.

Abstract: Handwritten character recognition has received extensive attention in academia and industry. A recognition system can be either on-line or off-line. There is a large demand for optical character recognition of handwritten documents. India is a multilingual and multi-script country with eighteen accepted official scripts and over a hundred regional languages. In this paper we propose a projection distance metric and zoning based scheme for numeral recognition. We tested the proposed method on Kannada and Tamil numerals. A nearest neighbor classifier is used for the subsequent classification. The proposed method gives recognition accuracy of around 93% and 90% for Kannada and Tamil numerals, respectively.

35 citations

Proceedings Article•DOI•

Word segmentation of off-line handwritten documents

Chen Huang¹, Sargur N. Srihari¹•Institutions (1)

University at Buffalo¹

27 Jan 2008

TL;DR: This paper describes a gap-metrics-based machine learning approach to separate a line of unconstrained handwritten text into words and proposes a combined distance measure, computed using three different methods, to overcome the disadvantages of the individual distance computation methods.

Abstract: Word segmentation is the most critical pre-processing step for any handwritten document recognition and/or retrieval system. When the writing style is unconstrained (written in a natural manner), recognition of individual components may be unreliable, so they must be grouped together into word hypotheses before recognition algorithms can be used. This paper describes a gap-metrics-based machine learning approach to separate a line of unconstrained handwritten text into words. Our approach uses a set of both local and global features, motivated by the ways in which human beings perform this kind of task. In addition, to overcome the disadvantages of the different distance computation methods, we propose a combined distance measure computed using three different methods. The classification is done using a three-layer neural network. The algorithm is evaluated on an unconstrained handwriting database that contains 50 pages of handwritten documents (1,026 lines, 7,562 word images). The overall accuracy is 90.8%, which is better than that of a previous method.

34 citations

Proceedings Article•DOI•

Offline handwritten Devanagari word recognition: A segmentation based approach

Bikash Shaw, S.K. Parui, Malayappan Shridhar¹•Institutions (1)

University of Michigan¹

01 Dec 2008

TL;DR: A novel segmentation based approach is proposed for recognition of offline handwritten Devanagari words and a hidden Markov model is used for recognition at pseudocharacter level.

Abstract: A novel segmentation based approach is proposed for recognition of offline handwritten Devanagari words. Stroke based features are used as feature vectors. A hidden Markov model is used for recognition at pseudocharacter level. The word level recognition is done on the basis of a string edit distance.

Book Chapter•DOI•

Implementation Challenges for Nastaliq Character Recognition

Sohail Abdul Sattar¹, Shamsul Haque¹, Mahmood K. Pathan¹, Quintin Gee²•Institutions (2)

NED University of Engineering and Technology¹, University of Southampton²

11 Apr 2008

TL;DR: Research on Urdu Nastaliq OCR is reported, challenges are discussed, and a new solution for its implementation is suggested.

Abstract: Character recognition in cursive scripts or handwritten Latin script has attracted researchers’ attention recently and some research has been done in this area. Optical character recognition is the translation of optically scanned bitmaps of printed or written text into digitally editable data files. OCRs developed for many world languages are already in use, but none exists for Urdu Nastaliq, a calligraphic adaptation of the Arabic script, just as Jawi is for Malay. Urdu Nastaliq has 39 characters against Arabic's 28. Each character then has 2-4 different shapes according to its position in the word: initial, medial, final and isolated. In Nastaliq, inter-word and intra-word overlapping makes optical recognition more complex. Character recognition of the Latin script is relatively easier. This paper reports research on Urdu Nastaliq OCR, discusses challenges and suggests a new solution for its implementation.

Book Chapter•DOI•

Skeleton-Based Recognition of Chinese Calligraphic Character Image

Kai Yu¹, Jiangqin Wu¹, Yueting Zhuang¹•Institutions (1)

Zhejiang University¹

09 Dec 2008

TL;DR: A novel skeletonization algorithm called MFITS (morphology-fused index table skeletonization) and a skeleton-based Chinese calligraphic character recognition method are proposed.

Abstract: The large amount of digitized Chinese calligraphic works in existence is a valuable part of the Chinese cultural heritage. However, these works can hardly be recognized by optical character recognition (OCR), which performs well on machine-printed characters against a clean background, because calligraphic characters come in many different styles and shape complexities. Approaches to automatic Chinese calligraphic character recognition are therefore becoming more and more important. A novel skeletonization algorithm called MFITS (morphology-fused index table skeletonization) is proposed, together with a skeleton-based Chinese calligraphic character recognition method. The experiments show that MFITS can extract skeletons with only a few deformations and that the skeleton-based Chinese calligraphic character image recognition method performs well.

Journal Article•DOI•

An Improved Handwritten Tamil Character Recognition System using Octal Graph

R. J. Kannan, R. Prabhakar

31 Jul 2008-Journal of Computer Science

TL;DR: This study proposes a novel solution for character recognition in Tamil that uses octal graph conversion to recognize off-line handwritten Tamil characters, which improves slant correction; results indicate that the approach can be used for character recognition in other Indic scripts as well.

Abstract: Problem Statement: Handwriting recognition has attracted voluminous research in recent times. The segmentation and recognition of characters from handwritten scripts incur considerable overhead. Almost all existing handwritten character recognition techniques use a neural network approach, which requires a lot of preprocessing, so solving these problems with neural networks is a tedious task. Approach: In this study we propose a novel solution for performing character recognition in Tamil, the official language of the south Indian province of Tamil Nadu. Along with the preprocessing techniques of segmentation, normalization and feature extraction, the approach uses octal graph conversion to recognize off-line handwritten Tamil characters, which improves slant correction. The graph tries to represent the basic form of a letter independent of the style of writing. Using the weights of the graphs and appropriate feature matching against predefined characters, the written characters are recognized. Results: The performance evaluation of off-line handwritten Tamil character recognition using octal graph conversion, with metrics based on the ranks of the letters, shows good recognition efficiency. Conclusion: We show that, in practice, the proposed approach produces near-optimal results and outperforms the other existing methodologies. Results indicate that the approach can be used for character recognition in other Indic scripts as well.

Proceedings Article•DOI•

Handwritten Arabic character recognition based on SVM Classifier

Faouzi Bouchareb¹, Rachid Hamdi², Mouldi Bedda³•Institutions (3)

Yahoo!¹, University of Annaba², Al Jouf University³

07 Apr 2008

TL;DR: A novel algorithm is proposed for image smoothing and segmentation of Arabic characters using the writing width estimated from the character skeleton, and Principal Component Analysis (PCA) is applied to the feature vectors to reduce their dimension.

Abstract: This paper describes new methods for handwritten Arabic character recognition. We propose a novel algorithm for image smoothing and segmentation of Arabic characters using the writing width estimated from the character skeleton. The moments and Fourier descriptors of the profile projection and centroid distance are used as features of each character; these features are invariant to translation, rotation and scale. We apply Principal Component Analysis (PCA) to the feature vectors in order to reduce their dimension. The classifier proposed in this work is based on Support Vector Machines (SVM), which are considered among the best recent classifiers. The results show that these methods are very powerful for isolated handwritten Arabic characters.

Proceedings Article•DOI•

Arabic handwriting recognition: Challenges and solutions

Abdurazzag Ali Aburas¹, Mohamed E. Gumah²•Institutions (2)

International Islamic University Malaysia¹, Information Technology University²

10 Jun 2008

TL;DR: The main challenges (difficulties) researchers are facing and the up-to-date solutions (the common methods) used for Arabic text recognition are presented.

Abstract: Optical Character Recognition (OCR) has been an active subject of research since the early days of computer science. Even though Arabic characters are used by more than half a billion people, Arabic character recognition has not received enough interest from researchers. Little research progress has been achieved compared to what has been done for Latin and Chinese. The cursive nature of Arabic characters makes it more difficult to achieve high accuracy in character recognition, since even printed Arabic characters are in cursive form. This paper presents the main challenges (difficulties) researchers are facing and the up-to-date solutions (the common methods) used for Arabic text recognition.

Proceedings Article•DOI•

Character recognition using parallel BP neural network

Feng Yang, Fan Yang

07 Jul 2008

TL;DR: A novel character recognition method for license plate numbers based on parallel BP neural networks is presented; it is intended to enhance the accuracy of a recognition system that reads Chinese license plates automatically.

Abstract: In automated license plate recognition systems, many reading errors are caused by an inadequate character recognition method. This paper presents a novel character recognition method for license plate numbers based on parallel BP neural networks. It is intended to enhance the accuracy of a recognition system that reads Chinese license plates automatically. In the proposed methodology, the character is binarized and noise is eliminated in the preprocessing stage; then the character feature is extracted using the skeleton, and the character is normalized to a size of 8*16 pixels. Finally, the character feature is fed into the parallel neural networks and the character is recognized. The proposed character recognition method is effective, and promising results have been obtained in experiments on Chinese license plates.

Proceedings Article•DOI•

Recognition of Arabic Handwritten Words using Contextual Character Models

Ramy El-Hajj¹, Chafic Mokbel¹, Laurence Likforman-Sulem²•Institutions (2)

University of Balamand¹, École Normale Supérieure²

27 Jan 2008

TL;DR: A system for the off-line recognition of cursive Arabic handwritten words is presented; it is based on Hidden Markov Models (HMMs), uses a sliding window approach, and shows that using contextual character models improves recognition.

Abstract: In this paper we present a system for the off-line recognition of cursive Arabic handwritten words. This system is an enhanced version of our reference system presented in [El-Hajj et al., 05], which is based on Hidden Markov Models (HMMs) and uses a sliding window approach. The enhanced version proposed here uses contextual character models. This approach is motivated by the fact that the set of Arabic characters includes a lot of ascending and descending strokes which overlap with one or two neighboring characters. Additional character models are constructed according to characters in their left or right neighborhood. Our experiments on images of the benchmark IFN/ENIT database of handwritten village/town names show that using contextual character models improves recognition. For a lexicon of 306 name classes, accuracy is increased by 0.6% in absolute value, which corresponds to a 7.8% reduction in error rate.
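The two figures in the last sentence are linked by simple arithmetic: a relative error reduction equals the absolute gain divided by the baseline error rate. The sketch below works backwards from the reported numbers. The baseline and enhanced error rates it computes are implied values, not figures stated in the abstract, and they are approximate because the reported percentages are rounded.

```python
def implied_baseline_error(absolute_gain, relative_reduction):
    """Baseline error rate implied by an absolute gain and a relative error reduction.

    relative_reduction = absolute_gain / baseline_error, so
    baseline_error = absolute_gain / relative_reduction.
    """
    return absolute_gain / relative_reduction


absolute_gain = 0.6          # percentage points, from the abstract
relative_reduction = 0.078   # 7.8% reduction in error rate, from the abstract

baseline_error = implied_baseline_error(absolute_gain, relative_reduction)
enhanced_error = baseline_error - absolute_gain

print(f"implied baseline error: {baseline_error:.2f}%")
print(f"implied enhanced error: {enhanced_error:.2f}%")
```

Under these assumptions the reference system's error rate would be roughly 7.7% and the enhanced system's roughly 7.1%.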

Proceedings Article•DOI•

Topic based language models for OCR correction

Anurag Bhardwaj¹, Faisal Farooq¹, Huaigu Cao¹, Venu Govindaraju¹•Institutions (1)

University at Buffalo¹

24 Jul 2008

TL;DR: This research constructs a topic-based language model for every document using manually categorized training data and trains a topic categorization sub-system, based on a Maximum Entropy model, which is used to generate the topic distribution of a test document.

Abstract: Despite several decades of research in document analysis, recognition of unconstrained handwritten documents is still considered a challenging task. Previous research in this area has shown that word recognizers produce reasonably clean output when used with a restricted lexicon. But in the absence of such a restricted lexicon, the output of an unconstrained handwritten word recognizer is noisy. The objective of this research is to process noisy recognizer output and eliminate spurious recognition choices using a topic-based language model. We construct a topic-based language model for every document using training data that is manually categorized. A topic categorization sub-system based on a Maximum Entropy model is also trained and is used to generate the topic distribution of a test document. A given test word image is processed by the recognizer, and its word recognition likelihood is refined by incorporating the topic distribution of the document and the topic-based language model probability. The proposed method is evaluated on the publicly available IAM dataset, and experimental results show a significant improvement in word recognition accuracy, from 32% to 40%, over a test set consisting of 4033 word images extracted from 70 handwritten document images.
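One plausible way to refine a recognition likelihood with a topic distribution is sketched below. The abstract does not give the combination rule, so the formula used here (recognizer likelihood multiplied by a topic-weighted mixture of per-topic word probabilities) is an assumption, as are the function name, the smoothing floor and the toy numbers.

```python
def rescore_with_topics(recognizer_scores, topic_dist, topic_lm, floor=1e-6):
    """Rescore word candidates as likelihood * sum_t P(t | doc) * P(word | t).

    recognizer_scores: word -> recognition likelihood for one word image
    topic_dist:        topic -> P(topic | document)
    topic_lm:          topic -> {word -> P(word | topic)}
    floor:             probability given to words unseen in a topic (assumption)
    """
    rescored = {}
    for word, likelihood in recognizer_scores.items():
        lm_prob = sum(
            p_topic * topic_lm.get(topic, {}).get(word, floor)
            for topic, p_topic in topic_dist.items()
        )
        rescored[word] = likelihood * lm_prob
    total = sum(rescored.values())
    if total == 0:
        return rescored
    # Normalize so the refined scores sum to 1 over the candidate list.
    return {word: score / total for word, score in rescored.items()}


# Toy example: the recognizer slightly prefers "coal", but the document
# is mostly about sports, where "goal" is far more likely.
recognizer_scores = {"goal": 0.40, "coal": 0.45, "gaol": 0.15}
topic_dist = {"sports": 0.8, "mining": 0.2}
topic_lm = {
    "sports": {"goal": 0.05, "coal": 0.001},
    "mining": {"goal": 0.002, "coal": 0.04},
}
refined = rescore_with_topics(recognizer_scores, topic_dist, topic_lm)
best = max(refined, key=refined.get)
```

In this toy case `best` is `"goal"`: the topic evidence outweighs the recognizer's small preference for `"coal"`.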

Patent•

Text input system and method involving finger-based handwriting recognition and word prediction

Ramin O. Assadollahi

08 Feb 2008

TL;DR: In this article, a text input system and method involving finger-based handwriting recognition and word prediction was presented, which consisted of a text prediction component (310) for predicting a plurality of follow-up words based on a text context, the text prediction component (310) outputting a set of candidate words; a character handwriting recognition component (330) for recognizing a handwritten character candidate, the handwritten character candidate being determined based upon handwriting input received from a touch sensitive input field (340); a candidate word filtering component (350) for filtering the set of candidates received from the text

Abstract: The present invention relates to a text input system and method involving finger-based handwriting recognition and word prediction. A text input device (300) comprises: a text prediction component (310) for predicting a plurality of follow-up words based on a text context, the text prediction component (310) outputting a set of candidate words; a character handwriting recognition component (330) for recognizing a handwritten character candidate, the handwritten character candidate being determined based upon handwriting input received from a touch sensitive input field (340); a candidate word filtering component (350) for filtering the set of candidate words received from the text prediction component (310) based on the recognized handwritten character candidate; a word presentation component (360) for presenting candidate words from the filtered set of candidate words to a user of the device; and a word selection component (380) for receiving a user selection of a presented candidate word from the user.
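The interaction between the prediction, recognition and filtering components can be pictured with a small sketch. It assumes that filtering means keeping predicted words whose first letter matches one of the recognized handwritten character candidates; the abstract only says the set is filtered "based on the recognized handwritten character candidate", so the matching rule, the function name and the sample words are hypothetical.

```python
def filter_candidates(candidate_words, recognized_chars):
    """Keep predicted words that start with any recognized character candidate.

    candidate_words:  words proposed by the text prediction component
    recognized_chars: character candidates from the handwriting recognizer
    Matching on the first letter, case-insensitively, is an assumption.
    """
    allowed = {ch.lower() for ch in recognized_chars}
    return [word for word in candidate_words if word and word[0].lower() in allowed]


# Words predicted from the text context, before any handwriting input.
predicted = ["the", "their", "house", "hand", "tree"]

# The user draws a character that the recognizer reads as "h".
filtered = filter_candidates(predicted, ["h"])
```

With this input, `filtered` keeps `"house"` and `"hand"` in their original prediction order, which is the list a word presentation component would then show to the user.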

Proceedings Article•DOI•

A novel orientation free method for online unconstrained cursive handwritten chinese word recognition

Teng Long¹, Lianwen Jin¹•Institutions (1)

South China University of Technology¹

01 Dec 2008

TL;DR: The promising experimental results demonstrate that the method is an orientation-free and stroke-order-free method for unconstrained cursive handwritten Chinese word recognition.

Abstract: In this paper, we propose an orientation-free method for unconstrained cursive handwritten Chinese word recognition. With a novel gravity center balancing method, the orientation of a handwritten word can be detected. Through stroke extraction, stroke breaking, heuristic over-segmentation, and path searching guided by recognition and lexicon information, a handwritten word can be recognized even when its characters are connected or partially overlapped. Experiments were performed on 173,660 unconstrained handwritten Chinese word samples collected with a Pocket PC. The promising experimental results demonstrate that our method is an orientation-free and stroke-order-free method for unconstrained cursive handwritten Chinese word recognition.

Proceedings Article•

Chinese Named Entity Recognition and Word Segmentation Based on Character.

Jingzhou He, Houfeng Wang¹•Institutions (1)

Peking University¹

01 Jan 2008

TL;DR: This paper presents a character-based Conditional Random Fields (CRFs) model for Chinese word segmentation and named entity recognition, and it turns out to perform well.

Abstract: Chinese word segmentation and named entity recognition (NER) are both important tasks in Chinese information processing. This paper presents a character-based Conditional Random Fields (CRFs) model for these two tasks. In the SIGHAN Bakeoff 2007, this model participated in all closed tracks for both the Chinese NER and word segmentation tasks, and it performed well. Our system ranks 2nd in the closed track on NER of MSRA, and 4th in the closed track on word segmentation of SXU.

Proceedings Article•DOI•

Online Writer-Independent Character Recognition Using a Novel Relational Context Representation

S. Izadi¹, C.Y. Suen¹•Institutions (1)

Concordia University¹

11 Dec 2008

TL;DR: The approach to improving performance in online character recognition is to design more representative features for handwritten character representation in order to tackle the huge inter-class variability problem and increase recognition accuracy.

Abstract: Transforming handwriting into digital text and recognizing handwritten patterns open up a vast range of application opportunities, from searching handwritten notes and document management to triggering actions by writing symbols. Despite great attention, a massive number of applications, and a huge research effort, recognition of handwritten text has still not reached the desired efficiency and remains an active area of research. One of the most important factors that make handwriting recognition a challenging task is the huge variety of writing styles, which cannot be captured efficiently by available classification methods using current feature descriptors. Our approach to improving performance in online character recognition is to design more representative features for handwritten character representation in order to tackle the huge inter-class variability problem and increase recognition accuracy. The representation can also be used in the recognition of other online planar patterns. The experimental results show that the proposed representation with an SVM classifier outperforms the best reported recognition rates for Arabic characters in a writer-independent system.

Optical Character Recognition System Using BP Algorithm

Sang Sung Park, Won Gyo Jung, Young Geun Shin, Dong Sik Jang

01 Jan 2008

TL;DR: An OCR system is constructed that extracts only the relevant and necessary characters from a large volume of documents and saves them to a database (DB) automatically, using the BP algorithm, a type of artificial neural network.

Abstract: Most government agencies and companies keep evidentiary data and documents that have passed a certain retention period and, under office management regulations, convert them to electronic form. Documents have been digitized by scanning them or entering them manually on a computer. Government agencies and companies are now trying to reduce this inconvenience. They use optical character recognition (OCR) to extract text and save the relevant documents to a database (DB). However, general OCR has its own inconvenience: after character recognition has been run on a whole document, the text must be classified segment by segment before it is entered into the DB. To solve this problem, we constructed an OCR system that extracts only the relevant and necessary characters from a large volume of documents and saves them to the DB automatically, using the BP algorithm, a type of artificial neural network.

Proceedings Article•DOI•

A study of on-line handwritten chemical expressions recognition

Jufeng Yang¹, Guangshun Shi¹, Kai Wang¹, Qian Geng¹, Qing-Ren Wang¹•Institutions (1)

Nankai University¹

01 Dec 2008

TL;DR: A novel two-level algorithm to recognize expressions is proposed that segments expressions fatherly and recognizes isolated symbols and an XML-based system to help users save, modify and search the recognition result is designed.

...read moreread less

Abstract: In this paper, we study the major modules of on-line handwritten chemical expressions recognition. We propose a novel two-level algorithm to recognize expressions. In the first level, structural information is used to distinguish different parts and recognize substances. Then the algorithm segments expressions fatherly and recognizes isolated symbols. To meet the demand of actual applications, the paper also designs an XML-based system to help users save, modify and search the recognition result. The experiment shows that the presented algorithm is reliable.

Proceedings Article•DOI•

Robust time recognition of video clock based on digit transition detection and digit-sequence recognition

Xinguo Yu¹, Yiqun Li¹, Wei San Lee²•Institutions (2)

Institute for Infocomm Research Singapore¹, National University of Singapore²

01 Dec 2008

TL;DR: Experimental results show that the robustness of the algorithm benefits from the fact that both digit transition detection and digit-sequence recognition are more reliable than direct character recognition.

Abstract: This paper presents an algorithm for robust time recognition of video clocks. Existing OCR algorithms cannot recognize the time properly because the time digits have very low resolution and are blurred. To address these challenges, our algorithm employs three techniques. The first is digit transition detection, which identifies the frames at which the SECOND digit changes. The second is digit-sequence recognition, which uses the property that clock digits cycle from 0 to 9 to form a digit sequence. The third is on-the-fly template creation. Informally, the robustness of our algorithm benefits from the fact that both digit transition detection and digit-sequence recognition are more reliable than direct character recognition. Experimental results show that our algorithm achieves high accuracy in recognizing time.
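
The cyclic constraint behind digit-sequence recognition can be illustrated with a small sketch. This is not the authors' implementation: the score matrix, the function names `best_start_digit` and `decode_sequence`, and the additive scoring are all assumptions made for illustration. Given a similarity score for every digit after each SECOND transition, the sketch picks the starting digit whose cycle d, d+1, ... (mod 10) fits all observations jointly, rather than trusting each blurry digit on its own.

```python
def best_start_digit(scores):
    """Pick the starting digit whose 0-9 cycle best fits all observations.

    scores[t][d] is a hypothetical similarity between the digit image seen
    after the t-th transition and digit d (higher is better).
    """
    best_digit, best_total = None, float("-inf")
    for start in range(10):
        # Under the clock constraint, the digit at step t must be (start + t) mod 10.
        total = sum(row[(start + t) % 10] for t, row in enumerate(scores))
        if total > best_total:
            best_digit, best_total = start, total
    return best_digit


def decode_sequence(scores):
    """Return the full digit sequence implied by the best starting digit."""
    start = best_start_digit(scores)
    return [(start + t) % 10 for t in range(len(scores))]


# Toy example: the true sequence is 8, 9, 0, 1, but the third frame
# looks slightly more like an 8 than a 0 when judged in isolation.
scores = [[0.1] * 10 for _ in range(4)]
scores[0][8] = 0.9
scores[1][9] = 0.8
scores[2][8] = 0.6
scores[2][0] = 0.5
scores[3][1] = 0.9
decoded = decode_sequence(scores)  # [8, 9, 0, 1]
```

In this toy case a per-frame argmax would read the third digit as 8, while the sequence-level decision recovers 0, which is the kind of robustness the abstract attributes to sequence recognition over direct character recognition.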

Journal Article•DOI•

Two-stage lexicon reduction for offline arabic handwritten word recognition

Saeed Mozaffari¹, Karim Faez¹, Volker Märgner, Haikal El Abed•Institutions (1)

Amirkabir University of Technology¹

01 Nov 2008-International Journal of Pattern Recognition and Artificial Intelligence

TL;DR: A holistic lexicon reduction technique for offline handwritten Arabic word recognition is proposed in this paper and involves the extraction of dots and subwords from the cursive Arabic word image to describe its shape.

Abstract: Given a large number of words to be recognized, a two-stage strategy for eliminating unlikely candidates before recognition can be a reasonable and powerful approach for increasing the recognition speed. A holistic lexicon reduction technique for offline handwritten Arabic word recognition is proposed in this paper. The principle of this technique involves the extraction of dots and subwords from the cursive Arabic word image to describe its shape. In the first stage of reduction, the number of subwords in the input word is estimated. Then, in the second stage, the word descriptor, based on the dot information, is used while taking into account only the candidates selected in the first stage. Experimental results on the IFN/ENIT database, consisting of 26,459 cursive Arabic word images, show a lexicon reduction of 92.5% with an accuracy of 74%.
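
The two-stage filtering described in the abstract can be sketched roughly as follows. The abstract does not specify the descriptors or matching rules, so everything here is an assumption for illustration: the `LexiconEntry` fields, the (dots above, dots below) descriptor, the tolerance parameters, and the placeholder word labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    word: str       # label of the lexicon word (placeholder names below)
    subwords: int   # number of subwords in the word
    dots: tuple     # hypothetical descriptor: (dots above, dots below)


def reduce_lexicon(lexicon, est_subwords, est_dots, subword_tol=1, dot_tol=1):
    """Two-stage reduction: subword count first, then the dot descriptor."""
    # Stage 1: keep candidates whose subword count is close to the estimate.
    stage1 = [e for e in lexicon if abs(e.subwords - est_subwords) <= subword_tol]
    # Stage 2: among stage-1 survivors only, compare the dot descriptors.
    return [
        e for e in stage1
        if sum(abs(a - b) for a, b in zip(e.dots, est_dots)) <= dot_tol
    ]


def reduction_rate(lexicon, kept):
    """Fraction of the lexicon eliminated before recognition."""
    return 1.0 - len(kept) / len(lexicon)


lexicon = [
    LexiconEntry("w1", 3, (2, 1)),
    LexiconEntry("w2", 3, (0, 0)),
    LexiconEntry("w3", 6, (2, 1)),
    LexiconEntry("w4", 2, (2, 2)),
]
kept = reduce_lexicon(lexicon, est_subwords=3, est_dots=(2, 1))
rate = reduction_rate(lexicon, kept)  # 0.5 for this toy lexicon
```

The tolerances reflect the trade-off the abstract reports: looser tolerances keep the correct word in the reduced list more often but remove fewer candidates, and tighter ones do the opposite.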

Proceedings Article•DOI•

Bio-inspired unified model of visual segmentation system for CAPTCHA character recognition

Chi-Wei Lin¹, Yu-Han Chen¹, Liang-Gee Chen¹•Institutions (1)

National Taiwan University¹

17 Nov 2008

TL;DR: A bio-inspired unified model is presented to improve the accuracy of character recognition for CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart); the model can be generalized to cope with broader domains.

Abstract: In this paper, we present a bio-inspired unified model to improve the accuracy of character recognition for CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). Our study focused on segmenting different CAPTCHA characters to show the importance of visual preprocessing in recognition. Traditional character recognition systems show a low recognition rate for CAPTCHA characters due to their noisy backgrounds and distorted characters. We imitated the human visual attention system to let a recognition system know where to focus despite the noise. The preprocessed characters were then recognized by an OCR system. For the CAPTCHA characters we tested, the overall recognition rate increased from 16.63% to 70.74% after preprocessing. Our experimental results demonstrate the importance of preprocessing for character recognition. Also, by imitating the human visual system, a more unified model can be built. The model presented is an instance for a certain type of visual recognition problem and can be generalized to cope with broader domains.

Proceedings Article•DOI•

Recognition of books by verification and retraining

N.V. Neeba, C. V. Jawahar

01 Dec 2008

TL;DR: An adaptation framework for recognizing characters in a book is proposed, in which the post-processor verifies the output of the recognition module; the verified output is then used for learning, improving performance over iterations.

Abstract: The problem of character recognition in a book should be formulated significantly differently from that of a single page or word. An ideal approach to designing such a recognizer is to adapt the classifier to the font and style of the collection. In this paper, we propose an adaptation framework with a learning component to recognize characters in a book. In the proposed system, the post-processor verifies the output of the recognition module; the verified output is then used for learning, improving performance over iterations. Experiments are conducted on about 500,000 annotated symbols from five books in Malayalam (an Indian language). We achieve an average improvement of 14% in classification accuracy.

Journal Article•DOI•

Off-line cursive handwritten Tamil character recognition

R. Jagadeesh Kannan¹, R. Prabhakar¹•Institutions (1)

RMK Engineering College¹

01 Jun 2008-WSEAS Transactions on Signal Processing

TL;DR: A system for offline recognition of cursive handwritten Tamil characters is presented that uses a combination of time-domain and frequency-domain features and proves to be flexible and robust.

Abstract: In spite of several advancements in technologies pertaining to optical character recognition, handwriting persists as a means of documenting information in day-to-day life. Segmentation and recognition pose quite a lot of challenges, especially in recognizing cursive handwritten scripts of different languages. The proposed solution performs character recognition of handwritten scripts in Tamil, a language having official status in India, Sri Lanka, and Singapore. The approach utilizes discrete Hidden Markov Models (HMMs) for recognizing off-line cursive handwritten Tamil characters. The system is tolerant: it can overcome the complexities arising from font variations and proves to be flexible and robust. A higher degree of accuracy has been obtained by implementing this approach on a comprehensive database, and the precision of the results demonstrates its applicability to commercial use. The methodology promises a simple and fast scaffold for constructing a full OCR system when extended with suitable pre-processing.
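
As a rough illustration of how discrete HMMs can be used for classification, the sketch below scores a sequence of discrete observation symbols with the standard forward algorithm and picks the model with the highest likelihood. It is a generic textbook sketch, not the system described in the abstract: the two toy models, their parameters, the labels "A" and "B", and the function names are all assumptions.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of a non-empty discrete symbol sequence under an HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n = len(start)
    # Initialisation: start in each state and emit the first symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Induction: propagate one step through the transitions, then emit.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def classify(obs, models):
    """Return the label of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda label: forward_likelihood(obs, *models[label]))


# Two hypothetical left-to-right models over a two-symbol alphabet.
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "A": ([1.0, 0.0], trans, [[0.9, 0.1], [0.1, 0.9]]),
    "B": ([1.0, 0.0], trans, [[0.1, 0.9], [0.9, 0.1]]),
}
label = classify([0, 1], models)  # "A"
```

For long observation sequences the raw probabilities underflow, so a practical version would work with scaled or log-domain values.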
