Showing papers on "Devanagari published in 2013"


Journal ArticleDOI
TL;DR: A review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India; the various methodologies and their reported results are presented.
Abstract: The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India. We have summarized most of the published papers on this topic and have also analysed the various methodologies and their reported results. Future directions of research in OCR for Indian scripts have also been given.

70 citations


Journal ArticleDOI
TL;DR: Three feature extraction techniques have been used to improve the rate of recognition of Optical Character Recognition (OCR) for printed Hindi text in Devanagari script, using Artificial Neural Network (ANN), which improves its efficiency.
Abstract: Hindi is the most widely spoken language in India, with more than 300 million speakers. As there is no separation between the characters of texts written in Hindi as there is in English, the Optical Character Recognition (OCR) systems developed for the Hindi language have a very poor recognition rate. In this paper we propose an OCR for printed Hindi text in Devanagari script, using an Artificial Neural Network (ANN), which improves its efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in the scanned documents further complicates the segmentation process, creating a major problem when designing an effective character segmentation technique. Preprocessing, character segmentation, feature extraction, and finally, classification and recognition are the major steps followed by a general OCR. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and then basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques (histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing) have been used to improve the rate of recognition. These feature extraction techniques are powerful enough to extract features of even distorted characters/symbols. For development of the neural classifier, a back-propagation neural network with two hidden layers is used. The classifier is trained and tested for printed Hindi texts. A performance of approximately 90% correct recognition rate is achieved.
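The abstract names projection histograms and vertical zero crossings as features but does not define them precisely. The following is a minimal sketch of one common reading, assuming a binary image stored as a list of rows with 1 for ink and 0 for background. The function names and the sample glyph are illustrative assumptions, not the authors' implementation, and the "mean distance" variant is not covered.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed encoding)


def horizontal_projection(img: Image) -> List[int]:
    """Count the ink pixels in each row."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Count the ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def vertical_zero_crossings(img: Image) -> List[int]:
    """For each column, count ink/background transitions from top to bottom."""
    return [
        sum(1 for a, b in zip(col, col[1:]) if a != b)
        for col in zip(*img)
    ]


# Toy 4x4 glyph with a headline on top and a vertical stem (hypothetical data).
glyph = [
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]

print(horizontal_projection(glyph))    # [4, 1, 2, 1]
print(vertical_projection(glyph))      # [1, 2, 4, 1]
print(vertical_zero_crossings(glyph))  # [1, 3, 0, 1]
```

Concatenating the three lists gives a fixed-length vector for a fixed-size glyph, which is the kind of input a back-propagation classifier would expect.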

41 citations


Proceedings ArticleDOI
24 Aug 2013
TL;DR: A novel offline strategy for recognition of online handwritten Devanagari characters entered in an unconstrained manner, based on CNN, that allows writers to enter characters in any number or order of strokes and is also robust to certain amount of overwriting.
Abstract: In this paper, we introduce a novel offline strategy for recognition of online handwritten Devanagari characters entered in an unconstrained manner. Unlike the previous approaches based on standard classifiers - SVM, HMM, ANN and trained on statistical, structural or spectral features, our method, based on CNN, allows writers to enter characters in any number or order of strokes and is also robust to certain amount of overwriting. The CNN architecture supports an increased set of 42 Devanagari character classes. Experiments with 10 different configurations of CNN and for both Exponential Decay and Inverse Scale Annealing approaches to convergence, show highly promising results. In a further improvement, the final layer neuron outputs of top 3 configurations are averaged and used to make the classification decision, achieving an accuracy of 99.82% on the train data and 98.19% on the test data. This marks an improvement of 0.2% and 5.81%, for the train and test set respectively, over the existing state-of-the-art in unconstrained input. The data used for building the system is obtained from different parts of Devanagari writing states in India, in the form of isolated words. Character level data is extracted from the collected words using a hybrid approach and covers all possible variations owing to the different writing styles and varied parent word structures.
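The averaging step described above, where the final-layer outputs of the top 3 configurations are averaged before the decision, can be sketched as follows. This is only an illustration under assumptions: each model is represented by a plain list of per-class scores, and the scores shown are made-up numbers, not results from the paper.

```python
from typing import List, Tuple


def average_ensemble(outputs: List[List[float]]) -> Tuple[int, List[float]]:
    """Average per-class scores across models and return (best class, averages).

    `outputs` holds one score list per model; all lists must have equal length.
    """
    n_models = len(outputs)
    n_classes = len(outputs[0])
    averaged = [
        sum(model[c] for model in outputs) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged


# Hypothetical final-layer outputs of three networks for a 3-class toy problem.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
label, averaged = average_ensemble(scores)
print(label)  # 1
```

Note that the first network alone would pick class 0, while the averaged decision picks class 1, which is the kind of disagreement averaging is meant to smooth out.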

31 citations


Journal Article
TL;DR: An accurate and exhaustive approach to detect the skew angle of the images of words/ characters of cursive Devanagari script and it is efficient in terms of time and is a simpler process as compared to the existing ones.
Abstract: This paper proposes an accurate and exhaustive approach to detect the skew angle of the images of words/ characters of cursive Devanagari script. This approach was applied to 235 writing samples and a total collection of around 6000 samples. It is efficient in terms of time and is a simpler process as compared to the existing ones. The method is an extension to the work of Pal and Chaudhuri [B. B. Chaudhuri and U. Pal, Skew angle detection of digitized Indian script documents, IEEE Trans. PAMI-19, 182-186 (1997)]. Heuristic approach has been applied to detect the skew angle. The inherent dominating features of the structure of the Devanagari script have been used to accurately calculate the skew of the Devanagari word.

30 citations


Proceedings Article
14 Nov 2013
TL;DR: The proposed classification system preprocesses and normalizes 27000 handwritten character images to 30×30 pixels, divides them into zones, and produces three classes depending on the presence or absence of a vertical bar.
Abstract: Compound character recognition of Devanagari script is a challenging task since the characters are complex in structure and can be modified by writing a combination of two or more characters. These compound characters account for 12 to 15% of the Devanagari script. Moment-based techniques have been successfully applied to several image processing problems and are a fundamental tool for generating feature descriptors; the Zernike moment technique has a rotation invariance property that is desirable for handwritten character recognition. This paper discusses extraction of features from handwritten compound characters using the Zernike moment feature descriptor and proposes an SVM and k-NN based classification system. The proposed classification system preprocesses and normalizes the 27000 handwritten character images to 30×30 pixels and divides them into zones. The pre-classification produces three classes depending on the presence or absence of a vertical bar. Zernike moment feature extraction is then performed on each zone. The overall recognition rate of the proposed system is up to 98.37% with the SVM classifier and 95.82% with the k-NN classifier.
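The zoning and vertical-bar pre-classification steps can be sketched roughly as below. The abstract does not say how many zones are used or how the three classes are defined, so the 10×10 zone size, the 0.8 fill threshold and the class names `no_bar`, `mid_bar` and `end_bar` are all assumptions made for illustration. Zernike moment extraction itself is not shown.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed encoding)


def split_into_zones(img: Image, zone: int = 10) -> List[Image]:
    """Split a square image into non-overlapping zone x zone blocks."""
    size = len(img)
    return [
        [row[c:c + zone] for row in img[r:r + zone]]
        for r in range(0, size, zone)
        for c in range(0, size, zone)
    ]


def vertical_bar_class(img: Image, min_fill: float = 0.8) -> str:
    """Pre-classify by looking for a column that is almost entirely ink."""
    height, width = len(img), len(img[0])
    bar_cols = [
        c for c in range(width)
        if sum(row[c] for row in img) >= min_fill * height
    ]
    if not bar_cols:
        return "no_bar"
    # Treat a bar in the rightmost third as an end bar (assumed rule).
    return "end_bar" if max(bar_cols) >= 2 * width / 3 else "mid_bar"


def blank(size: int = 30) -> Image:
    """Create an empty size x size image."""
    return [[0] * size for _ in range(size)]


def with_bar(col: int, size: int = 30) -> Image:
    """Create an image containing a single full-height vertical bar."""
    img = blank(size)
    for row in img:
        row[col] = 1
    return img


print(len(split_into_zones(blank())))    # 9
print(vertical_bar_class(with_bar(28)))  # end_bar
print(vertical_bar_class(with_bar(15)))  # mid_bar
print(vertical_bar_class(blank()))       # no_bar
```

In a full pipeline, each zone returned by `split_into_zones` would be passed to a moment-based feature extractor, and the pre-classification label would select which classifier or class subset to use.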

22 citations


Proceedings ArticleDOI
22 Mar 2013
TL;DR: An Artificial Neural Network technique is used to preprocess, segment and recognize Devanagari characters; the system is found to exhibit an accuracy of 75.6% on noisy characters.
Abstract: Character recognition systems for various languages and scripts have gained importance in recent decades and are an area of deep interest for many researchers. Their development is strongly integrated with neural networks. However, recognizing Devanagari script is a relatively greater challenge due to the script's complexity. Various techniques have been implemented for this problem, with many improvements so far. This paper describes the development and implementation of one such system, comprising a combination of several stages. An Artificial Neural Network technique is mainly used to preprocess, segment and recognize Devanagari characters. The system was designed, implemented and trained, and was found to exhibit an accuracy of 75.6% on noisy characters.

20 citations


Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper proposes a formulation where the expectations on the word-to-symbol segmentation and Unicode conversion modules are minimized, the harder recognition task is modelled as learning an appropriate sequence-to-sequence translation scheme, and recognition is formulated as a direct transcription problem.
Abstract: Optical Character Recognition (OCR) problems are often formulated, for most of the Indian scripts, as an isolated character (symbol) classification task followed by a post-classification stage (which contains modules like Unicode generation, error correction, etc.) to generate the textual representation. Such approaches are prone to failures due to (i) difficulties in designing a reliable word-to-symbol segmentation module that can work robustly in the presence of degraded (cut/fused) images and (ii) converting the outputs of the classifiers to a valid sequence of Unicodes. In this paper, we propose a formulation where the expectations on these two modules are minimized, and the harder recognition task is modelled as learning an appropriate sequence-to-sequence translation scheme. We thus formulate the recognition as a direct transcription problem. Given many examples of feature sequences and their corresponding Unicode representations, our objective is to learn a mapping which can convert a word directly into a Unicode sequence. This formulation has multiple practical advantages: (i) it reduces the number of classes significantly for the Indian scripts; (ii) it removes the need for a reliable word-to-symbol segmentation; (iii) it does not require strong annotation of symbols to design the classifiers; and (iv) it directly generates a valid sequence of Unicodes. We test our method on more than 6000 pages of printed Devanagari documents from multiple sources. Our method consistently outperforms other state-of-the-art implementations.

18 citations


Proceedings ArticleDOI
24 Aug 2013
TL;DR: This work uses an ensemble of MLP classifiers having different hidden layer sizes and results of their classification are combined based on Adaboost technique, and studies use of boosting as a solution to this problem of using MLP as a classifier in real-life applications.
Abstract: In this article, we present our recent study of offline recognition of handwritten numerals of three Indian scripts -- Devanagari, Bangla and Oriya. Here, we propose a novel approach to combination of multiple MLP classifiers with varying number of hidden nodes based on Adaboost technique. In this recognition study, we used Zernike moment features of different orders. We obtained classification results corresponding to a number of orders of this moment function and the best classification result for each script was obtained when the feature vector consists of moment values up to the order 8. It is well-known that the classification performance of an MLP largely depends on the choice of the number of hidden nodes. In the present work, we studied use of boosting as a solution to this problem of using MLP as a classifier in real-life applications. Here, we use an ensemble of MLP classifiers having different hidden layer sizes and results of their classification are combined based on Adaboost technique. Classification results have been provided using publicly available databases [1] of offline handwritten numeral images of three Indian scripts.

15 citations


Proceedings ArticleDOI
24 Aug 2013
TL;DR: An algorithm for extraction and recognition of Bangla and Devanagari text from video frames with complex backgrounds, using an Adaptive SIS binarization technique and a state-of-the-art OCR.
Abstract: Extraction and recognition of Bangla text from video frame images is challenging due to font type and style variation, complex color backgrounds, low resolution, low contrast, etc. In this paper, we propose an algorithm for extraction and recognition of Bangla and Devanagari text from video frames with complex backgrounds. Here, a two-step approach has been proposed. After text localization, the text line is segmented into words using information based on line contours. First-order gradient values of the text blocks are used to find the word gap. Next, an Adaptive SIS binarization technique is applied to each word. The binarized text block is then sent to a state-of-the-art OCR for recognition.

12 citations


Proceedings ArticleDOI
24 Aug 2013
TL;DR: This research paper proposes a recognition system for handwritten Devanagari compound characters based on the Legendre moment feature descriptor; moment functions have been successfully applied to many pattern recognition problems.
Abstract: Handwritten Devanagari compound character recognition is a new and challenging task for researchers, because compound characters are complex in structure and are written by combining two or more characters. Their occurrence in the script is 12 to 15%. In this research paper, a recognition system for handwritten Devanagari compound characters is proposed, based on the Legendre moment feature descriptor. Moment functions have been successfully applied to many pattern recognition problems because they tend to capture global features, which makes them well suited as feature descriptors. The processed image is normalized to 30×30 pixels and divided into zones, and structural as well as statistical features are extracted from each zone. The proposed system is trained and tested on 27000 handwritten characters collected from different people. For classification we have used an Artificial Neural Network. The overall recognition rate is up to 98.25% for basic characters and 98.36% for all compound characters.

11 citations


Journal ArticleDOI
TL;DR: A methodology to segment the Devanagari words, extracted from the scene images, into characters is presented and an indigenous database is created to serve as baseline for the future researchers.
Abstract: A methodology to segment the Devanagari words, extracted from the scene images, into characters is presented. Scene images include street signs, shop names, product advertisements, posters on streets, etc. Such words are prone to multiple sources of noise and these make the segmentation very challenging. The problem gets more complicated while developing the text recognition methodologies for different scripts because there is no general solution to this problem and recognizing text in some scripts can be tougher than in others. An indigenous database is created for this purpose. It consists of 130 samples, manually extracted from 200 natural scene images. The results obtained by applying the proposed techniques are encouraging. The average performance is found to be 55.77 %. The execution time for a typical word of size 1169 × 353 is found to be 4.76 s. The database and the results can serve as baseline for the future researchers.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper developed a novel part-based model technique that can use either the machine printed or the handwritten dataset for training on Devanagari character recognition from scene images and presents the results on the publicly available dataset (DSIW2K) containing images of street scenes taken in New Delhi, India.
Abstract: Character recognition in scene images is an extremely challenging task. Although several techniques are reported to perform well, they pertain to English only. This paper focuses on Devanagari character recognition from scene images. Devanagari is a very popular script and has very typical characteristics different from other scripts, particularly English. Combining basic Devanagari consonants and vowels in various ways can yield hundreds of characters. Building a classifier to recognize all these classes would be a difficult task. To alleviate this problem, a novel part-based model technique is proposed. For this purpose, 40 basic classes were identified from the Devanagari script. The technique is designed to classify an instance of one of these classes in any given test sample. Procuring a large dataset for training is not feasible in the case of scene images. To simultaneously solve this problem, we developed our technique so that it can use either a machine-printed or a handwritten dataset for training. We present our results on the publicly available dataset (DSIW2K) containing images of street scenes taken in New Delhi, India.

Journal Article
TL;DR: The experimental results confirm the proposition to be superior approach over other conventional methodologies to OCR system implementation for Devanagari scripts and detailed approach to conventional pre-processing involved in initial stage of OCR, including noise removal techniques, along with the other conventional approaches to segmentation.
Abstract: Optical Character Recognition (OCR) system aims to convert optically scanned text image to a machine editable text form. Multiple approaches to preprocessing and segmentation exist for various scripts. However, only a restricted combination of the same has been experimented on Devanagari script. This paper proposes a study which aims to explore and bring out an alternative and efficient strategy of preprocessing and segmentation in handling OCR for Devanagari scripts. Efficiency evaluation of the proposed alternative has been undertaken by subjecting it to documents with varying degree of noise severity and border artifacts. The experimental results confirm our proposition to be superior approach over other conventional methodologies to OCR system implementation for Devanagari scripts. Also described is detailed approach to conventional pre-processing involved in initial stage of OCR, including noise removal techniques, along with the other conventional approaches to segmentation. The proposed alternative has been deployed to reach character and top character segmentation level.

01 Jan 2013
TL;DR: A new approach is used for recognition of handwritten Devanagari characters, and the segmentation, feature extraction, water reservoir method, and neural network techniques are explained.
Abstract: Recognition of Devanagari numerals is a difficult task. Extensive research has been done on character recognition in the last few decades. In optical character recognition, a numeral, character or symbol to be recognized can be machine-printed or handwritten. There are several approaches to the study of handwritten Devanagari numerals, depending on the type of features extracted and the different ways of extracting them. In this paper, a new approach is used for recognition of handwritten Devanagari characters. We explain the segmentation, feature extraction, water reservoir method, and neural network techniques. Feature extraction provides 92% accuracy of numeral recognition; the water reservoir method provides 94.34% accuracy. We also discuss various problems that exist in numeral recognition.

01 Jan 2013
TL;DR: This paper combines statistical, structural, global transformation and moment features to form a hybrid feature vector, and combines SVM and KNN classifiers to reduce misclassification and increase classifier accuracy.
Abstract: In this paper we combine statistical, structural, global transformation and moment features to form a hybrid feature vector. We combine classifiers to achieve high accuracy for Devanagari script. To remove the hitch of misclassification and increase classifier accuracy, SVM and KNN are combined. The dataset used for the experiments was created by us.

01 Jan 2013
TL;DR: This paper analyses the various approaches and challenges concerning offline Sanskrit (Devanagari) handwritten character recognition and offers many motivating challenges to researchers.
Abstract: Sanskrit (Devanagari), an alphabetic script, is used by over 500 million people all over the world. Recognition of Sanskrit (Devanagari) handwritten scripts is complicated compared to other language scripts. However, many researchers have provided real-time solutions for offline Sanskrit character recognition also. Offline Sanskrit handwritten documents recognition still offers many motivating challenges to researchers. Current research offers many solutions on Sanskrit (Devanagari) handwritten documents recognition even then reasonable accuracy and performance has not been achieved. This paper analyses the various approaches and challenges concerning offline Sanskrit (Devanagari) handwritten character recognition.

Patent
16 Dec 2013
TL;DR: In this paper, a method for recognizing Devanagari script handwriting is described, which is based on one or more shirorekha detection criteria, such as the length of the stroke, horizontal position of stroke, straightness of stroke and the position in time at which stroke is made in relation to other strokes in the handwritten input.
Abstract: Methods and systems for recognizing Devanagari script handwriting are provided. A method may include receiving a handwritten input and determining that the handwritten input comprises a shirorekha stroke based on one or more shirorekha detection criteria. Shirorekha detection criteria may be at least one criterion such as a length of the shirorekha stroke, a horizontality of the shirorekha stroke, a straightness of the shirorekha stroke, a position in time at which the shirorekha stroke is made in relation to one or more other strokes in the handwritten input, and the like. Next, one or more recognized characters may be provided corresponding to the handwritten input.
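Three of the listed shirorekha detection criteria (length, horizontality and straightness) can be illustrated with a small geometric check on a stroke given as a list of (x, y) points. This is only a sketch under assumptions: the function name, the thresholds and the way each criterion is measured are hypothetical and are not taken from the patent, and the temporal criterion is left out.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_shirorekha(
    stroke: List[Point],
    input_width: float,
    min_rel_length: float = 0.5,    # assumed: stroke spans at least half the input
    max_slope: float = 0.2,         # assumed: vertical drift small relative to width
    min_straightness: float = 0.9,  # assumed: chord length / path length
) -> bool:
    """Heuristically decide whether a stroke looks like a headline stroke."""
    if len(stroke) < 2:
        return False
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    chord = math.hypot(x1 - x0, y1 - y0)
    path = sum(
        math.hypot(bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(stroke, stroke[1:])
    )
    if dx == 0 or path == 0:
        return False
    long_enough = dx >= min_rel_length * input_width
    horizontal = dy / dx <= max_slope
    straight = chord / path >= min_straightness
    return long_enough and horizontal and straight


headline = [(0, 0), (50, 1), (100, 0)]  # long, nearly flat stroke
stem = [(10, 0), (10, 50)]              # vertical stroke
arc = [(0, 0), (50, 40), (100, 0)]      # long but strongly curved stroke

print(is_shirorekha(headline, input_width=100))  # True
print(is_shirorekha(stem, input_width=100))      # False
print(is_shirorekha(arc, input_width=100))       # False
```

A real recognizer would likely weigh these criteria together with stroke order rather than require all of them, since writers may add the shirorekha before or after the character body.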

01 Jan 2013
TL;DR: An efficient image retrieval technique which uses dominant color and texture features of an image by extracting various supportive features like moments invariant, vector Gradient, chain code, image thinning, structuring the image in box format, noise removal, etc.
Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in research on the recognition of printed as well as handwritten Devanagari text in the past few years. An attempt is made to address the most important results reported so far and to highlight the beneficial directions of the research to date. In this paper we propose an efficient image retrieval technique which uses dominant color and texture features of an image. Though the affine moment invariant technique has been well studied by many researchers, an attempt is made to enhance the existing results by extracting various supportive features like moment invariants, vector gradient, chain code (Freeman chain code), image thinning, structuring the image in box format, noise removal, etc. A performance of approximately 90% correct recognition is achieved.

01 Jan 2013
TL;DR: Geometric and Zernike moment features of Devanagari basic and compound Character are used to recognize the handwritten character in this research paper.
Abstract: The recognition of handwritten Devanagari characters plays a vital role in this research area. A number of approaches have been used in preceding research, and work is still ongoing. In this research paper, geometric and Zernike moment features of Devanagari basic and compound characters are used to recognize handwritten characters. The compound character is a special feature of Devanagari script; it joins two or more characters in various ways to form a new character. The complexity and frequency of compound characters in writing is greater than in other languages. The proposed system is trained and tested on a database of 27000 handwritten Devanagari basic and compound characters collected from different people. Each image is normalized to 30×30 pixels. For recognition of Devanagari basic and compound characters we have used MLP and KNN. The recognition rate is 98.78% and 95.56% which is comprehensive to the method.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: An approach for selecting best discriminative primitives for writer recognition is presented and a hybrid system by combining both writer recognition and handwriting recognition for improved accuracy is proposed.
Abstract: Writer recognition based on the peculiarities of handwriting is an important aspect of any forensic analysis. We present an approach for selecting the best discriminative primitives for writer recognition. After selecting the primitives we also propose a hybrid system that combines writer recognition and handwriting recognition for improved accuracy. We have also validated the performance of the selected primitives on a publicly available dataset. We have performed this study on the Devanagari script. Experimental results verified the effectiveness of the proposed framework.

01 Jan 2013
TL;DR: This research work aims at the development of Hindi stemmer based on Devanagari script for stripping both prefixes as well as suffixes from derived word to provide better stemming than previous stemmers.
Abstract: In today's world of the internet, web search engines are developing techniques to make surfing faster. Stemming is a technique used by web search engines to remove prefixes and suffixes from a derived word. Stemming provides a way to store similar documents together. This research work aims at the development of a Hindi stemmer based on Devanagari script that strips both prefixes and suffixes from a derived word, to provide better stemming than previous stemmers. The proposed stemmer uses a hybrid approach which combines a lookup algorithm, a suffix stripping algorithm and a prefix removal algorithm.
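The hybrid approach of lookup, suffix stripping and prefix removal can be sketched as a three-stage function. Everything concrete here is an assumption for illustration: the tiny lookup table, the suffix and prefix lists, the minimum stem length and the order of the stages are hypothetical and far smaller than what a usable Hindi stemmer would need.

```python
# Hypothetical resources; a real stemmer would use much larger, curated lists.
LOOKUP = {"गया": "जा"}          # irregular form mapped directly to its stem
SUFFIXES = ("ियाँ", "ों", "ें")   # a few plural endings
PREFIXES = ("बे", "अ")          # a few negating prefixes
MIN_STEM_LEN = 2                # assumed guard against over-stripping short words


def stem(word: str) -> str:
    """Apply lookup, then longest-match suffix stripping, then prefix removal."""
    if word in LOOKUP:
        return LOOKUP[word]
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[: -len(suffix)]
            break
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    return word


for w in ("गया", "किताबों", "बेघर", "अब"):
    print(w, "->", stem(w))
```

The length guard keeps a short word such as "अब" intact even though it starts with a listed prefix; without some such check, prefix removal tends to damage words that merely begin with the same letters.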

Journal ArticleDOI
30 Apr 2013
TL;DR: Handwritten character recognition is an important area in the image processing and pattern recognition fields, where the aim is to automate and reduce human effort in form filling, job applications, and bank and postal automation.
Abstract: Handwritten character recognition is an important area in the fields of image processing and pattern recognition. This field of research is applicable to various application areas where the aim is to automate and reduce human effort in form filling, job applications, bank and postal automation [1-3], etc. Handwritten character recognition in Indian scripts [4] is a challenging task, especially for Devanagari, for several reasons such as the complex structure of characters with their modifiers and the presence of compound characters. Compound characters are those where two or more characters are joined together to produce a special character. In these characters, one half of a character is connected to a full character. Thus there are large variations in the shape of characters due to writing style, pen quality (thick/thin) and strokes, which affect recognition accuracy to a substantial extent. Devanagari script is written from left to right. The concept of upper/lower case is absent in Devanagari script. In Devanagari script a vowel following a consonant takes a modified shape. Depending on the vowel, its modified shape is placed at the left, right (or both) or bottom of the consonant. These modified shapes are called modified characters. A consonant or vowel following a consonant sometimes takes a compound orthographic shape, which we call a compound character.

Proceedings ArticleDOI
01 Nov 2013
TL;DR: This paper will present the Grapheme to Phoneme (G2P) converter module used in this system which converts Sanskrit text in Devanagari UTF-8 into its phonetic representation with assigned syllable boundaries and stress values of the syllable.
Abstract: The authors of this paper have developed a speech synthesis system for Sanskrit. This paper presents the Grapheme to Phoneme (G2P) converter module used in this system, which converts Sanskrit text in Devanagari UTF-8 into its phonetic representation with assigned syllable boundaries and syllable stress values. The stress rules applied here are very basic and are different from Vedic supra-segmental svaras. Though a Sanskrit G2P converter that converts Unicode Sanskrit text into a plain phone sequence representation already existed, it lacked syllabification and stress marking. The syllable is a very important unit in speech and in speech technology. The Festival framework, which is used for this Sanskrit speech synthesis system and many other speech synthesis systems, by default requires the phonetic representation with syllable boundaries and stress values. Also, many features used for F0 and duration modeling depend on the syllable. Thus the phonetic representation with and without syllabification makes a significant difference.

Journal Article
TL;DR: The main objective of the paper is to test the possibility of using moment invariants (MIs) for recognition of printed characters independent of size, slant and other variations.
Abstract: In this paper we deal with the recognition of printed Devanagari characters using a neural network approach. The paper measures the effectiveness of the classifier in terms of recognition precision. It is also a benchmark for testing and verifying new pattern recognition theories and algorithms. Ten samples of each Devanagari vowel and consonant from 10 different printed Kruti Dev fonts were collected to prepare the database. After segmentation, each individual image is normalized to 100×100 pixels. Seven moment invariants (MIs) are evaluated for each character along with GLCM properties like contrast, homogeneity, entropy and correlation, as well as color domain and histogram. A neural network function has been adopted for classification. The main objective of the paper is to test the possibility of using MIs for recognition of printed characters independent of size, slant and other variations.

01 Jan 2013
TL;DR: The frequency analysis of spoken Devnagari script and Numerals from the original speech signals can be potentially utilized in implementation of a voice-driven help setup at call centres of commercial organizations operating in India and other foreign region.
Abstract: This paper presents a frequency analysis of spoken Devnagari script and numerals from the original speech signals. Devnagari vowels and numerals play a vital role in the pronunciation of any word or in counting. Each vowel and number is classified as starting, middle or end according to the duration of its occurrences in the word. The Devnagari script, which has 12 vowels and 34 consonants, is used in some Indian languages such as Hindi, and 10 numerals (0-9) are used in mathematics. Sound samples from multiple speakers were used to extract different features. Initial processing of the data, i.e., normalizing and time-slicing, was done using a combination of Simulink and MATLAB. Afterwards, the same tools were used to calculate Fourier descriptions and correlations. The correlation allowed comparison of the same words or numerals spoken by the same and by different speakers. The frequency was thus calculated in a statistical manner, generating a table of amplitudes and frequencies with their mean and standard deviation. Such a system can potentially be used to implement a voice-driven help setup at call centres of commercial organizations operating in India and other regions. The implementation, experiments and discussion of results are also presented.

Journal Article
TL;DR: An artificial neural network classifier based on a self-organizing map (SOM), with statistical and structural feature extraction, is proposed for Hindi characters.
Abstract: Devanagari is one of the basic scripts widely used for many Indian languages like Hindi, Marathi, Rajasthani, etc. Hindi, written in Devanagari script, is the third most common language in the world. In this work we propose an artificial neural network based classifier and feature extraction based on statistical and structural methods. Isolated Hindi characters are captured as an input image through a scanner. The input image is preprocessed and segmented in terms of various structural and statistical features like end points, middle bar, loop, end bar and aspect ratio. Features are extracted and the feature vector is applied to a self-organizing map (SOM), which is one of the classifiers of artificial neural networks. The SOM is trained on 500 different characters collected from 500 persons. The characters are classified into three different classes. The proposed classifier attains 91% accuracy.

01 Jan 2013
TL;DR: A technical review of state-of-the-art techniques in Devanagari handwriting recognition is presented.
Abstract: A technical review of state-of-the-art techniques in Devanagari handwriting recognition is presented. Handwriting recognition is mature for Roman, Japanese, Chinese, and Arabic scripts, but for Indian languages there is still a lot of scope. For Indian languages, most of the work is limited to isolated characters and numerals. Compound character and word recognition have not been explored to the same extent.

01 Jan 2013
TL;DR: A simple technique for script identification from a set of printed English, Telugu, and Devanagari document images is presented; it uses stroke features and pixel distribution along a sequence of words.
Abstract: India is a multilingual, multi-script country, with a total of 22 official languages and 12 scripts. People have adopted the use of two or more languages, resulting in bilingual and trilingual documents. Many official documents are available in a combination of the local language, English, and sometimes Hindi. In this context, script identification relies on the fact that each script has a unique spatial distribution and visual attributes that make it possible to distinguish it from other scripts. Many script identification methods have been proposed earlier, such as the distribution of an optical density index, identification of frequently occurring connected-component templates, filtered pixel projection profiles, and vertical and horizontal projection profiles of document images. In this work, a simple technique for script identification from a set of printed English, Telugu, and Devanagari document images is presented. The proposed system uses stroke features and pixel distribution along a sequence of words.

Book ChapterDOI
21 Jul 2013
TL;DR: The effect of transliteration on human readability is explored by studying changes in eye-gaze patterns, which are recorded with an eye-tracker during experimentation and analyzed over the areas of interest.
Abstract: We present our efforts to study the effect of transliteration on human readability. We explore this effect by studying changes in eye-gaze patterns, which are recorded with an eye-tracker during experimentation. We have chosen the Hindi and English languages, written in the Devanagari and Latin scripts respectively. The participants in the experiments are shown transliterated words and asked to speak each word. During this, their eye movements are recorded. The eye-tracking data is later analyzed for eye-fixation trends. Quantitative analysis of fixation count and duration, as well as visit count, is performed over the areas of interest.

01 Jan 2013
TL;DR: A methodology for extracting text from printed image documents and identifying Devanagari script (Hindi language) in the extracted text is presented, and it is compared with edge-based and connected-component with projection profile approaches.
Abstract: Text that appears in an image contains useful and important information. Optical Character Recognition technology is restricted to finding text printed against clean backgrounds, and cannot handle text printed against shaded or textured backgrounds or embedded in images. Extracting the text from an image is therefore necessary, and it can help blind and visually impaired people when a voice synthesizer is attached to the system. In this paper, we present a methodology for extracting text from printed image documents and then identifying Devanagari script (Hindi language) in the extracted text. First, we use a morphological approach to extract the text from image documents. The resulting text image is passed to Optical Character Recognition for identification. A projection profile is used for segmentation, followed by a visual discriminating approach for feature extraction. Finally, heuristic search is used for classification. The result of the proposed text extraction method is compared with the edge-based approach and the connected-component with projection profile approach. A comparison using precision and recall rates shows that the proposed algorithm works well.
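The projection-profile segmentation step mentioned in the abstract above can be sketched as follows. This is a generic illustration of the technique, assuming a binary image stored as a list of rows with 1 for ink and 0 for background; the function names, the threshold parameter, and the toy page are hypothetical and do not come from the paper.

```python
def horizontal_profile(image):
    """Number of ink pixels in each row of a binary image."""
    return [sum(row) for row in image]

def segment_lines(image, threshold=0):
    """Return (start, end) row ranges, end exclusive, whose ink count exceeds threshold."""
    lines = []
    start = None
    for y, count in enumerate(horizontal_profile(image)):
        if count > threshold and start is None:
            start = y                  # a text line begins
        elif count <= threshold and start is not None:
            lines.append((start, y))   # a blank row ends the current line
            start = None
    if start is not None:
        lines.append((start, len(image)))
    return lines

# Toy page: two text lines separated by two blank rows.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
text_lines = segment_lines(page)
```

The same idea applied to column sums within a detected line gives a vertical profile, which is a common way to split a line into words or characters.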