
Showing papers on "Intelligent word recognition published in 2010"


Book ChapterDOI
05 Sep 2010
TL;DR: It is argued that the appearance of words in the wild spans this range of difficulties and a new word recognition approach based on state-of-the-art methods from generic object recognition is proposed, in which object categories are considered to be the words themselves.
Abstract: We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs - text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines - one open source and one proprietary - with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.

503 citations


01 Jan 2010
TL;DR: Experimental results show that the approach used in this paper for English character recognition gives high recognition accuracy and minimum training time.
Abstract: Neural networks have recently been used in various kinds of pattern recognition. Handwriting differs from person to person, which makes handwritten characters very difficult to recognize. Handwritten character recognition is an area of pattern recognition that has been the subject of research over the last few decades. Neural networks play an important role in handwritten character recognition. Many reports of character recognition in English have been published, but achieving high recognition accuracy with minimum training time for handwritten English characters using a neural network is still an open problem. Therefore, it is of great importance to develop an automatic handwritten character recognition system for the English language [1]. In this paper, efforts have been made to develop an automatic handwritten character recognition system for the English language with high recognition accuracy and minimum training and classification time. Experimental results show that the approach used in this paper for English character recognition gives high recognition accuracy and minimum training time.

129 citations


Journal ArticleDOI
TL;DR: The proposed methodology relies on a new feature extraction technique based on recursive subdivisions of the character image so that the resulting sub-images at each iteration have balanced (approximately equal) numbers of foreground pixels, as far as this is possible.

109 citations


Journal ArticleDOI
TL;DR: A system for recognizing offline handwritten Tamil characters using support vector machine (SVM) has achieved a very good recognition accuracy of 82.04% on the handwritten Tamil character database.
Abstract: This paper describes a system for recognizing offline handwritten Tamil characters using support vector machine (SVM). Data samples are collected from different writers on A4 sized documents. They are scanned using a flat bed scanner at a resolution of 300 dpi and stored as gray-scale images. Various preprocessing operations are performed on the digitized image to enhance the quality of the image. Pixel densities are calculated for 64 different zones of the image and these values are used as the features of a character. These features are used to train the SVM. The SVM is tested for the first time to recognize handwritten Tamil characters. The system has achieved a very good recognition accuracy of 82.04% on the handwritten Tamil character database.

105 citations


Posted Content
TL;DR: In this article, a novel approach for recognition of handwritten compound Bangla characters along with the Basic characters of Bangla alphabet is presented, which makes an attempt to identify compound character classes from most frequently to less frequently occurred ones, i.e., in order of importance.
Abstract: A novel approach for recognition of handwritten compound Bangla characters, along with the Basic characters of Bangla alphabet, is presented here. Compared to English-like Roman script, one of the major stumbling blocks in Optical Character Recognition (OCR) of handwritten Bangla script is the large number of complex shaped character classes of the Bangla alphabet. In addition to 50 basic character classes, there are nearly 160 complex shaped compound character classes in the Bangla alphabet. Dealing with such a large variety of handwritten characters with a suitably designed feature set is a challenging problem. Uncertainty and imprecision are inherent in handwritten script. Moreover, such a large variety of complex shaped characters, some of which closely resemble one another, makes the problem of OCR of handwritten Bangla characters more difficult. Considering the complexity of the problem, the present approach attempts to identify compound character classes from the most frequently to the less frequently occurring ones, i.e., in order of importance. The aim is to develop a framework for incrementally increasing the number of learned classes of compound characters, from more frequently occurring ones to less frequently occurring ones, along with Basic characters. On experimentation, the technique is observed to produce an average recognition rate of 79.25% after three-fold cross validation of data, with future scope for improvement and extension.

84 citations


Proceedings ArticleDOI
02 Apr 2010
TL;DR: A wearable input device which enables the user to input text into a computer via character gestures, like using an imaginary blackboard, and a data glove, equipped with three gyroscopes and three accelerometers to measure hand motion is presented.
Abstract: In this work we present a wearable input device which enables the user to input text into a computer. The text is written into the air via character gestures, like using an imaginary blackboard. To allow hands-free operation, we designed and implemented a data glove, equipped with three gyroscopes and three accelerometers to measure hand motion. Data is sent wirelessly to the computer via Bluetooth. We use HMMs for character recognition and concatenated character models for word recognition. As features we apply normalized raw sensor signals. Experiments on single character and word recognition are performed to evaluate the end-to-end system. On a character database with 10 writers, we achieve an average writer-dependent character recognition rate of 94.8% and a writer-independent character recognition rate of 81.9%. Based on a small vocabulary of 652 words, we achieve a single-writer word recognition rate of 97.5%, a performance we deem is advisable for many applications. The final system is integrated into an online word recognition demonstration system to showcase its applicability.

63 citations


Journal ArticleDOI
TL;DR: The main purpose of this paper is to provide the new segmentation technique based on structure approach for Handwritten Hindi text, and the overall results are very promising.
Abstract: The main purpose of this paper is to provide the new segmentation technique based on structure approach for Handwritten Hindi text. Segmentation is one of the major stages of character recognition. The handwritten text is separated into lines, lines into words and words into characters. The errors in segmentation propagate to recognition. The performance is evaluated on handwritten data of 1380 words of 200 lines written by 15 different writers. The overall results of segmentation are very promising.

59 citations


Proceedings ArticleDOI
06 Dec 2010
TL;DR: This article reported the results of online and offline handwritten Chinese character recognition using the new generation of databases, targeting 3,755 Chinese characters of the GB2312-80 first level set.
Abstract: Chinese handwriting recognition remains a challenge. Research works have reported very high accuracies on neatly handwritten characters yet the performance on unconstrained handwriting remains quite low. To promote the recognition technology, new databases of unconstrained handwriting have been constructed for academic research and benchmarking. This paper reports the contest results of online and offline handwritten Chinese character recognition using the new generation of databases, targeting 3,755 Chinese characters of the GB2312-80 first level set. Nine systems from four groups were submitted for evaluation. The best results are 92.39% accuracy for online character recognition and 89.99% accuracy for offline character recognition. Detailed analysis of results on data of different writers reveals the diversity of writing quality. The future contests will consider continuous script recognition as well as isolated character recognition.

53 citations


Proceedings ArticleDOI
19 Mar 2010
TL;DR: This paper proposes a recognition model for English handwritten (lowercase, uppercase and letter) character recognition that uses Freeman chain code (FCC) as the representation technique of an image character.
Abstract: This paper proposes a recognition model for English handwritten (lowercase, uppercase and letter) character recognition that uses Freeman chain code (FCC) as the representation technique of an image character. Chain code representation gives the boundary of a character image, in which each code represents the direction of the next pixel. An FCC method that uses an 8-neighbourhood with directions labelled 1 to 8 is used. A randomized algorithm is used to generate the FCC. After that, the feature vector is built. The input to the classifier is the chain code converted to 64 features. Support vector machine (SVM) is chosen for the classification step. NIST databases are used as the data in the experiment. Our test results show that by applying the proposed model, we reached a relatively high accuracy for the problem of English handwritten recognition.
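As a rough illustration of the chain code idea in this abstract, the sketch below encodes an already ordered boundary as direction labels 1 to 8. The mapping of labels to directions (1 = east, then counter-clockwise) and the helper names are assumptions for illustration. The direction histogram is only a simple stand-in for a fixed-length feature; it is not the paper's 64-feature conversion, and the randomized generation step is not reproduced.

```python
# Direction labels 1..8 for the 8-neighbourhood. The label-to-offset mapping
# is an assumption for illustration (1 = east, then counter-clockwise).
DIRECTIONS = {
    (0, 1): 1,    # east (row offset, column offset)
    (-1, 1): 2,   # north-east
    (-1, 0): 3,   # north
    (-1, -1): 4,  # north-west
    (0, -1): 5,   # west
    (1, -1): 6,   # south-west
    (1, 0): 7,    # south
    (1, 1): 8,    # south-east
}


def freeman_chain_code(boundary):
    """Return direction labels for an ordered list of (row, col) boundary pixels."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes):
    """Relative frequency of each of the 8 directions (a fixed-length feature)."""
    total = len(codes) or 1
    return [codes.count(d) / total for d in range(1, 9)]


# A 2x2 square traced from the top-left pixel and closed on itself.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = freeman_chain_code(square)
histogram = direction_histogram(codes)
```

For the closed square, the walk goes east, south, west, then north, so each of those four directions receives a quarter of the histogram mass.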

51 citations


Journal ArticleDOI
TL;DR: A "critical region analysis" technique which highlights the critical regions that distinguish one character from another similar character is proposed and a record high recognition rate of 99.53% on the ETL-9B database is obtained.

49 citations




Journal ArticleDOI
TL;DR: A system that locates words in document image archives bypassing character recognition and using word images as queries makes use of document image processing techniques, in order to extract powerful features for the description of the word images.

Posted Content
TL;DR: A set of 88 features, comprising 72 shadow and 16 octant features, is designed to represent samples of handwritten Arabic numerals; the approach can be extended to OCR of handwritten characters of the Arabic alphabet.
Abstract: Handwritten numeral recognition is in general a benchmark problem of Pattern Recognition and Artificial Intelligence. Compared to the problem of printed numeral recognition, the problem of handwritten numeral recognition is compounded by variations in the shapes and sizes of handwritten characters. Considering all these, the problem of handwritten numeral recognition is addressed in the present work with respect to handwritten Arabic numerals. Arabic is spoken throughout the Arab World and is the fifth most popular language in the world, slightly ahead of Portuguese and Bengali. For the present work, a feature set of 88 features is designed to represent samples of handwritten Arabic numerals. It includes 72 shadow and 16 octant features. A Multi Layer Perceptron (MLP) based classifier is used here for recognition of handwritten Arabic digits represented with the said feature set. On experimentation with a database of 3000 samples, the technique yields an average recognition rate of 94.93%, evaluated after three-fold cross validation of results. It is useful for applications related to OCR of handwritten Arabic digits and can also be extended to include OCR of handwritten characters of the Arabic alphabet.

Proceedings ArticleDOI
01 Mar 2010
TL;DR: An approach to segment the scanned document image using the concept of variable sized window, that is, the window whose size can be adjusted according to needs, was implemented and results were analyzed.
Abstract: The scanned text image is a non editable image though it has the text but one can not edit it or make any change, if required, to that scanned document. This provides a basis for the optical character recognition (OCR) theory. OCR is the process of recognizing a segmented part of the scanned image as a character. The overall OCR process consists of three major sub processes like pre processing, segmentation and then recognition. Out of these three, the segmentation process is the back bone of the overall OCR process. We can say that the segmentation process is the most significant process because if the segmentation is incorrect then we can not have the correct results; it is just like garbage in and garbage out. But it is not an easy job, because segmentation is one of the complex processes. It is more difficult if the document is handwritten because in that case only few points are there which can be used to make segmentation. In this paper, we formulate an approach to segment the scanned document image. As per this approach, initially this considers the whole image as one large window. Then this large window is broken into less large windows giving lines, once the lines are identified then each window consisting of a line is used to find a word present in that line and finally to characters. For that purpose we used the concept of variable sized window, that is, the window whose size can be adjusted according to needs. This concept was implemented and results were analyzed. After the analysis the same concept was modified and finally tried on different documents and we got good reasonable results.

Journal ArticleDOI
TL;DR: The proposed character recognition system using multilayer Feed forward neural network will aid applications for postal/parcel address recognition and conversion of any hand written document into structural text form.
Abstract: A handwritten character recognition system using multilayer Feed forward neural network is proposed in this paper. The character data set suitable for recognizing postal addresses contains 38 elements which include 26 alphabets, 10 numerals and 2 symbols. Fifteen different handwritten data sets were used for training the neural network for classification and recognition of the characters. Three different orientations, namely, horizontal, vertical and diagonal directions are used for extracting 54 features from each character. The trained neural recognition system is tested for various inputs and found to perform well. The diagonal orientation for feature extraction is identified to be the most suitable method as it yields higher recognition accuracy. The proposed system will aid applications for postal/parcel address recognition and conversion of any hand written document into structural text form.

Journal ArticleDOI
TL;DR: Experiments indicate that the proposed recognition system performs well with the combined features and is robust to the writing variations that exist between persons and for a single person at different instances, thus being promising for user independent character recognition.
Abstract: In this paper a backpropagation neural network based handwritten character (Mapum Mayek) recognition system for the Manipuri script is investigated. This paper presents the various steps involved in the recognition process. It begins with thresholding of the gray level image into a binarised image; then, from the binarised image, the character pattern is segmented using connected component analysis, and from the resized character matrix its probabilistic features and fuzzy features are extracted. Using these features the network is trained and recognition tests are performed. Experiments indicate that the proposed recognition system performs well with the combined features and is robust to the writing variations that exist between persons and for a single person at different instances, thus being promising for user independent character recognition.

Proceedings ArticleDOI
16 Nov 2010
TL;DR: This paper provides a detailed analysis in order to understand the results and find the merits of the local approach to part-based character recognition.
Abstract: In the part-based recognition method proposed in this paper, a handwritten character image is represented by just a set of local parts. Then, each local part of the input pattern is recognized by a nearest-neighbor classifier. Finally, the category of the input pattern is determined by aggregating the local recognition results. This approach is opposed to conventional character recognition approaches which try to benefit from the global structure information as much as possible. Despite a pessimistic expectation, we have reached recognition rates much higher than 90% for a digit recognition task. In this paper we provide a detailed analysis in order to understand the results and find the merits of the local approach.
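The three steps in this abstract (local parts, nearest-neighbor recognition of each part, aggregation) can be sketched as follows. This is a minimal sketch under stated assumptions: parts are already given as small feature vectors, the distance is squared Euclidean, and aggregation is a plain majority vote. The abstract does not specify any of these choices, and the function names and toy data are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_by_parts(input_parts, reference_parts):
    """Label each local part by its nearest reference part, then majority-vote.

    reference_parts is a list of (feature_vector, class_label) pairs.
    """
    votes = Counter()
    for part in input_parts:
        _, label = min(reference_parts,
                       key=lambda ref: squared_distance(part, ref[0]))
        votes[label] += 1
    return votes.most_common(1)[0][0]


# Toy reference parts for two digit classes (assumed values).
reference = [
    ((0.0, 0.0), "0"), ((0.1, 0.2), "0"),
    ((1.0, 1.0), "1"), ((0.9, 1.1), "1"),
]
# Three local parts extracted from one unknown character (assumed values).
query_parts = [(0.05, 0.1), (0.2, 0.1), (0.95, 1.0)]
predicted = classify_by_parts(query_parts, reference)
```

Two of the three query parts fall nearest to class "0" references, so the vote selects "0" even though one part on its own would suggest "1".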

Journal ArticleDOI
TL;DR: This character recognition finds applications in document analysis, where the handwritten document can be converted to an editable printed document, and structure analysis suggested that the proposed system of RCS with a back propagation network gives a higher recognition rate.
Abstract: Handwritten character recognition is a difficult problem due to the great variations of writing styles and the different sizes and orientation angles of the characters. The scanned image is segmented into paragraphs using a spatial space detection technique, paragraphs into lines using a vertical histogram, lines into words using a horizontal histogram, and words into character image glyphs using a horizontal histogram. The extracted features considered for recognition are given to a Support Vector Machine, Self Organizing Map, RCS, Fuzzy Neural Network and Radial Basis Network, where the characters are classified using a supervised learning algorithm. These classes are mapped onto Unicode for recognition. Then the text is reconstructed using Unicode fonts. This character recognition finds applications in document analysis, where the handwritten document can be converted to an editable printed document. Structure analysis suggested that the proposed system of RCS with a back propagation network gives a higher recognition rate.

Journal ArticleDOI
TL;DR: This paper extracts gradient features of handwritten and ISM printed characters of the Devanagari script using the Sobel and Robert operators, computing the gradient in 8, 12, 16 and 32 directions to obtain the corresponding feature vectors.
Abstract: In this paper we extract features of handwritten and ISM printed characters of the Devanagari script. We extract the gradient feature of the Devanagari script using two operators, the Sobel and Robert operators. We compute the gradient in 8, 12, 16 and 32 directions and obtain a different feature vector for each. We use each directional vector separately for classification.

Proceedings ArticleDOI
01 Dec 2010
TL;DR: An attempt to extract words from handwritten text lines in Gujarati script is presented, in which a combination of proven methods such as projection profile and morphological operations is used to enhance the accuracy of word extraction.
Abstract: An attempt to extract words from handwritten text lines in Gujarati script is presented. The very cursive nature of most Indian scripts makes word extraction a critical step for Optical Character Recognition (OCR). This cursive nature also causes difficulty during character extraction and modifier extraction. Word extraction is considered one of the important stages of OCR and directly affects its accuracy. A combination of proven methods such as projection profile and morphological operations is used to enhance the accuracy of word extraction.

Proceedings ArticleDOI
05 Aug 2010
TL;DR: An efficient Online Handwritten character Recognition System for Malayalam Characters (OHR-M) using K-NN algorithm is presented and gives an excellent accuracy of 98.125% with recognition time of 15-30 milliseconds.
Abstract: On-line handwriting recognition has been a frontier area of research for the last few decades under the purview of pattern recognition. Word processing turns to be a vexing experience even if it is with the assistance of an alphanumeric keyboard in Indian languages. A natural solution for this problem is offered through online character recognition. There is abundant literature on the handwriting recognition of western, Chinese and Japanese scripts, but there are very few related to the recognition of Indic script such as Malayalam. This paper presents an efficient Online Handwritten character Recognition System for Malayalam Characters (OHR-M) using K-NN algorithm. It would help in recognizing Malayalam text entered using pen-like devices. A novel feature extraction method, a combination of time domain features and dynamic representation of writing direction along with its curvature is used for recognizing Malayalam characters. This writer independent system gives an excellent accuracy of 98.125% with recognition time of 15-30 milliseconds.

01 Nov 2010
TL;DR: In this article, the statistical and semantic features of MICR are used as input nodes of a back-propagation neural network to achieve high accuracy and very fast recognition compared with other recognition systems.
Abstract: This paper contributes an effective recognition approach for Myanmar handwritten characters. The hybrid approach uses ICR and OCR recognition through MICR (Myanmar Intelligent Character Recognition) and a back-propagation neural network. MICR is one kind of ICR. It is composed of statistical and semantic information, and the final decision is made by a voting system. In the hybrid approach, the statistical and semantic features of MICR are used as input nodes of the back-propagation neural network, so only a few input nodes are needed. The back-propagation algorithm is used to train the feed-forward neural network and to adjust the weights to obtain the desired output. The purpose of the hybrid approach is to achieve high accuracy rates and a very fast recognition rate compared with other recognition systems. The experiments were carried out on 1000 word samples from different writers. Using the hybrid approach, an overall recognition accuracy of 95% was obtained.

01 Jan 2010
TL;DR: Structural analysis suggested that the proposed system of RCS with a back propagation network gives a higher recognition rate, and the training set produced a much higher recognition rate than the test set.
Abstract: Handwritten Tamil character recognition refers to the process of converting a handwritten Tamil character into a Unicode Tamil character. The scanned image is segmented into paragraphs using a spatial space detection technique, paragraphs into lines using a vertical histogram, lines into words using a horizontal histogram, and words into character image glyphs using a horizontal histogram. The extracted features considered for recognition are given to a Support Vector Machine, Self Organizing Map, RCS, Fuzzy Neural Network and Radial Basis Network, where the characters are classified using a supervised learning algorithm. These classes are mapped onto Unicode for recognition. Then the text is reconstructed using Unicode fonts. This character recognition finds applications in document analysis, where the handwritten document can be converted to an editable printed document. This approach can be extended to recognition and reproduction of handwritten documents in South Indian languages. On the training set, a recognition rate of 100% was achieved; on the test set, the recognition speed for each character is 0.1 sec and the accuracy is 97%. Understandably, the training set produced a much higher recognition rate than the test set. Structure analysis suggested that the proposed system of RCS with a back propagation network gives a higher recognition rate. Handwritten character recognition is a difficult problem due to the great variations of writing styles and the different sizes and orientation angles of the characters. Among the different branches of handwritten character recognition, it is easier to recognize English alphabets and numerals than Tamil characters. Many researchers have also applied the excellent generalization capabilities offered by ANNs to the recognition of characters. Many studies have used Fourier descriptors and Back Propagation Networks for classification tasks. Fourier descriptors were used to recognize handwritten numerals. Neural network approaches were used to classify tools. There have been only a few attempts in the past to address the recognition of printed or handwritten Tamil characters. However, less attention has been given to Indian language recognition. Some efforts have been reported in the literature.

Journal ArticleDOI
TL;DR: Zoning based feature extraction system which calculates the densities of object pixels in each zone provides better recognition accuracy than manual system and is presented as a new approach to off-line handwritten numeral recognition.
Abstract: This paper presents a new approach to off-line handwritten numeral recognition. From the concept of perturbation due to writing habits and instruments, we propose a recognition method which is able to account for a variety of distortions due to eccentric handwriting. The recognition of handwritten numerals is a challenging task in the field of image processing and pattern recognition. It can be considered as one of the benchmarks in evaluating feature extraction methods and the performance of classifiers. The performance of a character recognition system depends heavily on what kind of features are being used. The objective of this paper is to provide efficient and reliable techniques for recognition of handwritten numerals. In this paper we propose a zoning based feature extraction system which calculates the densities of object pixels in each zone. First, the whole image is divided into 4×4 zones. To gain more accuracy, the image is then divided into 6×6 zones, and the division is carried out up to 8×8 zones. Hence 116 features (16 + 36 + 64) are extracted in all. A nearest neighbour classifier is used for subsequent classification and recognition. For testing, an Award List has been used. Forms were filled out by 200 different users in order to collect different samples of handwriting, and individual digits were extracted from these forms. The Award List has a 4-digit Roll no, a 4-digit Code no and 3-digit Marks. The outcome of the research will be an automated system for recognition of awards. This automated system will recognize only digits. The procedure can also be done manually, but that is a tedious task and prone to error. The system's complexity lies in the different handwriting styles, which vary from person to person. Thus the automated system provides better recognition accuracy than the manual system.
(Figure omitted: the award list used in the automated system.) The rest of the paper is organized into five sections. Section 2 briefly reviews the literature, discussing the feature extraction technique along with the classifier. Section 3 describes the proposed system. Section 4 discusses recognition results and comparisons among different zoning techniques, and the conclusion is given in Section 5.
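The feature count in this abstract follows from summing the zones of the three grids: 4×4 + 6×6 + 8×8 = 16 + 36 + 64 = 116. The sketch below computes one pixel density per zone for each grid and concatenates them. It assumes a binary image stored as a list of rows with 1 for object pixels; the function names and the 24×24 toy image are hypothetical, and how the paper handles image sizes that do not divide evenly is not stated.

```python
def zone_densities(image, grid):
    """Fraction of object (1) pixels in each cell of a grid x grid partition."""
    rows, cols = len(image), len(image[0])
    features = []
    for i in range(grid):
        r0, r1 = i * rows // grid, (i + 1) * rows // grid
        for j in range(grid):
            c0, c1 = j * cols // grid, (j + 1) * cols // grid
            area = (r1 - r0) * (c1 - c0)
            on = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            features.append(on / area if area else 0.0)
    return features


def multi_zone_features(image, grids=(4, 6, 8)):
    """Concatenate zone densities for several grid sizes (16 + 36 + 64 = 116)."""
    features = []
    for grid in grids:
        features.extend(zone_densities(image, grid))
    return features


# Toy 24x24 binary image: a vertical bar occupying columns 10..13.
image = [[1 if 10 <= c < 14 else 0 for c in range(24)] for _ in range(24)]
features = multi_zone_features(image)
feature_count = len(features)
```

A 24×24 image is used here only because 24 is divisible by 4, 6 and 8, which keeps every zone the same size in this toy case.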

Proceedings ArticleDOI
04 Nov 2010
TL;DR: A handwritten character recognition algorithm based on artificial immune principles is proposed; it borrows the merits of self-adaptive learning and immune memory from the biological immune system, and can also be applied to anomaly detection and pattern recognition.
Abstract: Handwritten character recognition is an important research and application area of pattern recognition theory, and plays an important role in automating character input. In order to improve the character recognition rate and decrease the recognition training time, a handwritten character recognition algorithm based on artificial immune principles is proposed, drawing on biological immunity. The antigen and memory cell in the artificial immune system are described. The equations of the clone selection principle and of the evolving memory cell are established. Finally, the process of character recognition is given. The experiment uses the well-known character set provided by F. Prat from UCI. The simulation results show that the method is faster and more accurate than traditional handwritten recognition based on neural networks. The algorithm borrows the merits of self-adaptive learning and immune memory from the biological immune system, and can also be applied to anomaly detection and pattern recognition.

Proceedings ArticleDOI
09 Apr 2010
TL;DR: Online recognition of multi stroke (two-, three-, and four-stroke) handwritten Urdu characters is presented in this paper, whereas, single-stroke character recognition was presented in a preceding work.
Abstract: Character recognition has enjoyed a lot of research in the recent past. Good recognition systems are available commercially for alphabetical languages based on Roman characters and for symbolic languages like Chinese. But languages based on the Arabic alphabet, like Arabic and Urdu, do not have such recognition systems. Recognition systems generally have a scanner or camera as the input device for off-line recognition, or a stylus/tablet as the input device for online recognition. These systems are used in conjunction with input peripheral devices like keyboards and mice. With the recent developments in electronic tablets, pen movements can be captured more accurately. This paper presents part of the work on online recognition of handwritten Urdu characters. Urdu is based on the Arabic alphabet, with a larger character set than Arabic (37 characters). Urdu, due to its large character set and limited number of strokes, is difficult to recognize. Many characters are similar, with little difference between them. Online recognition of multi-stroke (two-, three-, and four-stroke) handwritten Urdu characters is presented in this paper, whereas single-stroke character recognition was presented in a preceding work. After the necessary preprocessing, some novel features are extracted. Various types of classification methodologies are then tested in order to find the best combination of features and classifiers for two-, three-, and four-stroke handwritten Urdu character recognition.

Proceedings ArticleDOI
22 Nov 2010
TL;DR: A Bayesian-based probabilistic model is presented for unconstrained handwritten offline Chinese text line recognition that can incorporate isolated character recognition, character sample verification, and n-gram language model in a simple way, leading to more reliable recognition of a text line.
Abstract: A Bayesian-based probabilistic model is presented for unconstrained handwritten offline Chinese text line recognition. After pre-segmentation of a text line, plenty of invalid characters are produced which heavily interfere in the process of text line recognition. The proposed probabilistic model can incorporate isolated character recognition, character sample verification, and n-gram language model in a simple way, leading to more reliable recognition of a text line. When testing on HIT-MW database, experiments show that the proposed method can achieve character-level recognition accuracies of 63.19% without language model and 73.97% with bi-gram language model, respectively, outperforming the most recent results testing on the same dataset.

Proceedings ArticleDOI
23 Aug 2010
TL;DR: A general approach for script (and language) recognition from printed documents and for writer identification in handwritten documents based on a bag of visual word strategy where the visual words correspond to characters and the clustering is obtained by means of Self Organizing Maps (SOM).
Abstract: In this paper, we describe a general approach for script (and language) recognition from printed documents and for writer identification in handwritten documents. The method is based on a bag of visual word strategy where the visual words correspond to characters and the clustering is obtained by means of Self Organizing Maps (SOM). Unknown pages (words in the case of script recognition) are classified comparing their vectorial representations with those of one training set using a cosine similarity. The comparison is improved using a similarity score that is obtained taking into account the SOM organization of cluster centroids. Promising results are presented for both printed documents and handwritten musical scores.
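The comparison step in this abstract, matching the vector of an unknown page against training vectors with cosine similarity, can be sketched as below. It assumes each character has already been assigned a cluster index (the SOM clustering itself is not reproduced), and it uses plain cosine similarity without the SOM-aware score the abstract mentions. The function names, class labels and counts are hypothetical.

```python
import math
from collections import Counter


def bag_of_visual_words(cluster_ids, vocabulary_size):
    """Histogram of how often each visual word (cluster index) occurs."""
    counts = Counter(cluster_ids)
    return [counts.get(k, 0) for k in range(vocabulary_size)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(page_vector, training):
    """Return the label of the most similar training vector."""
    return max(training, key=lambda item: cosine_similarity(page_vector, item[1]))[0]


# Toy training histograms over a 4-word vocabulary (assumed values).
training = [("latin", [5, 1, 0, 0]), ("greek", [0, 1, 4, 3])]
# Cluster indices of the characters found on an unknown page (assumed values).
page = bag_of_visual_words([0, 0, 1, 0], vocabulary_size=4)
label = classify(page, training)
```

Cosine similarity compares the direction of the histograms rather than their magnitude, so a short page and a long page with the same mix of visual words score as similar.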

Proceedings ArticleDOI
27 Dec 2010
TL;DR: A technique for segmentation of handwritten Gurmukhi script documents into lines with the use of strip based projection profile technique, and to segment lines into words using white space and pitch method is described.
Abstract: Optical Character Recognition (OCR) is an essential part of Document Analysis System. Among few phases of an OCR system, segmentation is an important phase. After preprocessing phase, it is necessary to segment the text into lines, words and characters before the recognition of text. Segmentation is one of the most important and challenging tasks in a handwritten recognition system. Gurmukhi script can be segmented into paragraphs, lines, words and characters. This paper describes a technique for segmentation of handwritten Gurmukhi script documents into lines with the use of strip based projection profile technique, and to segment lines into words using white space and pitch method.
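A simplified view of projection-profile segmentation is sketched below. It is not the strip-based variant or the white space and pitch method described in the abstract: it assumes a binary image stored as a list of rows with 1 for ink, uses a single full-width row profile to find lines, and splits words wherever the column profile is empty for at least `min_gap` columns. The function names and the gap threshold are assumptions.

```python
def ink_runs(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of runs with ink in a profile.

    Runs separated by fewer than min_gap empty entries are merged.
    """
    runs, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def segment_lines(image):
    """Text lines are runs of rows whose horizontal projection is non-zero."""
    return ink_runs([sum(row) for row in image])


def segment_words(image, line, min_gap=2):
    """Words are runs of inked columns within a line, split at wide gaps."""
    top, bottom = line
    width = len(image[0])
    column_profile = [sum(image[r][c] for r in range(top, bottom))
                      for c in range(width)]
    return ink_runs(column_profile, min_gap)


# Toy page: two text lines; the first holds two words separated by a wide gap.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
lines = segment_lines(page)
words = segment_words(page, lines[0])
```

In the toy page, the single empty column inside the first word is narrower than `min_gap`, so it is treated as spacing between characters rather than between words.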