Proceedings Article

Handwritten text segmentation using average longest path algorithm

15 Jan 2013, pp. 505-512
TL;DR: This paper uses a graph model that describes the possible locations for segmenting neighboring characters, and develops an average longest path algorithm to identify the globally optimal segmentation, which finds the text segmentation with the maximum average likeliness for the resulting characters.
Abstract: Offline handwritten text recognition is a very challenging problem. Aside from the large variation among handwriting styles, neighboring characters within a word are usually connected, and we may need to segment a word into individual characters for accurate character recognition. Many existing methods achieve text segmentation by evaluating the local stroke geometry and imposing constraints on the size of each resulting character, such as the character width, height and aspect ratio. These constraints are well suited for printed texts, but may not hold for handwritten texts. Other methods apply a holistic approach, using a set of lexicons to guide and correct the segmentation and recognition. This approach may fail when the lexicon domain is insufficient. In this paper, we present a new global non-holistic method for handwritten text segmentation, which does not make any limiting assumptions on the character size or the number of characters in a word. Specifically, the proposed method finds the text segmentation with the maximum average likeliness for the resulting characters. For this purpose, we use a graph model that describes the possible locations for segmenting neighboring characters, and we then develop an average longest path algorithm to identify the globally optimal segmentation. We conduct experiments on real images of handwritten texts taken from the IAM handwriting database and compare the performance of the proposed method against an existing text segmentation algorithm that uses dynamic programming.
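
The objective described in the abstract (choose the cut positions that maximize the average likeliness of the resulting characters) can be illustrated with a small sketch. This is not the paper's average longest path algorithm; it is a plain dynamic program over path length, swapped in because it is short and easy to check. The graph layout (nodes are candidate cut positions, edge weights are character likeliness scores), the function name `max_average_path`, and the example weights are all assumptions made for illustration.

```python
def max_average_path(num_nodes, edges):
    """Find the path from node 0 to node num_nodes - 1 with the highest mean edge weight.

    edges is a list of (u, v, weight) with u < v, so the graph is acyclic.
    Returns (average, path), or (None, None) if the last node is unreachable.
    """
    # best[k][v] = (total weight, predecessor) of the heaviest path 0 -> v with exactly k edges
    best = [[None] * num_nodes for _ in range(num_nodes)]
    best[0][0] = (0.0, None)
    for k in range(1, num_nodes):
        for u, v, w in edges:
            prev = best[k - 1][u]
            if prev is None:
                continue
            total = prev[0] + w
            if best[k][v] is None or total > best[k][v][0]:
                best[k][v] = (total, u)

    end = num_nodes - 1
    # Compare averages over every possible number of characters k
    candidates = [(best[k][end][0] / k, k) for k in range(1, num_nodes) if best[k][end]]
    if not candidates:
        return None, None
    average, k = max(candidates)

    # Walk predecessors back from the end node to recover the cut positions
    path, v = [], end
    while k > 0:
        path.append(v)
        v = best[k][v][1]
        k -= 1
    path.append(v)
    return average, path[::-1]


# Hypothetical word with cut candidates 0..3; the weights are made-up likeliness scores
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7), (0, 2, 0.4), (1, 3, 0.95), (0, 3, 0.5)]
average, path = max_average_path(4, edges)
print(path, round(average, 3))
```

In this toy graph the two-character segmentation through cuts 0, 1 and 3 wins over the three-character one, because the average (not the sum) of the edge weights is what is compared, so no assumption on the number of characters is needed.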


Citations
Journal Article
TL;DR: A comprehensive survey of the handwriting databases developed during the last two decades is presented in this article, where the ground truth information of the databases along with the supported tasks is also discussed.
Abstract: Handwriting has remained one of the most frequently occurring patterns that we come across in everyday life. Handwriting offers a number of interesting pattern classification problems including handwriting recognition, writer identification, signature verification, writer demographics classification and script recognition, etc. Research in these and similar related problems requires the availability of handwritten samples for validation of the developed techniques and algorithms. Like any other scientific domain, the handwriting recognition community has developed a large number of standard databases allowing development, evaluation and comparison of different techniques developed for a variety of recognition tasks. This paper is intended to provide a comprehensive survey of the handwriting databases developed during the last two decades. In addition to the statistics of the discussed databases, we also present a comparison of these databases on a number of dimensions. The ground truth information of the databases along with the supported tasks is also discussed. It is expected that this paper would not only allow researchers in handwriting recognition to objectively compare different databases but will also provide them the opportunity to select the most appropriate database(s) for evaluation of their developed systems.

34 citations

01 Jan 2014
TL;DR: The proposed text-line extraction algorithm for cursive handwriting is based on connected components (CCs); unlike conventional methods, it analyses strokes and partitions under-segmented CCs into normalized ones, which lets it handle a range of different languages and writing styles.
Abstract: Text-line extraction in handwritten documents is an important step for document image understanding, and a number of algorithms have been proposed to address this problem. In order to overcome this limitation, we develop a text-line extraction algorithm for cursive handwriting. Our method is based on connected components (CCs); however, unlike conventional methods, we analyse strokes and partition under-segmented CCs into normalized ones. Due to this normalization, the proposed method is able to estimate the states of CCs for a range of different languages and writing styles.

33 citations

Journal Article
TL;DR: Experimental results show that the proposed PPTRPRT technique is competent at extracting characters from English offline handwritten cursive scripts.
Abstract: In the present paper, we use the Pixel Plot and Trace and Re-plot and Re-trace (PPTRPRT) technique for English offline handwritten cursive scripts. Unlike other approaches, the PPTRPRT technique prioritizes segmentation of words and characters. The technique extracts text regions from English offline handwritten cursive scripts and runs an iterative procedure for segmentation of text lines along with skew and de-skew operations. The outcomes of the iterations provide pixel-space-based word segmentation, which enables segmentation of characters. The PPTRPRT technique embraces various dispensations in segmentation of characters from English offline handwritten cursive scripts. Moreover, various normalization steps allow for deviations in pen breadth and inscription slant. Experimental results show that the proposed technique is competent at extracting characters from English offline handwritten cursive scripts.

21 citations


Cites background or methods from "Handwritten text segmentation using..."

  • ...[1] Graph model Average longest path Algo 1300 73....

    [...]

  • ...[1] tested segmentation by evaluating the local stroke geometry (imposed the width, height and aspect-ratio constraints in resultant characters)...

    [...]

Journal Article
TL;DR: A new Russian and Kazakh database (about 95% Russian and 5% Kazakh words/sentences) for offline handwriting recognition is presented; it can serve researchers working on handwriting recognition tasks with deep and machine learning.
Abstract: In this paper, we present a new Russian and Kazakh database (about 95% Russian and 5% Kazakh words/sentences) for offline handwriting recognition. A few pre-processing and segmentation procedures have been developed together with the database. Both languages are written in Cyrillic and share the same 33 characters; the Kazakh alphabet also contains 9 additional specific characters. The dataset is a collection of forms. The source of each form was generated with LaTeX and subsequently filled out by people in their own handwriting. The database consists of more than 1400 filled forms. There are approximately 63000 sentences and more than 715699 symbols produced by approximately 200 different writers. The database can serve researchers working on handwriting recognition tasks with deep and machine learning.

18 citations


Cites methods from "Handwritten text segmentation using..."

  • ...It has been widely used in word spotting [11, 12, 13, 14], writer identification [15, 16, 17, 18, 19], handwritten text segmentation [20, 21, 22] and offline handwriting recognition [23, 24, 25, 26]....

    [...]

01 Jan 2014
TL;DR: A new algorithm for line segmentation in handwritten text is discussed; it is based on the projection profile technique and deals mainly with skewed text, but also with overlapping and touching characters.
Abstract: Text line segmentation is a crucial step in optical character recognition. Poor line segmentation leads to wrong recognition results. In printed text, line segmentation is quite easy, but in handwritten text it is quite difficult because of overlapping and touching characters and because writing styles differ from writer to writer. In this paper we discuss a new algorithm that performs line segmentation in handwritten text. The algorithm deals mainly with skewed text, but also with overlapping and touching characters. It is based on the projection profile technique. We have applied the algorithm to many document images, and it has given promising results.
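
For readers unfamiliar with the projection profile technique mentioned above, here is a minimal sketch of its most basic form: count the ink pixels in each row and split the page wherever the count drops to zero. It does not handle skew, overlapping, or touching characters, which are the cases the cited algorithm targets; the function name `segment_lines` and the 0/1 image encoding are assumptions for illustration only.

```python
def segment_lines(image):
    """Split a binary image into text lines using a horizontal projection profile.

    image is a list of rows, each a list of 0/1 values (1 = ink pixel).
    Returns a list of (top, bottom) row indices, both inclusive.
    """
    # Projection profile: number of ink pixels in each row
    profile = [sum(row) for row in image]

    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines


# Tiny made-up page: two text lines separated by one blank row
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
print(segment_lines(page))
```

On real handwriting the profile rarely reaches exactly zero between lines, which is why practical methods smooth the profile, look for local minima, or correct skew first.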

17 citations

References
Journal Article
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

40,826 citations


"Handwritten text segmentation using..." refers methods in this paper

  • ...More specifically, in this paper we use the libSVM [4] implementation for each binary SVM classifier, which has two outputs: a classification indicator of positive (+1)/negative (−1), and a probability estimate p in [0, 1] that describes the confidence of the classification....

    [...]

  • ...We train a binary SVM classifier [4] for each class of characters....

    [...]

Book
01 Jan 1993
TL;DR: In-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems are presented, including descriptions of polynomial-time algorithms for these core models.
Abstract: A comprehensive introduction to network flows that brings together the classic and the contemporary aspects of the field, and provides an integrative view of theory, algorithms, and applications. The book presents in-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems, including descriptions of polynomial-time algorithms for these core models; emphasizes powerful algorithmic strategies and analysis tools such as data scaling, geometric improvement arguments, and potential function arguments; provides easy-to-understand descriptions of several important data structures, including d-heaps, Fibonacci heaps, and dynamic trees; devotes a special chapter to conducting empirical testing of algorithms; features over 150 applications of network flows to a variety of engineering, management, and scientific domains; and contains extensive reference notes and illustrations.

8,496 citations


"Handwritten text segmentation using..." refers methods in this paper

  • ...To search for this optimal b and this zero-weight cycle, we can use the sequential-search algorithm [2] shown in Algorithm 1 on G....

    [...]
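
The excerpt above mentions a sequential search for an offset b under which the optimal cycle has zero weight. A generic version of that idea, for the maximum mean cycle problem, can be sketched as follows; it is not a reproduction of the paper's Algorithm 1 or of its graph G, neither of which appears on this page. The function names, the use of Bellman-Ford for cycle detection, and the example graph are assumptions. Exact fractions are used so that the "zero-weight" test is not disturbed by rounding.

```python
from fractions import Fraction


def find_negative_cycle(n, edges):
    """Bellman-Ford from a virtual source; return edge indices of a negative cycle, or None."""
    dist = [Fraction(0)] * n
    pred = [None] * n  # index of the edge last used to improve each node
    last = None
    for _ in range(n):
        last = None
        for i, (u, v, w) in enumerate(edges):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = i
                last = v
        if last is None:
            return None  # a full pass without improvement: no negative cycle
    # Step back n times to be sure we are standing on the cycle itself
    for _ in range(n):
        last = edges[pred[last]][0]
    cycle, v = [], last
    while True:
        i = pred[v]
        cycle.append(i)
        v = edges[i][0]
        if v == last:
            break
    return cycle[::-1]


def max_mean_cycle(n, edges):
    """Sequential search: raise b until no cycle has positive weight under w - b.

    Returns (b, cycle), where cycle lists edge indices; cycle is None for an acyclic graph.
    """
    edges = [(u, v, Fraction(w)) for u, v, w in edges]
    b = min(w for _, _, w in edges) - 1  # strictly below every cycle mean
    best = None
    while True:
        # A cycle is positive under w - b exactly when it is negative under b - w
        shifted = [(u, v, b - w) for u, v, w in edges]
        cycle = find_negative_cycle(n, shifted)
        if cycle is None:
            return b, best  # the last cycle found has zero weight under w - b
        best = cycle
        b = sum(edges[i][2] for i in cycle) / len(cycle)


# Made-up graph: cycle 0 <-> 1 has mean 2, cycle 1 <-> 2 has mean 7/2
b, cycle = max_mean_cycle(3, [(0, 1, 3), (1, 0, 1), (1, 2, 5), (2, 1, 2)])
print(b, sorted(cycle))
```

Each round replaces b by the mean of a cycle that is strictly better than the current guess, so b only increases and the loop stops at the first b for which no improving cycle exists.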

Journal Article
TL;DR: A new analytic scheme, which uses a sequence of image segmentation and recognition algorithms, is proposed for the off-line cursive handwriting recognition problem and indicates higher recognition rates compared to the available methods reported in the literature.
Abstract: A new analytic scheme, which uses a sequence of image segmentation and recognition algorithms, is proposed for the off-line cursive handwriting recognition problem. First, some global parameters, such as slant angle, baselines, stroke width and height, are estimated. Second, a segmentation method finds character segmentation paths by combining gray-scale and binary information. Third, a hidden Markov model (HMM) is employed for shape recognition to label and rank the character candidates. For this purpose, a string of codes is extracted from each segment to represent the character candidates. The estimation of feature space parameters is embedded in the HMM training stage together with the estimation of the HMM model parameters. Finally, information from a lexicon and from the HMM ranks is combined in a graph optimization problem for word-level recognition. This method corrects most of the errors produced by the segmentation and HMM ranking stages by maximizing an information measure in an efficient graph search algorithm. The experiments indicate higher recognition rates compared to the available methods reported in the literature.

184 citations


"Handwritten text segmentation using..." refers methods in this paper

  • ...[3] extend the method used by [10] to segment characters by running a series of constrained shortest-path algorithms, and use a Hidden Markov Model to do word-level recognition....

    [...]

Journal Article
TL;DR: A new methodology for character segmentation and recognition which makes the best use of the characteristics of gray-scale images and a recognition-based segmentation method is adopted.
Abstract: Generally speaking, binarizing gray-scale images can in many cases discard information that is useful for segmenting touched or overlapped characters. If we analyze gray-scale images, however, specific topographic features and variations of intensity can be observed at the character boundaries. In this paper, we propose a new methodology for character segmentation and recognition which makes the best use of the characteristics of gray-scale images. In the proposed methodology, the character segmentation regions are determined by using projection profiles and topographic features extracted from the gray-scale images. Then a nonlinear character segmentation path in each character segmentation region is found by using a multi-stage graph search algorithm. Finally, in order to confirm the nonlinear character segmentation paths and recognition results, a recognition-based segmentation method is adopted. Experiments with various kinds of printed documents show that the proposed methodology is very effective for the segmentation and recognition of touched and overlapped characters.

154 citations

Journal Article
TL;DR: A dynamic recursive segmentation algorithm is developed for effectively segmenting touching characters based on both pixel and profile projections using contextual information and spell checking to correct errors caused by incorrect recognition and segmentation.

90 citations


"Handwritten text segmentation using..." refers methods in this paper

  • ...[11] propose two different types of projections to construct a segmentation, and optimize this segmentation using a dynamic recursive algorithm and contextual information....

    [...]