
Showing papers in "International Journal on Document Analysis and Recognition in 2016"


Journal ArticleDOI
TL;DR: A novel analysis combining results from competing systems at the level of individual strokes and stroke pairs is provided, which allows limitations of current state-of-the-art systems to be examined more closely.
Abstract: The CROHME competitions have helped organize the field of handwritten mathematical expression recognition. This paper presents the evolution of the competition over its first 4 years and its contributions to handwritten math recognition and, more generally, to structural pattern recognition research. The competition protocol, evaluation metrics and datasets are presented in detail. Participating systems are analyzed and compared in terms of the central mathematical expression recognition tasks: (1) symbol segmentation, (2) classification of individual symbols, (3) symbol relationships and (4) structural analysis (parsing). The competition led to the development of label graphs, which allow recognition results with conflicting segmentations to be directly compared and quantified using Hamming distances. We introduce structure confusion histograms, which provide frequencies for incorrect subgraphs corresponding to ground-truth label subgraphs of a given size, and present structure confusion histograms for symbol bigrams (two symbols with a relationship) for CROHME 2014 systems. We provide a novel analysis combining results from competing systems at the level of individual strokes and stroke pairs; this virtual merging of system outputs allows us to examine more closely the limitations of current state-of-the-art systems. Datasets along with evaluation and visualization tools produced for the competition are publicly available.

59 citations
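As a rough illustration of the label-graph comparison described above, here is a minimal Python sketch. It assumes a simplified representation in which every stroke carries one symbol label and every ordered stroke pair carries one relation label; the competition's actual evaluation tools are considerably richer, and the labels and helpers below are purely illustrative.

```python
# Minimal sketch: comparing two recognition results as stroke-level label graphs.
# Assumes each result assigns a symbol label to every stroke and a relation label
# (e.g. "merge", "right", "sup", or "none") to every ordered stroke pair.
# This is an illustrative simplification, not the official CROHME tooling.

def label_graph(node_labels, edge_labels):
    """node_labels: {stroke_id: symbol}, edge_labels: {(i, j): relation}."""
    return node_labels, edge_labels

def hamming_distance(graph_a, graph_b):
    nodes_a, edges_a = graph_a
    nodes_b, edges_b = graph_b
    # Node disagreements: strokes labelled with different symbols.
    node_diff = sum(1 for s in nodes_a if nodes_a[s] != nodes_b.get(s))
    # Edge disagreements: stroke pairs with different relation labels.
    pairs = set(edges_a) | set(edges_b)
    edge_diff = sum(1 for p in pairs if edges_a.get(p, "none") != edges_b.get(p, "none"))
    return node_diff + edge_diff

# Ground truth: strokes 0 and 1 merge into "x", stroke 2 is "2" as a superscript.
truth = label_graph({0: "x", 1: "x", 2: "2"}, {(0, 1): "merge", (1, 2): "sup"})
# A system output that missed the merge and the superscript relation.
output = label_graph({0: "x", 1: ")", 2: "2"}, {(1, 2): "right"})
print(hamming_distance(truth, output))  # 1 node error + 2 edge errors -> 3
```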


Journal ArticleDOI
TL;DR: A model for recognition by selection of symbol candidates, based on evaluating relations between candidates with a set of predicates, is proposed; it is suitable for simpler structures where the relations are given explicitly by symbols, arrows in the case of diagrams.
Abstract: We introduce a new, online, stroke-based recognition system for hand-drawn diagrams, which belong to a group of documents with an explicit structure obvious to humans but only loosely defined from the machine point of view. We propose a model for recognition by selection of symbol candidates, based on evaluating relations between candidates with a set of predicates. It is suitable for simpler structures where the relations are given explicitly by symbols, arrows in the case of diagrams. Knowledge of a specific diagram domain is used--the two domains are flowcharts and finite automata. Although the individual pipeline steps are tailored to these, the system can readily be adapted to other domains. The entire diagram recognition pipeline is outlined. Its core parts are text/non-text separation, symbol segmentation, symbol classification and structural analysis. Individual parts have been published by the authors previously and so are described briefly and referenced. A thorough evaluation on benchmark databases shows that the accuracy of the system reaches the state of the art and that it is ready for practical use. The paper makes several contributions: (a) the entire system and its state-of-the-art performance; (b) a methodology for exploiting document structure when it is only loosely defined; (c) a thorough experimental evaluation; (d) a new annotated database of online sketched flowcharts and finite automata diagrams.

34 citations


Journal ArticleDOI
TL;DR: A novel hybrid method with three main stages for document layout analysis (page segmentation) is proposed; it combines connected component analysis with a multilevel homogeneity structure and achieves higher accuracy than competing methods.
Abstract: Document layout analysis, or page segmentation, is the task of decomposing document images into different regions such as text, images, separators, and tables. It remains a challenging problem due to the variety of document layouts. In this paper, we propose a novel hybrid method with three main stages to deal with this problem. In the first stage, text and non-text elements are classified using a minimum homogeneity algorithm, which combines connected component analysis with a multilevel homogeneity structure. In the second stage, a new homogeneity structure is combined with adaptive mathematical morphology on the text elements to obtain a set of text regions, while further classification of the non-text elements yields separator regions, table regions, image regions, etc. In the final stage, a region refinement and noise detection process refines all text and non-text regions to eliminate noise and obtain the geometric layout of each region. The proposed method has been tested on the ICDAR2009 page segmentation competition dataset and several other databases in different languages. The results show that the proposed method achieves higher accuracy than other methods, demonstrating its effectiveness.

31 citations
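For readers unfamiliar with connected-component-based page segmentation, the sketch below shows the general flavour of such a first stage in Python. The size thresholds and the crude text/non-text heuristic are illustrative assumptions only; they stand in for the paper's minimum homogeneity algorithm, which is not reproduced here.

```python
# Sketch: label connected components of a binarized page and split them into
# rough text / non-text groups by size, as a stand-in for the paper's
# minimum homogeneity algorithm (the real classifier is more involved).
import numpy as np
from scipy import ndimage

def split_components(binary_page, max_text_height=60, max_text_width=120):
    """binary_page: 2-D array, foreground pixels > 0."""
    labels, n = ndimage.label(binary_page > 0)
    boxes = ndimage.find_objects(labels)
    text, non_text = [], []
    for comp_slice in boxes:
        h = comp_slice[0].stop - comp_slice[0].start
        w = comp_slice[1].stop - comp_slice[1].start
        # Heuristic: small, roughly letter-sized components are treated as text.
        (text if h <= max_text_height and w <= max_text_width else non_text).append(comp_slice)
    return text, non_text

page = np.zeros((200, 200), dtype=np.uint8)
page[10:20, 10:18] = 1      # a small, letter-like blob
page[50:190, 100:190] = 1   # a large, image-like block
text_boxes, non_text_boxes = split_components(page)
print(len(text_boxes), len(non_text_boxes))  # 1 1
```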


Journal ArticleDOI
TL;DR: In this paper, the authors make explicit use of text structure, aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions.
Abstract: Typography and layout lead to the hierarchical organization of text into words, text lines, and paragraphs. This inherent structure is a key property of text in any script and language, yet it has been only minimally leveraged by existing scene text detection methods. This paper addresses the problem of text segmentation in natural scenes from a hierarchical perspective. Contrary to existing methods, we make explicit use of text structure, aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy, introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based on perceptual organization. Results obtained over four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while trained on a single mixed dataset, outperforms state-of-the-art methods in unconstrained scenarios.

31 citations
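The sketch below illustrates, under strong simplifying assumptions, how agglomerative clustering over region features can yield a hierarchy of text group hypotheses. The feature vectors, the single-linkage choice, and the plain distance cut are illustrative stand-ins for the paper's dedicated feature space and its classifier-plus-meaningfulness stopping rule.

```python
# Sketch: build a region hierarchy by agglomerative (single-linkage) clustering
# over simple region features. The paper's dedicated feature space and its
# classifier + meaningfulness stopping rule are not reproduced here; a plain
# distance threshold stands in for the stopping rule.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical regions: (x_center, y_center, height) for individual characters.
regions = np.array([
    [10, 50, 12], [22, 50, 12], [34, 51, 12],   # one word
    [120, 200, 30], [140, 201, 30],             # another, larger word
])

tree = linkage(regions, method="single")               # agglomerative hierarchy
groups = fcluster(tree, t=40.0, criterion="distance")  # cut = crude stopping rule
print(groups)  # e.g. [1 1 1 2 2]: two text group hypotheses
```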


Journal ArticleDOI
TL;DR: A hybrid, multi-scale text detection algorithm is proposed; it relies on a connected component-based approach and has been validated on three different datasets: two from the ICDAR competition and a third containing photographs of various everyday texts taken by the authors.
Abstract: In this paper, we propose a text detection algorithm that is hybrid and multi-scale. First, it relies on a connected component-based approach: after segmentation of the image, a classification step using a new wavelet descriptor spots the letters. A new graph modeling and its traversal procedure allow candidate text areas to be formed. Second, a texture-based approach discards the false positives. Finally, the detected text areas are precisely cut out and a new binarization step is introduced. The main advantage of our method is that few assumptions are put forward, so "challenging texts" such as multi-sized, multi-colored, multi-oriented or curved text can be localized. The efficiency of TextCatcher has been validated on three different datasets: two come from the ICDAR competition, and the third contains photographs of various everyday texts that we have taken. We present both qualitative and quantitative results.

30 citations


Journal ArticleDOI
TL;DR: Based on a formalization of header-indexed tables, an algorithmic solution to end-to-end table processing for a large class of human-readable tables is proffered.
Abstract: Much of the world's quantitative data reside in scattered web tables. For a meaningful role in Big Data analytics, the facts reported in these tables must be brought into a uniform framework. Based on a formalization of header-indexed tables, we proffer an algorithmic solution to end-to-end table processing for a large class of human-readable tables. The proposed algorithms transform header-indexed tables to a category table format that maps easily to a variety of industry-standard data stores for query processing. The algorithms segment table regions based on the unique indexing of the data region by header paths, classify table cells, and factor header category structures of two-dimensional as well as the less common multidimensional tables. Experimental evaluations substantiate the algorithmic approach to processing heterogeneous tables. As demonstrable results, the algorithms generate queryable relational database tables and semantic-web triple stores. Application of our algorithms to 400 web tables randomly selected from diverse sources shows that the algorithmic solution automates end-to-end table processing.

28 citations
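A minimal sketch of the end product of such a pipeline: flattening a small header-indexed table into (row header, column header, value) triples of the kind that map directly onto relational or triple stores. The example table, and the assumption that headers have already been segmented, are illustrative; the paper's segmentation and category-factoring algorithms are not reproduced.

```python
# Sketch: flatten a small header-indexed table into (row_path, column_path, value)
# triples, the kind of "category table" records that map onto relational or
# triple stores. Header detection/segmentation itself is assumed already done.
table = [
    ["",        "2014", "2015"],     # column headers
    ["Exports",  120,    135],
    ["Imports",   98,    110],
]

def to_triples(table):
    col_headers = table[0][1:]
    triples = []
    for row in table[1:]:
        row_header, values = row[0], row[1:]
        for col_header, value in zip(col_headers, values):
            triples.append((row_header, col_header, value))
    return triples

for t in to_triples(table):
    print(t)
# ('Exports', '2014', 120) ... ('Imports', '2015', 110)
```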


Journal ArticleDOI
TL;DR: A system for recognizing online handwritten mathematical expressions (MEs), by applying improved structural analysis, is proposed and experimentally evaluated on two databases and shows that the recognition rate of the proposed system is improved, while the processing time on a common CPU is kept to a practical level.
Abstract: A system for recognizing online handwritten mathematical expressions (MEs), by applying improved structural analysis, is proposed and experimentally evaluated on two databases. With this system, MEs are represented in the form of a stochastic context-free grammar (SCFG), and the Cocke-Younger-Kasami (CYK) algorithm is used to parse two-dimensional (2D) structures of online handwritten MEs and select the best interpretation in terms of the results of symbol segmentation and recognition as well as structural analysis. A concept of "body box" is proposed, and two SVM models are applied for learning and analyzing structural relations from training patterns without the need for any heuristic decisions. Stroke order is used to reduce the complexity of the parsing algorithm. Even though SCFG does not resolve ambiguities in some cases, the proposed system still gives users a list of candidates that contains the expected result. The results of experimental evaluations of the proposed system on the CROHME 2013 and CROHME 2014 databases and on an in-house ("Hand-Math") database show that the recognition rate of the proposed system is improved, while the processing time on a common CPU is kept to a practical level.

27 citations
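As background for readers unfamiliar with CYK parsing, here is a standard probabilistic CYK over a 1-D symbol sequence with a toy grammar. The paper's parser extends this dynamic program to 2-D layouts with SVM-scored spatial relations and body boxes, none of which is reproduced here.

```python
# Sketch: a standard probabilistic CYK parse over a 1-D symbol sequence.
# The paper's parser extends this idea to 2-D layouts with SVM-scored spatial
# relations and "body boxes"; none of that is reproduced here -- this only
# illustrates the CYK dynamic program that selects the most probable derivation.
from collections import defaultdict

# Toy SCFG in Chomsky normal form: EXPR -> TERM OP_TERM, OP_TERM -> OP TERM.
binary_rules = {            # (B, C) -> list of (A, probability) for A -> B C
    ("TERM", "OP_TERM"): [("EXPR", 1.0)],
    ("OP", "TERM"): [("OP_TERM", 1.0)],
}
unary_rules = {             # terminal -> list of (A, probability)
    "x": [("TERM", 0.6)], "2": [("TERM", 0.4)], "+": [("OP", 1.0)],
}

def cyk(symbols):
    n = len(symbols)
    best = defaultdict(float)               # (start, length, nonterminal) -> prob
    for i, s in enumerate(symbols):
        for a, p in unary_rules.get(s, []):
            best[(i, 1, a)] = p
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            for split in range(1, length):
                for (b, c), rules in binary_rules.items():
                    p = best[(start, split, b)] * best[(start + split, length - split, c)]
                    if p > 0:
                        for a, rule_p in rules:
                            best[(start, length, a)] = max(best[(start, length, a)], p * rule_p)
    return best[(0, n, "EXPR")]

print(cyk(["x", "+", "2"]))  # 0.24: probability of the best EXPR derivation
```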


Journal ArticleDOI
TL;DR: This work presents a novel approach to music staff removal, proposing to model the problem as a supervised learning classification task that uses pairs of scores with and without staff lines to train classification algorithms.
Abstract: This work presents a novel approach to music staff removal, the task of removing the staff lines from an image of a music score while preserving the symbol information. It represents a key step in the performance of most optical music recognition systems. In the literature, staff removal is usually solved by means of image processing procedures based on the intrinsics of music scores. We instead propose to model the problem as a supervised learning classification task. Surprisingly, although there is a strong background and a vast amount of research concerning machine learning, this classification approach has remained unexplored for this purpose. In this context, each foreground pixel is labelled as either staff or symbol. We use pairs of scores with and without staff lines to train classification algorithms, and we test our proposal with several well-known classification techniques. Moreover, no attempt has been made to tune the classification algorithms in our experiments; the parameters were left at the default settings provided by the classification software libraries. The aim of this choice is to show that, even with this straightforward procedure, the results are competitive with state-of-the-art algorithms. In addition, we discuss several advantages of this approach that conventional methods cannot offer, such as its high adaptability to any type of music score.

25 citations
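The sketch below shows the basic shape of this pixel-classification formulation in Python: neighbourhood patches as features, labels taken from an aligned staff-free image, and an off-the-shelf classifier with default settings. The patch size, the synthetic score, and the choice of a random forest are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: treat staff removal as per-pixel classification. Features are the
# flattened k x k neighbourhood around each foreground pixel; labels come from
# an aligned pair of images with and without staff lines. Any off-the-shelf
# classifier can be dropped in (the paper tests several, with default settings).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pixel_samples(with_staff, without_staff, k=5):
    pad = k // 2
    padded = np.pad(with_staff, pad)
    feats, labels = [], []
    for y, x in zip(*np.nonzero(with_staff)):           # foreground pixels only
        patch = padded[y:y + k, x:x + k]
        feats.append(patch.ravel())
        labels.append(1 if without_staff[y, x] else 0)  # 1 = symbol, 0 = staff
    return np.array(feats), np.array(labels)

# Tiny synthetic score: one horizontal staff line plus a vertical symbol stroke.
with_staff = np.zeros((20, 20), dtype=np.uint8)
with_staff[10, :] = 1           # staff line
with_staff[2:18, 5] = 1         # note stem
without_staff = with_staff.copy()
without_staff[10, :] = 0
without_staff[10, 5] = 1        # the stem pixel survives staff removal

X, y = pixel_samples(with_staff, without_staff)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy on this toy pair
```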


Journal ArticleDOI
TL;DR: A knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping is proposed, which achieves 80.86 % word accuracy on the Mongolian Kanjur test samples.
Abstract: This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units or do not require segmentation are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. We exploit the knowledge of the glyph characteristics to segment words into glyph-units in the segmentation-based scheme. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on the analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition. These strategies involve incorporating baseline information, glyph-unit grouping, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86 % word accuracy on the Mongolian Kanjur test samples.

16 citations


Journal ArticleDOI
TL;DR: Evaluation on a limited number of Nom historical documents, after ground truths were prepared for them, showed that the two stages of recognition, together with user checking and correction, improved the recognition results significantly.
Abstract: A Nom historical document recognition system is being developed for digital archiving; it uses image binarization, character segmentation, and character recognition. It incorporates two versions of off-line character recognition: one for automatic recognition of scanned and segmented character patterns (7660 categories) and the other for user handwritten input (32,695 categories). This separation is used because including less frequently appearing categories in automatic recognition increases the misrecognition rate in the absence of reliable statistics on the Nom language; moreover, a user must be able to check the results, identify the correct categories from an extended set of categories, and input characters by hand. Both versions use the same recognition method, but they are trained on different sets of training patterns. Recursive X-Y cut and Voronoi diagrams are used for segmentation; a k-d tree and generalized learning vector quantization are used for coarse classification; and the modified quadratic discriminant function is used for fine classification. The system provides an interface through which a user can check the results, change binarization methods, rectify segmentation, and input correct character categories by hand. Evaluation on a limited number of Nom historical documents, after ground truths were prepared for them, showed that the two stages of recognition, together with user checking and correction, improved the recognition results significantly.

14 citations
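A minimal sketch of the coarse-classification step only: a k-d tree over category prototype vectors shortlists candidate categories for a query pattern. The prototype vectors here are random stand-ins, and the fine classifiers used in the paper (GLVQ and MQDF) are not reproduced.

```python
# Sketch: coarse classification with a k-d tree over category prototype vectors.
# The tree quickly shortlists candidate categories; a finer classifier (GLVQ /
# MQDF in the paper, not reproduced here) would then rank this short list.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
n_categories, dim = 7660, 64                 # e.g. directional-feature vectors
prototypes = rng.normal(size=(n_categories, dim))
tree = cKDTree(prototypes)

query = prototypes[1234] + 0.05 * rng.normal(size=dim)   # a noisy input pattern
_, candidate_ids = tree.query(query, k=20)               # 20-category short list
print(1234 in candidate_ids)                             # True on this toy data
```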


Journal ArticleDOI
TL;DR: The proposed stroke grouping machine learning approach improves classification robustness in relation to different input styles and enhances the overall discriminating accuracy of the text/shape recognition system by 11.3 %.
Abstract: The paper provides a practical solution to the real-time text/shape differentiation problem for online handwriting input. The proposed classification system comprises stroke grouping and stroke classification blocks. A new set of features with low computational complexity is derived. The method achieves 98.5 % text/shape classification accuracy on a benchmark dataset. The proposed stroke grouping machine learning approach improves classification robustness with respect to different input styles. In contrast to threshold-based techniques, this grouping adaptation enhances the overall discriminating accuracy of the text/shape recognition system by 11.3 %. The solution improves the system's responsiveness on touch-screen devices.

Journal ArticleDOI
TL;DR: This paper proposes a novel method to improve the discrimination ability of convolutional neural networks (CNNs) by hybrid learning that embeds a collection of discriminators as well as a recognizer in a shared CNN.
Abstract: The discrimination of similar patterns is important because they are the major source of classification errors. This paper proposes a novel method to improve the discrimination ability of convolutional neural networks (CNNs) by hybrid learning. The proposed method embeds a collection of discriminators, as well as a recognizer, in a shared CNN. By visualizing contrastive class saliency, we show that learning with embedded discriminators leads the shared CNN to detect and capture the differences among similar classes. Also proposed is a hybrid learning algorithm that learns recognition and discrimination together. The proposed method learns recognition focusing on the differences among similar classes, and thereby improves the discrimination ability of the CNN. Unlike conventional discrimination methods, the proposed method requires neither predefined sets of similar classes nor an additional step to integrate its result with that of the recognizer. In experiments on the two handwritten Hangul databases SERI95a and PE92, the proposed method reduced the classification error from 2.56 to 2.33 % and from 4.04 to 3.66 %, respectively. These improvements correspond to relative error reductions of 8.97 % on SERI95a and 9.42 % on PE92. Our best results improve on the previous state-of-the-art error rates of 4.04 % on SERI95a and 7.08 % on PE92.
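A minimal PyTorch sketch of the general idea of a shared trunk feeding both a recognizer head and an embedded discriminator head, trained with a joint loss. The architecture sizes, the single discriminator, the random labels, and the loss weighting are all illustrative assumptions rather than the paper's configuration.

```python
# Sketch: a shared convolutional trunk feeding one recognizer head and one
# pairwise discriminator head, loosely following the idea of embedding
# discriminators in a shared CNN. Sizes, the choice of similar-class pair,
# and the loss weighting are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class HybridCNN(nn.Module):
    def __init__(self, n_classes=2350):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.recognizer = nn.Linear(32 * 4 * 4, n_classes)
        # One embedded discriminator for a single assumed pair of similar classes.
        self.discriminator = nn.Linear(32 * 4 * 4, 2)

    def forward(self, x):
        feats = self.trunk(x)
        return self.recognizer(feats), self.discriminator(feats)

model = HybridCNN()
images = torch.randn(8, 1, 32, 32)
labels = torch.randint(0, 2350, (8,))
pair_labels = torch.randint(0, 2, (8,))     # which of the two similar classes

rec_logits, dis_logits = model(images)
loss = nn.functional.cross_entropy(rec_logits, labels) \
     + 0.5 * nn.functional.cross_entropy(dis_logits, pair_labels)  # hybrid loss
loss.backward()
print(float(loss))
```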

Journal ArticleDOI
TL;DR: This paper proposes an algorithm that leverages CNN confidence to discover similar character pairs/sets: a deep CNN outputs the top-ranked candidates and the corresponding confidence scores, which are then accumulated and averaged.
Abstract: A primary reason for performance degradation in unconstrained online handwritten Chinese character recognition is the subtle differences between similar characters. Various methods have been proposed in previous work to address the problem of generating similar characters. These methods basically comprise two components--similar character discovery and cascaded classifiers. The goal of similar character discovery is to make similar character pairs/sets cover as many misclassified samples as possible. The confidence of a convolutional neural network (CNN) is output in an end-to-end manner and can be understood as a type of probability measure. In this paper, we propose an algorithm that leverages CNN confidence for discovering similar character pairs/sets. Specifically, a deep CNN is applied to output the top-ranked candidates and the corresponding confidence scores, followed by an accumulating and averaging procedure. We found experimentally that the number of similar character pairs for each class is diverse and that the confusion degree of similar character pairs varies. To address these problems, we propose an entropy-based similarity measurement to rank these similar character pairs/sets and reject those with low similarity. The experimental results indicate that by using 30,000 similar character pairs, our method achieves hit rates of 98.44 and 98.05 % on the CASIA-OLHWDB1.0 and CASIA-OLHWDB1.0-1.2 datasets, respectively, which are significantly higher than the corresponding results produced by an MQDF-based method (95.42 and 94.49 %). Furthermore, recognition of ten randomly selected similar character subsets with a two-stage classification scheme results in a relative error reduction of 30.11 % compared with the traditional single-stage scheme, showing the potential of the proposed method.
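The sketch below illustrates the accumulate-and-average step and an entropy-based ranking on made-up confidence outputs; the CNN itself is replaced by a hard-coded table, and all numbers and characters are illustrative.

```python
# Sketch: discover similar character pairs by accumulating and averaging CNN
# top-candidate confidences per true class, then rank the resulting confusion
# distributions by entropy. The CNN is replaced by a made-up confidence table;
# numbers and class names are illustrative only.
import math
from collections import defaultdict

# (true_class, [(candidate, confidence), ...]) for a few "recognized" samples.
outputs = [
    ("己", [("己", 0.55), ("已", 0.40), ("巳", 0.05)]),
    ("己", [("已", 0.50), ("己", 0.45), ("巳", 0.05)]),
    ("人", [("人", 0.97), ("入", 0.03)]),
]

accum = defaultdict(lambda: defaultdict(float))
counts = defaultdict(int)
for true_class, candidates in outputs:
    counts[true_class] += 1
    for cand, conf in candidates:
        accum[true_class][cand] += conf

for true_class, cand_conf in accum.items():
    # Average confidence of each candidate given the true class.
    avg = {c: v / counts[true_class] for c, v in cand_conf.items()}
    confusers = {c: p for c, p in avg.items() if c != true_class}
    total = sum(confusers.values())
    probs = [p / total for p in confusers.values()]
    entropy = -sum(p * math.log(p, 2) for p in probs)
    print(true_class, sorted(confusers.items(), key=lambda kv: -kv[1]), round(entropy, 3))
```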

Journal ArticleDOI
TL;DR: The United States Patent and Trademark Office (USPTO) hosted an online competition in which participants developed algorithms to detect figures and diagram part labels; the challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions.
Abstract: Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward the automatic creation of 'tool-tips' and hyperlinks from part labels to their associated descriptions, the USPTO hosted a month-long online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing the patent text, allowing the integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented, along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first-place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.

Journal ArticleDOI
TL;DR: A consensus-based clustering approach for document image segmentation is proposed; it is used iteratively with a classifier to label each primitive block, and the dependency of classification performance on the training data is shown to be significantly reduced.
Abstract: Segmentation of a document image plays an important role in automatic document processing. In this paper, we propose a consensus-based clustering approach for document image segmentation. In this method, the foreground regions of a document image are grouped into a set of primitive blocks, and a set of features is extracted from them. Similarities among the blocks are computed on each feature using a hypothesis test-based similarity measure. Based on the consensus of these similarities, clustering is performed on the primitive blocks. This clustering approach is used iteratively with a classifier to label each primitive block. Experimental results show the effectiveness of the proposed method. It is further shown in the experimental results that the dependency of classification performance on the training data is significantly reduced.
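A minimal sketch of the consensus idea: each feature casts a similarity vote for every pair of primitive blocks, and blocks connected by a majority of votes are grouped. The features, tolerances, and majority rule are illustrative stand-ins for the paper's hypothesis-test-based similarity measures and its iterative classifier loop.

```python
# Sketch: consensus clustering of primitive blocks. Each feature votes on whether
# two blocks are similar; blocks are merged when a majority of features agree.
# The similarity tests and thresholds are illustrative stand-ins for the paper's
# hypothesis-test-based measures.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical per-block features: [height, stroke_density, aspect_ratio].
blocks = np.array([
    [12, 0.30, 0.8], [13, 0.32, 0.7], [11, 0.29, 0.9],   # text-like blocks
    [80, 0.90, 1.0], [78, 0.88, 1.1],                    # image-like blocks
])
thresholds = np.array([5.0, 0.1, 0.3])                   # per-feature tolerance

n = len(blocks)
votes = np.zeros((n, n), dtype=int)
for f in range(blocks.shape[1]):
    diff = np.abs(blocks[:, f, None] - blocks[None, :, f])
    votes += (diff <= thresholds[f]).astype(int)         # one vote per feature

consensus = votes >= 2                                   # majority of 3 features
_, labels = connected_components(csr_matrix(consensus), directed=False)
print(labels)  # e.g. [0 0 0 1 1]: two clusters of blocks
```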

Journal ArticleDOI
TL;DR: Exhaustive testing shows the efficiency of the algorithm and its robustness and stability under affine transformations and injected noise, while easy reconstruction to scalable vector graphics demonstrates its readiness as a state-of-the-art solution.
Abstract: We propose an efficient algorithm for high-level vectorization of scanned images of mechanical engineering drawings. The algorithm is marked by several novel features, which account for its superiority over existing techniques. After preprocessing and necessary refinement of junction points in the image skeleton, it first extracts the graphic primitives, such as lines, circles, and arcs, based on certain digital geometric properties of straightness and circularity in the discrete domain. The primitives are classified into different types, with all associated details, based on fast and efficient geometric analysis. The vector set is succinctly reduced by this classification, in tandem with further consolidation to form meaningful objects such as rectangles and annuli, together with hatching information. Exhaustive testing shows the efficiency of the algorithm and also its robustness and stability under affine transformations and injected noise. Easy reconstruction to scalable vector graphics demonstrates its readiness and usability as a state-of-the-art solution.

Journal ArticleDOI
TL;DR: A new algorithm for segmenting documents into regions containing musical scores and text is proposed, based on the bag-of-visual-words representation followed by random block voting (RBV) in order to detect the bounding boxes containing the musical score and text within a document image.
Abstract: A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step prior to applying optical character recognition and optical music recognition to scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) in order to detect the bounding boxes containing the musical score and text within a document image. The RBV procedure consists of extracting a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that "over"-covers the input image. Each block is automatically classified as coming from either musical score or text and votes with a particular posterior probability of classification in its spatial domain. An initial coarse segmentation is obtained by summarizing all the votes in a single image. Subsequently, the final segmentation is obtained by subdividing the image into microblocks and classifying them using an N-nearest neighbor classifier trained on the coarse segmentation. We demonstrate the potential of the proposed method with experiments on two different datasets. One is a challenging dataset of images collected and artificially combined and manipulated for this project; the other is a music dataset obtained by scanning two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85 %. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility.
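A rough sketch of the random block voting step, with a trivial stub in place of the bag-of-visual-words block classifier; the block counts, sizes, and stub classifier are illustrative assumptions.

```python
# Sketch of the random block voting (RBV) idea: sample blocks whose position and
# size over-cover the image, classify each block, and let it vote with its
# posterior probability over its spatial extent. The block classifier here is a
# trivial stub (mean intensity); the paper uses bag-of-visual-words features.
import numpy as np

rng = np.random.default_rng(1)

def classify_block(block):
    """Stub posterior P(music | block); stands in for the BoVW classifier."""
    return float(block.mean())

def random_block_voting(image, n_blocks=500, min_size=10, max_size=40):
    h, w = image.shape
    votes = np.zeros_like(image, dtype=float)
    hits = np.zeros_like(image, dtype=float)
    for _ in range(n_blocks):
        bh = rng.integers(min_size, max_size + 1)
        bw = rng.integers(min_size, max_size + 1)
        y = rng.integers(0, h - bh + 1)
        x = rng.integers(0, w - bw + 1)
        p = classify_block(image[y:y + bh, x:x + bw])
        votes[y:y + bh, x:x + bw] += p        # vote with the posterior
        hits[y:y + bh, x:x + bw] += 1.0
    return votes / np.maximum(hits, 1.0)      # averaged vote map

image = np.zeros((100, 100))
image[20:60, 20:80] = 1.0                      # pretend this region is music score
vote_map = random_block_voting(image)
print(vote_map[40, 50] > vote_map[5, 5])       # True: stronger votes inside region
```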

Journal ArticleDOI
TL;DR: The potentially deleterious effects of common preprocessing methods are illustrated through a series of dramatic albeit contrived examples and then shown to affect real applications of ongoing interest to the community through three writer identification experiments conducted on Arabic handwriting.
Abstract: Many preprocessing techniques intended to normalize artifacts and clean noise induce anomalies in part due to the discretized nature of the document image and in part due to inherent ambiguity in the input image relative to the desired transformation. The potentially deleterious effects of common preprocessing methods are illustrated through a series of dramatic albeit contrived examples and then shown to affect real applications of ongoing interest to the community through three writer identification experiments conducted on Arabic handwriting. Retaining ruling lines detected by multi-line linear regression instead of repairing strokes broken by deleting ruling lines reduced the error rate by 4.5 %. Exploiting word position relative to detected rulings instead of ignoring it decreased errors by 5.5 %. Counteracting page skew by rotating extracted contours during feature extraction instead of rectifying the page image reduced the error by 1.4 %. All of these accuracy gains are shown to be statistically significant. Analogous methods are advocated for other document processing tasks as topics for future research.

Journal ArticleDOI
TL;DR: A new lexicon reduction method for historical Arabic scripts is presented; it compares the input subword image with the lexicon entries and selects the most similar ones, using a retrieval-based measure to compute a distinction score for each local region that indicates how prominent that region is.
Abstract: This paper presents a new lexicon reduction method for historical Arabic scripts that compares the input subword image with the lexicon entries and selects the most similar ones. In comparing two subword images, more importance is given to the prominent shape regions, defined as those local regions of a subword that distinguish it from other lexicon subwords. In this method, a retrieval-based measure is first applied to compute a distinction score for each local region, indicating how prominent that region is. These scores are subsequently used in a proposed distance measure to modulate the weights of the corresponding shape features, so that the most distinctive regions are given more weight. A global shape-based lexicon reduction based on characteristic loci is used as well, to complement the local subword descriptors. We evaluated the performance of our proposed method on the Ibn Sina database, containing more than 12,000 subwords extracted from a historical Arabic document, and a degree of reduction of 98.15 % was achieved with an accuracy of 90.15 %.

Journal ArticleDOI
TL;DR: An evidence-based model of saliency feature extraction (SFE) is proposed to probe saliency text points (STPs), which exhibit a strong text signal structure across multiple observations, always appear between text and its background, and can serve as highly reliable text candidates for future text detectors.
Abstract: Saliency text consists of characters arranged with visibility and expressivity, and it contains important clues for video analysis, indexing, and retrieval. In order to localize saliency text, a critical stage is to collect key points from real text pixels. In this paper, we propose an evidence-based model of saliency feature extraction (SFE) to probe saliency text points (STPs), which exhibit a strong text signal structure across multiple observations and always appear between text and its background. Through these multiple observations, a signal structure with rhythms of signal segments is extracted at every location in the visual field. This supplies the sources of evidence for our evidence-based model, in which the evidence is measured to estimate the degree of plausibility of each candidate STP. Evaluation results on benchmark datasets demonstrate that our proposed approach achieves state-of-the-art performance in finding real text pixels and significantly outperforms existing algorithms for detecting text candidates. The STPs can serve as highly reliable text candidates for future text detectors.

Journal ArticleDOI
TL;DR: The Set Partitioning in Hierarchical Trees (SPIHT) coder is used in the framework of ROI coding along with some image enhancement techniques in order to remove the leakage effect which occurs in the wavelet-based low-bit-rate compression.
Abstract: In this paper, we deal with those applications of textual image compression where a high compression ratio and maintaining or improving the visual quality and readability of the compressed images are the main concerns. In textual images, most of the information lies in the edge regions; therefore, the compression problem can be studied in the framework of region-of-interest (ROI) coding. In this paper, the Set Partitioning in Hierarchical Trees (SPIHT) coder is used in the framework of ROI coding, along with some image enhancement techniques, in order to remove the leakage effect that occurs in wavelet-based low-bit-rate compression. We evaluated the compression performance of the proposed method with respect to qualitative and quantitative measures. The qualitative measures include the averaged mean opinion score (MOS) curve along with example outputs under different conditions. The quantitative measures include two proposed modified PSNR measures and the conventional one. Comparing the results of the proposed method with those of three conventional approaches, DjVu, JPEG2000, and SPIHT coding, showed that the proposed compression method considerably outperformed the others, especially in the qualitative respect. The proposed method improved the MOS by 20 and 30 % on average for high- and low-contrast textual images, respectively. In terms of the modified and conventional PSNR measures, the proposed method outperformed DjVu and JPEG2000 by up to 0.4 dB for high-contrast textual images at low bit rates. In addition, compressing high-contrast images using the proposed ROI technique, compared to not using it, improved the average textual PSNR measure by up to 0.5 dB at low bit rates.
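For reference, the sketch below computes the conventional PSNR and a masked variant restricted to text pixels, as a rough analogue of a "textual" PSNR. The paper's two modified PSNR measures are not specified here, so the masked version is only an assumption about the general idea of emphasizing edge regions.

```python
# Sketch: conventional PSNR, plus a masked variant restricted to text (edge)
# pixels as a rough analogue of a "textual PSNR". The paper's two modified PSNR
# measures are not reproduced; this masked version is an assumption about the
# general idea of weighting edge regions more heavily.
import numpy as np

def psnr(original, compressed, peak=255.0):
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def masked_psnr(original, compressed, mask, peak=255.0):
    diff = original.astype(float) - compressed.astype(float)
    mse = np.mean(diff[mask] ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
original = np.full((64, 64), 255, dtype=np.uint8)
original[20:40, 10:50] = 0                              # dark "text" block
compressed = np.clip(original + rng.normal(0, 5, original.shape), 0, 255)
text_mask = original < 128                              # text pixels only

print(round(psnr(original, compressed), 2))
print(round(masked_psnr(original, compressed, text_mask), 2))
```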

Journal ArticleDOI
TL;DR: The proposed evaluation methodology is based on comparing visual features extracted from the binarized document images with ground-truth features, instead of comparing the images themselves, and it was used to assess the performance of eleven algorithms based on different approaches on a collection of real and synthetic images.
Abstract: One of the most important and necessary steps in the process of document analysis and recognition is binarization, which extracts the foreground from the background. Several binarization techniques have been proposed in the literature, but none of them is reliable for all image types, which makes selecting one method for a given application very difficult. Performance evaluation of binarization algorithms is therefore vital. In this paper, we are interested in the evaluation of binarization techniques for the purpose of retrieving words from images of degraded Arabic documents. A new evaluation methodology is proposed, based on comparing the visual features extracted from the binarized document images with ground-truth features instead of comparing the images themselves. The most appropriate thresholding method for each image is the one for which the visual features of the identified words in the image are "closest" to the features of the reference words. The proposed technique is used here to assess the performance of eleven algorithms, based on different approaches, on a collection of real and synthetic images.

Journal ArticleDOI
TL;DR: This paper proposes a method for recognizing text by utilizing the layout consistency of a text string and calls this two-way process--from extraction and recognition to layout, and from layout back to extraction and recognition--"bidirectional" to distinguish it from previous feedback refinement approaches.
Abstract: Text recognition in natural scene images is a challenging task that has recently been garnering increased research attention. In this paper, we propose a method for recognizing text by utilizing the layout consistency of a text string. We estimate the layout (the four lines of a text string) using the initial character extraction and recognition results. On the basis of the layout consistency across a word, we then perform character extraction and recognition again using the four lines, which is more accurate than the first pass. Our layout estimation method differs from previous methods in its exploitation of character recognition results and its use of a class-conditional layout model. More accurate and robust estimation is achieved, and it can be used to refine character extraction and recognition. We call this two-way process--from extraction and recognition to layout, and from layout back to extraction and recognition--"bidirectional" to distinguish it from previous feedback refinement approaches. Experimental results demonstrate that our bidirectional processes boost word recognition performance.
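The sketch below shows one simple way to estimate layout lines from character bounding boxes by least-squares fitting, here only the top line and the baseline. The boxes are made up, and the paper's class-conditional layout model and full four-line estimation are not reproduced.

```python
# Sketch: estimate two of the four layout lines (top line and baseline) of a text
# string by fitting straight lines to the tops and bottoms of character boxes
# with least squares. The paper additionally uses character class information to
# decide which boxes touch which of the four lines; that step is omitted here.
import numpy as np

# Hypothetical character boxes (x_left, y_top, x_right, y_bottom) on a tilted word.
boxes = np.array([
    [10, 50, 20, 70],
    [25, 52, 35, 72],
    [40, 54, 50, 74],
    [55, 56, 65, 76],
])

x_centers = (boxes[:, 0] + boxes[:, 2]) / 2.0
top_slope, top_icpt = np.polyfit(x_centers, boxes[:, 1], 1)       # top line
base_slope, base_icpt = np.polyfit(x_centers, boxes[:, 3], 1)     # baseline

print(round(top_slope, 3), round(base_slope, 3))   # both ~0.133: consistent tilt
# These fitted lines could then constrain a second pass of character extraction,
# e.g. rejecting candidate regions whose boxes fall far outside the two lines.
```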