Proceedings ArticleDOI

Recognition based text localization from natural scene images

01 Dec 2016, pp. 1177-1182
TL;DR: A novel hybrid framework for text localization is proposed that uses character-level recognition recursively in a feedback mechanism to refine text patches and reduce false positives, aiming at high recall rather than high precision.
Abstract: With the rapid increase of multimedia data, textual content in an image has become a very important source of information for several applications like navigation, image search and retrieval, image understanding, captioning, machine translation and several others. Scene text localization is the first step towards such applications, and most current methods focus on generating a small set of high-precision detections rather than obtaining a large set of detections covering all text patches. In this work we propose a novel hybrid framework for text localization which uses character-level recognition recursively in a feedback mechanism to refine text patches and reduce false positives. We use the popular MSER algorithm at multiple scales as an initial region proposal method, followed by several recursive filtering stages to improve precision as well as maximize recall. We aim at high recall rather than high precision, since several robust word recognition systems are already available; these systems are mature enough to produce highly accurate results if provided with a maximal set of candidate regions, rather than a small set of highly precise text patches at the cost of losing several other text regions. The main contribution of this paper is the use of a character recognizer within a novel feedback mechanism to recursively search for text regions in the neighborhood of previously detected text patches. Using 3 publicly available benchmark datasets (ICDAR2011, MSRA TD-500 and OSTD), we demonstrate the efficacy of the proposed framework for text localization.
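As a concrete illustration of the feedback mechanism described above, the sketch below accepts high-confidence region proposals and then recursively re-examines proposals in the neighborhood of each accepted patch with a relaxed threshold. Everything here is a hypothetical stand-in, not the authors' implementation: the box format, the `recognize` callback, both thresholds, and the neighborhood rule.

```python
# Hedged sketch of a recall-oriented feedback loop over region proposals.
# The `recognize` scoring function, thresholds, and neighborhood rule are
# all illustrative stand-ins, not the paper's actual components.

def neighbors(region, regions):
    """Proposals whose centers lie near an accepted text patch
    (within twice its width/height -- a hypothetical rule)."""
    x, y, w, h = region
    out = []
    for r in regions:
        rx, ry, rw, rh = r
        if (abs((rx + rw / 2) - (x + w / 2)) < 2 * w
                and abs((ry + rh / 2) - (y + h / 2)) < 2 * h):
            out.append(r)
    return out

def localize(proposals, recognize, threshold=0.5):
    """Accept proposals the character recognizer scores highly, then
    recursively re-examine the neighborhood of each accepted patch
    with a relaxed threshold, to maximize recall."""
    accepted = []
    frontier = [r for r in proposals if recognize(r) >= threshold]
    pending = set(map(tuple, proposals))
    while frontier:
        region = frontier.pop()
        if tuple(region) not in pending:
            continue
        pending.discard(tuple(region))
        accepted.append(region)
        for n in neighbors(region, [list(p) for p in pending]):
            if recognize(n) >= threshold * 0.6:  # relaxed in the feedback pass
                frontier.append(n)
    return accepted
```

In the paper the proposals come from multi-scale MSER and the scores from a character recognizer; here any scoring function can be plugged in. The effect is that a weak candidate next to a confidently recognized character survives, while an equally weak but isolated candidate is dropped.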
Citations
Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the problem of recognizing stroke edges of handwritten characters with various pattern recognition techniques is addressed, based on a survey of existing approaches.
Abstract: Researchers have developed many technologies to recognize handwritten characters as text. Handwriting recognition is the capability of a computer to read handwriting as actual text, and is arguably the best way to overcome the many problems present in converting handwriting to machine-readable form. The paper mainly focuses on the basic problem faced in recognizing stroke edges in handwritten characters with various pattern recognition techniques. Based on the survey taken for this problem, character segmentation accuracy is low, which prevents high recognition accuracy for unconstrained handwriting.

3 citations

Journal ArticleDOI
TL;DR: In this article, a combination of CNNs and transfer learning was used to localize text in natural scene images; the obtained results proved more effective, with an accuracy and F-score of 0.8279, compared to state-of-the-art methods.
Abstract: Text localization in natural images plays an essential role in reading the text content present in an illustration. It is complex to localize textual content because text in natural scene images is scattered: prior information about the location, size, and orientation of the text, and the amount of text present in the images, is not available. These factors pose a challenge for localizing text in natural scene images. We propose a comprehensive solution for localizing text using a Deep Convolutional Neural Network (DCNN) and Transfer Learning (TL). DCNN hyperparameters such as the convolution and dense layers, dropout, and learning rate are optimized using random search. The combination of DCNN+TL is more effective in processing complex text images using the VGG16 architecture. The proposed method was evaluated on the standard ICDAR 2015 dataset, and the obtained results proved to be more effective, with an accuracy and F-score of 0.8279, compared to state-of-the-art methods.

2 citations

Journal ArticleDOI
TL;DR: In this paper, the pros and cons of technologies developed for visually impaired people, in terms of educational material obtained from text images and handwritten images, are summarized, and a performance comparison of different methods is presented.
Abstract: According to the World Health Organization, in 2017 nearly 253 million people were visually impaired, of whom 36 million were blind. Braille books use a tactile format that helps visually impaired people gain knowledge, but only limited resources are available. Numerous papers and studies describe methods for obtaining machine-readable documents from textual images. In the coming years, character recognition may play a key role in creating a paperless environment that gives visually impaired people access to an enormous amount of educational material. Handwritten script recognition is gaining vital importance in today's electronically interconnected society, and has attracted a lot of attention in the fields of machine learning and pattern matching. This paper first summarizes the pros and cons of technologies developed for visually impaired people in terms of educational material obtained from text images and handwritten images. Along with that, it presents a performance comparison of different methods. Finally, it describes future research work in this domain.

2 citations

Proceedings ArticleDOI
01 Feb 2020
TL;DR: A keypoint grouping method is proposed that first applies the real-time Laplacian of Gaussian operator (RT-LoG) to detect keypoints, which are then grouped to produce character patterns.
Abstract: Text detection in scene images is of particular importance for computer-based applications. Text detection methods must be robust against the variability and deformation of text entities. In addition, to be embedded into mobile devices, the methods have to be time-efficient. In this paper, a keypoint grouping method is proposed that first applies the real-time Laplacian of Gaussian operator (RT-LoG) to detect keypoints. These keypoints are then grouped to produce character patterns. The patterns are filtered by a CNN model before being aggregated into words. Performance evaluation is discussed on the ICDAR2017 RRC-MLT and the Challenge 4 of ICDAR2015 datasets. Results are given in terms of detection accuracy and processing time against different end-to-end systems in the literature. Our system achieves among the strongest detection accuracies while running at approximately 15.6 frames per second at HD resolution on a regular CPU architecture, making it one of the best candidates to guarantee the trade-off between accuracy and speed in the literature.
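The RT-LoG operator above is a real-time approximation of the classic Laplacian-of-Gaussian blob detector. A plain (non-real-time) numpy sketch of LoG keypoint detection looks roughly like this; the kernel size, threshold, and 3x3 non-maximum suppression are illustrative choices, not the paper's:

```python
# Hedged sketch of Laplacian-of-Gaussian keypoint detection.
# Kernel size, threshold, and NMS window are illustrative choices.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def log_kernel(sigma, size):
    """Discrete LoG kernel, zero-meaned so flat regions give no response."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    g = np.exp(-r2 / (2 * sigma ** 2))
    log = (r2 - 2 * sigma ** 2) / sigma ** 4 * g
    return log - log.mean()

def log_keypoints(image, sigma=2.0, threshold=0.05):
    """Return (row, col) locations where the scale-normalized, sign-flipped
    LoG response is a 3x3 local maximum above `threshold` (bright blobs)."""
    size = int(6 * sigma) | 1              # odd size covering ~3 sigma
    k = -sigma ** 2 * log_kernel(sigma, size)
    pad = size // 2
    padded = np.pad(image.astype(float), pad)
    windows = sliding_window_view(padded, (size, size))
    response = np.einsum('ijkl,kl->ij', windows, k)
    kps = []
    for r in range(1, response.shape[0] - 1):
        for c in range(1, response.shape[1] - 1):
            patch = response[r - 1:r + 2, c - 1:c + 2]
            if response[r, c] > threshold and response[r, c] == patch.max():
                kps.append((r, c))
    return kps
```

The real-time variant in the paper replaces the dense convolution with a fast approximation; the grouping and CNN filtering stages then operate on the keypoints this kind of detector emits.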

1 citation


Cites methods from "Recognition based text localization..."

  • ...Several CNN models have been proposed in the literature for the text verification (Yang et al., 2015; Ray et al., 2016; Wang et al., 2018; Zhang et al., 2015; Ghosh et al., 2019)....


Journal ArticleDOI
31 Dec 2019
TL;DR: Results show the robustness of the text localization system by successfully locating the text region in the scene images with different background and non-uniform text sizes.
Abstract: In an automated text recognition system, one of the prerequisites is the localization of text. It is a challenging task in scene images due to their backgrounds and the non-uniform size of characters in the images. In this study, an efficient text localization system using the bendlet transform is presented. Among the various multi-resolution and multi-directional analyses, the bendlet transform has the superior property of classifying curvature precisely. To achieve this property, it uses an additional parameter beyond shearlets, called the bending operator. The system decomposes the scene images by the bendlet transform and then reconstructs them using the bands which contain only the edge information. Then, a series of post-processing steps is applied to locate the text region in a scene image. Results show the robustness of the text localization system, which successfully locates the text region in scene images with different backgrounds and non-uniform text sizes.

Cites background from "Recognition based text localization..."

  • ...Text recognition and localization using natural scene images is discussed in [10]....


References
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
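The HOG descriptor described above can be sketched for a single cell as follows. The 9 unsigned orientation bins follow the Dalal-Triggs configuration, while the per-cell L2 normalization is a simplification of the paper's overlapping-block normalization scheme:

```python
# Hedged single-cell sketch of HOG: magnitude-weighted histogram of
# unsigned gradient orientations. Per-cell L2 normalization here is a
# simplification of the paper's block-level normalization.
import numpy as np

def cell_hog(cell, bins=9):
    gy, gx = np.gradient(cell.astype(float))        # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    hist = np.zeros(bins)
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // (180.0 / bins)) % bins] += m  # vote by magnitude
    return hist / (np.linalg.norm(hist) + 1e-6)     # L2 normalization
```

The full descriptor concatenates such histograms over a dense grid of cells and normalizes them within overlapping blocks, which the abstract identifies as important for good results.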

31,952 citations


"Recognition based text localization..." refers background in this paper

  • ...HOG: Histogram of Oriented Gradients is a scale-invariant feature descriptor popularly used for object detection [16]....


Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations

Proceedings ArticleDOI
01 Jan 2002
TL;DR: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied and an efficient and practically fast detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal region (MSER).
Abstract: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied. A new set of image elements that are put into correspondence, the so-called extremal regions, is introduced. Extremal regions possess highly desirable properties: the set is closed under (1) continuous (and thus projective) transformation of image coordinates and (2) monotonic transformation of image intensities. An efficient (near linear complexity) and practically fast detection algorithm (near frame rate) is presented for an affinely invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). A new robust similarity measure for establishing tentative correspondences is proposed. The robustness ensures that invariants from multiple measurement regions (regions obtained by invariant constructions from extremal regions), some that are significantly larger (and hence discriminative) than the MSERs, may be used to establish tentative correspondences. The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes. Significant change of scale (3.5×), illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. Good estimates of epipolar geometry (average distance from corresponding points to the epipolar line below 0.09 of the inter-pixel distance) are obtained.
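The extremal-region property can be checked directly from the definition above: every pixel inside the region must be darker than every pixel on its outer boundary (the brighter-than case is symmetric). A minimal numpy sketch, assuming a boolean mask marks the candidate region:

```python
# Hedged sketch of the extremal-region check from the MSER definition:
# all pixels inside the region are darker than all boundary pixels.
import numpy as np

def is_extremal(image, mask):
    """image: grayscale array; mask: boolean array marking the region."""
    # 4-connected outer boundary: pixels outside the mask adjacent to it
    padded = np.pad(mask, 1)
    dilated = padded.copy()
    dilated[1:, :] |= padded[:-1, :]
    dilated[:-1, :] |= padded[1:, :]
    dilated[:, 1:] |= padded[:, :-1]
    dilated[:, :-1] |= padded[:, 1:]
    boundary = dilated[1:-1, 1:-1] & ~mask
    return image[mask].max() < image[boundary].min()
```

The MSER detector itself goes further: it sweeps the intensity threshold and keeps the extremal regions whose area stays most stable across thresholds, which is what makes them robust initial proposals for text localization.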

3,400 citations


"Recognition based text localization..." refers background or methods in this paper

  • ...A region can be regarded as an Extremal Region when the intensity of all pixels within the region is less than the intensity of the pixels forming the boundary [7]....


  • ...We use a bottom up approach for text localization wherein the initial regions are obtained by using a variant of Maximally Stable Extremal Region (MSER) [7]....


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Abstract: We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.
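The stroke width operator walks from each edge pixel along the gradient direction until it hits the opposite edge. The axis-aligned simplification below only measures horizontal run lengths through a binary image, but it conveys the core idea that text strokes have near-constant width:

```python
# Hedged, axis-aligned simplification of the stroke-width idea: assign
# each foreground pixel of a binary image the length of the horizontal
# run it belongs to. The real SWT walks along the gradient direction,
# which makes it rotation-invariant; this sketch is not.
import numpy as np

def horizontal_stroke_widths(binary):
    widths = np.zeros(binary.shape, dtype=int)
    for r, row in enumerate(binary):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                widths[r, start:c] = c - start  # run length = stroke width
            else:
                c += 1
    return widths
```

In a full SWT pipeline, connected components with low stroke-width variance are kept as letter candidates, which is what lets the operator reject most non-text structures without multi-scale scanning.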

1,531 citations


"Recognition based text localization..." refers methods in this paper

  • ...make use of several other pruning algorithms such as Bag of Words and deep CNNs since MSER produces huge number of regions [3], [13]....


  • ...Existing approaches used for text detection can be broadly classified into two groups: Sliding Window based [1], [2] and Region Based [3], [4], [5]....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The datasets and ground truth specification are described, the performance evaluation protocols used are detailed, and the final results are presented along with a brief summary of the participating methods.
Abstract: This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.

1,191 citations