Proceedings ArticleDOI

ICDAR 2003 robust reading competitions

Simon M. Lucas1, A. Panaretos1, L. Sosa1, A. Tang1, S. Wong1, R. Young1 
03 Aug 2003-Vol. 3, pp 682-687
TL;DR: The robust reading problem was broken down into three sub-problems, with a competition for each stage and one for the best overall system; the text locating contest was the only one to receive any entries.
Abstract: This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets, and gain a clear understanding of the current state of the art. We use the term robust reading to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three sub-problems, and run competitions for each stage, and also a competition for the best overall system. The sub-problems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hope to gain a better understanding of the state of the art in each of the sub-problems. Furthermore, our methodology involves storing detailed results of applying each algorithm to each image in the data sets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text locating contest was the only one to have any entries. We report the results of this contest, and show cases where the leading algorithms succeed and fail.
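The text locating task scores detectors by matching estimated rectangles against ground-truth rectangles and crediting each box with its best match. The sketch below uses plain intersection-over-union as the match score; the contest's actual match definition differs in detail, so treat this as an illustrative simplification only.

```python
def box_match(a, b):
    """Overlap between two axis-aligned boxes (x0, y0, x1, y1):
    intersection area over union area (illustrative criterion only)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(estimates, targets):
    """Soft precision/recall over best matches, in the spirit of the
    competition's text-locating evaluation: each estimated box is credited
    with its best overlap against the targets, and vice versa."""
    if not estimates or not targets:
        return 0.0, 0.0
    p = sum(max(box_match(e, t) for t in targets) for e in estimates) / len(estimates)
    r = sum(max(box_match(e, t) for e in estimates) for t in targets) / len(targets)
    return p, r
```

A perfect detection yields precision and recall of 1.0; a completely missed box contributes 0 to both sums.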


Citations
Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Abstract: We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.
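The stroke-width idea can be illustrated with a deliberately simplified sketch. The real operator casts rays along the image gradient at edge pixels; this toy version only measures horizontal foreground runs in a binary image, which is enough to show how each pixel receives the width of the stroke it belongs to.

```python
import numpy as np

def stroke_width_transform(binary, max_width=50):
    """Simplified stroke-width sketch using horizontal rays only.

    For each foreground run in a row, assign every pixel in the run
    the run length as its local stroke width (keeping the minimum if
    a pixel is covered more than once). Background pixels stay at inf.
    """
    h, w = binary.shape
    swt = np.full((h, w), np.inf)
    for y in range(h):
        x = 0
        while x < w:
            if binary[y, x]:
                start = x
                while x < w and binary[y, x]:
                    x += 1
                width = x - start
                if width <= max_width:
                    swt[y, start:x] = np.minimum(swt[y, start:x], width)
            else:
                x += 1
    return swt
```

On a vertical bar five pixels wide, every bar pixel receives stroke width 5, so text-like regions show up as areas of near-constant stroke width.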

1,531 citations

Proceedings ArticleDOI
23 Aug 2015
TL;DR: A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text, and tasks assessing End-to-End system performance have been introduced to all Challenges.
Abstract: Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated with more video sequences and more accurate ground truth data. Finally, tasks assessing End-to-End system performance have been introduced to all Challenges. The competition took place in the first quarter of 2015, and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification and the evaluation protocols are presented together with the results and a brief summary of the participating methods.

1,224 citations


Cites background from "ICDAR 2003 robust reading competiti..."

  • ...The competition dates back to 2003[1] [2] [3], and was substantially revised in 2011 and 2013 [4] [5] [6], creating a comprehensive reference framework for robust reading pipelines evaluation [7]....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The datasets and ground truth specification are described, the performance evaluation protocols used are detailed, and the final results are presented along with a brief summary of the participating methods.
Abstract: This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.

1,191 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: While scene text recognition has generally been treated with highly domain-specific methods, the results demonstrate the suitability of applying generic computer vision methods.
Abstract: This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.

1,074 citations


Cites background or methods from "ICDAR 2003 robust reading competiti..."

  • ...We used data from the Chars74K4 dataset, introduced in [6] for cropped character classification; the ICDAR Robust Reading Competition dataset [13], discussed in Section 1; and Street View Text (SVT), a full image lexicon-driven scene text dataset introduced in [20]5....


  • ...The ICDAR Robust Reading challenge [13] was the first public dataset collected to highlight the problem of detecting and recognizing scene text....


  • ...We follow the evaluation guidelines outlined in [13], which are essentially the same as the evaluation guidelines of other object recognition competitions, like PASCAL VOC [8]....


Journal ArticleDOI
TL;DR: An end-to-end system for text spotting—localising and recognising text in natural scene images—and text based image retrieval and a real-world application to allow thousands of hours of news footage to be instantly searchable via a text query is demonstrated.
Abstract: In this work we present an end-to-end system for text spotting--localising and recognising text in natural scene images--and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.

1,054 citations


Cites methods from "ICDAR 2003 robust reading competiti..."

  • ...ICDAR, pp. 1491–1496....


  • ...ICDAR 2003 (IC03) [1], ICDAR 2011 (IC11) [52], and ICDAR 2013 (IC13) [33] are scene text recognition datasets consisting of 251, 255, and 233 full scene images respectively....


  • ...Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: Reading text in scene images....


  • ...Interestingly, when the overlap threshold is reduced to 0.3 (last row of Table 5), we see a small improvement across ICDAR datasets and a large +8% improvement on SVT-50....


  • ...The clusters are formed by k-means clustering the RGB components of each image of the training datasets of [36] into three clus-...


References
Journal ArticleDOI
TL;DR: This work states that the FVC2000 protocol, databases, and results will be useful to all practitioners in the field not only as a benchmark for improving methods, but also for enabling an unbiased evaluation of algorithms.
Abstract: Reliable and accurate fingerprint recognition is a challenging pattern recognition problem, requiring algorithms robust in many contexts. FVC2000 competition attempted to establish the first common benchmark, allowing companies and academic institutions to unambiguously compare performance and track improvements in their fingerprint recognition algorithms. Three databases were created using different state-of-the-art sensors and a fourth database was artificially generated; 11 algorithms were extensively tested on the four data sets. We believe that FVC2000 protocol, databases, and results will be useful to all practitioners in the field not only as a benchmark for improving methods, but also for enabling an unbiased evaluation of algorithms.

815 citations


"ICDAR 2003 robust reading competiti..." refers methods in this paper

  • ...We aimed to broadly follow the principles and procedures used to run the Fingerprint Verification 2000 (and 2002) competitions [6]....


Journal ArticleDOI
TL;DR: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example, and defines the performance of the character recognition module as the objective measure.
Abstract: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example. Binarization of scanned gray scale images is the first step in most document image analysis systems. Selection of an appropriate binarization method for an input image domain is a difficult problem. Typically, a human expert evaluates the binarized images according to his/her visual criteria. However, to conduct an objective evaluation, one needs to investigate how well the subsequent image analysis steps will perform on the binarized image. We call this approach goal-directed evaluation, and it can be used to evaluate other low-level image processing methods as well. Our evaluation of binarization methods is in the context of digit recognition, so we define the performance of the character recognition module as the objective measure. Eleven different locally adaptive binarization methods were evaluated, and Niblack's method gave the best performance.
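Niblack's method, the best performer in this evaluation, thresholds each pixel at T = m + k·s, the local window mean plus k times the local standard deviation. A pure-NumPy sketch using integral images follows; the window size and k = -0.2 are conventional choices for dark text on a light background, not values taken from the paper.

```python
import numpy as np

def niblack_threshold(img, window=15, k=-0.2):
    """Niblack's locally adaptive threshold: T(x, y) = m(x, y) + k * s(x, y),
    with m and s the mean and standard deviation of a local window.
    Computed via integral images of pixel values and squared values."""
    img = img.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode='edge')
    # Integral images, with a leading zero row/column for clean differencing.
    ii = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
    ii2 = np.pad(np.cumsum(np.cumsum(padded ** 2, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    y0, y1, x0, x1 = y, y + window, x, x + window
    area = window * window
    s1 = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    s2 = ii2[y1, x1] - ii2[y0, x1] - ii2[y1, x0] + ii2[y0, x0]
    mean = s1 / area
    var = np.maximum(s2 / area - mean ** 2, 0.0)
    threshold = mean + k * np.sqrt(var)
    return img > threshold  # True = background for dark-on-light text
```

With negative k the threshold dips below the local mean, so dark character pixels fall under it while nearby background pixels stay above it.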

700 citations

Journal ArticleDOI
TL;DR: This work presents algorithms for detecting and tracking text in digital video; the system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks.
Abstract: Text that appears in a scene or is graphically added to video can provide an important supplemental source of index information as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum of squared difference (SSD) based module to find the initial position and a contour-based module to refine the position. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.
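The SSD-based initial-position module can be sketched as brute-force template matching: slide the previous frame's text block over the new frame and keep the offset with the smallest sum of squared differences. A minimal illustration follows; a real tracker would restrict the search to a small window around the last known position rather than scan the whole frame.

```python
import numpy as np

def ssd_match(frame, template):
    """Find the offset of `template` in `frame` minimising the sum of
    squared differences (SSD). Returns ((row, col), best_ssd)."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            d = frame[y:y + th, x:x + tw] - template
            ssd = np.sum(d * d)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best
```

When the template is cut directly from the frame, the SSD at the true offset is exactly zero, which is why the paper pairs this coarse module with a contour-based refinement stage.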

635 citations


"ICDAR 2003 robust reading competiti..." refers background in this paper

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....


Journal ArticleDOI
Rainer Lienhart1, A. Wernicke
TL;DR: This work proposes a novel method for localizing and segmenting text in complex images and videos that is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video.
Abstract: Many images, especially those used for page design on Web pages, as well as videos contain visible text. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. We propose a novel method for localizing and segmenting text in complex images and videos. Text lines are identified by using a complex-valued multilayer feed-forward network trained to detect text at a fixed scale and position. The network's output at all scales and positions is integrated into a single text-saliency map, serving as a starting point for candidate text lines. In the case of video, these candidate text lines are refined by exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed height of 100 pixels and segmented into a binary image with black characters on white background. For videos, temporal redundancy is exploited to improve segmentation performance. Input images and videos can be of any size due to a true multiresolution approach. Moreover, the system is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video, so that one text bitmap is created for all instances of that text line. Therefore, our text segmentation results can also be used for object-based video encoding such as that enabled by MPEG-4.

478 citations


"ICDAR 2003 robust reading competiti..." refers background in this paper

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....


Proceedings ArticleDOI
10 Dec 2002
TL;DR: An algorithm is presented that localizes artificial text in images and videos using a measure of accumulated gradients and morphological post-processing; the quality of the localized text is improved by robust multiple-frame integration.
Abstract: The systems currently available for content based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge into the indexing process is to use the text included in the images and video sequences. It is rich in information but easy to use, e.g. by key word based queries. In this paper we present an algorithm to localize artificial text in images and videos using a measure of accumulated gradients and morphological post processing to detect the text. The quality of the localized text is improved by robust multiple frame integration. A new technique for the binarization of the text boxes is proposed. Finally, detection and OCR results for a commercial OCR are presented.
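The accumulated-gradients measure can be sketched as follows: sum horizontal gradient magnitudes over a sliding window, so that regions with dense vertical strokes (text) stand out against smooth background. This is an illustrative simplification, not the paper's exact operator, and the morphological post-processing that turns the saliency map into text boxes is omitted.

```python
import numpy as np

def text_saliency(gray, win=8):
    """Accumulate |dI/dx| over a horizontal sliding window.

    Text regions, with their closely spaced vertical strokes, accumulate
    large values; flat background accumulates nothing. Uses a cumulative
    sum so each window sum is a single subtraction.
    """
    g = np.abs(np.diff(gray.astype(np.float64), axis=1))
    # c[:, j] holds the sum of g[:, :j], so g[:, a:b] sums to c[:, b] - c[:, a].
    c = np.cumsum(np.pad(g, ((0, 0), (1, 0))), axis=1)
    h, w = g.shape
    acc = np.zeros_like(g)
    for x in range(w):
        x0, x1 = max(0, x - win // 2), min(w, x + win // 2 + 1)
        acc[:, x] = c[:, x1] - c[:, x0]
    return acc
```

Thresholding the resulting map, followed by morphological closing, would merge individual stroke responses into candidate text boxes.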

262 citations


"ICDAR 2003 robust reading competiti..." refers background in this paper

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....
