Proceedings ArticleDOI

ICDAR 2003 robust reading competitions

Simon M. Lucas1, A. Panaretos1, L. Sosa1, A. Tang1, S. Wong1, R. Young1 
03 Aug 2003-Vol. 3, pp 682-687
TL;DR: The robust reading problem was broken down into three sub-problems, with a competition for each stage and one for the best overall system; the text locating contest was the only one to receive any entries.
Abstract: This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets, and gain a clear understanding of the current state of the art. We use the term robust reading to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three sub-problems, and run competitions for each stage, and also a competition for the best overall system. The sub-problems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hope to gain a better understanding of the state of the art in each of the sub-problems. Furthermore, our methodology involves storing detailed results of applying each algorithm to each image in the data sets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text locating contest was the only one to have any entries. We report the results of this contest, and show cases where the leading algorithms succeed and fail.
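The text locating task scores detectors by matching estimated rectangles against ground-truth rectangles and crediting each box with its best match. The sketch below uses plain intersection-over-union as the match score; the contest's actual match definition differs in detail, so treat this as an illustrative simplification only.

```python
def box_match(a, b):
    """Overlap between two axis-aligned boxes (x0, y0, x1, y1):
    intersection area over union area (illustrative criterion only)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(estimates, targets):
    """Soft precision/recall over best matches, in the spirit of the
    competition's text-locating evaluation: each estimated box is credited
    with its best overlap against the targets, and vice versa."""
    if not estimates or not targets:
        return 0.0, 0.0
    p = sum(max(box_match(e, t) for t in targets) for e in estimates) / len(estimates)
    r = sum(max(box_match(e, t) for e in estimates) for t in targets) / len(targets)
    return p, r
```

A perfect detection yields precision and recall of 1.0; a completely missed box contributes 0 to both sums.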


Citations
Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Abstract: We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.
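The stroke-width idea can be illustrated with a deliberately simplified sketch. The real operator casts rays along the image gradient at edge pixels; this toy version only measures horizontal foreground runs in a binary image, which is enough to show how each pixel receives the width of the stroke it belongs to.

```python
import numpy as np

def stroke_width_transform(binary, max_width=50):
    """Simplified stroke-width sketch using horizontal rays only.

    For each foreground run in a row, assign every pixel in the run
    the run length as its local stroke width (keeping the minimum if
    a pixel is covered more than once). Background pixels stay at inf.
    """
    h, w = binary.shape
    swt = np.full((h, w), np.inf)
    for y in range(h):
        x = 0
        while x < w:
            if binary[y, x]:
                start = x
                while x < w and binary[y, x]:
                    x += 1
                width = x - start
                if width <= max_width:
                    swt[y, start:x] = np.minimum(swt[y, start:x], width)
            else:
                x += 1
    return swt
```

On a vertical bar five pixels wide, every bar pixel receives stroke width 5, so text-like regions show up as areas of near-constant stroke width.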

1,531 citations

Proceedings ArticleDOI
23 Aug 2015
TL;DR: A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text, and tasks assessing End-to-End system performance have been introduced to all Challenges.
Abstract: Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated with more video sequences and more accurate ground truth data. Finally, tasks assessing End-to-End system performance have been introduced to all Challenges. The competition took place in the first quarter of 2015, and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification and the evaluation protocols are presented together with the results and a brief summary of the participating methods.

1,224 citations


Cites background from "ICDAR 2003 robust reading competiti..."

  • ...The competition dates back to 2003[1] [2] [3], and was substantially revised in 2011 and 2013 [4] [5] [6], creating a comprehensive reference framework for robust reading pipelines evaluation [7]....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The datasets and ground truth specification are described, the performance evaluation protocols used are detailed, and the final results are presented along with a brief summary of the participating methods.
Abstract: This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.

1,191 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: While scene text recognition has generally been treated with highly domain-specific methods, the results demonstrate the suitability of applying generic computer vision methods.
Abstract: This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.

1,074 citations


Cites background or methods from "ICDAR 2003 robust reading competiti..."

  • ...We used data from the Chars74K4 dataset, introduced in [6] for cropped character classification; the ICDAR Robust Reading Competition dataset [13], discussed in Section 1; and Street View Text (SVT), a full image lexicon-driven scene text dataset introduced in [20]5....


  • ...The ICDAR Robust Reading challenge [13] was the first public dataset collected to highlight the problem of detecting and recognizing scene text....


  • ...We follow the evaluation guidelines outlined in [13], which are essentially the same as the evaluation guidelines of other object recognition competitions, like PASCAL VOC [8]....


Journal ArticleDOI
TL;DR: An end-to-end system for text spotting—localising and recognising text in natural scene images—and text based image retrieval and a real-world application to allow thousands of hours of news footage to be instantly searchable via a text query is demonstrated.
Abstract: In this work we present an end-to-end system for text spotting--localising and recognising text in natural scene images--and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.

1,054 citations


Cites methods from "ICDAR 2003 robust reading competiti..."

  • ...ICDAR, pp. 1491–1496....


  • ...ICDAR 2003 (IC03) [1], ICDAR 2011 (IC11) [52], and ICDAR 2013 (IC13) [33] are scene text recognition datasets consisting of 251, 255, and 233 full scene images respectively....


  • ...Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: Reading text in scene images....


  • ...Interestingly, when the overlap threshold is reduced to 0.3 (last row of Table 5), we see a small improvement across ICDAR datasets and a large +8% improvement on SVT-50....


  • ...The clusters are formed by k-means clustering the RGB components of each image of the training datasets of [36] into three clus-...


References
Journal ArticleDOI
TL;DR: This work states that the FVC2000 protocol, databases, and results will be useful to all practitioners in the field not only as a benchmark for improving methods, but also for enabling an unbiased evaluation of algorithms.
Abstract: Reliable and accurate fingerprint recognition is a challenging pattern recognition problem, requiring algorithms robust in many contexts. FVC2000 competition attempted to establish the first common benchmark, allowing companies and academic institutions to unambiguously compare performance and track improvements in their fingerprint recognition algorithms. Three databases were created using different state-of-the-art sensors and a fourth database was artificially generated; 11 algorithms were extensively tested on the four data sets. We believe that FVC2000 protocol, databases, and results will be useful to all practitioners in the field not only as a benchmark for improving methods, but also for enabling an unbiased evaluation of algorithms.

815 citations


"ICDAR 2003 robust reading competiti..." refers methods in this paper

  • ...We aimed to broadly follow the principles and procedures used to run the Fingerprint Verification 2000 (and 2002) competitions [6]....


Journal ArticleDOI
TL;DR: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example, and defines the performance of the character recognition module as the objective measure.
Abstract: This paper presents a methodology for evaluation of low-level image analysis methods, using binarization (two-level thresholding) as an example. Binarization of scanned gray scale images is the first step in most document image analysis systems. Selection of an appropriate binarization method for an input image domain is a difficult problem. Typically, a human expert evaluates the binarized images according to his/her visual criteria. However, to conduct an objective evaluation, one needs to investigate how well the subsequent image analysis steps will perform on the binarized image. We call this approach goal-directed evaluation, and it can be used to evaluate other low-level image processing methods as well. Our evaluation of binarization methods is in the context of digit recognition, so we define the performance of the character recognition module as the objective measure. Eleven different locally adaptive binarization methods were evaluated, and Niblack's method gave the best performance.
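Niblack's method, the best performer in this evaluation, thresholds each pixel at T = m + k·s, the local window mean plus k times the local standard deviation. A pure-NumPy sketch using integral images follows; the window size and k = -0.2 are conventional choices for dark text on a light background, not values taken from the paper.

```python
import numpy as np

def niblack_threshold(img, window=15, k=-0.2):
    """Niblack's locally adaptive threshold: T(x, y) = m(x, y) + k * s(x, y),
    with m and s the mean and standard deviation of a local window.
    Computed via integral images of pixel values and squared values."""
    img = img.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode='edge')
    # Integral images, with a leading zero row/column for clean differencing.
    ii = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
    ii2 = np.pad(np.cumsum(np.cumsum(padded ** 2, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    y0, y1, x0, x1 = y, y + window, x, x + window
    area = window * window
    s1 = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    s2 = ii2[y1, x1] - ii2[y0, x1] - ii2[y1, x0] + ii2[y0, x0]
    mean = s1 / area
    var = np.maximum(s2 / area - mean ** 2, 0.0)
    threshold = mean + k * np.sqrt(var)
    return img > threshold  # True = background for dark-on-light text
```

With negative k the threshold dips below the local mean, so dark character pixels fall under it while nearby background pixels stay above it.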

700 citations

Journal ArticleDOI
TL;DR: This work presents algorithms for detecting and tracking text in digital video; the system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks.
Abstract: Text that appears in a scene or is graphically added to video can provide an important supplemental source of index information as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum of squared difference (SSD) based module to find the initial position and a contour-based module to refine the position. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.
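The SSD-based initial-position module can be sketched as brute-force template matching: slide the previous frame's text block over the new frame and keep the offset with the smallest sum of squared differences. A minimal illustration follows; a real tracker would restrict the search to a small window around the last known position rather than scan the whole frame.

```python
import numpy as np

def ssd_match(frame, template):
    """Find the offset of `template` in `frame` minimising the sum of
    squared differences (SSD). Returns ((row, col), best_ssd)."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            d = frame[y:y + th, x:x + tw] - template
            ssd = np.sum(d * d)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best
```

When the template is cut directly from the frame, the SSD at the true offset is exactly zero, which is why the paper pairs this coarse module with a contour-based refinement stage.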

635 citations


"ICDAR 2003 robust reading competiti..." refers background in this paper

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....


Journal ArticleDOI
Rainer Lienhart1, A. Wernicke
TL;DR: This work proposes a novel method for localizing and segmenting text in complex images and videos that is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video.
Abstract: Many images, especially those used for page design on Web pages, as well as videos contain visible text. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. We propose a novel method for localizing and segmenting text in complex images and videos. Text lines are identified by using a complex-valued multilayer feed-forward network trained to detect text at a fixed scale and position. The network's output at all scales and positions is integrated into a single text-saliency map, serving as a starting point for candidate text lines. In the case of video, these candidate text lines are refined by exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed height of 100 pixels and segmented into a binary image with black characters on white background. For videos, temporal redundancy is exploited to improve segmentation performance. Input images and videos can be of any size due to a true multiresolution approach. Moreover, the system is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video, so that one text bitmap is created for all instances of that text line. Therefore, our text segmentation results can also be used for object-based video encoding such as that enabled by MPEG-4.

478 citations


"ICDAR 2003 robust reading competiti..." refers background in this paper

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....


Proceedings ArticleDOI
10 Dec 2002
TL;DR: An algorithm is presented that localizes artificial text in images and videos using a measure of accumulated gradients and morphological post-processing; the quality of the localized text is improved by robust multiple-frame integration.
Abstract: The systems currently available for content based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge into the indexing process is to use the text included in the images and video sequences. It is rich in information but easy to use, e.g. by key word based queries. In this paper we present an algorithm to localize artificial text in images and videos using a measure of accumulated gradients and morphological post processing to detect the text. The quality of the localized text is improved by robust multiple frame integration. A new technique for the binarization of the text boxes is proposed. Finally, detection and OCR results for a commercial OCR are presented.
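The accumulated-gradients measure can be sketched as follows: sum horizontal gradient magnitudes over a sliding window, so that regions with dense vertical strokes (text) stand out against smooth background. This is an illustrative simplification, not the paper's exact operator, and the morphological post-processing that turns the saliency map into text boxes is omitted.

```python
import numpy as np

def text_saliency(gray, win=8):
    """Accumulate |dI/dx| over a horizontal sliding window.

    Text regions, with their closely spaced vertical strokes, accumulate
    large values; flat background accumulates nothing. Uses a cumulative
    sum so each window sum is a single subtraction.
    """
    g = np.abs(np.diff(gray.astype(np.float64), axis=1))
    # c[:, j] holds the sum of g[:, :j], so g[:, a:b] sums to c[:, b] - c[:, a].
    c = np.cumsum(np.pad(g, ((0, 0), (1, 0))), axis=1)
    h, w = g.shape
    acc = np.zeros_like(g)
    for x in range(w):
        x0, x1 = max(0, x - win // 2), min(w, x + win // 2 + 1)
        acc[:, x] = c[:, x1] - c[:, x0]
    return acc
```

Thresholding the resulting map, followed by morphological closing, would merge individual stroke responses into candidate text boxes.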

262 citations


"ICDAR 2003 robust reading competiti..." refers background in this paper

  • ...In recent years there has been some significant research into these general reading systems that are able to locate and/or read text in scene images [11, 2, 3, 4, 1, 10]....
