Home
/
Authors
/
Dimosthenis Karatzas

Author

Dimosthenis Karatzas

Other affiliations: University of Liverpool, University of Barcelona, University of Southampton ...read more

Bio: Dimosthenis Karatzas is an academic researcher from Autonomous University of Barcelona. The author has contributed to research in topics: Image retrieval & Question answering. The author has an hindex of 24, co-authored 114 publications receiving 1624 citations. Previous affiliations of Dimosthenis Karatzas include University of Liverpool & University of Barcelona.

Papers published on a yearly basis

2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition — RRC-MLT-2019

[...]

Nibal Nayef, Cheng-Lin Liu¹, Jean-Marc Ogier², Yash Patel³, Michal Busta³, Pinaki Nath Chowdhury⁴, Dimosthenis Karatzas⁵, Wafa Khlif², Jiri Matas³, Umapada Pal⁴, Jean-Christophe Burie² - Show less +7 more•Institutions (5)

Chinese Academy of Sciences¹, University of La Rochelle², Czech Technical University in Prague³, Indian Statistical Institute⁴, Autonomous University of Barcelona⁵

01 Sep 2019

TL;DR: The RRC-MLT-2019 challenge as discussed by the authors was the first edition of the multi-lingual scene text (MLT) detection and recognition challenge, which aims to systematically benchmark and push the state-of-the-art forward.

...read moreread less

Abstract: With the growing cosmopolitan culture of modern cities, the need of robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge.

...read moreread less

175 citations

Proceedings Article•DOI•

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

[...]

Huang Zheng¹, Kai Chen, Jianhua He², Xiang Bai³, Dimosthenis Karatzas⁴, Shijian Lu⁵, C. V. Jawahar⁶ - Show less +3 more•Institutions (6)

Shanghai Jiao Tong University¹, Aston University², Huazhong University of Science and Technology³, Autonomous University of Barcelona⁴, Nanyang Technological University⁵, International Institute of Information Technology, Hyderabad⁶

01 Sep 2019

TL;DR: The ICDAR 2019 Challenge on "Scanned receipts OCR and key information extraction" (SROIE) covers important aspects related to the automated analysis of scanned receipts, and is considered to evolve into a useful resource for the community, drawing further attention and promoting research and development efforts in this field.

...read moreread less

Abstract: The ICDAR 2019 Challenge on "Scanned receipts OCR and key information extraction" (SROIE) covers important aspects related to the automated analysis of scanned receipts. The SROIE tasks play a key role in many document analysis systems and hold significant commercial potential. Although a lot of work has been published over the years on administrative document analysis, the community has advanced relatively slowly, as most datasets have been kept private. One of the key contributions of SROIE to the document analysis community is to offer a first, standardized dataset of 1000 whole scanned receipt images and annotations, as well as an evaluation procedure for such tasks. The Challenge is structured around three tasks, namely Scanned Receipt Text Localization (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3). The competition opened on 10th February, 2019 and closed on 5th May, 2019. We received 29, 24 and 18 valid submissions received for the three competition tasks, respectively. This report presents the competition datasets, define the tasks and the evaluation protocols, offer detailed submission statistics, as well as an analysis of the submitted performance. While the tasks of text localization and recognition seem to be relatively easy to tackle, it is interesting to observe the variety of ideas and approaches proposed for the information extraction task. According to the submissions' performance we believe there is still margin for improving information extraction performance, although the current dataset would have to grow substantially in following editions. Given the success of the SROIE competition evidenced by the wide interest generated and the healthy number of submissions from academic, research institutes and industry over different countries, we consider that the SROIE competition can evolve into a useful resource for the community, drawing further attention and promoting research and development efforts in this field.

...read moreread less

143 citations

Proceedings Article•DOI•

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT

[...]

Chee-Kheng Chng¹, Errui Ding², Jingtuo Liu², Dimosthenis Karatzas³, Chee Seng Chan¹, Lianwen Jin⁴, Yuliang Liu⁴, Yipeng Sun⁴, Chun Chet Ng¹, Canjie Luo⁴, Ni Zihan², ChuanMing Fang⁴, Shuaitao Zhang⁴, Junyu Han² - Show less +10 more•Institutions (4)

University of Malaya¹, Baidu², Autonomous University of Barcelona³, South China University of Technology⁴

16 Sep 2019

TL;DR: This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT that consists of three major challenges: i) scene text detection, ii) sceneText recognition, and iii) scenetext spotting.

...read moreread less

Abstract: This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics and participants' methods. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.

...read moreread less

119 citations

Proceedings Article•DOI•

Self-Supervised Learning of Visual Features through Embedding Images into Text Topic Spaces

[...]

Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar - Show less +1 more

01 Jul 2017

TL;DR: In this article, a self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents is proposed to train a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration.

...read moreread less

Abstract: End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches.

...read moreread less

110 citations

Posted Content•

DocVQA: A Dataset for VQA on Document Images

[...]

Minesh Mathew¹, Dimosthenis Karatzas², C. V. Jawahar¹•Institutions (2)

International Institute of Information Technology, Hyderabad¹, Autonomous University of Barcelona²

01 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: Although the existing models perform reasonably well on certain types of questions, there is large performance gap compared to human performance (94.36% accuracy).

...read moreread less

Abstract: We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding structure of the document is crucial. The dataset, code and leaderboard are available at this http URL

...read moreread less

96 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

Collapse

Cited by

PDF

Open Access

More filters

[신간의 별자리x] 우리/미술, 그리고 ‘슬픔의 박물관’

[...]

이화영

01 Jan 2015

12,972 citations

Journal Article•DOI•

Recent advances in convolutional neural networks

[...]

Jiuxiang Gu¹, Zhenhua Wang¹, Jason Kuen¹, Lianyang Ma¹, Amir Shahroudy¹, Bing Shuai¹, Ting Liu¹, Xingxing Wang¹, Gang Wang¹, Jianfei Cai¹, Tsuhan Chen¹ - Show less +7 more•Institutions (1)

Nanyang Technological University¹

01 May 2018-Pattern Recognition

TL;DR: A broad survey of the recent advances in convolutional neural networks can be found in this article, where the authors discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.

...read moreread less

3,125 citations

Proceedings Article•

Image Processing

[...]

E.E. Pissaloux¹•Institutions (1)

University of Paris¹

01 Jan 1994

TL;DR: The main focus in MUCKE is on cleaning large scale Web image corpora and on proposing image representations which are closer to the human interpretation of images.

...read moreread less

Abstract: MUCKE aims to mine a large volume of images, to structure them conceptually and to use this conceptual structuring in order to improve large-scale image retrieval. The last decade witnessed important progress concerning low-level image representations. However, there are a number problems which need to be solved in order to unleash the full potential of image mining in applications. The central problem with low-level representations is the mismatch between them and the human interpretation of image content. This problem can be instantiated, for instance, by the incapability of existing descriptors to capture spatial relationships between the concepts represented or by their incapability to convey an explanation of why two images are similar in a content-based image retrieval framework. We start by assessing existing local descriptors for image classification and by proposing to use co-occurrence matrices to better capture spatial relationships in images. The main focus in MUCKE is on cleaning large scale Web image corpora and on proposing image representations which are closer to the human interpretation of images. Consequently, we introduce methods which tackle these two problems and compare results to state of the art methods. Note: some aspects of this deliverable are withheld at this time as they are pending review. Please contact the authors for a preview.

...read moreread less

2,134 citations

Posted Content•

Recent Advances in Convolutional Neural Networks

[...]

Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen - Show less +8 more

22 Dec 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper details the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation, and introduces various applications of convolutional neural networks in computer vision, speech and natural language processing.

...read moreread less

Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging on the rapid growth in the amount of the annotated data and the great improvements in the strengths of graphics processor units, the research on convolutional neural networks has been emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detailize the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.

...read moreread less

1,302 citations

Journal Article•DOI•

Reading Text in the Wild with Convolutional Neural Networks

[...]

Max Jaderberg¹, Karen Simonyan¹, Andrea Vedaldi¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2016-International Journal of Computer Vision

TL;DR: An end-to-end system for text spotting—localising and recognising text in natural scene images—and text based image retrieval and a real-world application to allow thousands of hours of news footage to be instantly searchable via a text query is demonstrated.

...read moreread less

Abstract: In this work we present an end-to-end system for text spotting--localising and recognising text in natural scene images--and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.

...read moreread less

1,054 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse