Author

Lluís Gómez i Bigorda

Bio: Lluís Gómez i Bigorda is an academic researcher at the Autonomous University of Barcelona. The author has contributed to research on the topics Noisy text analytics and Artificial intelligence. The author has an h-index of 2 and has co-authored 3 publications receiving 938 citations.

Papers

Proceedings Article

ICDAR 2013 Robust Reading Competition

Dimosthenis Karatzas¹, Faisal Shafait², Seiichi Uchida³, Masakazu Iwamura⁴, Lluís Gómez i Bigorda¹, Sergi Robles Mestre¹, Joan Mas¹, David Fernandez Mota¹, Jon Almazan¹, Lluís-Pere de las Heras¹

Autonomous University of Barcelona¹, University of Western Australia², Kyushu University³, Osaka Prefecture University⁴

25 Aug 2013

TL;DR: The datasets and ground truth specification are described, the performance evaluation protocols used are detailed, and the final results are presented along with a brief summary of the participating methods.

Abstract: This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.

1,191 citations

Posted Content

Improving Text Proposals for Scene Images with Fully Convolutional Networks

Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluís Gómez i Bigorda, Dimosthenis Karatzas, Andrew D. Bagdanov

16 Feb 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes an improvement over the original Text Proposals algorithm, combining it with Fully Convolutional Networks to improve the ranking of proposals.

Abstract: Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art.

18 citations

Book Chapter

A Multilingual Approach to Scene Text Visual Question Answering

Josep Brugués i Pujolràs, Lluís Gómez i Bigorda, Dimosthenis Karatzas

01 Jan 2022

2 citations

Posted Content

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Ali Furkan Biten, Lluís Gómez i Bigorda, Dimosthenis Karatzas¹

Autonomous University of Barcelona¹

04 Oct 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: This article proposes three simple yet efficient training augmentation methods for sentences, which require no new training data or increase in model size, and shows that the proposed methods can significantly diminish the models' object bias on hallucination metrics.

Abstract: Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is undesirable to humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences, which require no new training data or increase in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models' object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights will be made public.

Cited by

Journal Article

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi¹, Xiang Bai¹, Cong Yao¹

Huazhong University of Science and Technology¹

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework, and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.

Abstract: Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.

2,184 citations

Proceedings Article

ICDAR 2015 competition on Robust Reading

Dimosthenis Karatzas¹, Lluis Gomez-Bigorda¹, Anguelos Nicolaou¹, Suman K. Ghosh¹, Andrew D. Bagdanov¹, Masakazu Iwamura², Jiri Matas³, Lukas Neumann³, Vijay Chandrasekhar⁴, Shijian Lu⁴, Faisal Shafait⁵, Seiichi Uchida⁶, Ernest Valveny¹

Autonomous University of Barcelona¹, Osaka Prefecture University², Czech Technical University in Prague³, Institute for Infocomm Research Singapore⁴, National University of Science and Technology⁵, Kyushu University⁶

23 Aug 2015

TL;DR: A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text and tasks assessing End-to-End system performance have been introduced to all Challenges.

Abstract: Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated with more video sequences and more accurate ground truth data. Finally, tasks assessing End-to-End system performance have been introduced to all Challenges. The competition took place in the first quarter of 2015, and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification and the evaluation protocols are presented together with the results and a brief summary of the participating methods.

1,224 citations

Proceedings Article

EAST: An Efficient and Accurate Scene Text Detector

Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, He Weiran, Jiajun Liang

01 Jul 2017

TL;DR: This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes, and significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency.

Abstract: Previous approaches for scene text detection have already achieved promising performances across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps at 720p resolution.

1,161 citations

Proceedings Article

Speeding up Convolutional Neural Networks with Low Rank Expansions

Max Jaderberg¹, Andrea Vedaldi¹, Andrew Zisserman¹

University of Oxford¹

15 May 2014

TL;DR: Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain.

Abstract: The focus of this paper is speeding up the application of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the processing time, and so in this work we present two simple schemes for drastically speeding up these layers. This is achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain. Our methods are architecture agnostic, and can be easily applied to existing CPU and GPU convolutional frameworks for tuneable speedup performance. We demonstrate this with a real world network designed for scene text character recognition [15], showing a possible 2.5× speedup with no loss in accuracy, and 4.5× speedup with less than 1% drop in accuracy, still achieving state-of-the-art on standard benchmarks.
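The phrase "rank-1 in the spatial domain" can be illustrated with a minimal sketch. It is not the paper's method, which also exploits cross-channel and filter redundancy to build a basis; it only shows the underlying fact that a rank-1 2-D filter (the outer product of a column vector and a row vector) can be applied as two cheaper 1-D passes. The filter values, image, and function names below are hypothetical.

```python
def correlate2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def outer(v, h):
    """Outer product of a vertical factor v and a horizontal factor h."""
    return [[a * b for b in h] for a in v]


# Hypothetical rank-1 3x3 filter built from two 1-D factors.
v = [1, 2, 1]
h = [1, 0, -1]
full_kernel = outer(v, h)

# Small hypothetical integer image (5 rows, 6 columns).
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]

# One pass with the full 3x3 filter: 9 multiplications per output pixel.
direct = correlate2d_valid(image, full_kernel)

# Two passes with 1-D filters: roughly 3 + 3 multiplications per output pixel.
vertical = correlate2d_valid(image, [[a] for a in v])  # 3x1 column filter
separable = correlate2d_valid(vertical, [h])           # 1x3 row filter

print(direct == separable)
```

Both routes give the same output map; the saving grows with filter size, since a d×d filter costs d² multiplications per pixel while the two 1-D passes cost about 2d.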

1,159 citations

Proceedings Article

Synthetic Data for Text Localisation in Natural Images

Ankush Gupta¹, Andrea Vedaldi¹, Andrew Zisserman¹

University of Oxford¹

27 Jun 2016

TL;DR: In this article, a Fully-Convolutional Regression Network (FCRN) was proposed to perform text detection and bounding-box regression at all locations and multiple scales in an image.

Abstract: In this paper we introduce a new method for text detection in natural images. The method comprises two contributions: First, a fast and scalable engine to generate synthetic images of text in clutter. This engine overlays synthetic text to existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN) which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently-introduced YOLO detector, as well as other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark. Furthermore, it can process 15 images per second on a GPU.
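For readers unfamiliar with the F-measure quoted in these abstracts, the following is a minimal sketch of how it combines precision and recall as their harmonic mean. The rules for matching detections to ground truth in the ICDAR protocols are not described on this page, so the counts and function names below are hypothetical and only illustrate the arithmetic.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(true_positives, num_detections, num_ground_truth):
    """Precision, recall and F-measure from hypothetical match counts."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts: 100 detected boxes, 120 annotated words, 80 correct matches.
p, r, f = detection_scores(true_positives=80, num_detections=100, num_ground_truth=120)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Because it is a harmonic mean, the F-measure is pulled toward the lower of the two values, so a detector cannot score well by maximising precision or recall alone.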

1,142 citations
