Andres Mafla
Researcher at Autonomous University of Barcelona
Publications - 22
Citations - 456
Andres Mafla is an academic researcher from the Autonomous University of Barcelona. The author has contributed to research in the topics of Question answering and Image retrieval, has an h-index of 8, and has co-authored 16 publications receiving 231 citations.
Papers
Proceedings ArticleDOI
Scene Text Visual Question Answering
Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas +7 more
TL;DR: The ST-VQA dataset proposes a series of tasks of increasing difficulty in which reading the scene text, in the context provided by the visual information, is necessary to reason and generate an appropriate answer.
Posted Content
Scene Text Visual Question Answering
Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas +7 more
TL;DR: A new dataset, ST-VQA, is presented that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process, and a new evaluation metric is proposed for these tasks to account for both reasoning errors and shortcomings of the text recognition module.
Book ChapterDOI
Single Shot Scene Text Retrieval
TL;DR: This paper addresses the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. It proposes a single-shot CNN architecture that simultaneously predicts bounding boxes and a compact text representation of the words within them.
Proceedings ArticleDOI
ICDAR 2019 Competition on Scene Text Visual Question Answering
Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas +8 more
TL;DR: This paper presents the final results of the ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA), which introduces a new dataset comprising 23,038 images annotated with 31,791 question/answer pairs, where the answer is always grounded on text instances present in the image.
Proceedings ArticleDOI
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
TL;DR: This approach provides a stronger multimodal representation for this task and, as the experiments demonstrate, achieves state-of-the-art results on two different tasks: fine-grained classification and image retrieval.