Image Retrieval Using Textual Cues
Anand Mishra, Karteek Alahari, C. V. Jawahar
- pp 3040-3047
TLDR
An approach for the text-to-image retrieval problem based on textual content present in images, where the retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
Abstract
We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
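The query-driven idea in the abstract (find approximate character locations, then impose spatial constraints to rank images) can be illustrated with a minimal sketch. This is not the authors' method: the `Detection` format, the functions `query_score` and `rank_images`, and the simple left-to-right ordering constraint are all assumptions made for illustration, standing in for whatever character detector and spatial model the paper actually uses.

```python
from typing import Dict, List, Tuple

# Hypothetical detection format: (character, x-centre in pixels, detector confidence).
Detection = Tuple[str, float, float]


def query_score(query: str, detections: List[Detection]) -> float:
    """Mean confidence of the best left-to-right chain of detections spelling `query`.

    Returns 0.0 when no chain satisfies the ordering constraint.
    """
    query = query.lower()
    dets = sorted(detections, key=lambda d: d[1])
    neg = float("-inf")
    # best[j]: best total confidence of a chain matching the query prefix
    # processed so far and ending at detection j.
    best = [neg] * len(dets)
    for i, ch in enumerate(query):
        new = [neg] * len(dets)
        for j, (c, x, s) in enumerate(dets):
            if c.lower() != ch:
                continue
            if i == 0:
                new[j] = s
                continue
            # Spatial constraint (assumed): the previous character lies strictly to the left.
            prev = max(
                (best[k] for k in range(len(dets)) if dets[k][1] < x),
                default=neg,
            )
            if prev > neg:
                new[j] = prev + s
        best = new
    top = max(best, default=neg)
    return top / len(query) if top > neg else 0.0


def rank_images(query: str, database: Dict[str, List[Detection]]) -> List[Tuple[str, float]]:
    """Return (image id, score) pairs sorted by decreasing score."""
    scored = [(name, query_score(query, dets)) for name, dets in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under these assumptions, an image whose detections contain the query's characters in the wrong horizontal order scores zero, so it ranks below an image where the characters appear in reading order, even though no word is ever explicitly localized or recognized.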
Citations
Journal ArticleDOI
Reading Text in the Wild with Convolutional Neural Networks
TL;DR: An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, and a real-world application that makes thousands of hours of news footage instantly searchable via a text query is demonstrated.
Journal ArticleDOI
Word Spotting and Recognition with Embedded Attributes
TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem; the representation is very fast to compute and, especially, to compare.
Proceedings ArticleDOI
Scene Text Visual Question Answering
Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
TL;DR: The ST-VQA dataset is introduced, together with a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer.
Posted Content
Scene Text Visual Question Answering
Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
TL;DR: A new dataset, ST-VQA, is presented that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process and proposes a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module.
Proceedings ArticleDOI
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
TL;DR: This paper proposes a multimodal transformer architecture accompanied by a rich representation for text in images, which naturally fuses different modalities homogeneously by embedding them into a common semantic space where self-attention is applied to model inter- and intra-modality context.
References
Proceedings ArticleDOI
Detecting text in natural scenes with stroke width transform
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Proceedings ArticleDOI
End-to-end scene text recognition
TL;DR: While scene text recognition has generally been treated with highly domain-specific methods, the results demonstrate the suitability of applying generic computer vision methods.
Proceedings ArticleDOI
Real-time scene text localization and recognition
Lukas Neumann, Jiri Matas
TL;DR: The proposed end-to-end real-time scene text localization and recognition method achieves state-of-the-art text localization results amongst published methods and it is the first one to report results for end-to-end text recognition.
Journal ArticleDOI
Efficient Additive Kernels via Explicit Feature Maps
Andrea Vedaldi, Andrew Zisserman
TL;DR: This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems.