Image Retrieval Using Textual Cues

doi:10.1109/ICCV.2013.378

Open Access, Proceedings Article

Anand Mishra, +2 more

- pp 3040-3047

TLDR

An approach for the text-to-image retrieval problem based on textual content present in images, where the retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.

Abstract:

We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce: IIIT scene text retrieval, Sports-10K and TV series-1M.
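The query-driven idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Detection` record, the scoring functions, the equal weighting of the two terms and the `max_dx`/`max_dy` thresholds are all hypothetical, chosen only to show how approximate character locations plus a left-to-right spatial constraint can produce a ranked list without recognizing whole words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of a character detector for one image window."""
    char: str     # hypothesised character label (lowercase)
    x: float      # horizontal centre of the window
    y: float      # vertical centre of the window
    score: float  # detector confidence in [0, 1]


def best_score(dets, ch):
    """Highest confidence with which `ch` is detected anywhere in the image."""
    return max((d.score for d in dets if d.char == ch), default=0.0)


def pair_score(dets, a, b, max_dx=60.0, max_dy=15.0):
    """Best joint confidence that `b` appears just to the right of `a`.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    best = 0.0
    for da in dets:
        if da.char != a:
            continue
        for db in dets:
            if db.char != b:
                continue
            dx, dy = db.x - da.x, abs(db.y - da.y)
            if 0 < dx <= max_dx and dy <= max_dy:
                best = max(best, min(da.score, db.score))
    return best


def image_score(dets, query):
    """Average of character-presence evidence and ordered-pair evidence."""
    query = query.lower()
    if not query:
        return 0.0
    unary = sum(best_score(dets, c) for c in query) / len(query)
    if len(query) == 1:
        return unary
    pairs = list(zip(query, query[1:]))
    pairwise = sum(pair_score(dets, a, b) for a, b in pairs) / len(pairs)
    return 0.5 * (unary + pairwise)


def rank_images(database, query):
    """Return image names sorted from most to least relevant to `query`."""
    scores = {name: image_score(dets, query) for name, dets in database.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch an image whose detections contain the query characters in reading order scores higher than one that contains the same characters in a scrambled layout, because the scrambled layout contributes nothing to the pairwise term.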

Citations

Journal Article

Reading Text in the Wild with Convolutional Neural Networks

Max Jaderberg, +3 more

- 01 Jan 2016 -

International Journal of Computer Vision

TL;DR: An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, and a real-world application that makes thousands of hours of news footage instantly searchable via a text query is demonstrated.

Journal Article

Word Spotting and Recognition with Embedded Attributes

Jon Almazan, +3 more

- 17 Jul 2014 -

IEEE Transactions on Pattern Analysis an...

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem; the resulting representation is very fast to compute and, especially, to compare.

Proceedings Article

Scene Text Visual Question Answering

Ali Furkan Biten, +7 more

TL;DR: The ST-VQA dataset is used to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer.

Posted Content

Scene Text Visual Question Answering

Ali Furkan Biten, +7 more

- 31 May 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A new dataset, ST-VQA, is presented that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process, together with a new evaluation metric for these tasks that accounts both for reasoning errors and for shortcomings of the text recognition module.

Proceedings Article

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

Ronghang Hu, +3 more

TL;DR: A multimodal transformer architecture is proposed, accompanied by a rich representation for text in images, which naturally fuses different modalities homogeneously by embedding them into a common semantic space where self-attention is applied to model inter- and intra-modality context.

References

Proceedings Article

Scene Text Recognition Using Part-Based Tree-Structured Character Detection

Cunzhao Shi, +5 more

TL;DR: A novel scene text recognition method using part-based tree-structured character detection that outperforms state-of-the-art methods significantly both for character detection and word recognition.

International Journal of Computer Vision

VQA: Visual Question Answering

Stanislaw Antol, +6 more

Related Papers (5)

ICDAR 2013 Robust Reading Competition

ICDAR 2015 competition on Robust Reading

ImageNet: A large-scale hierarchical image database

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

VQA: Visual Question Answering