Image Retrieval Using Textual Cues

doi:10.1109/ICCV.2013.378

Open Access, Proceedings Article

Anand Mishra, +2 more

- pp 3040-3047

Abstract:

We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
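The query-driven idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes character detections (label, position, confidence) have already been computed for every image, and the names `Detection`, `presence_score`, `spatial_score`, `rank_images`, the thresholds `max_dx` and `max_dy`, and the weight `alpha` are all hypothetical choices made for illustration. Each image is scored by how confidently the query's characters are detected, plus how many consecutive character pairs appear in left-to-right order on roughly the same line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    char: str     # hypothesised character label
    x: float      # horizontal centre of the detection window
    y: float      # vertical centre of the detection window
    score: float  # detector confidence in [0, 1]


def presence_score(query, detections):
    """Mean, over query characters, of the best detection score for that character."""
    total = 0.0
    for ch in query:
        total += max((d.score for d in detections if d.char == ch), default=0.0)
    return total / len(query)


def spatial_score(query, detections, max_dx=80.0, max_dy=20.0):
    """Fraction of consecutive query character pairs that occur left to right,
    close together, and on roughly the same text line (illustrative constraint)."""
    pairs = list(zip(query, query[1:]))
    if not pairs:
        return 1.0
    hits = 0
    for a, b in pairs:
        hits += any(
            da.char == a and db.char == b
            and 0 < db.x - da.x <= max_dx
            and abs(db.y - da.y) <= max_dy
            for da in detections for db in detections
        )
    return hits / len(pairs)


def rank_images(query, database, alpha=0.5):
    """Return image names sorted by a weighted sum of presence and spatial scores."""
    query = query.lower()
    if not query:
        raise ValueError("query must be a non-empty string")
    scored = [
        (alpha * presence_score(query, dets)
         + (1 - alpha) * spatial_score(query, dets), name)
        for name, dets in database.items()
    ]
    # Highest score first; ties broken by name for a deterministic order.
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy database with made-up detections (not real data).
database = {
    "shop_sign.jpg": [Detection(c, 10 + 20 * i, 50, 0.9) for i, c in enumerate("shop")],
    "scrambled.jpg": [Detection(c, 10 + 20 * i, 50, 0.9) for i, c in enumerate("pohs")],
    "landscape.jpg": [],
}

print(rank_images("shop", database))
```

In this toy setup the image whose characters appear in the queried order ranks first, the image containing the same characters in reversed order ranks second because it satisfies none of the pairwise constraints, and the image with no detections ranks last. No full word localization or recognition step is involved, which is the point the abstract makes.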

Citations

Journal ArticleDOI

Reading Text in the Wild with Convolutional Neural Networks

Max Jaderberg, +3 more

- 01 Jan 2016 -

International Journal of Computer Vision

TL;DR: An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, and a real-world application that makes thousands of hours of news footage instantly searchable via a text query is demonstrated.

Journal ArticleDOI

Word Spotting and Recognition with Embedded Attributes

Jon Almazan, +3 more

- 17 Jul 2014 -

IEEE Transactions on Pattern Analysis an...

TL;DR: An approach in which both word images and text strings are embedded in a common vectorial subspace, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem; the resulting representation is very fast to compute and, especially, to compare.

Proceedings ArticleDOI

Scene Text Visual Question Answering

Ali Furkan Biten, +7 more

TL;DR: The ST-VQA dataset is introduced together with a series of tasks of increasing difficulty for which reading the scene text, in the context provided by the visual information, is necessary to reason and generate an appropriate answer.

Posted Content

Scene Text Visual Question Answering

Ali Furkan Biten, +7 more

- 31 May 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A new dataset, ST-VQA, is presented that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process; a new evaluation metric is also proposed for these tasks to account for both reasoning errors and shortcomings of the text recognition module.

Proceedings ArticleDOI

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

Ronghang Hu, +3 more

TL;DR: A multimodal transformer architecture accompanied by a rich representation for text in images is proposed, which naturally fuses different modalities homogeneously by embedding them into a common semantic space where self-attention is applied to model inter- and intra-modality context.

References

Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods

John Platt

Proceedings ArticleDOI

Detecting text in natural scenes with stroke width transform

Boris Epshtein, +2 more

TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.

Proceedings ArticleDOI

End-to-end scene text recognition

Kai Wang, +2 more

TL;DR: While scene text recognition has generally been treated with highly domain-specific methods, the results demonstrate the suitability of applying generic computer vision methods.

Proceedings ArticleDOI

Real-time scene text localization and recognition

Lukas Neumann, +1 more

TL;DR: The proposed end-to-end real-time scene text localization and recognition method achieves state-of-the-art text localization results amongst published methods and is the first to report results for end-to-end text recognition.

Journal ArticleDOI

Efficient Additive Kernels via Explicit Feature Maps

Andrea Vedaldi, +1 more

- 01 Mar 2012 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems.

International Journal of Computer Vision

VQA: Visual Question Answering

Stanislaw Antol, +6 more

Related Papers (5)

ICDAR 2013 Robust Reading Competition

ICDAR 2015 competition on Robust Reading

ImageNet: A large-scale hierarchical image database

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

VQA: Visual Question Answering