Image Retrieval Using Textual Cues

doi:10.1109/ICCV.2013.378

Open Access Proceedings Article

Anand Mishra, +2 more

- pp 3040-3047

TLDR

Proposes an approach to the text-to-image retrieval problem based on the textual content present in images, and evaluates retrieval performance on public scene text datasets as well as three large datasets: IIIT scene text retrieval, Sports-10K and TV series-1M.

Abstract:

We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. We evaluate retrieval performance on public scene text datasets as well as on three large datasets that we introduce: IIIT scene text retrieval, Sports-10K and TV series-1M.
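The query-driven idea in the abstract can be illustrated with a toy sketch. This is not the authors' scoring function: it assumes per-character detections are already available for every image, and it uses a simple stand-in spatial constraint (characters must appear left to right on roughly the same line). The names `CharDetection`, `score_image`, `rank_images` and the `max_dy` threshold are hypothetical, introduced only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CharDetection:
    """One approximate character hypothesis in an image (hypothetical type)."""
    char: str     # hypothesised character label
    x: float      # horizontal centre of the detection window
    y: float      # vertical centre of the detection window
    score: float  # classifier confidence, assumed to lie in [0, 1]


def score_image(query: str, detections: list[CharDetection],
                max_dy: float = 20.0) -> float:
    """Mean confidence of the best detection chain that spells `query`.

    A chain is valid only if consecutive characters move left to right
    and stay within `max_dy` pixels vertically (an assumed constraint).
    Returns 0.0 when no complete chain exists.
    """
    query = query.lower()
    if not query:
        return 0.0
    prev: dict[int, float] = {}  # detection index -> best chain score so far
    for i, ch in enumerate(query):
        cur: dict[int, float] = {}
        for j, det in enumerate(detections):
            if det.char.lower() != ch:
                continue
            if i == 0:
                cur[j] = det.score
                continue
            # Extend the best earlier chain that satisfies the constraint.
            best = None
            for k, chain_score in prev.items():
                p = detections[k]
                if p.x < det.x and abs(p.y - det.y) <= max_dy:
                    if best is None or chain_score > best:
                        best = chain_score
            if best is not None:
                cur[j] = best + det.score
        if not cur:
            return 0.0  # some query character could not be placed
        prev = cur
    return max(prev.values()) / len(query)


def rank_images(query: str,
                database: dict[str, list[CharDetection]]) -> list[tuple[str, float]]:
    """Return (image name, score) pairs sorted from best to worst match."""
    scored = [(name, score_image(query, dets)) for name, dets in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_images("motel", database)` with a dictionary that maps image names to detection lists returns the images ordered by how well their detections can spell the query under the assumed constraint. The point of the sketch is that the query drives the search, so no image needs a complete, exact transcription in advance.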

Citations

Proceedings Article

Scene Text Analysis using Deep Belief Networks

Anupama Ray, +2 more

TL;DR: This paper is the first to report scene text recognition using deep belief networks, and it achieves improved recognition results on the Chars74K English, Kannada and SVT-CHAR datasets in comparison to state-of-the-art algorithms.

Dissertation

Understanding Text in Scene Images

Anand Mishra

TL;DR: This thesis proposes a robust text segmentation (binarization) technique, and uses it to improve the recognition performance of scene text and presents an energy minimization framework that exploits both bottom-up and top-down cues for recognizing words extracted from street images.

Book Chapter

Multi-modal Correlated Centroid Space for Multi-lingual Cross-Modal Retrieval

Aditya Mogadala, +1 more

TL;DR: Experimental results show that C2SUR outperforms the existing state-of-the-art English cross-modal retrieval approaches and achieves similar results for other languages.

Posted Content

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

Sangeeth Reddy, +5 more

- 19 May 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: RoadText-1K is a dataset for text detection and recognition in driving videos. It contains 1000 video clips of driving, collected without any bias towards text, with annotations for text bounding boxes and transcriptions in every frame.

Journal Article

Fusion of 3D GIS, Vision, Inertial and Magnetic Data for Improved Urban Pedestrian Navigation and Augmented Reality Applications

Nicolas Antigny, +2 more

- 01 Sep 2018 -

Annual of Navigation

TL;DR: A long pedestrian path in an urban environment with a sparsely known 3D model of urban furniture has permitted validation of the contribution of sensor fusion that improves the positioning accuracy and allows characterization of the 3D Geographical Information System content directly onsite using Augmented Reality.

References

Proceedings Article

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

Proceedings Article

Rapid object detection using a boosted cascade of simple features

Paul A. Viola, +1 more

TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.

Book

Introduction to Information Retrieval

Christopher D. Manning, +2 more

TL;DR: This book presents an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.

Journal Article

Object Detection with Discriminatively Trained Part-Based Models

Pedro F. Felzenszwalb, +3 more

- 01 Sep 2010 -

IEEE Transactions on Pattern Analysis an...

TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.

Proceedings Article

Video Google: a text retrieval approach to object matching in videos

Sivic, +1 more

TL;DR: An approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video, represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion.

International Journal of Computer Vision

VQA: Visual Question Answering

Stanislaw Antol, +6 more

Related Papers (5)

ICDAR 2013 Robust Reading Competition

ICDAR 2015 competition on Robust Reading

ImageNet: A large-scale hierarchical image database

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

VQA: Visual Question Answering