Open Access Proceedings Article

Image Retrieval Using Textual Cues

TLDR
An approach to the text-to-image retrieval problem based on the textual content present in images; retrieval performance is evaluated on public scene text datasets as well as on three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
Abstract
We present an approach to the text-to-image retrieval problem based on the textual content present in images. Given recent developments in understanding text in images, an appealing approach to this problem is to localize and recognize the text, and then query the database as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and we propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach: we find approximate locations of the characters of the text query, and then impose spatial constraints to generate a ranked list of images from the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
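The query-driven idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `CharDetection` record, the `word_score` and `rank_images` helpers, and the spatial thresholds `max_gap` and `max_dy` are all hypothetical, and the abstract does not say how characters are detected or which spatial constraints are used. The sketch assumes some character detector has already produced scored, located character candidates for every database image. For each image it searches, by dynamic programming, for the highest-scoring left-to-right chain of candidates that spells the query, and then ranks the images by that score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CharDetection:
    """Hypothetical character candidate: label, box centre and confidence."""
    char: str
    x: float      # horizontal centre of the candidate
    y: float      # vertical centre of the candidate
    score: float  # detector confidence (higher is better)


def word_score(query: str, dets: List[CharDetection],
               max_gap: float = 40.0, max_dy: float = 10.0) -> Optional[float]:
    """Best length-normalized score of a chain of candidates spelling `query`.

    Hypothetical spatial constraints between consecutive characters: the next
    one lies to the right (0 < dx <= max_gap) and at a similar height
    (|dy| <= max_dy). Returns None when no valid chain exists.
    """
    q = query.lower()
    if not q:
        return None
    # prev maps candidate index -> best chain score ending at that candidate
    prev: Dict[int, float] = {
        j: d.score for j, d in enumerate(dets) if d.char.lower() == q[0]
    }
    for ch in q[1:]:
        cur: Dict[int, float] = {}
        for j, d in enumerate(dets):
            if d.char.lower() != ch:
                continue
            best: Optional[float] = None
            for i, s in prev.items():
                dx, dy = d.x - dets[i].x, d.y - dets[i].y
                # dx > 0 also prevents reusing the same candidate twice
                if 0 < dx <= max_gap and abs(dy) <= max_dy:
                    if best is None or s > best:
                        best = s
            if best is not None:
                cur[j] = best + d.score
        prev = cur
        if not prev:
            return None
    return max(prev.values()) / len(q)


def rank_images(query: str,
                database: Dict[str, List[CharDetection]]) -> List[Tuple[str, float]]:
    """Rank image ids by word_score, dropping images with no valid chain."""
    scored = []
    for image_id, dets in database.items():
        s = word_score(query, dets)
        if s is not None:
            scored.append((image_id, s))
    return sorted(scored, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    db = {
        "img_a": [CharDetection("c", 10, 50, 0.9), CharDetection("a", 30, 51, 0.8),
                  CharDetection("t", 50, 49, 0.7)],
        "img_b": [CharDetection("c", 10, 50, 0.6), CharDetection("a", 200, 50, 0.9),
                  CharDetection("t", 220, 50, 0.9)],
    }
    # Only "img_a" is returned: in img_b the "a" is too far from the "c".
    print(rank_images("cat", db))
```

Scoring chains of character candidates, rather than recognized words, means no word-level box or transcription is ever required, which is the property the abstract emphasizes. The length normalization and the threshold values are illustrative choices only; the actual system's scoring may differ.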



Citations
Proceedings Article

Scene Text Analysis using Deep Belief Networks

TL;DR: This is the first paper to report scene text recognition using deep belief networks; it achieves improved recognition results over state-of-the-art algorithms on the Chars74K English, Chars74K Kannada and SVT-CHAR datasets.
Dissertation

Understanding Text in Scene Images

Mishra Anand
TL;DR: This thesis proposes a robust text segmentation (binarization) technique and uses it to improve scene text recognition performance; it also presents an energy minimization framework that exploits both bottom-up and top-down cues for recognizing words extracted from street images.
Book Chapter

Multi-modal Correlated Centroid Space for Multi-lingual Cross-Modal Retrieval

TL;DR: Experimental results show that C2SUR outperforms existing state-of-the-art English cross-modal retrieval approaches and achieves similar results for other languages.
Posted Content

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

TL;DR: RoadText-1K is a dataset for text detection and recognition in driving videos; it contains 1000 video clips of driving, collected without any bias towards text, with annotations of text bounding boxes and transcriptions in every frame.
Journal Article

Fusion of 3D GIS, Vision, Inertial and Magnetic Data for Improved Urban Pedestrian Navigation and Augmented Reality Applications

TL;DR: Experiments along a long pedestrian path in an urban environment, with a sparsely known 3D model of urban furniture, validate the contribution of sensor fusion: it improves positioning accuracy and allows the 3D Geographical Information System content to be characterized directly on site using Augmented Reality.