Image Retrieval Using Textual Cues
Anand Mishra, Karteek Alahari, C. V. Jawahar
- pp 3040-3047
TLDR
The paper presents an approach to the text-to-image retrieval problem based on the textual content present in images, and evaluates retrieval performance on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
Abstract:
We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
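The query-driven idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the detection format (a map from character to `(x_position, confidence)` candidates), the function names, the `max_gap` threshold and the equal weighting of the two score terms are all assumptions made for illustration. It only shows how approximate character locations plus a simple left-to-right ordering constraint can produce a ranked list of images for a text query.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: for each image, a map from character to a list
# of (x_position, confidence) candidate detections.
Detections = Dict[str, List[Tuple[float, float]]]


def char_score(dets: Detections, ch: str) -> float:
    """Best detection confidence for character `ch`, or 0.0 if absent."""
    return max((conf for _, conf in dets.get(ch, [])), default=0.0)


def pair_score(dets: Detections, a: str, b: str, max_gap: float) -> float:
    """Best joint confidence for `a` lying to the left of `b` within `max_gap`."""
    best = 0.0
    for xa, ca in dets.get(a, []):
        for xb, cb in dets.get(b, []):
            if 0.0 < xb - xa <= max_gap:
                best = max(best, ca * cb)
    return best


def query_score(dets: Detections, query: str, max_gap: float = 50.0) -> float:
    """Equal-weight mix of per-character evidence and ordering evidence."""
    q = query.lower()
    if not q:
        return 0.0
    unary = sum(char_score(dets, ch) for ch in q) / len(q)
    if len(q) == 1:
        return unary
    pairs = sum(pair_score(dets, a, b, max_gap) for a, b in zip(q, q[1:]))
    return 0.5 * unary + 0.5 * pairs / (len(q) - 1)


def rank_images(database: Dict[str, Detections], query: str) -> List[Tuple[str, float]]:
    """Return (image_name, score) pairs sorted from best to worst match."""
    scored = [(name, query_score(dets, query)) for name, dets in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up detections: both images contain the same characters, but only
# "shop.jpg" has them in the left-to-right order of the query.
database = {
    "shop.jpg": {"c": [(10, 0.9)], "a": [(30, 0.8)], "f": [(50, 0.7)], "e": [(70, 0.9)]},
    "scrambled.jpg": {"c": [(70, 0.9)], "a": [(10, 0.8)], "f": [(50, 0.7)], "e": [(30, 0.9)]},
    "empty.jpg": {},
}
ranking = rank_images(database, "cafe")
print([name for name, _ in ranking])
```

In this toy data the two non-empty images have identical per-character evidence, so the ordering term alone separates them. That is the role the abstract assigns to spatial constraints, without requiring an exact word localization or recognition step.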