Open Access, Proceedings Article

Image Retrieval Using Textual Cues

TLDR
An approach to the text-to-image retrieval problem based on textual content present in images. Retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
Abstract
We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
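The query-driven idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each image already comes with approximate character detections given as hypothetical `(character, x-centre, score)` tuples, and it uses a simple left-to-right ordering with a maximum horizontal gap (`max_gap`, an assumed parameter) as the spatial constraint. The function names `query_score` and `rank_images` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical detection format: (character, x-centre in pixels, detector score)
Detection = Tuple[str, float, float]


def query_score(query: str, detections: List[Detection], max_gap: float = 50.0) -> float:
    """Average score of the best left-to-right chain of detections spelling `query`.

    Returns 0.0 when no chain satisfies the spatial constraint.
    """
    if not query:
        return 0.0
    prev: List[Tuple[float, float]] = []  # (x, best chain score) for the previous character
    for i, ch in enumerate(query.lower()):
        cur: List[Tuple[float, float]] = []
        for c, x, s in detections:
            if c != ch:
                continue
            if i == 0:
                cur.append((x, s))
                continue
            # Spatial constraint (assumed): the previous character must lie
            # to the left of this one, no further away than max_gap.
            feasible = [ps for px, ps in prev if 0 < x - px <= max_gap]
            if feasible:
                cur.append((x, max(feasible) + s))
        if not cur:
            return 0.0  # some query character cannot be placed consistently
        prev = cur
    return max(s for _, s in prev) / len(query)


def rank_images(query: str, database: Dict[str, List[Detection]]) -> List[str]:
    """Return image names with a non-zero score, best match first."""
    scored = [(query_score(query, dets), name) for name, dets in database.items()]
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1])) if score > 0]


# Made-up example data, for illustration only.
database = {
    "shop.jpg": [("c", 10, 0.9), ("a", 30, 0.8), ("f", 50, 0.7), ("e", 70, 0.9)],
    "scrambled.jpg": [("e", 10, 0.9), ("f", 30, 0.9), ("a", 50, 0.9), ("c", 70, 0.9)],
    "noisy.jpg": [("c", 10, 0.5), ("a", 30, 0.4), ("f", 55, 0.6), ("e", 80, 0.5), ("x", 40, 0.9)],
}
ranking = rank_images("cafe", database)
```

In this sketch an image is never passed through a full localize-and-recognize pipeline. Only detections of the query's own characters are considered, and an image that contains the right characters in the wrong spatial order (here `scrambled.jpg`) receives a score of zero and is excluded from the ranking.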

Citations
Journal Article

Cross-Modality Retrieval by Joint Correlation Learning

TL;DR: This article proposes a cross-modal learning model with joint correlative calculation learning that achieves comparable performance to the state-of-the-art on three benchmarks, i.e., Flickr8k, Flickr30k, and MS-COCO.
Journal Article

An Efficient Industrial System for Vehicle Tyre (Tire) Detection and Text Recognition Using Deep Learning

TL;DR: This paper presents a first-of-its-kind, full-scale industrial system that can read tyre codes when installed along driveways, such as at gas stations or parking lots, with vehicles driving under 10mph, and shows promise for the intended application.
Journal Article

Real-time Lexicon-free Scene Text Retrieval

TL;DR: The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words, which can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database.
Posted Content

Robust Scene Text Recognition Using Sparse Coding based Features.

TL;DR: In this article, the HSC features are extracted by computing sparse codes with dictionaries that are learned from data using K-SVD, and aggregating per-pixel sparse codes to form local histograms, and the final recognition results are obtained by searching for the words which correspond to the maximum value of the objective function.
Journal Article

A survey of methods, datasets and evaluation metrics for visual question answering

TL;DR: This paper discusses some of the core concepts used in VQA systems, presents a comprehensive survey of past efforts to address this problem, and discusses some new datasets developed in 2019 and 2020.
References
Proceedings Article

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Proceedings Article

Rapid object detection using a boosted cascade of simple features

TL;DR: A machine learning approach for visual object detection that is capable of processing images extremely rapidly and achieving high detection rates. It introduces a new image representation called the "integral image", which allows the features used by the detector to be computed very quickly.
Book

Introduction to Information Retrieval

TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Journal Article

Object Detection with Discriminatively Trained Part-Based Models

TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Proceedings Article

Video Google: a text retrieval approach to object matching in videos

TL;DR: An approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion.