Open Access Proceedings Article
A Model for Learning the Semantics of Pictures
Victor Lavrenko, R. Manmatha, Jiwoon Jeon
- Vol. 16, pp 553-560
TLDR
An approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries using a formalism that models the generation of annotated images.
Abstract:
We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.
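The abstract describes a joint model of region features and words that yields the probability of a word given an image's regions. The following is a minimal sketch of one way such a model could be computed, not the authors' implementation. It assumes a uniform prior over training images, a Gaussian kernel density estimate for region features, and a smoothed word distribution per training image; none of these choices is stated in the abstract. The names `annotate`, `bandwidth`, `mu`, and `TRAINING_SET` are hypothetical, and the toy data is invented for illustration.

```python
import math

# Hypothetical toy training set: (region feature vectors, annotation words).
TRAINING_SET = [
    ([[0.9, 0.1], [0.8, 0.2]], ["tiger", "grass"]),
    ([[0.1, 0.9], [0.2, 0.8]], ["sky", "plane"]),
]


def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian density at x, centered on one training region."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2 * math.pi * bandwidth ** 2) ** (len(x) / 2)
    return math.exp(-sq_dist / (2 * bandwidth ** 2)) / norm


def region_density(region, train_regions, bandwidth):
    """Kernel density estimate of a query region under one training image."""
    total = sum(gaussian_kernel(region, t, bandwidth) for t in train_regions)
    return total / len(train_regions)


def word_prob(word, train_words, counts, total_count, mu):
    """Word probability under one training image, smoothed (assumption)
    toward the word's relative frequency in the whole collection."""
    background = counts[word] / total_count
    return (train_words.count(word) + mu * background) / (len(train_words) + mu)


def annotate(query_regions, training_set, bandwidth=1.0, mu=1.0, top_k=2):
    """Rank words by P(word | query regions), marginalizing over training images."""
    counts = {}
    for _, words in training_set:
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    total_count = sum(counts.values())

    scores = dict.fromkeys(counts, 0.0)
    for regions, words in training_set:
        # Uniform prior over training images times the likelihood of all query regions.
        weight = 1.0 / len(training_set)
        for r in query_regions:
            weight *= region_density(r, regions, bandwidth)
        for w in counts:
            scores[w] += weight * word_prob(w, words, counts, total_count, mu)

    z = sum(scores.values())
    if z > 0:  # normalize the joint scores into a conditional distribution
        scores = {w: s / z for w, s in scores.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # A query region close to the first training image's regions.
    for word, prob in annotate([[0.85, 0.15]], TRAINING_SET, bandwidth=0.2):
        print(word, round(prob, 3))
```

Under these assumptions, annotation keeps the top-ranked words for an image, and retrieval for a single-word query could rank images by the same conditional probability of that word.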
Citations
Journal Article
Image retrieval: Ideas, influences, and trends of the new age
TL;DR: Almost 300 key theoretical and empirical contributions of the current decade related to image retrieval and automatic image annotation are surveyed, the spawning of related subfields is discussed, and the adaptation of existing image retrieval techniques to build systems that can be useful in the real world is considered.
Posted Content
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick
TL;DR: The Microsoft COCO Caption dataset and evaluation server are described, and several popular metrics, including BLEU, METEOR, ROUGE and CIDEr, are used to score candidate captions.
Proceedings Article
A new approach to cross-modal multimedia retrieval
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert R. G. Lanckriet, Roger Levy, Nuno Vasconcelos
TL;DR: It is shown that accounting for cross-modal correlations and for semantic abstraction both improve retrieval accuracy, and the resulting approaches are shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
Journal Article
Framing image description as a ranking task: data, models and evaluation metrics
TL;DR: This paper proposes to frame sentence-based image annotation as the task of ranking a given pool of captions, and emphasizes the importance of training on multiple captions per image and of capturing syntactic (word order-based) and semantic features of these captions.
Journal Article
Supervised Learning of Semantic Classes for Image Annotation and Retrieval
TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.
References
Journal Article
Normalized cuts and image segmentation
Jianbo Shi, Jitendra Malik
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Proceedings Article
Normalized cuts and image segmentation
Jianbo Shi, Jitendra Malik
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Journal Article
A language modeling approach to information retrieval
Jay Ponte, W. Bruce Croft
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.
Book Chapter
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary
TL;DR: This work shows how to cluster words that individually are difficult to predict into clusters that can be predicted well; for example, the distinction between train and locomotive cannot be predicted using the current set of features, but the underlying concept can.
Journal Article
Matching words and pictures
TL;DR: A new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text, is presented, and a number of models for the joint distribution of image regions and words are developed, including several which explicitly learn the correspondence between regions and words.