A Model for Learning the Semantics of Pictures

Open Access, Proceedings Article

- Vol. 16, pp 553-560

TLDR

The paper proposes an approach to learning the semantics of images, built on a formalism that models the generation of annotated images, which allows images to be automatically annotated with keywords and retrieved from text queries.

Abstract:

We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This model can be used to automatically annotate images and to retrieve them given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.
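
The abstract does not spell out the exact form of the joint model, so the following is only a minimal sketch of one plausible way to realize the idea, not the authors' implementation. It assumes a Gaussian kernel density over region feature vectors, additively smoothed word frequencies, and a uniform prior over training images; the names `annotate`, `bandwidth`, and `mu`, as well as the toy data, are hypothetical.

```python
import math

def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian density of x around center (bandwidth is an assumed free parameter)."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm

def region_likelihood(region, train_regions, bandwidth):
    """Kernel density estimate of P(region | J): average kernel over J's regions."""
    total = sum(gaussian_kernel(region, c, bandwidth) for c in train_regions)
    return total / len(train_regions)

def word_prob(word, train_words, vocab_size, mu):
    """P(word | J): word counts in J's annotation, smoothed toward a uniform distribution."""
    return (train_words.count(word) + mu / vocab_size) / (len(train_words) + mu)

def annotate(query_regions, training_set, vocabulary, bandwidth=1.0, mu=1.0):
    """Return a normalized estimate of P(word | query regions) for every vocabulary word.

    Each training image J is a (regions, words) pair. With a uniform prior over J,
    the joint score of a word is sum_J P(word | J) * prod_r P(r | J).
    """
    scores = {w: 0.0 for w in vocabulary}
    for train_regions, train_words in training_set:
        likelihood = 1.0
        for r in query_regions:
            likelihood *= region_likelihood(r, train_regions, bandwidth)
        for w in vocabulary:
            scores[w] += likelihood * word_prob(w, train_words, len(vocabulary), mu)
    total = sum(scores.values())
    if total == 0.0:
        return scores
    return {w: s / total for w, s in scores.items()}

# Toy data (hypothetical): two training images with 2-D region features.
training_set = [
    ([[0.0, 0.0], [0.1, 0.2]], ["sky", "water"]),
    ([[5.0, 5.0], [5.2, 4.9]], ["tiger", "grass"]),
]
vocabulary = ["sky", "water", "tiger", "grass"]

probs = annotate([[0.05, 0.1]], training_set, vocabulary)
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Annotation then amounts to keeping the top-ranked words for an image, and retrieval for a one-word query amounts to ranking images by the probability they assign to that word. For images with many regions, the product of likelihoods would be better accumulated in log space to avoid underflow.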

Citations

Journal Article

Image retrieval: Ideas, influences, and trends of the new age

Ritendra Datta, +3 more

- 08 May 2008 -

ACM Computing Surveys

TL;DR: Almost 300 key theoretical and empirical contributions from the current decade on image retrieval and automatic image annotation are surveyed, along with the spawning of related subfields and the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.

Posted Content

Microsoft COCO Captions: Data Collection and Evaluation Server

Xinlei Chen, +6 more

- 01 Apr 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The Microsoft COCO Caption dataset and evaluation server are described, and several popular metrics, including BLEU, METEOR, ROUGE, and CIDEr, are used to score candidate captions.

Proceedings Article

A new approach to cross-modal multimedia retrieval

Nikhil Rasiwasia, +6 more

TL;DR: Accounting for cross-modal correlations and for semantic abstraction are both shown to improve retrieval accuracy, and the cross-modal model is shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.

Journal Article

Framing image description as a ranking task: data, models and evaluation metrics

Micah Hodosh, +2 more

- 01 May 2013 -

Journal of Artificial Intelligence Resea...

TL;DR: This paper proposes framing sentence-based image annotation as the task of ranking a given pool of captions, and emphasizes the importance of training on multiple captions per image and of capturing syntactic (word order-based) and semantic features of these captions.

Journal Article

Supervised Learning of Semantic Classes for Image Annotation and Retrieval

Gustavo Carneiro, +3 more

- 01 Mar 2007 -

IEEE Transactions on Pattern Analysis an...

TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.

References

Journal Article

Normalized cuts and image segmentation

Jianbo Shi, +1 more

- 01 Aug 2000 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.

Proceedings Article

Normalized cuts and image segmentation

Jianbo Shi, +1 more

TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.

Journal Article

A language modeling approach to information retrieval

Jay Ponte, +1 more

TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.

Book Chapter

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Pinar Duygulu, +3 more

TL;DR: This work shows how to group words that are individually difficult to predict into clusters that can be predicted well; for example, the current set of features cannot predict the distinction between train and locomotive, but can predict the underlying concept.

Journal Article

Matching words and pictures

Kobus Barnard, +5 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: A new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text, is presented, and a number of models for the joint distribution of image regions and words are developed, including several which explicitly learn the correspondence between regions and words.

Related Papers (5)

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Automatic image annotation and retrieval using cross-media relevance models

Modeling annotated data

Matching words and pictures

Supervised Learning of Semantic Classes for Image Annotation and Retrieval