Image annotation using metric learning in semantic neighbourhoods

doi:10.1007/978-3-642-33712-3_60

- pp 836-849

TLDR

2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed; it performs comparably to the current state-of-the-art on three challenging image annotation datasets and shows significant improvements after metric learning.

Abstract:

Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class-imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak-labelling"). These two issues badly affect the performance of most existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods depends greatly on how features are compared, we also propose a metric learning framework over 2PKNN that jointly learns weights for multiple features and for distances. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
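
The two-step idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring details, so everything specific here is an assumption made for illustration: the function name `two_pass_knn`, the parameters `k1` and `scale`, the plain Euclidean distance, and the exponential distance weighting are hypothetical and not taken from the paper. The first pass picks a few nearest images per label, so rare labels are still represented; the second pass lets those neighbours vote with weights that decay with distance.

```python
import numpy as np

def two_pass_knn(query, train_feats, train_labels, k1=2, scale=1.0):
    """Score each label for `query` with a two-pass nearest-neighbour scheme.

    train_feats:  (n_images, n_dims) float array
    train_labels: (n_images, n_labels) binary array
    k1, scale:    illustrative hyper-parameters (assumed, not from the paper)
    """
    # Assumption: plain Euclidean distance; the paper learns a metric instead.
    dists = np.linalg.norm(train_feats - query, axis=1)
    n_labels = train_labels.shape[1]

    # Pass 1 (image-to-label): for every label, keep the k1 training images
    # closest to the query among those annotated with that label.
    neighbourhood = set()
    for label in range(n_labels):
        members = np.flatnonzero(train_labels[:, label])
        nearest = members[np.argsort(dists[members])[:k1]]
        neighbourhood.update(nearest.tolist())
    idx = np.array(sorted(neighbourhood), dtype=int)

    # Pass 2 (image-to-image): each selected neighbour votes for its own
    # labels with a weight that decays exponentially with its distance.
    weights = np.exp(-scale * dists[idx])
    return weights @ train_labels[idx]
```

Calling `two_pass_knn(query, feats, labels, k1=1)` returns one score per label; the highest-scoring labels would be taken as the predicted annotation. In the metric learning step mentioned in the abstract, the fixed Euclidean distance would be replaced by a learned weighted combination of feature distances.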

Citations

Image automatic annotation via multi-view deep representation

Image distance metric learning based on neighborhood sets for automatic image annotation

Efficient multi-modal fusion on supergraph for scalable image annotation

Learning to Rank Image Tags With Limited Training Examples

Scene-based automatic image annotation

References

Distance Metric Learning for Large Margin Nearest Neighbor Classification

Labeling images with a computer game

Pegasos: primal estimated sub-gradient solver for SVM

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Related Papers (5)

TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation

Multiple Bernoulli relevance models for image and video annotation

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

A Model for Learning the Semantics of Pictures

Supervised Learning of Semantic Classes for Image Annotation and Retrieval