Open Access · Book Chapter · DOI

Image annotation using metric learning in semantic neighbourhoods

TL;DR: 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed that performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
Abstract
Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak labelling"). These two issues badly affect the performance of most of the existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods greatly depends on how features are compared, we also propose a metric learning framework over 2PKNN that learns weights for multiple features as well as distances together. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm for multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
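The two steps described above can be sketched in a few lines of NumPy. This is a hedged illustration of the idea only, not the authors' implementation: the function name `two_pass_knn`, the Euclidean distance, the exponential weighting, and the parameter `K1` are all assumptions made for the sketch; the paper additionally learns the metric, which is omitted here.

```python
import numpy as np

def two_pass_knn(x_test, X_train, Y_train, K1=5, top=5):
    """Illustrative 2PKNN-style prediction (sketch, not the authors' code).

    X_train: (n, d) feature matrix; Y_train: (n, L) binary label matrix.
    Step 1 ("image-to-label"): for every label, keep the K1 training images
    nearest to x_test among those carrying that label, pooling a
    class-balanced semantic neighbourhood.
    Step 2 ("image-to-image"): score each label by exponentially decayed
    distances to the neighbourhood images that carry it.
    """
    d = np.linalg.norm(X_train - x_test, axis=1)   # distances to all train images
    neighbourhood = set()
    for l in range(Y_train.shape[1]):              # step 1: semantic neighbourhood
        idx = np.where(Y_train[:, l] > 0)[0]
        if idx.size:
            neighbourhood.update(idx[np.argsort(d[idx])[:K1]].tolist())
    nb = np.array(sorted(neighbourhood))
    w = np.exp(-d[nb])                             # step 2: distance-based weights
    scores = w @ Y_train[nb]                       # one score per label
    return np.argsort(-scores)[:top]               # indices of top-ranked labels
```

Because step 1 draws up to `K1` neighbours per label regardless of how frequent the label is, rare labels are represented in the neighbourhood on equal footing with common ones, which is how the sketch mirrors the paper's handling of class imbalance.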



Citations
Journal Article · DOI

A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics

TL;DR: This paper starts with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporates a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts.
Proceedings Article · DOI

NMF-KNN: Image Annotation Using Weighted Multi-view Non-negative Matrix Factorization

TL;DR: The key idea is to learn a query-specific generative model on the features and tags of nearest neighbours using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features to address the problem of feature fusion.
Proceedings Article · DOI

Automatic Image Annotation using Deep Learning Representations

TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image, and the CCA model is compared to a simple CNN-based linear regression model, which allows the CNN layers to be trained using back-propagation.
Proceedings Article · DOI

Love Thy Neighbors: Image Annotation by Exploiting Image Metadata

TL;DR: In this paper, the authors use image metadata non-parametrically to generate neighbourhoods of related images using Jaccard similarities, then use a deep neural network to blend visual information from the image and its neighbours.
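The Jaccard-based neighbourhood generation mentioned above is simple to sketch. The helper names `jaccard` and `metadata_neighbours` are hypothetical and only illustrate the similarity measure; the paper's deep network for blending visual information is not reproduced here.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two metadata tag sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def metadata_neighbours(query_meta, corpus_meta, k=3):
    """Rank corpus images by Jaccard similarity of metadata to the query
    and return the indices of the k most similar ones (a sketch only)."""
    sims = [(jaccard(query_meta, m), i) for i, m in enumerate(corpus_meta)]
    sims.sort(reverse=True)
    return [i for _, i in sims[:k]]
```

Because this step uses only metadata (e.g. user tags or group memberships), the neighbourhood can be formed without computing any visual features, which is what makes the non-parametric first stage cheap.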
Journal Article · DOI

A survey and analysis on automatic image annotation

TL;DR: A deep review of state-of-the-art AIA methods is presented, synthesizing 138 papers published over the past two decades, dividing AIA methods into five categories, and comparing their performance on benchmark datasets using standard evaluation metrics.
References
Proceedings Article · DOI

A revisit of Generative Model for Automatic Image Annotation using Markov Random Fields

TL;DR: A new approach based on multiple Markov random fields (MRFs) for semantic context modeling and learning is presented; a new potential function for site modeling based on a generative model is proposed, and local graphs are built for each annotation keyword.