Open Access · Book Chapter · DOI

Image annotation using metric learning in semantic neighbourhoods

TL;DR: 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed that performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
Abstract
Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak labelling"). These two issues badly affect the performance of most of the existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods greatly depends on how features are compared, we also propose a metric learning framework over 2PKNN that learns weights for multiple features as well as distances together. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm for multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
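The two steps described above can be sketched in a few lines of NumPy. This is a hedged illustration of the idea only, not the authors' implementation: the function name `two_pass_knn`, the Euclidean distance, the exponential weighting, and the parameter `K1` are all assumptions made for the sketch; the paper additionally learns the metric, which is omitted here.

```python
import numpy as np

def two_pass_knn(x_test, X_train, Y_train, K1=5, top=5):
    """Illustrative 2PKNN-style prediction (sketch, not the authors' code).

    X_train: (n, d) feature matrix; Y_train: (n, L) binary label matrix.
    Step 1 ("image-to-label"): for every label, keep the K1 training images
    nearest to x_test among those carrying that label, pooling a
    class-balanced semantic neighbourhood.
    Step 2 ("image-to-image"): score each label by exponentially decayed
    distances to the neighbourhood images that carry it.
    """
    d = np.linalg.norm(X_train - x_test, axis=1)   # distances to all train images
    neighbourhood = set()
    for l in range(Y_train.shape[1]):              # step 1: semantic neighbourhood
        idx = np.where(Y_train[:, l] > 0)[0]
        if idx.size:
            neighbourhood.update(idx[np.argsort(d[idx])[:K1]].tolist())
    nb = np.array(sorted(neighbourhood))
    w = np.exp(-d[nb])                             # step 2: distance-based weights
    scores = w @ Y_train[nb]                       # one score per label
    return np.argsort(-scores)[:top]               # indices of top-ranked labels
```

Because step 1 draws up to `K1` neighbours per label regardless of how frequent the label is, rare labels are represented in the neighbourhood on equal footing with common ones, which is how the sketch mirrors the paper's handling of class imbalance.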



Citations
Journal Article · DOI

A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics

TL;DR: This paper starts with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporates a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts.
Proceedings Article · DOI

NMF-KNN: Image Annotation Using Weighted Multi-view Non-negative Matrix Factorization

TL;DR: The key idea is to learn a query-specific generative model on the features and tags of nearest neighbours using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features to address the problem of feature fusion.
Proceedings Article · DOI

Automatic Image Annotation using Deep Learning Representations

TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image, and the CCA model is compared to a simple CNN-based linear regression model, which allows the CNN layers to be trained using back-propagation.
Proceedings Article · DOI

Love Thy Neighbors: Image Annotation by Exploiting Image Metadata

TL;DR: In this paper, the authors use image metadata non-parametrically to generate neighbourhoods of related images using Jaccard similarities, then use a deep neural network to blend visual information from the image and its neighbours.
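The Jaccard-based neighbourhood generation mentioned above is simple to sketch. The helper names `jaccard` and `metadata_neighbours` are hypothetical and only illustrate the similarity measure; the paper's deep network for blending visual information is not reproduced here.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two metadata tag sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def metadata_neighbours(query_meta, corpus_meta, k=3):
    """Rank corpus images by Jaccard similarity of metadata to the query
    and return the indices of the k most similar ones (a sketch only)."""
    sims = [(jaccard(query_meta, m), i) for i, m in enumerate(corpus_meta)]
    sims.sort(reverse=True)
    return [i for _, i in sims[:k]]
```

Because this step uses only metadata (e.g. user tags or group memberships), the neighbourhood can be formed without computing any visual features, which is what makes the non-parametric first stage cheap.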
Journal Article · DOI

A survey and analysis on automatic image annotation

TL;DR: A deep review of state-of-the-art AIA methods is presented, synthesizing 138 papers published over the past two decades, dividing AIA methods into five categories, and comparing their performance on benchmark datasets using standard evaluation metrics.
References
Proceedings Article · DOI

A revisit of Generative Model for Automatic Image Annotation using Markov Random Fields

TL;DR: A new approach based on multiple Markov random fields (MRFs) for semantic context modeling and learning is presented; a new potential function for site modeling based on a generative model is proposed, and local graphs are built for each annotation keyword.