Open Access Book Chapter (DOI)

Image annotation using metric learning in semantic neighbourhoods

TL;DR
2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed that performs comparably to the current state-of-the-art on three challenging image annotation datasets and shows significant improvements after metric learning.
Abstract
Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak labelling"). These two issues adversely affect the performance of most existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods greatly depends on how features are compared, we also propose a metric learning framework over 2PKNN that jointly learns weights for multiple features as well as distances. This is done in a large-margin setup by generalizing a well-known (single-label) classification metric learning algorithm for multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
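To make the two-step idea concrete, below is a minimal sketch of how a semantic neighbourhood could be built and used for label prediction, assuming a binary image-label matrix, Euclidean distances, and an exponential vote weighting. The parameter names (k1, beta) and these specific choices are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def two_pass_knn(query_feat, train_feats, train_labels, k1=5, beta=1.0):
    """Sketch of the two-step neighbourhood idea described in the abstract.

    Step 1 ("image-to-label"): for every label, keep the k1 training images
    annotated with that label that are closest to the query, so every label
    contributes equally to the semantic neighbourhood (addresses class imbalance).
    Step 2 ("image-to-image"): score labels by distance-weighted votes from
    the images collected in step 1.
    train_labels is a binary (n_images x vocab_size) matrix; the exponential
    weighting and parameters are assumptions for illustration.
    """
    vocab_size = train_labels.shape[1]
    dists = np.linalg.norm(train_feats - query_feat, axis=1)

    # Step 1: per-label nearest images form the semantic neighbourhood.
    neighbourhood = set()
    for label in range(vocab_size):
        has_label = np.where(train_labels[:, label] == 1)[0]
        if has_label.size == 0:
            continue
        closest = has_label[np.argsort(dists[has_label])[:k1]]
        neighbourhood.update(closest.tolist())

    # Step 2: distance-weighted voting over the neighbourhood.
    scores = np.zeros(vocab_size)
    for idx in neighbourhood:
        scores += np.exp(-beta * dists[idx]) * train_labels[idx]
    return scores  # rank labels by score and keep the top few as annotations
```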



Citations
Journal Article (DOI)

A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics

TL;DR: This paper starts with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporates a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts.
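As a quick illustration of the two-view starting point mentioned above (before the paper's third, semantic view is added), here is a minimal sketch using scikit-learn's CCA; the feature dimensions and random data are placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy stand-ins for visual and textual features (shapes are assumptions).
rng = np.random.default_rng(0)
visual = rng.normal(size=(200, 128))   # e.g. image descriptors
textual = rng.normal(size=(200, 50))   # e.g. tag occurrence vectors

# Two-view CCA projects both views into a shared latent space where
# correlated directions are aligned; cross-modal retrieval then compares
# images and tags by similarity in that space.
cca = CCA(n_components=16)
img_latent, txt_latent = cca.fit_transform(visual, textual)
```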
Proceedings Article (DOI)

NMF-KNN: Image Annotation Using Weighted Multi-view Non-negative Matrix Factorization

TL;DR: The key idea is to learn a query-specific generative model on the features and tags of nearest neighbors using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features to solve the problem of feature fusion.
Proceedings Article (DOI)

Automatic Image Annotation using Deep Learning Representations

TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image, and the CCA model is compared to a simple CNN-based linear regression model, which allows the CNN layers to be trained using back-propagation.
Proceedings Article (DOI)

Love Thy Neighbors: Image Annotation by Exploiting Image Metadata

TL;DR: In this paper, the authors use image metadata non-parametrically to generate neighborhoods of related images using Jaccard similarities, then use a deep neural network to blend visual information from the image and its neighbors.
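As a toy illustration of the metadata-based neighborhood step (the deep-network blending is not shown), here is a small Jaccard-similarity sketch; the metadata sets and neighborhood size are invented for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two metadata sets (e.g. user tags or groups)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy metadata: the neighborhood of a query image is the set of training
# images whose metadata overlaps most with the query's metadata.
meta = {
    "img1": {"beach", "sunset", "canon"},
    "img2": {"beach", "surf"},
    "img3": {"city", "night"},
}
query_meta = {"beach", "sunset"}
neighbours = sorted(meta, key=lambda k: -jaccard(query_meta, meta[k]))[:2]
print(neighbours)  # ['img1', 'img2']
```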
Journal Article (DOI)

A survey and analysis on automatic image annotation

TL;DR: A deep review of state-of-the-art AIA methods is presented by synthesizing 138 papers published during the past two decades, dividing AIA methods into five categories, and comparing their performance on benchmark datasets using standard evaluation metrics.
References
Proceedings Article (DOI)

SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition

TL;DR: This work considers visual category recognition in the framework of measuring similarities, or equivalently perceptual distances, to prototype examples of categories, and proposes a hybrid of nearest-neighbour and SVM classification which deals naturally with the multiclass setting, has reasonable computational complexity both in training and at run time, and yields excellent results in practice.
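A rough sketch of the hybrid idea follows: retrieve the query's K nearest neighbours, then train an SVM only on those neighbours and classify the query with it. The linear kernel and neighbourhood size here are simplifying assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def svm_knn_predict(query, train_X, train_y, k=30):
    """Hybrid kNN + local SVM: restrict learning to the query's neighbourhood."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_X)
    _, idx = nn.kneighbors(query.reshape(1, -1))
    neigh_X, neigh_y = train_X[idx[0]], train_y[idx[0]]
    if len(np.unique(neigh_y)) == 1:
        return neigh_y[0]                 # all neighbours agree: plain kNN answer
    clf = SVC(kernel="linear").fit(neigh_X, neigh_y)  # local multiclass SVM
    return clf.predict(query.reshape(1, -1))[0]
```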
Book Chapter (DOI)

Active Matching

TL;DR: This paper shows that the dramatically different approach of using priors dynamically to guide a feature-by-feature matching search can achieve global matching with far fewer image processing operations and lower overall computational cost.
Book

Computer Vision - ECCV 2002

TL;DR: A novel algorithm is presented for recovering a smooth manifold of unknown dimension and topology from a set of points known to belong to it; it can easily be applied when the ambient space is not Euclidean, which is important in many applications.
Proceedings Article (DOI)

Pegasos: Primal Estimated sub-GrAdient SOlver for SVM

TL;DR: A simple and effective iterative algorithm is proposed for solving the optimization problem cast by Support Vector Machines; it alternates between stochastic gradient descent steps and projection steps, and can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function.
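A minimal sketch of the Pegasos update for a linear SVM with labels in {-1, +1}: a stochastic sub-gradient step on the regularized hinge loss followed by projection onto the ball of radius 1/sqrt(lambda), which contains the optimum. The hyper-parameters below are illustrative.

```python
import numpy as np

def pegasos(X, y, lam=0.01, n_iters=1000, seed=0):
    """Pegasos-style primal solver: stochastic sub-gradient steps + projection."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)                    # decreasing step size
        if y[i] * (w @ X[i]) < 1:                # margin violated: hinge sub-gradient
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
        # Projection step: keep w inside the ball of radius 1/sqrt(lambda).
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```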
Journal Article (DOI)

Supervised Learning of Semantic Classes for Image Annotation and Retrieval

TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.