Open Access, Book Chapter

Image annotation using metric learning in semantic neighbourhoods

TLDR
2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed; it performs comparably to the current state of the art on three challenging image annotation datasets and shows significant improvements after metric learning.
Abstract
Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class-imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak-labelling"). These two issues badly affect the performance of most existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods depends greatly on how features are compared, we also propose a metric learning framework over 2PKNN that jointly learns weights for multiple features as well as distances. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state of the art on three challenging image annotation datasets, and shows significant improvements after metric learning.
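
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `two_pass_knn`, the parameter `k1`, the weighted L1 distance, and the exponential similarity are all assumptions chosen for illustration, and the weight vector `w` is only a hypothetical stand-in for whatever the metric learning stage would produce.

```python
import numpy as np

def two_pass_knn(query, features, labels, k1=2, w=None):
    """Score every label for `query` with a two-pass KNN sketch.

    features: (n_images, n_dims) float array of training images.
    labels:   (n_images, n_labels) 0/1 array of training annotations.
    w:        optional non-negative per-dimension weights
              (hypothetical stand-in for a learned metric).
    """
    n_dims = features.shape[1]
    if w is None:
        w = np.ones(n_dims)

    # Weighted L1 distance from the query to every training image (assumed form).
    dist = np.abs(features - query) @ w

    # Pass 1 (image-to-label): for each label, keep the k1 closest images
    # that carry it, so rare labels are represented as well as frequent ones.
    neighbourhood = set()
    for j in range(labels.shape[1]):
        members = np.flatnonzero(labels[:, j])
        closest = members[np.argsort(dist[members])[:k1]]
        neighbourhood.update(closest.tolist())
    idx = np.array(sorted(neighbourhood), dtype=int)

    # Pass 2 (image-to-image): each selected neighbour votes for its labels
    # with a weight that decays with its distance to the query.
    sim = np.exp(-dist[idx])
    return sim @ labels[idx]
```

Because the first pass draws the same number of candidates per label, a frequent label cannot crowd out a rare one in the neighbourhood, which is how the sketch reflects the class-imbalance argument made above.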


Citations
Journal Article

A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics

TL;DR: This paper starts with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporates a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts.
Proceedings Article

NMF-KNN: Image Annotation Using Weighted Multi-view Non-negative Matrix Factorization

TL;DR: The key idea is to learn a query-specific generative model on the features of nearest neighbors and tags using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features to solve the problem of feature fusion.
Proceedings Article

Automatic Image Annotation using Deep Learning Representations

TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image and the CCA model is compared to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.
Proceedings Article

Love Thy Neighbors: Image Annotation by Exploiting Image Metadata

TL;DR: In this paper, the authors use image metadata nonparametrically to generate neighborhoods of related images using Jaccard similarities, then use a deep neural network to blend visual information from the image and its neighbors.
Journal Article

A survey and analysis on automatic image annotation

TL;DR: A deep review of state-of-the-art AIA methods is presented, synthesizing 138 papers published during the past two decades, dividing AIA methods into five categories, and comparing their performance on benchmark datasets and standard evaluation metrics.
References
Proceedings Article

Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains

TL;DR: This paper formalizes and analyzes MLC within a probabilistic setting, and proposes a new method for MLC that generalizes and outperforms another approach, called classifier chains, that was recently introduced in the literature.
Proceedings Article

Large Margin Multi-Task Metric Learning

TL;DR: This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (LMNN) algorithm to the MTL paradigm and shows that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers.
Proceedings Article

Multi-label learning with incomplete class assignments

TL;DR: This work proposes a ranking based multi-label learning framework that explicitly addresses the challenge of learning from incompletely labeled data by exploiting the group lasso technique to combine the ranking errors.
Proceedings Article

Automatic image annotation using group sparsity

TL;DR: A regularization-based feature selection algorithm is proposed that leverages both the sparsity and clustering properties of features and is incorporated into the image annotation task; a novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both keyword similarity and relevance feedback.
Proceedings Article

Choosing linguistics over vision to describe images

TL;DR: This paper addresses the problem of automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions, and presents a generic method which benefits from all three sources simultaneously, and is capable of constructing novel descriptions.