scispace - formally typeset
Book ChapterDOI

Image annotation using metric learning in semantic neighbourhoods

TL;DR: 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed; it performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.

Abstract: Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class-imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak-labelling"). These two issues severely affect the performance of most of the existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities; thus it combines the benefits of both. Since the performance of nearest-neighbour based methods greatly depends on how features are compared, we also propose a metric learning framework over 2PKNN that learns weights for multiple features as well as distances together. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm for multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
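The two steps described above can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' implementation: the function name, the Gaussian distance weighting and the toy parameters are assumptions, and the real 2PKNN combines several base distances with learned weights.

```python
import numpy as np

def two_pknn(X_train, Y_train, x, K=3, sigma=1.0):
    """Illustrative sketch of 2PKNN (hypothetical helper, not the paper's code).

    Step 1 ("image-to-label"): for every label, keep the K training images
    closest to the query -> a balanced semantic neighbourhood.
    Step 2 ("image-to-image"): weight each neighbour by its distance to the
    query and accumulate label scores.
    """
    n_labels = Y_train.shape[1]
    dists = np.linalg.norm(X_train - x, axis=1)

    # Step 1: per-label nearest neighbours. Every label contributes at most
    # K images, which is what counters class-imbalance.
    neighbourhood = set()
    for l in range(n_labels):
        idx = np.where(Y_train[:, l] == 1)[0]
        if idx.size:
            neighbourhood.update(idx[np.argsort(dists[idx])[:K]].tolist())

    # Step 2: distance-weighted vote over the semantic neighbourhood
    # (a Gaussian weighting is assumed here purely for illustration).
    scores = np.zeros(n_labels)
    for i in neighbourhood:
        w = np.exp(-dists[i] ** 2 / (2 * sigma ** 2))
        scores += w * Y_train[i]
    return scores
```

Ranking the returned scores and keeping the top few labels yields the annotation; since step 1 caps each label's contribution at K images, rare labels are not drowned out by frequent ones.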


Citations
Journal ArticleDOI
TL;DR: Inspired by nearest neighbors, a semantic neighbor graph is introduced to generate pre-annotations that balance unbalanced labels, and the correlations between labels and images are modeled by a random dot product graph to mine semantics more deeply.

Abstract: With great developments of computing technologies and data mining methods, image annotation has attracted much attention in smart agriculture. However, the semantic gap between labels and images poses great challenges for image annotation in agriculture, due to label imbalance and the difficulty of understanding obscure relationships between images and labels. In this paper, an image annotation method based on graph learning is proposed to annotate images accurately. Specifically, inspired by nearest neighbors, the semantic neighbor graph is introduced to generate pre-annotations, balancing unbalanced labels. Then, the correlations between labels and images are modeled by the random dot product graph, to mine semantics deeply. Finally, we perform experiments on two image sets. The experimental results show that our method performs considerably better than the previous method, which verifies the effectiveness of our model and the proposed method.

3 citations

Journal ArticleDOI
TL;DR: The evaluation of the Mvg-NMF approach, a multi-view-group non-negative matrix factorization (NMF) method for an AIA system which considers both common and individual factors, showed that it is highly competitive with the recent state-of-the-art works.
Abstract: In automatic image annotation (AIA) different features describe images from different aspects or views. Part of the information embedded in some views is common to all views, while other parts are individual and specific. In this paper, we present the Mvg-NMF approach, a multi-view-group non-negative matrix factorization (NMF) method for an AIA system which considers both common and individual factors. The NMF framework discovers a latent space by decomposing data into a set of non-negative basis vectors and coefficients. The views are divided into homogeneous groups, and a latent space is extracted for each group. After mapping the test images into these spaces, a unified distance matrix is computed from the distances between images in all spaces. Then a search-based method is used to propagate tags from the nearest neighbors to test images. The evaluation on three datasets commonly used for image annotation showed that Mvg-NMF is highly competitive with the recent state-of-the-art works.
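The pipeline described above (factorize each view, map test images into the latent spaces, fuse distances, propagate tags from neighbours) can be sketched with plain multiplicative-update NMF. This is an illustrative toy, not Mvg-NMF itself: there is no common/individual factor split, each "group" is a single view, and the feature matrices and tags below are made up.

```python
import numpy as np

def nmf(V, r, iters=300, eps=1e-9, seed=0):
    """Multiplicative-update NMF: V ~= W @ H with W, H non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def embed(v, H, iters=300, eps=1e-9, seed=1):
    """Map a new sample into an existing latent space (H held fixed)."""
    rng = np.random.default_rng(seed)
    w = rng.random((1, H.shape[0])) + eps
    for _ in range(iters):
        w *= (v @ H.T) / (w @ H @ H.T + eps)
    return w

# Two feature "views" of four training images (hypothetical toy data).
views = [
    np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]),
    np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]),
]
tags = [{"sky"}, {"sky"}, {"grass"}, {"grass"}]
test_views = [np.array([[0.95, 0.05, 0.0]]), np.array([[1.0, 0.05]])]

# Factor each view, embed the test image, and sum the per-view distances
# into one unified distance vector.
dist = np.zeros(4)
for V, t in zip(views, test_views):
    W, H = nmf(V, r=2)
    dist += np.linalg.norm(W - embed(t, H), axis=1)

# Search-based step: propagate tags from the nearest training image.
predicted = tags[int(np.argmin(dist))]
```

Summing distances across latent spaces is the simplest possible fusion; Mvg-NMF's grouping of homogeneous views is meant to make each latent space, and hence each summand, more reliable.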

3 citations


Cites methods from "Image annotation using metric learn..."

  • ...To compare the proposed method with the previous works, we utilized the same four evaluation metrics used in [6, 16, 36]....

  • ...Verma and Jawahar [36] proposed 2PKNN, a two-step kNN, to address the class-imbalance problem, and used a metric learning framework to assign weights to each dimension of each feature, which has a great impact on the performance....

  • ...2PKNN + ML [36]: 44 46 45 191 | 53 27 35... (truncated results-table row)

Dissertation
09 May 2018
TL;DR: A semi-automatic method for building a semantic hierarchy by modelling numerous semantic relations between annotation keywords.

Abstract: This thesis addresses the problem of image annotation extension. The rapid growth of available visual content archives has created a need for multimedia indexing and information retrieval techniques. Image annotation enables easy and fast indexing and search in large image collections. Starting from image databases that are partially annotated by hand, we aim to complete their annotations through automatic annotation, so as to make image search and/or classification methods more effective. For automatic annotation extension, we used probabilistic graphical models. The proposed model is a mixture of multinomial distributions and Gaussian mixtures in which we combined visual and textual features. To reduce the cost of manual annotation and improve the quality of the resulting annotation, we integrated user feedback into our model. The feedback was incorporated using learning-within-learning, incremental learning and active learning. To bridge the semantic gap and enrich image annotation, we used a semantic hierarchy that models numerous semantic relations between annotation keywords. We therefore present a semi-automatic method for building a semantic hierarchy from a set of keywords. After building the hierarchy, we integrated it into our image annotation model. The model obtained with the hierarchy is a mixture of Bernoulli distributions and Gaussian mixtures.

2 citations

DOI
01 Jan 2019
TL;DR: This research presents a novel and scalable approach to image classification called "Smart Guess" that automates the labor-intensive, and therefore time-consuming and expensive, process of manually cataloging images.
Abstract: Data-driven approach to image classification.

2 citations


Cites background or methods from "Image annotation using metric learn..."

  • ...The standard evaluations [67, 58, 27, 41, 132] require five word annotations per image, hence we annotate a test image with the five words having the highest responses....

  • ...And in most recent models [83, 41, 132], it was shown that simply using multiple global features can yield better performance over all the existing methods....

  • ...Results table quoted from the citing work (P, R, F and N+ per dataset):

| Method | Visual | Text | Corel-5K (P R F N+) | ESP Game (P R F N+) | IAPRTC-12 (P R F N+) |
|---|---|---|---|---|---|
| JEC [83] | HC | - | 27 32 29 139 | 22 25 23 224 | 28 29 29 250 |
| MBRM [27] | HC | - | 24 25 25 122 | 18 19 19 209 | 24 23 24 223 |
| TagProp(σML) [41] | HC | - | 33 42 37 160 | 39 27 32 239 | 46 35 40 266 |
| 2PKNN [132] | HC | - | 39 40 40 177 | 51 23 32 245 | 49 32 39 274 |
| 2PKNN+ML [132] | HC | - | 44 46 45 191 | 53 27 36 252 | 54 37 44 278 |
| KCCA-2PKNN [7] | HC | - | 42 46 44 179 | - - - - | 59 30 40 259 |
| SKL-CRM [90] | HC | - | 39 46 42 184 | 41 26 32 248 | 47 32 38 274 |
| JEC | VGG-16 | - | 31 32 32 141 | 26 22 24 234 | 28 21 24 237 |
| 2PKNN | VGG-16 | - | 33 30 32 160 | 40 23 29 250 | 38 23 29 261 |
| TagProp (σ) | VGG-16 | - | 30 35 32 149 | 31 28 30 246 | 38 30 34 260 |
| *Below are our models:* | | | | | |
| SVM-DMBRM | HC | - | 36 48 41 197 | 55 25 34 259 | 56 29 38 283 |
| SVM-DMBRM | VGG-16 | - | 42 45 43 186 | 51 26 35 251 | 58 27 37 268 |
| HHD | VGG-16 | - | 31 49 38 194 | 35 36 34 257 | 32 44 36 280 |
| CCA | VGG-16 | W2V | 35 46 40 172 | 29 32 30 250 | 33 32 33 268 |
| KCCA | VGG-16 | W2V | 39 53 45 184 | 30 36 33 252 | 38 39 38 273 |
| CCA-KNN | VGG-16 | BV | 39 51 44 192 | 44 32 37 254 | 41 34 37 273 |
| CCA-KNN | VGG-16 | W2V | 42 52 46 201 | 46 36 41 260 | 45 38 41 278 |
| CNN-R | Caffe-Net | W2V | 32 41 37 166 | 45 29 35 248 | 49 31 38 272 |

  • ...These models may be generative [27, 67, 142], discriminative [14, 133, 41, 132] or nearest neighbor-based ones; among these, nearest neighbor based models are shown to be the most successful [83, 41, 132]....

  • ...Multiple features with the right type of model are shown to improve the annotation performance significantly in the current state-of-the-art system [132]....

Book ChapterDOI
02 Sep 2015
TL;DR: A model for the annotation extension of images using a probabilistic graphical model based on a mixture of multinomial distributions and mixtures of Gaussians is proposed.
Abstract: With the fast development of digital cameras and social media image sharing, automatic image annotation has become a research area of great interest. It enables indexing, extracting and searching in large collections of images in an easier and faster way. In this paper, we propose a model for the annotation extension of images using a probabilistic graphical model. This model is based on a mixture of multinomial distributions and mixtures of Gaussians. The results of the proposed model are promising on three standard datasets: Corel-5k, ESP-Game and IAPRTC-12.

2 citations

References
Proceedings Article
05 Dec 2005
TL;DR: In this article, a Mahalanobis distance metric for k-NN classification is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin.
Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification; for example, we achieve a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.

4,433 citations

Journal ArticleDOI
TL;DR: This paper shows how to learn a Mahalanobis distance metric for kNN classification from labeled examples in a globally integrated manner, and finds that metrics trained in this way lead to significant improvements in kNN classification.
Abstract: The accuracy of k-nearest neighbor (kNN) classification depends significantly on the metric used to compute distances between different examples. In this paper, we show how to learn a Mahalanobis distance metric for kNN classification from labeled examples. The Mahalanobis metric can equivalently be viewed as a global linear transformation of the input space that precedes kNN classification using Euclidean distances. In our approach, the metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. As in support vector machines (SVMs), the margin criterion leads to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our approach requires no modification or extension for problems in multiway (as opposed to binary) classification. In our framework, the Mahalanobis distance metric is obtained as the solution to a semidefinite program. On several data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification. Sometimes these results can be further improved by clustering the training examples and learning an individual metric within each cluster. We show how to learn and combine these local metrics in a globally integrated manner.
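The pull/push structure of the LMNN objective can be made concrete with a short NumPy sketch. This is a simplified illustration, not the paper's solver: every same-class pair is treated as a target-neighbour pair (rather than only the k nearest), a fixed-step sub-gradient update stands in for the semidefinite program, and positive semidefiniteness is restored by eigenvalue clipping.

```python
import numpy as np

def d_m(M, a, b):
    """Squared Mahalanobis distance (a-b)^T M (a-b)."""
    d = a - b
    return float(d @ M @ d)

def lmnn_step(M, X, y, lr=0.02, margin=1.0):
    """One sub-gradient step on a simplified LMNN-style objective,
    followed by projection onto the PSD cone."""
    G = np.zeros_like(M)
    n = len(X)
    for i in range(n):
        for j in range(n):                 # target neighbours: same class
            if i == j or y[i] != y[j]:
                continue
            dij = X[i] - X[j]
            G += np.outer(dij, dij)        # "pull" term: shrink target distances
            for l in range(n):             # impostors: other classes, too close
                if y[l] == y[i]:
                    continue
                if margin + d_m(M, X[i], X[j]) - d_m(M, X[i], X[l]) > 0:
                    dil = X[i] - X[l]
                    G += np.outer(dij, dij) - np.outer(dil, dil)  # "push" term
    M = M - lr * G
    w, V = np.linalg.eigh(M)               # projection step: clip eigenvalues
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# Toy data: classes differ along dimension 0; dimension 1 is noise.
X = np.array([[0.0, 0.0], [0.2, 1.0], [1.0, 0.0], [1.2, 1.0]])
y = np.array([0, 0, 1, 1])
M = np.eye(2)
for _ in range(20):
    M = lmnn_step(M, X, y)
```

On this toy data the learned metric down-weights the noisy second dimension, so same-class points end up closer under M than impostors.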

4,157 citations


"Image annotation using metric learn..." refers background or methods in this paper

  • ...With this goal, we perform metric learning over 2PKNN by generalizing the LMNN [11] algorithm for multi-label prediction....

  • ...In such a scenario, (i) since each base distance contributes differently, we can learn appropriate weights to combine them in the distance space [2, 3]; and (ii) since every feature (such as SIFT or colour histogram) itself is represented as a multi-dimensional vector, its individual elements can also be weighted in the feature space [11]....

  • ...Our extension of LMNN conceptually differs from its previous extensions such as [21] in at least two significant ways: (i) we adapt LMNN in its choice of targets/impostors to learn metrics for multi-label prediction problems, whereas [21] uses the same definition of targets/impostors as in LMNN to address the classification problem in a multi-task setting; and (ii) in our formulation, the amount of push applied on an impostor varies depending on its conceptual similarity w.r.t. a given sample, which makes it suitable for multi-label prediction tasks....

  • ...Our metric learning framework extends LMNN in two major ways: (i) LMNN is meant for single-label classification (or simply classification) problems, while we adapt it for image annotation, which is a multi-label classification task; and (ii) LMNN learns a single Mahalanobis metric in the feature space, while we extend it to learn linear metrics for multiple features as well as distances together....

  • ...For this purpose, we extend the classical LMNN [11] algorithm for multi-label prediction....

Proceedings ArticleDOI
25 Apr 2004
TL;DR: A new interactive system is introduced: a game that is fun and produces valuable output; it addresses the image-labeling problem by encouraging people to do the work, taking advantage of their desire to be entertained.
Abstract: We introduce a new interactive system: a game that is fun and can be used to create valuable output. When people play the game they help determine the contents of images by providing meaningful labels for them. If the game is played as much as popular online games, we estimate that most images on the Web can be labeled in a few months. Having proper labels associated with each image on the Web would allow for more accurate image search, improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and help users block inappropriate images. Our system makes a significant contribution because of its valuable output and because of the way it addresses the image-labeling problem. Rather than using computer vision techniques, which don't work well enough, we encourage people to do the work by taking advantage of their desire to be entertained.

2,365 citations


"Image annotation using metric learn..." refers background in this paper

  • ...ESP Game contains images annotated using an on-line game, where two (mutually unknown) players are randomly given an image for which they have to predict same keyword(s) to score points [22]....


Journal ArticleDOI
TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
Abstract: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
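The Pegasos update is simple enough to state in full. Below is a hedged NumPy sketch of the basic linear, primal variant: a decreasing step size η_t = 1/(λt), the hinge sub-gradient on one sampled example per iteration, and the optional projection onto the ball of radius 1/√λ. The toy data and hyper-parameters are illustrative.

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=2000, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM:
    minimise  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>),
    with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(X))                 # sample one training example
        eta = 1.0 / (lam * t)                    # decreasing step size
        if y[i] * (w @ X[i]) < 1.0:              # hinge active: full sub-gradient
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                                    # hinge inactive: regulariser only
            w = (1.0 - eta * lam) * w
        norm = np.linalg.norm(w)                 # optional projection step
        if norm > 1.0 / np.sqrt(lam):
            w *= 1.0 / (np.sqrt(lam) * norm)
    return w

# Linearly separable toy problem.
X = np.array([[2.0, 0.5], [3.0, 1.0], [-2.0, -0.5], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = pegasos(X, y)
```

Because each iteration touches a single example, the run-time is independent of the training set size, which is exactly the property the 2PKNN paper borrows for its alternating sub-gradient/projection metric learning.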

2,037 citations


"Image annotation using metric learn..." refers methods in this paper

  • ...To overcome this issue, we solve it by alternatively using stochastic sub-gradient descent and projection steps (similar to Pegasos [12])....


  • ...To address this, we implement metric learning by alternating between stochastic sub-gradient descent and projection steps (similar to Pegasos [12])....


Book ChapterDOI
28 May 2002
TL;DR: This work shows how to cluster words that are individually difficult to predict into clusters that can be predicted well; for example, the distinction between "train" and "locomotive" cannot be predicted with the current set of features, but the underlying concept can.
Abstract: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. Firstly, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and keywords supplied with the images, is then learned, using a method based around EM. This process is analogous with learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that individually are difficult to predict into clusters that can be predicted well -- for example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
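The lexicon-learning step described above is analogous to IBM Model 1, and an EM sketch fits in a few lines. The region types, keywords and images below are hypothetical toy data; a real system would first obtain region types by segmenting and classifying image regions.

```python
# EM lexicon learning in the style of IBM Model 1: each image pairs a bag of
# region types with a bag of keywords, and we estimate t[w][r] = p(word | region).
images = [
    (["water", "sky"], ["sea", "sun"]),
    (["water", "grass"], ["sea", "field"]),
    (["grass", "sky"], ["field", "sun"]),
]
regions = sorted({r for rs, _ in images for r in rs})
words = sorted({w for _, ws in images for w in ws})

# Uniform initialisation of the translation table.
t = {w: {r: 1.0 / len(words) for r in regions} for w in words}

for _ in range(30):  # EM iterations
    count = {w: {r: 0.0 for r in regions} for w in words}
    for rs, ws in images:
        for w in ws:
            z = sum(t[w][r] for r in rs)       # E-step: soft word-region alignment
            for r in rs:
                count[w][r] += t[w][r] / z
    for r in regions:                          # M-step: renormalise per region
        total = sum(count[w][r] for w in words)
        for w in words:
            t[w][r] = count[w][r] / total

# Most likely keyword for each region type.
best = {r: max(words, key=lambda w: t[w][r]) for r in regions}
```

On this toy co-occurrence pattern the EM estimates converge to the intended lexicon (water→sea, grass→field, sky→sun), mirroring how the paper learns a mapping from region types to nouns from loosely aligned image-keyword pairs.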

1,765 citations


"Image annotation using metric learn..." refers background in this paper

  • ...translation models [13, 14] and nearest-neighbour based relevance models [1, 8]....


  • ...Corel 5K was first used in [14], and since then it has become a benchmark for comparing annotation performance....
