Book ChapterDOI

Image annotation using metric learning in semantic neighbourhoods

TL;DR: 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed; it performs comparably to the current state of the art on three challenging image annotation datasets and shows significant improvements after metric learning.
Abstract: Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class-imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak-labelling"). These two issues badly affect the performance of most of the existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods greatly depends on how features are compared, we also propose a metric learning framework over 2PKNN that learns weights for multiple features as well as distances together. This is done in a large margin set-up by generalizing a well-known (single-label) classification metric learning algorithm for multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state of the art on three challenging image annotation datasets, and shows significant improvements after metric learning.
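
The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `two_pass_knn`, the parameter `k1`, the plain Euclidean distance, and the `exp(-distance)` vote weighting are all assumptions made for the example, and no learned metric is used.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance; stands in for a learned metric (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_pass_knn(query, train, k1=1, dist=euclidean):
    """Rank labels for `query`.

    train: list of (feature_vector, set_of_labels) pairs.
    k1:    neighbours kept per label in the first pass (hypothetical name).
    """
    vocabulary = sorted({lab for _, labs in train for lab in labs})

    # Pass 1 ("image-to-label"): for every label, keep the k1 training images
    # closest to the query among those annotated with that label. Rare labels
    # therefore get the same number of candidates as frequent ones.
    neighbours = set()
    for lab in vocabulary:
        group = [i for i, (_, labs) in enumerate(train) if lab in labs]
        group.sort(key=lambda i: dist(query, train[i][0]))
        neighbours.update(group[:k1])

    # Pass 2 ("image-to-image"): every selected neighbour votes for all of its
    # labels, weighted by its similarity to the query.
    scores = {lab: 0.0 for lab in vocabulary}
    for i in neighbours:
        weight = math.exp(-dist(query, train[i][0]))
        for lab in train[i][1]:
            scores[lab] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Calling `two_pass_knn(query, train)` returns `(label, score)` pairs in decreasing score order; the top few labels would be taken as the predicted annotation. Selecting neighbours per label in the first pass is what is meant to counter class imbalance in this sketch.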


Citations
Journal ArticleDOI
TL;DR: A novel multi-label learning approach, multi-label dictionary learning (MLDL) with label consistency regularization and partial-identical label embedding, is proposed; it conducts MLDL and partial-identical label embedding simultaneously.
Abstract: Image annotation has attracted a lot of research interest, and multi-label learning is an effective technique for image annotation. How to effectively exploit the underlying correlation among labels is a crucial task for multi-label learning. Most existing multi-label learning methods exploit the label correlation only in the output label space, leaving the connection between the label and the features of images untouched. Although some recent methods attempt to exploit the label correlation in the input feature space by using the label information, they cannot effectively conduct the learning process in both spaces simultaneously, and there still exists much room for improvement. In this paper, we propose a novel multi-label learning approach, named multi-label dictionary learning (MLDL) with label consistency regularization and partial-identical label embedding, which conducts MLDL and partial-identical label embedding simultaneously. In the input feature space, we incorporate the dictionary learning technique into multi-label learning and design the label consistency regularization term to learn a better representation of features. In the output label space, we design the partial-identical label embedding, in which the samples with exactly the same label set can cluster together, and the samples with partial-identical label sets can collaboratively represent each other. Experimental results on three widely used image datasets, including Corel 5K, IAPR TC12, and ESP Game, demonstrate the effectiveness of the proposed approach.

92 citations


Cites background from "Image annotation using metric learn..."

  • ...It can conduct multi-label dictionary learning in input feature space and partial-identical label embedding in output label space, simultaneously....

    [...]

  • ...(Corresponding author: Xiao-Yuan Jing.)...

    [...]

  • ...In addition, MLDL specially designs the label consistency regularization term for multi-label dictionary learning to enhance the discriminability of learned dictionary....

    [...]

Journal ArticleDOI
TL;DR: An overview of advances in the field of distance metric learning is offered, and well-tested features are utilized to assess the performance of selected methods following the experimental protocol of the state-of-the-art database Labeled Faces in the Wild.
Abstract: In this paper, we first offer an overview of advances in the field of distance metric learning. Then, we empirically compare selected methods using a common experimental protocol. The number of distance metric learning algorithms proposed keeps growing due to their effectiveness and wide application. However, existing surveys are either outdated or they focus only on a few methods. As a result, there is an increasing need to summarize the obtained knowledge in a concise, yet informative manner. Moreover, existing surveys do not conduct comprehensive experimental comparisons. On the other hand, individual distance metric learning papers compare the performance of the proposed approach with only a few related methods and under different settings. This highlights the need for an experimental evaluation using a common and challenging protocol. To this end, we conduct face verification experiments, as this task poses significant challenges due to varying conditions during data acquisition. In addition, face verification is a natural application for distance metric learning because the encountered challenge is to define a distance function that: 1) accurately expresses the notion of similarity for verification; 2) is robust to noisy data; 3) generalizes well to unseen subjects; and 4) scales well with the dimensionality and number of training samples. In particular, we utilize well-tested features to assess the performance of selected methods following the experimental protocol of the state-of-the-art database Labeled Faces in the Wild. A summary of the results is presented along with a discussion of the insights obtained and lessons learned by employing the corresponding algorithms.

76 citations


Cites background from "Image annotation using metric learn..."

  • ...In addition, we found many papers that take into consideration special forms of data, including time series [54], structured [31], [55]–[59], multilabel, multiview, and bags [60]–[65]....

    [...]

Journal ArticleDOI
TL;DR: A label propagation framework based on Kernel Canonical Correlation Analysis (KCCA) is proposed, which builds a latent semantic space where the correlation of visual and textual features is well preserved in a semantic embedding.

76 citations

Journal ArticleDOI
TL;DR: This paper redefines multiple kernels using deep multi-layer networks: a deep multiple kernel is a multi-layered combination of nonlinear activation functions, each of which involves a combination of several elementary or intermediate kernels and results in a positive semi-definite deep kernel.
Abstract: Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks, such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each of which involves a combination of several elementary or intermediate kernels and results in a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semi-supervised, and Laplacian-based semi-supervised. When plugged into support vector machines, the resulting deep kernel networks show clear gain, compared with several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database, and the Banana data set validate the effectiveness of the proposed method.

74 citations


Cites methods from "Image annotation using metric learn..."

  • ...…[18] extend CRM to sparse kernel learning using a multinomial function for count-based features; alternative techniques model the probability of concepts using parametric approaches such as Gaussian mixture models [19], [20], Latent Dirichlet Allocation [21] and translation-based models [22]....

    [...]

Proceedings ArticleDOI
01 Jan 2013
TL;DR: Performance comparison among different methods for kernelization using chi-squared kernel is shown.
Abstract: Table 1: Performance comparison among different methods. The prefix ‘K’ corresponds to kernelization using chi-squared kernel.
    MBRM [1]          0.24/0.25/0.245/122   0.18/0.19/0.185/209   0.24/0.23/0.235/233
    JEC [3]           0.27/0.32/0.293/139   0.22/0.25/0.234/224   0.28/0.29/0.285/250
    TagProp-ML [2]    0.31/0.37/0.337/146   0.49/0.20/0.284/213   0.48/0.25/0.329/227
    TagProp-s ML [2]  0.33/0.42/0.370/160   0.39/0.27/0.319/239   0.46/0.35/0.398/266
    KSVM              0.29/0.43/0.346/174   0.30/0.28/0.290/256   0.43/0.27/0.332/266
    KSVM-VT (Ours)    0.32/0.42/0.363/179   0.33/0.32/0.325/259   0.47/0.29/0.359/268

70 citations


Cites background or methods or result from "Image annotation using metric learn..."

  • ...We use three datasets popular in the image annotation task [5, 8, 11, 19]....

    [...]

  • ...Once we have learned all the classifiers, predicting labels for a new image becomes several times faster than the NN-based models [5, 8, 11, 19]....

    [...]

  • ...Hence this has emerged as an important research area during the last decade [2, 5, 6, 8, 11, 19, 23]....

    [...]

  • ...Among the image annotation models being proposed in the past, generative or nearest-neighbour (NN)-based models [5, 8, 11, 19, 23] have particularly been shown to be successful for large vocabulary datasets such as Corel-5k [4], ESP Game [20] and IAPRTC-12 [7]....

    [...]

  • ...We use the same evaluation criteria as being used by previous methods [6, 8, 11, 19, 23]....

    [...]

References
Proceedings Article
05 Dec 2005
TL;DR: In this article, a Mahalanobis distance metric for k-NN classification is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin.
Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification, for example achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.

4,433 citations

Journal ArticleDOI
TL;DR: This paper shows how to learn a Mahalanobis distance metric for kNN classification from labeled examples, finds that metrics trained in this way lead to significant improvements in kNN classification, and shows how to learn and combine local metrics in a globally integrated manner.
Abstract: The accuracy of k-nearest neighbor (kNN) classification depends significantly on the metric used to compute distances between different examples. In this paper, we show how to learn a Mahalanobis distance metric for kNN classification from labeled examples. The Mahalanobis metric can equivalently be viewed as a global linear transformation of the input space that precedes kNN classification using Euclidean distances. In our approach, the metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. As in support vector machines (SVMs), the margin criterion leads to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our approach requires no modification or extension for problems in multiway (as opposed to binary) classification. In our framework, the Mahalanobis distance metric is obtained as the solution to a semidefinite program. On several data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification. Sometimes these results can be further improved by clustering the training examples and learning an individual metric within each cluster. We show how to learn and combine these local metrics in a globally integrated manner.
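
The equivalence stated in this abstract, that a Mahalanobis metric amounts to a linear transformation followed by Euclidean distance, can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the factorisation M = LᵀL is the standard way to obtain a positive semidefinite matrix, and the unit margin in `hinge_triplet` is an assumed value for a single large-margin triplet term, not the full semidefinite program.

```python
def transform(L, x):
    """Apply the linear map L (a list of rows) to the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def gram(L):
    """Return M = L^T L, which is positive semidefinite by construction."""
    n = len(L[0])
    return [[sum(row[i] * row[j] for row in L) for j in range(n)]
            for i in range(n)]


def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def euclidean_sq(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def hinge_triplet(M, x, target, impostor, margin=1.0):
    """Hinge penalty when a differently labelled `impostor` is not at least
    `margin` farther from x than the same-class `target` (margin assumed)."""
    return max(0.0, margin + mahalanobis_sq(M, x, target)
               - mahalanobis_sq(M, x, impostor))
```

For any `L`, `mahalanobis_sq(gram(L), x, y)` and `euclidean_sq(transform(L, x), transform(L, y))` agree up to floating-point rounding, which is why learning the metric can be viewed as learning a global linear transformation of the input space.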

4,157 citations


"Image annotation using metric learn..." refers background or methods in this paper

  • ...With this goal, we perform metric learning over 2PKNN by generalizing the LMNN [11] algorithm for multi-label prediction....

    [...]

  • ...In such a scenario, (i) since each base distance contributes differently, we can learn appropriate weights to combine them in the distance space [2, 3]; and (ii) since every feature (such as SIFT or colour histogram) itself is represented as a multidimensional vector, its individual elements can also be weighted in the feature space [11]....

    [...]

  • ...Our extension of LMNN conceptually differs from its previous extensions such as [21] in at least two significant ways: (i) we adapt LMNN in its choice of target/impostors to learn metrics for multi-label prediction problems, whereas [21] uses the same definition of target/impostors as in LMNN to address classification problem in multi-task setting, and (ii) in our formulation, the amount of push applied on an impostor varies depending on its conceptual similarity w.r.t. a given sample, which makes it suitable for multi-label prediction tasks....

    [...]

  • ...Our metric learning framework extends LMNN in two major ways: (i) LMNN is meant for single-label classification (or simply classification) problems, while we adapt it for image annotation, which is a multi-label classification task; and (ii) LMNN learns a single Mahalanobis metric in the feature space, while we extend it to learn linear metrics for multiple features as well as distances together....

    [...]

  • ...For this purpose, we extend the classical LMNN [11] algorithm for multi-label prediction....

    [...]

Proceedings ArticleDOI
25 Apr 2004
TL;DR: A new interactive system: a game that is fun and can be used to create valuable output that addresses the image-labeling problem and encourages people to do the work by taking advantage of their desire to be entertained.
Abstract: We introduce a new interactive system: a game that is fun and can be used to create valuable output. When people play the game they help determine the contents of images by providing meaningful labels for them. If the game is played as much as popular online games, we estimate that most images on the Web can be labeled in a few months. Having proper labels associated with each image on the Web would allow for more accurate image search, improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and help users block inappropriate images. Our system makes a significant contribution because of its valuable output and because of the way it addresses the image-labeling problem. Rather than using computer vision techniques, which don't work well enough, we encourage people to do the work by taking advantage of their desire to be entertained.

2,365 citations


"Image annotation using metric learn..." refers background in this paper

  • ...ESP Game contains images annotated using an on-line game, where two (mutually unknown) players are randomly given an image for which they have to predict the same keyword(s) to score points [22]....

    [...]

Journal ArticleDOI
TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
Abstract: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $\Omega(1/\epsilon^2)$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with $1/\lambda$, where $\lambda$ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $\tilde{O}(d/(\lambda\epsilon))$, where $d$ is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
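
The alternation between a stochastic sub-gradient step and a projection step can be sketched for a linear SVM as below. This is a simplified illustration under stated assumptions: one example per iteration, step size 1/(λt), no bias term, and projection onto the ball of radius 1/√λ; the function name `pegasos` and its default parameter values are chosen for the example only.

```python
import math
import random


def pegasos(data, lam=0.1, iterations=2000, seed=0):
    """Train a linear SVM without a bias term.

    data: list of (feature_vector, label) pairs with label in {-1, +1}.
    lam:  regularization parameter (lambda).
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    radius = 1.0 / math.sqrt(lam)
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)          # one training example per iteration
        eta = 1.0 / (lam * t)            # decreasing step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))

        # Sub-gradient step on (lam / 2) * ||w||^2 + hinge loss of (x, y).
        w = [(1.0 - eta * lam) * wi for wi in w]
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]

        # Projection step: rescale w back into the ball of radius 1/sqrt(lam).
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:
            w = [wi * radius / norm for wi in w]
    return w
```

A new point `x` would be classified by the sign of the dot product of the returned `w` with `x`. The cost of each iteration depends on the feature dimension rather than on the number of training examples, which matches the scalability argument in the abstract.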

2,037 citations


"Image annotation using metric learn..." refers methods in this paper

  • ...To overcome this issue, we solve it by alternately using stochastic sub-gradient descent and projection steps (similar to Pegasos [12])....

    [...]

  • ...To address this, we implement metric learning by alternating between stochastic sub-gradient descent and projection steps (similar to Pegasos [12])....

    [...]

Book ChapterDOI
28 May 2002
TL;DR: This work describes a model of object recognition as machine translation and shows how to cluster words that individually are difficult to predict into clusters that can be predicted well; for example, the distinction between train and locomotive cannot be predicted using the current set of features, but the underlying concept can.
Abstract: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. Firstly, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and keywords supplied with the images, is then learned, using a method based around EM. This process is analogous with learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that individually are difficult to predict into clusters that can be predicted well -- for example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.

1,765 citations


"Image annotation using metric learn..." refers background in this paper

  • ...translation models [13, 14] and nearest-neighbour based relevance models [1, 8]....

    [...]

  • ...Corel 5K was first used in [14], and since then it has become a benchmark for comparing annotation performance....

    [...]