Image annotation using metric learning in semantic neighbourhoods
TL;DR: 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, is proposed; it performs comparably to the current state of the art on three challenging image annotation datasets and shows significant improvements after metric learning.
Abstract: Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is a high variance in the number of images corresponding to different labels ("class-imbalance"). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak-labelling"). These two issues severely affect the performance of most existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods depends greatly on how features are compared, we also propose a metric learning framework over 2PKNN that learns weights for multiple features as well as distances together. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm for multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state of the art on three challenging image annotation datasets, and shows significant improvements after metric learning.
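For illustration, below is a minimal Python/NumPy sketch of the two-pass neighbourhood construction described in the abstract. The function name, the per-label neighbourhood size K, and the exponential distance weighting used for label transfer are assumptions made for this sketch, not the authors' exact formulation.

    import numpy as np

    def two_pass_knn(query_feat, train_feats, train_labels, K=3, temperature=1.0):
        # Pass 1 ("image-to-label"): for every label, keep the K training images
        # carrying that label that are closest to the query. Pooling these per-label
        # neighbours gives a semantic neighbourhood in which every label is
        # represented, which counters class-imbalance and weak-labelling.
        dists = np.linalg.norm(train_feats - query_feat, axis=1)  # base distance (L2 here)
        n_labels = train_labels.shape[1]
        neighbourhood = set()
        for l in range(n_labels):
            idx = np.flatnonzero(train_labels[:, l] == 1)         # images annotated with label l
            if idx.size:
                neighbourhood.update(idx[np.argsort(dists[idx])[:K]].tolist())

        # Pass 2 ("image-to-image"): transfer labels from the pooled neighbourhood,
        # weighting each neighbour by a decreasing function of its distance to the query.
        scores = np.zeros(n_labels)
        for i in neighbourhood:
            scores += np.exp(-dists[i] / temperature) * train_labels[i]
        return scores

Labels would then be ranked by score, with the top few (typically five per image in the benchmark protocol) assigned to the query.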
Citations
545 citations
Cites background or methods from "Image annotation using metric learn..."
...Metric learning has been used for image classification and annotation (Guillaumin et al., 2009; Mensink et al., 2012; Verma and Jawahar, 2012)....
[...]
...More recently, a number of publications have reported better results with simple data-driven schemes based on retrieving database images similar to a query and transferring the annotations from those images (Chua et al., 2009; Guillaumin et al., 2009; Makadia et al., 2008; Verma and Jawahar, 2012; Wang et al., 2008)....
[...]
...In fact, the standard datasets used for image annotation by Makadia et al. (2008); Guillaumin et al. (2009); Verma and Jawahar (2012) consist of 5K-20K images and have 260-290 tags each....
[...]
...One of the shortcomings of data-driven annotation approaches (Guillaumin et al., 2009; Makadia et al., 2008; Verma and Jawahar, 2012) as well as Wsabie is that they do not account for co-occurrence and mutual exclusion constraints between different tags for the same image....
[...]
116 citations
Cites background or methods from "Image annotation using metric learn..."
...Another important difference between our method and existing methods [22, 15, 29] is the number of nearest-neighbors used to propagate the tags....
[...]
...Verma and Jawahar [29] presented a two-pass kNN that finds neighbors in semantic neighborhoods, together with metric learning that learns weights for combining different features....
[...]
114 citations
Cites background or methods from "Image annotation using metric learn..."
...TagProp [30] is again based on a nearest-neighbor model, but achieves a significant improvement by using 15 different local and global features along with metric learning....
[...]
...These models can be generative [5, 17, 32], discriminative [2, 29, 10, 30], or nearest-neighbor based; among these, nearest-neighbor based models have been shown to be the most successful [18, 10, 30]....
[...]
...Annotation results for handcrafted (HC) features, reported as precision (P), recall (R), F1 and number of recalled labels (N+) on Corel 5K, ESP Game and IAPR TC-12:

    Method               Corel 5K            ESP Game            IAPR TC-12
                         P   R   F1  N+      P   R   F1  N+      P   R   F1  N+
    CRM [17]             16  19  17  107     -   -   -   -       -   -   -   -
    SML [2]              23  29  26  137     -   -   -   -       -   -   -   -
    MRFA [32]            31  36  33  172     -   -   -   -       -   -   -   -
    GS [33]              30  33  31  146     -   -   -   -       32  29  30  252
    JEC [18]             27  32  29  139     22  25  23  224     28  29  29  250
    CCD [22]             36  41  38  159     36  24  29  232     44  29  35  251
    KSVM-VT [29]         32  42  36  179     33  32  33  259     47  29  36  268
    MBRM [5]             24  25  25  122     18  19  19  209     24  23  24  223
    TagProp(σML) [10]    33  42  37  160     39  27  32  239     46  35  40  266
    2PKNN+ML [30]        44  46  45  191     53  27  36  252     54  37  44  278
    SVM-DMBRM [21]       36  48  41  197     55  25  34  259     56  29  38  283
    KCCA-2PKNN [1]       42  46  44  179     -   -   -   -       59  30  40  259
    SKL-CRM [20]         39  46  42  184     41  26  32  248     47  32  38  274...
[...]
...Multiple features with the right type of model are shown to improve the annotation performance significantly in the current state of the art system [30]....
[...]
...Firstly, we present the most widely reported type of evaluation, where the recall and precision are computed per word and their averages over all the words are reported [21, 1, 20, 30, 18, 5, 17]....
[...]
104 citations
70 citations
Cites background from "Image annotation using metric learn..."
...It can conduct multi-label dictionary learning in the input feature space and partial-identical label embedding in the output label space simultaneously....
[...]
...In addition, MLDL specifically designs a label-consistency regularization term for multi-label dictionary learning to enhance the discriminability of the learned dictionary....
[...]
References
4,430 citations
3,736 citations
"Image annotation using metric learn..." refers background or methods in this paper
...With this goal, we perform metric learning over 2PKNN by generalizing the LMNN [11] algorithm for multi-label prediction....
[...]
...In such a scenario, (i) since each base distance contributes differently, we can learn appropriate weights to combine them in the distance space [2, 3]; and (ii) since every feature (such as SIFT or colour histogram) itself is represented as a multidimensional vector, its individual elements can also be weighted in the feature space [11]....
[...]
...Our extension of LMNN conceptually differs from its previous extensions such as [21] in at least two significant ways: (i) we adapt LMNN in its choice of target/impostors to learn metrics for multi-label prediction problems, whereas [21] uses the same definition of target/impostors as in LMNN to address classification problem in multi-task setting, and (ii) in our formulation, the amount of push applied on an impostor varies depending on its conceptual similarity w.r.t. a given sample, which makes it suitable for multi-label prediction tasks....
[...]
...Our metric learning framework extends LMNN in two major ways: (i) LMNN is meant for single-label classification (or simply classification) problems, while we adapt it for image annotation, which is a multi-label classification task; and (ii) LMNN learns a single Mahalanobis metric in the feature space, while we extend it to learn linear metrics for multiple features as well as distances together....
[...]
...For this purpose, we extend the classical LMNN [11] algorithm for multi-label prediction....
[...]
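To make the combination described in the excerpts above concrete, here is a small Python sketch of a weighted multi-feature distance and an LMNN-style large-margin term adapted to multi-label data. The helper names and the specific push weight (one minus the label overlap) are illustrative assumptions, not the paper's exact objective.

    import numpy as np

    def combined_distance(x_feats, y_feats, base_dists, w):
        # Weighted combination of per-feature base distances: each feature
        # (e.g. SIFT, colour histogram) has its own distance (e.g. L1, L2, chi-squared),
        # and the non-negative weights w are what the large-margin step learns.
        return sum(w[f] * base_dists[f](x_feats[f], y_feats[f]) for f in range(len(w)))

    def impostor_hinge(d_target, d_impostor, label_overlap, margin=1.0):
        # Large-margin term in the spirit of LMNN, adapted for multi-label prediction:
        # an impostor that shares few labels with the sample (small label_overlap in
        # [0, 1]) is pushed harder than one that is conceptually similar.
        push = 1.0 - label_overlap
        return push * max(0.0, margin + d_target - d_impostor)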
2,274 citations
"Image annotation using metric learn..." refers background in this paper
...ESP Game contains images annotated using an on-line game, where two (mutually unknown) players are randomly given an image for which they have to predict the same keyword(s) to score points [22]....
[...]
1,891 citations
"Image annotation using metric learn..." refers methods in this paper
...To overcome this issue, we alternate between stochastic sub-gradient descent and projection steps (similar to Pegasos [12])....
[...]
...To address this, we implement metric learning by alternating between stochastic sub-gradient descent and projection steps (similar to Pegasos [12])....
[...]
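A rough Python/NumPy sketch of the alternating optimisation referred to above: a stochastic sub-gradient step on a mini-batch of sampled triplets followed by a projection that keeps the weights feasible. The mini-batch sampling, the 1/sqrt(t) step size and the non-negativity projection are assumptions for this sketch rather than the paper's exact procedure.

    import numpy as np

    def sgd_with_projection(triplets, subgrad, dim, epochs=10, lr0=0.1, batch=32, seed=0):
        # Alternate (i) a stochastic sub-gradient step on a mini-batch of
        # (sample, target, impostor) triplets with (ii) a projection step that keeps
        # the learned weights in the feasible set (non-negative here).
        rng = np.random.default_rng(seed)
        w = np.ones(dim)                                  # initial feature/distance weights
        t = 1
        for _ in range(epochs):
            order = rng.permutation(len(triplets))
            for start in range(0, len(triplets), batch):
                chunk = [triplets[i] for i in order[start:start + batch]]
                g = np.mean([subgrad(w, trip) for trip in chunk], axis=0)
                w = w - (lr0 / np.sqrt(t)) * g            # sub-gradient step
                w = np.maximum(w, 0.0)                    # projection step
                t += 1
        return w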
1,721 citations
"Image annotation using metric learn..." refers background in this paper
...translation models [13, 14] and nearest-neighbour based relevance models [1, 8]....
[...]
...Corel 5K was first used in [14], and since then it has become a benchmark for comparing annotation performance....
[...]