Proceedings ArticleDOI

A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data

TL;DR: This paper addresses the multiple instance learning problem using a novel Bag-to-Class distance measure, parameterized by class-specific distance metrics, and proposes a metric learning framework that explicitly captures inter-class correlations within the learned metrics.
Abstract: In multi-instance data, every object is a bag that contains multiple elements or instances. Each bag may be assigned to one or more classes, such that it has at least one instance corresponding to every assigned class. However, since the annotations are at bag-level, there is no direct association between the instances within a bag and the assigned class labels, hence making the problem significantly challenging. While existing methods have mostly focused on Bag-to-Bag or Class-to-Bag distances, in this paper, we address the multiple instance learning problem using a novel Bag-to-Class distance measure. This is based on two observations: (a) existence of outliers is natural in multi-instance data, and (b) there may exist multiple instances within a bag that belong to a particular class. In order to address these, in the proposed distance measure (a) we employ L1-distance that brings robustness against outliers, and (b) rather than considering only the most similar instance-pair during distance computation as done by existing methods, we consider a subset of instances within a bag while determining its relevance to a given class. We parameterize the proposed distance measure using class-specific distance metrics, and propose a novel metric learning framework that explicitly captures inter-class correlations within the learned metrics. Experiments on two popular datasets demonstrate the effectiveness of the proposed distance measure and metric learning.
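The two design choices described above (L1 distance for robustness, and a subset of instances rather than only the single most similar pair) can be sketched in a few lines of numpy. This is an illustrative reading of the idea, not the authors' exact formulation; the parameter `k` and all names are assumptions for the sketch:

```python
import numpy as np

def bag_to_class_distance(bag, class_instances, k=3):
    """Illustrative robust Bag-to-Class distance.

    bag:             (n, d) array, instances of one bag
    class_instances: (m, d) array, training instances of one class
    k:               number of bag instances treated as relevant to the class
    """
    # L1 (Manhattan) distance from every bag instance to every class
    # instance; L1 is less sensitive to outlier feature values than L2.
    diffs = np.abs(bag[:, None, :] - class_instances[None, :, :]).sum(axis=2)
    # For each bag instance, the distance to its closest instance of the class.
    nearest = diffs.min(axis=1)
    # Rather than keeping only the single most similar instance pair,
    # average over the k bag instances closest to the class.
    k = min(k, len(nearest))
    return np.sort(nearest)[:k].mean()

rng = np.random.default_rng(0)
bag = rng.normal(size=(6, 4))
cls = rng.normal(size=(20, 4))
print(bag_to_class_distance(bag, cls, k=3))
```

Averaging over `k` nearest instances, instead of taking only the minimum, reflects the observation that a bag may contain several instances of the same class.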
Citations
Journal ArticleDOI
TL;DR: A novel deep metric learning method to tackle the multi-label image classification problem by attempting to explore a latent space, where images and labels are embedded via two unique deep neural networks, respectively, to capture the relationships between image features and labels.
Abstract: In this paper, we present a novel deep metric learning method to tackle the multi-label image classification problem. In order to better learn the correlations among image features, as well as labels, we attempt to explore a latent space, where images and labels are embedded via two unique deep neural networks, respectively. To capture the relationships between image features and labels, we aim to learn a two-way deep distance metric over the embedding space from two different views, i.e., the distance between one image and its labels is not only smaller than those distances between the image and its labels’ nearest neighbors but also smaller than the distances between the labels and other images corresponding to the labels’ nearest neighbors. Moreover, a reconstruction module for recovering correct labels is incorporated into the whole framework as a regularization term, such that the label embedding space is more representative. Our model can be trained in an end-to-end manner. Experimental results on publicly available image data sets corroborate the efficacy of our method compared with the state of the art.
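The "two views" of the metric can be illustrated schematically as a pair of hinge constraints. The hinge form and all names below are assumptions made for this sketch; the actual method learns the embeddings end-to-end with deep networks:

```python
import numpy as np

def hinge(x):
    return np.maximum(0.0, x)

def two_way_loss(img, lbl, lbl_neighbors, other_imgs, margin=1.0):
    """Schematic of the two-way margin constraints (not the paper's exact loss).

    img:           (d,) embedded image
    lbl:           (d,) embedded ground-truth label
    lbl_neighbors: (k, d) embeddings of the label's nearest-neighbor labels
    other_imgs:    (k, d) images associated with those neighbor labels
    """
    d_pos = np.linalg.norm(img - lbl)
    # View 1: the image should be closer to its label than to the
    # label's nearest-neighbor labels.
    view1 = hinge(margin + d_pos - np.linalg.norm(img - lbl_neighbors, axis=1)).sum()
    # View 2: the label should be closer to its image than to images
    # tagged with the neighbor labels.
    view2 = hinge(margin + d_pos - np.linalg.norm(lbl - other_imgs, axis=1)).sum()
    return view1 + view2

img = lbl = np.zeros(4)
far = np.full((2, 4), 10.0)
print(two_way_loss(img, lbl, far, far))  # 0.0: both constraints satisfied
```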

24 citations


Cites background from "A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data"

  • ...In [37], a novel metric learning framework was presented to integrate class-specific distance metrics and explicitly take into account inter-class correlations for multi-label prediction....


Journal ArticleDOI
TL;DR: In this article, a compositional distance metric learning approach for multi-label classification is proposed by modeling structural interactions between instance space and label space, which adopts the representation of a weighted sum of rank-1 PSD matrices based on component bases.
Abstract: Multi-label classification aims to assign a set of proper labels for each instance, where distance metric learning can help improve the generalization ability of instance-based multi-label classification models. Existing multi-label metric learning techniques work by utilizing pairwise constraints to enforce that examples with similar label assignments should have close distance in the embedded feature space. In this paper, a novel distance metric learning approach for multi-label classification is proposed by modeling structural interactions between instance space and label space. On one hand, compositional distance metric is employed which adopts the representation of a weighted sum of rank-1 PSD matrices based on component bases. On the other hand, compositional weights are optimized by exploiting triplet similarity constraints derived from both instance and label spaces. Due to the compositional nature of employed distance metric, the resulting problem admits quadratic programming formulation with linear optimization complexity w.r.t. the number of training examples. We also derive the generalization bound for the proposed approach based on algorithmic robustness analysis of the compositional metric. Extensive experiments on sixteen benchmark data sets clearly validate the usefulness of compositional metric in yielding effective distance metric for multi-label classification.
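The compositional metric itself is easy to sketch: with nonnegative weights, a weighted sum of rank-1 PSD matrices is PSD, so it always defines a valid squared Mahalanobis distance. A minimal numpy illustration, with randomly chosen bases standing in for the learned component bases:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 5, 8
bases = rng.normal(size=(m, d))       # component bases b_i (stand-ins)
w = rng.uniform(0.1, 1.0, size=m)     # nonnegative compositional weights

# Weighted sum of rank-1 PSD matrices: M = sum_i w_i * b_i b_i^T.
M = np.einsum('i,ij,ik->jk', w, bases, bases)

# Nonnegative weights guarantee M is positive semidefinite, so
# d_M(x, y) = (x - y)^T M (x - y) is a valid squared Mahalanobis distance.
x, y = rng.normal(size=d), rng.normal(size=d)
dist2 = (x - y) @ M @ (x - y)
print(dist2)
```

Because the learning problem reduces to choosing the weights `w` over fixed bases, it admits the quadratic programming formulation mentioned in the abstract.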

15 citations

Posted Content
TL;DR: Zhang et al. proposed a two-way deep metric learning method for multi-label image classification, in which images and labels are embedded via two separate deep neural networks and a reconstruction module for recovering correct labels is incorporated into the framework as a regularization term.
Abstract: Preprint version of the journal article above; the abstract is essentially identical.

7 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper proposes a dictionary learning based strategy for MIL which first identifies class-specific discriminative codewords, and then projects the bag-level instances into a probabilistic embedding space with respect to the selected codewords.
Abstract: In this paper we deal with the problem of action recognition from unconstrained videos under the notion of multiple instance learning (MIL). The traditional MIL paradigm considers the data items as bags of instances with the constraint that the positive bags contain some class-specific instances whereas the negative bags consist of instances only from negative classes. A classifier is then further constructed using the bag level annotations and a distance metric between the bags. However, such an approach is not robust to outliers and is time-consuming for a moderately large dataset. In contrast, we propose a dictionary learning based strategy to MIL which first identifies class-specific discriminative codewords, and then projects the bag-level instances into a probabilistic embedding space with respect to the selected codewords. This essentially generates a fixed-length vector representation of the bags which is specifically dominated by the properties of the class-specific instances. We introduce a novel exhaustive search strategy using a support vector machine classifier in order to highlight the class-specific codewords. The standard multiclass classification pipeline is followed henceforth in the new embedded feature space for the sake of action recognition. We validate the proposed framework on the challenging KTH and Weizmann datasets, and the results obtained are promising and comparable to representative techniques from the literature.
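The projection step, which turns a variable-size bag into a fixed-length vector, can be sketched as follows. The soft-assignment form and the `gamma` parameter are assumptions made for illustration, not the paper's exact embedding:

```python
import numpy as np

def embed_bag(bag, codewords, gamma=1.0):
    """Project a bag onto discriminative codewords (illustrative sketch).

    bag:       (n, d) instances of one bag
    codewords: (c, d) class-specific codewords selected beforehand
    Returns a fixed-length (c,) vector regardless of bag size n.
    """
    # Squared Euclidean distance of every instance to every codeword.
    d2 = ((bag[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=2)
    # Soft assignment of each instance over the codewords.
    p = np.exp(-gamma * d2)
    p /= p.sum(axis=1, keepdims=True)
    # Max-pool over instances: the embedding is dominated by the
    # instances that respond most strongly to each codeword.
    return p.max(axis=0)

bag = np.zeros((3, 2))            # three identical toy instances
codewords = np.eye(2)             # two hypothetical codewords
print(embed_bag(bag, codewords))  # length 2, independent of bag size
```

Max-pooling over instances is one way to let the class-specific instances dominate the representation, matching the motivation in the abstract.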

5 citations


Cites background from "A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data"

  • ...In order to properly classify the bags, several distance measures including bag to bag, class to bag or bag to class (Verma and Jawahar, 2016) are introduced....


References
Proceedings Article
05 Dec 2005
TL;DR: In this article, a Mahalanobis distance metric for k-NN classification is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin.
Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification, for example achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.

4,433 citations

Journal ArticleDOI
TL;DR: This paper shows how to learn a Mahalanobis distance metric for kNN classification from labeled examples in a globally integrated manner and finds that metrics trained in this way lead to significant improvements in kNN Classification.
Abstract: The accuracy of k-nearest neighbor (kNN) classification depends significantly on the metric used to compute distances between different examples. In this paper, we show how to learn a Mahalanobis distance metric for kNN classification from labeled examples. The Mahalanobis metric can equivalently be viewed as a global linear transformation of the input space that precedes kNN classification using Euclidean distances. In our approach, the metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. As in support vector machines (SVMs), the margin criterion leads to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our approach requires no modification or extension for problems in multiway (as opposed to binary) classification. In our framework, the Mahalanobis distance metric is obtained as the solution to a semidefinite program. On several data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification. Sometimes these results can be further improved by clustering the training examples and learning an individual metric within each cluster. We show how to learn and combine these local metrics in a globally integrated manner.
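The large-margin objective can be written compactly. The sketch below parameterizes the metric as M = LᵀL, which keeps it PSD without the semidefinite program the paper actually solves, and simplifies the triplet bookkeeping to one target neighbor and one impostor per anchor:

```python
import numpy as np

def lmnn_loss(L, X, targets, impostors, mu=0.5, margin=1.0):
    """Simplified large-margin nearest neighbor objective.

    X:         (t, d) anchor points
    targets:   (t, d) same-class target neighbors to pull close
    impostors: (t, d) different-class points to push beyond the margin
    """
    def d2(a, b):
        # Squared Mahalanobis distance under M = L^T L.
        z = (a - b) @ L.T
        return (z * z).sum(axis=1)

    # Pull term: shrink distances to target neighbors.
    pull = d2(X, targets).sum()
    # Push term: hinge on impostors that invade the margin.
    push = np.maximum(0.0, margin + d2(X, targets) - d2(X, impostors)).sum()
    return (1 - mu) * pull + mu * push

X = np.zeros((2, 3))
tgt = np.zeros((2, 3))
imp = np.full((2, 3), 5.0)
print(lmnn_loss(np.eye(3), X, tgt, imp))  # 0.0: targets coincide, impostors far
```

In practice the loss is minimized over `L` (or over M directly, with a PSD projection each step, as in the semidefinite formulation above).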

4,157 citations


"A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data" refers background or methods in this paper

  • ...We optimize OP1 in the primal form itself using a batch gradient-descent and projection method similar to [18]....


  • ..., L}, we follow the approach of metric learning using pair-wise comparisons [18, 4, 5, 13]....


  • ...While metric learning for single-instance data (single-label [5, 4, 18] or multi-label [7, 13]) is a well-studied topic, there have been few attempts that perform metric learning for multiinstance data....


Journal ArticleDOI
TL;DR: Three kinds of algorithms that learn axis-parallel rectangles to solve the multiple instance problem are described and compared, giving 89% correct predictions on a musk odor prediction task.

2,767 citations


"A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data" refers background in this paper

  • ...Whereas, in case of multi-label data where each bag may be labeled with one or more classes, there may be an overlap among different super-bags; i.e., (i) Σ_{j=1}^{|D|} n_j ≤ Σ_{l=1}^{L} m_l, and (ii) |U_g ∩ U_h| ≥ 0, ∀ g ≠ h. Based on this, now we present the RB2C-distance for MIL....


  • ...And third, by the definition of MIL, an object bag is assigned to a class if at least one of its instances belongs to that class....


  • ...This demonstrates the effectiveness of this classical MIL method, and also reflects the need of revisiting such methods for developing better methods....


  • ...Under the MIL setting, for a given bag X_i, if ∃ j ∈ {1, …, n_i} such that the instance x_ij belongs to the l-th class (1 ≤ l ≤ L), then the whole bag X_i belongs to the l-th class and y_i(l) = 1; otherwise y_i(l) = 0....


  • ...Multiple Instance Learning (MIL) [2] is a machine learning paradigm that has lately achieved significant attention [17, 8, 10, 11, 1, 19, 14, 16, 15]....


Book ChapterDOI
28 May 2002
TL;DR: This work models object recognition as machine translation by annotating image regions with words, and shows how to cluster words that are individually difficult to predict into clusters that can be predicted well; for example, the distinction between train and locomotive cannot be predicted using the current set of features, but the underlying concept can.
Abstract: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. Firstly, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and keywords supplied with the images, is then learned, using a method based around EM. This process is analogous with learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that individually are difficult to predict into clusters that can be predicted well -- for example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
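The EM-based lexicon learning described above is analogous to IBM-Model-1-style word alignment, with region types playing the role of source words and image keywords the role of target words. A toy sketch on invented region/keyword data (all names hypothetical):

```python
from collections import defaultdict

# Toy aligned "bitext": each image is (region types, supplied keywords).
data = [
    (["sky_region", "grass_region"], ["sky", "grass"]),
    (["sky_region", "water_region"], ["sky", "water"]),
    (["grass_region", "water_region"], ["grass", "water"]),
]

regions = {r for rs, _ in data for r in rs}
words = {w for _, ws in data for w in ws}

# t[w][r]: probability that region type r is annotated with word w,
# initialized uniformly.
t = {w: {r: 1.0 / len(regions) for r in regions} for w in words}

for _ in range(20):  # EM iterations
    count = {w: defaultdict(float) for w in words}
    for rs, ws in data:
        for w in ws:
            # E-step: distribute each keyword over the image's regions
            # in proportion to the current translation probabilities.
            z = sum(t[w][r] for r in rs)
            for r in rs:
                count[w][r] += t[w][r] / z
    # M-step: renormalize the expected counts per word.
    for w in words:
        z = sum(count[w].values())
        for r in regions:
            t[w][r] = count[w][r] / z

print(max(t["sky"], key=t["sky"].get))  # sky_region
```

Because "sky" co-occurs with `sky_region` in every image that contains it, EM concentrates the probability mass on that pairing, which is exactly the lexicon-learning behavior the abstract describes.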

1,765 citations


"A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data" refers methods in this paper

  • ...We use two popular multi-instance multi-label datasets Corel-5K [3] and IAPR TC-12 [6]....


  • ...Here, µ is set to be the (rounded) average number of classes per bag in training data (µ = 2 for Corel-5K dataset, and µ = 3 for IAPR TC-12 dataset)....


  • ...Corel-5K dataset consists of 4500 training images, 500 testing images, and a vocabulary of 260 classes....


  • ...To validate our approach, we extensively experiment on two popular multi-label datasets: Corel-5K [3] and IAPR TC-12 [6]....


  • ...For RB2C and its variants, we keep K1 = 5 and K2 = 8 for Corel-5K dataset, and K1 = 10 and K2 = 4 for IAPR TC-12 dataset....


Proceedings ArticleDOI
21 Apr 2008
TL;DR: This paper analyzes a representative snapshot of Flickr and presents and evaluates tag recommendation strategies to support the user in the photo annotation task by recommending a set of tags that can be added to the photo.
Abstract: Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so-called tags, which describe the contents of the photo or provide additional contextual and semantic information. In this paper we investigate how we can assist users in the tagging phase. The contribution of our research is twofold. We analyse a representative snapshot of Flickr and present the results by means of a tag characterisation focussing on how users tag photos and what information is contained in the tagging. Based on this analysis, we present and evaluate tag recommendation strategies to support the user in the photo annotation task by recommending a set of tags that can be added to the photo. The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of original tagging.

1,048 citations


"A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data" refers background in this paper

  • ...We define the correlation between k and l class as [12]:...
