Proceedings ArticleDOI

Smart Mining for Deep Metric Learning

01 Oct 2017-pp 2840-2848
TL;DR: This paper proposes a novel deep metric learning method that combines the triplet model and the global structure of the embedding space and relies on a smart mining procedure that produces effective training samples for a low computational cost.
Abstract: To solve deep metric learning problems and produce feature embeddings, current methodologies will commonly use a triplet model to minimise the relative distance between samples from the same class and maximise the relative distance between samples from different classes. Though successful, the training convergence of this triplet model can be compromised by the fact that the vast majority of the training samples will produce gradients with magnitudes that are close to zero. This issue has motivated the development of methods that explore the global structure of the embedding and other methods that explore hard negative/positive mining. The effectiveness of such mining methods is often associated with intractable computational requirements. In this paper, we propose a novel deep metric learning method that combines the triplet model and the global structure of the embedding space. We rely on a smart mining procedure that produces effective training samples for a low computational cost. In addition, we propose an adaptive controller that automatically adjusts the smart mining hyper-parameters and speeds up the convergence of the training process. We show empirically that our proposed method allows for fast and more accurate training of triplet ConvNets than other competing mining methods. Additionally, we show that our method achieves new state-of-the-art embedding results for the CUB-200-2011 and Cars196 datasets.
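
To make the triplet objective described above concrete, here is a minimal PyTorch sketch of a standard triplet loss; the margin value and Euclidean distance are generic choices, and the paper's smart mining procedure, global loss, and adaptive controller are not reproduced here.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss sketch: pull same-class pairs together and
    push different-class pairs apart by at least `margin`.

    anchor, positive, negative: (batch, dim) embedding tensors, assumed
    to be L2-normalised by the embedding network.
    """
    d_ap = F.pairwise_distance(anchor, positive)   # distance to same-class sample
    d_an = F.pairwise_distance(anchor, negative)   # distance to different-class sample
    # Triplets that already satisfy the margin produce a zero gradient,
    # which is exactly why mining informative triplets matters.
    return F.relu(d_ap - d_an + margin).mean()
```
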
Citations
Posted Content
TL;DR: This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks.
Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.
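
The two mechanisms the abstract describes, predicting the target network's projection of another view and updating the target as a slow-moving average of the online network, can be sketched roughly as follows; the network objects and the decay value `tau` are placeholders rather than BYOL's exact architecture or schedule.

```python
import torch
import torch.nn.functional as F

def byol_update(online, predictor, target, view1, view2, tau=0.99):
    """One BYOL-style step, as a rough sketch: the online branch predicts
    the target branch's projection of another augmented view, and the
    target is then updated as a slow-moving average of the online network."""
    p1 = predictor(online(view1))                 # online prediction of view 1
    with torch.no_grad():
        z2 = target(view2)                        # target projection of view 2 (no gradient)
    # Negative cosine similarity; equivalent to MSE between normalised vectors.
    loss = 2 - 2 * F.cosine_similarity(p1, z2, dim=-1).mean()
    loss.backward()                               # gradients flow only into online/predictor

    with torch.no_grad():                         # EMA update of the target parameters
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.mul_(tau).add_(p_o, alpha=1 - tau)
    return loss
```
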

2,942 citations


Cites methods from "Smart Mining for Deep Metric Learni..."

  • ...These methods need careful treatment of negative pairs [13] by either relying on large batch sizes [8, 12], memory banks [9] or customized mining strategies [14, 15] to retrieve the negative pairs....


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, the authors proposed meta-transfer learning, a few-shot learning approach that adapts a base-learner to a new task for which only a few labeled samples are available by learning scaling and shifting functions of DNN weights for each task.
Abstract: Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit using a few samples only, meta-learning typically uses shallow neural networks (SNNs), thus limiting its effectiveness. In this paper we propose a novel few-shot learning method called meta-transfer learning (MTL) which learns to adapt a deep NN for few-shot learning tasks. Specifically, "meta" refers to training multiple tasks, and "transfer" is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL. We conduct experiments using (5-class, 1-shot) and (5-class, 5-shot) recognition tasks on two challenging few-shot learning benchmarks: miniImageNet and Fewshot-CIFAR100. Extensive comparisons to related works validate that our meta-transfer learning approach trained with the proposed HT meta-batch scheme achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy.
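
The "transfer" component described above, learning scaling and shifting functions on top of frozen pre-trained weights, can be sketched as a wrapper around a convolution layer; the module below is illustrative rather than the authors' code, and it scales the layer output channel-wise for simplicity, whereas the paper applies the functions to the weights themselves.

```python
import torch
import torch.nn as nn

class ScaleShiftConv2d(nn.Module):
    """Sketch of the scale-and-shift idea: freeze a pre-trained conv layer
    and learn only per-channel scaling and shifting for each new task."""

    def __init__(self, pretrained_conv: nn.Conv2d):
        super().__init__()
        self.conv = pretrained_conv
        for p in self.conv.parameters():          # pre-trained weights stay frozen
            p.requires_grad_(False)
        c = pretrained_conv.out_channels
        self.scale = nn.Parameter(torch.ones(c, 1, 1))    # learned per task
        self.shift = nn.Parameter(torch.zeros(c, 1, 1))   # learned per task

    def forward(self, x):
        # Channel-wise scaling and shifting of the frozen layer's output.
        return self.conv(x) * self.scale + self.shift
```
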

708 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a general pair weighting (GPW) framework has been proposed, which casts the sampling problem of deep metric learning into a unified view through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions.
Abstract: A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW, which is implemented in two iterative steps (i.e., mining and weighting). This allows it to fully consider three similarities for pair weighting, providing a more principled approach for collecting and weighting informative pairs. Finally, the proposed MS loss obtains new state-of-the-art performance on four image retrieval benchmarks, where it outperforms the most recent approaches, such as ABE [14] and HTL [4], by a large margin, e.g. 60.6%→65.7% on CUB200 and 80.9%→88.0% on the In-Shop Clothes Retrieval dataset at Recall@1.
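
A rough sketch of a multi-similarity-style loss built on pairwise cosine similarities is given below; the hyper-parameter values are placeholders and the paper's explicit mining step is omitted, so this shows only the weighting idea rather than the exact MS loss.

```python
import torch
import torch.nn.functional as F

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Sketch of a multi-similarity-style loss; alpha, beta and lam are
    assumed placeholder hyper-parameters, not the paper's settings."""
    emb = F.normalize(embeddings, dim=1)
    sim = emb @ emb.t()                                   # pairwise cosine similarities
    n = labels.size(0)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    self_mask = torch.eye(n, dtype=torch.bool, device=emb.device)

    losses = []
    for i in range(n):
        pos = sim[i][same[i] & ~self_mask[i]]             # positive pairs (exclude self)
        neg = sim[i][~same[i]]                            # negative pairs
        if pos.numel() == 0 or neg.numel() == 0:
            continue
        # Soft weighting of positives and negatives around the threshold `lam`.
        pos_term = torch.log1p(torch.exp(-alpha * (pos - lam)).sum()) / alpha
        neg_term = torch.log1p(torch.exp(beta * (neg - lam)).sum()) / beta
        losses.append(pos_term + neg_term)
    return torch.stack(losses).mean()
```
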

549 citations

Journal ArticleDOI
21 Aug 2019-Symmetry
TL;DR: This article is considered to be important, as it is the first comprehensive study in which sampling strategy, appropriate distance metric, and the structure of the network are systematically analyzed and evaluated as a whole and supported by comparing the quantitative results of the methods.
Abstract: Metric learning aims to measure the similarity among samples while using an optimal distance metric for learning tasks. Metric learning methods, which generally use a linear projection, are limited in solving real-world problems demonstrating non-linear characteristics. Kernel approaches are utilized in metric learning to address this problem. In recent years, deep metric learning, which provides a better solution for nonlinear data through activation functions, has attracted researchers' attention in many different areas. This article aims to reveal the importance of deep metric learning and the problems dealt with in this field in the light of recent studies. As far as research in this field is concerned, most existing studies are inspired by Siamese and triplet networks, which relate samples through shared weights in deep metric learning. The success of these networks is based on their capacity to understand the similarity relationship among samples. Moreover, sampling strategy, appropriate distance metric, and the structure of the network are the challenging factors for researchers to improve the performance of the network model. This article is considered to be important, as it is the first comprehensive study in which these factors are systematically analyzed and evaluated as a whole and supported by comparing the quantitative results of the methods.

350 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A novel instance-based softmax embedding method is proposed, which directly optimizes the 'real' instance features on top of the softmax function and achieves significantly faster learning speed and higher accuracy than all existing methods.
Abstract: This paper studies the unsupervised embedding learning problem, which requires an effective similarity measurement between samples in a low-dimensional embedding space. Motivated by the positive-concentrated and negative-separated properties observed from category-wise supervised learning, we propose to utilize instance-wise supervision to approximate these properties, which aims at learning data augmentation invariant and instance spread-out features. To achieve this goal, we propose a novel instance-based softmax embedding method, which directly optimizes the 'real' instance features on top of the softmax function. It achieves significantly faster learning speed and higher accuracy than all existing methods. The proposed method performs well for both seen and unseen testing categories with cosine similarity. It also achieves competitive performance even without a pre-trained network over samples from fine-grained categories.
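
The instance-wise supervision described above can be sketched as a softmax classification over the instances of a minibatch, where an augmented view should be classified as its own instance; the temperature and batch layout below are assumptions, not necessarily the paper's configuration.

```python
import torch
import torch.nn.functional as F

def instance_softmax_loss(feat, feat_aug, tau=0.1):
    """Sketch of an instance-wise softmax embedding loss within a batch.

    feat, feat_aug: (batch, dim) features of each image and its augmented
    view; every image in the batch is treated as its own class.
    `tau` is an assumed temperature value.
    """
    feat = F.normalize(feat, dim=1)
    feat_aug = F.normalize(feat_aug, dim=1)
    # Logits: similarity of each augmented view to every instance in the batch.
    logits = feat_aug @ feat.t() / tau
    # The augmented view of instance i should be classified as instance i.
    targets = torch.arange(feat.size(0), device=feat.device)
    return F.cross_entropy(logits, targets)
```
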

341 citations


Cites background or methods from "Smart Mining for Deep Metric Learni..."

  • ...Following existing works on supervised deep embedding learning [13, 32], the retrieval performance and clustering quality of the testing set are evaluated....


  • ...Most of them are designed on top of pairwise [12, 30] or triplet relationships [13, 29]....


  • ...In particular, several sampling strategies are widely investigated to improve the performance, such as hard mining [16], semihard mining [35], smart mining [13] and so on....


References
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
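
The "increased width while keeping the computational budget constant" idea rests on Inception modules that run several filter sizes in parallel behind 1x1 reductions; the block below is a simplified sketch with illustrative channel counts, not GoogLeNet's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Simplified Inception-style block: parallel 1x1, 3x3, and 5x5
    convolutions plus a pooling branch, concatenated channel-wise."""

    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),   # 1x1 reduction keeps compute bounded
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        # Concatenate all branch outputs along the channel axis.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```
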

40,257 citations

Proceedings Article
21 Jun 2010
TL;DR: Restricted Boltzmann machines were developed using binary stochastic hidden units; approximating these with noisy rectified linear units learns features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
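
The approximation described in the abstract, replacing each binary unit by many tied copies and approximating the result with a noisy rectified linear unit, can be sketched as follows; using the sigmoid of the input as the noise variance is an assumption of this illustrative sketch.

```python
import torch

def noisy_relu(x):
    """Noisy rectified linear unit sketch: max(0, x + eps), with Gaussian
    noise whose variance is taken here to be sigmoid(x)."""
    eps = torch.randn_like(x) * torch.sigmoid(x).sqrt()
    return torch.clamp(x + eps, min=0)
```
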

14,799 citations


"Smart Mining for Deep Metric Learni..." refers background in this paper

  • ...Also in (1), note that f_l = [f_{l,1}, ..., f_{l,n_l}] represents an array of n_l pre-activation functions....


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
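
FaceNet trains the embedding with a triplet loss and, as the citation contexts below note, selects semi-hard negatives inside each minibatch; the helper below is a rough sketch of that selection rule under assumed Euclidean distances and a simplified batch layout, not the paper's implementation.

```python
import torch

def semi_hard_negative(anchor, positive, candidates, margin=0.2):
    """Pick a semi-hard negative for one (anchor, positive) pair:
    a candidate farther from the anchor than the positive, but still
    inside the margin, so it yields a non-zero gradient.

    anchor, positive: (dim,) embeddings; candidates: (n, dim) negatives.
    Returns the index of the chosen negative, or None if none qualify.
    """
    d_ap = torch.norm(anchor - positive)
    d_an = torch.norm(candidates - anchor, dim=1)
    mask = (d_an > d_ap) & (d_an < d_ap + margin)   # the semi-hard band
    if not mask.any():
        return None
    # Among the semi-hard candidates, take the hardest (closest) one.
    idx = torch.nonzero(mask, as_tuple=True)[0]
    return idx[torch.argmin(d_an[idx])].item()
```
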

8,289 citations


"Smart Mining for Deep Metric Learni..." refers background or methods in this paper

  • ...2, semi-hard mining has proved an effective method for training triplet networks [14] with the primary aim of finding sets of triplets that will continue to progress the training of the network....


  • ...(5) In order to avoid the costly argmax over the entire training set, semi-hard mining is instead commonly performed over the stochastic subset of samples used in each minibatch [16, 14]....


  • ...The development of deep metric learning models for the estimation of effective feature embedding [2, 4, 9, 11, 15, 16, 17, 13, 22, 25, 27, 26] is at the core of many recently proposed computer vision methods [3, 14, 19, 24, 28]....


  • ...This issue has led to the implementation of importance sampling techniques [14, 16, 24] that stochastically under-sample the set of triplets....


  • ...Similar to [14, 9], we set the margin for the triplet and global loss to 0....


01 Jul 2011
TL;DR: CUB-200-2011 as mentioned in this paper is an extended version of CUB-200, which roughly doubles the number of images per category and adds new part localization annotations, with all images annotated with bounding boxes, part locations, and attribute labels.
Abstract: CUB-200-2011 is an extended version of CUB-200 [7], a challenging dataset of 200 bird species. The extended version roughly doubles the number of images per category and adds new part localization annotations. All images are annotated with bounding boxes, part locations, and attribute labels. Images and annotations were filtered by multiple users of Mechanical Turk. We introduce benchmarks and baseline experiments for multi-class categorization and part localization.

3,769 citations


"Smart Mining for Deep Metric Learni..." refers background or methods in this paper

  • ...Recall@K metric [15]. Tables 1 and 2 show the NMI and k nearest neighbour performance with the Recall@K metric results defined above, comparing our method to the state of the art for the datasets CUB-200-2011 [25] and Cars196 [9]. From these tables, we can first see that Triplet + FANNG significantly improves upon the Semi-hard [16] results with respect to all measures, and showing that the smart mining process ...


  • ...final model Triplet + FANNG + Global + Adaptive shows competitive results with respect to all measures as well as a much faster convergence rate (see Fig. 4). For instance, for the CUB-200-2011 dataset [25], Triplet + FANNG + Global + Adaptive converges in just ... [Table 1 of the paper: clustering and recall performance (NMI, R@1, R@2, R@4, R@8) on CUB-200-2011 [25]; the authors' proposals are highlighted.]


  • ...dings in far fewer epochs. To maintain a high training error, it is best to use batches that are 50% to 100% mined triplets. A comparison of hand tuned and adaptive parameter selection can be seen in Figure 3 [caption: a comparison of training performance using hand tuned and adaptive selection; training and validation error is shown for the first 20 epochs while training on CUB-200-2011 [25]]. Tra...


  • ...fewer epochs will greatly reduce the overall training time. (Section 4, Experiments.) For the experiments, we follow the protocol used in previous papers [20, 21, 15], which uses unseen classes from the CUB-200-2011 [25] and Cars196 [9] datasets in order to assess the clustering quality and k nearest neighbour retrieval [8]. Our proposed method combining triplet and global losses using FANNG [5] with and without auto...


  • ...'s paper [15]. For the remaining approaches (i.e. our proposed method, and (5)), we use the same training and test set split described in [21] across all datasets. Specifically, this means CUB-200-2011 [25] has 11,788 images of 200 bird species, from which we take the first 100 species for training and use the remaining 100 species for testing. Cars196 [9] has 16,185 images from 196 car models, from whic...


Proceedings Article
Jane Bromley, Isabelle Guyon, Yann LeCun, E. Sackinger, Roopak Shah
29 Nov 1993
TL;DR: An algorithm for verification of signatures written on a pen-input tablet based on a novel, artificial neural network called a "Siamese" neural network, which consists of two identical sub-networks joined at their outputs.
Abstract: This paper describes an algorithm for verification of signatures written on a pen-input tablet. The algorithm is based on a novel, artificial neural network, called a "Siamese" neural network. This network consists of two identical sub-networks joined at their outputs. During training the two sub-networks extract features from two signatures, while the joining neuron measures the distance between the two feature vectors. Verification consists of comparing an extracted feature vector with a stored feature vector for the signer. Signatures closer to this stored representation than a chosen threshold are accepted, all other signatures are rejected as forgeries.
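
A Siamese network applies the same sub-network with shared weights to both inputs and compares the resulting feature vectors at the joining stage; the sketch below uses generic layer sizes and a cosine distance, not the original signature-verification architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Siamese(nn.Module):
    """Two identical sub-networks joined at their outputs; the distance
    between the two feature vectors is the verification score.
    Layer sizes here are illustrative placeholders."""

    def __init__(self, in_dim=128, feat_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, feat_dim))

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)   # shared weights: same encoder
        # Cosine distance between the two feature vectors (the joining measure).
        return 1 - F.cosine_similarity(f1, f2, dim=-1)
```
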

2,980 citations


"Smart Mining for Deep Metric Learni..." refers background in this paper

  • ...Arguably, the most explored deep learning model that can estimate feature embedding is based on triplet networks [6, 24], which are an extension of the siamese network [1]....
