
Showing papers on "Metric (mathematics)" published in 2019


Proceedings ArticleDOI
01 Jun 2019
TL;DR: In this paper, a generalized IoU (GIoU) is proposed as both a new loss and a new metric; unlike IoU, it remains informative for non-overlapping bounding boxes and can be directly used as a regression loss.
Abstract: Intersection over Union (IoU) is the most popular evaluation metric used in the object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that IoU can be directly used as a regression loss. However, IoU has a plateau making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address this weakness by introducing a generalized version of IoU as both a new loss and a new metric. By incorporating this generalized IoU (GIoU) as a loss into the state-of-the-art object detection frameworks, we show a consistent improvement on their performance using both the standard, IoU based, and new, GIoU based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
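
For axis-aligned boxes, the GIoU described above comes down to a few lines. A minimal sketch in pure Python (the (x1, y1, x2, y2) box convention and the helper name are assumptions for illustration):

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2):
    IoU minus the fraction of the smallest enclosing box C that is not
    covered by the union of the two boxes."""
    # Intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Smallest enclosing box C.
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    return inter / union - (area_c - union) / area_c

# GIoU stays informative for disjoint boxes, where IoU is flat at 0;
# the regression loss in the paper is L_GIoU = 1 - GIoU.
print(giou((0, 0, 2, 2), (3, 3, 5, 5)))   # negative value, unlike IoU == 0
```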

1,527 citations


Posted Content
TL;DR: This paper introduces a generalized version of IoU (GIoU) as a loss into the state-of-the-art object detection frameworks, and shows a consistent improvement on their performance using both the standard, IoU based, and new, GIoU based, performance measures on popular object detection benchmarks.
Abstract: Intersection over Union (IoU) is the most popular evaluation metric used in the object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that IoU can be directly used as a regression loss. However, IoU has a plateau making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of IoU by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized IoU (GIoU) as a loss into the state-of-the-art object detection frameworks, we show a consistent improvement on their performance using both the standard, IoU based, and new, GIoU based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.

1,251 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a general pair weighting (GPW) framework has been proposed, which casts the sampling problem of deep metric learning into a unified view through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions.
Abstract: A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW, which is implemented in two iterative steps (i.e., mining and weighting). This allows it to fully consider three similarities for pair weighting, providing a more principled approach for collecting and weighting informative pairs. Finally, the proposed MS loss obtains new state-of-the-art performance on four image retrieval benchmarks, where it outperforms the most recent approaches, such as ABE [14] and HTL [4], by a large margin, e.g., 60.6%→65.7% on CUB200 and 80.9%→88.0% on the In-Shop Clothes Retrieval dataset at Recall@1.
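
A minimal NumPy sketch of the two-step (mining, then weighting) MS loss on a batch similarity matrix; alpha, beta, lam, and eps follow the paper's notation, but the concrete values and the function name are illustrative:

```python
import numpy as np

def ms_loss(sim, labels, alpha=2.0, beta=50.0, lam=0.5, eps=0.1):
    """Multi-similarity loss sketch. sim: (n, n) cosine similarities of the
    batch embeddings; labels: (n,) integer class labels."""
    labels = np.asarray(labels)
    n = sim.shape[0]
    total = 0.0
    for i in range(n):
        pos = sim[i][(labels == labels[i]) & (np.arange(n) != i)]
        neg = sim[i][labels != labels[i]]
        if pos.size == 0 or neg.size == 0:
            continue
        # Step 1 (mining): keep pairs that are hard relative to the
        # hardest counterpart on the other side.
        pos_m = pos[pos < neg.max() + eps]
        neg_m = neg[neg > pos.min() - eps]
        if pos_m.size == 0 or neg_m.size == 0:
            continue
        # Step 2 (weighting): soft pair weights arise inside the log-sum-exp.
        total += np.log(1 + np.exp(-alpha * (pos_m - lam)).sum()) / alpha
        total += np.log(1 + np.exp(beta * (neg_m - lam)).sum()) / beta
    return total / n
```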

549 citations


Journal ArticleDOI
TL;DR: This paper provides an analysis of 8 different evaluation metrics and their properties, and makes recommendations for metric selections under specific assumptions and for specific applications.
Abstract: How best to evaluate a saliency model's ability to predict where humans look in images is an open research question. The choice of evaluation metric depends on how saliency is defined and how the ground truth is represented. Metrics differ in how they rank saliency models, and this results from how false positives and false negatives are treated, whether viewing biases are accounted for, whether spatial deviations are factored in, and how the saliency maps are pre-processed. In this paper, we provide an analysis of 8 different evaluation metrics and their properties. With the help of systematic experiments and visualizations of metric computations, we add interpretability to saliency scores and more transparency to the evaluation of saliency models. Building off the differences in metric properties and behaviors, we make recommendations for metric selections under specific assumptions and for specific applications.
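
As a concrete example of one analyzed metric, Normalized Scanpath Saliency (NSS) averages the z-scored prediction over fixated pixels; a minimal NumPy sketch (function and argument names are assumptions):

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: z-score the predicted saliency map,
    then average it at the ground-truth fixation locations.
    fixation_map: binary array of the same shape as saliency_map."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-12)
    return s[fixation_map.astype(bool)].mean()
```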

526 citations


Proceedings ArticleDOI
14 Aug 2019
TL;DR: This paper investigates strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality and validates the new metric, namely MoverScore, on a number of text generation tasks.
Abstract: A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform the best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.
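
MoverScore solves an optimal-transport problem over contextualized (e.g. BERT) embeddings with IDF weighting; as a self-contained illustration, the sketch below computes only the cheap "relaxed" variant in which every token simply travels to its nearest counterpart, a lower bound on the full transport cost (names and the plain Euclidean cost are assumptions):

```python
import numpy as np

def relaxed_mover_distance(sys_emb, ref_emb):
    """Relaxed Word Mover's Distance sketch over token embeddings.
    sys_emb: (n, d) system-token embeddings; ref_emb: (m, d) reference."""
    # Pairwise Euclidean costs between system and reference tokens.
    cost = np.linalg.norm(sys_emb[:, None, :] - ref_emb[None, :, :], axis=-1)
    # Each token travels to its nearest counterpart, in both directions.
    return 0.5 * (cost.min(axis=1).mean() + cost.min(axis=0).mean())
```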

387 citations


Posted Content
TL;DR: In this article, an approach based on disentangled representation is proposed for image-to-image translation without paired training images; the model takes the encoded content features extracted from a given input and attribute vectors sampled from the attribute space to produce diverse outputs at test time.
Abstract: Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time. To handle unpaired training data, we introduce a novel cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative comparisons, we measure realism with a user study and diversity with a perceptual distance metric. We apply the proposed model to domain adaptation and show competitive performance when compared to the state-of-the-art on the MNIST-M and the LineMod datasets.

383 citations


Journal ArticleDOI
21 Aug 2019-Symmetry
TL;DR: This article is important as the first comprehensive study in which sampling strategy, appropriate distance metric, and the structure of the network are systematically analyzed and evaluated as a whole, supported by a comparison of the quantitative results of the methods.
Abstract: Metric learning aims to measure the similarity among samples while using an optimal distance metric for learning tasks. Metric learning methods, which generally use a linear projection, are limited in solving real-world problems demonstrating non-linear characteristics. Kernel approaches are utilized in metric learning to address this problem. In recent years, deep metric learning, which provides a better solution for nonlinear data through activation functions, has attracted researchers’ attention in many different areas. This article aims to reveal the importance of deep metric learning and the problems dealt with in this field in the light of recent studies. As far as research conducted in this field is concerned, most existing studies are inspired by Siamese and triplet networks, which are commonly used to relate samples via shared weights in deep metric learning. The success of these networks is based on their capacity to understand the similarity relationship among samples. Moreover, sampling strategy, appropriate distance metric, and the structure of the network are the challenging factors for researchers to improve the performance of the network model. This article is considered important, as it is the first comprehensive study in which these factors are systematically analyzed and evaluated as a whole, supported by comparing the quantitative results of the methods.
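
The triplet loss at the heart of the networks the survey covers fits in a few lines; a NumPy sketch with an illustrative margin:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor-positive distance below the anchor-negative
    distance by at least `margin` (embeddings are 1-D arrays)."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```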

350 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, a disentangling transformation for 2D and 3D detection losses and a self-supervised confidence score for 3D bounding boxes is proposed for monocular 3D object detection.
Abstract: In this paper we propose an approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes. Our proposed loss disentanglement has the twofold advantage of simplifying the training dynamics in the presence of losses with complex interactions of parameters, and sidestepping the issue of balancing independent regression terms. Our solution overcomes these issues by isolating the contribution made by groups of parameters to a given loss, without changing its nature. We further apply loss disentanglement to another novel, signed Intersection-over-Union criterion-driven loss for improving 2D detection results. Besides our methodological innovations, we critically review the AP metric used in KITTI3D, which emerged as the most important dataset for comparing 3D detection results. We identify and resolve a flaw in the 11-point interpolated AP metric that affects all previously published detection results and particularly biases the results of monocular 3D detection. We provide extensive experimental evaluations and ablation studies and set a new state of the art on the KITTI3D Car class.
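
The AP flaw discussed above is easy to see in code. Below is a sketch of interpolated AP (NumPy): with the classic 11 recall points, which include r = 0, a single confident true positive already contributes roughly 1/11 AP; the 40-point variant described in the paper samples recall from 1/40 to 1 instead:

```python
import numpy as np

def interpolated_ap(recalls, precisions, num_points=11):
    """Interpolated average precision over a precision-recall curve.
    recalls, precisions: 1-D arrays of matching length."""
    if num_points == 11:
        sample_points = np.linspace(0.0, 1.0, 11)       # includes r = 0
    else:
        sample_points = np.linspace(1.0 / num_points, 1.0, num_points)
    ap = 0.0
    for r in sample_points:
        mask = recalls >= r
        # Standard interpolation: best precision at recall >= r.
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / num_points
```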

329 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.
Abstract: The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to "instance-level" 6D pose estimation tasks, our problem assumes that no exact object CAD models are available during either training or testing time. To handle different and unseen object instances in a given category, we introduce a Normalized Object Coordinate Space (NOCS), a shared canonical representation for all possible object instances within a category. Our region-based neural network is then trained to directly infer the correspondence from observed pixels to this shared object representation (NOCS) along with other object information such as class label and instance mask. These predictions can be combined with the depth map to jointly estimate the metric 6D pose and dimensions of multiple objects in a cluttered scene. To train our network, we present a new context-aware technique to generate large amounts of fully annotated mixed reality data. To further improve our model and evaluate its performance on real data, we also provide a fully annotated real-world dataset with large environment and instance variation. Extensive experiments demonstrate that the proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.
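
Given predicted NOCS coordinates and the corresponding back-projected depth points, the metric pose and size can be recovered by fitting a similarity transform. A NumPy sketch using the standard Umeyama algorithm, one common choice for this step (whether it matches the paper's exact alignment procedure is an assumption):

```python
import numpy as np

def umeyama(src, dst):
    """Fit scale s, rotation R, translation t aligning src -> dst.
    src, dst: (n, 3) corresponding point sets (e.g. NOCS coordinates
    and back-projected depth points)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t
```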

320 citations


Proceedings ArticleDOI
07 Jan 2019
TL;DR: In this article, distance metric learning (DML) is applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples.
Abstract: Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. Our approach outperforms state-of-the-art methods for DML-based object classification on a variety of standard fine-grained datasets. Furthermore, we demonstrate the effectiveness of our approach on the problem of few-shot object detection, by incorporating the proposed DML architecture as a classification head into a standard object detection model. We achieve the best results on the ImageNet-LOC dataset compared to strong baselines, when only a few training examples are available. We also offer the community a new episodic benchmark based on the ImageNet dataset for the few-shot object detection task.

311 citations


Proceedings Article
15 Apr 2019
TL;DR: This work presents an evaluation metric that can separately and reliably measure both the quality and coverage of the samples produced by a generative model, extends it to estimate the perceptual quality of individual samples, and uses it to study latent space interpolations.
Abstract: The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. In the process, we identify new variants that improve the state-of-the-art. We also perform the first principled analysis of truncation methods and identify an improved method. Finally, we extend our metric to estimate the perceptual quality of individual samples, and use this to study latent space interpolations.
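
The precision half of the metric can be sketched directly: a generated sample counts as realistic if it falls inside any real sample's k-nearest-neighbor hypersphere (NumPy; k and the raw Euclidean distances are illustrative, and the paper works in a pre-trained feature space rather than pixel space):

```python
import numpy as np

def knn_precision(real, fake, k=3):
    """Manifold-based precision sketch. real: (n, d), fake: (m, d)
    feature arrays; returns the fraction of fake samples covered by
    the estimated real manifold. Recall: swap real and fake."""
    # Radius of each real point = distance to its k-th nearest real neighbor.
    d_rr = np.linalg.norm(real[:, None] - real[None, :], axis=-1)
    radii = np.sort(d_rr, axis=1)[:, k]        # column 0 is the point itself
    # A fake sample is covered if it lies inside any real hypersphere.
    d_fr = np.linalg.norm(fake[:, None] - real[None, :], axis=-1)
    covered = (d_fr <= radii[None, :]).any(axis=1)
    return covered.mean()
```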

Journal ArticleDOI
25 Feb 2019
TL;DR: Ordinal variables, although extremely common in psychology, are almost exclusively analyzed with statistical models that falsely assume them to be metric, a practice that, as the authors discuss, can lead to distorted effects.
Abstract: Ordinal variables, although extremely common in psychology, are almost exclusively analyzed with statistical models that falsely assume them to be metric. This practice can lead to distorted effect...

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work proposes a novel loss formulation by lifting 2D detection, orientation, and scale estimation into 3D space and demonstrates that this approach doubles the AP on the 3D pose metrics on the official test set, defining the new state of the art.
Abstract: We present a deep learning method for end-to-end monocular 3D object detection and metric shape retrieval. We propose a novel loss formulation by lifting 2D detection, orientation, and scale estimation into 3D space. Instead of optimizing these quantities separately, the 3D instantiation allows us to properly measure the metric misalignment of boxes. We experimentally show that our 10D lifting of sparse 2D Regions of Interests (RoIs) achieves great results both for 6D pose and recovery of the textured metric geometry of instances. This further enables 3D synthetic data augmentation via inpainting recovered meshes directly onto the 2D scenes. We evaluate on KITTI3D against other strong monocular methods and demonstrate that our approach doubles the AP on the 3D pose metrics on the official test set, defining the new state of the art.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A novel model is proposed that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning; it preserves higher accuracy in classifying images from seen classes and performs better than existing state-of-the-art models in classifying images from unseen classes.
Abstract: This paper studies the problem of generalized zero-shot learning which requires the model to train on image-label pairs from some seen classes and test on the task of classifying new images from both seen and unseen classes. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that can generate various visual features given class embeddings as input, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained under the combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only preserves higher accuracy in classifying images from seen classes, but also performs better than existing state-of-the-art models in classifying images from unseen classes.
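
A minimal sketch of the cyclic-consistency term described above (NumPy; `generator` and `regressor` are placeholder callables, the adversarial terms are omitted, and the L1 form is an assumption):

```python
import numpy as np

def cycle_consistency(class_emb, generator, regressor):
    """A visual feature generated from a class embedding should be mapped
    back to that embedding by the regressor."""
    feat = generator(class_emb)                          # class -> visual
    return np.abs(regressor(feat) - class_emb).mean()    # visual -> class
```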

Posted Content
TL;DR: In this article, a region-based neural network is trained to directly infer the correspondence from observed pixels to the shared object representation (NOCS) along with other object information such as class label and instance mask.
Abstract: The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to "instance-level" 6D pose estimation tasks, our problem assumes that no exact object CAD models are available during either training or testing time. To handle different and unseen object instances in a given category, we introduce a Normalized Object Coordinate Space (NOCS), a shared canonical representation for all possible object instances within a category. Our region-based neural network is then trained to directly infer the correspondence from observed pixels to this shared object representation (NOCS) along with other object information such as class label and instance mask. These predictions can be combined with the depth map to jointly estimate the metric 6D pose and dimensions of multiple objects in a cluttered scene. To train our network, we present a new context-aware technique to generate large amounts of fully annotated mixed reality data. To further improve our model and evaluate its performance on real data, we also provide a fully annotated real-world dataset with large environment and instance variation. Extensive experiments demonstrate that the proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.

Proceedings ArticleDOI
25 Feb 2019
TL;DR: This work presents two limitations of existing ranking-motivated structured losses and proposes a novel ranked list loss to solve both of them, learning a hypersphere for each class in order to preserve the similarity structure inside it.
Abstract: The objective of deep metric learning (DML) is to learn embeddings that can capture semantic similarity information among data points. Existing pairwise or tripletwise loss functions used in DML are known to suffer from slow convergence due to a large proportion of trivial pairs or triplets as the model improves. To improve this, ranking-motivated structured losses have recently been proposed to incorporate multiple examples and exploit the structured information among them. They converge faster and achieve state-of-the-art performance. In this work, we present two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. First, given a query, only a fraction of data points is incorporated to build the similarity structure. Consequently, some useful examples are ignored and the structure is less informative. To address this, we propose to build a set-based similarity structure by exploiting all instances in the gallery. The samples are split into a positive set and a negative set. Our objective is to make the query closer to the positive set than to the negative set by a margin. Second, previous methods aim to pull positive pairs as close as possible in the embedding space. As a result, the intraclass data distribution might be dropped. In contrast, we propose to learn a hypersphere for each class in order to preserve the similarity structure inside it. Our extensive experiments show that the proposed method achieves state-of-the-art performance on three widely used benchmarks.
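
A sketch of the resulting objective for one query (NumPy): positives are pulled inside a hypersphere of radius alpha - margin and negatives pushed beyond alpha, separating the two sets by the margin (hyperparameter values are illustrative, and the paper's violation-based weighting of negatives is omitted):

```python
import numpy as np

def ranked_list_loss(d_pos, d_neg, alpha=1.2, margin=0.4):
    """d_pos, d_neg: 1-D arrays of distances from the query to its
    positive set and negative set, respectively."""
    loss_p = np.maximum(0.0, d_pos - (alpha - margin)).sum()  # pull inside
    loss_n = np.maximum(0.0, alpha - d_neg).sum()             # push outside
    return loss_p + loss_n
```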

Proceedings ArticleDOI
11 Sep 2019
TL;DR: The SoftTriple loss is proposed to extend the SoftMax loss, which is equivalent to a smoothed triplet loss where each class has a single center, with multiple centers for each class.
Abstract: Distance metric learning (DML) aims to learn embeddings where examples from the same class are closer than examples from different classes. It can be cast as an optimization problem with triplet constraints. Due to the vast number of triplet constraints, a sampling strategy is essential for DML. With the tremendous success of deep learning in classification, it has been applied for DML. When learning embeddings with deep neural networks (DNNs), only a mini-batch of data is available at each iteration. The set of triplet constraints has to be sampled within the mini-batch. Since a mini-batch cannot capture the neighbors in the original set well, it makes the learned embeddings sub-optimal. On the contrary, optimizing SoftMax loss, which is a classification loss, with DNN shows a superior performance in certain DML tasks. It inspires us to investigate the formulation of SoftMax. Our analysis shows that SoftMax loss is equivalent to a smoothed triplet loss where each class has a single center. In real-world data, one class can contain several local clusters rather than a single one, e.g., birds of different poses. Therefore, we propose the SoftTriple loss to extend the SoftMax loss with multiple centers for each class. Compared with conventional deep metric learning algorithms, optimizing SoftTriple loss can learn the embeddings without the sampling phase by mildly increasing the size of the last fully connected layer. Experiments on the benchmark fine-grained data sets demonstrate the effectiveness of the proposed loss function.
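
A NumPy sketch of the SoftTriple computation for a single unit-normalized embedding; lam, gamma, and delta follow the paper's notation, with illustrative values:

```python
import numpy as np

def softtriple_loss(x, y, centers, lam=20.0, gamma=0.1, delta=0.01):
    """x: (d,) unit-normalized embedding; y: integer label;
    centers: (num_classes, K, d) unit-normalized class centers."""
    sims = centers @ x                          # (C, K) similarities x^T w
    attn = np.exp(sims / gamma)
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over the K centers
    s = (attn * sims).sum(axis=1)               # (C,) smoothed max per class
    # Margin SoftMax: subtract delta from the ground-truth class similarity.
    logits = lam * (s - delta * (np.arange(s.size) == y))
    logits -= logits.max()                      # numerical stability
    return -logits[y] + np.log(np.exp(logits).sum())
```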

Journal ArticleDOI
17 Jul 2019
TL;DR: This paper proposes an end-to-end dual-stream hypersphere manifold embedding network (HSMEnet) with both classification and identification constraints and designs a two-stage training scheme to acquire decorrelated features.
Abstract: Person re-identification (re-ID) has great potential to contribute to video surveillance that automatically searches and identifies people across different cameras. Heterogeneous person re-identification between thermal (infrared) and visible images is essentially a cross-modality problem and important for night-time surveillance applications. Current methods usually train a model by combining classification and metric learning algorithms to obtain discriminative and robust feature representations. However, the combined loss function ignores the correlation between the classification subspace and the feature embedding subspace. In this paper, we use Sphere Softmax to learn a hypersphere manifold embedding and constrain the intra-modality variations and cross-modality variations on this hypersphere. We propose an end-to-end dual-stream hypersphere manifold embedding network (HSMEnet) with both classification and identification constraints. Meanwhile, we design a two-stage training scheme to acquire decorrelated features; we refer to the HSME with decorrelation as D-HSME. We conduct experiments on two cross-modality person re-identification datasets. Experimental results demonstrate that our method outperforms the state-of-the-art methods on two datasets. On the RegDB dataset, rank-1 accuracy is improved from 33.47% to 50.85%, and mAP is improved from 31.83% to 47.00%.
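
The Sphere Softmax ingredient can be sketched in a few lines (NumPy): both the feature and the classifier weights are L2-normalized, so the logits become scaled cosine similarities that live on a hypersphere (the scale value is illustrative):

```python
import numpy as np

def sphere_softmax_probs(x, W, s=14.0):
    """x: (d,) feature; W: (num_classes, d) classifier weights."""
    x = x / np.linalg.norm(x)
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    logits = s * (W @ x)                 # s * cos(theta_c) per class
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()
```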

Proceedings Article
13 May 2019
TL;DR: In this article, the authors propose a novel MetricGAN approach with an aim to optimize the generator with respect to one or multiple evaluation metrics; based on MetricGAN, the metric scores of the generated data can also be arbitrarily specified by users.
Abstract: Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores. To overcome this issue, we propose a novel MetricGAN approach with an aim to optimize the generator with respect to one or multiple evaluation metrics. Moreover, based on MetricGAN, the metric scores of the generated data can also be arbitrarily specified by users. We tested the proposed MetricGAN on a speech enhancement task, which is particularly suitable to verify the proposed approach because there are multiple metrics measuring different aspects of speech signals. Moreover, these metrics are generally complex and could not be fully optimized by Lp or conventional adversarial losses.
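
A sketch of the two MetricGAN objectives (PyTorch; the pairwise discriminator signature, the clean-scores-one convention, and the 0-1 metric normalization are assumptions drawn from the description above):

```python
import torch

def metricgan_losses(D, clean, enhanced, metric_score, target=1.0):
    """D: a network taking an (audio, clean-reference) pair and predicting a
    normalized metric score; metric_score: true metric (e.g. PESQ) of
    `enhanced`, scaled to [0, 1]; target: user-chosen score for the generator."""
    # Discriminator: act as a learned surrogate of the evaluation metric.
    d_loss = ((D(enhanced.detach(), clean) - metric_score) ** 2
              + (D(clean, clean) - 1.0) ** 2)     # clean vs itself scores 1
    # Generator: drive the surrogate metric toward the desired target score.
    g_loss = (D(enhanced, clean) - target) ** 2
    return d_loss.mean(), g_loss.mean()
```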

Proceedings ArticleDOI
14 Jul 2019
TL;DR: In this article, an unsupervised anomaly detection model is proposed, trained only on the normal (non-anomalous, plentiful) samples in order to learn the normality distribution of the domain, and hence detect abnormality based on deviation from this model.
Abstract: Despite inherent ill-definition, anomaly detection is a research endeavour of great interest within machine learning and visual scene understanding alike. Most commonly, anomaly detection is considered as the detection of outliers within a given data distribution based on some measure of normality. The most significant challenge in real-world anomaly detection problems is that available data is highly imbalanced towards normality (i.e. non-anomalous) and contains at most a subset of all possible anomalous samples - hence limiting the use of well-established supervised learning methods. By contrast, we introduce an unsupervised anomaly detection model, trained only on the normal (non-anomalous, plentiful) samples in order to learn the normality distribution of the domain, and hence detect abnormality based on deviation from this model. Our proposed approach employs an encoder-decoder convolutional neural network with skip connections to thoroughly capture the multi-scale distribution of the normal data distribution in image space. Furthermore, utilizing an adversarial training scheme for this chosen architecture provides superior reconstruction both within image space and a lower-dimensional embedding vector space encoding. Minimizing the reconstruction error metric within both the image and hidden vector spaces during training aids the model to learn the distribution of normality as required. Higher reconstruction metrics during subsequent test and deployment are thus indicative of a deviation from this normal distribution, hence indicative of an anomaly.
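
The scoring idea reduces to a weighted sum of the two reconstruction errors; a NumPy sketch (the weighting lam is illustrative):

```python
import numpy as np

def anomaly_score(x, x_hat, z, z_hat, lam=0.9):
    """Combine image-space reconstruction error with the error in the
    lower-dimensional embedding space; higher scores indicate a larger
    deviation from the learned normality distribution."""
    recon = np.abs(x - x_hat).mean()          # image-space error
    latent = ((z - z_hat) ** 2).mean()        # embedding-space error
    return lam * recon + (1.0 - lam) * latent
```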

Posted Content
Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin 
TL;DR: Qian et al. propose the SoftTriple loss to extend the SoftMax loss with multiple centers for each class, showing that the SoftMax loss is equivalent to a smoothed triplet loss where each class has a single center.
Abstract: Distance metric learning (DML) aims to learn embeddings where examples from the same class are closer than examples from different classes. It can be cast as an optimization problem with triplet constraints. Due to the vast number of triplet constraints, a sampling strategy is essential for DML. With the tremendous success of deep learning in classification, it has been applied for DML. When learning embeddings with deep neural networks (DNNs), only a mini-batch of data is available at each iteration. The set of triplet constraints has to be sampled within the mini-batch. Since a mini-batch cannot capture the neighbors in the original set well, it makes the learned embeddings sub-optimal. On the contrary, optimizing SoftMax loss, which is a classification loss, with DNN shows a superior performance in certain DML tasks. It inspires us to investigate the formulation of SoftMax. Our analysis shows that SoftMax loss is equivalent to a smoothed triplet loss where each class has a single center. In real-world data, one class can contain several local clusters rather than a single one, e.g., birds of different poses. Therefore, we propose the SoftTriple loss to extend the SoftMax loss with multiple centers for each class. Compared with conventional deep metric learning algorithms, optimizing SoftTriple loss can learn the embeddings without the sampling phase by mildly increasing the size of the last fully connected layer. Experiments on the benchmark fine-grained data sets demonstrate the effectiveness of the proposed loss function. Code is available at this https URL

Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, two learnable triangulation methods that combine 3D information from multiple 2D views are proposed for multi-view 3D human pose estimation, and both of them are end-to-end differentiable.
Abstract: We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second, more complex, solution is based on volumetric aggregation of 2D feature maps from the 2D backbone followed by refinement via 3D convolutions that produce final 3D joint heatmaps. Crucially, both of the approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset.
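
The baseline solution's confidence-weighted algebraic triangulation can be sketched with a weighted DLT solve (NumPy; variable names are illustrative):

```python
import numpy as np

def triangulate(projs, points2d, weights):
    """projs: list of (3, 4) camera projection matrices; points2d:
    (n_views, 2) observed joint positions; weights: (n_views,)
    per-view confidences estimated from the images."""
    rows = []
    for P, (u, v), w in zip(projs, points2d, weights):
        # Each view contributes two rows of the homogeneous DLT system,
        # scaled by its confidence weight.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # de-homogenize to a 3D point
```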

Journal ArticleDOI
TL;DR: Experimental results show that the proposed novel convolutional neural networks with multiscale convolution and diversified metric are better than the original deep models and can produce comparable or even better classification performance on different hyperspectral image data sets with respect to spectral and spectral–spatial features.
Abstract: Recently, researchers have shown the powerful ability of deep methods with multilayers to extract high-level features and to obtain better performance for hyperspectral image classification. However, a common problem of traditional deep models is that the learned deep models might be suboptimal because of the limited number of training samples, especially for the image with large intraclass variance and low interclass variance. In this paper, novel convolutional neural networks (CNNs) with multiscale convolution (MS-CNNs) are proposed to address this problem by extracting deep multiscale features from the hyperspectral image. Moreover, deep metrics usually accompany MS-CNNs to improve the representational ability for the hyperspectral image. However, the usual metric learning would make the metric parameters in the learned model tend to behave similarly. This similarity leads to obvious redundancy in the model and thus shows negative effects on the description ability of the deep metrics. Traditionally, determinantal point process (DPP) priors, which encourage the learned factors to repulse from one another, can be imposed over these factors to diversify them. Taking advantage of both the MS-CNNs and DPP-based diversity-promoting deep metrics, this paper develops a CNN with multiscale convolution and diversified metric to obtain discriminative features for hyperspectral image classification. Experiments are conducted over four real-world hyperspectral image data sets to show the effectiveness and applicability of the proposed method. Experimental results show that our method is better than original deep models and can produce comparable or even better classification performance in different hyperspectral image data sets with respect to spectral and spectral–spatial features.
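
The DPP prior mentioned above encourages learned factors to repulse one another; a generic log-determinant penalty conveys the idea (NumPy sketch; this particular form is an assumption for illustration, not the paper's exact formulation):

```python
import numpy as np

def dpp_diversity_penalty(W, eps=1e-6):
    """W: (k, d) matrix with one metric factor per row. The log-determinant
    of the normalized Gram matrix grows when the rows point in different
    directions, so its negative can be added to the training loss."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    G = Wn @ Wn.T + eps * np.eye(len(W))     # similarity (kernel) matrix
    _, logdet = np.linalg.slogdet(G)
    return -logdet
```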

Journal ArticleDOI
TL;DR: This is the first work to consider the divergence of PFSs for measuring the discrepancy of data from the perspective of the relative entropy, and a novel divergence measure is proposed by taking advantage of the Jensen–Shannon divergence, called the PFSJS distance.
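
For reference, here is the classical Jensen-Shannon divergence that the proposed PFSJS distance builds on, for two discrete distributions (NumPy sketch of the standard quantity only; the PFS-specific construction is defined in the paper):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Symmetric, bounded divergence: the average KL divergence of p and q
    to their mixture m = (p + q) / 2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```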

Posted Content
TL;DR: Two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views are presented; both are end-to-end differentiable, which allows direct optimization of the target metric.
Abstract: We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation from intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce final 3D joint heatmaps and allow modelling a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (this https URL).

Journal ArticleDOI
17 Jul 2019
TL;DR: The CovaMNet is designed to exploit both the covariance representation and covariance metric based on the distribution consistency for the few-shot classification tasks and employs the episodic training mechanism to train the entire network in an end-to-end manner from scratch.
Abstract: Few-shot learning aims to recognize new concepts from very few examples. However, most of the existing few-shot learning methods mainly concentrate on the first-order statistic of concept representation or a fixed metric on the relation between a sample and a concept. In this work, we propose a novel end-to-end deep architecture, named Covariance Metric Networks (CovaMNet). The CovaMNet is designed to exploit both the covariance representation and covariance metric based on the distribution consistency for the few-shot classification tasks. Specifically, we construct an embedded local covariance representation to extract the second-order statistic information of each concept and describe the underlying distribution of this concept. Upon the covariance representation, we further define a new deep covariance metric to measure the consistency of distributions between query samples and new concepts. Furthermore, we employ the episodic training mechanism to train the entire network in an end-to-end manner from scratch. Extensive experiments in two tasks, generic few-shot image classification and fine-grained few-shot image classification, demonstrate the superiority of the proposed CovaMNet. The source code is available at https://github.com/WenbinLee/CovaMNet.git.
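
A rough NumPy sketch of the covariance idea: represent a concept by the covariance of its support descriptors and score query descriptors against it with a bilinear form (this particular scoring form is an assumption for illustration, not the paper's exact deep covariance metric):

```python
import numpy as np

def covariance_similarity(query_feats, support_feats):
    """query_feats: (nq, d) local descriptors of the query;
    support_feats: (ns, d) local descriptors of a concept's support set."""
    s = support_feats - support_feats.mean(0)
    cov = s.T @ s / (len(support_feats) - 1)   # (d, d) concept covariance
    # Bilinear score of each query descriptor under the concept covariance.
    return np.einsum('nd,de,ne->n', query_feats, cov, query_feats).mean()
```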

Journal ArticleDOI
TL;DR: A quantum circuit learning algorithm that can be used to assist the characterization of quantum devices and to train shallow circuits for generative tasks is proposed and it is demonstrated that this approach can learn an optimal preparation of the Greenberger-Horne-Zeilinger states.
Abstract: Hybrid quantum-classical algorithms provide ways to use noisy intermediate-scale quantum computers for practical applications. Expanding the portfolio of such techniques, we propose a quantum circuit learning algorithm that can be used to assist the characterization of quantum devices and to train shallow circuits for generative tasks. The procedure leverages quantum hardware capabilities to its fullest extent by using native gates and their qubit connectivity. We demonstrate that our approach can learn an optimal preparation of the Greenberger-Horne-Zeilinger states, also known as “cat states”. We further demonstrate that our approach can efficiently prepare approximate representations of coherent thermal states, wave functions that encode Boltzmann probabilities in their amplitudes. Finally, complementing proposals to characterize the power or usefulness of near-term quantum devices, such as IBM’s quantum volume, we provide a new hardware-independent metric called the qBAS score. It is based on the performance yield in a specific sampling task on one of the canonical machine learning data sets known as Bars and Stripes. We show how entanglement is a key ingredient in encoding the patterns of this data set; an ideal benchmark for testing hardware starting at four qubits and up. We provide experimental results and evaluation of this metric to probe the trade off between several architectural circuit designs and circuit depths on an ion-trap quantum computer.
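
The Bars and Stripes data set behind the qBAS score is easy to enumerate; a NumPy sketch:

```python
import numpy as np
from itertools import product

def bars_and_stripes(rows, cols):
    """All binary images whose pixels are constant along every row
    (horizontal stripes) or along every column (vertical bars)."""
    patterns = set()
    for bits in product([0, 1], repeat=rows):      # horizontal stripes
        patterns.add(tuple(np.repeat(bits, cols)))
    for bits in product([0, 1], repeat=cols):      # vertical bars
        patterns.add(tuple(np.tile(bits, rows)))
    return np.array(sorted(patterns)).reshape(-1, rows, cols)

# 2^rows + 2^cols - 2 distinct patterns (all-0 and all-1 counted once).
print(len(bars_and_stripes(2, 2)))   # 6
```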

Proceedings ArticleDOI
01 Jan 2019
TL;DR: The novel ACL encodes the semantic concepts from verb-obj pairs in language queries and leverages activity classifiers' prediction scores to encode visual concepts; experiments show that ACL significantly outperforms state-of-the-art methods under the widely used metric.
Abstract: We address the problem of language-based temporal localization in untrimmed videos. Compared to temporal localization with fixed categories, this problem is more challenging as the language-based queries not only have no pre-defined activity list but also may contain complex descriptions. Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, which ignores rich semantic cues about activities in videos and queries. We propose to mine activity concepts from both video and language modalities by applying the actionness score enhanced Activity Concepts based Localizer (ACL). Specifically, the novel ACL encodes the semantic concepts from verb-obj pairs in language queries and leverages activity classifiers' prediction scores to encode visual concepts. Besides, ACL also has the capability to regress sliding windows as localization results. Experiments show that ACL significantly outperforms state-of-the-art methods under the widely used metric, with more than a 5% increase on both the Charades-STA and TACoS datasets.

Proceedings ArticleDOI
10 May 2019
TL;DR: This work proposes a method for learning embeddings for few-shot learning that is suitable for use with any number of shots (shot-free), encompasses metric learning, and facilitates adding new classes without crowding the class representation space.
Abstract: We propose a method for learning embeddings for few-shot learning that is suitable for use with any number of shots (shot-free). Rather than fixing the class prototypes to be the Euclidean average of sample embeddings, we allow them to live in a higher-dimensional space (embedded class models) and learn the prototypes along with the model parameters. The class representation function is defined implicitly, which allows us to deal with a variable number of shots per class with a simple constant-size architecture. The class embedding encompasses metric learning, which facilitates adding new classes without crowding the class representation space. Despite being general and not tuned to the benchmark, our approach achieves state-of-the-art performance on the standard few-shot benchmark datasets.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper performs linear interpolation on embeddings to adaptively manipulate their hard levels and generate corresponding label-preserving synthetics for recycled training, so that information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty.
Abstract: This paper presents a hardness-aware deep metric learning (HDML) framework. Most previous deep metric learning methods employ the hard negative mining strategy to alleviate the lack of informative samples for training. However, this mining strategy only utilizes a subset of training data, which may not be enough to characterize the global geometry of the embedding space comprehensively. To address this problem, we perform linear interpolation on embeddings to adaptively manipulate their hard levels and generate corresponding label-preserving synthetics for recycled training, so that information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty. Our method achieves very competitive performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
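
The hardness manipulation described above amounts to a one-line interpolation in embedding space; a NumPy sketch (the fixed pull factor is illustrative, whereas the paper adapts the hard level during training):

```python
import numpy as np

def harder_negative(anchor, negative, pull=0.3):
    """Move a negative embedding linearly toward the anchor, producing a
    label-preserving synthetic that is harder to separate; `pull` in [0, 1)
    controls the hard level."""
    return negative + pull * (anchor - negative)
```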