Showing papers on "Metric (mathematics)" published in 2020

Open Access

Proceedings Article•DOI•

A Survey on Performance Metrics for Object-Detection Algorithms

Rafael Padilla¹, Sergio L. Netto¹, Eduardo A. B. da Silva¹•Institutions (1)

01 Jul 2020

TL;DR: This work explores and compares the plethora of metrics for the performance evaluation of object-detection algorithms and proposes a standard implementation that can be used as a benchmark among different datasets with minimum adaptation on the annotation files.

Abstract: This work explores and compares the plethora of metrics for the performance evaluation of object-detection algorithms. Average precision (AP), for instance, is a popular metric for evaluating the accuracy of object detectors by estimating the area under the curve (AUC) of the precision × recall relationship. Depending on the point interpolation used in the plot, two different AP variants can be defined and, therefore, different results are generated. AP has six additional variants increasing the possibilities of benchmarking. The lack of consensus in different works and AP implementations is a problem faced by the academic and scientific communities. Metric implementations written in different computational languages and platforms are usually distributed with corresponding datasets sharing a given bounding-box description. Such projects indeed help the community with evaluation tools, but demand extra work to be adapted for other datasets and bounding-box formats. This work reviews the most used metrics for object detection detaching their differences, applications, and main concepts. It also proposes a standard implementation that can be used as a benchmark among different datasets with minimum adaptation on the annotation files.
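The two interpolation variants mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `average_precision` and its arguments are hypothetical, and the matching of detections to ground-truth boxes (for example by IoU threshold) is assumed to have been done already, so each detection arrives flagged as true or false positive.

```python
def average_precision(scores, is_tp, num_gt, method="all"):
    """AP from ranked detections (hypothetical helper, for illustration).

    scores: confidence of each detection.
    is_tp:  True where the detection matched a ground-truth box
            (the matching step is assumed to be done elsewhere).
    num_gt: total number of ground-truth boxes.
    method: "all" for all-point interpolation, "11point" for 11-point.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    def interp_precision(r):
        # Interpolated precision: best precision at any recall >= r.
        return max((p for p, rc in zip(precisions, recalls) if rc >= r),
                   default=0.0)

    if method == "11point":
        # Average the interpolated precision at recall 0.0, 0.1, ..., 1.0.
        return sum(interp_precision(k / 10) for k in range(11)) / 11

    # All-point interpolation: sum rectangle areas at each recall change.
    ap, prev_r = 0.0, 0.0
    for r in sorted(set(recalls)):
        ap += (r - prev_r) * interp_precision(r)
        prev_r = r
    return ap
```

Calling the function on the same detections with `method="all"` and `method="11point"` generally gives different values, which is the source of the benchmarking inconsistency the abstract describes.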

451 citations

Proceedings Article•

A Baseline for Few-Shot Image Classification

Guneet S. Dhillon¹, Pratik Chaudhari², Avinash Ravichandran¹, Stefano Soatto³•Institutions (3)

Amazon.com¹, University of Pennsylvania², University of California, Los Angeles³

30 Apr 2020

TL;DR: This work performs extensive studies on benchmark datasets to propose a metric that quantifies the "hardness" of a few-shot episode and finds that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes.

Abstract: Fine-tuning a deep network trained with the standard cross-entropy loss is a strong baseline for few-shot learning. When fine-tuned transductively, this outperforms the current state-of-the-art on standard datasets such as Mini-Imagenet, Tiered-Imagenet, CIFAR-FS and FC-100 with the same hyper-parameters. The simplicity of this approach enables us to demonstrate the first few-shot learning results on the Imagenet-21k dataset. We find that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes. We do not advocate our approach as the solution for few-shot learning, but simply use the results to highlight limitations of current benchmarks and few-shot protocols. We perform extensive studies on benchmark datasets to propose a metric that quantifies the "hardness" of a few-shot episode. This metric can be used to report the performance of few-shot algorithms in a more systematic way.

355 citations

Proceedings Article•DOI•

DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover’s Distance and Structured Classifiers

Chi Zhang¹, Yujun Cai¹, Guosheng Lin¹, Chunhua Shen²•Institutions (2)

Nanyang Technological University¹, University of Adelaide²

14 Jun 2020

TL;DR: This work adopts the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations; the optimal matching flow with minimum matching cost is used as the image distance for classification.

Abstract: In this paper, we address the few-shot classification task from a new perspective of optimal matching between image regions. We adopt the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. The EMD generates the optimal matching flows between structural elements that have the minimum matching cost, which is used to represent the image distance for classification. To generate the important weights of elements in the EMD formulation, we design a cross-reference mechanism, which can effectively minimize the impact caused by the cluttered background and large intra-class appearance variations. To handle k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. We conduct comprehensive experiments to validate our algorithm and we set new state-of-the-art performance on four popular few-shot classification benchmarks, namely miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100) and Caltech-UCSD Birds-200-2011 (CUB).
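The idea of an optimal matching between image regions can be sketched in Python under a strong simplification: both images have the same number of local feature vectors and every vector carries the same weight, in which case the EMD reduces to a minimum-cost one-to-one assignment. The paper instead learns the element weights through its cross-reference mechanism, which is not reproduced here. The names `cosine_cost` and `emd_uniform` are hypothetical, and the brute-force search is only practical for a handful of elements.

```python
import math
from itertools import permutations


def cosine_cost(u, v):
    """Matching cost between two feature vectors: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def emd_uniform(xs, ys, cost=cosine_cost):
    """EMD between two equal-size sets with uniform weights.

    Assumption: with equal counts and uniform weights an optimal flow
    is a permutation, so all permutations are tried (fine for tiny sets).
    Returns (distance, matching) where matching[i] is the index in ys
    assigned to xs[i].
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch needs equal-size sets")
    n = len(xs)
    c = [[cost(x, y) for y in ys] for x in xs]
    best = min(permutations(range(n)),
               key=lambda p: sum(c[i][p[i]] for i in range(n)))
    total = sum(c[i][best[i]] for i in range(n))
    return total / n, best
```

For example, `emd_uniform([(1, 0), (0, 1)], [(0, 1), (1, 0)])` matches each vector to its identical counterpart in the other set, so the distance is zero even though the elements appear in a different order.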

354 citations

Posted Content•

DeepEMD: Few-Shot Image Classification with Differentiable Earth Mover's Distance and Structured Classifiers

Chi Zhang¹, Yujun Cai¹, Guosheng Lin¹, Chunhua Shen²•Institutions (2)

Nanyang Technological University¹, University of Adelaide²

15 Mar 2020

TL;DR: This paper adopts the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance and designs a cross-reference mechanism that can effectively minimize the impact caused by the cluttered background and large intra-class appearance variations.

271 citations

Proceedings Article•DOI•

On Sampled Metrics for Item Recommendation

Walid Krichene¹, Steffen Rendle¹•Institutions (1)

Google¹

23 Aug 2020

TL;DR: It is shown that sampled metrics are inconsistent with their exact versions, in the sense that they do not preserve relative statements; sampling should therefore be avoided for metric calculation, but if an experimental study needs to sample, the proposed corrections can improve the quality of the estimate.

Abstract: The task of item recommendation requires ranking a large catalogue of items given a context. Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items. To speed up the computation of metrics, recent work often uses sampled metrics where only a smaller set of random items and the relevant items are ranked. This paper investigates sampled metrics in more detail and shows that they are inconsistent with their exact version, in the sense that they do not persist relative statements, e.g., recommender A is better than B, not even in expectation. Moreover, the smaller the sampling size, the less difference there is between metrics, and for very small sampling size, all metrics collapse to the AUC metric. We show that it is possible to improve the quality of the sampled metrics by applying a correction, obtained by minimizing different criteria such as bias or mean squared error. We conclude with an empirical evaluation of the naive sampled metrics and their corrected variants. To summarize, our work suggests that sampling should be avoided for metric calculation, however if an experimental study needs to sample, the proposed corrections can improve the quality of the estimate.
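The gap between an exact and a sampled ranking metric can be seen on a toy catalogue. The Python sketch below is only an illustration with made-up scores, not the paper's experimental setup or its corrections; the function names and the catalogue are hypothetical. It ranks one relevant item against the full catalogue and against a small random sample of other items.

```python
import random


def exact_rank(scores, relevant):
    """1-based rank of the relevant item among the full catalogue."""
    target = scores[relevant]
    return 1 + sum(1 for i, s in enumerate(scores)
                   if i != relevant and s > target)


def sampled_rank(scores, relevant, num_samples, rng):
    """1-based rank of the relevant item among random other items."""
    others = [i for i in range(len(scores)) if i != relevant]
    sample = rng.sample(others, num_samples)
    target = scores[relevant]
    return 1 + sum(1 for i in sample if scores[i] > target)


def recall_at_k(rank, k):
    """Recall@k for a single relevant item."""
    return 1.0 if rank <= k else 0.0


# Toy catalogue (made up): item 0 is relevant, 50 items outscore it.
scores = [0.5] + [1.0] * 50 + [0.0] * 949
rng = random.Random(0)
exact = exact_rank(scores, relevant=0)
sampled = [sampled_rank(scores, 0, 100, rng) for _ in range(1000)]

print("exact rank:", exact, "exact Recall@10:", recall_at_k(exact, 10))
print("mean sampled Recall@10:",
      sum(recall_at_k(r, 10) for r in sampled) / len(sampled))
```

Because a sample of 100 items usually contains only a few of the 50 higher-scored items, the sampled rank is much smaller than the exact rank, so the sampled Recall@10 is typically 1 while the exact value is 0.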

264 citations

Posted Content•

A Metric Learning Reality Check

Kevin Musgrave¹, Serge Belongie¹, Ser-Nam Lim²•Institutions (2)

Cornell University¹, Facebook²

18 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: Flaws in the experimental methodology of numerous metric learning papers are found, and it is shown that the actual improvements over time have been marginal at best.

Abstract: Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look at the field to see if this is actually true. We find flaws in the experimental methodology of numerous metric learning papers, and show that the actual improvements over time have been marginal at best.

244 citations

Journal Article•DOI•

HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking

Jonathon Luiten¹, Aljosa Osep¹, Patrick Dendorfer², Philip H. S. Torr², Andreas Geiger², Laura Leal-Taixé³, Bastian Leibe⁴•Institutions (4)

RWTH Aachen University¹, Technische Universität München², University of Oxford³, Max Planck Society⁴

16 Sep 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a novel MOT evaluation metric, higher order tracking accuracy (HOTA), which explicitly balances the effect of performing accurate detection, association and localization into a single unified metric for comparing trackers.

Abstract: Multi-Object Tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT evaluation metric, HOTA (Higher Order Tracking Accuracy), which explicitly balances the effect of performing accurate detection, association and localization into a single unified metric for comparing trackers. HOTA decomposes into a family of sub-metrics which are able to evaluate each of five basic error types separately, which enables clear analysis of tracking performance. We evaluate the effectiveness of HOTA on the MOTChallenge benchmark, and show that it is able to capture important aspects of MOT performance not previously taken into account by established metrics. Furthermore, we show HOTA scores better align with human visual evaluation of tracking performance.

216 citations

Book Chapter•DOI•

A Metric Learning Reality Check

Kevin Musgrave¹, Serge Belongie¹, Ser-Nam Lim²•Institutions (2)

Cornell University¹, Facebook²

18 Mar 2020

TL;DR: This work examines whether the large accuracy gains claimed by deep metric learning papers are real, finds flaws in the experimental methodology of numerous metric learning papers, and shows that the actual improvements over time have been marginal at best.

200 citations

Proceedings Article•DOI•

In defence of metric learning for speaker recognition

Joon Son Chung¹, Jaesung Huh¹, Seongkyu Mun¹, Minjae Lee¹, Hee-Soo Heo¹, Soyeon Choe¹, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee¹, Ick-sang Han¹•Institutions (1)

Naver Corporation¹

26 Mar 2020

TL;DR: This paper proposes a metric learning objective for open-set speaker recognition, where ideal embeddings condense information into a compact utterance-level representation with small intra-speaker and large inter-speaker distance.

Abstract: The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of most popular loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that the vanilla triplet loss shows competitive performance compared to classification-based losses, and those trained with our proposed metric learning objective outperform state-of-the-art methods.
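The vanilla triplet loss mentioned in the abstract can be written in a few lines. The Python sketch below uses the common hinge form with Euclidean distance; the margin value and function names are assumptions for illustration, some variants use squared distances instead, and the paper's own proposed objective is not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the same-speaker pair is closer than the
    different-speaker pair by at least `margin` (an assumed value).
    """
    d_ap = euclidean(anchor, positive)  # intra-speaker distance
    d_an = euclidean(anchor, negative)  # inter-speaker distance
    return max(0.0, d_ap - d_an + margin)
```

Minimizing this quantity over many triplets pushes embeddings toward the small intra-speaker and large inter-speaker distances that the abstract describes as ideal.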

199 citations

Posted Content•

Quantum embeddings for machine learning

Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, Nathan Killoran

10 Jan 2020-arXiv: Quantum Physics

TL;DR: This work proposes to train the first part of the circuit with the objective of maximally separating data classes in Hilbert space, a strategy it calls quantum metric learning, which provides a powerful analytic framework for quantum machine learning.

Abstract: Quantum classifiers are trainable quantum circuits used as machine learning models. The first part of the circuit implements a quantum feature map that encodes classical inputs into quantum states, embedding the data in a high-dimensional Hilbert space; the second part of the circuit executes a quantum measurement interpreted as the output of the model. Usually, the measurement is trained to distinguish quantum-embedded data. We propose to instead train the first part of the circuit (the embedding) with the objective of maximally separating data classes in Hilbert space, a strategy we call quantum metric learning. As a result, the measurement minimizing a linear classification loss is already known and depends on the metric used: for embeddings separating data using the l1 or trace distance, this is the Helstrom measurement, while for the l2 or Hilbert-Schmidt distance, it is a simple overlap measurement. This approach provides a powerful analytic framework for quantum machine learning and eliminates a major component in current models, freeing up more precious resources to best leverage the capabilities of near-term quantum information processors.

189 citations

Journal Article•DOI•

Gravitational Test beyond the First Post-Newtonian Order with the Shadow of the M87 Black Hole

Dimitrios Psaltis¹, Lia Medeiros², Pierre Christian¹, Feryal Özel¹ +212 more•Institutions (53)

01 Oct 2020-Physical Review Letters

TL;DR: It is shown analytically that spacetimes that deviate from the Kerr metric but satisfy weak-field tests can lead to large deviations in the predicted black-hole shadows that are inconsistent with even the current EHT measurements.

Abstract: The 2017 Event Horizon Telescope (EHT) observations of the central source in M87 have led to the first measurement of the size of a black-hole shadow. This observation offers a new and clean gravitational test of the black-hole metric in the strong-field regime. We show analytically that spacetimes that deviate from the Kerr metric but satisfy weak-field tests can lead to large deviations in the predicted black-hole shadows that are inconsistent with even the current EHT measurements. We use numerical calculations of regular, parametric, non-Kerr metrics to identify the common characteristic among these different parametrizations that control the predicted shadow size. We show that the shadow-size measurements place significant constraints on deviation parameters that control the second post-Newtonian and higher orders of each metric and are, therefore, inaccessible to weak-field tests. The new constraints are complementary to those imposed by observations of gravitational waves from stellar-mass sources.

Journal Article•DOI•

Is there a novel Einstein–Gauss–Bonnet theory in four dimensions?

Metin Gürses¹, Tahsin Çağrı Şişman, Bayram Tekin²•Institutions (2)

Bilkent University¹, Middle East Technical University²

01 Jul 2020-European Physical Journal C

TL;DR: This article shows that the recently introduced four-dimensional Einstein–Gauss–Bonnet theory does not admit an intrinsically four-dimensional definition in terms of the metric only, and so does not exist in four dimensions.

Abstract: No! We show that the field equations of Einstein–Gauss–Bonnet theory defined in generic D > 4 dimensions split into two parts, one of which always remains higher dimensional, and hence the theory does not have a non-trivial limit to D = 4. Therefore, the recently introduced four-dimensional, novel, Einstein–Gauss–Bonnet theory does not admit an intrinsically four-dimensional definition, in terms of metric only; as such, it does not exist in four dimensions. The solutions (the spacetime, the metric) always remain D > 4 dimensional. As there is no canonical choice of 4 spacetime dimensions out of D dimensions for generic metrics, the theory is not well defined in four dimensions.

Posted Content•

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

Hung-Yu Tseng¹, Hsin-Ying Lee¹, Jia-Bin Huang², Ming-Hsuan Yang¹•Institutions (2)

University of California, Merced¹, Virginia Tech²

23 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The core idea is to use feature-wise transformation layers for augmenting the image features using affine transforms to simulate various feature distributions under different domains in the training stage; a learning-to-learn approach is applied to search for the hyper-parameters of the feature-wise transformation layers.

Abstract: Few-shot classification aims to recognize novel categories with only few labeled images in each class. Existing metric-based few-shot classification algorithms predict categories by comparing the feature embeddings of query images with those from a few labeled images (support examples) using a learned metric function. While promising performance has been demonstrated, these methods often fail to generalize to unseen domains due to large discrepancy of the feature distribution across domains. In this work, we address the problem of few-shot classification under domain shifts for metric-based methods. Our core idea is to use feature-wise transformation layers for augmenting the image features using affine transforms to simulate various feature distributions under different domains in the training stage. To capture variations of the feature distributions under different domains, we further apply a learning-to-learn approach to search for the hyper-parameters of the feature-wise transformation layers. We conduct extensive experiments and ablation studies under the domain generalization setting using five few-shot classification datasets: mini-ImageNet, CUB, Cars, Places, and Plantae. Experimental results demonstrate that the proposed feature-wise transformation layer is applicable to various metric-based models, and provides consistent improvements on the few-shot classification performance under domain shift.

Journal Article•DOI•

Unsupervised Person Re-Identification by Deep Asymmetric Metric Embedding

Hong-Xing Yu¹, Ancong Wu¹, Wei-Shi Zheng¹•Institutions (1)

Sun Yat-sen University¹

01 Apr 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel unsupervised deep framework named DEep Clustering-based Asymmetric MEtric Learning (DECAMEL) is developed, which learns a compact cross-view cluster structure of Re-ID data to help alleviate the view-specific bias and facilitate mining the potential cross-view discriminative information for unsupervised Re-ID.

Abstract: Person re-identification (Re-ID) aims to match identities across non-overlapping camera views. Researchers have proposed many supervised Re-ID models which require quantities of cross-view pairwise labelled data. This limits their scalability to many applications where a large amount of data from multiple disjoint camera views is available but unlabelled. Although some unsupervised Re-ID models have been proposed to address the scalability problem, they often suffer from the view-specific bias problem which is caused by dramatic variances across different camera views, e.g., different illumination, viewpoints and occlusion. The dramatic variances induce specific feature distortions in different camera views, which can be very disturbing in finding cross-view discriminative information for Re-ID in the unsupervised scenarios, since no label information is available to help alleviate the bias. We propose to explicitly address this problem by learning an unsupervised asymmetric distance metric based on cross-view clustering. The asymmetric distance metric allows specific feature transformations for each camera view to tackle the specific feature distortions. We then design a novel unsupervised loss function to embed the asymmetric metric into a deep neural network, and therefore develop a novel unsupervised deep framework named the DEep Clustering-based Asymmetric MEtric Learning (DECAMEL). In such a way, DECAMEL jointly learns the feature representation and the unsupervised asymmetric metric. DECAMEL learns a compact cross-view cluster structure of Re-ID data, and thus helps alleviate the view-specific bias and facilitates mining the potential cross-view discriminative information for unsupervised Re-ID. Extensive experiments on seven benchmark datasets whose sizes span several orders show the effectiveness of our framework.

Proceedings Article•DOI•

Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks

Tony C. W. Mok¹, Albert C. S. Chung¹•Institutions (1)

Hong Kong University of Science and Technology¹

14 Jun 2020

TL;DR: A novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously.

Abstract: Diffeomorphic deformable image registration is crucial in many medical image studies, as it offers unique, special features including topology preservation and invertibility of the transformation. Recent deep learning-based deformable image registration methods achieve fast image registration by leveraging a convolutional neural network (CNN) to learn the spatial transformation from the synthetic ground truth or the similarity metric. However, these approaches often ignore the topology preservation of the transformation and the smoothness of the transformation which is enforced by a global smoothing energy function alone. Moreover, deep learning-based approaches often estimate the displacement field directly, which cannot guarantee the existence of the inverse transformation. In this paper, we present a novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously. We evaluate our method on 3D image registration with a large scale brain image dataset. Our method achieves state-of-the-art registration accuracy and running time while maintaining desirable diffeomorphic properties.

Journal Article•DOI•

A Polynomial Kernel Induced Distance Metric to Improve Deep Transfer Learning for Fault Diagnosis of Machines

Bin Yang¹, Yaguo Lei¹, Feng Jia¹, Naipeng Li¹, Du Zhaojun¹•Institutions (1)

Xi'an Jiaotong University¹

01 Nov 2020-IEEE Transactions on Industrial Electronics

TL;DR: A distance metric named polynomial kernel induced MMD (PK-MMD) is proposed, and a diagnosis model combined with it is constructed to reuse diagnosis knowledge from one machine to the other; the PK-MMD-based diagnosis model presents better transfer results than other methods.

Abstract: Deep transfer-learning-based diagnosis models are promising for applying diagnosis knowledge across related machines whose collected data follow different distributions. To reduce the distribution discrepancy, Gaussian kernel induced maximum mean discrepancy (GK-MMD) is a widely used distance metric to impose constraints on the training of diagnosis models. However, the models using GK-MMD have three weaknesses: 1) GK-MMD may not accurately estimate distribution discrepancy because it ignores the high-order moment distances of data; 2) the time complexity of GK-MMD is high, requiring much computation cost; 3) the transfer performance of GK-MMD-based diagnosis models is sensitive to the selected kernel parameters. In order to overcome the weaknesses, a distance metric named polynomial kernel induced MMD (PK-MMD) is proposed in this article. Combined with PK-MMD, a diagnosis model is constructed to reuse diagnosis knowledge from one machine to the other. The proposed methods are verified by two transfer learning cases, in which the health states of locomotive bearings are identified with the help of data respectively from motor bearings and gearbox bearings in laboratories. The results show that PK-MMD improves on the inefficient computation of GK-MMD, and the PK-MMD-based diagnosis model presents better transfer results than other methods.
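A maximum mean discrepancy with a polynomial kernel can be sketched generically in Python. This is the textbook biased estimator of squared MMD, not the paper's PK-MMD formulation, whose exact form is not given in the abstract; the kernel parameters `a`, `b`, `degree` and the function names are assumptions.

```python
def poly_kernel(x, y, a=1.0, b=1.0, degree=2):
    """Polynomial kernel k(x, y) = (a * <x, y> + b) ** degree."""
    return (a * sum(p * q for p, q in zip(x, y)) + b) ** degree


def mmd2(xs, ys, kernel=poly_kernel):
    """Biased estimate of the squared MMD between two samples.

    xs, ys: lists of feature vectors from the source and target
    domains. Computes E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    """
    def mean_k(us, vs):
        return (sum(kernel(u, v) for u in us for v in vs)
                / (len(us) * len(vs)))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

The estimate is zero when both samples are identical and grows as the samples' moments up to the kernel degree diverge, which is why such a term can serve as a training constraint that pulls source and target feature distributions together.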

Proceedings Article•

Space-Time Correspondence as a Contrastive Random Walk

Allan Jabri¹, Andrew Owens², Alexei A. Efros¹•Institutions (2)

University of California, Berkeley¹, University of Michigan²

25 Jun 2020

TL;DR: Despite its simplicity, the method outperforms the self-supervised state-of-the-art on a variety of label propagation tasks involving objects, semantic parts, and pose.

Abstract: This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision, by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state-of-the-art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that a technique we call edge dropout, as well as self-supervised adaptation at test-time, further improve transfer for object-centric correspondence.
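The cycle-consistency objective can be sketched in plain Python under stated assumptions: each frame is a list of patch embeddings, transition probabilities are a softmax over dot-product similarities, and the temperature value, the small epsilon inside the logarithm, and the function names are all hypothetical. It is a minimal illustration of walking along a palindrome of frames, not the authors' implementation (which also includes edge dropout).

```python
import math


def transition(src, dst, temperature=0.1):
    """Row-stochastic matrix: softmax of similarities from src to dst."""
    rows = []
    for u in src:
        logits = [sum(a * b for a, b in zip(u, v)) / temperature
                  for v in dst]
        mx = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - mx) for l in logits]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def cycle_loss(frames, temperature=0.1):
    """Negative log-likelihood of returning to the start node.

    frames: list of frames, each a list of patch embeddings.
    The walk runs over the palindrome f0, ..., fT, ..., f0.
    """
    palindrome = frames + frames[-2::-1]
    n = len(frames[0])
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for src, dst in zip(palindrome, palindrome[1:]):
        walk = matmul(walk, transition(src, dst, temperature))
    # Small epsilon (an assumption) guards against log(0).
    return -sum(math.log(walk[i][i] + 1e-12) for i in range(n)) / n
```

When patches in consecutive frames have clearly matching embeddings the return probability is close to one and the loss is near zero; when the intermediate frame is uninformative the walk spreads uniformly and the loss rises.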

Journal Article•DOI•

A Solution for Large-Scale Multi-Object Tracking

Michael Beard¹, Ba-Tuong Vo¹, Ba-Ngu Vo¹•Institutions (1)

Curtin University¹

13 Apr 2020-IEEE Transactions on Signal Processing

TL;DR: A large-scale multi-object tracker based on the generalised labeled multi-Bernoulli (GLMB) filter is proposed and a new method of applying the optimal sub-pattern assignment (OSPA) metric to determine a meaningful distance between two sets of tracks is introduced.

Abstract: A large-scale multi-object tracker based on the generalised labeled multi-Bernoulli (GLMB) filter is proposed. The algorithm is capable of tracking a very large, unknown and time-varying number of objects simultaneously, in the presence of a high number of false alarms, as well as missed detections and measurement origin uncertainty due to closely spaced objects. The algorithm is demonstrated on a simulated tracking scenario, where the peak number of objects appearing simultaneously exceeds one million. Additionally, we introduce a new method of applying the optimal sub-pattern assignment (OSPA) metric to determine a meaningful distance between two sets of tracks. We also develop an efficient strategy for its exact computation in large-scale scenarios to evaluate the performance of the proposed tracker.
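For orientation, the standard OSPA distance between two finite sets of points can be sketched in Python as follows. This is the base metric on point sets, not the paper's extension to sets of tracks nor its efficient large-scale computation; the brute-force assignment search, the default cut-off `c` and order `p`, and the function name are assumptions suitable only for very small sets.

```python
import math
from itertools import permutations


def ospa(xs, ys, c=1.0, p=2):
    """OSPA distance between two finite sets of points.

    c: cut-off, the penalty for an unmatched or distant point.
    p: order of the metric.
    Brute force over assignments, so only usable for tiny sets.
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs  # ensure xs is the smaller set
    m, n = len(xs), len(ys)
    if n == 0:
        return 0.0  # both sets are empty

    def d(x, y):
        # Base distance, capped at the cut-off c.
        return min(c, math.dist(x, y))

    # Best assignment of the m points in xs to distinct points in ys.
    best = min(sum(d(x, ys[j]) ** p for x, j in zip(xs, perm))
               for perm in permutations(range(n), m))
    # Each of the n - m unassigned points costs the full cut-off.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

The value combines localization error (the matched distances) with cardinality error (the cut-off penalty for each unmatched point), which is what makes it usable as a single distance between an estimated set and a reference set.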

Book Chapter•DOI•

Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization

Shujun Wang¹, Lequan Yu², Caizi Li³, Chi-Wing Fu¹, Pheng-Ann Heng¹•Institutions (3)

The Chinese University of Hong Kong¹, Stanford University², Chinese Academy of Sciences³

23 Aug 2020

TL;DR: A new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains is presented.

Abstract: The generalization capability of neural networks across domains is crucial for real-world applications. We argue that a generalized object recognition system should well understand the relationships among different images and also the images themselves at the same time. To this end, we present a new domain generalization framework (called EISNet) that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains. To be specific, we formulate our framework with feature embedding using a multi-task learning paradigm. Besides conducting the common supervised recognition task, we seamlessly integrate a momentum metric learning task and a self-supervised auxiliary task to collectively integrate the extrinsic and intrinsic supervisions. Also, we develop an effective momentum metric learning scheme with the K-hard negative mining to boost the network generalization ability. We demonstrate the effectiveness of our approach on two standard object recognition benchmarks VLCS and PACS, and show that our EISNet achieves state-of-the-art performance.

Journal Article•DOI•

Domain Adaptation With Neural Embedding Matching

Zengmao Wang¹, Bo Du¹, Yuhong Guo²•Institutions (2)

Wuhan University¹, Carleton University²

01 Jul 2020-IEEE Transactions on Neural Networks

TL;DR: A novel representation learning-based domain adaptation method, neural embedding matching (NEM), is proposed to transfer information from the source domain to a target domain where labeled data is scarce; it outperforms several state-of-the-art domain adaptation methods, and its progressive learning strategy is promising.

Abstract: Domain adaptation aims to exploit the supervision knowledge in a source domain for learning prediction models in a target domain. In this article, we propose a novel representation learning-based domain adaptation method, i.e., neural embedding matching (NEM) method, to transfer information from the source domain to the target domain where labeled data is scarce. The proposed approach induces an intermediate common representation space for both domains with a neural network model while matching the embedding of data from the two domains in this common representation space. The embedding matching is based on the fundamental assumptions that a cross-domain pair of instances will be close to each other in the embedding space if they belong to the same class category, and the local geometry property of the data can be maintained in the embedding space. The assumptions are encoded via objectives of metric learning and graph embedding techniques to regularize and learn the semisupervised neural embedding model. We also provide a generalization bound analysis for the proposed domain adaptation method. Meanwhile, a progressive learning strategy is proposed and it improves the generalization ability of the neural network gradually. Experiments are conducted on a number of benchmark data sets and the results demonstrate that the proposed method outperforms several state-of-the-art domain adaptation methods and the progressive learning strategy is promising.

Proceedings Article•DOI•

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Shikib Mehri¹, Maxine Eskenazi¹•Institutions (1)

Carnegie Mellon University¹

01 May 2020

TL;DR: This article proposes USR, an unsupervised and reference-free evaluation metric for dialog that trains unsupervised models to measure several desirable qualities of dialog.

Abstract: The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.

Proceedings Article•

Revisiting Training Strategies and Generalization Performance in Deep Metric Learning

Karsten Roth¹, Timo Milbich¹, Samarth Sinha², Prateek Gupta³, Björn Ommer¹, Joseph Paul Cohen⁴•Institutions (4)

Heidelberg University¹, University of Toronto², University of Oxford³, Université de Montréal⁴

12 Jul 2020

TL;DR: A simple, yet effective, training regularization is proposed to reliably boost the performance of ranking-based DML models on various standard benchmark datasets.

Abstract: Deep Metric Learning (DML) is arguably one of the most influential lines of research for learning visual similarities, with many approaches proposed every year. Although the field benefits from the rapid progress, the divergence in training protocols, architectures, and parameter choices makes an unbiased comparison difficult. To provide a consistent reference point, we revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices as well as the commonly neglected mini-batch sampling process. Under consistent comparison, DML objectives show much higher saturation than indicated by literature. Further, based on our analysis, we uncover a correlation between the embedding space density and compression and the generalization performance of DML models. Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models on various standard benchmark datasets. Code and a publicly accessible WandB-repo are available at this https URL.

Journal Article•DOI•

Heterogeneous Domain Adaptation: An Unsupervised Approach

Feng Liu¹, Guangquan Zhang¹, Jie Lu¹•Institutions (1)

University of Technology, Sydney¹

03 Mar 2020-IEEE Transactions on Neural Networks

TL;DR: An unsupervised knowledge transfer theorem that guarantees the correctness of transferring knowledge and a principal angle-based metric to measure the distance between two pairs of domains are presented.

Abstract: Domain adaptation leverages the knowledge in one domain (the source domain) to improve learning efficiency in another domain (the target domain). Existing heterogeneous domain adaptation research is relatively well-progressed but only in situations where the target domain contains at least a few labeled instances. In contrast, heterogeneous domain adaptation with an unlabeled target domain has not been well-studied. To contribute to the research in this emerging field, this article presents: 1) an unsupervised knowledge transfer theorem that guarantees the correctness of transferring knowledge and 2) a principal angle-based metric to measure the distance between two pairs of domains: one pair comprises the original source and target domains and the other pair comprises two homogeneous representations of two domains. The theorem and the metric have been implemented in an innovative transfer model, called a Grassmann–linear monotonic maps–geodesic flow kernel (GLG), which is specifically designed for heterogeneous unsupervised domain adaptation (HeUDA). The linear monotonic maps (LMMs) meet the conditions of the theorem and are used to construct homogeneous representations of the heterogeneous domains. The metric shows the extent to which the homogeneous representations have preserved the information in the original source and target domains. By minimizing the proposed metric, the GLG model learns the homogeneous representations of heterogeneous domains and transfers knowledge through these learned representations via a geodesic flow kernel (GFK). To evaluate the model, five public data sets were reorganized into ten HeUDA tasks across three applications: cancer detection, credit assessment, and text classification. The experiments demonstrate that the proposed model delivers superior performance over the existing baselines.

Journal Article•DOI•

Semantic Object Accuracy for Generative Text-to-Image Synthesis.

Tobias Hinz¹, Stefan Heinrich¹, Stefan Wermter¹•Institutions (1)

University of Hamburg¹

02 Sep 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work introduces a new model that explicitly models individual objects within an image and a new evaluation metric, Semantic Object Accuracy (SOA), that specifically evaluates images given an image caption. Models that explicitly model objects outperform models that only model global image characteristics.

Abstract: Generative adversarial networks conditioned on textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image models is challenging, as most evaluation metrics only judge image quality but not the conformity between the image and its caption. To address these challenges we introduce a new model that explicitly models individual objects within an image and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption. The SOA uses a pre-trained object detector to evaluate if a generated image contains objects that are mentioned in the image caption, e.g. whether an image generated from "a car driving down the street" contains a car. We perform a user study comparing several text-to-image models and show that our SOA metric ranks the models the same way as humans, whereas other metrics such as the Inception Score do not. Our evaluation also shows that models which explicitly model objects outperform models which only model global image characteristics.

Posted Content•

Reliable Fidelity and Diversity Metrics for Generative Models

Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, Jaejun Yoo

23 Feb 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is shown that even the latest versions of the precision and recall metrics are not yet reliable, and that density and coverage metrics provide more interpretable and reliable signals for practitioners than the existing metrics.

Abstract: Devising indicative evaluation metrics for the image generation task remains an open problem. The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score. Because it does not differentiate the fidelity and diversity aspects of the generated images, recent papers have introduced variants of precision and recall metrics to diagnose those properties separately. In this paper, we show that even the latest versions of the precision and recall metrics are not yet reliable. For example, they fail to detect the match between two identical distributions, they are not robust against outliers, and the evaluation hyperparameters are selected arbitrarily. We propose density and coverage metrics that solve the above issues. We analytically and experimentally show that density and coverage provide more interpretable and reliable signals for practitioners than the existing metrics. Code: this https URL.
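
The abstract does not spell out how density and coverage are computed. The following is a minimal sketch, assuming the commonly cited k-nearest-neighbour formulation: each real sample gets a ball whose radius is the distance to its k-th nearest real neighbour, density counts how many real balls contain each generated sample (normalised by k), and coverage is the fraction of real balls that contain at least one generated sample. The function names, the strict inequality, and the toy 2D points are illustrative assumptions, not the authors' released code; in practice the points would be feature embeddings.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def density_and_coverage(real, fake, k):
    """Return (density, coverage) under the assumed k-NN ball formulation."""
    radii = kth_nn_radii(real, k)
    # inside[i][j] is True when fake sample j falls in the ball around real sample i.
    inside = [[math.dist(x, y) < r for y in fake] for x, r in zip(real, radii)]
    density = sum(sum(row) for row in inside) / (k * len(fake))
    coverage = sum(any(row) for row in inside) / len(real)
    return density, coverage


real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
same = list(real)                      # generated set identical to the real set
collapsed = [(0.0, 0.0), (0.0, 0.0)]   # generated set stuck on one real mode

print(density_and_coverage(real, same, k=1))
print(density_and_coverage(real, collapsed, k=1))
```

In this toy case the identical set scores 1.0 on both measures, while the collapsed set keeps a density of 1.0 but covers only a quarter of the real samples, which is the fidelity/diversity separation the abstract describes.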

Book Chapter•DOI•

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval.

Andrew Brown¹, Weidi Xie¹, Vicky Kalogeiton¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

23 Aug 2020

TL;DR: Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation and improves the performance over the state-of-the-art, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP to real-world scenarios.

Abstract: Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods. To this end, we introduce an objective that optimises instead a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation. We also present an analysis for why directly optimising the ranking based metric of AP offers benefits over other deep metric learning losses.
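
To illustrate why AP is non-differentiable and what a smoothed approximation looks like, here is a minimal sketch. It assumes AP is written in terms of ranks, where the rank of an item is one plus the number of items scoring higher, and that the smoothing replaces the hard step (the indicator of a positive score difference) with a sigmoid of temperature `tau`. The function names and the toy scores are hypothetical, and the code uses plain Python rather than an autodiff framework, so it only evaluates the objective.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _rank_based_ap(scores, labels, step):
    """AP as the mean over positives of (rank among positives) / (rank among all)."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    if not positives:
        raise ValueError("at least one positive label is required")
    total = 0.0
    for i in positives:
        rank_pos = 1.0 + sum(step(scores[j] - scores[i]) for j in positives if j != i)
        rank_all = 1.0 + sum(
            step(scores[j] - scores[i]) for j in range(len(scores)) if j != i
        )
        total += rank_pos / rank_all
    return total / len(positives)


def exact_ap(scores, labels):
    """Hard step function: piecewise constant in the scores, so no useful gradient."""
    return _rank_based_ap(scores, labels, lambda d: 1.0 if d > 0 else 0.0)


def smooth_ap(scores, labels, tau=0.01):
    """Sigmoid relaxation of the step; smaller tau gives a tighter approximation."""
    return _rank_based_ap(scores, labels, lambda d: sigmoid(d / tau))


scores = [0.9, 0.8, 0.7, 0.6]   # similarity of each gallery item to the query
labels = [1, 0, 1, 0]           # 1 marks a relevant item

print(exact_ap(scores, labels))
print(smooth_ap(scores, labels, tau=0.01))
print(1.0 - smooth_ap(scores, labels, tau=0.01))  # value one would minimise as a loss
```

With a small temperature the smoothed value tracks the exact AP closely; a larger temperature gives a looser approximation with gradients that spread over more score pairs.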

Proceedings Article•DOI•

Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

Xia Li¹, Yibo Yang¹, Qijie Zhao¹, Tiancheng Shen¹, Zhouchen Lin¹, Hong Liu¹•Institutions (1)

Peking University¹

14 Jun 2020

TL;DR: This paper applies graph convolution to the semantic segmentation task and proposes an improved, data-dependent Laplacian that introduces an attention diagonal matrix to learn a better distance metric.

Abstract: The convolution operation suffers from a limited receptive field, while global modeling is fundamental to dense prediction tasks, such as semantic segmentation. In this paper, we apply graph convolution to the semantic segmentation task and propose an improved Laplacian. The graph reasoning is directly performed in the original feature space organized as a spatial pyramid. Different from existing methods, our Laplacian is data-dependent and we introduce an attention diagonal matrix to learn a better distance metric. It gets rid of projecting and re-projecting processes, which makes our proposed method a light-weight module that can be easily plugged into current computer vision architectures. More importantly, performing graph reasoning directly in the feature space retains spatial relationships and makes it possible for the spatial pyramid to explore multiple long-range contextual patterns from different scales. Experiments on Cityscapes, COCO Stuff, PASCAL Context and PASCAL VOC demonstrate the effectiveness of our proposed methods on semantic segmentation. We achieve comparable performance with advantages in computational and memory overhead.

Posted Content•

Probabilistic 3D Multi-Object Tracking for Autonomous Driving.

Hsu-Kuang Chiu, Antonio Prioletti, Jie Li, Jeannette Bohg

16 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper presents an on-line tracking method that took first place in the NuScenes Tracking Challenge and outperforms the AB3DMOT baseline method by a large margin in the Average Multi-Object Tracking Accuracy (AMOTA) metric.

Abstract: 3D multi-object tracking is a key module in autonomous driving applications that provides a reliable dynamic representation of the world to the planning module. In this paper, we present our on-line tracking method, which took first place in the NuScenes Tracking Challenge, held at the AI Driving Olympics Workshop at NeurIPS 2019. Our method estimates the object states by adopting a Kalman Filter. We initialize the state covariance as well as the process and observation noise covariance with statistics from the training set. We also use the stochastic information from the Kalman Filter in the data association step by measuring the Mahalanobis distance between the predicted object states and current object detections. Our experimental results on the NuScenes validation and test sets show that our method outperforms the AB3DMOT baseline method by a large margin in the Average Multi-Object Tracking Accuracy (AMOTA) metric.
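
The data association step can be sketched as follows. This is a minimal illustration, assuming the squared Mahalanobis distance d² = (z − ẑ)ᵀ S⁻¹ (z − ẑ) between a detection z and a track's predicted measurement ẑ with innovation covariance S, followed by a greedy matching under a gating threshold. The greedy strategy, the gate value, the helper names, and the 2D toy states are assumptions for illustration and are not taken from the abstract.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def mahalanobis_sq(detection, predicted, S):
    """Squared Mahalanobis distance of the residual under covariance S."""
    d = [a - b for a, b in zip(detection, predicted)]
    y = solve(S, d)  # y = S^{-1} d without forming the inverse explicitly
    return sum(di * yi for di, yi in zip(d, y))


def greedy_associate(predictions, covariances, detections, gate):
    """Match tracks to detections in order of increasing distance, within the gate."""
    candidates = []
    for ti, (p, S) in enumerate(zip(predictions, covariances)):
        for di, z in enumerate(detections):
            d2 = mahalanobis_sq(z, p, S)
            if d2 <= gate:
                candidates.append((d2, ti, di))
    candidates.sort()
    used_tracks, used_dets, matches = set(), set(), []
    for d2, ti, di in candidates:
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches


identity = [[1.0, 0.0], [0.0, 1.0]]
predictions = [(0.0, 0.0), (10.0, 10.0)]           # predicted track positions
covariances = [identity, identity]                 # innovation covariance per track
detections = [(9.5, 10.2), (0.3, -0.1), (50.0, 50.0)]

print(greedy_associate(predictions, covariances, detections, gate=9.0))
```

Unlike a plain Euclidean distance, the covariance term makes the same residual count for less along directions where the filter is uncertain, so uncertain tracks accept a wider range of detections.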

Proceedings Article•DOI•

Towards a Point Cloud Structural Similarity Metric

Evangelos Alexiou¹, Touradj Ebrahimi¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

06 Jul 2020

TL;DR: A family of statistical dispersion measurements for the prediction of perceptual degradations is proposed and assessed, and best-performing attributes and features are revealed, under different neighborhood sizes.

Abstract: Point cloud is a 3D image representation that has recently emerged as a viable approach for advanced content modality in modern communication systems. In view of its wide adoption, quality evaluation metrics are essential. In this paper, we propose and assess a family of statistical dispersion measurements for the prediction of perceptual degradations. The employed features characterize local distributions of point cloud attributes reflecting topology and color. After associating local regions between a reference and a distorted model, the corresponding feature values are compared. The visual quality of a distorted model is then predicted by error pooling across individual quality scores obtained per region. The extracted features aim at capturing local changes, similarly to the well-known Structural Similarity Index. Benchmarking results using available datasets reveal best-performing attributes and features, under different neighborhood sizes. Finally, point cloud voxelization is examined as part of the process, improving the prediction accuracy under certain conditions.

Proceedings Article•DOI•

PCQM: A Full-Reference Quality Metric for Colored 3D Point Clouds

Gabriel Meynet¹, Yana Nehme¹, Julie Digne¹, Guillaume Lavoué¹•Institutions (1)

Centre national de la recherche scientifique¹

26 May 2020

TL;DR: This paper introduces PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds. The metric is an optimally-weighted linear combination of geometry-based and color-based features and outperforms all previous metrics in terms of correlation with mean opinion scores.

Abstract: 3D point clouds constitute an emerging multimedia content, now used in a wide range of applications. The main drawback of this representation is the size of the data since typical point clouds may contain millions of points, usually associated with both geometry and color information. Consequently, a significant amount of work has been devoted to the efficient compression of this representation. Lossy compression leads to a degradation of the data and thus impacts the visual quality of the displayed content. In that context, predicting perceived visual quality computationally is essential for the optimization and evaluation of compression algorithms. In this paper, we introduce PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds. The metric is an optimally-weighted linear combination of geometry-based and color-based features. We evaluate its performance on an open subjective dataset of colored point clouds compressed by several algorithms; the proposed quality assessment approach outperforms all previous metrics in terms of correlation with mean opinion scores.
