Showing papers on "Metric (mathematics)" published in 2017

Posted Content•

Learning to Compare: Relation Network for Few-Shot Learning

Flood Sung¹, Yongxin Yang, Li Zhang², Tao Xiang¹, Philip H. S. Torr², Timothy M. Hospedales³•Institutions (3)

Queen Mary University of London¹, University of Oxford², University of Edinburgh³

16 Nov 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Relation Network (RN) proposed in this paper learns a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting.

Abstract: We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. Once trained, an RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network. Besides providing improved performance on few-shot learning, our framework is easily extended to zero-shot learning. Extensive experiments on five benchmarks demonstrate that our simple approach provides a unified and effective solution for both tasks.

2,077 citations

Proceedings Article•DOI•

Simple online and realtime tracking with a deep association metric

Nicolai Wojke¹, Alex Bewley², Dietrich Paulus¹•Institutions (2)

University of Koblenz and Landau¹, Queensland University of Technology²

21 Mar 2017

TL;DR: This paper integrates appearance information to improve the performance of SORT and reduces the number of identity switches, achieving overall competitive performance at high frame rates.

Abstract: Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In the spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
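
The nearest-neighbor association in appearance space described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cosine distance, the `max_distance` threshold of 0.2, the per-track descriptor gallery, and all function names are assumptions made for the example. A full tracker would typically also use motion cues and solve a global assignment problem rather than matching each detection independently.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two appearance descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest_track(detection, galleries, max_distance=0.2):
    """Return the id of the track whose stored descriptors contain the
    nearest neighbor of `detection`, or None if nothing is close enough.
    `max_distance` is a hypothetical gating threshold."""
    best_id, best_dist = None, max_distance
    for track_id, descriptors in galleries.items():
        dist = min(cosine_distance(detection, d) for d in descriptors)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id

# Toy galleries: each track keeps a few past appearance descriptors.
galleries = {
    "track_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "track_b": [[0.0, 1.0, 0.0]],
}
match_close = nearest_track([0.95, 0.05, 0.0], galleries)  # looks like track_a
match_none = nearest_track([0.0, 0.0, 1.0], galleries)     # unlike any track
```

In this toy setup the first detection is matched to `track_a`, while the second is left unmatched and would start a new track.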

1,808 citations

Posted Content•

Simple Online and Realtime Tracking with a Deep Association Metric

Nicolai Wojke¹, Alex Bewley², Dietrich Paulus¹•Institutions (2)

University of Koblenz and Landau¹, Queensland University of Technology²

21 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors integrate appearance information to improve the performance of Simple Online and Real-time Tracking (SORT) by tracking objects through longer periods of occlusions, effectively reducing the number of identity switches.

Abstract: Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In the spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.

987 citations

Proceedings Article•DOI•

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

Chenyan Xiong¹, Zhuyun Dai¹, Jamie Callan¹, Zhiyuan Liu², Russell Power³•Institutions (3)

Carnegie Mellon University¹, Tsinghua University², Allen Institute for Artificial Intelligence³

07 Aug 2017

TL;DR: K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.

Abstract: This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over prior feature-based and neural-based states-of-the-art, and explain the source of K-NRM's advantage: Its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
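
The three stages named in this abstract (translation matrix, kernel pooling, learning-to-rank layer) can be sketched as below. This is a rough illustration under stated assumptions, not the authors' code: Gaussian (RBF) kernels, the kernel means `mus`, the width `sigma`, the `tanh` scoring layer, and the toy two-dimensional embeddings are all choices made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two word embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def kernel_pooling(query_vecs, doc_vecs, mus, sigma=0.1):
    """Turn word-level similarities into one soft-match feature per kernel."""
    # Translation matrix: one row per query word, one column per document word.
    translation = [[cosine(q, d) for d in doc_vecs] for q in query_vecs]
    features = []
    for mu in mus:
        total = 0.0
        for row in translation:
            # Soft count of document words whose similarity is near mu.
            soft_count = sum(math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) for m in row)
            total += math.log(max(soft_count, 1e-10))  # clamp to avoid log(0)
        features.append(total)
    return features

def ranking_score(features, weights, bias=0.0):
    """Assumed learning-to-rank layer: tanh of a linear combination."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)

# A kernel centered at 1.0 acts like an exact-match counter.
exact = kernel_pooling([[1.0, 0.0]], [[1.0, 0.0]], mus=[1.0])
mismatch = kernel_pooling([[1.0, 0.0]], [[0.0, 1.0]], mus=[1.0])
```

Kernels centered at lower similarity values would pick up progressively softer matches, which is what "multi-level soft match features" refers to.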

572 citations

Book Chapter•DOI•

Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations

Carole H. Sudre¹,², Wenqi Li², Tom Vercauteren², Sebastien Ourselin¹,², M. Jorge Cardoso¹,²•Institutions (2)

UCL Institute of Neurology¹, University College London²

11 Jul 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work investigates the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks and proposes to use the class re-balancing properties of the Generalized Dice overlap as a robust and accurate deep-learning loss function for unbalanced tasks.

Abstract: Deep-learning has proved in recent years to be a powerful tool for image analysis and is now widely used to segment both 2D and 3D medical images. Deep-learning segmentation frameworks rely not only on the choice of network architecture but also on the choice of loss function. When the segmentation process targets rare observations, a severe class imbalance is likely to occur between candidate labels, thus resulting in sub-optimal performance. In order to mitigate this issue, strategies such as the weighted cross-entropy function, the sensitivity function or the Dice loss function, have been proposed. In this work, we investigate the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks. We also propose to use the class re-balancing properties of the Generalized Dice overlap, a known metric for segmentation assessment, as a robust and accurate deep-learning loss function for unbalanced tasks.
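
To make the class re-balancing idea concrete, here is a small sketch of a Generalized Dice loss in its commonly stated form, where each class is weighted by the inverse square of its reference volume. The exact weighting, the `eps` smoothing term, and the flat per-class list layout are assumptions for illustration and may differ from the chapter's implementation.

```python
def generalized_dice_loss(reference, predicted, eps=1e-8):
    """reference, predicted: one list per class, each holding one value per pixel
    (reference is a 0/1 mask, predicted is a probability)."""
    numerator = 0.0
    denominator = 0.0
    for r_class, p_class in zip(reference, predicted):
        # Rare classes (small reference volume) receive a large weight.
        weight = 1.0 / (sum(r_class) ** 2 + eps)
        numerator += weight * sum(r * p for r, p in zip(r_class, p_class))
        denominator += weight * sum(r + p for r, p in zip(r_class, p_class))
    return 1.0 - 2.0 * numerator / (denominator + eps)

# Four pixels, two classes; the second class covers a single pixel.
reference = [[1, 1, 1, 0], [0, 0, 0, 1]]
loss_perfect = generalized_dice_loss(reference, reference)
# Predicting only the majority class misses the rare class entirely.
loss_missed = generalized_dice_loss(reference, [[1, 1, 1, 1], [0, 0, 0, 0]])
```

Although the second prediction labels three of four pixels correctly, the loss stays high because the missed rare class dominates the weighted sums.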

568 citations

Proceedings Article•DOI•

Collaborative Metric Learning

Cheng-Kang Hsieh¹, Longqi Yang², Yin Cui², Tsung-Yi Lin², Serge Belongie², Deborah Estrin²•Institutions (2)

University of California, Los Angeles¹, Cornell University²

03 Apr 2017

TL;DR: The proposed algorithm outperforms state-of-the-art collaborative filtering algorithms on a wide range of recommendation tasks and uncovers the underlying spectrum of users' fine-grained preferences.

Abstract: Metric learning algorithms produce distance metrics that capture the important relationships among data. In this work, we study the connection between metric learning and collaborative filtering. We propose Collaborative Metric Learning (CML) which learns a joint metric space to encode not only users' preferences but also the user-user and item-item similarity. The proposed algorithm outperforms state-of-the-art collaborative filtering algorithms on a wide range of recommendation tasks and uncovers the underlying spectrum of users' fine-grained preferences. CML also achieves significant speedup for Top-K recommendation tasks using off-the-shelf, approximate nearest-neighbor search, with negligible accuracy reduction.
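
The idea of placing users and items in one metric space can be illustrated with a hinge loss over distances and a nearest-neighbor Top-K lookup. This is a sketch under assumptions, not the paper's training procedure: the squared Euclidean distance, the margin of 1.0, the exhaustive (rather than approximate) search, and all names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hinge_triplet_loss(user, liked_item, other_item, margin=1.0):
    """Zero once the liked item is closer to the user than the other item
    by at least `margin`; positive otherwise."""
    return max(0.0, margin
               + squared_distance(user, liked_item)
               - squared_distance(user, other_item))

def top_k(user, items, k):
    """Recommend the k items nearest to the user in the learned space."""
    return sorted(items, key=lambda name: squared_distance(user, items[name]))[:k]

user = [0.0, 0.0]
items = {"a": [0.1, 0.0], "b": [2.0, 0.0], "c": [0.5, 0.0]}
loss_satisfied = hinge_triplet_loss(user, items["a"], items["b"])  # far enough apart
loss_violated = hinge_triplet_loss(user, items["a"], items["c"])   # too close
recommended = top_k(user, items, 2)
```

Because recommendation reduces to a nearest-neighbor query, the exhaustive `sorted` call could be swapped for an approximate nearest-neighbor index, which is the speedup the abstract mentions.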

465 citations

Posted Content•

Semantic Instance Segmentation with a Discriminative Loss Function

Bert De Brabandere, Davy Neven, Luc Van Gool

08 Aug 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes an approach of combining an off-the-shelf network with a principled loss function inspired by a metric learning objective that encourages a convolutional network to produce a representation of the image that can easily be clustered into instances with a simple post-processing step.

Abstract: Semantic instance segmentation remains a challenging task. In this work we propose to tackle the problem with a discriminative loss function, operating at the pixel level, that encourages a convolutional network to produce a representation of the image that can easily be clustered into instances with a simple post-processing step. The loss function encourages the network to map each pixel to a point in feature space so that pixels belonging to the same instance lie close together while different instances are separated by a wide margin. Our approach of combining an off-the-shelf network with a principled loss function inspired by a metric learning objective is conceptually simple and distinct from recent efforts in instance segmentation. In contrast to previous works, our method does not rely on object proposals or recurrent mechanisms. A key contribution of our work is to demonstrate that such a simple setup without bells and whistles is effective and can perform on par with more complex methods. Moreover, we show that it does not suffer from some of the limitations of the popular detect-and-segment approaches. We achieve competitive performance on the Cityscapes and CVPPP leaf segmentation benchmarks.
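
A pull/push loss of the kind described in this abstract might look like the sketch below: pixels are pulled toward the mean embedding of their instance, and instance means are pushed apart. The hinge margins `delta_pull` and `delta_push`, the squared hinges, and the omission of any regularization term are assumptions for illustration rather than the authors' exact formulation.

```python
import math
from itertools import combinations

def mean_vector(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def discriminative_loss(instances, delta_pull=0.5, delta_push=1.5):
    """instances: one list of pixel embeddings per ground-truth instance."""
    centers = [mean_vector(pixels) for pixels in instances]
    # Pull term: penalize pixels farther than delta_pull from their center.
    pull = sum(
        sum(max(0.0, dist(x, c) - delta_pull) ** 2 for x in pixels) / len(pixels)
        for pixels, c in zip(instances, centers)
    ) / len(instances)
    # Push term: penalize pairs of centers closer than 2 * delta_push.
    pairs = list(combinations(centers, 2))
    push = 0.0
    if pairs:
        push = sum(max(0.0, 2 * delta_push - dist(a, b)) ** 2 for a, b in pairs) / len(pairs)
    return pull + push

separated = discriminative_loss([[[0.0, 0.0], [0.1, 0.0]], [[5.0, 0.0], [5.1, 0.0]]])
crowded = discriminative_loss([[[0.0, 0.0], [0.1, 0.0]], [[1.0, 0.0], [1.1, 0.0]]])
```

Once the loss reaches zero, every pixel lies within `delta_pull` of its own center and centers are at least `2 * delta_push` apart, so a simple clustering step can recover the instances.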

449 citations

Proceedings Article•DOI•

Probabilistic data association for semantic SLAM

Sean L. Bowman¹, Nikolay Atanasov¹, Kostas Daniilidis¹, George J. Pappas¹•Institutions (1)

University of Pennsylvania¹

01 May 2017

TL;DR: This paper forms an optimization problem over sensor states and semantic landmark positions that integrates metric information, semantic information, and data associations, and decomposes it into two interconnected problems: an estimation of discrete data association and landmark class probabilities, and a continuous optimization over the metric states.

Abstract: Traditional approaches to simultaneous localization and mapping (SLAM) rely on low-level geometric features such as points, lines, and planes. They are unable to assign semantic labels to landmarks observed in the environment. Furthermore, loop closure recognition based on low-level features is often viewpoint-dependent and subject to failure in ambiguous or repetitive environments. On the other hand, object recognition methods can infer landmark classes and scales, resulting in a small set of easily recognizable landmarks, ideal for view-independent unambiguous loop closure. In a map with several objects of the same class, however, a crucial data association problem exists. While data association and recognition are discrete problems usually solved using discrete inference, classical SLAM is a continuous optimization over metric information. In this paper, we formulate an optimization problem over sensor states and semantic landmark positions that integrates metric information, semantic information, and data associations, and decompose it into two interconnected problems: an estimation of discrete data association and landmark class probabilities, and a continuous optimization over the metric states. The estimated landmark and robot poses affect the association and class distributions, which in turn affect the robot-landmark pose optimization. The performance of our algorithm is demonstrated on indoor and outdoor datasets.

442 citations

Proceedings Article•DOI•

DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs

K. Ram Prabhakar¹, V. Sai Srikar, R. Venkatesh Babu¹•Institutions (1)

Indian Institute of Science¹

01 Oct 2017

TL;DR: In this article, an unsupervised deep learning framework for multi-exposure fusion is proposed, which uses a novel CNN architecture trained to learn the fusion operation without reference ground truth image.

Abstract: We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse the input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust to varying input conditions and capable of handling extreme exposure without artifacts. Deep representations are known to be robust to input conditions and have shown phenomenal performance in a supervised setting. However, the stumbling block in using deep learning for MEF was the lack of sufficient training data and of an oracle to provide the ground truth for supervision. To address the above issues, we have gathered a large dataset of multi-exposure image stacks for training, and to circumvent the need for ground-truth images, we propose an unsupervised deep learning framework for MEF utilizing a no-reference quality metric as the loss function. The proposed approach uses a novel CNN architecture trained to learn the fusion operation without a reference ground-truth image. The model fuses a set of common low-level features extracted from each image to generate artifact-free, perceptually pleasing results. We perform extensive quantitative and qualitative evaluation and show that the proposed technique outperforms existing state-of-the-art approaches for a variety of natural images.

433 citations

Proceedings Article•

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

Sanjeev Arora, Rong Ge¹, Yingyu Liang², Tengyu Ma², Yi Zhang²•Institutions (2)

Duke University¹, Princeton University²

02 Mar 2017

TL;DR: The authors showed that GANs may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from target distribution in standard metrics.

Abstract: We show that training of generative adversarial networks (GANs) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics. However, generalization does occur for a weaker metric called neural net distance. It is also shown that an approximate pure equilibrium exists in the discriminator/generator game for a special class of generators with natural training objectives when generator capacity and training set sizes are moderate. This existence of equilibrium inspires the MIX+GAN protocol, which can be combined with any existing GAN training method and is empirically shown to improve some of them.

425 citations

Proceedings Article•DOI•

Why We Need New Evaluation Metrics for NLG

Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser

10 Sep 2017

TL;DR: A wide range of metrics are investigated, including state-of-the-art word-based and novel grammar-based ones, and it is demonstrated that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG.

Abstract: The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG. We also show that metric performance is data- and system-specific. Nevertheless, our results also suggest that automatic metrics perform reliably at system-level and can support system development by finding cases where a system performs poorly.

Proceedings Article•DOI•

Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation

Hongliang Yan¹, Yukang Ding¹, Peihua Li², Qilong Wang², Yong Xu¹, Wangmeng Zuo¹•Institutions (2)

Harbin Institute of Technology¹, Dalian University of Technology²

21 Jul 2017

TL;DR: In this article, a weighted maximum mean discrepancy (MMD) model is proposed to exploit the class prior probability on source and target domains, whose challenge lies in the fact that the class label in target domain is unavailable.

Abstract: In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in class prior distributions, i.e., class weight bias across domains. This remains an open but ubiquitous problem in domain adaptation, and it can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias, which results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probability on source and target domains; the challenge lies in the fact that class labels in the target domain are unavailable. To account for this, our proposed weighted MMD model is defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested that alternates between assigning pseudo-labels, estimating auxiliary weights, and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation.
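
The effect of class weight bias on MMD, and how per-class source weights can remove it, can be shown with a toy computation. This is only a sketch: the Gaussian kernel, the biased estimator, and the assumption that the target class proportions are known (so that the weight of class c is the target proportion divided by the source proportion) are simplifications. In the paper's setting target labels are unavailable and the weights have to be estimated, for example from pseudo-labels.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def kernel_mean(xs, wx, ys, wy):
    """Weighted average of kernel values over all pairs (x, y)."""
    return sum(wi * wj * gaussian_kernel(xi, yj)
               for wi, xi in zip(wx, xs) for wj, yj in zip(wy, ys))

def weighted_mmd2(source, target, source_weights=None):
    """Biased estimate of squared MMD; source weights default to uniform."""
    if source_weights is None:
        source_weights = [1.0] * len(source)
    total = sum(source_weights)
    ws = [w / total for w in source_weights]
    wt = [1.0 / len(target)] * len(target)
    return (kernel_mean(source, ws, source, ws)
            + kernel_mean(target, wt, target, wt)
            - 2.0 * kernel_mean(source, ws, target, wt))

# Same class-conditional distributions, different class priors:
# class "a" sits at 0.0 and class "b" at 5.0 in both domains.
source = [[0.0], [0.0], [0.0], [5.0]]
source_labels = ["a", "a", "a", "b"]
target = [[0.0], [5.0], [5.0], [5.0]]
# Assumed-known ratios of target to source class proportions.
alpha = {"a": (1 / 4) / (3 / 4), "b": (3 / 4) / (1 / 4)}
plain = weighted_mmd2(source, target)
reweighted = weighted_mmd2(source, target, [alpha[c] for c in source_labels])
```

Plain MMD reports a large discrepancy here even though the two domains differ only in class proportions, while the reweighted version is essentially zero.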

Posted Content•

No Fuss Distance Metric Learning using Proxies

Yair Movshovitz-Attias¹, Alexander Toshev¹, Thomas Leung¹, Sergey Ioffe¹, Saurabh Singh¹•Institutions (1)

Google¹

21 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors proposed to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well.

Abstract: We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship -- an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size, hard or semi-hard triplet mining, etc. Even with these tricks, the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy-loss improves on state-of-art results for three standard zero-shot learning datasets, by up to 15% points, while converging three times as fast as other triplet-based losses.
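
A proxy-based loss replaces the search for informative triplets with a comparison between each anchor and one proxy per class. The sketch below assumes an NCA-style ratio over proxies with squared Euclidean distance, and it keeps the proxies fixed for clarity; in the method described above the proxies are learned together with the embedding, and the exact loss form may differ.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def proxy_nca_loss(anchor, label, proxies):
    """Compare the anchor with its own class proxy (numerator) and with
    the proxies of all other classes (denominator)."""
    positive = math.exp(-sq_dist(anchor, proxies[label]))
    negatives = sum(math.exp(-sq_dist(anchor, p))
                    for cls, p in proxies.items() if cls != label)
    return -math.log(positive / negatives)

# Hypothetical class proxies in a 2-D embedding space.
proxies = {"cat": [0.0, 0.0], "dog": [3.0, 0.0]}
anchor = [0.1, 0.0]
loss_correct = proxy_nca_loss(anchor, "cat", proxies)  # anchor near its proxy
loss_wrong = proxy_nca_loss(anchor, "dog", proxies)    # anchor far from its proxy
```

Each anchor needs only one pass over the class proxies, so no triplet mining over the full dataset is required, which is the source of the faster convergence the abstract reports.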

Journal Article•DOI•

Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking

Jun Yu¹, Xiaokang Yang², Fei Gao¹, Dacheng Tao³•Institutions (3)

Hangzhou Dianzi University¹, Shanghai Jiao Tong University², University of Technology, Sydney³

01 Dec 2017-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method, which adopts a new ranking model to use multi-modal features, including click features and visual features in DML.

Abstract: How do we retrieve images accurately? How do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers building a novel image search engine. First, it is important to obtain an appropriate description that effectively represents the images. In this paper, multimodal features are considered for describing images. The images' unique properties are reflected by visual features, which are correlated with each other. However, semantic gaps always exist between images' visual features and semantics. Therefore, we utilize click features to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set. Multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain an initial distance metric in different visual spaces, and an MDML method is used to assign optimal weights for different modalities. We then conduct alternating optimization to train the ranking model, which is used for the ranking of new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model to use multimodal features, including click features and visual features, in DML. We conducted experiments to analyze the proposed Deep-MDML on two benchmark data sets, and the results validate the effectiveness of the method.

Proceedings Article•DOI•

See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-Based Person Re-identification

Zhen Zhou¹, Yan Huang¹, Wei Wang¹, Liang Wang¹, Tieniu Tan¹•Institutions (1)

Chinese Academy of Sciences¹

01 Jul 2017

TL;DR: This paper focuses on video-based person re-identification and builds an end-to-end deep neural network architecture to jointly learn features and metrics and integrates the surrounding information at each location by a spatial recurrent model when measuring the similarity with another pedestrian video.

Abstract: Surveillance cameras are widely used in different scenes. Accordingly, there is a pressing need to recognize a person across different cameras, which is called person re-identification. This topic has recently gained increasing interest in computer vision. However, less attention has been paid to video-based approaches than to image-based ones. Previous approaches usually involve two steps, namely feature learning and metric learning, but most of them focus on only one of the two. Meanwhile, many of them do not make full use of the temporal and spatial information. In this paper, we concentrate on video-based person re-identification and build an end-to-end deep neural network architecture to jointly learn features and metrics. The proposed method can automatically pick out the most discriminative frames in a given video by a temporal attention model. Moreover, it integrates the surrounding information at each location by a spatial recurrent model when measuring the similarity with another pedestrian video. That is, our method handles spatial and temporal information simultaneously in a unified manner. The carefully designed experiments on three public datasets show the effectiveness of each component of the proposed deep network, which performs better than the state-of-the-art methods.

Journal Article•DOI•

Learning a no-reference quality metric for single-image super-resolution

Chao Ma¹,², Chih-Yuan Yang³, Xiaokang Yang², Ming-Hsuan Yang³•Institutions (3)

University of Adelaide¹, Shanghai Jiao Tong University², University of California, Merced³

01 May 2017-Computer Vision and Image Understanding

TL;DR: The authors design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images.

Proceedings Article•DOI•

Improved Image Captioning via Policy Gradient optimization of SPIDEr

Siqi Liu¹, Zhenhai Zhu¹, Ning Ye, Sergio Guadarrama², Kevin Murphy³•Institutions (3)

Google¹, University of California, Berkeley², Washington State University³

01 Oct 2017

TL;DR: This paper shows how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination the authors call SPIDEr), which results in image captions that are strongly preferred by human raters compared to captions generated by the same model but trained to optimize MLE or the COCO metrics.

Abstract: Current image captioning methods are usually trained via maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated, but have traditionally been hard to optimize for. In this paper, we show how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination we call SPIDEr): the SPICE score ensures our captions are semantically faithful to the image, while CIDEr score ensures our captions are syntactically fluent. The PG method we propose improves on the prior MIXER approach, by using Monte Carlo rollouts instead of mixing MLE training with PG. We show empirically that our algorithm leads to easier optimization and improved results compared to MIXER. Finally, we show that using our PG method we can optimize any of the metrics, including the proposed SPIDEr metric which results in image captions that are strongly preferred by human raters compared to captions generated by the same model but trained to optimize MLE or the COCO metrics.

Proceedings Article•DOI•

Scalable Person Re-identification on Supervised Smoothed Manifold

Song Bai¹, Xiang Bai¹, Qi Tian²•Institutions (2)

Huazhong University of Science and Technology¹, University of Texas at San Antonio²

24 Mar 2017

TL;DR: An unconventional manifold-preserving algorithm is proposed, which can make best use of supervision from training data, whose label information is given as pairwise constraints, scale up to large repositories with low on-line time complexity, and be plugged into most existing algorithms, serving as a generic postprocessing procedure to further boost the identification accuracies.

Abstract: Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images. However, the underlying manifold on which those images reside is rarely investigated. This raises the problem that the learned metric is not smooth with respect to the local geometry of the data manifold. In this paper, we study person re-identification with manifold-based affinity learning, which has not received enough attention in this area. An unconventional manifold-preserving algorithm is proposed, which can 1) make best use of supervision from training data, whose label information is given as pairwise constraints, 2) scale up to large repositories with low on-line time complexity, and 3) be plugged into most existing algorithms, serving as a generic postprocessing procedure to further boost the identification accuracies. Extensive experimental results on five popular person re-identification benchmarks consistently demonstrate the effectiveness of our method. In particular, on the largest datasets, CUHK03 and Market-1501, our method outperforms the state-of-the-art alternatives by a large margin with high efficiency, which makes it more appropriate for practical applications.

Book Chapter•DOI•

Nonrigid image registration using multi-scale 3D convolutional neural networks

Hessam Sokooti¹, Bob D. de Vos², Floris F. Berendsen¹, Boudewijn P. F. Lelieveldt¹,³, Ivana Išgum², Marius Staring¹,³•Institutions (3)

Leiden University Medical Center¹, Utrecht University², Delft University of Technology³

10 Sep 2017

TL;DR: The proposed RegNet is trained using a large set of artificially generated DVFs, does not explicitly define a dissimilarity metric, and integrates image content at multiple scales to equip the network with contextual information, thereby greatly simplifying the training problem.

Abstract: In this paper we propose a method to solve nonrigid image registration through a learning approach, instead of via iterative optimization of a predefined dissimilarity metric. We design a Convolutional Neural Network (CNN) architecture that, in contrast to all other work, directly estimates the displacement vector field (DVF) from a pair of input images. The proposed RegNet is trained using a large set of artificially generated DVFs, does not explicitly define a dissimilarity metric, and integrates image content at multiple scales to equip the network with contextual information. At testing time nonrigid registration is performed in a single shot, in contrast to current iterative methods. We tested RegNet on 3D chest CT follow-up data. The results show that the accuracy of RegNet is on par with a conventional B-spline registration, for anatomy within the capture range. Training RegNet with artificially generated DVFs is therefore a promising approach for obtaining good results on real clinical data, thereby greatly simplifying the training problem. Deformable image registration can therefore be successfully cast as a learning problem.

Journal Article•DOI•

OptiClust, an Improved Method for Assigning Amplicon-Based Sequence Data to Operational Taxonomic Units.

Sarah L. Westcott¹, Patrick D. Schloss¹•Institutions (1)

University of Michigan¹

26 Apr 2017

TL;DR: A new OTU assignment algorithm that iteratively reassigns sequences to new OTUs to optimize the Matthews correlation coefficient (MCC), a measure of the quality of OTU assignments, is developed.

Abstract: Assignment of 16S rRNA gene sequences to operational taxonomic units (OTUs) is a computational bottleneck in the process of analyzing microbial communities. Although this has been an active area of research, it has been difficult to overcome the time and memory demands while improving the quality of the OTU assignments. Here, we developed a new OTU assignment algorithm that iteratively reassigns sequences to new OTUs to optimize the Matthews correlation coefficient (MCC), a measure of the quality of OTU assignments. To assess the new algorithm, OptiClust, we compared it to 10 other algorithms using 16S rRNA gene sequences from two simulated and four natural communities. Using the OptiClust algorithm, the MCC values averaged 15.2 and 16.5% higher than the OTUs generated when we used the average neighbor and distance-based greedy clustering with VSEARCH, respectively. Furthermore, on average, OptiClust was 94.6 times faster than the average neighbor algorithm and just as fast as distance-based greedy clustering with VSEARCH. An empirical analysis of the efficiency of the algorithms showed that the time and memory required to perform the algorithm scaled quadratically with the number of unique sequences in the data set. The significant improvement in the quality of the OTU assignments over previously existing methods will significantly enhance downstream analysis by limiting the splitting of similar sequences into separate OTUs and merging of dissimilar sequences into the same OTU. The development of the OptiClust algorithm represents a significant advance that is likely to have numerous other applications.

IMPORTANCE: The analysis of microbial communities from diverse environments using 16S rRNA gene sequencing has expanded our knowledge of the biogeography of microorganisms. An important step in this analysis is the assignment of sequences into taxonomic groups based on their similarity to sequences in a database or based on their similarity to each other, irrespective of a database. In this study, we present a new algorithm for the latter approach. The algorithm, OptiClust, seeks to optimize a metric of assignment quality by shuffling sequences between taxonomic groups. We found that OptiClust produces more robust assignments and does so in a rapid and memory-efficient manner. This advance will allow for a more robust analysis of microbial communities and the factors that shape them. Podcast: A podcast concerning this article is available.
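
The quality measure named in the abstract above, the Matthews correlation coefficient, can be illustrated with a small sketch. The abstract does not spell out how pairs of sequences are counted, so the convention below is an assumption for illustration: a pair is a true positive when two sequences are within the distance threshold and share an OTU, a false positive when they share an OTU but are not within the threshold, and so on. The function names and input formats (`otu_of`, `similar_pairs`) are hypothetical and not taken from the OptiClust software.

```python
from itertools import combinations
from math import sqrt


def pairwise_counts(otu_of, similar_pairs):
    """Count TP, TN, FP, FN over all unordered pairs of sequences.

    otu_of: dict mapping sequence id -> OTU label (hypothetical format).
    similar_pairs: set of frozensets {a, b} for pairs assumed to lie
        within the clustering distance threshold.
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(otu_of), 2):
        same_otu = otu_of[a] == otu_of[b]
        similar = frozenset((a, b)) in similar_pairs
        if same_otu and similar:
            tp += 1  # similar sequences kept together
        elif same_otu:
            fp += 1  # dissimilar sequences merged into one OTU
        elif similar:
            fn += 1  # similar sequences split across OTUs
        else:
            tn += 1  # dissimilar sequences kept apart
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: a~b and c~d are the only similar pairs.
similar = {frozenset(("a", "b")), frozenset(("c", "d"))}
good = {"a": 1, "b": 1, "c": 2, "d": 2}
lumped = {"a": 1, "b": 1, "c": 1, "d": 1}
good_score = mcc(*pairwise_counts(good, similar))
lumped_score = mcc(*pairwise_counts(lumped, similar))
```

An iterative method of the kind the abstract describes would move one sequence at a time to whichever OTU most improves this score; the sketch only evaluates a fixed assignment and does not attempt that search.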

Journal Article•DOI•

Zipf’s Law in Passwords

Ding Wang¹, Haibo Cheng¹, Ping Wang¹, Xinyi Huang², Gaopeng Jian¹•Institutions (2)

Peking University¹, Fujian Normal University²

28 Jun 2017-IEEE Transactions on Information Forensics and Security

TL;DR: This paper proposes two Zipf-like models (i.e., PDF-Zipf and CDF-Zipf) to characterize the distribution of passwords and suggests a new metric for measuring the strength of password data sets.

Abstract: Despite three decades of intensive research efforts, it remains an open question as to what is the underlying distribution of user-generated passwords. In this paper, we make a substantial step forward toward understanding this foundational question. By introducing a number of computational statistical techniques and based on 14 large-scale data sets, which consist of 113.3 million real-world passwords, we, for the first time, propose two Zipf-like models (i.e., PDF-Zipf and CDF-Zipf) to characterize the distribution of passwords. More specifically, our PDF-Zipf model fits the popular passwords well and obtains a coefficient of determination larger than 0.97; our CDF-Zipf model fits the entire password data set well, with the maximum cumulative distribution function (CDF) deviation between the empirical distribution and the fitted theoretical model ranging from 0.49% to 4.59% (1.85% on average). With the concrete knowledge of password distributions, we suggest a new metric for measuring the strength of password data sets. Extensive experimental results show the effectiveness and general applicability of the proposed Zipf-like models and security metric.
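
As a rough illustration of what fitting a Zipf-like model to password frequencies involves, the sketch below fits the classic Zipf form f_r = C / r^s by ordinary least squares in log-log space and reports the coefficient of determination. The abstract does not give the functional form or the fitting procedure of PDF-Zipf, so both are assumptions here, and `fit_zipf` is a hypothetical helper name. The input is synthetic, not real password data.

```python
import math


def fit_zipf(frequencies):
    """Fit log f_r = log C - s * log r by least squares.

    frequencies: positive counts sorted in descending order (rank 1 first).
    Returns (C, s, r_squared), with r_squared measured in log-log space.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination of the straight-line fit.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(intercept), -slope, r_squared


# Synthetic frequencies that follow f_r = 1000 / r^0.9 exactly.
synthetic = [1000.0 / rank ** 0.9 for rank in range(1, 51)]
c_hat, s_hat, r2 = fit_zipf(synthetic)
```

On real data the fit would be applied to the sorted frequency list of the most popular passwords, and an R² near the 0.97 reported in the abstract would indicate a near-linear relationship on the log-log plot.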

Posted Content•

The Cramer Distance as a Solution to Biased Wasserstein Gradients

Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos

30 May 2017-arXiv: Learning

TL;DR: This paper describes three natural properties of probability divergences that it believes reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients and proposes an alternative to the Wasserstein metric, the Cramer distance, which possesses all three desired properties.

Abstract: The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this paper we describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third. We provide empirical evidence suggesting that this is a serious issue in practice. Leveraging insights from probabilistic forecasting we propose an alternative to the Wasserstein metric, the Cramer distance. We show that the Cramer distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences. To illustrate the relevance of the Cramer distance in practice we design a new algorithm, the Cramer Generative Adversarial Network (GAN), and show that it performs significantly better than the related Wasserstein GAN.
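
For one-dimensional distributions, both quantities compared in the abstract above can be written in terms of cumulative distribution functions: the 1-Wasserstein distance integrates the absolute CDF gap, while a Cramér-type distance integrates the squared gap. The sketch below evaluates both for two probability mass functions on a shared, unit-spaced support. Conventions differ on whether a square root is applied to the squared form, so the function returns the squared integral and says so in its name. The helper names are hypothetical, and the sketch does not demonstrate the gradient-bias property the paper studies.

```python
from itertools import accumulate


def cdf_gaps(p, q):
    """CDF differences of two pmfs defined on the same unit-spaced support."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    return [fp - fq for fp, fq in zip(accumulate(p), accumulate(q))]


def wasserstein_1(p, q):
    """Sum of absolute CDF gaps (1-Wasserstein on a unit-spaced grid)."""
    return sum(abs(gap) for gap in cdf_gaps(p, q))


def cramer_squared(p, q):
    """Sum of squared CDF gaps (squared Cramér-type distance)."""
    return sum(gap * gap for gap in cdf_gaps(p, q))


# Two toy distributions on the support {0, 1, 2}.
p = [0.5, 0.0, 0.5]
q = [0.0, 1.0, 0.0]
w1 = wasserstein_1(p, q)
c2 = cramer_squared(p, q)
```

Both measures depend on where the probability mass sits along the support, which is the sensitivity to geometry that the abstract contrasts with the Kullback-Leibler divergence.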

Proceedings Article•DOI•

Deep Metric Learning via Facility Location

Hyun Oh Song¹, Stefanie Jegelka², Vivek Rathod¹, Kevin Murphy¹•Institutions (2)

Google¹, Massachusetts Institute of Technology²

01 Jul 2017

TL;DR: This paper proposes a new metric learning scheme, based on structured prediction, that is aware of the global structure of the embedding space, and which is designed to optimize a clustering quality metric (NMI).

Abstract: Learning image similarity metrics in an end-to-end fashion with deep networks has demonstrated excellent results on tasks such as clustering and retrieval. However, current methods all focus on a very local view of the data. In this paper, we propose a new metric learning scheme, based on structured prediction, that is aware of the global structure of the embedding space, and which is designed to optimize a clustering quality metric (NMI). We show state-of-the-art performance on standard datasets, such as CUB200-2011 [37], Cars196 [18], and Stanford online products [30] on NMI and R@K evaluation metrics.

Journal Article•DOI•

A Metric for Quantifying Product‐Level Circularity

Marcus Linder¹, Steven Sarasini¹, Patricia van Loon¹•Institutions (1)

Viktoria Institute¹

01 Jun 2017-Journal of Industrial Ecology

TL;DR: This work argues that the economic value of product parts may constitute a useful basis for such aggregation, and describes a set of principles for using economic value as a basis for measuring product circularity, and outlines a metric that utilizes this approach.

Abstract: Circularity metrics are useful for empirically assessing the effects of a circular economy in terms of profitability, job creation, and environmental impacts. At present, however, there is no standardized method for measuring the circularity of products. We start by reviewing existing product-level metrics in terms of validity and reliability, taking note of theoretically justified principles for aggregating different types of material flows and cycles into a single value. We then argue that the economic value of product parts may constitute a useful basis for such aggregation; describe a set of principles for using economic value as a basis for measuring product circularity; and outline a metric that utilizes this approach. Our recommendation is to use the ratio of recirculated economic value to total product value as a circularity metric, using value chain costs as an estimator. In order to protect value chain actors’ sensitive financial data and facilitate neutrality regarding outsourcing or insourcing, we suggest a means to calculate product-level circularity based on sequential approximations of adding one product part and activity at a time. We conclude by suggesting potential avenues for further research, including ways in which the proposed metric can be used in wider assessments of the circular economy, and ways in which it may be further refined.
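
The recommended ratio, recirculated economic value over total product value, is simple enough to show as a worked example. The sketch below assumes each part is described by its value chain cost (used as the value estimator, as the abstract suggests) and by the share of that cost attributable to recirculated material. The per-part representation, the function name, and the figures are hypothetical, and the sequential approximation scheme mentioned in the abstract is not reproduced.

```python
def product_circularity(parts):
    """Return recirculated value divided by total value.

    parts: list of (cost, recirculated_share) tuples, where cost is a
        non-negative value chain cost and recirculated_share is in [0, 1].
    """
    parts = list(parts)
    for cost, share in parts:
        if cost < 0 or not 0.0 <= share <= 1.0:
            raise ValueError("cost must be >= 0 and share within [0, 1]")
    total = sum(cost for cost, _ in parts)
    if total == 0:
        raise ValueError("total product value must be positive")
    recirculated = sum(cost * share for cost, share in parts)
    return recirculated / total


# Hypothetical product: a virgin frame, a fully reused motor,
# and a casing made half from recycled material.
example_parts = [(60.0, 0.0), (30.0, 1.0), (10.0, 0.5)]
score = product_circularity(example_parts)
```

In this toy case 35 of 100 cost units are recirculated, so the score is 0.35; a product made entirely of recirculated value would score 1.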

Journal Article•DOI•

Discriminative Deep Metric Learning for Face and Kinship Verification

Jiwen Lu¹, Junlin Hu², Yap-Peng Tan²•Institutions (2)

Tsinghua University¹, Nanyang Technological University²

20 Jun 2017-IEEE Transactions on Image Processing

TL;DR: A discriminative deep multi-metric learning method to jointly learn multiple neural networks, under which the correlation of different features of each sample is maximized, and the distance of each positive pair is reduced and that of each negative pair is enlarged.

Abstract: This paper presents a new discriminative deep metric learning (DDML) method for face and kinship verification in wild conditions. While metric learning has achieved reasonably good performance in face and kinship verification, most existing metric learning methods aim to learn a single Mahalanobis distance metric to maximize the inter-class variations and minimize the intra-class variations, which cannot capture the nonlinear manifold on which face images usually lie. To address this, we propose a DDML method to train a deep neural network to learn a set of hierarchical nonlinear transformations to project face pairs into the same latent feature space, under which the distance of each positive pair is reduced and that of each negative pair is enlarged. To better use the commonality of multiple feature descriptors to make all the features more robust for face and kinship verification, we develop a discriminative deep multi-metric learning method to jointly learn multiple neural networks, under which the correlation of different features of each sample is maximized, and the distance of each positive pair is reduced and that of each negative pair is enlarged. Extensive experimental results show that our proposed methods achieve acceptable results in both face and kinship verification.

Proceedings Article•DOI•

Is Second-Order Information Helpful for Large-Scale Visual Recognition?

Peihua Li¹, Jiangtao Xie¹, Qilong Wang¹, Wangmeng Zuo²•Institutions (2)

Dalian University of Technology¹, Harbin Institute of Technology²

01 Oct 2017

TL;DR: A Matrix Power Normalized Covariance (MPN-COV) method that develops forward and backward propagation formulas regarding the nonlinear matrix functions such that MPN-COV can be trained end-to-end and analyzes both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric.

Abstract: By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing full potentials of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method consists in covariance pooling, instead of the most commonly used first-order pooling, of high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and usage of the manifold structure of covariance matrices. To address these challenges, we present a Matrix Power Normalized Covariance (MPN-COV) method. We develop forward and backward propagation formulas regarding the nonlinear matrix functions such that MPN-COV can be trained end-to-end. In addition, we analyze both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric. On the ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and 2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152. The source code will be available on the project page: http://www.peihuali.org/MPN-COV.

Posted Content•

Quality Aware Network for Set to Set Recognition

Yu Liu¹, Junjie Yan¹, Wanli Ouyang²•Institutions (2)

SenseTime¹, University of Sydney²

11 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: Analysis on gradient spread of this mechanism indicates that the quality learned by the network is beneficial to set-to-set recognition and simplifies the distribution that the network needs to fit.

Abstract: This paper targets the problem of set to set recognition, which learns the metric between two image sets. Images in each set belong to the same identity. Since images in a set can be complementary, they hopefully lead to higher accuracy in practical applications. However, the quality of each sample cannot be guaranteed, and samples with poor quality will hurt the metric. In this paper, the quality aware network (QAN) is proposed to confront this problem, where the quality of each sample can be automatically learned although such information is not explicitly provided in the training stage. The network has two branches, where the first branch extracts an appearance feature embedding for each sample and the other branch predicts a quality score for each sample. Features and quality scores of all samples in a set are then aggregated to generate the final feature embedding. We show that the two branches can be trained in an end-to-end manner given only the set-level identity annotation. Analysis on gradient spread of this mechanism indicates that the quality learned by the network is beneficial to set-to-set recognition and simplifies the distribution that the network needs to fit. Experiments on both face verification and person re-identification show advantages of the proposed QAN. The source code and network structure can be downloaded at this https URL.
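
The aggregation step described in the abstract above, combining per-sample features with per-sample quality scores into one set-level embedding, can be sketched as a quality-weighted average. The abstract does not state the aggregation formula, so the normalised weighted mean below is an assumption, and `aggregate_set` is a hypothetical name. In the described network both the features and the scores would come from learned branches; here they are plain lists.

```python
def aggregate_set(features, quality_scores):
    """Quality-weighted mean of feature vectors (assumed aggregation rule).

    features: list of equal-length lists of floats, one per sample.
    quality_scores: list of non-negative floats, one per sample.
    """
    if not features or len(features) != len(quality_scores):
        raise ValueError("need one quality score per feature vector")
    total = sum(quality_scores)
    if total <= 0:
        raise ValueError("quality scores must sum to a positive value")
    dim = len(features[0])
    return [
        sum(q * f[d] for f, q in zip(features, quality_scores)) / total
        for d in range(dim)
    ]


# Toy set: the first sample is judged three times as reliable as the second.
set_embedding = aggregate_set([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

With this rule a blurred or occluded image that receives a low score contributes little to the set embedding, which matches the motivation given in the abstract that poor-quality samples hurt the metric.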

Posted Content•

Cross-view Asymmetric Metric Learning for Unsupervised Person Re-identification

Hong-Xing Yu¹, Ancong Wu¹, Wei-Shi Zheng¹•Institutions (1)

Sun Yat-sen University¹

27 Aug 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The model aims to learn an asymmetric metric, i.e., specific projection for each view, based on asymmetric clustering on cross-view person images, and finds a shared space where view-specific bias is alleviated and thus better matching performance can be achieved.

Abstract: While metric learning is important for person re-identification (RE-ID), a significant problem in visual surveillance for cross-view pedestrian matching, existing metric models for RE-ID are mostly based on supervised learning that requires quantities of labeled samples in all pairs of camera views for training. However, this limits their scalability to realistic applications, in which a large amount of data over multiple disjoint camera views is available but not labelled. To overcome the problem, we propose unsupervised asymmetric metric learning for unsupervised RE-ID. Our model aims to learn an asymmetric metric, i.e., specific projection for each view, based on asymmetric clustering on cross-view person images. Our model finds a shared space where view-specific bias is alleviated and thus better matching performance can be achieved. Extensive experiments have been conducted on a baseline and five large-scale RE-ID datasets to demonstrate the effectiveness of the proposed model. Through the comparison, we show that our model is much more suitable for unsupervised RE-ID than classical unsupervised metric learning models. We also compare with existing unsupervised RE-ID methods, and our model outperforms them with notable margins. Specifically, we report results on a large-scale unlabelled RE-ID dataset, a setting that is important but has unfortunately received less attention in the literature.

Proceedings Article•DOI•

Quality Aware Network for Set to Set Recognition

Yu Liu¹, Junjie Yan¹, Wanli Ouyang²•Institutions (2)

SenseTime¹, University of Sydney²

01 Jul 2017

TL;DR: The quality aware network (QAN) automatically learns the quality of each sample even though such information is not explicitly provided in the training stage; the network has two branches, where the first branch extracts an appearance feature embedding and the other branch predicts a quality score for each sample.

Proceedings Article•DOI•

Cross-View Asymmetric Metric Learning for Unsupervised Person Re-Identification

Hong-Xing Yu¹, Ancong Wu¹, Wei-Shi Zheng¹•Institutions (1)

Sun Yat-sen University¹

01 Oct 2017

TL;DR: This paper proposes an unsupervised asymmetric metric learning model for cross-view pedestrian matching, which aims to learn a specific projection for each view, based on asymmetric clustering on cross-view person images, and finds a shared space where view-specific bias is alleviated.

Abstract: While metric learning is important for person re-identification (RE-ID), a significant problem in visual surveillance for cross-view pedestrian matching, existing metric models for RE-ID are mostly based on supervised learning that requires quantities of labeled samples in all pairs of camera views for training. However, this limits their scalability to realistic applications, in which a large amount of data over multiple disjoint camera views is available but not labelled. To overcome the problem, we propose unsupervised asymmetric metric learning for unsupervised RE-ID. Our model aims to learn an asymmetric metric, i.e., specific projection for each view, based on asymmetric clustering on cross-view person images. Our model finds a shared space where view-specific bias is alleviated and thus better matching performance can be achieved. Extensive experiments have been conducted on a baseline and five large-scale RE-ID datasets to demonstrate the effectiveness of the proposed model. Through the comparison, we show that our model is much more suitable for unsupervised RE-ID than classical unsupervised metric learning models. We also compare with existing unsupervised RE-ID methods, and our model outperforms them with notable margins. Specifically, we report results on a large-scale unlabelled RE-ID dataset, a setting that is important but has unfortunately received less attention in the literature.
