Journal ArticleDOI

Ground Metric Learning on Graphs

TL;DR: This paper considers the GML problem when the learned metric is constrained to be a geodesic distance on a graph that supports the measures of interest, and seeks a graph ground metric such that the OT interpolation between the starting and ending densities that result from that ground metric agrees with the observed evolution.
Abstract: Optimal transport (OT) distances between probability distributions are parameterized by the ground metric they use between observations. Their relevance for real-life applications strongly hinges on whether that ground metric parameter is suitably chosen. Selecting it adaptively and algorithmically from prior knowledge, the so-called ground metric learning (GML) problem, has therefore appeared in various settings. We consider it in this paper when the learned metric is constrained to be a geodesic distance on a graph that supports the measures of interest. This imposes a rich structure for candidate metrics, but also enables far more efficient learning procedures when compared to a direct optimization over the space of all metric matrices. We use this setting to tackle an inverse problem stemming from the observation of a density evolving with time: we seek a graph ground metric such that the OT interpolation between the starting and ending densities that result from that ground metric agrees with the observed evolution. This OT dynamic framework is relevant to model natural phenomena exhibiting displacements of mass, such as the evolution of a color palette induced by changes in lighting and materials.
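As a concrete reading of this setup, here is a minimal sketch of the forward model (not the authors' code): given positive edge weights on a graph, the geodesic ground metric is an all-pairs shortest-path matrix, which in turn parameterizes an OT distance between densities on the nodes. It uses scipy and the POT package; the three-node graph and the densities are made-up examples, and GML would tune the weights W so that the induced interpolation matches the observed evolution.

```python
import numpy as np
import ot  # Python Optimal Transport (pip install pot)
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Weighted adjacency matrix of a toy 3-node graph (symmetric, positive weights).
W = csr_matrix(np.array([[0.0, 1.0, 4.0],
                         [1.0, 0.0, 1.5],
                         [4.0, 1.5, 0.0]]))

# Geodesic ground metric: all-pairs shortest-path distances under the edge weights.
M = shortest_path(W, method="D", directed=False)

# Starting and ending densities supported on the nodes.
mu0 = np.array([0.8, 0.1, 0.1])
mu1 = np.array([0.1, 0.1, 0.8])

# OT cost under this ground metric; GML would adjust W so that the induced
# OT interpolation between mu0 and mu1 agrees with an observed evolution.
cost = ot.emd2(mu0, mu1, M)
print(cost)
```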
Citations
Journal ArticleDOI
TL;DR: In this article, the authors focus on the top-k recommendation problem, where a solution is encoded as a matrix whose rows correspond to customers and columns to items, and the value of accuracy, novelty, and coverage for each candidate list is evaluated as a sample and represented as a 3-d histogram which encodes the knowledge obtained from function evaluations.
Abstract: Metrics such as diversity and novelty have become important, besides accuracy, in the design of Recommender Systems (RSs), in response to the increasing heterogeneity of users. The design of RSs is therefore increasingly modelled as a multi-objective optimization problem (MOP), for whose solution multi-objective evolutionary algorithms (MOEAs) are increasingly considered. In this paper we focus on the top-k recommendation problem, in which a solution is encoded as a matrix whose rows correspond to customers and columns to items. The values of accuracy, novelty, and coverage for each candidate list are evaluated as samples and can be represented as a 3-d histogram which encodes the knowledge obtained from function evaluations. This makes it possible to map the solution space into a space whose elements are histograms, structured by the Wasserstein (WST) distance between histograms. The similarity between two users in this probabilistic space is given by the Wasserstein distance between their histograms. This enables the construction of the WST graph, whose nodes are the users and whose edge weights are the WST distances between users. The clustering of users then takes place in the WST graph. In the optimization phase, the difference between two top-k lists can be encoded as the WST distance between their 3-dimensional histograms. This enables the derivation of new selection operators which provide better diversification (exploration). The new algorithm, multi-objective evolutionary optimization/Wasserstein (MOEA/WST), compared with the benchmark NSGA-II, yields better hypervolume and coverage, in particular at low generation counts.

2 citations
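A minimal sketch of the distance underlying MOEA/WST, as we read the abstract (not the authors' implementation): the Wasserstein distance between two 3-d histograms, computed with the POT package over a Euclidean ground cost on the bin centers; the same quantity would weight the edges of the WST graph. The bin count and random histograms below are made up.

```python
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
bins = 4  # bins per axis (accuracy, novelty, coverage)

# Bin centers of the 3-d grid, flattened to a list of points in R^3.
axes = np.linspace(0.0, 1.0, bins)
centers = np.array(np.meshgrid(axes, axes, axes)).reshape(3, -1).T

# Two normalized 3-d histograms, flattened to 1-d weight vectors.
h1 = rng.random(bins ** 3); h1 /= h1.sum()
h2 = rng.random(bins ** 3); h2 /= h2.sum()

# Ground cost between bins, then the Wasserstein distance between histograms.
M = ot.dist(centers, centers, metric="euclidean")
wst = ot.emd2(h1, h2, M)
print(wst)
```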

Journal ArticleDOI
TL;DR: In this article, a deep convolutional neural network is trained on Wasserstein barycenters of pairs of measures and generalizes well to the problem of finding barycenters of more than two measures.
Abstract: Optimal transport is a notoriously difficult problem to solve numerically, and current approaches often remain intractable for very large-scale applications such as those encountered in machine learning. Wasserstein barycenters, the problem of finding measures in-between given input measures in the optimal transport sense, are even more computationally demanding, as they require solving an optimization problem involving optimal transport distances. By training a deep convolutional neural network, we improve by a factor of 80 the computational speed of Wasserstein barycenters over the fastest state-of-the-art approach on the GPU, resulting in millisecond computation times on $$512\times 512$$ regular grids. We show that our network, trained on Wasserstein barycenters of pairs of measures, generalizes well to the problem of finding Wasserstein barycenters of more than two measures. We demonstrate the efficiency of our approach for computing barycenters of sketches and transferring colors between multiple images.

2 citations
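For context, here is a sketch of the kind of entropic-regularization baseline such a network is trained to approximate, using POT's Sinkhorn-based barycenter on a tiny 1-d grid; the grid size, regularization strength, and weights are made-up choices, not values from the paper.

```python
import numpy as np
import ot  # Python Optimal Transport

n = 64
x = np.linspace(0.0, 1.0, n)

# Two input measures: Gaussians at opposite ends of the grid.
g = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2)
a1 = g(0.2, 0.05); a1 /= a1.sum()
a2 = g(0.8, 0.05); a2 /= a2.sum()
A = np.vstack([a1, a2]).T  # one distribution per column

# Squared Euclidean ground cost on the grid, normalized for stability.
M = ot.dist(x.reshape(-1, 1), x.reshape(-1, 1))
M /= M.max()

# Equal-weight entropic Wasserstein barycenter: the quantity the trained CNN
# predicts in milliseconds on 512x512 grids.
bary = ot.bregman.barycenter(A, M, reg=1e-2, weights=np.array([0.5, 0.5]))
print(bary.shape)
```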

Journal ArticleDOI
TL;DR: In this article, the authors devise and study two column generation strategies: a natural one based on a simplified computation of reduced costs, and one obtained through a Dantzig-Wolfe decomposition.

1 citation

Proceedings Article
01 Jul 2022
TL;DR: In this paper, the ground cost is computed as a positive eigenvector of the function mapping a cost to the pairwise Wasserstein distances between the inputs, and a scalable computational method using entropic regularization is introduced.
Abstract: Optimal Transport (OT) defines geometrically meaningful Wasserstein distances, used in machine learning applications to compare probability distributions. However, a key bottleneck is the design of a cost which should be adapted to the task under study. In most cases, supervised metric learning is not accessible, and one usually resorts to some ad hoc approach. Unsupervised metric learning is thus a fundamental problem for enabling data-driven applications of Optimal Transport. In this paper, we propose for the first time a canonical answer by computing the ground cost as a positive eigenvector of the function mapping a cost to the pairwise OT distances between the inputs. This map is homogeneous and monotone, thus framing unsupervised metric learning as a non-linear Perron-Frobenius problem. We provide criteria to ensure the existence and uniqueness of this eigenvector. In addition, we introduce a scalable computational method using entropic regularization, which, in the large-regularization limit, performs a principal component analysis dimensionality reduction. We showcase this method on synthetic examples and datasets. Finally, we apply it in the context of biology to the analysis of a high-throughput single-cell RNA sequencing (scRNAseq) dataset, to improve cell clustering and infer the relationships between genes in an unsupervised way.
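A hedged sketch of this fixed-point idea, written in the alternating (singular-vector) form that handles a rectangular data matrix: each cost is recomputed as the matrix of pairwise OT distances under the other cost and renormalized, a nonlinear power iteration. This is our reading of the abstract using exact OT from the POT package, not the authors' entropic solver, and the random data matrix is a made-up example.

```python
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.random((6, 5))  # 6 samples x 5 features, entries > 0

# Samples as histograms over features; features as histograms over samples.
samples = X / X.sum(axis=1, keepdims=True)       # 6 histograms of length 5
features = (X / X.sum(axis=0, keepdims=True)).T  # 5 histograms of length 6

def pairwise_ot(hists, C):
    """Matrix of pairwise OT distances between histograms under ground cost C."""
    n = len(hists)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = ot.emd2(hists[i], hists[j], C)
    return D

# Initialize with a trivial cost between features and iterate the map.
B = 1.0 - np.eye(5)  # cost on features
for _ in range(20):
    A = pairwise_ot(samples, B)   # induced cost on samples
    A /= np.abs(A).max()          # renormalize (power-iteration step)
    B = pairwise_ot(features, A)  # induced cost on features
    B /= np.abs(B).max()
# B now approximates a positive "eigenvector" in the above sense: a ground
# cost on features consistent with the OT geometry it induces on samples.
print(B)
```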
Book ChapterDOI
28 Jun 2023
References
28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, which differentiates purely imperative programs with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations
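A minimal illustration of the imperative style the abstract describes, using only the public torch API: gradients of an ordinary Python computation, with no symbolic graph declared up front.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # plain imperative code; the tape is recorded as it runs
y.backward()         # reverse-mode automatic differentiation
print(x.grad)        # tensor([2., 4., 6.]) == dy/dx
```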

Journal ArticleDOI
TL;DR: This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.
Abstract: We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.

4,593 citations
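A sketch of the computation the abstract describes, restricted to the equal-mass case in which the EMD is a true metric: two color signatures (cluster centers with weights, as produced e.g. by vector quantization) compared by solving the transportation problem with the POT package. The signatures below are made up.

```python
import numpy as np
import ot  # Python Optimal Transport

# Signature = (weights, cluster centers in RGB space), variable-length.
w1 = np.array([0.5, 0.3, 0.2])
c1 = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=float)
w2 = np.array([0.6, 0.4])
c2 = np.array([[250, 10, 10], [10, 10, 240]], dtype=float)

# Ground distance between cluster centers, then the transportation problem:
# the minimal cost to transform one distribution into the other.
M = ot.dist(c1, c2, metric="euclidean")
emd = ot.emd2(w1, w2, M)
print(emd)
```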

Proceedings Article
05 Dec 2005
TL;DR: In this article, a Mahalanobis distance metric for kNN classification is trained with the goal that the k nearest neighbors always belong to the same class while examples from different classes are separated by a large margin.
Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification, for example achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.

4,433 citations
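A compact sketch of the large-margin objective, written in the common reparameterization M = L^T L that keeps the metric positive semidefinite without an SDP solver; the paper itself optimizes over M directly by semidefinite programming, and the toy data and target-neighbor pairs below are made up.

```python
import numpy as np

def lmnn_loss(L, X, y, targets, mu=0.5):
    """Pull target neighbors close; push differently-labeled impostors
    out of each target pair's unit margin via a hinge."""
    Z = X @ L.T
    pull, push = 0.0, 0.0
    for i, j in targets:                      # (point, target neighbor) pairs
        d_ij = np.sum((Z[i] - Z[j]) ** 2)
        pull += d_ij
        for k in range(len(X)):               # impostors with a different label
            if y[k] != y[i]:
                d_ik = np.sum((Z[i] - Z[k]) ** 2)
                push += max(0.0, 1.0 + d_ij - d_ik)  # hinge loss
    return (1.0 - mu) * pull + mu * push

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
targets = [(0, 1), (1, 0), (4, 5), (5, 4)]    # made-up target-neighbor pairs
print(lmnn_loss(np.eye(2), X, y, targets))    # loss of the identity metric
```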

Proceedings ArticleDOI
20 Jun 2005
TL;DR: The idea is to learn a function that maps input patterns into a target space such that the L1 norm in the target space approximates the "semantic" distance in the input space.
Abstract: We present a method for training a similarity metric from data. The method can be used for recognition or verification applications where the number of categories is very large and not known during training, and where the number of training samples for a single category is very small. The idea is to learn a function that maps input patterns into a target space such that the L1 norm in the target space approximates the "semantic" distance in the input space. The method is applied to a face verification task. The learning process minimizes a discriminative loss function that drives the similarity metric to be small for pairs of faces from the same person, and large for pairs from different persons. The mapping from the raw input to the target space is a convolutional network whose architecture is designed for robustness to geometric distortions. The system is tested on the Purdue/AR face database, which has a very high degree of variability in pose, lighting, expression, position, and artificial occlusions such as dark glasses and obscuring scarves.

3,870 citations


"Ground Metric Learning on Graphs" refers methods in this paper

  • ...Non-linear methods include the prior embedding of the data (kernel trick) before performing a linear method [47, 50], or other non-linear metric functions [13, 28]....

    [...]
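A hedged PyTorch sketch of the training signal described above: a shared ("siamese") network maps two inputs into a target space where the L1 distance stands in for semantic distance, pulled small for genuine pairs and pushed above a margin for impostor pairs. The tiny architecture, margin, and exact contrastive form below are illustrative assumptions; the paper uses a convolutional network and a related but not identical loss.

```python
import torch
import torch.nn as nn

# Shared embedding network (toy stand-in for the paper's convolutional net).
embed = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(),
                      nn.Linear(64, 16))

def contrastive_loss(x1, x2, same, margin=5.0):
    d = (embed(x1) - embed(x2)).abs().sum(dim=1)  # L1 distance in target space
    # Small distance for pairs of the same person; hinge above margin otherwise.
    return (same * d ** 2 +
            (1 - same) * torch.clamp(margin - d, min=0) ** 2).mean()

x1, x2 = torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32)
same = torch.tensor([1.0, 1.0, 0.0, 0.0])         # 1 = same identity
loss = contrastive_loss(x1, x2, same)
loss.backward()                                    # gradients flow to the shared net
```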

Proceedings Article
01 Jan 2002
TL;DR: This paper presents an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝn, learns a distance metric over ℝn that respects these relationships.
Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝn, learns a distance metric over ℝn that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.

3,176 citations


"Ground Metric Learning on Graphs" refers background in this paper

  • ...For instance, for classification purposes, the learned metric brings closer samples of the same class and drives away samples of different classes [53]....

    [...]
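A hedged sketch of the idea in its simplest (diagonal Mahalanobis) form: bring labeled similar pairs close while keeping dissimilar pairs spread out, with non-negativity enforced by projection. The paper solves the full convex program over positive semidefinite matrices; the penalized-gradient version and all constants below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
sim = [(0, 1), (2, 3)]       # made-up "similar" pairs
dis = [(0, 5), (1, 6)]       # made-up "dissimilar" pairs

w = np.ones(3)               # diagonal metric: d(x, y)^2 = sum_k w_k (x_k - y_k)^2
lr, lam = 0.05, 1.0
for _ in range(200):
    grad = np.zeros(3)
    for i, j in sim:         # pull similar pairs together
        grad += (X[i] - X[j]) ** 2
    for i, j in dis:         # reward spread between dissimilar pairs
        sq = (X[i] - X[j]) ** 2
        d = np.sqrt(sq @ w)
        grad -= lam * sq / (2 * d + 1e-12)
    w = np.maximum(w - lr * grad, 0.0)   # projection keeps the metric valid
print(w)
```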