Top 9 papers published by Marco Cuturi from Google in 2021

Posted Content•

JKOnet: Proximal Optimal Transport Modeling of Population Dynamics.

[...]

Charlotte Bunne¹, Laetitia Meng-Papaxanthos, Andreas Krause, Marco Cuturi•Institutions (1)

11 Jun 2021-arXiv: Learning

TL;DR: JKONET, a neural architecture that combines an energy model on measures, with (small) optimal displacements solved with input convex neural networks (ICNN), is proposed and demonstrated the applicability of the model to explain and predict population dynamics.

...read moreread less

Abstract: Consider a heterogeneous population of points evolving with time. While the population evolves, both in size and nature, we can observe it periodically, through snapshots taken at different timestamps. Each of these snapshots is formed by sampling points from the population at that time, and then creating features to recover point clouds. While these snapshots describe the population's evolution on aggregate, they do not provide directly insights on individual trajectories. This scenario is encountered in several applications, notably single-cell genomics experiments, tracking of particles, or when studying crowd motion. In this paper, we propose to model that dynamic as resulting from the celebrated Jordan-Kinderlehrer-Otto (JKO) proximal scheme. The JKO scheme posits that the configuration taken by a population at time $t$ is one that trades off a decrease w.r.t. an energy (the model we seek to learn) penalized by an optimal transport distance w.r.t. the previous configuration. To that end, we propose JKOnet, a neural architecture that combines an energy model on measures, with (small) optimal displacements solved with input convex neural networks (ICNN). We demonstrate the applicability of our model to explain and predict population dynamics.

...read moreread less

14 citations

Journal Article•DOI•

Ground Metric Learning on Graphs

[...]

Matthieu Heitz¹, Nicolas Bonneel¹, David Coeurjolly¹, Marco Cuturi², Gabriel Peyré³ - Show less +1 more•Institutions (3)

University of Lyon¹, Google², Centre national de la recherche scientifique³

01 Jan 2021-Journal of Mathematical Imaging and Vision

TL;DR: In this article, the authors consider the GML problem when the learned metric is constrained to be a geodesic distance on a graph that supports the measures of interest, and they use this setting to tackle an inverse problem stemming from the observation of a density evolving with time; they seek a graph ground metric such that the OT interpolation between the starting and ending densities that result from that ground metric agrees with the observed evolution.

...read moreread less

Abstract: Optimal transport (OT) distances between probability distributions are parameterized by the ground metric they use between observations. Their relevance for real-life applications strongly hinges on whether that ground metric parameter is suitably chosen. The challenge of selecting it adaptively and algorithmically from prior knowledge, the so-called ground metric learning (GML) problem, has therefore appeared in various settings. In this paper, we consider the GML problem when the learned metric is constrained to be a geodesic distance on a graph that supports the measures of interest. This imposes a rich structure for candidate metrics, but also enables far more efficient learning procedures when compared to a direct optimization over the space of all metric matrices. We use this setting to tackle an inverse problem stemming from the observation of a density evolving with time; we seek a graph ground metric such that the OT interpolation between the starting and ending densities that result from that ground metric agrees with the observed evolution. This OT dynamic framework is relevant to model natural phenomena exhibiting displacements of mass, such as the evolution of the color palette induced by the modification of lighting and materials.

...read moreread less

13 citations

Proceedings Article•

On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification

[...]

Tianyi Lin¹, Zeyu Zheng¹, Elynn Y. Chen¹, Marco Cuturi², Michael I. Jordan¹ - Show less +1 more•Institutions (2)

University of California, Berkeley¹, Google²

18 Mar 2021

TL;DR: In this paper, the authors adopt the viewpoint of projection robust (PR) OT, which seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.

...read moreread less

Abstract: Optimal transport (OT) distances are increasingly used as loss functions for statistical inference, notably in the learning of generative models or supervised learning. Yet, the behavior of minimum Wasserstein estimators is poorly understood, notably in high-dimensional regimes or under model misspecification. In this work we adopt the viewpoint of projection robust (PR) OT, which seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected. Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances, complementing and improving previous literature that has been restricted to one-dimensional and well-specified cases. Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces. Our complexity bounds can help explain why both PRW and IPRW distances outperform Wasserstein distances empirically in high-dimensional inference tasks. Finally, we consider parametric inference using the PRW distance. We provide an asymptotic guarantee of two types of minimum PRW estimators and formulate a central limit theorem for max-sliced Wasserstein estimator under model misspecification. To enable our analysis on PRW with projection dimension larger than one, we devise a novel combination of variational analysis and statistical theory.

...read moreread less

12 citations

Posted Content•

Efficient and Modular Implicit Differentiation.

[...]

Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert - Show less +4 more

31 May 2021-arXiv: Learning

TL;DR: In this paper, the authors propose a unified, efficient and modular approach for implicit differentiation of optimization problems, where the user defines (in Python in the case of our implementation) a function $F$ capturing the optimality conditions of the problem to be differentiated.

...read moreread less

Abstract: Automatic differentiation (autodiff) has revolutionized machine learning. It allows expressing complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization as a layer, and in bi-level problems such as hyper-parameter optimization and meta-learning. However, the formulas for these derivatives often involve case-by-case tedious mathematical derivations. In this paper, we propose a unified, efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines (in Python in the case of our implementation) a function $F$ capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of $F$ and implicit differentiation to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient as it can be added on top of any state-of-the-art solver and modular as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow to recover many recently proposed implicit differentiation methods and create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics.

...read moreread less

8 citations

Proceedings Article•

Equitable and Optimal Transport with Multiple Agents

[...]

Meyer Scetbon, Laurent Meunier¹, Jamal Atif, Marco Cuturi•Institutions (1)

Facebook¹

18 Mar 2021

TL;DR: In this article, an extension of the optimal transport problem with multiple costs is introduced, where the goal is to maximize the utility of the least advantaged agent by minimizing the transportation cost of the agent who works the most.

...read moreread less

Abstract: We introduce an extension of the Optimal Transport problem when multiple costs are involved. Considering each cost as an agent, we aim to share equally between agents the work of transporting one distribution to another. To do so, we minimize the transportation cost of the agent who works the most. Another point of view is when the goal is to partition equitably goods between agents according to their heterogeneous preferences. Here we aim to maximize the utility of the least advantaged agent. This is a fair division problem. Like Optimal Transport, the problem can be cast as a linear optimization problem. When there is only one agent, we recover the Optimal Transport problem. When two agents are considered, we are able to recover Integral Probability Metrics defined by $\alpha$-H\"older functions, which include the widely-known Dudley metric. To the best of our knowledge, this is the first time a link is given between the Dudley metric and Optimal Transport. We provide an entropic regularization of that problem which leads to an alternative algorithm faster than the standard linear program.

...read moreread less

5 citations

Proceedings Article•

Low-Rank Sinkhorn Factorization

[...]

Meyer Scetbon¹, Marco Cuturi², Gabriel Peyré³•Institutions (3)

ENSAE ParisTech¹, Google², Centre national de la recherche scientifique³

18 Jul 2021

TL;DR: In this article, a generic approach is proposed to solve the optimal transport problem under low-rank constraints with arbitrary costs, which can be solved through the machinery of regularized 2-Wasserstein barycenters.

...read moreread less

Abstract: Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to \textit{approximate} kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank constraints on the feasible set of couplings considered in OT problems, with no approximations on cost nor kernel matrices. This route was first explored by Forrow et al., 2018, who proposed an algorithm tailored for the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce in this work a generic approach that aims at solving, in full generality, the OT problem under low-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low rank couplings as a product of \textit{sub-coupling} factors linked by a common marginal; similar to an NMF approach, we alternatively updates these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.

...read moreread less

3 citations

Posted Content•

Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs.

[...]

Meyer Scetbon¹, Gabriel Peyré², Marco Cuturi³•Institutions (3)

ENSAE ParisTech¹, École Normale Supérieure², Google³

02 Jun 2021-arXiv: Learning

TL;DR: In this paper, a variant of the sinkhorn algorithm is used to speed up the resolution of the Gromov-Wasserstein (GW) problem, which restricts the set of admissible couplings to those admitting a low rank factorization as the product of two subcouplings.

...read moreread less

Abstract: The ability to compare and align related datasets living in heterogeneous spaces plays an increasingly important role in machine learning. The Gromov-Wasserstein (GW) formalism can help tackle this problem. Its main goal is to seek an assignment (more generally a coupling matrix) that can register points across otherwise incomparable datasets. As a non-convex and quadratic generalization of optimal transport (OT), GW is NP-hard. Yet, heuristics are known to work reasonably well in practice, the state of the art approach being to solve a sequence of nested regularized OT problems. While popular, that heuristic remains too costly to scale, with cubic complexity in the number of samples $n$. We show in this paper how a recent variant of the Sinkhorn algorithm can substantially speed up the resolution of GW. That variant restricts the set of admissible couplings to those admitting a low rank factorization as the product of two sub-couplings. By updating alternatively each sub-coupling, our algorithm computes a stationary point of the problem in quadratic time with respect to the number of samples. When cost matrices have themselves low rank, our algorithm has time complexity $\mathcal{O}(n)$. We demonstrate the efficiency of our method on simulated and real data.

...read moreread less

2 citations

Posted Content•

Low-Rank Sinkhorn Factorization

[...]

Meyer Scetbon¹, Marco Cuturi², Gabriel Peyré³•Institutions (3)

ENSAE ParisTech¹, Google², École Normale Supérieure³

08 Mar 2021-arXiv: Machine Learning

TL;DR: In this article, a generic approach is proposed to solve the optimal transport problem under low-rank constraints with arbitrary costs, which can be solved through the machinery of regularized 2-Wasserstein barycenters.

...read moreread less

Abstract: Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to \textit{approximate} kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank constraints on the feasible set of couplings considered in OT problems, with no approximations on cost nor kernel matrices. This route was first explored by Forrow et al., 2018, who proposed an algorithm tailored for the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce in this work a generic approach that aims at solving, in full generality, the OT problem under low-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low rank couplings as a product of \textit{sub-coupling} factors linked by a common marginal; similar to an NMF approach, we alternatively updates these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.

...read moreread less

Posted Content•

Randomized Stochastic Gradient Descent Ascent.

[...]

Othmane Sebbouh, Marco Cuturi, Gabriel Peyré

25 Nov 2021-arXiv: Learning

TL;DR: Randomized SGDA (RSGDA) as discussed by the authors is a variant of ESGDA with stochastic loop size with a simpler theoretical analysis, which has almost sure convergence rates when used on nonconvex min/strongly-concave max settings.

...read moreread less

Abstract: An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum. Carrying a loop of stochastic gradient ascent (SGA) steps on the (inner) maximization problem, followed by an SGD step on the (outer) minimization, is known as Epoch Stochastic Gradient \textit{Descent Ascent} (ESGDA). While successful in practice, the theoretical analysis of ESGDA remains challenging, with no clear guidance on choices for the inner loop size nor on the interplay between inner/outer step sizes. We propose RSGDA (Randomized SGDA), a variant of ESGDA with stochastic loop size with a simpler theoretical analysis. RSGDA comes with the first (among SGDA algorithms) almost sure convergence rates when used on nonconvex min/strongly-concave max settings. RSGDA can be parameterized using optimal loop sizes that guarantee the best convergence rates known to hold for SGDA. We test RSGDA on toy and larger scale problems, using distributionally robust optimization and single-cell data matching using optimal transport as a testbed.

...read moreread less

Showing papers by "Marco Cuturi published in 2021"