Open Access · Posted Content

Stochastic Optimization for Large-scale Optimal Transport

TLDR
A new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications, based on entropic regularization of the primal OT problem, which results in a smooth dual optimization problem that can be addressed with algorithms that have provably faster convergence.
Abstract
Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring the ability to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.
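Ideas (a) and (b), in the semi-discrete setup (ii), can be sketched as averaged stochastic gradient ascent on the semi-dual of entropy-regularized OT: draw one sample from the (possibly continuous) source measure per step and update a dual potential supported on the discrete target. The cost function, toy data, step-size schedule, and variable names below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# discrete target measure nu = sum_j b_j * delta_{y_j} (toy 1-D support)
y = np.linspace(0.0, 1.0, 5)
b = np.full(5, 0.2)

eps = 0.1            # entropic regularization strength
v = np.zeros(5)      # dual potential on the target support
v_avg = np.zeros(5)  # Polyak-Ruppert average of the iterates

for t in range(1, 5001):
    x = rng.uniform()                 # one sample from the source measure
    c = (x - y) ** 2                  # squared-distance cost to each target point
    w = b * np.exp((v - c) / eps)     # soft assignment of x to target points
    grad = b - w / w.sum()            # stochastic gradient of the semi-dual
    v += grad / np.sqrt(t)            # SGD step with decaying step size
    v_avg += (v - v_avg) / t          # running average (the returned estimate)
```

The averaged iterate `v_avg` estimates the optimal dual potential; as `eps` shrinks, the soft assignment `w` concentrates on the cost-minimizing target point, recovering unregularized OT in the limit.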


Citations
Posted Content

Estimating individual treatment effect: generalization bounds and algorithms

TL;DR: A novel, simple and intuitive generalization-error bound is given showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization error of that representation and the distance between the treated and control distributions induced by the representation.
Journal ArticleDOI

A geometric view of optimal transportation and generative model

TL;DR: This work shows the intrinsic relations between optimal transportation and convex geometry, especially the variational approach to solving the Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes. This leads to a geometric interpretation of generative models, and to a novel framework for generative models.
Posted Content

Large-Scale Optimal Transport and Mapping Estimation

TL;DR: This paper proposes a stochastic dual approach of regularized OT, and shows empirically that it scales better than a recent related approach when the amount of samples is very large, and estimates a Monge map as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan.
Posted Content

Towards Optimal Running Times for Optimal Transport

TL;DR: This work provides faster algorithms for approximating the optimal transport distance between two discrete probability distributions, e.g. earth mover's distance, and provides reductions from optimal transport to canonical optimization problems for which recent algorithmic efforts have provided nearly-linear time algorithms.
Posted Content

GAN and VAE from an Optimal Transport Point of View

TL;DR: This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup and sheds some light on the connections between Variational Autoencoders, Generative Adversarial Networks and Minimum Kantorovitch Estimators.
References
Proceedings Article

Sinkhorn Distances: Lightspeed Computation of Optimal Transport

TL;DR: This work smooths the classic optimal transport problem with an entropic regularization term, and shows that the resulting optimum is also a distance which can be computed through Sinkhorn's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transport solvers.
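The matrix scaling scheme described in this TL;DR is simple enough to sketch directly: alternately rescale the rows and columns of a Gibbs kernel until both marginals are matched. The toy histograms, cost, and regularization value below are illustrative assumptions:

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.5, n_iters=500):
    """Entropy-regularized OT between histograms a and b with cost matrix C,
    solved by Sinkhorn's matrix scaling iterations."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    P = u[:, None] * K * v[None, :]      # regularized transport plan
    return P, float(np.sum(P * C))       # plan and its transport cost

# toy example: two 3-point histograms on the real line
x = np.array([0.0, 1.0, 2.0])
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost matrix
P, cost = sinkhorn(a, b, C)
```

Each iteration is two matrix-vector products, which is the source of the speed advantage over linear-programming transport solvers; smaller `reg` values approach the unregularized distance but require more iterations.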
Journal ArticleDOI

A Multiscale Approach to Optimal Transport

TL;DR: An improvement of an algorithm of Aurenhammer, Hoffmann and Aronov is proposed to find a least-squares matching between a probability density and a finite set of sites with mass constraints in the Euclidean plane.