Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Soheil Kolouri, Gustavo K. Rohde, Heiko Hoffmann
- pp 3427-3436
TLDR
This work proposes an alternative formulation for estimating the GMM parameters using the sliced Wasserstein distance, which gives rise to a new algorithm that can estimate high-dimensional data distributions more faithfully than the EM algorithm.
Abstract:
Gaussian mixture models (GMM) are powerful parametric tools with many applications in machine learning and computer vision. Expectation maximization (EM) is the most popular algorithm for estimating the GMM parameters. However, EM guarantees only convergence to a stationary point of the log-likelihood function, which could be arbitrarily worse than the optimal solution. Inspired by the relationship between the negative log-likelihood function and the Kullback-Leibler (KL) divergence, we propose an alternative formulation for estimating the GMM parameters using the sliced Wasserstein distance, which gives rise to a new algorithm. Specifically, we propose minimizing the sliced-Wasserstein distance between the mixture model and the data distribution with respect to the GMM parameters. In contrast to the KL-divergence, the energy landscape for the sliced-Wasserstein distance is more well-behaved and therefore more suitable for a stochastic gradient descent scheme to obtain the optimal GMM parameters. We show that our formulation results in parameter estimates that are more robust to random initializations and demonstrate that it can estimate high-dimensional data distributions more faithfully than the EM algorithm.
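The sliced Wasserstein distance underlying the abstract reduces the high-dimensional optimal-transport problem to many one-dimensional ones: project both distributions onto random directions, where the 1-D Wasserstein distance has a closed form via sorting, and average over directions. The sketch below is a minimal Monte Carlo estimator for two empirical samples of equal size; it is illustrative only (the function name and parameters are not from the paper, and the paper's algorithm additionally differentiates this quantity with respect to GMM parameters).

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=50, rng=None):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance between
    two empirical samples X, Y of shape (n_samples, dim), equal n_samples."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Draw random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction: shape (n_samples, n_projections).
    X_proj = X @ theta.T
    Y_proj = Y @ theta.T
    # In 1-D, the Wasserstein-1 distance between equal-size empirical
    # measures is the mean absolute difference of the sorted projections.
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    return np.mean(np.abs(X_sorted - Y_sorted))
```

Because each term is a sort plus an elementwise difference, the estimator is cheap and (with fixed projections) subdifferentiable, which is what makes it amenable to the stochastic gradient descent scheme the abstract describes.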
Citations
Proceedings ArticleDOI
Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation
TL;DR: The proposed sliced Wasserstein discrepancy (SWD) is designed to capture the natural notion of dissimilarity between the outputs of task-specific classifiers and enables efficient distribution alignment in an end-to-end trainable fashion.
Posted Content
Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning
TL;DR: This tutorial argues that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
Posted Content
Max-Sliced Wasserstein Distance and its use for GANs
Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander G. Schwing
TL;DR: This work demonstrates that the recently proposed sliced Wasserstein distance trains GANs on high-dimensional images up to a resolution of 256x256 easily and develops the max-sliced Wasserstein distance, which enjoys compelling sample complexity while reducing projection complexity, albeit necessitating a max estimation.
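The max-sliced variant in this TLDR replaces the average over random projections with the single most discriminative direction. A crude way to illustrate the idea is to approximate the max by sampling many candidate directions and keeping the worst case; the actual paper optimizes the direction rather than sampling it, so the sketch below (with its made-up function name and candidate-sampling scheme) is only a hedged approximation.

```python
import numpy as np

def max_sliced_wasserstein(X, Y, n_candidates=500, rng=None):
    """Rough estimate of the max-sliced 1-Wasserstein distance between
    equal-size samples X, Y: take the maximum 1-D Wasserstein distance
    over a pool of random candidate directions instead of the mean."""
    rng = np.random.default_rng(rng)
    theta = rng.normal(size=(n_candidates, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Sorted projections give the 1-D Wasserstein-1 distance per direction.
    X_sorted = np.sort(X @ theta.T, axis=0)
    Y_sorted = np.sort(Y @ theta.T, axis=0)
    per_direction = np.mean(np.abs(X_sorted - Y_sorted), axis=0)
    # Keep only the worst-case (most discriminative) direction.
    return per_direction.max()
```

By construction this value upper-bounds the averaged sliced estimate on the same candidate directions, which is why a single well-chosen projection can suffice, reducing projection complexity as the TLDR notes.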
Proceedings ArticleDOI
Generative Multiplane Images: Making a 2D GAN 3D-Aware
TL;DR: This work modifies a classical GAN, i.e.
Posted Content
Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation
TL;DR: In this article, a sliced Wasserstein discrepancy (SWD) is proposed to capture the natural notion of dissimilarity between the outputs of task-specific classifiers, which provides a geometrically meaningful guidance to detect target samples that are far from the support of the source.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Arthur P. Dempster, Nan M. Laird, Donald B. Rubin
Journal ArticleDOI
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Book
Machine Learning: A Probabilistic Perspective
Kevin P. Murphy
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Proceedings ArticleDOI
Deep Learning Face Attributes in the Wild
Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang
TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.