Open Access Posted Content

Convergence Rates for Gaussian Mixtures of Experts

TLDR
Drawing on optimal transport theory, the authors establish a connection between the algebraic independence of the expert functions and a certain class of partial differential equations (PDEs), which they exploit to derive convergence rates and minimax lower bounds for parameter estimation.
Abstract
We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of \emph{algebraic independence} of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.
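
For orientation, the model in question can be written schematically (in generic notation, assuming the standard parameterization rather than the paper's exact setup) as a conditional density
$$ p(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k \, \phi\big(y \mid f_k(x; \eta_k), \sigma_k^2\big), $$
where $\phi(\cdot \mid \mu, \sigma^2)$ denotes the Gaussian density, the expert functions $f_k(\cdot; \eta_k)$ map covariates to means, and the gating weights $\pi_k \ge 0$ with $\sum_k \pi_k = 1$ do not depend on the covariate $x$ (covariate-free gating). Over-specification refers to fitting more experts $K$ than the true model contains, the regime in which the MLE convergence rates are analyzed.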


Citations
Posted Content

Minimax Confidence Intervals for the Sliced Wasserstein Distance

TL;DR: Confidence intervals for the Sliced Wasserstein distance are constructed that have finite-sample validity either under no assumptions or under mild moment assumptions, and whose length adapts to the regularity of the underlying distributions.
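
The Sliced Wasserstein distance underlying these confidence intervals averages one-dimensional Wasserstein distances over random projection directions. The sketch below is a minimal Monte Carlo approximation in Python (the sample sizes, p = 2, and number of projections are illustrative choices, not taken from the cited paper):

# Minimal Monte Carlo sketch of the Sliced Wasserstein distance between two
# empirical measures given by equally sized samples x and y.
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=None):
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)                   # random direction on the sphere
        px, py = np.sort(x @ theta), np.sort(y @ theta)  # 1-D projections of both samples
        total += np.mean(np.abs(px - py) ** p)           # 1-D Wasserstein_p^p via sorting
    return (total / n_projections) ** (1.0 / p)

# Example: two Gaussian samples in R^5 with shifted means
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 5))
b = rng.normal(loc=0.5, size=(500, 5))
print(sliced_wasserstein(a, b, seed=1))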
Posted Content

On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression.

TL;DR: By providing tight convergence guarantees for the EM algorithm in middle-to-low SNR regimes, this work fills the remaining gap in the literature and shows that in the low-SNR regime EM changes rate, matching the $n^{-1/4}$ rate of the MLE, a behavior that previous work had been unable to establish.
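
For context, the symmetric two-component mixed linear regression model generates $y_i = r_i \langle x_i, \beta^* \rangle + \varepsilon_i$ with labels $r_i \in \{\pm 1\}$ equally likely. Below is a minimal Python sketch of the sample-level EM iteration (balanced mixing and a known noise level are assumed for illustration; the initialization and iteration count are not taken from the cited paper):

# Minimal EM sketch for symmetric two-component mixed linear regression:
# y_i = r_i * <x_i, beta*> + noise, with r_i = +/-1 equally likely.
import numpy as np

def em_mixed_linear_regression(X, y, sigma, n_iters=50, seed=None):
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=X.shape[1])                  # random initialization
    for _ in range(n_iters):
        w = np.tanh(y * (X @ beta) / sigma**2)          # E-step: w_i = E[r_i | data, beta]
        beta = np.linalg.solve(X.T @ X, X.T @ (w * y))  # M-step: weighted least-squares solve
    return beta                                         # recovers beta* up to sign

# Example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
beta_star = np.array([1.0, -2.0, 0.5])
r = rng.choice([-1.0, 1.0], size=2000)
y = r * (X @ beta_star) + 0.5 * rng.normal(size=2000)
print(em_mixed_linear_regression(X, y, sigma=0.5))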
Posted Content

Estimating the Number of Components in Finite Mixture Models via the Group-Sort-Fuse Procedure

TL;DR: This work proposes the Group-Sort-Fuse (GSF) procedure, a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models, and shows that the GSF is consistent in estimating the true mixture order and achieves the convergence rate for parameter estimation up to polylogarithmic factors.
Posted Content

Learning in Gated Neural Networks

TL;DR: A careful analysis of the optimization landscape shows that, with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately; numerical experiments demonstrate significant performance gains over standard loss functions.
Posted Content

A non-asymptotic model selection in block-diagonal mixture of polynomial experts models

TL;DR: A block-diagonal localized mixture of polynomial experts (BLoMPE) regression model is investigated, constructed upon an inverse regression and block-diagonal structures of the Gaussian expert covariance matrices, and a penalized maximum likelihood selection criterion is introduced to estimate the unknown conditional density of the regression model.
References
Book

Topics in Optimal Transportation

TL;DR: This book treats optimal transportation from both metric and differential points of view and investigates the Kantorovich duality of the optimal transportation problem.
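
For reference, the Kantorovich duality mentioned here can be stated (in standard notation for a cost function $c$ and marginals $\mu, \nu$; this is the textbook form rather than a quotation from the book) as
$$ \inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y) \, d\pi(x, y) \;=\; \sup \Big\{ \int \varphi \, d\mu + \int \psi \, d\nu \;:\; \varphi(x) + \psi(y) \le c(x, y) \Big\}, $$
where $\Pi(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$, and the supremum runs over integrable potentials $\varphi, \psi$ satisfying the pointwise constraint.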
Journal ArticleDOI

Adaptive mixtures of local experts

TL;DR: A new supervised learning procedure is introduced for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the task considered is shown to be solvable by a very simple expert network.
Journal ArticleDOI

Hierarchical mixtures of experts and the EM algorithm

TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, together with an on-line learning algorithm in which the parameters are updated incrementally.
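
As a reminder of the architecture (written in generic notation from the hierarchical mixtures-of-experts literature, not quoted from this paper), a two-level tree combines leaf experts through nested softmax gating,
$$ p(y \mid x) \;=\; \sum_{i} g_i(x) \sum_{j} g_{j \mid i}(x) \, p_{ij}(y \mid x), $$
where $g_i$ and $g_{j \mid i}$ are softmax gating networks at the internal nodes and $p_{ij}$ are the expert densities at the leaves; the EM algorithm alternates between computing posterior responsibilities for the tree nodes (E-step) and refitting the gating and expert networks by weighted maximum likelihood (M-step).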