Open Access Posted Content

Convergence Rates for Gaussian Mixtures of Experts

TLDR
Drawing on optimal transport theory, the authors establish a connection between the algebraic independence of the expert functions and a certain class of partial differential equations (PDEs), which they exploit to derive convergence rates and minimax lower bounds for parameter estimation.
Abstract
We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of \emph{algebraic independence} of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.
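
For orientation, the model in question can be written schematically (in generic notation, assuming the standard parameterization rather than the paper's exact setup) as a conditional density
$$ p(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k \, \phi\big(y \mid f_k(x; \eta_k), \sigma_k^2\big), $$
where $\phi(\cdot \mid \mu, \sigma^2)$ denotes the Gaussian density, the expert functions $f_k(\cdot; \eta_k)$ map covariates to means, and the gating weights $\pi_k \ge 0$ with $\sum_k \pi_k = 1$ do not depend on the covariate $x$ (covariate-free gating). Over-specification refers to fitting more experts $K$ than the true model contains, the regime in which the MLE convergence rates are analyzed.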


Citations
Posted Content

Minimax Confidence Intervals for the Sliced Wasserstein Distance

TL;DR: Confidence intervals for the Sliced Wasserstein distance are constructed that have finite-sample validity either under no assumptions or under mild moment assumptions, and whose length adapts to the regularity of the underlying distributions.
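
The Sliced Wasserstein distance underlying these confidence intervals averages one-dimensional Wasserstein distances over random projection directions. The sketch below is a minimal Monte Carlo approximation in Python (the sample sizes, p = 2, and number of projections are illustrative choices, not taken from the cited paper):

# Minimal Monte Carlo sketch of the Sliced Wasserstein distance between two
# empirical measures given by equally sized samples x and y.
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=None):
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)                   # random direction on the sphere
        px, py = np.sort(x @ theta), np.sort(y @ theta)  # 1-D projections of both samples
        total += np.mean(np.abs(px - py) ** p)           # 1-D Wasserstein_p^p via sorting
    return (total / n_projections) ** (1.0 / p)

# Example: two Gaussian samples in R^5 with shifted means
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 5))
b = rng.normal(loc=0.5, size=(500, 5))
print(sliced_wasserstein(a, b, seed=1))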
Posted Content

On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression.

TL;DR: By providing tight convergence guarantees for the EM algorithm in middle-to-low SNR regimes, this work fills the remaining gap in the literature and shows that in the low-SNR regime EM changes rate, matching the $n^{-1/4}$ rate of the MLE, a behavior that previous work had been unable to establish.
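
For context, the symmetric two-component mixed linear regression model generates $y_i = r_i \langle x_i, \beta^* \rangle + \varepsilon_i$ with labels $r_i \in \{\pm 1\}$ equally likely. Below is a minimal Python sketch of the sample-level EM iteration (balanced mixing and a known noise level are assumed for illustration; the initialization and iteration count are not taken from the cited paper):

# Minimal EM sketch for symmetric two-component mixed linear regression:
# y_i = r_i * <x_i, beta*> + noise, with r_i = +/-1 equally likely.
import numpy as np

def em_mixed_linear_regression(X, y, sigma, n_iters=50, seed=None):
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=X.shape[1])                  # random initialization
    for _ in range(n_iters):
        w = np.tanh(y * (X @ beta) / sigma**2)          # E-step: w_i = E[r_i | data, beta]
        beta = np.linalg.solve(X.T @ X, X.T @ (w * y))  # M-step: weighted least-squares solve
    return beta                                         # recovers beta* up to sign

# Example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
beta_star = np.array([1.0, -2.0, 0.5])
r = rng.choice([-1.0, 1.0], size=2000)
y = r * (X @ beta_star) + 0.5 * rng.normal(size=2000)
print(em_mixed_linear_regression(X, y, sigma=0.5))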
Posted Content

Estimating the Number of Components in Finite Mixture Models via the Group-Sort-Fuse Procedure

TL;DR: This work proposes the Group-Sort-Fuse (GSF) procedure, a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models, and shows that the GSF is consistent in estimating the true mixture order and achieves the convergence rate for parameter estimation up to polylogarithmic factors.
Posted Content

Learning in Gated Neural Networks

TL;DR: A careful analysis of the optimization landscape shows that, with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately; numerical experiments demonstrate significant performance gains over standard loss functions.
Posted Content

A non-asymptotic model selection in block-diagonal mixture of polynomial experts models

TL;DR: A block-diagonal localized mixture of polynomial experts (BLoMPE) regression model is investigated, constructed upon an inverse regression and block-diagonal structures of the Gaussian expert covariance matrices, and a penalized maximum likelihood selection criterion is introduced to estimate the unknown conditional density of the regression model.
References
Book

Topics in Optimal Transportation

TL;DR: This book treats optimal transportation from both metric and differential points of view and investigates the Kantorovich duality of the optimal transportation problem.
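
For reference, the Kantorovich duality mentioned here can be stated (in standard notation for a cost function $c$ and marginals $\mu, \nu$; this is the textbook form rather than a quotation from the book) as
$$ \inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y) \, d\pi(x, y) \;=\; \sup \Big\{ \int \varphi \, d\mu + \int \psi \, d\nu \;:\; \varphi(x) + \psi(y) \le c(x, y) \Big\}, $$
where $\Pi(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$, and the supremum runs over integrable potentials $\varphi, \psi$ satisfying the pointwise constraint.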
Journal ArticleDOI

Adaptive mixtures of local experts

TL;DR: A new supervised learning procedure is introduced for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the task considered is shown to be solvable by a very simple expert network.
Journal ArticleDOI

Hierarchical mixtures of experts and the EM algorithm

TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, together with an on-line learning algorithm in which the parameters are updated incrementally.
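
As a reminder of the architecture (written in generic notation from the hierarchical mixtures-of-experts literature, not quoted from this paper), a two-level tree combines leaf experts through nested softmax gating,
$$ p(y \mid x) \;=\; \sum_{i} g_i(x) \sum_{j} g_{j \mid i}(x) \, p_{ij}(y \mid x), $$
where $g_i$ and $g_{j \mid i}$ are softmax gating networks at the internal nodes and $p_{ij}$ are the expert densities at the leaves; the EM algorithm alternates between computing posterior responsibilities for the tree nodes (E-step) and refitting the gating and expert networks by weighted maximum likelihood (M-step).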