Open Access · Posted Content
Convergence Rates for Gaussian Mixtures of Experts
TL;DR: Drawing on optimal transport theory, a connection is established between the algebraic independence of the expert functions and a certain class of partial differential equations (PDEs) to derive convergence rates and minimax lower bounds for parameter estimation.
Abstract: We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of "algebraic independence" of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.
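To make the model concrete (an illustrative sketch, not code from the paper): a Gaussian mixture of experts with covariate-free gating assigns each component a fixed mixing weight, while the component means are expert functions of the covariate. All names here (`gates`, `experts`, `sigmas`, `h_k`) are hypothetical.

```python
import numpy as np

def gmoe_density(y, x, gates, experts, sigmas):
    """Conditional density f(y | x) of a Gaussian mixture of experts.

    gates   : K mixing weights, covariate-free (do NOT depend on x)
    experts : K functions h_k; h_k(x) is the k-th expert's conditional mean
    sigmas  : K expert noise scales
    """
    components = [
        pi_k / (np.sqrt(2.0 * np.pi) * s_k)
        * np.exp(-0.5 * ((y - h_k(x)) / s_k) ** 2)
        for pi_k, h_k, s_k in zip(gates, experts, sigmas)
    ]
    return sum(components)
```

"Over-specified" then means fitting this density with more components K than the true model needs, which is what slows the MLE's parameter-estimation rate.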
Citations
Posted Content
Minimax Confidence Intervals for the Sliced Wasserstein Distance
TL;DR: Confidence intervals for the Sliced Wasserstein distance are constructed which have finite-sample validity under no assumptions or under mild moment assumptions, and are adaptive in length to the regularity of the underlying distributions.
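For context (a generic Monte Carlo sketch, not the cited paper's estimator), the sliced Wasserstein distance between two equal-size samples averages one-dimensional Wasserstein distances over random projection directions; in one dimension the optimal coupling simply sorts both samples.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, p=2, rng=None):
    """Monte Carlo estimate of the sliced p-Wasserstein distance
    between two samples X, Y of equal size (rows are points)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # uniform direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(px - py) ** p)  # 1D W_p^p via sorted coupling
    return (total / n_proj) ** (1.0 / p)
```

The averaging over projections is what makes plug-in estimates of this distance amenable to the finite-sample analysis the TL;DR refers to.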
Posted Content
On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression
TL;DR: By providing tight convergence guarantees for the EM algorithm in middle-to-low SNR regimes, this work fills the remaining gap in the literature, and reveals that in low SNR, EM changes rate, matching the $n^{-1/4}$ rate of the MLE, a behavior that previous work had been unable to show.
Posted Content
Estimating the Number of Components in Finite Mixture Models via the Group-Sort-Fuse Procedure
Tudor Manole, Abbas Khalili, +1 more
TL;DR: This work proposes the Group-Sort-Fuse (GSF) procedure, a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models, and shows that the GSF is consistent in estimating the true mixture order and achieves the convergence rate for parameter estimation up to polylogarithmic factors.
Posted Content
Learning in Gated Neural Networks
TL;DR: A careful analysis of the optimization landscape is performed and it is shown that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately and demonstrate significant performance gains over standard loss functions in numerical experiments.
Posted Content
A non-asymptotic model selection in block-diagonal mixture of polynomial experts models
TL;DR: A block-diagonal localized mixture of polynomial experts (BLoMPE) regression model is investigated, which is constructed upon an inverse regression and block-diagonal structures of the Gaussian expert covariance matrices, and a penalized maximum likelihood selection criterion is introduced to estimate the unknown conditional density of the regression model.
References
Book
Topics in Optimal Transportation
TL;DR: In this paper, the metric side of optimal transportation is considered from a differential point of view, and the Kantorovich duality of the optimal transportation problem is investigated.
Journal ArticleDOI
Adaptive mixtures of local experts
TL;DR: A new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the procedure is demonstrated on a task that can be solved by a very simple expert network.
Journal ArticleDOI
Hierarchical mixtures of experts and the EM algorithm
TL;DR: An expectation-maximization (EM) algorithm for adjusting the parameters of the tree-structured architecture for supervised learning is presented, and an on-line learning algorithm in which the parameters are updated incrementally is developed.