Journal ArticleDOI

Information geometry of the EM and em algorithms for neural networks

Shun-ichi Amari
- 16 Dec 1995
- Neural Networks, Vol. 8, Iss. 9, pp. 1379-1408
TLDR
A unified information-geometrical framework is presented for studying stochastic models of neural networks, focusing on the EM and em algorithms, and a condition that guarantees their equivalence is proved.
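The paper's central observation is that the E-step and M-step of EM can be read as alternating e- and m-projections between a data submanifold and a model submanifold. As a concrete illustration, here is a minimal sketch of that alternation for a two-component 1-D Gaussian mixture; the component count and initialization are assumptions made for this example, not taken from the paper.

```python
# A minimal sketch of the EM alternation the paper reinterprets geometrically:
# the E-step (an e-projection) computes posterior responsibilities, the
# M-step (an m-projection) re-maximizes the expected complete-data likelihood.
import numpy as np

def em_gaussian_mixture(x, n_iter=50):
    # crude initialization (illustrative assumption)
    mu = np.array([x.min(), x.max()], dtype=float)
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(component k | x_i)
        dens = (pi / (sigma * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizers of the expected log-likelihood
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma

x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 300)])
print(em_gaussian_mixture(x))
```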
About
This article was published in Neural Networks on 16 Dec 1995 and has received 339 citations to date. The article focuses on the topics: Stochastic neural network & Mixture model.


Citations
Journal ArticleDOI

Hierarchical mixtures of experts and the EM algorithm

TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, together with an on-line learning algorithm in which the parameters are updated incrementally.
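As a rough illustration of the E-step in a one-level (non-hierarchical) mixture of experts, here is a minimal sketch assuming linear experts with Gaussian noise; the names `V`, `W`, and `e_step` are illustrative, not from the paper, and the full hierarchical model nests these responsibilities across levels of the tree.

```python
# A minimal sketch (assumed linear-Gaussian experts) of the mixture-of-experts
# E-step: the gating network's softmax prior is reweighted by each expert's
# likelihood to give posterior responsibilities.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def e_step(X, y, V, W, sigma=1.0):
    """X: (n, d) inputs, y: (n,) targets, V: (d, k) gating weights,
    W: (d, k) per-expert linear weights -- all names are illustrative."""
    prior = softmax(X @ V)                      # gating probabilities g_k(x)
    resid = y[:, None] - X @ W                  # residual under each expert
    lik = np.exp(-0.5 * (resid / sigma) ** 2)   # Gaussian expert likelihoods
    post = prior * lik
    return post / post.sum(axis=1, keepdims=True)  # responsibilities h_k
```

The M-step then refits each expert by responsibility-weighted least squares and refits the gating network to match the responsibilities.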
Proceedings ArticleDOI

Clustering with Bregman Divergences

TL;DR: This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences, and shows that there is a bijection between regular exponential families and a large class of Bregman divergences, called regular Bregman divergences.
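A practical consequence of the analysis is that the centroid update in Lloyd-style clustering remains the arithmetic mean for every Bregman divergence, so only the assignment step changes. A minimal sketch, using the generalized KL divergence on strictly positive data as an assumed example:

```python
# A minimal sketch of Bregman hard clustering (Lloyd iteration): assignment
# uses the chosen Bregman divergence, while the update step is always the
# arithmetic mean. The generalized KL divergence is an illustrative choice.
import numpy as np

def kl_divergence(x, c):
    # d_phi(x, c) for phi(x) = sum x log x; requires x, c strictly positive
    return np.sum(x * np.log(x / c) - x + c, axis=-1)

def bregman_kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # assignment: nearest center under the Bregman divergence
        d = kl_divergence(X[:, None, :], centers[None, :, :])
        labels = d.argmin(axis=1)
        # update: arithmetic mean (empty clusters not handled in this sketch)
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
```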
Journal ArticleDOI

Exponentiated gradient versus gradient descent for linear predictors

TL;DR: The bounds suggest that the losses of the algorithms are in general incomparable, but that EG(+/-) has a much smaller loss if only a few components of the input are relevant for the predictions; experiments indicate the bounds are quite tight already on simple artificial data.
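A minimal sketch of the two update rules on a linear predictor with squared loss; the learning rate is an arbitrary assumption rather than the tuned value from the paper's bounds. Plain EG over the probability simplex is shown, whereas EG(+/-) maintains paired positive and negative weight vectors to lift the positivity restriction.

```python
# Gradient descent is additive in the gradient; exponentiated gradient is
# multiplicative, followed by renormalization onto the simplex.
import numpy as np

def gd_step(w, x, y, eta=0.01):
    grad = 2 * (w @ x - y) * x
    return w - eta * grad             # additive update

def eg_step(w, x, y, eta=0.01):
    grad = 2 * (w @ x - y) * x
    w = w * np.exp(-eta * grad)       # multiplicative update (w must be positive)
    return w / w.sum()                # renormalize onto the probability simplex
```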
Journal ArticleDOI

Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

TL;DR: A generative mixture-model approach to clustering directional data is proposed, based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere; two variants of the Expectation-Maximization framework are derived and analyzed for estimating the mean and concentration parameters of this mixture.
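When the concentration parameters are held equal, the hard-assignment variant of this EM scheme reduces to spherical k-means. A minimal sketch under that simplifying assumption (estimating the concentration kappa, the delicate part analyzed in the paper, is omitted):

```python
# A minimal sketch of spherical k-means: points and mean directions live on
# the unit hypersphere, and cosine similarity plays the role of likelihood.
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # project onto the sphere
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]       # initial mean directions
    for _ in range(n_iter):
        labels = (X @ mu.T).argmax(axis=1)             # assign by cosine similarity
        for j in range(k):
            m = X[labels == j].sum(axis=0)             # (empty clusters not handled)
            mu[j] = m / np.linalg.norm(m)              # renormalized mean direction
    return labels, mu
```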
References
Journal ArticleDOI

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

TL;DR: An analogy between images and statistical-mechanics systems is drawn, and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, leading to a highly parallel "relaxation" algorithm for MAP estimation.
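A minimal sketch of one Gibbs sweep for a binary Ising-style prior with +/-1 spins, the kind of conditional resampling that stochastic relaxation iterates; the annealing schedule that lowers the temperature toward zero to reach the MAP estimate is omitted here.

```python
# One Gibbs sweep: visit each pixel and resample it from its conditional
# distribution given the current 4-neighbourhood (free boundary conditions).
import numpy as np

def gibbs_sweep(img, beta=0.8, rng=np.random.default_rng(0)):
    """img: 2-D array with entries in {-1, +1}; beta: inverse temperature."""
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            s = sum(img[a, b]
                    for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                    if 0 <= a < h and 0 <= b < w)
            # conditional P(pixel = +1 | neighbours) for the Ising prior
            p = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            img[i, j] = 1 if rng.random() < p else -1
    return img
```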
Journal ArticleDOI

Linear Statistical Inference and its Applications

TL;DR: The theory of least squares and the analysis of variance have long been studied in the literature; this work reviews some of the most relevant contributions, with a main focus on the analysis of variance.
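For reference, the least-squares estimator at the core of this material is beta = (X'X)^{-1} X'y; a minimal sketch, computed with a numerically safer solver rather than the explicit inverse:

```python
# Ordinary least squares via numpy's least-squares solver.
import numpy as np

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Illustrative usage: intercept column plus two covariates.
X = np.column_stack([np.ones(100), np.random.randn(100, 2)])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * np.random.randn(100)
print(ols(X, y))
```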
Journal ArticleDOI

Adaptive mixtures of local experts

TL;DR: A new supervised learning procedure is presented for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases, so that each subtask can be solved by a very simple expert network.
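A minimal sketch of the competitive objective behind this procedure, assuming linear experts and a linear gating network for brevity (names are illustrative): minimizing the negative log of the gated mixture likelihood encourages each training case to be handled by a single simple expert.

```python
# A minimal sketch (assumed linear experts/gating, not the paper's exact
# setup) of the gated mixture negative log-likelihood that drives
# competition among experts.
import numpy as np

def mixture_nll(X, y, V, W, sigma=1.0):
    """X: (n, d) inputs, y: (n,) targets, V: (d, k) gating weights,
    W: (d, k) expert weights -- all names are illustrative."""
    z = X @ V
    z -= z.max(axis=1, keepdims=True)           # stabilized gating softmax
    g = np.exp(z)
    g /= g.sum(axis=1, keepdims=True)
    resid = y[:, None] - X @ W                  # residual under each expert
    lik = np.exp(-0.5 * (resid / sigma) ** 2)   # Gaussian expert likelihoods
    return -np.log((g * lik).sum(axis=1)).mean()  # per-case mixture NLL
```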