Book Chapter

Information geometry of neural network—an overview

01 Oct 1997, pp. 15-23
TL;DR: Information geometry gives an answer, providing the Riemannian metric and a dual pair of affine connections on the manifold of neural networks.
Abstract: The set of all the neural networks of a fixed architecture forms a geometrical manifold where the modifiable connection weights play the role of coordinates. It is important to study all such networks as a whole, rather than the behavior of each individual network, in order to understand the information-processing capability of neural networks. What is the natural geometry to be introduced in the manifold of neural networks? Information geometry gives an answer, providing the Riemannian metric and a dual pair of affine connections. An overview of the information geometry of neural networks is given.
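In Amari's framework, the Riemannian metric on this manifold is the Fisher information of the network's input-output distribution, and the corresponding natural (Riemannian) gradient preconditions learning by the inverse metric. Below is a minimal illustrative sketch, not the paper's construction: it assumes a single stochastic binary neuron with p(y=1 | x, w) = sigmoid(w·x) and synthetic data, estimates the Fisher metric G(w), and takes natural-gradient steps G(w)^{-1} times the ordinary gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_metric(w, X):
    """Fisher information matrix G(w) of a stochastic binary neuron
    p(y=1 | x, w) = sigmoid(w.x), averaged over the inputs in X.
    G(w) plays the role of the Riemannian metric on the weight manifold."""
    G = np.zeros((len(w), len(w)))
    for x in X:
        p = sigmoid(w @ x)
        # For this model, E_y[grad log p grad log p^T] = p(1-p) x x^T.
        G += p * (1.0 - p) * np.outer(x, x)
    return G / len(X)

def natural_gradient_step(w, X, y, lr=0.1, damping=1e-6):
    """One natural-gradient ascent step on the average log-likelihood:
    w <- w + lr * G(w)^{-1} grad.  Damping is a practical safeguard only."""
    p = sigmoid(X @ w)
    grad = X.T @ (y - p) / len(y)                    # ordinary (Euclidean) gradient
    G = fisher_metric(w, X) + damping * np.eye(len(w))
    return w + lr * np.linalg.solve(G, grad)

# Toy usage on synthetic two-dimensional data (made-up example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -0.7]) + 0.3 * rng.normal(size=200) > 0).astype(float)
w = np.zeros(2)
for _ in range(50):
    w = natural_gradient_step(w, X, y)
print("estimated weights:", w)
```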
Citations
Journal Article
TL;DR: A unified algorithm which can be used to extract both principal and minor component eigenvectors is proposed; if altered simply by a change of sign, it can also serve as a true minor component extractor.
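For a rough illustration of the principal/minor duality behind that sign change (this is plain power iteration, not the adaptive algorithm proposed in the citing paper; the matrices and seeds are made up), negating the covariance matrix and adding a spectral shift that keeps the iteration matrix positive semidefinite converts the principal-component problem into the minor-component one:

```python
import numpy as np

def extreme_component(C, sign=+1, iters=500):
    """Power iteration that returns the principal eigenvector of C when
    sign=+1 and the minor eigenvector when sign=-1.  Flipping the sign of C
    (plus a shift by trace(C) >= lambda_max to stay positive semidefinite)
    turns the principal-component problem into the minor-component one."""
    n = C.shape[0]
    shift = np.trace(C) if sign < 0 else 0.0
    M = sign * C + shift * np.eye(n)
    w = np.random.default_rng(0).normal(size=n)
    for _ in range(iters):
        w = M @ w
        w /= np.linalg.norm(w)
    return w

# Toy usage on a random sample covariance matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
C = A.T @ A / 50
w_pca = extreme_component(C, sign=+1)   # principal component
w_mca = extreme_component(C, sign=-1)   # minor component
print("Rayleigh quotients:", w_pca @ C @ w_pca, w_mca @ C @ w_mca)
```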

91 citations


  • Cites background or methods from "Information geometry of neural network—an overview"

  • ...Its relation to information geometry (Amari, 1985, 1997a) is also suggested by Nakamura, 1993; Fujiwara and Amari, 1995....


  • ...We show that our algorithms are the natural or Riemannian gradient flow (Amari, 1997b) on a Stiefel manifold....


Journal Article
TL;DR: This paper considers how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification and proposes the use of an expectation-conditional maximization (ECM) algorithm to train ME networks.
Abstract: The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using the iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonically increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to that of the IRLS algorithm on some simulated and real data sets.
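As a hedged sketch of the damped IRLS idea mentioned above (not the paper's mixture-of-experts training procedure; the data and parameter names are made up), the snippet below fits a single logistic output with a learning rate lr < 1 scaling each IRLS/Newton step:

```python
import numpy as np

def irls_logistic(X, y, lr=0.5, iters=50):
    """Damped iteratively reweighted least squares for logistic regression.
    The learning rate lr < 1 plays the role of the smaller-than-one step
    credited with stabilizing convergence of the inner M-step loop."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        weights = p * (1.0 - p)                         # IRLS working weights
        grad = X.T @ (y - p)                            # log-likelihood gradient
        H = X.T @ (X * weights[:, None]) + 1e-8 * np.eye(d)   # Fisher/Hessian approx.
        w = w + lr * np.linalg.solve(H, grad)           # damped Newton / IRLS step
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -2.0, 0.5]))))
y = (p_true > rng.random(300)).astype(float)
print(irls_logistic(X, y))
```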

61 citations


Cites background from "Information geometry of neural network—an overview"

  • ...With the case of , (6) reduces to the logistic transformation....


Journal Article
TL;DR: Theoretical results are established about the low complexity of optimal solutions to the optimization of frequently used measures, such as the mutual information, in an unconstrained and more theoretical setting.
Abstract: In the field of neural networks, so-called infomax principles like the principle of “maximum information preservation” by Linsker [20] are formulated to derive learning rules that improve the information processing properties of neural systems (see [12]). These principles, which are based on information-theoretic measures, are intended to describe the mechanism of learning in the brain. There, the starting point is a low-dimensional and biophysiologically motivated parametrization of the neural system, which need not necessarily be compatible with the given optimization principle. In contrast to this, we establish theoretical results about the low complexity of optimal solutions for the optimization problem of frequently used measures like the mutual information in an unconstrained and more theoretical setting. In the present paper, we do not comment on applications to modeling neural networks. This is intended to be done in a further step, where the results can be used for the characterization of “good” parameter sets that, on the one hand, are compatible with the underlying optimization and, on the other hand, are biologically motivated.
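For a concrete feel of what unconstrained optimization of the mutual information looks like (a generic textbook example, not the construction in this paper; the channel matrix is made up), the Blahut-Arimoto iteration below maximizes I(X;Y) over the simplex of input distributions of a small discrete channel. A classical result in this direction is that a capacity-achieving input distribution needs at most as many support points as there are output symbols.

```python
import numpy as np

def blahut_arimoto(P, iters=500):
    """Blahut-Arimoto iteration for the capacity max_r I(X;Y) of a discrete
    channel P[x, y] = p(y | x): unconstrained maximization of the mutual
    information over the simplex of input distributions r(x)."""
    n_x, _ = P.shape
    r = np.full(n_x, 1.0 / n_x)
    for _ in range(iters):
        q = r[:, None] * P                        # unnormalized posterior r(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)         # q(x | y)
        r = np.exp((P * np.log(q + 1e-300)).sum(axis=1))
        r /= r.sum()
    marg_y = r @ P                                # output marginal p(y)
    I = np.sum(r[:, None] * P * np.log((P + 1e-300) / (marg_y + 1e-300)))
    return r, I

# Toy usage: binary symmetric channel with crossover probability 0.1.
P = np.array([[0.9, 0.1], [0.1, 0.9]])
r_opt, capacity_nats = blahut_arimoto(P)
print(r_opt, capacity_nats / np.log(2))           # ~[0.5, 0.5], ~0.531 bits
```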

49 citations

Journal Article
08 Jan 2020
TL;DR: In this article, the authors explore the information geometry associated to a variety of simple systems and derive some general lessons that may have important implications for the application of information geometry in holography.
Abstract: Motivated by the increasing connections between information theory and high-energy physics, particularly in the context of the AdS/CFT correspondence, we explore the information geometry associated to a variety of simple systems. By studying their Fisher metrics, we derive some general lessons that may have important implications for the application of information geometry in holography. We begin by demonstrating that the symmetries of the physical theory under study play a strong role in the resulting geometry, and that the appearance of an AdS metric is a relatively general feature. We then investigate what information the Fisher metric retains about the physics of the underlying theory by studying the geometry for both the classical 2d Ising model and the corresponding 1d free fermion theory, and find that the curvature diverges precisely at the phase transition on both sides. We discuss the differences that result from placing a metric on the space of theories vs. states, using the example of coherent free fermion states. We compare the latter to the metric on the space of coherent free boson states and show that in both cases the metric is determined by the symmetries of the corresponding density matrix. We also clarify some misconceptions in the literature pertaining to different notions of flatness associated to metric and non-metric connections, with implications for how one interprets the curvature of the geometry. Our results indicate that in general, caution is needed when connecting the AdS geometry arising from certain models with the AdS/CFT correspondence, and seek to provide a useful collection of guidelines for future progress in this exciting area.
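To make the appearance of an AdS-like metric tangible with a standard textbook example (not the Ising or free-fermion computation of the paper; sample sizes and parameter values are arbitrary), the sketch below Monte Carlo estimates the Fisher metric of the univariate Gaussian family N(mu, sigma) and compares it with the exact result diag(1/sigma^2, 2/sigma^2), a hyperbolic metric of constant negative curvature (Euclidean AdS_2):

```python
import numpy as np

def gaussian_fisher_metric(mu, sigma, n_samples=200000, rng=None):
    """Monte Carlo estimate of the Fisher metric g_ij = E[d_i log p d_j log p]
    for the univariate Gaussian family p(x | mu, sigma).  The exact answer is
    diag(1/sigma^2, 2/sigma^2), i.e. a hyperbolic (Euclidean AdS_2) metric."""
    rng = rng or np.random.default_rng(0)
    x = rng.normal(mu, sigma, size=n_samples)
    d_mu = (x - mu) / sigma**2                       # d/dmu log p
    d_sigma = ((x - mu)**2 - sigma**2) / sigma**3    # d/dsigma log p
    scores = np.stack([d_mu, d_sigma])
    return scores @ scores.T / n_samples

print(gaussian_fisher_metric(0.0, 2.0))   # approx [[0.25, 0], [0, 0.5]]
print(np.diag([1 / 4, 2 / 4]))            # exact metric at sigma = 2
```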

34 citations

Proceedings Article
20 Mar 2016
TL;DR: The proposed method uses geodesic distances in the statistical manifold of probability distributions parametrized by their covariance matrix to estimate the directions of arrival of several sources.
Abstract: In this paper, a new direction of arrival (DOA) estimation approach is devised using concepts from information geometry (IG). The proposed method uses geodesic distances in the statistical manifold of probability distributions parametrized by their covariance matrix to estimate the directions of arrival of several sources. In order to obtain a practical method, the DOA estimation is treated as a single-variable optimization problem, for which the DOA solutions are found by means of a line search. The relation between the proposed method and the MVDR beamformer is elucidated. An evaluation of its performance is carried out by means of Monte Carlo simulations, and it is shown that the proposed method provides improved resolution capabilities at low SNR with respect to MUSIC and MVDR.
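A minimal sketch of the kind of geodesic distance involved, under the assumption of zero-mean Gaussian models: the affine-invariant distance between symmetric positive definite covariance matrices, which agrees up to a constant factor with the Fisher-Rao distance on that family. The matrices below are toy examples, and this is not the paper's DOA pipeline (no array manifold or line search is included).

```python
import numpy as np
from scipy.linalg import eigvalsh

def geodesic_distance(A, B):
    """Affine-invariant geodesic distance between SPD covariance matrices,
    d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F, computed from the generalized
    eigenvalues of (B, A).  Up to a constant factor this is the Fisher-Rao
    distance between zero-mean Gaussians with these covariances."""
    lam = eigvalsh(B, A)              # generalized eigenvalues: B v = lam A v
    return np.sqrt(np.sum(np.log(lam) ** 2))

# Toy usage with two made-up covariance matrices.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.2], [-0.2, 3.0]])
print(geodesic_distance(A, B))
print(geodesic_distance(A, A))        # 0 by construction
```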

17 citations


Cites background from "Information geometry of neural network—an overview"

  • ...Recent work [5] has raised attention towards the usage of information geometry to describe the manifold in which probability distributions live and links with several fields have been established (e.g., neural networks [6], [7], optimization [8], [9])....


References
Book Chapter
01 Jan 1988
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations

Book
03 Jan 1986
TL;DR: The problem, the generalized delta rule, simulation results, and some further generalizations are discussed.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

13,579 citations

Journal Article
TL;DR: A general parallel search method is described, based on statistical mechanics, and it is shown how it leads to a general learning rule for modifying the connection strengths so as to incorporate knowledge about a task domain in an efficient way.
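This reference reads like the classic Boltzmann machine paper; as a hedged illustration of such a learning rule (not necessarily the paper's exact formulation; all sizes, seeds, and data are made up), the sketch below implements the standard update "data correlations minus model correlations", with the model term estimated by Gibbs sampling in a small, fully visible network of +/-1 units.

```python
import numpy as np

def gibbs_sweeps(W, b, rng, n_sweeps=20, s=None):
    """Gibbs sampling for a fully visible Boltzmann machine with +/-1 units and
    energy E(s) = -0.5 s^T W s - b^T s (W symmetric with zero diagonal)."""
    n = len(b)
    s = s if s is not None else rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_sweeps):
        for i in range(n):
            p = 1.0 / (1.0 + np.exp(-2.0 * (W[i] @ s + b[i])))   # P(s_i = +1 | rest)
            s[i] = 1.0 if rng.random() < p else -1.0
    return s

def boltzmann_step(W, b, data, rng, lr=0.05, n_chains=100):
    """One step of the classic Boltzmann learning rule:
    dW_ij ~ <s_i s_j>_data - <s_i s_j>_model,  db_i ~ <s_i>_data - <s_i>_model."""
    pos_corr, pos_mean = data.T @ data / len(data), data.mean(axis=0)
    samples = np.array([gibbs_sweeps(W, b, rng) for _ in range(n_chains)])
    neg_corr, neg_mean = samples.T @ samples / n_chains, samples.mean(axis=0)
    dW = lr * (pos_corr - neg_corr)
    np.fill_diagonal(dW, 0.0)
    return W + dW, b + lr * (pos_mean - neg_mean)

# Toy usage: 4 visible units, data drawn around a fixed +/-1 pattern.
rng = np.random.default_rng(3)
data = np.sign(rng.normal(loc=np.array([1, 1, -1, -1]), size=(200, 4)))
W, b = np.zeros((4, 4)), np.zeros(4)
for _ in range(10):
    W, b = boltzmann_step(W, b, data, rng)
print(W.round(2))
```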

3,727 citations

Journal Article
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of a tree-structured architecture for supervised learning, together with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
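A hedged sketch of the EM idea for a simplified mixture of experts: K linear-Gaussian experts with input-independent mixing weights, so the M-step reduces to closed-form weighted least squares. The hierarchical, GLIM-gated architecture of the paper additionally needs an inner Newton/IRLS loop for the gating networks, which is omitted here; the data and names are invented.

```python
import numpy as np

def em_mixture_of_regressions(X, y, K=2, iters=100, rng=None):
    """EM for K linear-Gaussian experts with constant mixing weights pi.
    E-step: expert responsibilities; M-step: weighted least squares per expert."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(size=(K, d))                  # expert regression weights
    sigma2 = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior responsibility of each expert for each point.
        logp = np.stack([
            np.log(pi[k]) - 0.5 * np.log(2 * np.pi * sigma2[k])
            - 0.5 * (y - X @ W[k]) ** 2 / sigma2[k] for k in range(K)])
        r = np.exp(logp - logp.max(axis=0))
        r /= r.sum(axis=0)
        # M-step: weighted least squares per expert, then mixing weights.
        for k in range(K):
            Xw = X * r[k][:, None]
            W[k] = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(d), Xw.T @ y)
            sigma2[k] = max((r[k] @ (y - X @ W[k]) ** 2) / (r[k].sum() + 1e-12), 1e-6)
        pi = r.sum(axis=1) / n
    return W, sigma2, pi

# Toy usage: two regimes with different linear relations.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(400), rng.uniform(-2, 2, 400)])
y = np.where(X[:, 1] > 0, 2.0 * X[:, 1], -1.0 * X[:, 1]) + 0.1 * rng.normal(size=400)
W, sigma2, pi = em_mixture_of_regressions(X, y)
print(W, pi)
```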

2,418 citations

Journal Article
TL;DR: In this article, the minimum discrimination information problem is viewed as projecting a probability distribution (PD) onto a convex set of PD's, and useful existence theorems for and characterizations of the minimizing PD are arrived at.
Abstract: Some geometric properties of PD's are established, Kullback's $I$-divergence playing the role of squared Euclidean distance. The minimum discrimination information problem is viewed as that of projecting a PD onto a convex set of PD's and useful existence theorems for and characterizations of the minimizing PD are arrived at. A natural generalization of known iterative algorithms converging to the minimizing PD in special situations is given; even for those special cases, our convergence proof is more generally valid than those previously published. As corollaries of independent interest, generalizations of known results on the existence of PD's or nonnegative matrices of a certain form are obtained. The Lagrange multiplier technique is not used.
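As a small worked example of I-projection onto a convex set of distributions (here, the set with a prescribed mean, not the paper's general setting; the die example is made up), the sketch below uses the standard fact that the minimizer of the I-divergence is an exponential tilt of the reference distribution and solves for the tilt parameter by one-dimensional root finding.

```python
import numpy as np
from scipy.optimize import brentq

def i_projection(p, f, c, theta_bounds=(-50.0, 50.0)):
    """I-projection (minimum discrimination information) of a finite
    distribution p onto the convex set {q : E_q[f] = c}.  The minimizer is an
    exponential tilt q_theta(x) ~ p(x) exp(theta f(x)); a 1-D root find picks
    theta so that the moment constraint holds."""
    def tilted(theta):
        logits = theta * f
        q = p * np.exp(logits - logits.max())   # shift avoids overflow
        return q / q.sum()
    def moment_gap(theta):
        return tilted(theta) @ f - c
    theta = brentq(moment_gap, *theta_bounds)
    return tilted(theta)

# Toy usage: tilt a fair six-sided die so that its mean becomes 4.5.
p = np.full(6, 1 / 6)
f = np.arange(1, 7, dtype=float)
q = i_projection(p, f, 4.5)
print(q, q @ f)   # projected distribution and its mean (approx 4.5)
```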

1,604 citations