Book Chapter

Information geometry of neural network—an overview

01 Oct 1997, pp. 15-23
TL;DR: Information geometry gives an answer, providing the Riemannian metric and a dual pair of affine connections on the manifold of neural networks.
Abstract: The set of all the neural networks of a fixed architecture forms a geometrical manifold where the modifiable connection weights play the role of coordinates. It is important to study all such networks as a whole, rather than the behavior of each individual network, in order to understand the information-processing capability of neural networks. What is the natural geometry to be introduced in the manifold of neural networks? Information geometry gives an answer, providing the Riemannian metric and a dual pair of affine connections. An overview of the information geometry of neural networks is given.
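In Amari's framework, the Riemannian metric on this manifold is the Fisher information of the network's input-output distribution, and the corresponding natural (Riemannian) gradient preconditions learning by the inverse metric. Below is a minimal illustrative sketch, not the paper's construction: it assumes a single stochastic binary neuron with p(y=1 | x, w) = sigmoid(w·x) and synthetic data, estimates the Fisher metric G(w), and takes natural-gradient steps G(w)^{-1} times the ordinary gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_metric(w, X):
    """Fisher information matrix G(w) of a stochastic binary neuron
    p(y=1 | x, w) = sigmoid(w.x), averaged over the inputs in X.
    G(w) plays the role of the Riemannian metric on the weight manifold."""
    G = np.zeros((len(w), len(w)))
    for x in X:
        p = sigmoid(w @ x)
        # For this model, E_y[grad log p grad log p^T] = p(1-p) x x^T.
        G += p * (1.0 - p) * np.outer(x, x)
    return G / len(X)

def natural_gradient_step(w, X, y, lr=0.1, damping=1e-6):
    """One natural-gradient ascent step on the average log-likelihood:
    w <- w + lr * G(w)^{-1} grad.  Damping is a practical safeguard only."""
    p = sigmoid(X @ w)
    grad = X.T @ (y - p) / len(y)                    # ordinary (Euclidean) gradient
    G = fisher_metric(w, X) + damping * np.eye(len(w))
    return w + lr * np.linalg.solve(G, grad)

# Toy usage on synthetic two-dimensional data (made-up example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -0.7]) + 0.3 * rng.normal(size=200) > 0).astype(float)
w = np.zeros(2)
for _ in range(50):
    w = natural_gradient_step(w, X, y)
print("estimated weights:", w)
```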
Citations
Journal Article
TL;DR: A unified algorithm which can be used to extract both principal and minor component eigenvectors is proposed; if altered simply by a change of sign, it can also serve as a true minor component extractor.
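For a rough illustration of the principal/minor duality behind that sign change (this is plain power iteration, not the adaptive algorithm proposed in the citing paper; the matrices and seeds are made up), negating the covariance matrix and adding a spectral shift that keeps the iteration matrix positive semidefinite converts the principal-component problem into the minor-component one:

```python
import numpy as np

def extreme_component(C, sign=+1, iters=500):
    """Power iteration that returns the principal eigenvector of C when
    sign=+1 and the minor eigenvector when sign=-1.  Flipping the sign of C
    (plus a shift by trace(C) >= lambda_max to stay positive semidefinite)
    turns the principal-component problem into the minor-component one."""
    n = C.shape[0]
    shift = np.trace(C) if sign < 0 else 0.0
    M = sign * C + shift * np.eye(n)
    w = np.random.default_rng(0).normal(size=n)
    for _ in range(iters):
        w = M @ w
        w /= np.linalg.norm(w)
    return w

# Toy usage on a random sample covariance matrix.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
C = A.T @ A / 50
w_pca = extreme_component(C, sign=+1)   # principal component
w_mca = extreme_component(C, sign=-1)   # minor component
print("Rayleigh quotients:", w_pca @ C @ w_pca, w_mca @ C @ w_mca)
```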

91 citations


  • Cites background or methods from "Information geometry of neural network—an overview"

  • ...Its relation to information geometry (Amari, 1985, 1997a) is also suggested by Nakamura, 1993; Fujiwara and Amari, 1995....


  • ...We show that our algorithms are the natural or Riemannian gradient flow (Amari, 1997b) on a Stiefel manifold....


Journal Article
TL;DR: This paper considers how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification and proposes the use of an expectation-conditional maximization (ECM) algorithm to train ME networks.
Abstract: The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using the iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonically increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to that of the IRLS algorithm on some simulated and real data sets.
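As a hedged sketch of the damped IRLS idea mentioned above (not the paper's mixture-of-experts training procedure; the data and parameter names are made up), the snippet below fits a single logistic output with a learning rate lr < 1 scaling each IRLS/Newton step:

```python
import numpy as np

def irls_logistic(X, y, lr=0.5, iters=50):
    """Damped iteratively reweighted least squares for logistic regression.
    The learning rate lr < 1 plays the role of the smaller-than-one step
    credited with stabilizing convergence of the inner M-step loop."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        weights = p * (1.0 - p)                         # IRLS working weights
        grad = X.T @ (y - p)                            # log-likelihood gradient
        H = X.T @ (X * weights[:, None]) + 1e-8 * np.eye(d)   # Fisher/Hessian approx.
        w = w + lr * np.linalg.solve(H, grad)           # damped Newton / IRLS step
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -2.0, 0.5]))))
y = (p_true > rng.random(300)).astype(float)
print(irls_logistic(X, y))
```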

61 citations


Cites background from "Information geometry of neural network—an overview"

  • ...With the case of , (6) reduces to the logistic transformation....


Journal Article
TL;DR: Theoretical results are established about the low complexity of optimal solutions to the optimization of frequently used measures, such as the mutual information, in an unconstrained and more theoretical setting.
Abstract: In the field of neural networks, so-called infomax principles like the principle of “maximum information preservation” by Linsker [20] are formulated to derive learning rules that improve the information processing properties of neural systems (see [12]). These principles, which are based on information-theoretic measures, are intended to describe the mechanism of learning in the brain. There, the starting point is a low-dimensional and biophysiologically motivated parametrization of the neural system, which need not necessarily be compatible with the given optimization principle. In contrast to this, we establish theoretical results about the low complexity of optimal solutions for the optimization problem of frequently used measures like the mutual information in an unconstrained and more theoretical setting. In the present paper, we do not comment on applications to modeling neural networks. This is intended to be done in a further step, where the results can be used for the characterization of “good” parameter sets that, on the one hand, are compatible with the underlying optimization and, on the other hand, are biologically motivated.
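For a concrete feel of what unconstrained optimization of the mutual information looks like (a generic textbook example, not the construction in this paper; the channel matrix is made up), the Blahut-Arimoto iteration below maximizes I(X;Y) over the simplex of input distributions of a small discrete channel. A classical result in this direction is that a capacity-achieving input distribution needs at most as many support points as there are output symbols.

```python
import numpy as np

def blahut_arimoto(P, iters=500):
    """Blahut-Arimoto iteration for the capacity max_r I(X;Y) of a discrete
    channel P[x, y] = p(y | x): unconstrained maximization of the mutual
    information over the simplex of input distributions r(x)."""
    n_x, _ = P.shape
    r = np.full(n_x, 1.0 / n_x)
    for _ in range(iters):
        q = r[:, None] * P                        # unnormalized posterior r(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)         # q(x | y)
        r = np.exp((P * np.log(q + 1e-300)).sum(axis=1))
        r /= r.sum()
    marg_y = r @ P                                # output marginal p(y)
    I = np.sum(r[:, None] * P * np.log((P + 1e-300) / (marg_y + 1e-300)))
    return r, I

# Toy usage: binary symmetric channel with crossover probability 0.1.
P = np.array([[0.9, 0.1], [0.1, 0.9]])
r_opt, capacity_nats = blahut_arimoto(P)
print(r_opt, capacity_nats / np.log(2))           # ~[0.5, 0.5], ~0.531 bits
```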

49 citations

Journal Article
08 Jan 2020
TL;DR: In this article, the authors explore the information geometry associated to a variety of simple systems and derive some general lessons that may have important implications for the application of information geometry in holography.
Abstract: Motivated by the increasing connections between information theory and high-energy physics, particularly in the context of the AdS/CFT correspondence, we explore the information geometry associated to a variety of simple systems. By studying their Fisher metrics, we derive some general lessons that may have important implications for the application of information geometry in holography. We begin by demonstrating that the symmetries of the physical theory under study play a strong role in the resulting geometry, and that the appearance of an AdS metric is a relatively general feature. We then investigate what information the Fisher metric retains about the physics of the underlying theory by studying the geometry for both the classical 2d Ising model and the corresponding 1d free fermion theory, and find that the curvature diverges precisely at the phase transition on both sides. We discuss the differences that result from placing a metric on the space of theories vs. states, using the example of coherent free fermion states. We compare the latter to the metric on the space of coherent free boson states and show that in both cases the metric is determined by the symmetries of the corresponding density matrix. We also clarify some misconceptions in the literature pertaining to different notions of flatness associated to metric and non-metric connections, with implications for how one interprets the curvature of the geometry. Our results indicate that in general, caution is needed when connecting the AdS geometry arising from certain models with the AdS/CFT correspondence, and seek to provide a useful collection of guidelines for future progress in this exciting area.
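To make the appearance of an AdS-like metric tangible with a standard textbook example (not the Ising or free-fermion computation of the paper; sample sizes and parameter values are arbitrary), the sketch below Monte Carlo estimates the Fisher metric of the univariate Gaussian family N(mu, sigma) and compares it with the exact result diag(1/sigma^2, 2/sigma^2), a hyperbolic metric of constant negative curvature (Euclidean AdS_2):

```python
import numpy as np

def gaussian_fisher_metric(mu, sigma, n_samples=200000, rng=None):
    """Monte Carlo estimate of the Fisher metric g_ij = E[d_i log p d_j log p]
    for the univariate Gaussian family p(x | mu, sigma).  The exact answer is
    diag(1/sigma^2, 2/sigma^2), i.e. a hyperbolic (Euclidean AdS_2) metric."""
    rng = rng or np.random.default_rng(0)
    x = rng.normal(mu, sigma, size=n_samples)
    d_mu = (x - mu) / sigma**2                       # d/dmu log p
    d_sigma = ((x - mu)**2 - sigma**2) / sigma**3    # d/dsigma log p
    scores = np.stack([d_mu, d_sigma])
    return scores @ scores.T / n_samples

print(gaussian_fisher_metric(0.0, 2.0))   # approx [[0.25, 0], [0, 0.5]]
print(np.diag([1 / 4, 2 / 4]))            # exact metric at sigma = 2
```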

34 citations

Proceedings Article
20 Mar 2016
TL;DR: The proposed method uses geodesic distances in the statistical manifold of probability distributions parametrized by their covariance matrix to estimate the directions of arrival of several sources.
Abstract: In this paper, a new direction of arrival (DOA) estimation approach is devised using concepts from information geometry (IG). The proposed method uses geodesic distances in the statistical manifold of probability distributions parametrized by their covariance matrix to estimate the directions of arrival of several sources. In order to obtain a practical method, the DOA estimation is treated as a single-variable optimization problem, for which the DOA solutions are found by means of a line search. The relation between the proposed method and the MVDR beamformer is elucidated. An evaluation of its performance is carried out by means of Monte Carlo simulations, and it is shown that the proposed method provides improved resolution capabilities at low SNR with respect to MUSIC and MVDR.
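A minimal sketch of the kind of geodesic distance involved, under the assumption of zero-mean Gaussian models: the affine-invariant distance between symmetric positive definite covariance matrices, which agrees up to a constant factor with the Fisher-Rao distance on that family. The matrices below are toy examples, and this is not the paper's DOA pipeline (no array manifold or line search is included).

```python
import numpy as np
from scipy.linalg import eigvalsh

def geodesic_distance(A, B):
    """Affine-invariant geodesic distance between SPD covariance matrices,
    d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F, computed from the generalized
    eigenvalues of (B, A).  Up to a constant factor this is the Fisher-Rao
    distance between zero-mean Gaussians with these covariances."""
    lam = eigvalsh(B, A)              # generalized eigenvalues: B v = lam A v
    return np.sqrt(np.sum(np.log(lam) ** 2))

# Toy usage with two made-up covariance matrices.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.2], [-0.2, 3.0]])
print(geodesic_distance(A, B))
print(geodesic_distance(A, A))        # 0 by construction
```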

17 citations


Cites background from "Information geometry of neural network—an overview"

  • ...Recent work [5] has raised attention towards the usage of information geometry to describe the manifold in which probability distributions live and links with several fields have been established (e.g., neural networks [6], [7], optimization [8], [9])....


References
Book Chapter
01 Jan 1988
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations

Book
03 Jan 1986
TL;DR: The problem, the generalized delta rule, simulation results, and some further generalizations are discussed.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

13,579 citations

Journal Article
TL;DR: A general parallel search method is described, based on statistical mechanics, and it is shown how it leads to a general learning rule for modifying the connection strengths so as to incorporate knowledge about a task domain in an efficient way.
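This reference reads like the classic Boltzmann machine paper; as a hedged illustration of such a learning rule (not necessarily the paper's exact formulation; all sizes, seeds, and data are made up), the sketch below implements the standard update "data correlations minus model correlations", with the model term estimated by Gibbs sampling in a small, fully visible network of +/-1 units.

```python
import numpy as np

def gibbs_sweeps(W, b, rng, n_sweeps=20, s=None):
    """Gibbs sampling for a fully visible Boltzmann machine with +/-1 units and
    energy E(s) = -0.5 s^T W s - b^T s (W symmetric with zero diagonal)."""
    n = len(b)
    s = s if s is not None else rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_sweeps):
        for i in range(n):
            p = 1.0 / (1.0 + np.exp(-2.0 * (W[i] @ s + b[i])))   # P(s_i = +1 | rest)
            s[i] = 1.0 if rng.random() < p else -1.0
    return s

def boltzmann_step(W, b, data, rng, lr=0.05, n_chains=100):
    """One step of the classic Boltzmann learning rule:
    dW_ij ~ <s_i s_j>_data - <s_i s_j>_model,  db_i ~ <s_i>_data - <s_i>_model."""
    pos_corr, pos_mean = data.T @ data / len(data), data.mean(axis=0)
    samples = np.array([gibbs_sweeps(W, b, rng) for _ in range(n_chains)])
    neg_corr, neg_mean = samples.T @ samples / n_chains, samples.mean(axis=0)
    dW = lr * (pos_corr - neg_corr)
    np.fill_diagonal(dW, 0.0)
    return W + dW, b + lr * (pos_mean - neg_mean)

# Toy usage: 4 visible units, data drawn around a fixed +/-1 pattern.
rng = np.random.default_rng(3)
data = np.sign(rng.normal(loc=np.array([1, 1, -1, -1]), size=(200, 4)))
W, b = np.zeros((4, 4)), np.zeros(4)
for _ in range(10):
    W, b = boltzmann_step(W, b, data, rng)
print(W.round(2))
```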

3,727 citations

Journal Article
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of a tree-structured architecture for supervised learning, together with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
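A hedged sketch of the EM idea for a simplified mixture of experts: K linear-Gaussian experts with input-independent mixing weights, so the M-step reduces to closed-form weighted least squares. The hierarchical, GLIM-gated architecture of the paper additionally needs an inner Newton/IRLS loop for the gating networks, which is omitted here; the data and names are invented.

```python
import numpy as np

def em_mixture_of_regressions(X, y, K=2, iters=100, rng=None):
    """EM for K linear-Gaussian experts with constant mixing weights pi.
    E-step: expert responsibilities; M-step: weighted least squares per expert."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(size=(K, d))                  # expert regression weights
    sigma2 = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior responsibility of each expert for each point.
        logp = np.stack([
            np.log(pi[k]) - 0.5 * np.log(2 * np.pi * sigma2[k])
            - 0.5 * (y - X @ W[k]) ** 2 / sigma2[k] for k in range(K)])
        r = np.exp(logp - logp.max(axis=0))
        r /= r.sum(axis=0)
        # M-step: weighted least squares per expert, then mixing weights.
        for k in range(K):
            Xw = X * r[k][:, None]
            W[k] = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(d), Xw.T @ y)
            sigma2[k] = max((r[k] @ (y - X @ W[k]) ** 2) / (r[k].sum() + 1e-12), 1e-6)
        pi = r.sum(axis=1) / n
    return W, sigma2, pi

# Toy usage: two regimes with different linear relations.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(400), rng.uniform(-2, 2, 400)])
y = np.where(X[:, 1] > 0, 2.0 * X[:, 1], -1.0 * X[:, 1]) + 0.1 * rng.normal(size=400)
W, sigma2, pi = em_mixture_of_regressions(X, y)
print(W, pi)
```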

2,418 citations

Journal Article
TL;DR: In this article, the minimum discrimination information problem is viewed as projecting a probability distribution (PD) onto a convex set of PD's, and useful existence theorems for and characterizations of the minimizing PD are arrived at.
Abstract: Some geometric properties of PD's are established, Kullback's $I$-divergence playing the role of squared Euclidean distance. The minimum discrimination information problem is viewed as that of projecting a PD onto a convex set of PD's and useful existence theorems for and characterizations of the minimizing PD are arrived at. A natural generalization of known iterative algorithms converging to the minimizing PD in special situations is given; even for those special cases, our convergence proof is more generally valid than those previously published. As corollaries of independent interest, generalizations of known results on the existence of PD's or nonnegative matrices of a certain form are obtained. The Lagrange multiplier technique is not used.
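As a small worked example of I-projection onto a convex set of distributions (here, the set with a prescribed mean, not the paper's general setting; the die example is made up), the sketch below uses the standard fact that the minimizer of the I-divergence is an exponential tilt of the reference distribution and solves for the tilt parameter by one-dimensional root finding.

```python
import numpy as np
from scipy.optimize import brentq

def i_projection(p, f, c, theta_bounds=(-50.0, 50.0)):
    """I-projection (minimum discrimination information) of a finite
    distribution p onto the convex set {q : E_q[f] = c}.  The minimizer is an
    exponential tilt q_theta(x) ~ p(x) exp(theta f(x)); a 1-D root find picks
    theta so that the moment constraint holds."""
    def tilted(theta):
        logits = theta * f
        q = p * np.exp(logits - logits.max())   # shift avoids overflow
        return q / q.sum()
    def moment_gap(theta):
        return tilted(theta) @ f - c
    theta = brentq(moment_gap, *theta_bounds)
    return tilted(theta)

# Toy usage: tilt a fair six-sided die so that its mean becomes 4.5.
p = np.full(6, 1 / 6)
f = np.arange(1, 7, dtype=float)
q = i_projection(p, f, 4.5)
print(q, q @ f)   # projected distribution and its mean (approx 4.5)
```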

1,604 citations