Showing papers on "Information geometry published in 2008"


Journal ArticleDOI
TL;DR: A novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space, applied to pedestrian detection with d-dimensional nonsingular covariance matrices as object descriptors.
Abstract: We present a new algorithm to detect pedestrians in still images utilizing covariance matrices as object descriptors. Since the descriptors do not form a vector space, well-known machine learning techniques are not well suited to learning the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. The main contribution of the paper is a novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space. The algorithm is tested on the INRIA and DaimlerChrysler pedestrian datasets, where superior detection rates are observed over previous approaches.
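A minimal sketch, not the paper's algorithm: one standard way to work with SPD covariance descriptors is to log-map each matrix to the tangent space at a base point (the affine-invariant construction) and vectorize it, so that an ordinary Euclidean classifier applies. The function names and the choice of base point are ours.

```python
import numpy as np
from scipy.linalg import logm, sqrtm, inv

def log_map(X, M):
    """Tangent-space image of the SPD matrix X at the base point M."""
    Ms = np.real(sqrtm(M))
    Mis = inv(Ms)
    return Ms @ np.real(logm(Mis @ X @ Mis)) @ Ms

def vectorize_spd(S):
    """Flatten a symmetric matrix; off-diagonals are scaled by sqrt(2)
    so the Euclidean norm of the vector equals the Frobenius norm."""
    i, j = np.triu_indices(S.shape[0])
    v = S[i, j].copy()
    v[i != j] *= np.sqrt(2.0)
    return v

# e.g. features = [vectorize_spd(log_map(C, M_base)) for C in covariances]
```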

1,044 citations


Journal ArticleDOI
Tong Lin1, Hongbin Zha1
TL;DR: A novel framework based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold, which can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.
Abstract: Recently, manifold learning has been widely exploited in pattern recognition, data analysis, and machine learning. This paper presents a novel framework, called Riemannian manifold learning (RML), based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold. The main idea is to formulate the dimensionality reduction problem as a classical problem in Riemannian geometry: how to construct coordinate charts for a given Riemannian manifold. We implement the Riemannian normal coordinate chart, which has been the most widely used in Riemannian geometry, for a set of unorganized data points. First, two input parameters (the neighborhood size k and the intrinsic dimension d) are estimated based on an efficient simplicial reconstruction of the underlying manifold. Then, the normal coordinates are computed to map the input high-dimensional data into a low-dimensional space. Experiments on synthetic data, as well as real-world images, demonstrate that our algorithm can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.

418 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold is proposed and it is shown that the null space of a matrix built from the local representation gives the segmentation of the data.
Abstract: We propose a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold. First, we learn a representation of the data using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces. Such generalizations exploit geometric properties of the Riemannian space, particularly its Riemannian metric. Then, assuming that the data points from different groups are separated, we show that the null space of a matrix built from the local representation gives the segmentation of the data. Our method is computationally simple and performs automatic segmentation without requiring user initialization. We present results on 2-D motion segmentation and diffusion tensor imaging segmentation.
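A hedged sketch of the final step only, under our illustrative assumption that the matrix in question is indexed by data points, so its approximate null space serves as a spectral embedding whose rows cluster by group; how that matrix is built from the local representation is specific to the paper.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def segment_from_null_space(M, n_groups):
    """Label the data points (columns of M) by clustering their
    coordinates in the smallest right-singular vectors of M."""
    _, _, Vt = np.linalg.svd(M)
    embedding = Vt[-n_groups:].T      # one row of coordinates per point
    _, labels = kmeans2(embedding, n_groups, minit='++', seed=0)
    return labels
```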

156 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: An effective online Log-Euclidean Riemannian subspace learning algorithm which models the appearance changes of an object by incrementally learning a low-order Log-Euclidean eigenspace representation through adaptively updating the sample mean and eigenbasis is presented.
Abstract: Recently, a novel Log-Euclidean Riemannian metric has been proposed for statistics on symmetric positive definite (SPD) matrices. Under this metric, distances and Riemannian means take a much simpler form than under the widely used affine-invariant Riemannian metric. Based on the Log-Euclidean Riemannian metric, we develop a tracking framework in this paper. In the framework, the covariance matrices of image features in the five modes are used to represent object appearance. Since a nonsingular covariance matrix is an SPD matrix lying on a connected Riemannian manifold, the Log-Euclidean Riemannian metric is used for statistics on the covariance matrices of image features. Further, we present an effective online Log-Euclidean Riemannian subspace learning algorithm which models the appearance changes of an object by incrementally learning a low-order Log-Euclidean eigenspace representation through adaptively updating the sample mean and eigenbasis. Tracking is then carried out within a Bayesian state inference framework in which a particle filter is used to propagate sample distributions over time. Theoretical analysis and experimental evaluations demonstrate the promise and effectiveness of the proposed framework.
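The simpler closed forms the abstract alludes to are easy to state; here is a minimal illustrative sketch (function names are ours):

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_distance(A, B):
    """d(A, B) = ||log(A) - log(B)||_F for SPD matrices A and B."""
    return np.linalg.norm(np.real(logm(A)) - np.real(logm(B)), ord='fro')

def log_euclidean_mean(mats):
    """Riemannian mean in closed form: exp of the mean of the logs."""
    return expm(np.mean([np.real(logm(M)) for M in mats], axis=0))
```

Unlike the affine-invariant mean, no iteration is required; this simplicity is the practical appeal mentioned above.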

147 citations


Proceedings ArticleDOI
26 May 2008
TL;DR: This innovative approach avoids the classical drawbacks of Doppler processing by filter banks or FFT for bursts with very few pulses, exploiting the fact that radar data covariance matrices contain all the information in the sensor signal.
Abstract: New operational requirements for stealth target detection in dense and inhomogeneous clutter are emerging (littoral warfare, low-altitude asymmetric threats, battlefield in urban areas...). Classical radar approaches to Doppler and array signal processing have reached their limits. We propose new improvements based on advanced mathematical studies of the geometry of SPD (symmetric positive definite) matrices and of information geometry, exploiting the fact that radar data covariance matrices contain all the information in the sensor signal. First, information geometry makes it possible to take the statistics of the radar covariance matrix into account (by means of the Fisher information matrix used in the Cramer-Rao bound) to build a robust distance, called the Jensen, Siegel, or Bruhat-Tits metric. Geometry on "symmetric cones", developed in the frameworks of Lie groups and Jordan algebras, provides new algorithms to compute matrix geometric means that could be used for a "matrix CFAR". This innovative approach avoids the classical drawbacks of Doppler processing by filter banks or FFT for bursts with very few pulses.
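A hedged sketch of the kind of geometric mean such a "matrix CFAR" would compare a cell's covariance against; the fixed-point iteration below is the textbook Karcher-mean construction under the affine-invariant metric, not necessarily the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import logm, expm, sqrtm, inv

def karcher_mean(mats, iters=50, tol=1e-10):
    """Geometric mean of SPD matrices via Riemannian gradient descent."""
    M = np.mean(mats, axis=0)          # start from the arithmetic mean
    for _ in range(iters):
        Ms = np.real(sqrtm(M))
        Mis = inv(Ms)
        # mean tangent vector at M: average of the log-maps of the inputs
        T = np.mean([np.real(logm(Mis @ A @ Mis)) for A in mats], axis=0)
        M = Ms @ expm(T) @ Ms          # exponential-map step back to SPD
        if np.linalg.norm(T, ord='fro') < tol:
            break
    return M
```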

128 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A new texture descriptor is proposed which intrinsically defines the geometry of textural regions using the shape operator borrowed from differential geometry to define an active contour model which distinguishes the background and textural objects of interest represented by the probability density functions of the proposed texture descriptor.
Abstract: We present an approach for unsupervised segmentation of natural and textural images based on active contours, differential geometry, and information-theoretic concepts. More precisely, we propose a new texture descriptor which intrinsically defines the geometry of textural regions using the shape operator borrowed from differential geometry. Then, we use the popular Kullback-Leibler distance to define an active contour model which distinguishes the background from the textural objects of interest, represented by the probability density functions of our new texture descriptor. We prove the existence of a solution to the proposed segmentation model. Finally, a fast and easy-to-implement texture segmentation algorithm is introduced to extract meaningful objects. We present promising synthetic and real-world results and compare our algorithm to other state-of-the-art techniques.
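For concreteness, a minimal sketch of the Kullback-Leibler distance between two discrete densities (e.g. histograms of the texture descriptor inside and outside the contour); the symmetrized variant shown is a common choice, and the names are ours:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two histograms (smoothed and normalized)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_kl(p, q):
    return kl(p, q) + kl(q, p)
```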

77 citations


Book
01 Jan 2008
TL;DR: The main motivation for this book lies in the breadth of applications in which a statistical model is used to represent small departures from, for example, a Poisson process.
Abstract: The main motivation for this book lies in the breadth of applications in which a statistical model is used to represent small departures from, for example, a Poisson process. Our approach uses information geometry to provide a common context, but we need only rather elementary material from differential geometry, information theory, and mathematical statistics. Introductory sections serve together to help those interested from the applications side in making use of our methods and results. This monograph reports a body of results and computer-algebraic methods that seem to have quite general applicability to statistical models admitting representation through parametric families of probability density functions. Some illustrations are given from a variety of contexts for geometric characterization of statistical states near to the three important standard basic reference states: (Poisson) randomness, uniformity, independence. The individual applications are somewhat heuristic models from various fields, and we incline more to the terminology and notation of the applications than to those of formal statistics. However, a common thread is a geometrical representation for statistical perturbations of the basic standard states, and hence results gain qualitative stability. Moreover, the geometry is controlled by a metric structure that owes its heritage through maximum likelihood to information theory, so the quantitative features (lengths of curves, geodesics, scalar curvatures, etc.) have some respectable authority. We see in the applications simple models for galactic void distributions and galaxy clustering, amino acid clustering along protein chains, cryptographic protection, stochastic fibre networks, coupled geometric features in hydrology, and quantum chaotic behaviour.

70 citations


Journal ArticleDOI
TL;DR: In this paper, an elementary introduction to information geometry is presented, followed by a precise geometric characterisation of the family of Gaussian density functions, and the properties of vapour-liquid phase transitions are elucidated in geometrical terms.
Abstract: Using the square-root map p → √p, a probability density function p can be represented as a point of the unit sphere S in the Hilbert space of square-integrable functions. If the density function depends smoothly on a set of parameters, the image of the map forms a Riemannian submanifold M in S. The metric on M induced by the ambient spherical geometry of S is the Fisher information matrix. Statistical properties of the system modelled by a parametric density function p can then be expressed in terms of information geometry. An elementary introduction to information geometry is presented, followed by a precise geometric characterisation of the family of Gaussian density functions. When the parametric density function describes the equilibrium state of a physical system, certain physical characteristics can be identified with geometric features of the associated information manifold M. Applying this idea, the properties of vapour-liquid phase transitions are elucidated in geometrical terms. For an ideal gas, phase transitions are absent and the geometry of M is flat. In this case, the solutions to the geodesic equations yield the adiabatic equations of state. For a van der Waals gas, the associated geometry of M is highly nontrivial. The scalar curvature of M diverges along the spinodal boundary which envelopes the unphysical region in the phase diagram. The curvature is thus closely related to the stability of the system.
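For reference, the standard one-line computation behind the identification of the induced metric with the Fisher information matrix (up to a conventional factor of four):

```latex
\[
  \psi_\theta = \sqrt{p_\theta}, \qquad
  \int \partial_i \psi_\theta \, \partial_j \psi_\theta \, dx
  = \frac{1}{4} \int p_\theta \,
      \partial_i \log p_\theta \, \partial_j \log p_\theta \, dx
  = \tfrac{1}{4}\, g^{\mathrm{Fisher}}_{ij}(\theta).
\]
```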

61 citations


Journal ArticleDOI
TL;DR: This work presents the Balanced Exploration and Exploitation Model Search (BEEM) algorithm that works very well especially for these difficult scenes and achieves significant speedups compared to the state of the art algorithms.
Abstract: The estimation of the epipolar geometry is especially difficult when the putative correspondences include a low percentage of inlier correspondences and/or a large subset of the inliers is consistent with a degenerate configuration of the epipolar geometry that is totally incorrect. This work presents the balanced exploration and exploitation model (BEEM) search algorithm, which works very well especially for these difficult scenes. The algorithm handles these two problems in a unified manner. It includes the following main features: 1) balanced use of three search techniques: global random exploration, local exploration near the current best solution, and local exploitation to improve the quality of the model; 2) exploitation of available prior information to accelerate the search process; 3) use of the best found model to guide the search process, escape from degenerate models, and define an efficient stopping criterion; 4) presentation of a simple and efficient method to estimate the epipolar geometry from two scale-invariant feature transform (SIFT) correspondences; and 5) use of the locality-sensitive hashing (LSH) approximate nearest neighbor algorithm for fast putative correspondence generation. The resulting algorithm, when tested on real images with or without degenerate configurations, gives quality estimations and achieves significant speedups compared to state-of-the-art algorithms.

60 citations


Book ChapterDOI
01 Jan 2008
TL;DR: This work proposes using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference, and shows significant speedups over alternative learning algorithms.
Abstract: Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant speedups over alternative learning algorithms.
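In symbols, the shared update is the standard natural-gradient step; per the abstract, the Fisher matrix here is that of the variational approximation q_θ rather than of the predictive distribution:

```latex
\[
  \theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{-1}\, \nabla_\theta \mathcal{L}(\theta_t),
  \qquad
  F(\theta) = \mathbb{E}_{q_\theta}\!\left[
      \nabla_\theta \log q_\theta \;
      \nabla_\theta \log q_\theta^{\top} \right].
\]
```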

54 citations


Journal ArticleDOI
TL;DR: The dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity is analyzed, both for the standard gradient (SGD) and natural gradient (NGD) methods.
Abstract: The dynamical behavior of learning is known to be very slow for the multilayer perceptron, being often trapped in the "plateau." It has recently been understood that this is due to the singularity in the parameter space of perceptrons, in which trajectories of learning are drawn. The space is Riemannian from the point of view of information geometry and contains singular regions where the Riemannian metric or the Fisher information matrix degenerates. This paper analyzes the dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity. We give explicit asymptotic analytical solutions (trajectories) both for the standard gradient (SGD) and natural gradient (NGD) methods. It is clearly shown, in the case of the SGD method, that the plateau phenomenon appears in a neighborhood of the critical regions, where the dynamical behavior is extremely slow. The analysis of the NGD method is much more difficult, because the inverse of the Fisher information matrix diverges. We overcome the difficulty by introducing the "blow-down" technique used in algebraic geometry. The NGD method works efficiently, and the state converges directly to the true parameters very quickly, while it stalls in the case of the SGD method. The analytical results are compared with computer simulations, showing good agreement. The effects of singularities on learning are thus qualitatively clarified for both standard and NGD methods.
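For orientation, the two update rules under comparison, in their standard forms; E is the loss and G the Fisher information matrix, which degenerates on the singular regions, so its inverse blows up there:

```latex
\[
  \Delta\theta_{\mathrm{SGD}} = -\eta\, \nabla_\theta E(\theta),
  \qquad
  \Delta\theta_{\mathrm{NGD}} = -\eta\, G(\theta)^{-1}\, \nabla_\theta E(\theta).
\]
```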

Proceedings ArticleDOI
23 Jun 2008
TL;DR: An online, recursive filtering technique to model linear dynamical systems that operate on the state space of symmetric positive definite matrices (tensors) that lie on a Riemannian manifold is presented.
Abstract: We present an online, recursive filtering technique to model linear dynamical systems that operate on the state space of symmetric positive definite matrices (tensors) that lie on a Riemannian manifold. The proposed approach describes a predict-and-update computational paradigm, similar to a vector Kalman filter, to estimate the optimal tensor state. We adapt the original Kalman filtering algorithm to appropriately propagate the state over time and assimilate observations, while conforming to the geometry of the manifold. We validate our algorithm with synthetic data experiments and demonstrate its application to visual object tracking using covariance features.
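A deliberately simplified, hedged sketch of the predict-and-update idea: run a fixed-gain, Kalman-style correction in the matrix-log domain and map back to the manifold at the end. The paper's filter is intrinsic and richer; this only makes the recursion concrete.

```python
import numpy as np
from scipy.linalg import logm, expm

def filter_spd_sequence(observations, gain=0.5):
    """Fixed-gain recursive filter for a sequence of SPD observations."""
    state = np.real(logm(observations[0]))     # state kept in log domain
    for Y in observations[1:]:
        innovation = np.real(logm(Y)) - state  # tangent-space residual
        state = state + gain * innovation      # update step
    return expm(state)                         # map back to the manifold
```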

Posted Content
TL;DR: In this article, the authors derived physically and geometrically interesting properties of the solutions of the PME and its associated equation, such as moment-conserving projection of a solution, evolutional velocities of second moments and the convergence rate to the manifold in terms of the geodesic curves, divergence and so on.
Abstract: This paper presents new geometric aspects of the behaviors of solutions to the porous medium equation (PME) and its associated equation. First we discuss the Legendre structure with information geometry on the manifold of generalized exponential densities. Next by considering such a structure in particular on the q-Gaussian densities, we derive several physically and geometrically interesting properties of the solutions. They include, for example, characterization of the moment-conserving projection of a solution, evaluation of evolutional velocities of the second moments and the convergence rate to the manifold in terms of the geodesic curves, divergence and so on.
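For context, the porous medium equation in question, in its standard form; its self-similar (Barenblatt) solutions are q-Gaussian densities, which is what makes the manifold of q-Gaussians the natural setting:

```latex
\[
  \frac{\partial u}{\partial t} = \Delta\left( u^{m} \right), \qquad m > 1.
\]
```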

Journal ArticleDOI
TL;DR: In this article, a curvature condition, introduced by Ma, Trudinger and Wang in relation with the regularity of optimal transport, is shown to be stable under Gromov-Hausdorff limits, even though the condition implicitly involves fourth order derivatives of the Riemannian metric.

BookDOI
TL;DR: In this paper, the authors give an overview of geodesics in semi-Riemannian manifolds, from refinements of classical results to updated variational settings, and focus on two fundamental problems of this approach, which regard geodesic connectedness.
Abstract: Geodesics are an essential element of the geometry of a semi-Riemannian manifold. In fact, their differences from and similarities with the (positive definite) Riemannian case constitute the first step to understanding semi-Riemannian geometry. The progress in the last two decades has been impressive, with the systematic introduction of (infinite-dimensional) variational methods being especially relevant. Our purpose is to give an overview, from refinements of classical results to updated variational settings. First, several properties (and especially completeness) of geodesics in some ambient spaces are studied. This includes heuristic constructions of compact incomplete examples, geodesics in warped, GRW, or stationary spacetimes, properties in surfaces and spaceforms, and problems on the stability of completeness. Then, we study the variational framework and focus on two fundamental problems of this approach, which regard geodesic connectedness. The first one deals with a variational principle for stationary manifolds and its recent implementation inside Causality Theory. The second one concerns orthogonal splitting manifolds, and a reasonably self-contained development is provided, collecting some steps spread through the literature.

Book ChapterDOI
TL;DR: In this article, the complex-valued Ray-Singer torsion, the Milnor-Turaev torsion, and the dynamical torsion are discussed.
Abstract: Riemannian geometry, topology, and dynamics permit the introduction of partially defined holomorphic functions on the variety of representations of the fundamental group of a manifold. The functions we consider are the complex-valued Ray-Singer torsion, the Milnor-Turaev torsion, and the dynamical torsion. They are associated essentially to a closed smooth manifold equipped with a (co)Euler structure and a Riemannian metric in the first case, a smooth triangulation in the second case, and a smooth flow of the type described in Section 2 in the third case. In this paper we define these functions, describe some of their properties, and calculate them in some cases. We conjecture that they are essentially equal and have analytic continuations to rational functions on the variety of representations. We discuss what we know to be true. As particular cases of our torsion, we recognize familiar rational functions in topology, such as the Lefschetz zeta function of a diffeomorphism, the dynamical zeta function of closed trajectories, and the Alexander polynomial of a knot. A numerical invariant derived from the Ray-Singer torsion and associated to two homotopic acyclic representations is discussed in the last section.

Journal ArticleDOI
TL;DR: It is shown that the hyperbolicity of a non-maximally symmetric 6N-dimensional statistical manifold ℳs underlying an ED Gaussian model describing an arbitrary system of 3N degrees of freedom leads to linear information-geometric entropy growth and to exponential divergence of the Jacobi vector field intensity, quantum and classical features of chaos respectively.
Abstract: A new information-geometric approach to chaotic dynamics on curved statistical manifolds based on Entropic Dynamics (ED) is proposed. It is shown that the hyperbolicity of a non-maximally symmetric 6N-dimensional statistical manifold ℳs underlying an ED Gaussian model describing an arbitrary system of 3N degrees of freedom leads to linear information-geometric entropy growth and to exponential divergence of the Jacobi vector field intensity, quantum and classical features of chaos respectively.

Journal ArticleDOI
TL;DR: In this article, the authors studied the geometry and regularity of Lorentzian manifolds under natural curvature and volume bounds, and established several injectivity radius estimates at a point or on the past null cone of a point.
Abstract: Motivated by the application to general relativity, we study the geometry and regularity of Lorentzian manifolds under natural curvature and volume bounds, and we establish several injectivity radius estimates at a point or on the past null cone of a point. Our estimates are entirely local and geometric, and are formulated via a reference Riemannian metric that we canonically associate with a given observer (p, T), where p is a point of the manifold and T is a future-oriented time-like unit vector prescribed at p only. The proofs are based on a generalization of arguments from Riemannian geometry. We first establish estimates on the reference Riemannian metric, and then express them in terms of the Lorentzian metric. In the context of general relativity, our estimate on the injectivity radius of an observer should be useful to investigate the regularity of spacetimes satisfying the Einstein field equations.

Journal ArticleDOI
TL;DR: In this paper, the authors show how information geometry, the natural geometry of discrete probability distributions, can be used to derive the quantum formalism based on three elementary features of quantum phenomena, namely complementarity, measurement simulability, and global gauge invariance.
Abstract: In this paper, we show how information geometry, the natural geometry of discrete probability distributions, can be used to derive the quantum formalism. The derivation rests upon three elementary features of quantum phenomena, namely complementarity, measurement simulability, and global gauge invariance. When these features are appropriately formalized within an information geometric framework, and combined with a novel information-theoretic principle, the central features of the finite-dimensional quantum formalism can be reconstructed.

Journal ArticleDOI
Alain Connes1
TL;DR: In this paper, the relative position of two von Neumann algebras in Hilbert space is measured and combined with the spectrum of the Dirac operator, giving a complete invariant of Riemannian geometry.
Abstract: We introduce an invariant of Riemannian geometry which measures the relative position of two von Neumann algebras in Hilbert space, and which, when combined with the spectrum of the Dirac operator, gives a complete invariant of Riemannian geometry. We show that the new invariant plays the same role with respect to the spectral invariant as the Cabibbo-Kobayashi-Maskawa mixing matrix in the Standard Model plays with respect to the list of masses of the quarks.

Journal ArticleDOI
TL;DR: A kernel rule for classification on the manifold based on n independent copies of (X,Y) is introduced and it is shown that this kernel rule is consistent in the sense that its probability of error converges to the Bayes risk with probability one.
Abstract: Let X be a random variable taking values in a compact Riemannian manifold without boundary, and let Y be a discrete random variable valued in {0, 1} which represents a classification label. We introduce a kernel rule for classification on the manifold based on n independent copies of (X, Y). Under mild assumptions on the bandwidth sequence, it is shown that this kernel rule is consistent in the sense that its probability of error converges to the Bayes risk with probability one.
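A minimal sketch of such a kernel rule: weight the training labels by a kernel of the manifold (geodesic) distance to the query point and take a weighted majority vote. The distance function, kernel shape, and names are ours, for illustration only.

```python
import numpy as np

def kernel_classify(x, X, Y, dist, h):
    """Kernel rule: dist is a geodesic distance, h the bandwidth."""
    w = np.array([np.exp(-(dist(x, xi) / h) ** 2) for xi in X])
    Y = np.asarray(Y)
    return int(np.sum(w[Y == 1]) > np.sum(w[Y == 0]))
```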

Book ChapterDOI
Chengxi Ye1, Jia Liu1, Chun Chen1, Mingli Song1, Jiajun Bu1 
09 Dec 2008
TL;DR: A novel algorithm for speech emotion classification is presented that considers the relations between simple features by incorporating covariance matrices as the new feature descriptors and is able to train one simple model to accurately differentiate the emotions from both genders.
Abstract: We present a novel algorithm for speech emotion classification. In contrast to previous methods, we additionally consider the relations between simple features by incorporating covariance matrices as the new feature descriptors. Since non-singular covariance matrices do not lie in a linear space, we endow the space with an affine-invariant metric and turn it into a Riemannian manifold. After that, we use the tangent space to approximate the manifold. Classification is performed in the tangent space, and a generalized principal component analysis is presented. We test the algorithm on speech emotion classification, and the experimental results show an improvement of around 13% (+3% with PCA) in recognition accuracy. Based on this, we are able to train a single simple model that accurately differentiates the emotions of both genders.

Proceedings ArticleDOI
12 May 2008
TL;DR: This paper proposes calculating a low-dimensional, information-based embedding of documents into Euclidean space and calculates the Fisher metric over a lower-dimensional statistical manifold estimated in a nonparametric fashion from the data.
Abstract: The problem of document classification considers categorizing or grouping of various document types. Each document can be represented as a bag of words, which has no straightforward Euclidean representation. Relative word counts form the basis for similarity metrics among documents. Endowing the vector of term frequencies with a Euclidean metric has no obvious justification. A more appropriate assumption, commonly used, is that the data lie on a statistical manifold, or a manifold of probabilistic generative models. In this paper, we propose calculating a low-dimensional, information-based embedding of documents into Euclidean space. One component of our approach, motivated by information geometry, is the use of the Fisher information distance to define similarities between documents. The other component is the calculation of the Fisher metric over a lower-dimensional statistical manifold estimated in a nonparametric fashion from the data. We demonstrate that, in the classification task, this information-driven embedding outperforms both a standard PCA embedding and other Euclidean embeddings of the term frequency vector.
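A hedged sketch of one ingredient this rests on: on the multinomial simplex, the Fisher information (geodesic) distance between two discrete distributions has a closed form via the square-root embedding onto the sphere.

```python
import numpy as np

def fisher_distance(p, q):
    """Geodesic Fisher distance between two discrete distributions."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()     # normalize raw term counts
    c = np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0)
    return 2.0 * np.arccos(c)

# e.g. fisher_distance([3, 1, 0, 2], [1, 1, 1, 1]) for two count vectors
```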

Posted Content
TL;DR: In this article, the basic class of almost complex manifolds with Norden metric is considered and the curvature properties of the investigated manifolds are studied, and the isotropic Kaehler type of investigated manifold is introduced and characterized geometrically.
Abstract: The basic class of the non-integrable almost complex manifolds with Norden metric is considered. Its curvature properties are studied. The isotropic Kaehler type of investigated manifolds is introduced and characterized geometrically.

Book ChapterDOI
TL;DR: In this article, the spectrum of the differential form Laplacian on a Riemannian foliated manifold is analyzed when the metric on the ambient manifold is blown up in directions normal to the leaves (in the adiabatic limit).
Abstract: We present some recent results on the behavior of the spectrum of the differential form Laplacian on a Riemannian foliated manifold when the metric on the ambient manifold is blown up in directions normal to the leaves (in the adiabatic limit).

Journal ArticleDOI
TL;DR: In this paper, the authors describe potential theory on Riemannian manifolds, concentrating on Liouville-type theorems and their relationships with the parabolicity and stochastic completeness of the underlying manifold.
Abstract: We describe some aspects of potential theory on Riemannian manifolds, concentrating on Liouville-type theorems and their relationships with the parabolicity and stochastic completeness of the underlying manifold. Some generalizations of these concepts to the case of non-linear operators are also discussed.

Book ChapterDOI
15 Sep 2008
TL;DR: A Riemannian metric for the joint distribution of the state-action pair, which is directly linked with the average reward, is proposed, and a new NPG named the "Natural State-action Gradient" (NSG) is derived.
Abstract: The parameter space of a statistical learning machine has a Riemannian metric structure in terms of its objective function. Amari [1] proposed the concept of the "natural gradient", which takes the Riemannian metric of the parameter space into account. Kakade [2] applied it to policy gradient reinforcement learning, yielding the natural policy gradient (NPG). Although NPGs evidently depend on the underlying Riemannian metric, careful attention was not paid to alternative choices of the metric in previous studies. In this paper, we propose a Riemannian metric for the joint distribution of the state-action pair, which is directly linked with the average reward, and derive a new NPG named the "Natural State-action Gradient" (NSG). Then, we prove that the NSG can be computed by fitting a certain linear model to the immediate reward function. In numerical experiments, we verify that NSG learning can handle MDPs with a large number of states, for which the performance of the existing (N)PG methods degrades.
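Schematically, the family of updates at issue is the standard NPG step below; in the NSG, the metric G is (per the abstract) the Fisher information of the joint state-action distribution rather than of the policy alone:

```latex
\[
  \theta_{t+1} = \theta_t + \eta\, G(\theta_t)^{-1}\,
      \nabla_\theta \bar{R}(\theta_t),
  \qquad \bar{R}(\theta) = \text{average reward under policy } \pi_\theta .
\]
```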

Journal ArticleDOI
TL;DR: In this article, a complete Riemannian manifold X with negative curvature satisfying −b^2 ⩽ K_X ⩽ −a^2 < 0 for some constants a, b is naturally mapped into the space of probability measures on the ideal boundary ∂X by assigning the Poisson kernels.
Abstract: A complete Riemannian manifold X with negative curvature satisfying −b^2 ⩽ K_X ⩽ −a^2 < 0 for some constants a, b is naturally mapped into the space of probability measures on the ideal boundary ∂X by assigning the Poisson kernels. We show that this map is an embedding and that the pull-back of the Fisher information metric by this embedding coincides with the original metric of X up to a constant, provided X is a rank-one symmetric space of non-compact type. Furthermore, we give a geometric meaning of the embedding.

Proceedings Article
01 Jan 2008
TL;DR: In this paper, a local Euclidean embedding is identified by whitening the tangent space, which leads to an additive parameter update sequence that approximates the geodesic flow to the optimal density model.
Abstract: We propose two strategies to improve the optimization in information geometry. First, a local Euclidean embedding is identified by whitening the tangent space, which leads to an additive parameter update sequence that approximates the geodesic flow to the optimal density model. Second, removal of the minor components of gradients enhances the estimation of the Fisher information matrix and reduces the computational cost. We also prove that dimensionality reduction is necessary for learning multidimensional linear transformations. The optimization based on the principal whitened gradients demonstrates faster and more robust convergence in simulations on unsupervised learning with synthetic data and on discriminant analysis of breast cancer data.
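One common reading of "whitening the tangent space" (our gloss, not a formula from the paper): locally reparametrize by the square root of the Fisher matrix, so the metric becomes the identity and a plain additive gradient step in the whitened coordinates is a natural-gradient step in the original ones:

```latex
\[
  \phi = F(\theta_0)^{1/2}\,\theta
  \;\Longrightarrow\;
  \nabla_{\phi}\mathcal{L} = F(\theta_0)^{-1/2}\,\nabla_{\theta}\mathcal{L},
  \qquad
  \Delta\theta = F(\theta_0)^{-1/2}\,\Delta\phi
               = -\eta\, F(\theta_0)^{-1}\,\nabla_{\theta}\mathcal{L}.
\]
```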

Journal ArticleDOI
TL;DR: The central results consist of justifying the use of relative entropy as the uniquely natural criterion to select a preferred approximation from within a family of trial parameterized distributions, and of obtaining the optimal approximation by marginalizing over parameters using the method of maximum entropy and information geometry.
Abstract: We develop a maximum relative entropy formalism to generate optimal approximations to probability distributions. The central results consist of (a) justifying the use of relative entropy as the uniquely natural criterion to select a preferred approximation from within a family of trial parameterized distributions, and (b) obtaining the optimal approximation by marginalizing over parameters using the method of maximum entropy and information geometry. As an illustration, we apply our method to simple fluids. The "exact" canonical distribution is approximated by that of a fluid of hard spheres. The proposed method first determines the preferred value of the hard-sphere diameter, and then obtains an optimal hard-sphere approximation by a suitably weighted average over different hard-sphere diameters. This leads to a considerable improvement in accounting for the soft-core nature of the interatomic potential. As a numerical demonstration, the radial distribution function and the equation of state for a Lennard-Jones fluid (argon) are compared with results from molecular dynamics simulations.
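In our summary notation (an assumption about conventions, with q_θ the trial family and P the target), the selection criterion of step (a) maximizes the relative entropy, which is equivalent to minimizing the Kullback-Leibler divergence of the trial from the target:

```latex
\[
  \theta^{\star} = \arg\max_{\theta}\; S[q_\theta \,\|\, P],
  \qquad
  S[q_\theta \,\|\, P]
    = -\int q_\theta(x)\, \log \frac{q_\theta(x)}{P(x)}\, dx
    = -\, D_{\mathrm{KL}}\!\left( q_\theta \,\|\, P \right).
\]
```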