A Riemannian Framework for Tensor Computing
Summary
1 Introduction
- Positive definite symmetric matrices (so-called tensors in this article) are often encountered in image processing, for instance as covariance matrices for characterizing statistics on deformations, or as an encoding of the principal diffusion directions in Diffusion Tensor Imaging (DTI).
- Even when an SVD is performed to smooth the rotation (the eigenvector trihedron) and the eigenvalues independently, there is a continuity problem around equal eigenvalues.
- This statistical framework was then reorganized and extended in [Pennec, 1999, Pennec, 2004] for general Riemannian manifolds, with invariance properties leading in some cases to a natural choice for the metric.
- The authors show how this theory can be applied to tensors, leading to a new intrinsic computing framework for these geometric features, with many important theoretical properties as well as practical computational advantages.
- The authors show that this problem can be tackled efficiently through a weighted mean optimization.
2 Statistics on geometric features
- The authors summarize in this Section the theory of statistics on Riemannian manifolds developed in [Pennec, 1999, Pennec, 2004].
- The aim is to exemplify the fact that choosing a Riemannian metric “automatically” determines a powerful framework to work on the manifold through the use of a few tools from differential geometry.
- To compute the length of the curve, the authors can proceed as usual by integrating the norm of its tangent vector along the curve (written out after this list).
- The distance between two points of a connected Riemannian manifold is the minimum length among the curves joining these points.
- This means that the manifold has no boundary nor any singular point that the authors can reach in a finite time.
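In symbols (standard definitions, not specific to this paper): for a curve $\gamma$ on the manifold with tangent vector $\dot\gamma(t)$,

$$L(\gamma) = \int_0^1 \big\|\dot\gamma(t)\big\|_{\gamma(t)}\, dt, \qquad \operatorname{dist}(x, y) = \min_{\gamma(0) = x,\; \gamma(1) = y} L(\gamma),$$

where $\|\cdot\|_{\gamma(t)}$ is the norm induced by the Riemannian metric on the tangent space at $\gamma(t)$.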
2.1 Exponential chart
- From the theory of second order differential equations, the authors know that there exists one and only one geodesic starting from that point with this tangent vector.
- The function that maps each vector $\overrightarrow{xy} \in T_xM$ to the point $y$ of the manifold that is reached after a unit time by the geodesic starting at $x$ with this tangent vector is called the exponential map (written out in symbols after this list).
- On the sphere S2(1) for instance, the cut locus of a point x is its antipodal point and the tangential cut locus is the circle of radius π.
- In this exponential chart, geodesics starting from $x$ are straight lines, and the distance to the reference point is conserved.
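In symbols, writing $\gamma_{(x,v)}$ for the unique geodesic with $\gamma(0) = x$ and $\dot\gamma(0) = v$:

$$\exp_x(v) = \gamma_{(x,v)}(1), \qquad \log_x(y) = \overrightarrow{xy} = \exp_x^{-1}(y),$$

so that $\operatorname{dist}(x, y) = \|\log_x(y)\|_x$ for any $y$ outside the cut locus of $x$.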
2.2 Practical implementation
- In fact, most of the usual operations using additions and subtractions may be reinterpreted in a Riemannian framework using the notion of bipoint, a precursor of the notion of vector introduced during the 19th century.
- Indeed, one defines vectors as equivalence classes of bipoints (oriented couples of points) in a Euclidean space.
- This is possible because the authors have a canonical way (the translation) to compare what happens at two different points.
- In a Riemannian manifold, the authors can still compare things locally (by parallel transportation), but not any more globally.
- Conversely, the logarithmic map may be used to map almost any bipoint (x, y) into a vector −→xy = logx(y) of TxM.
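The correspondence this yields (the paper's Table 1, reconstructed here from the surrounding text) reads:

- subtraction: $\overrightarrow{xy} = y - x$ becomes $\overrightarrow{xy} = \log_x(y)$;
- addition: $y = x + v$ becomes $y = \exp_x(v)$;
- distance: $\operatorname{dist}(x, y) = \|y - x\|$ becomes $\operatorname{dist}(x, y) = \|\log_x(y)\|_x$;
- gradient descent: $x_{t+1} = x_t - \varepsilon \nabla C(x_t)$ becomes $x_{t+1} = \exp_{x_t}(-\varepsilon \nabla C(x_t))$.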
2.3 Basic statistical tools
- The Riemannian metric induces an infinitesimal volume element on each tangent space, and thus a measure dM on the manifold that can be used to measure random events on the manifold and to define the probability density function (if it exists) of these random elements.
- With the probability measure of a random element, the authors can integrate functions from the manifold to any vector space, thus defining the expected value of this function.
- As for the mean, the authors chose in [Pennec, 1996, Pennec, 1999, Pennec, 2004] a variational approach to generalize the Normal law: they define it as the distribution that minimizes the information (i.e., maximizes the entropy) knowing the mean and the covariance.
- The relation between the concentration matrix (the “metric” used in the exponential of the probability density function) and the covariance matrix is slightly more complex than the simple inversion of the vectorial case, as it has to be corrected for the curvature of the manifold.
- This opens the way to the generalization of many other statistical tests, as the authors may expect similarly simple approximations for sufficiently centered distributions.
3 Working on the Tensor space
- Let us now focus on the space $\operatorname{Sym}^+_n$ of positive definite symmetric matrices.
- The goal is to find a Riemannian metric with interesting enough properties.
- This leads to a very regular manifold structure in which tensors with null or infinite eigenvalues lie at an infinite distance from any positive definite symmetric matrix: the cone of positive definite symmetric matrices is replaced by a space that has an infinite development in each of its $n(n+1)/2$ directions.
- Moreover, there is one and only one geodesic joining any two tensors, and the authors can even define globally consistent orthonormal coordinate systems of tangent spaces.
- Thus, the structure the authors obtain is very close to a vector space, except that the space is curved.
3.1 Exponential, logarithm and square root of tensors
- In the case of symmetric matrices, the authors have some important simplifications.
- The series defining the exponential function converges for any matrix argument, but this is generally not the case for the series defining its inverse function: the logarithm.
- Thus, the matrix exponential realizes a one-to-one mapping between the space of symmetric matrices and the space of tensors.
- The square root is always defined and moreover unique: let $\Sigma = U D^2 U^T$ be a diagonalization (with positive values for the $d_i$'s); the square root is then $\Sigma^{1/2} = U D U^T$.
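As a concrete illustration, here is a minimal numpy sketch of these three maps via eigendecomposition (function names are ours, not the paper's):

```python
import numpy as np

def sym_fun(S, f):
    # apply a scalar function to the eigenvalues: U diag(d) U^T -> U diag(f(d)) U^T
    d, U = np.linalg.eigh(S)
    return (U * f(d)) @ U.T

def sym_exp(W):
    # matrix exponential of a symmetric matrix: always a tensor
    return sym_fun(W, np.exp)

def spd_log(Sigma):
    # matrix logarithm of a tensor: well defined since all eigenvalues are > 0
    return sym_fun(Sigma, np.log)

def spd_sqrt(Sigma):
    # the unique positive square root: Sigma = U D^2 U^T -> U D U^T
    return sym_fun(Sigma, np.sqrt)

# sanity check: exp and log are inverse one-to-one maps between symmetric
# matrices and tensors
A = np.random.randn(3, 3)
Sigma = A @ A.T + 3.0 * np.eye(3)          # a random tensor (SPD matrix)
assert np.allclose(sym_exp(spd_log(Sigma)), Sigma)
assert np.allclose(spd_sqrt(Sigma) @ spd_sqrt(Sigma), Sigma)
```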
3.3 An invariant Riemannian metric
- Another way to determine the invariant distance is through the Riemannian metric.
- Let us take the simplest scalar product on the tangent space at the identity matrix: if $W_1$ and $W_2$ are tangent vectors (i.e. symmetric matrices, not necessarily definite nor positive), the authors define the scalar product to be the standard matrix scalar product $\langle W_1 \mid W_2 \rangle = \operatorname{Tr}(W_1^T W_2)$ (the resulting invariant metric and distance are written out after this list).
- To find the geodesic without going through the computation of Christoffel symbols, the authors may rely on a result from differential geometry [Gamkrelidze, 1991, Helgason, 1978, Kobayashi and Nomizu, 1969] which says that the geodesics for the invariant metrics on affine symmetric spaces are generated by the action of the one-parameter subgroups of the acting Lie group.
- Introducing this solution into the equation of geodesics (Eq. 3) does not lead to a very tractable expression.
- Thanks to the invariance by the linear group, this property holds for the distance to any (positive definite) tensor of the manifold.
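Written out (as in the paper), the metric obtained by translating this scalar product with the group action, and the distance it induces, are

$$\langle W_1 \mid W_2 \rangle_\Sigma = \operatorname{Tr}\!\big(\Sigma^{-1/2}\, W_1\, \Sigma^{-1}\, W_2\, \Sigma^{-1/2}\big),$$

$$\operatorname{dist}^2(\Sigma_1, \Sigma_2) = \big\|\log\big(\Sigma_1^{-1/2}\, \Sigma_2\, \Sigma_1^{-1/2}\big)\big\|^2 = \sum_{i=1}^{n} \log^2 \lambda_i,$$

where the $\lambda_i$ are the joint eigenvalues of $\Sigma_1$ and $\Sigma_2$ (i.e. the eigenvalues of $\Sigma_1^{-1} \Sigma_2$) and $\|\cdot\|$ denotes the Frobenius norm.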
3.4 Exponential and logarithm maps
- This mapping is called the exponential map, because it corresponds to the usual exponential in some matrix groups.
- As explained in Section 2.1, these charts can be viewed as the development of the manifold onto the tangent space along the geodesics.
- Moreover, as the manifold has a non-positive curvature [Skovgaard, 1984], there is no cut-locus and the statistical properties detailed in [Pennec, 2004] hold in their most general form.
- The authors have the existence and uniqueness of the mean of any distribution with a compact support [Kendall, 1990].
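A minimal numpy sketch of these two maps and of the induced distance, using the closed forms $\exp_\Sigma(W) = \Sigma^{1/2} \exp\!\big(\Sigma^{-1/2} W \Sigma^{-1/2}\big) \Sigma^{1/2}$ and $\log_\Sigma(\Lambda) = \Sigma^{1/2} \log\!\big(\Sigma^{-1/2} \Lambda \Sigma^{-1/2}\big) \Sigma^{1/2}$ (function names are ours):

```python
import numpy as np

def sym_fun(S, f):
    # U diag(f(d)) U^T for a symmetric matrix S = U diag(d) U^T
    d, U = np.linalg.eigh(S)
    return (U * f(d)) @ U.T

def exp_map(Sigma, W):
    # Riemannian exponential at Sigma: tangent vector (symmetric matrix) -> tensor
    S = sym_fun(Sigma, np.sqrt)
    Si = sym_fun(Sigma, lambda d: 1.0 / np.sqrt(d))
    return S @ sym_fun(Si @ W @ Si, np.exp) @ S

def log_map(Sigma, Lam):
    # Riemannian logarithm at Sigma: the tangent vector mapped to Lam by exp_map
    S = sym_fun(Sigma, np.sqrt)
    Si = sym_fun(Sigma, lambda d: 1.0 / np.sqrt(d))
    return S @ sym_fun(Si @ Lam @ Si, np.log) @ S

def dist(S1, S2):
    # affine-invariant distance: Frobenius norm of log(S1^-1/2 S2 S1^-1/2)
    Si = sym_fun(S1, lambda d: 1.0 / np.sqrt(d))
    return np.linalg.norm(sym_fun(Si @ S2 @ Si, np.log))
```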
3.5 Induced and orthonormal coordinate systems
- One has to be careful because the coordinate system of all these charts is not orthonormal.
- Even if this basis is orthonormal at some points of the manifold (such as at the identity for their tensors), it has to be corrected for the Riemannian metric at other places due to the manifold curvature.
- In their case, the transformation Σ1/2 ∈ GLn is moreover uniquely defined (as a positive square root) and is a smooth function of Σ over the complete tensor manifold.
- This group action approach was chosen in earlier works [Pennec, 1996, Pennec and Thirion, 1997, Pennec and Ayache, 1998] with what the authors called the placement function.
- On the sphere, there is a singularity at the antipodal point of the chosen origin for any otherwise smooth placement function.
3.6 Gradient descent and PDEs: an intrinsic geodesic marching
- The principle of a first order gradient descent is to go toward the steepest descent, in the direction opposite to the gradient for a short time-step ε, and iterate the process.
- The standard operator $\Sigma_{t+1} = \Sigma_t - \varepsilon W_t$ is only valid for very short time-steps in the flat Euclidean matrix space, and the authors could easily leave the cone of positive definite tensors.
- A much more interesting numerical operator is obtained by following the geodesic backward, starting at $\Sigma_t$ with tangent vector $W_t$: $\Sigma_{t+1} = \exp_{\Sigma_t}(-\varepsilon W_t)$.
- This intrinsic gradient descent ensures that the authors cannot leave the manifold.
- This intrinsic scheme is trivially generalized to partial differential evolution equations (PDEs) on tensor fields such as $\partial_t \Sigma(x, t) = -W(x, t)$: the authors obtain $\Sigma(x, t+dt) = \exp_{\Sigma(x,t)}(-dt\, W(x, t))$.
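A sketch of one such intrinsic step (the scheme $\Sigma_{t+1} = \exp_{\Sigma_t}(-\varepsilon W_t)$ described above; names ours):

```python
import numpy as np

def sym_fun(S, f):
    d, U = np.linalg.eigh(S)              # S = U diag(d) U^T
    return (U * f(d)) @ U.T

def intrinsic_step(Sigma, W, eps):
    # geodesic marching: Sigma_{t+1} = exp_Sigma(-eps * W); unlike the flat
    # update Sigma - eps * W, this cannot leave the tensor cone
    S = sym_fun(Sigma, np.sqrt)
    Si = sym_fun(Sigma, lambda d: 1.0 / np.sqrt(d))
    return S @ sym_fun(-eps * (Si @ W @ Si), np.exp) @ S
```

The PDE scheme of the last bullet is this same step applied at every voxel, with $W = W(x, t)$ and $\varepsilon = dt$.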
3.7 Example with the mean value
- The Karcher or Fréchet mean is the set of tensors minimizing the sum of squared distances: $C(\Sigma) = \sum_{i=1}^{N} \operatorname{dist}^2(\Sigma, \Sigma_i)$.
- In the case of tensors, the manifold has non-positive curvature [Skovgaard, 1984], so that there is one and only one mean value $\bar\Sigma$ [Kendall, 1990].
- Thus, the intrinsic Newton gradient descent algorithm gives the following mean value at estimation step $t+1$: $\bar\Sigma_{t+1} = \exp_{\bar\Sigma_t}\!\big(\tfrac{1}{N} \sum_{i=1}^{N} \log_{\bar\Sigma_t}(\Sigma_i)\big)$ (see the sketch after this list).
- This gradient descent algorithm usually converges very fast (about 10 iterations, see Fig. 2).
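A sketch of this iteration, the fixed point $\bar\Sigma_{t+1} = \exp_{\bar\Sigma_t}\!\big(\sum_i w_i \log_{\bar\Sigma_t}(\Sigma_i)\big)$; the optional weights anticipate the weighted means of Sections 4 and 5 (names ours):

```python
import numpy as np

def sym_fun(S, f):
    d, U = np.linalg.eigh(S)              # S = U diag(d) U^T
    return (U * f(d)) @ U.T

def karcher_mean(tensors, weights=None, n_iter=100, tol=1e-10):
    # intrinsic Newton/gradient descent for the (weighted) Karcher mean
    N = len(tensors)
    w = np.full(N, 1.0 / N) if weights is None else np.asarray(weights, float)
    mean = sum(wi * T for wi, T in zip(w, tensors))   # Euclidean mean as init
    for _ in range(n_iter):
        S = sym_fun(mean, np.sqrt)
        Si = sym_fun(mean, lambda d: 1.0 / np.sqrt(d))
        # tangent mean in the whitened frame: sum_i w_i log(mean^-1/2 T_i mean^-1/2)
        V = sum(wi * sym_fun(Si @ T @ Si, np.log) for wi, T in zip(w, tensors))
        mean = S @ sym_fun(V, np.exp) @ S             # exp_mean of the tangent mean
        if np.linalg.norm(V) < tol:                   # gradient vanished: converged
            break
    return mean

# commuting example: the mean of two tensors is their geodesic midpoint,
# with the geometric means of the eigenvalues on the diagonal
M = karcher_mean([np.diag([1.0, 1.0]), np.diag([4.0, 0.25])])
assert np.allclose(M, np.diag([2.0, 0.5]))
```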
3.8 Simple statistical operations on tensors
- As described in [Pennec, 2004], the authors may generalize most of the usual statistical methods by using the exponential chart at the mean point.
- One can generate a random Gaussian tensor using the following procedure (sketched in code after this list): the authors sample $n(n+1)/2$ independent and normalized real Gaussian samples, multiply the corresponding vector by the square root of the desired covariance matrix (expressed in their Vec coordinate system), and come back to the tensor manifold using the inverse Vec mapping.
- This theorem states that the empirical mean of N independently and identically distributed (IID) random variables with a variance γ2 asymptotically follows a Gaussian law of variance γ2/N , centered at the exact mean value.
- Moreover, a Kolmogorov-Smirnov test confirms that the distance between the empirical and theoretical cumulative pdf is not significant (p-value of 0.19).
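The sampling procedure above can be sketched as follows (the $\sqrt{2}$ factor on off-diagonal coordinates makes the Vec basis orthonormal for the matrix scalar product; all names are ours):

```python
import numpy as np

def sym_fun(S, f):
    d, U = np.linalg.eigh(S)
    return (U * f(d)) @ U.T

def vec_inv(x, n):
    # symmetric matrix from n(n+1)/2 coordinates in an orthonormal basis
    # (off-diagonal basis elements are (E_ij + E_ji) / sqrt(2))
    W = np.zeros((n, n))
    W[np.triu_indices(n)] = x
    W = W + W.T
    W[np.diag_indices(n)] /= 2.0               # undo the doubled diagonal
    W[~np.eye(n, dtype=bool)] /= np.sqrt(2.0)  # orthonormal-basis scaling
    return W

def random_gaussian_tensor(mean, sqrt_cov, rng=None):
    # Gaussian coordinates in the orthonormal basis at the mean, mapped
    # back to the manifold through the exponential map
    rng = np.random.default_rng() if rng is None else rng
    n = mean.shape[0]
    x = sqrt_cov @ rng.standard_normal(n * (n + 1) // 2)
    W = vec_inv(x, n)                          # tangent coordinates at the mean
    S = sym_fun(mean, np.sqrt)
    return S @ sym_fun(W, np.exp) @ S          # exp_mean of that tangent vector

T = random_gaussian_tensor(np.eye(3), 0.1 * np.eye(6))   # an example draw
```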
4 Tensor Interpolation
- One of the important operations in geometric data processing is to interpolate values between known measurements.
- In 3D image processing, (tri-) linear interpolation is often used thanks to its very low computational load and comparatively much better results than nearest neighbor interpolation.
- A typical example where the convolution kernel has an infinite support is sinc (sinus cardinalis) interpolation.
- To ensure that this is an interpolating function, one has to require that wi(xj) = δij (where δij is the Kronecker symbol).
- Typical examples in triangulations or tetrahedrizations are barycentric and natural neighbor coordinates [Sibson, 1981].
4.1 Interpolation through weighted mean
- To generalize interpolation methods defined using weighted means to their tensor manifold, let us assume that the sample weights wk(x) are defined as above in Rd.
- Thanks to their normalization, the value $f(x)$ interpolated from the vectors $f_k$ verifies $\sum_{i=1}^{N} w_i(x)\,(f_i - f(x)) = 0$.
- Of course, the authors lose in general the existence and uniqueness properties.
- For positive weights, the existence and uniqueness theorems for the Karcher mean can be adapted.
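On the tensor manifold, the interpolated value $\Sigma(x)$ is accordingly defined as the weighted Karcher mean, i.e. as the solution of the same barycentric equation with differences replaced by logarithms:

$$\sum_{i=1}^{N} w_i(x)\, \log_{\Sigma(x)}(\Sigma_i) = 0.$$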
4.2 Example of the linear interpolation
- The linear interpolation is simple, since it is a walk along the geodesic joining the two tensors (sketched in code after this list).
- The authors displayed in Fig. 3 the flat and the Riemannian interpolations between 2D tensors of eigenvalues (5,1) horizontally and (1,50) at 45 degrees, along with the evolution of the eigenvalues, their mean (i.e. trace of the matrix) and product (i.e. determinant of the matrix or volume of the ellipsoid).
- With the standard matrix coefficient interpolation, the evolution of the trace is perfectly linear (which was expected since this is a linear function of the coefficients), and the principal eigenvalue grows almost linearly, while the smallest eigenvalue slightly grows toward a local maximum before decreasing.
- What is much more annoying is that the determinant (i.e. the volume) does not grow regularly in between the two tensors, but goes through a maximum.
- On the contrary, one can clearly see a regular evolution of the eigenvalues and of their product with the interpolation in their Riemannian space.
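A sketch of this geodesic walk, using the closed form $\Sigma(t) = \exp_{\Sigma_1}\!\big(t \log_{\Sigma_1}(\Sigma_2)\big) = \Sigma_1^{1/2} \big(\Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2}\big)^{t}\, \Sigma_1^{1/2}$ (names ours):

```python
import numpy as np

def sym_fun(S, f):
    d, U = np.linalg.eigh(S)
    return (U * f(d)) @ U.T

def geodesic_interp(S1, S2, t):
    # point at parameter t on the unique geodesic from S1 (t = 0) to S2 (t = 1)
    S = sym_fun(S1, np.sqrt)
    Si = sym_fun(S1, lambda d: 1.0 / np.sqrt(d))
    return S @ sym_fun(Si @ S2 @ Si, lambda d: d ** t) @ S
```

Along this path the determinant interpolates geometrically, $\det \Sigma(t) = (\det \Sigma_1)^{1-t} (\det \Sigma_2)^{t}$, which is exactly the regular evolution of the eigenvalue product noted above.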
4.3 Tri-linear interpolation
- The bi- and tri-linear interpolation of tensors on a regular grid in 2D or 3D is almost as simple, except that the authors no longer have an explicit solution using geodesics, since there are more than two reference points.
- After computing the (bi-) tri-linear weights with respect to the neighboring sites of the point the authors want to evaluate, they now have to go through the iterative optimization of the weighted mean (Eq. 5) to compute the interpolated tensor.
- One can see that the volume of the tensors is much larger with the classical interpolation than with the Riemannian one.
- The authors also get a much smoother interpolation of the principal directions with their method.
4.4 Interpolation of non regular measurements
- When tensors are not measured on a regular grid but “randomly” localized in space, defining neighbors becomes an issue.
- One solution, proposed by [Sibson, 1981] and later used for surfaces by [Cazals and Boissonnat, 2001], is the natural neighbor interpolation.
- For any point x, its natural neighbors are the points of {xi} whose Voronoi cells are chopped off upon insertion of x into the Voronoi diagram.
- Moreover, the authors have little control on the quality of this approximation.
5 Filtering tensor fields
- Let us now consider that the authors have a tensor field, for instance like in Diffusion Tensor Imaging (DTI) [Le Bihan et al., 2001], where the tensor is a first order approximation of the anisotropic diffusion of the water molecules at each point of the imaged tissues.
- In the brain, diffusion is strongly favored along oriented structures (axon fibers).
- The tensor field obtained from the images is noisy and needs to be regularized before being further analyzed.
- The generalization to tensor fields is quite straightforward using once again weighted means (Section 5.1 below).
5.1 Gaussian Filtering
- As before, this weighted mean can be computed on their manifold using their intrinsic gradient descent scheme (a sketch follows this list).
- One can see a more pronounced blurring of the corpus callosum fiber tracts when using the flat metric.
- The integration of this filtering scheme into a complete fiber tracking system would be necessary to fully evaluate the pros and cons of each metric.
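A sketch of this filtering at one voxel of a 2D tensor field (reusing `karcher_mean` from the sketch in Section 3.7; the grid layout and names are ours):

```python
import numpy as np
# assumes karcher_mean(tensors, weights=...) from the Section 3.7 sketch

def gaussian_filter_voxel(field, i, j, sigma, radius):
    # replace field[i, j] by the Gaussian-weighted Karcher mean of its
    # neighborhood; field is an (ni, nj, n, n) array of tensors
    ni, nj = field.shape[:2]
    neighbors, weights = [], []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            ii, jj = i + di, j + dj
            if 0 <= ii < ni and 0 <= jj < nj:
                neighbors.append(field[ii, jj])
                weights.append(np.exp(-(di**2 + dj**2) / (2.0 * sigma**2)))
    weights = np.asarray(weights)
    return karcher_mean(neighbors, weights=weights / weights.sum())
```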
5.2 Spatial gradient of Tensor fields
- The linearity of the derivatives implies that the authors could use directional derivatives in more than the d orthogonal directions.
- This is especially well adapted to stabilize the discrete computations: the finite difference estimation of the directional derivative is $\partial_u F(x) \approx F(x+u) - F(x)$.
- The authors experimentally found in other applications (e.g. to compute the Jacobian of a deformation field in non-rigid registration [Rey et al., 2002, p. 169]) that this gradient approximation scheme was more stable and much faster than computing all derivatives using convolutions, for instance by the derivative of the Gaussian.
- To quantify the local amount of variability independently of the space direction, one usually takes the norm of the gradient: $\|\nabla F(x)\|^2 = \sum_{i=1}^{d} \|\partial_i F(x)\|^2$ (Eq. 7). Notice that this approximation is consistent with the previous one only if the directions $u$ are normalized to unity.
- For a manifold-valued field $\Sigma(x)$ defined on $\mathbb{R}^d$, the authors can proceed similarly, except that the directional derivatives $\partial_i \Sigma(x)$ are now tangent vectors of $T_{\Sigma(x)}M$ (see the sketch after this list).
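A sketch of the squared gradient norm at an interior voxel of a 2D tensor field: each directional derivative is approximated by the log map toward the neighbor, and its invariant norm at $\Sigma(x)$ is the Frobenius norm in the whitened frame (names ours):

```python
import numpy as np

def sym_fun(S, f):
    d, U = np.linalg.eigh(S)
    return (U * f(d)) @ U.T

def grad_norm_sq(field, i, j):
    # ||grad Sigma||^2 = sum over the d grid directions of ||d_u Sigma||_Sigma^2;
    # in the whitened frame, ||W||_Sigma is the Frobenius norm of
    # log(Sigma^-1/2 @ neighbor @ Sigma^-1/2)
    Si = sym_fun(field[i, j], lambda d: 1.0 / np.sqrt(d))
    total = 0.0
    for ui, uj in ((1, 0), (0, 1)):       # unit directions, interior voxel assumed
        W = sym_fun(Si @ field[i + ui, j + uj] @ Si, np.log)
        total += np.linalg.norm(W) ** 2
    return total
```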
5.3 Filtering using PDEs
- Regularizing a scalar, vector or tensor field F aims at reducing the amount of its spatial variations.
- The vector case: let us decompose their vector field $F(x)$ into its $n$ scalar components $f_i(x)$.
- Thus, choosing an orthonormal coordinate system on the space $\mathbb{R}^n$, their regularization criterion decouples into $n$ independent scalar regularization problems: $\operatorname{Reg}(F) = \sum_{i=1}^{n} \int_\Omega \|\nabla f_i(x)\|^2\, dx = \sum_{i=1}^{n} \operatorname{Reg}(f_i)$.
- The authors may avoid the introduction of additional complex mathematical tools by coming back to the basic definitions.
- For the numerical computation of the Laplacian, the authors may approximate the first and second order tensor derivative by their Euclidean derivatives.
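A sketch of the resulting discrete scheme on a 2D tensor field: each grid direction $u$ contributes $\Delta_u \Sigma \approx \log_{\Sigma(x)}(\Sigma(x+u)) + \log_{\Sigma(x)}(\Sigma(x-u))$ to the Laplacian, followed by a geodesic step; the optional weight function `c` anticipates the anisotropic variant of Section 5.4 (names ours):

```python
import numpy as np

def sym_fun(S, f):
    d, U = np.linalg.eigh(S)
    return (U * f(d)) @ U.T

def diffusion_step(field, dt, c=None):
    # one explicit step of dSigma/dt = Delta Sigma (interior voxels only);
    # if c is given, each directional contribution is weighted by
    # c(||d_u Sigma||_Sigma) as in Section 5.4
    out = field.copy()
    ni, nj = field.shape[:2]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            S = sym_fun(field[i, j], np.sqrt)
            Si = sym_fun(field[i, j], lambda d: 1.0 / np.sqrt(d))
            L = np.zeros_like(field[i, j])
            for ui, uj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                W = sym_fun(Si @ field[i + ui, j + uj] @ Si, np.log)
                L += W if c is None else c(np.linalg.norm(W)) * W
            out[i, j] = S @ sym_fun(dt * L, np.exp) @ S   # exp_Sigma(dt * Delta)
    return out
```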
5.4 Anisotropic filtering
- In practice, the authors would like to filter within the homogeneous regions, but not across their boundaries.
- Figures 7 and 8 present example results of this very simple anisotropic filtering scheme on synthetic and real DTI images.
- The anisotropic smoothing perfectly preserves the discontinuity while completely smoothing each region.
- Thus, similarly to the Euclidean mean of independently and identically distributed measurements, the authors expect the standard deviation of the regularized tensors to be roughly $7 \simeq \sqrt{48}$ times smaller than that of the noisy input tensors.
- One can see that the tensors are regularized in “homogeneous” regions (ventricles, temporal areas), while the main tracts are left unchanged.
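With the `diffusion_step` sketch of Section 5.3, the anisotropic scheme is one extra argument; the particular weight function and threshold below are our choice, not the paper's:

```python
import numpy as np
# field: an (ni, nj, n, n) array of tensors; diffusion_step as in Section 5.3

kappa = 0.5  # edge threshold (our choice): c decreases from c(0) = 1 toward 0
smoothed = diffusion_step(field, dt=0.1, c=lambda t: np.exp(-(t / kappa) ** 2))
```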
6 Regularization and restoration of tensor fields
- The pure diffusion is efficient to reduce the noise in the data, but it also reduces the amount of information.
- At an infinite diffusion time, the tensor field will be completely homogeneous (or homogeneous by part for some anisotropic diffusion schemes), with a value corresponding to the mean of the measurements over the region (with Neumann boundary conditions).
- Thus, the absolute minimum of their regularization criterion alone is of little interest.
- To keep close to the measured tensor field Σ0(x) while still regularizing, a more theoretically grounded approach is to consider an optimization problem with a competition between a data attachment term and a possibly non-linear anisotropic regularization term: C(Σ) = Sim(Σ, Σ0) + λ Reg(Σ).
6.1 The regularization term
- To preserve the discontinuities, the gradient of this criterion (the Laplacian) may be tailored to prevent the smoothing across them, as the authors have done in Section 5.4.
- There is no longer a convergence guarantee, since this anisotropic regularization "force" may not derive from a well-posed criterion.
- This scheme was used in [Chefd’hotel et al., 2004] with the flat Euclidean metric on tensors, in conjunction with a geometric numerical integration scheme that preserves the rank of the matrix.
- This produces the eigenvalue swelling effect they observed.
- The authors may compute the directional derivatives using finite differences in the flat matrix space and use the intrinsic evolution scheme, but they believe that there are more efficient ways to do it using the exponential map.
6.2 A least-squares attachment term
- Usually, one considers that the data (e.g. a scalar image or a displacement vector field F0(x)) are corrupted by a uniform Gaussian noise independent at each spatial position.
- On the tensor manifold, assuming a uniform Gaussian noise independent at each position also leads to a least-squares criterion through a maximum likelihood approach.
6.3 A least-squares attachment term for sparsely distributed tensors
- Now, let us consider the case where the authors do not have a dense measure of their tensor field, but only N measures Σi at irregularly distributed sample points xi.
- The tensor field Σ(x) is related to the data only at the measurement points xi through the Dirac distributions δ(x−xi).
- While the introduction of distributions can be handled in the theoretical differentiation of the criterion with respect to the continuous tensor field $\Sigma$, it is a real problem for the numerical implementation.
6.4 Interpolation through diffusion
- With the sparse data attachment term (Eq. 16) and the isotropic first order regularization term (Eq. 10), the authors are looking for a tensor field that minimizes its spatial variations while interpolating (or, more precisely, approximating at the desired precision) the measured values: $C(\Sigma) = \sum_{i=1}^{N} \int_\Omega G_\sigma(x - x_i)\, \operatorname{dist}^2\!\big(\Sigma(x), \Sigma_i\big)\, dx + \lambda \int_\Omega \|\nabla \Sigma(x)\|^2_{\Sigma(x)}\, dx$.
- Last but not least, the authors need an initialization of the tensor field Σ0(x) to obtain a fully operational algorithm.
- Thus, the authors have a potentially destructive competition between the two terms of the criterion in very localized areas.
- In that case, the Laplacian regularization will spread from these boundaries with no constraints until it reaches the counterbalancing forces of the data attachment term in the immediate vicinity of the sparse measurements.
7 Conclusion
- The authors propose in this paper an affine-invariant metric that endows the space of positive definite symmetric matrices with a very regular manifold structure.
- The authors exemplify some of the good properties of this metric on simple statistical operations.
- A second contribution of the paper is the application of this framework to important geometric data processing problems such as interpolation, filtering, diffusion and restoration of tensor fields.
- This framework is based on the choice of a Riemannian metric on one side, which leads to powerful differential geometry tools such as the exponential maps and geodesic marching techniques, and on the transformation of linear combinations or integrals into minimization problems on the other side.
- The authors are currently investigating another very interesting application field in collaboration with P. Thompson and A. Toga at UCLA: the modeling and analysis of the variability of the brain anatomy.
Frequently Asked Questions
Q2. What are the common types of symmetric matrices?
Positive definite symmetric matrices (so-called tensors in this article) are often encountered in image processing, for instance as covariance matrices for characterizing statistics on deformations, or as an encoding of the principal diffusion directions in Diffusion Tensor Imaging (DTI).
Q3. Why are many usual operations stable on symmetric positive definite matrices?
As symmetric positive definite matrices constitute a convex half-cone in the vector space of matrices, many usual operations (like the mean) are stable in this space.
Q4. How can the authors generate a random Gaussian tensor?
For instance, the authors can generate a random (generalized) Gaussian tensor using the following procedure: the authors sample n(n+1)/2 independent and normalized real Gaussian samples, multiply the corresponding vector by the square root of the desired covariance matrix (expressed in their Vec coordinate system), and come back to the tensor manifold using the inverse Vec mapping.
Q5. What is the way to compute the directional derivatives?
The authors may compute the directional derivatives using finite differences in the flat matrix space and use the intrinsic evolution scheme, but the authors believe that there are more efficient ways to do it using the exponential map.
Q6. How do the eigenvalues evolve under the standard matrix coefficient interpolation?
With the standard matrix coefficient interpolation, the evolution of the trace is perfectly linear (which was expected since this is a linear function of the coefficients), and the principal eigenvalue grows almost linearly, while the smallest eigenvalue slightly grows toward a local maximum before decreasing.
Q7. How do the authors get the geodesic starting from any other point of the manifold?
Using the invariance of their metric, the authors easily obtain the geodesic starting from any other point of the manifold using their group action: $\Gamma_{(\Sigma, W)}(t) = \Sigma^{1/2} \exp\!\big(t\, \Sigma^{-1/2} W \Sigma^{-1/2}\big)\, \Sigma^{1/2}$.
Q8. What are the main problems that arise when estimating tensors from data?
Problems arise when estimating tensors from data (in standard DTI, the estimated symmetric matrix could have negative eigenvalues), or when smoothing fields of tensors: the numerical schemes used to solve the Partial Differential Equation (PDE) may sometimes lead to negative eigenvalues if the time step is not small enough.
Q9. What is the intrinsic integration scheme for PDEs?
An intrinsic integration scheme for PDEs that uses the exponential map has been added in [Chefd’hotel et al., 2002], and makes it possible to perform PDE evolutions on the considered manifold without re-projections.
Q10. What is the way to keep close to the measured tensor field?
To keep close to the measured tensor field $\Sigma_0(x)$ while still regularizing, a more theoretically grounded approach is to consider an optimization problem with a competition between a data attachment term and a possibly non-linear anisotropic regularization term: $C(\Sigma) = \operatorname{Sim}(\Sigma, \Sigma_0) + \lambda \operatorname{Reg}(\Sigma)$.
Q11. What is the tangential cut locus of a point x?
On the sphere S2(1) for instance, the cut locus of a point x is its antipodal point and the tangential cut locus is the circle of radius π.
Q12. What is the weighting function of the Laplace-Beltrami operator?
If $c(\cdot)$ is a weighting function decreasing from $c(0) = 1$ to $c(+\infty) = 0$, this can be realized directly in the discrete implementation of the Laplacian (Eq. 13): the contribution $\Delta_u \Sigma$ of the spatial direction $u$ to the Laplace-Beltrami operator is weighted by this decreasing function according to the norm $\|\partial_u \Sigma\|_\Sigma$ of the gradient in that direction.
Q13. What is the procedure for a tensor field (x) ?
For a tensor field Σ(x) ∈ Sym+n over Rd, the procedure is more complex as the authors should use the covariant derivative (the connection) to differentiate vectors fields on their manifold.
Q14. What is the problem around equal eigenvalues?
Even when an SVD is performed to smooth the rotation (the eigenvector trihedron) and the eigenvalues independently, there is a continuity problem around equal eigenvalues.
Q15. What is the directional derivative $\partial_H \operatorname{Sim}(F)$?
This time, the directional derivative $\partial_H \operatorname{Sim}(F)$ is directly expressed using a scalar product with $H$ in the proper functional space, so that the steepest ascent direction is $\nabla \operatorname{Sim}(F) = 2\,(F(x) - F_0(x))$.
Q16. What is the reinterpretation of addition and subtraction using logarithmic and exponential maps?
This reinterpretation of addition and subtraction using logarithmic and exponential maps is very powerful to generalize algorithms working on vector spaces to algorithms on Riemannian manifolds, as illustrated by Table 1.