arXiv:1802.02628v1 [math.OC] 7 Feb 2018
Manifold Optimization Over the Set of Doubly
Stochastic Matrices: A Second-Order Geometry
Ahmed Douik, Student Member, IEEE and Babak Hassibi, Member, IEEE
Abstract—Convex optimization is a well-established research area with applications in almost all fields. Over the decades, multiple approaches have been proposed to solve convex programs. The development of interior-point methods allowed solving a more general set of convex programs known as semi-definite programs and second-order cone programs. However, it has been established that these methods are excessively slow for high dimensions, i.e., they suffer from the curse of dimensionality. On the other hand, optimization algorithms on manifolds have shown great ability in finding solutions to non-convex problems in reasonable time. This paper is interested in solving a subset of convex optimization problems using a different approach. The main idea behind Riemannian optimization is to view the constrained optimization problem as an unconstrained one over a restricted search space. The paper introduces three manifolds to solve convex programs under particular box constraints. The manifolds, called the doubly stochastic, symmetric, and definite multinomial manifolds, generalize the simplex, also known as the multinomial manifold. The proposed manifolds and algorithms are well-adapted to solving convex programs in which the variable of interest is a multidimensional probability distribution function. Theoretical analysis and simulation results testify to the efficiency of the proposed method over state-of-the-art methods. In particular, they reveal that the proposed framework outperforms conventional generic and specialized solvers, especially in high dimensions.
Index Terms—Riemannian manifolds, symmetric doubly
stochastic matrices, positive matrices, convex optimization.
I. INTRODUCTION
Numerical optimization is the foundation of various engineering and computational sciences. Consider a mapping $f$ from a subset $\mathcal{D}$ of $\mathbb{R}^n$ to $\mathbb{R}$. The goal of optimization algorithms is to find an extreme point $x^* \in \mathcal{D}$ such that $f(x^*) \leq f(y)$ for all points $y \in \mathcal{N}_{x^*}$ in the neighborhood of $x^*$. Unconstrained Euclidean$^1$ optimization refers to the setup in which the domain of the objective function is the whole space, i.e., $\mathcal{D} = \mathbb{R}^n$. On the other hand, constrained Euclidean optimization denotes optimization problems in which the search set is constrained, i.e., $\mathcal{D} \subsetneq \mathbb{R}^n$.
Convex optimization is a special case of constrained optimization problems in which both the objective function and the search set are convex. Historically initiated with the study of least-squares and linear programming problems, convex optimization plays a crucial role in optimization algorithms thanks to the desirable convergence properties it exhibits. The development of interior-point methods allowed solving a more general set of convex programs known as semi-definite programs and second-order cone programs. A summary of convex optimization methods and performance analysis can be found in the seminal book [1].
Ahmed Douik and Babak Hassibi are with the Department of Electrical
Engineering, California Institute of Technology, Pasadena, CA 91125 USA
(e-mail: {ahmed.douik,hassibi}@caltech.edu).
$^1$The traditional optimization schemes are identified with the word Euclidean in contrast with the Riemannian algorithms in the rest of the paper.
Another important property of convex optimization is that the interior of the search space can be identified with a manifold that is embedded in a higher-dimensional Euclidean space. Using advanced tools to solve the constrained optimization, e.g., [2], requires solving in the high-dimensional space, which can be excessively slow. Riemannian optimization takes advantage of the fact that the manifold is of lower dimension and exploits its underlying geometric structure. The optimization problem is reformulated from a constrained Euclidean optimization into an unconstrained optimization over a restricted search space, a.k.a., a Riemannian manifold. Thanks to the aforementioned low-dimension feature, optimization over Riemannian manifolds is expected to perform more efficiently [3]. Therefore, a large body of literature is dedicated to adapting traditional Euclidean optimization methods and their convergence properties to Riemannian manifolds.
This paper introduces a framework for solving optimization problems in which the optimization variable is a doubly stochastic matrix. Such a framework is particularly interesting for clustering applications. In such problems, e.g., [4]–[7], one wishes to recover the structure of a graph given a similarity matrix. The recovery is performed by minimizing a predefined cost function under the constraint that the optimization variable is a doubly stochastic matrix. This work provides a unified framework to carry out such optimization.
A. State of the Art
Optimization algorithms on Riemannian manifolds appeared in the optimization literature as early as the 1970s with the work of Luenberger [8], wherein the standard Newton's optimization method was adapted to problems on manifolds. A decade later, Gabay [9] introduced the steepest descent and the quasi-Newton algorithm on embedded submanifolds of $\mathbb{R}^n$. The work investigates the global and local convergence properties of both the steepest descent and Newton's method. The analysis of the steepest descent and the Newton algorithm is extended in [10], [11] to Riemannian manifolds. By using exact line search, the authors concluded the convergence of their proposed algorithms. The assumption is relaxed in [12], wherein the author provides convergence rates and guarantees for the steepest descent and Newton's method for Armijo step-size control.
The above-mentioned works substitute the concept of the line search in Euclidean algorithms by searching along a geodesic, which generalizes the idea of a straight line. While the method is natural and intuitive, it might not be practical. Indeed, finding the expression of the geodesic requires computing the exponential map, which may be as complicated as solving the original optimization problem [13]. To overcome the problem, the authors in [14] suggest approximating the exponential map up to a given order, called a retraction, and show quadratic convergence for Newton's method under such a setup. The work initiated more sophisticated optimization algorithms such as the trust region methods [3], [15]–[18]. Analyses of the convergence of first and second order methods on Riemannian manifolds, e.g., gradient and conjugate gradient descent, Newton's method, and trust region methods, using general retractions are summarized in [13].
Thanks to the theoretical convergence guarantees mentioned above, optimization algorithms on Riemannian manifolds are gradually gaining momentum in the optimization field [3]. Several successful algorithms have been proposed to solve non-convex problems, e.g., low-rank matrix completion [19]–[21], online learning [22], clustering [23], [24], and tensor decomposition [25]. It is worth mentioning that these works modify the optimization algorithm by using a general connection instead of the genuine parallel vector transport to move from one tangent space to another while computing the (approximate) Hessian. Such an approach conserves the global convergence of the quasi-Newton scheme but no longer ensures its superlinear convergence behavior [26].
Despite the advantages cited above, the use of optimization algorithms on manifolds is relatively limited. This is mainly due to the lack of a systematic mechanism to turn a constrained optimization problem into an optimization over a manifold, provided that the search space forms a manifold, e.g., convex optimization. Such a reformulation, usually requiring some level of understanding of differential geometry and Riemannian manifolds, is prohibitively complex for regular use. This paper addresses the problem by introducing new manifolds that allow solving a non-negligible class of optimization problems in which the variable of interest can be identified with a multidimensional probability distribution function.
B. Contributions
In [25], in the context of tensor decomposition, the authors propose a framework to optimize functions in which the variables are stochastic matrices. This paper proposes extending those results to a more general class of manifolds by proposing a framework for solving a subset of convex programs, including those in which the optimization variable represents a doubly stochastic and possibly symmetric and/or definite multidimensional probability distribution function. To this end, the paper introduces three manifolds which generalize the multinomial manifold. While the multinomial manifold allows representing only stochastic matrices, the proposed ones characterize doubly stochastic, symmetric, and definite arrays, respectively. Therefore, the proposed framework allows solving a subset of convex programs. To the best of the authors' knowledge, the proposed manifolds have not been introduced or studied in the literature.
The first part of the manuscript introduces all relevant concepts of Riemannian geometry and provides insights into the optimization algorithms on such manifolds. In an effort to make the content of this document accessible to a larger audience, it does not assume any prerequisites in differential geometry. As a result, the definitions, concepts, and results in this paper are tailored to the manifolds of interest and may not be applicable to abstract manifolds.
The paper investigates the first and second order Riemannian geometry of the proposed manifolds endowed with the Fisher information metric, which guarantees that the manifolds have a differentiable structure. For each manifold, the tangent space, Riemannian gradient, Hessian, and retraction are derived. With the aforementioned expressions, the manuscript formulates first and second order optimization algorithms and characterizes their complexity. Simulation results are provided to further illustrate the efficiency of the proposed method against state-of-the-art algorithms.
The rest of the manuscript is organized as follows: Section II introduces the optimization algorithms on manifolds and lists the problems of interest in this paper. In Section III, the doubly stochastic manifold is introduced and its first and second order geometry is derived. Section IV iterates a similar study for a particular case of doubly stochastic matrices known as the symmetric manifold. The study is extended to the definite symmetric manifold in Section V. Section VI suggests first and second order algorithms and analyzes their complexity. Finally, before concluding in Section VIII, the simulation results are plotted and discussed in Section VII.
II. OPTIMIZATION ON RIEMANNIAN MANIFOLDS
This section introduces the numerical optimization methods on smooth matrix manifolds. The first part introduces the Riemannian manifold notations and operations. The second part extends the first and second order Euclidean optimization algorithms to Riemannian manifolds and introduces the necessary machinery. Finally, the problems of interest in this paper are provided and the different manifolds are identified.
A. Manifold Notation and Operations
The study of optimization algorithms on smooth manifolds has attracted significant attention in recent years. However, such studies require some level of knowledge of differential geometry. In this paper, only smooth embedded matrix manifolds are considered. Hence, the definitions and theorems may not apply to abstract manifolds. In addition, the authors opted for a coordinate-free analysis, omitting the chart and the differentiable structure of the manifold. For an introduction to differential geometry, abstract manifolds, and Riemannian manifolds, we refer the readers to the references [27]–[29], respectively.
An embedded matrix manifold $\mathcal{M}$ is a smooth subset of a vector space $\mathcal{E}$ included in the set of matrices $\mathbb{R}^{n \times m}$. The set $\mathcal{E}$ is called the ambient or the embedding space. By smooth subset, we mean that $\mathcal{M}$ can be mapped by a bijective function, i.e., a chart, to an open subset of $\mathbb{R}^d$, where $d$ is called the dimension of the manifold. The dimension $d$ can be thought of as the degrees of freedom of the manifold. In particular, a vector space $\mathcal{E}$ is a manifold.
In the same line of thought as approximating a function locally by its derivative, a manifold $\mathcal{M}$ of dimension $d$ can be approximated locally at a point $X$ by a $d$-dimensional vector space $\mathcal{T}_X\mathcal{M}$ generated by taking derivatives of all smooth curves going through $X$. Formally, let $\gamma(t): \mathcal{I} \subset \mathbb{R} \to \mathcal{M}$ be a curve on $\mathcal{M}$ with $\gamma(0) = X$. Define the derivative of $\gamma(t)$ at zero as follows:
$$\gamma'(0) = \lim_{t \to 0} \frac{\gamma(t) - \gamma(0)}{t}. \qquad (1)$$
The space generated by all $\gamma'(0)$ represents a vector space $\mathcal{T}_X\mathcal{M}$ called the tangent space of $\mathcal{M}$ at $X$. Figure 1 shows an example of a two-dimensional tangent space generated by a couple of curves. The tangent space plays a primordial role in optimization algorithms over manifolds in the same way as the derivative of a function plays an important role in Euclidean optimization. The union of all tangent spaces $\mathcal{T}\mathcal{M}$ is referred to as the tangent bundle of $\mathcal{M}$, i.e.,
$$\mathcal{T}\mathcal{M} = \bigcup_{X \in \mathcal{M}} \mathcal{T}_X\mathcal{M}. \qquad (2)$$

Fig. 1. Tangent space of a 2-dimensional manifold embedded in $\mathbb{R}^3$. The tangent space $\mathcal{T}_X\mathcal{M}$ is computed by taking derivatives of the curves going through $X$ at the origin.
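As an added illustration (not part of the original paper), the following Python sketch differentiates a few curves on the unit sphere, used here as a simple stand-in manifold, and checks numerically that the resulting vectors $\gamma'(0)$ from (1) are orthogonal to $X$, i.e., that they lie in the tangent space of the sphere at $X$. The curve construction by renormalization is an arbitrary choice made only for this example.

```python
import numpy as np

# A point on the unit sphere S^2, a 2-dimensional manifold embedded in R^3.
X = np.array([1.0, 0.0, 0.0])

def curve(t, direction):
    """A smooth curve on the sphere with gamma(0) = X, obtained by moving X
    towards `direction` and renormalizing so the curve stays on the sphere."""
    Y = X + t * direction
    return Y / np.linalg.norm(Y)

def derivative_at_zero(gamma, h=1e-6):
    """Numerical version of (1): gamma'(0) ~ (gamma(h) - gamma(0)) / h."""
    return (gamma(h) - gamma(0.0)) / h

for d in (np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.3, -0.7])):
    v = derivative_at_zero(lambda t, d=d: curve(t, d))
    # Tangent vectors of the sphere at X are orthogonal to X.
    print(v, "inner product with X:", float(X @ v))
```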
As shown previously, the notion of tangent space generalizes the notion of directional derivative. However, to optimize functions, one needs the notions of directions and lengths, which can be achieved by endowing each tangent space $\mathcal{T}_X\mathcal{M}$ with a bilinear, symmetric positive-definite form $\langle \cdot, \cdot \rangle_X$, i.e., an inner product. Let $g: \mathcal{T}\mathcal{M} \times \mathcal{T}\mathcal{M} \to \mathbb{R}$ be a smoothly varying bilinear form such that its restriction to each tangent space is the previously defined inner product. In other words:
$$g(\xi_X, \eta_X) = \langle \xi_X, \eta_X \rangle_X,\ \forall\ \xi_X, \eta_X \in \mathcal{T}_X\mathcal{M}. \qquad (3)$$
Such a metric, known as the Riemannian metric, turns the manifold into a Riemannian manifold. Any manifold (in this paper) admits at least one Riemannian metric. Lengths of tangent vectors are naturally induced from the inner product. The norm on the tangent space $\mathcal{T}_X\mathcal{M}$ is denoted by $||\cdot||_X$ and defined by:
$$||\xi_X||_X = \sqrt{\langle \xi_X, \xi_X \rangle_X},\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}. \qquad (4)$$
Both the ambient space and the tangent space being vector spaces, one can define the orthogonal projection $\Pi_X: \mathcal{E} \to \mathcal{T}_X\mathcal{M}$ verifying $\Pi_X \circ \Pi_X = \Pi_X$. The projection is said to be orthogonal with respect to the restriction of the Riemannian metric to the tangent space, i.e., $\Pi_X$ is orthogonal in the $\langle \cdot, \cdot \rangle_X$ sense.
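The projection $\Pi_X$ is easy to visualize on the unit sphere, where it amounts to removing the component along $X$. The short sketch below is an added illustration under that simplifying choice; the manifolds studied in this paper use projections that are orthogonal in the Fisher metric sense, which differ from this Euclidean example. It checks the idempotence $\Pi_X \circ \Pi_X = \Pi_X$ and the orthogonality of the residual.

```python
import numpy as np

# Orthogonal projection onto the tangent space of the unit sphere at x:
# Pi_x(v) = v - <x, v> x, with the ambient space E = R^3 as a stand-in.
x = np.array([0.0, 0.6, 0.8])                 # a unit-norm point
proj = lambda v: v - (x @ v) * x

v = np.array([1.0, -2.0, 0.5])
p = proj(v)

print(proj(p) - p)        # idempotence: Pi_x(Pi_x(v)) = Pi_x(v), so this is ~0
print(x @ p)              # Pi_x(v) lies in the tangent space: orthogonal to x
print((v - p) @ p)        # the removed component is orthogonal to the projection
```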
B. First and Second Order Algorithms
The general idea behind unconstrained Euclidean numerical optimization methods is to start with an initial point $X_0$ and to iteratively update it according to certain predefined rules in order to obtain a sequence $\{X_t\}$ which converges to a local minimizer of the objective function. A typical update strategy is the following:
$$X_{t+1} = X_t + \alpha_t p_t, \qquad (5)$$
where $\alpha_t$ is the step size and $p_t$ the search direction. Let $\text{Grad } f(X)$ be the Euclidean gradient$^2$ of the objective function, defined as the unique vector satisfying:
$$\langle \text{Grad } f(X), \xi \rangle = \text{D}f(X)[\xi],\ \forall\ \xi \in \mathcal{E}, \qquad (6)$$
where $\langle \cdot, \cdot \rangle$ is the inner product on the vector space $\mathcal{E}$ and $\text{D}f(X)[\xi]$ is the directional derivative of $f$ given by:
$$\text{D}f(X)[\xi] = \lim_{t \to 0} \frac{f(X + t\xi) - f(X)}{t}. \qquad (7)$$
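As a quick numerical illustration of definitions (6) and (7) (an added sketch, not taken from the paper), the following snippet compares the inner product $\langle \text{Grad } f(X), \xi \rangle$ with a finite-difference estimate of the directional derivative for a simple quadratic objective; the objective and dimensions are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))

def f(X):
    # A smooth objective on the embedding space E = R^{4x3}.
    return 0.5 * np.linalg.norm(X - B, "fro") ** 2

def egrad(X):
    # Euclidean gradient Grad f(X) of the objective above.
    return X - B

X  = rng.standard_normal((4, 3))
xi = rng.standard_normal((4, 3))

t = 1e-6
directional = (f(X + t * xi) - f(X)) / t      # Df(X)[xi], cf. (7)
inner       = np.sum(egrad(X) * xi)           # <Grad f(X), xi>, cf. (6)
print(directional, inner)                     # the two values agree up to O(t)
```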
In order to obtain a descent direction, i.e., $f(X_{t+1}) < f(X_t)$ for a small enough step size $\alpha_t$, the search direction $p_t$ is chosen in the half-space of directions that have a negative inner product with $\text{Grad } f(X_t)$. In other words, the following expression holds:
$$\langle \text{Grad } f(X_t), p_t \rangle < 0. \qquad (8)$$
In particular, the choices of the search direction satisfying
$$p_t = -\frac{\text{Grad } f(X_t)}{||\text{Grad } f(X_t)||} \qquad (9)$$
$$\text{Hess } f(X_t)[p_t] = -\text{Grad } f(X_t) \qquad (10)$$
yield the celebrated steepest descent (9) and Newton's method (10), wherein $\text{Hess } f(X)[\xi]$ is the Euclidean Hessian$^3$ of $f$ at $X$, defined as an operator from $\mathcal{E}$ to $\mathcal{E}$ satisfying:
1) $\langle \text{Hess } f(X)[\xi], \xi \rangle = \text{D}^2 f(X)[\xi, \xi] = \text{D}(\text{D}f(X)[\xi])[\xi]$,
2) $\langle \text{Hess } f(X)[\xi], \eta \rangle = \langle \xi, \text{Hess } f(X)[\eta] \rangle,\ \forall\ \xi, \eta \in \mathcal{E}$.
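The operator view of the Euclidean Hessian can also be checked numerically. The added sketch below uses the non-quadratic test function $f(x) = \frac{1}{4}(x^T x)^2$, whose Hessian operator is $\text{Hess } f(x)[\xi] = (x^T x)\xi + 2(x^T \xi)x$, and verifies properties 1) and 2) above; the test function is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x, xi, eta = rng.standard_normal((3, 4))      # a point and two directions in E = R^4

f = lambda x: 0.25 * (x @ x) ** 2             # smooth, non-quadratic test function
hess = lambda x, v: (x @ x) * v + 2.0 * (x @ v) * x   # Hess f(x)[v] as an operator E -> E

# Property 1: <Hess f(x)[xi], xi> equals the second directional derivative D^2 f(x)[xi, xi].
t = 1e-4
second_diff = (f(x + t * xi) - 2.0 * f(x) + f(x - t * xi)) / t ** 2
print(hess(x, xi) @ xi, second_diff)

# Property 2: the operator is self-adjoint, <Hess f(x)[xi], eta> = <xi, Hess f(x)[eta]>.
print(hess(x, xi) @ eta, xi @ hess(x, eta))
```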
After choosing the search direction, the step size $\alpha_t$ is chosen so as to satisfy the Wolfe conditions for some constants $c_1 \in (0, 1)$ and $c_2 \in (c_1, 1)$, i.e.,
1) The Armijo condition:
$$f(X_t + \alpha_t p_t) \leq f(X_t) + c_1 \alpha_t \langle \text{Grad } f(X_t), p_t \rangle. \qquad (11)$$
2) The curvature condition:
$$\langle \text{Grad } f(X_t + \alpha_t p_t), p_t \rangle \geq c_2 \langle \text{Grad } f(X_t), p_t \rangle. \qquad (12)$$
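For concreteness, here is a minimal Euclidean sketch (added; not the authors' implementation) of the update rule (5) with the normalized steepest-descent direction (9) and a backtracking step size that enforces the Armijo condition (11). The quadratic test problem at the end is an arbitrary example whose exact minimizer is the solution of the linear system $Ax = b$.

```python
import numpy as np

def armijo_backtracking(f, egrad, X, p, alpha0=1.0, c1=1e-4, rho=0.5):
    """Shrink the step size until the Armijo condition (11) holds."""
    alpha = alpha0
    slope = np.sum(egrad(X) * p)      # <Grad f(X_t), p_t>, negative for a descent direction
    while f(X + alpha * p) > f(X) + c1 * alpha * slope:
        alpha *= rho
    return alpha

def steepest_descent(f, egrad, X0, tol=1e-8, max_iter=500):
    """Euclidean update (5) with the normalized steepest-descent direction (9)."""
    X = X0.copy()
    for _ in range(max_iter):
        g = egrad(X)
        if np.linalg.norm(g) < tol:
            break
        p = -g / np.linalg.norm(g)    # search direction (9)
        alpha = armijo_backtracking(f, egrad, X, p)
        X = X + alpha * p             # update rule (5)
    return X

# Example: a strictly convex quadratic with a known minimizer.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
egrad = lambda x: A @ x - b
print(steepest_descent(f, egrad, np.zeros(2)))    # approaches np.linalg.solve(A, b)
```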
The Riemannian version of the steepest descent, called the line-search algorithm, follows a similar logic as the Euclidean one. The search direction is obtained with respect to the Riemannian gradient, which is defined in a similar manner as the Euclidean one with the exception that it uses the Riemannian geometry, i.e.:
Definition 1. The Riemannian gradient of $f$ at $X$, denoted by $\text{grad } f(X)$, of a manifold $\mathcal{M}$ is defined as the unique vector in $\mathcal{T}_X\mathcal{M}$ that satisfies:
$$\langle \text{grad } f(X), \xi_X \rangle_X = \text{D}f(X)[\xi_X],\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}. \qquad (14)$$
$^2$The expression of the Euclidean gradient (denoted by Grad) is explicitly given to show the analogy with the Riemannian gradient (denoted by grad). The nabla symbol $\nabla$ is not used in the context of the gradient as it is reserved for the Riemannian connection. Similar notations are used for the Hessian.
$^3$The Euclidean Hessian is seen as an operator to show the connection with the Riemannian Hessian. One can show that the proposed definition matches the "usual" second order derivative matrix for $\xi = \mathbf{I}$.

Algorithm 1 Line-Search Method on Riemannian Manifold
Require: Manifold $\mathcal{M}$, function $f$, and retraction $R$.
1: Initialize $X \in \mathcal{M}$.
2: while $||\text{grad } f(X)||_X \geq \epsilon$ do
3: Choose search direction $\xi_X \in \mathcal{T}_X\mathcal{M}$ such that:
$$\langle \text{grad } f(X), \xi_X \rangle_X < 0. \qquad (13)$$
4: Compute Armijo step size $\alpha$.
5: Retract $X = R_X(\alpha \xi_X)$.
6: end while
7: Output $X$.
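A minimal instantiation of Algorithm 1 is sketched below as an added illustration; it is not the solver proposed in this paper. It uses the unit sphere with the metric induced from the embedding space as a simple stand-in manifold, the projection of the Euclidean gradient as the Riemannian gradient, normalization as the retraction, and a backtracking Armijo step. The objective $f(x) = x^T A x$ is an arbitrary choice whose minimizer on the sphere is the eigenvector associated with the smallest eigenvalue of $A$.

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])                          # f(x) = x^T A x on the unit sphere

f       = lambda x: x @ A @ x
egrad   = lambda x: 2.0 * A @ x
proj    = lambda x, v: v - (x @ v) * x                # projection onto the tangent space at x
rgrad   = lambda x: proj(x, egrad(x))                 # Riemannian gradient
retract = lambda x, v: (x + v) / np.linalg.norm(x + v)  # a first-order retraction

x = np.ones(3) / np.sqrt(3.0)                         # 1: initialize X on the manifold
for _ in range(200):                                  # 2: loop until the gradient vanishes
    g = rgrad(x)
    if np.linalg.norm(g) <= 1e-10:
        break
    xi = -g                                           # 3: descent direction, <g, xi>_x < 0, cf. (13)
    alpha = 1.0                                       # 4: backtracking Armijo step size
    while f(retract(x, alpha * xi)) > f(x) - 1e-4 * alpha * (g @ g):
        alpha *= 0.5
    x = retract(x, alpha * xi)                        # 5: retract back to the manifold
print(x)   # converges to an eigenvector of the smallest eigenvalue (here +/- e_1)
```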
After choosing the search direction as mandated by (8), the step size is selected according to Wolfe's conditions similar to the ones in (11) and (12). A more general definition of a descent direction, known as a gradient-related sequence, and the Riemannian Armijo step expression can be found in [13].
While the update step $X_{t+1} = X_t + \alpha_t p_t$ is trivial in Euclidean optimization thanks to its vector space structure, it might result in a point $X_{t+1}$ outside of the manifold. Moving in a given direction of a tangent space while staying on the manifold is realized by the concept of retraction. The ideal retraction is the exponential map $\text{Exp}_X$, as it maps a tangent vector $\xi_X \in \mathcal{T}_X\mathcal{M}$ to a point along the geodesic curve (straight line on the manifold) that goes through $X$ in the direction of $\xi_X$. However, computing the geodesic curves is challenging and may be more difficult than the original optimization problem. Luckily, one can use a first-order retraction (called simply a retraction in this paper) without compromising the convergence properties of the algorithms. A first-order retraction is defined as follows:
Definition 2. A retraction on a manifold $\mathcal{M}$ is a smooth mapping $R$ from the tangent bundle $\mathcal{T}\mathcal{M}$ onto $\mathcal{M}$. For all $X \in \mathcal{M}$, the restriction of $R$ to $\mathcal{T}_X\mathcal{M}$, called $R_X$, satisfies the following properties:
• Centering: $R_X(0) = X$.
• Local rigidity: The curve $\gamma_{\xi_X}(\tau) = R_X(\tau \xi_X)$ satisfies $\dot{\gamma}_{\xi_X}(\tau)\big|_{\tau=0} = \xi_X,\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}$.
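To illustrate Definition 2 (an added sketch under the same sphere stand-in as above), the following compares the exponential map of the sphere with the simple normalization retraction and numerically checks the centering and local rigidity properties; both maps agree to first order but differ for larger steps.

```python
import numpy as np

x  = np.array([0.0, 0.0, 1.0])                    # point on the unit sphere
xi = np.array([0.3, -0.2, 0.0])                   # tangent vector: <x, xi> = 0

def retract(x, v):
    """First-order retraction on the sphere: project x + v back onto the sphere."""
    return (x + v) / np.linalg.norm(x + v)

def expmap(x, v):
    """Exponential map of the sphere: move along the geodesic (great circle)."""
    nv = np.linalg.norm(v)
    return x if nv == 0.0 else np.cos(nv) * x + np.sin(nv) * v / nv

# Centering: R_x(0) = x.
print(retract(x, np.zeros(3)))

# Local rigidity: the derivative of tau -> R_x(tau * xi) at tau = 0 equals xi.
tau = 1e-6
print((retract(x, tau * xi) - x) / tau)           # ~ xi
print((expmap(x, tau * xi) - x) / tau)            # ~ xi as well

# For larger steps the retraction leaves the geodesic followed by the exponential map.
print(retract(x, xi), expmap(x, xi))
```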
For some predefined Armijo step size, the procedure above is guaranteed to converge for all retractions [13]. The generalization of the steepest descent to Riemannian manifolds is obtained by finding the search direction that satisfies an equation similar to the Euclidean one (9), using the Riemannian gradient. The update is then retracted to the manifold. The steps of the line-search method are summarized in Algorithm 1 and an illustration of an iteration of the algorithm is given in Figure 2.
Fig. 2. The update step for the two-dimensional sphere embedded in $\mathbb{R}^3$. The update direction $\xi_X^t$ and step length $\alpha_t$ are computed in the tangent space $\mathcal{T}_{X_t}\mathcal{M}$. The point $X_t + \alpha_t \xi_X^t$ lies outside the manifold and needs to be retracted to obtain the update $X_{t+1}$. The update is not located on the geodesic $\gamma(t)$ due to the use of a retraction instead of the exponential map.

Fig. 3. An illustration of a vector transport $\mathcal{T}$ on a two-dimensional manifold embedded in $\mathbb{R}^3$ that connects the tangent space of $X$, with tangent vector $\xi_X$, with the one of its retraction $R_X(\xi_X)$. A connection can be obtained from the speed at the origin of the inverse of the vector transport $\mathcal{T}^{-1}$.

Generalizing Newton's method to the Riemannian setting requires computing the Riemannian Hessian operator, which requires taking a directional derivative of a vector field. As the vector fields belong to different tangent spaces, one needs the notion of a connection, which generalizes the notion of the directional derivative of a vector field. The notion of connection is intimately related to the notion of vector transport, which allows moving from one tangent space to another, as shown in Figure 3. The definition of a connection is given below:
Definition 3. An affine connection $\nabla$ is a mapping from $\mathcal{T}\mathcal{M} \times \mathcal{T}\mathcal{M}$ to $\mathcal{T}\mathcal{M}$ that associates to each $(\eta, \xi)$ the tangent vector $\nabla_\eta\, \xi$ satisfying, for all smooth $f, g: \mathcal{M} \to \mathbb{R}$ and $a, b \in \mathbb{R}$:
• $\nabla_{f(\eta)+g(\chi)}\, \xi = f(\nabla_\eta\, \xi) + g(\nabla_\chi\, \xi)$
• $\nabla_\eta\, (a\xi + b\varphi) = a\nabla_\eta\, \xi + b\nabla_\eta\, \varphi$
• $\nabla_\eta\, (f(\xi)) = \eta(f)\, \xi + f(\nabla_\eta\, \xi)$,
wherein a vector field $\xi$ acts on the function $f$ by derivation, i.e., $\xi(f) = \text{D}(f)[\xi]$, also noted $\xi f$ in the literature.
On a Riemannian manifold, the Levi-Civita connection is the canonical choice as it preserves the Riemannian metric. The connection is computed as follows:
Algorithm 2 Newton's Method on Riemannian Manifold
Require: Manifold $\mathcal{M}$, function $f$, retraction $R$, and affine connection $\nabla$.
1: Initialize $X \in \mathcal{M}$.
2: while $||\text{grad } f(X)||_X \geq \epsilon$ do
3: Find descent direction $\xi_X \in \mathcal{T}_X\mathcal{M}$ such that:
$$\text{hess } f(X)[\xi_X] = -\text{grad } f(X), \qquad (17)$$
wherein $\text{hess } f(X)[\xi_X] = \nabla_{\xi_X}\, \text{grad } f(X)$.
4: Retract $X = R_X(\xi_X)$.
5: end while
6: Output $X$.
Definition 4. The Levi-Civita connection is the unique affine connection on $\mathcal{M}$ with the Riemannian metric $\langle \cdot, \cdot \rangle$ that satisfies, for all $\eta, \xi, \chi \in \mathcal{T}\mathcal{M}$:
1) $\nabla_\eta\, \xi - \nabla_\xi\, \eta = [\eta, \xi]$,
2) $\chi\langle \eta, \xi \rangle = \langle \nabla_\chi\, \eta, \xi \rangle + \langle \eta, \nabla_\chi\, \xi \rangle$,
where $[\xi, \eta]$ is the Lie bracket, i.e., a function from the set of smooth functions to itself defined by $[\xi, \eta]g = \xi(\eta(g)) - \eta(\xi(g))$.
For the manifolds of interest in this paper, the Lie bracket can be written as the implicit directional differentiation $[\xi, \eta] = \text{D}(\eta)[\xi] - \text{D}(\xi)[\eta]$. The expression of the Levi-Civita connection can be computed using the Koszul formula:
$$2\langle \nabla_\chi\, \eta, \xi \rangle = \chi\langle \eta, \xi \rangle + \eta\langle \xi, \chi \rangle - \xi\langle \chi, \eta \rangle - \langle \chi, [\eta, \xi] \rangle + \langle \eta, [\xi, \chi] \rangle + \langle \xi, [\chi, \eta] \rangle. \qquad (15)$$
Note that connections, and particularly the Levi-Civita connection, are defined for all vector fields on $\mathcal{M}$. However, for the purpose of this paper, only the tangent bundle is of interest. With the above notion of connection, the Riemannian Hessian can be written as:
Definition 5. The Riemannian Hessian of $f$ at $X$, denoted by $\text{hess } f(X)$, of a manifold $\mathcal{M}$ is a mapping from $\mathcal{T}_X\mathcal{M}$ into itself defined by:
$$\text{hess } f(X)[\xi_X] = \nabla_{\xi_X}\, \text{grad } f(X),\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}, \qquad (16)$$
where $\text{grad } f(X)$ is the Riemannian gradient and $\nabla$ is the Riemannian connection on $\mathcal{M}$.
It can readily be verified that the Riemannian Hessian satisfies a similar property as the Euclidean one, i.e., for all $\xi_X, \eta_X \in \mathcal{T}_X\mathcal{M}$, we have:
$$\langle \text{hess } f(X)[\xi_X], \eta_X \rangle_X = \langle \xi_X, \text{hess } f(X)[\eta_X] \rangle_X.$$
Remark 1. The names Riemannian gradient and Hessian are due to the fact that the function $f$ can be approximated in a neighborhood of $X$ by the following:
$$f(X + \delta X) = f(X) + \langle \text{grad } f(X), \text{Exp}_X^{-1}(\delta X) \rangle_X + \frac{1}{2} \langle \text{hess } f(X)[\text{Exp}_X^{-1}(\delta X)], \text{Exp}_X^{-1}(\delta X) \rangle_X. \qquad (18)$$
Using the above definitions, the generalization of Newton's method to Riemannian optimization is done by replacing both the Euclidean gradient and Hessian by their Riemannian counterparts in (10). Hence, the search direction is the tangent vector $\xi_X$ that satisfies $\text{hess } f(X)[\xi_X] = -\text{grad } f(X)$. The update is found by retracting the tangent vector to the manifold. The steps of the algorithm are illustrated in Algorithm 2.
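The following added sketch instantiates Algorithm 2 on the unit sphere with the induced metric, again as a stand-in for the manifolds of this paper. It assumes the standard closed form of the Riemannian Hessian of $f(x) = x^T A x$ on the embedded sphere, $\text{hess } f(x)[\xi] = \Pi_x(2A\xi) - (x^T \text{Grad } f(x))\,\xi$, and solves the Newton system (17) restricted to the tangent space. A least-squares solver is used because the Hessian, extended to the ambient space, is singular along the normal direction; its minimum-norm solution already lies in the tangent space.

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])                     # f(x) = x^T A x on the unit sphere

def proj(x):                                     # matrix of the projection onto the tangent space at x
    return np.eye(len(x)) - np.outer(x, x)

def rgrad(x):                                    # Riemannian gradient: P_x(Grad f(x))
    return proj(x) @ (2.0 * A @ x)

def rhess_mat(x):
    # Riemannian Hessian of f on the sphere as a matrix acting on the tangent space
    # (assumed closed form for this stand-in example): P_x (2A) P_x - (x^T Grad f(x)) P_x.
    P = proj(x)
    return P @ (2.0 * A) @ P - (x @ (2.0 * A @ x)) * P

retract = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = np.array([1.0, 0.3, 0.2]); x /= np.linalg.norm(x)
for _ in range(10):                              # Algorithm 2
    g = rgrad(x)
    if np.linalg.norm(g) <= 1e-12:
        break
    # Newton system (17): hess f(x)[xi] = -grad f(x), solved on the tangent space.
    xi, *_ = np.linalg.lstsq(rhess_mat(x), -g, rcond=None)
    xi = proj(x) @ xi                            # keep the direction tangent
    x = retract(x, xi)                           # step 4: retract
print(x)                                         # converges to an eigenvector of A
```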
C. Problems of Interest
As shown in the previous section, computing the Riemannian gradient and Hessian for a given function over some manifold $\mathcal{M}$ allows the design of efficient algorithms that exploit the geometrical structure of the problem. The paper's main contribution is to propose a framework for solving a subset of convex programs, including those in which the optimization variable represents a doubly stochastic and possibly symmetric and/or definite multidimensional probability distribution function.
In particular, the paper derives the relationship between the Euclidean gradient and Hessian and their Riemannian counterparts for the manifolds of doubly stochastic matrices, symmetric stochastic matrices, and symmetric positive stochastic matrices. In other words, for a convex function $f: \mathbb{R}^{n \times m} \to \mathbb{R}$, the paper proposes solving the following problem:
$$\min\ f(X) \qquad (19a)$$
$$\text{s.t.}\ X_{ij} > 0,\ 1 \leq i \leq n,\ 1 \leq j \leq m, \qquad (19b)$$
$$\sum_{j=1}^{m} X_{ij} = 1,\ 1 \leq i \leq n, \qquad (19c)$$
$$\sum_{i=1}^{n} X_{ij} = 1,\ 1 \leq j \leq m, \qquad (19d)$$
$$X = X^T, \qquad (19e)$$
$$X \succ 0, \qquad (19f)$$
wherein constraints (19b)-(19c) produce a stochastic matrix, (19b)-(19d) a doubly stochastic one, (19b)-(19e) a symmetric stochastic one, and (19b)-(19f) a definite symmetric matrix. While the first scenario is studied in [25], the next sections study each problem, respectively. Let $\mathbf{1}$ be the all-ones vector and define the multinomial, doubly stochastic multinomial, symmetric multinomial, and definite multinomial, respectively, as follows:
$$\mathbb{P}_n^m = \left\{ X \in \mathbb{R}^{n \times m} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1} \right\}$$
$$\mathbb{DP}_n = \left\{ X \in \mathbb{R}^{n \times n} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1},\ X^T\mathbf{1} = \mathbf{1} \right\}$$
$$\mathbb{SP}_n = \left\{ X \in \mathbb{R}^{n \times n} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1},\ X = X^T \right\}$$
$$\mathbb{SP}_n^+ = \left\{ X \in \mathbb{R}^{n \times n} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1},\ X = X^T,\ X \succ 0 \right\}$$
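For intuition about the set $\mathbb{DP}_n$, the added sketch below uses the classical Sinkhorn-Knopp scaling to map an arbitrary positive matrix to a nearby doubly stochastic one. This is a standard tool for doubly stochastic matrices and is shown only as an illustration; it is not the retraction or the algorithm proposed in this paper.

```python
import numpy as np

def sinkhorn_knopp(M, n_iter=1000, tol=1e-12):
    """Alternately normalize rows and columns of a positive matrix until it is
    (numerically) doubly stochastic, i.e., an element of DP_n."""
    X = np.asarray(M, dtype=float).copy()
    for _ in range(n_iter):
        X /= X.sum(axis=1, keepdims=True)        # enforce X 1 = 1
        X /= X.sum(axis=0, keepdims=True)        # enforce X^T 1 = 1
        if (np.abs(X.sum(axis=1) - 1.0).max() < tol and
                np.abs(X.sum(axis=0) - 1.0).max() < tol):
            break
    return X

rng = np.random.default_rng(1)
X = sinkhorn_knopp(rng.uniform(0.1, 1.0, size=(4, 4)))
print(X.sum(axis=1), X.sum(axis=0))              # both ~ the all-ones vector
```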
For all the above manifolds, the paper uses the Fisher information as the Riemannian metric $g$, whose restriction to $\mathcal{T}_X\mathcal{M}$ is defined by:
$$g(\xi_X, \eta_X) = \langle \xi_X, \eta_X \rangle_X = \text{Tr}\big((\xi_X \oslash X)(\eta_X)^T\big) = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(\xi_X)_{ij} (\eta_X)_{ij}}{X_{ij}},\ \ \forall\ \xi_X, \eta_X \in \mathcal{T}_X\mathcal{M}, \qquad (20)$$
wherein $\oslash$ denotes the element-wise division.
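The restriction (20) is straightforward to evaluate numerically. The added sketch below builds a point of the multinomial $\mathbb{P}_n^m$, generates tangent vectors (differentiating the constraint $X\mathbf{1} = \mathbf{1}$ shows that the rows of a tangent vector sum to zero), and computes the Fisher inner product together with the induced norm (4).

```python
import numpy as np

rng = np.random.default_rng(2)

# A point on the multinomial manifold P_n^m: positive entries, rows summing to one.
X = rng.uniform(0.1, 1.0, size=(3, 4))
X /= X.sum(axis=1, keepdims=True)

def tangent(Z):
    # Tangent vectors at X satisfy xi 1 = 0, i.e., each row sums to zero.
    return Z - Z.mean(axis=1, keepdims=True)

def fisher_inner(X, xi, eta):
    # Fisher information metric (20): sum_ij xi_ij * eta_ij / X_ij.
    return np.sum(xi * eta / X)

xi  = tangent(rng.standard_normal((3, 4)))
eta = tangent(rng.standard_normal((3, 4)))
print(fisher_inner(X, xi, eta))                  # <xi, eta>_X
print(np.sqrt(fisher_inner(X, xi, xi)))          # ||xi||_X, cf. (4)
```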
Endowing the multinomial with the Fisher information as the Riemannian metric gives the manifold a differential structure that is invariant over the choice of coordinate system. More information about the Fisher information metric and its use in information geometry can be found in [30]. Using the
