arXiv:1802.02628v1 [math.OC] 7 Feb 2018
Manifold Optimization Over the Set of Doubly
Stochastic Matrices: A Second-Order Geometry
Ahmed Douik, Student Member, IEEE and Babak Hassibi, Member, IEEE
Abstract—Convex optimization is a well-established research area with applications in almost all fields. Over the decades, multiple approaches have been proposed to solve convex programs. The development of interior-point methods allowed solving a more general set of convex programs known as semi-definite programs and second-order cone programs. However, it has been established that these methods are excessively slow for high dimensions, i.e., they suffer from the curse of dimensionality. On the other hand, optimization algorithms on manifolds have shown great ability in finding solutions to non-convex problems in reasonable time. This paper is interested in solving a subset of convex optimization problems using a different approach. The main idea behind Riemannian optimization is to view the constrained optimization problem as an unconstrained one over a restricted search space. The paper introduces three manifolds to solve convex programs under particular box constraints. The manifolds, called the doubly stochastic, symmetric, and definite multinomial manifolds, generalize the simplex, also known as the multinomial manifold. The proposed manifolds and algorithms are well-adapted to solving convex programs in which the variable of interest is a multidimensional probability distribution function. Theoretical analysis and simulation results testify to the efficiency of the proposed method over state-of-the-art methods. In particular, they reveal that the proposed framework outperforms conventional generic and specialized solvers, especially in high dimensions.
Index Terms—Riemannian manifolds, symmetric doubly
stochastic matrices, positive matrices, convex optimization.
I. INTRODUCTION
Numerical optimization is the foundation of various engineering and computational sciences. Consider a mapping $f$ from a subset $\mathcal{D}$ of $\mathbb{R}^n$ to $\mathbb{R}$. The goal of optimization algorithms is to find an extreme point $x^* \in \mathcal{D}$ such that $f(x^*) \leq f(y)$ for all points $y \in \mathcal{N}_{x^*}$ in the neighborhood of $x^*$. Unconstrained Euclidean$^1$ optimization refers to the setup in which the domain of the objective function is the whole space, i.e., $\mathcal{D} = \mathbb{R}^n$. On the other hand, constrained Euclidean optimization denotes optimization problems in which the search set is constrained, i.e., $\mathcal{D} \subsetneq \mathbb{R}^n$.
Convex optimization is a special case of constrained optimization problems in which both the objective function and the search set are convex. Historically initiated with the study of least-squares and linear programming problems, convex optimization plays a crucial role in optimization algorithms thanks to the desirable convergence properties it exhibits. The development of interior-point methods allowed solving a more general set of convex programs known as semi-definite programs and second-order cone programs. A summary of convex optimization methods and performance analysis can be found in the seminal book [1].
Ahmed Douik and Babak Hassibi are with the Department of Electrical
Engineering, California Institute of Technology, Pasadena, CA 91125 USA
(e-mail: {ahmed.douik,hassibi}@caltech.edu).
$^1$The traditional optimization schemes are identified with the word Euclidean in contrast with the Riemannian algorithms in the rest of the paper.
Another important property of convex optimization is that the interior of the search space can be identified with a manifold that is embedded in a higher-dimensional Euclidean space. Using advanced tools to solve the constrained optimization, e.g., [2], requires solving in the high-dimensional space, which can be excessively slow. Riemannian optimization takes advantage of the fact that the manifold is of lower dimension and exploits its underlying geometric structure. The optimization problem is reformulated from a constrained Euclidean optimization into an unconstrained optimization over a restricted search space, a.k.a., a Riemannian manifold. Thanks to the aforementioned low-dimension feature, optimization over Riemannian manifolds is expected to perform more efficiently [3]. Therefore, a large body of literature is dedicated to adapting traditional Euclidean optimization methods and their convergence properties to Riemannian manifolds.
This paper introduces a framework for solving optimization problems in which the optimization variable is a doubly stochastic matrix. Such a framework is particularly interesting for clustering applications. In such problems, e.g., [4]–[7], one wishes to recover the structure of a graph given a similarity matrix. The recovery is performed by minimizing a predefined cost function under the constraint that the optimization variable is a doubly stochastic matrix. This work provides a unified framework to carry out such optimization.
A. State of the Art
Optimization algorithms on Riemannian manifolds appeared in the optimization literature as early as the 1970s with the work of Luenberger [8], wherein the standard Newton's optimization method was adapted to problems on manifolds. A decade later, Gabay [9] introduced the steepest descent and the quasi-Newton algorithm on embedded submanifolds of $\mathbb{R}^n$. The work investigates the global and local convergence properties of both the steepest descent and Newton's method. The analysis of the steepest descent and the Newton algorithm is extended in [10], [11] to Riemannian manifolds. By using exact line search, the authors concluded the convergence of their proposed algorithms. The assumption is relaxed in [12], wherein the author provides convergence rates and guarantees for the steepest descent and Newton's method for Armijo step-size control.
The above-mentioned works substitute the concept of the line search in Euclidean algorithms by searching along a geodesic, which generalizes the idea of a straight line. While the method is natural and intuitive, it might not be practical. Indeed, finding the expression of the geodesic requires computing the exponential map, which may be as complicated as solving the original optimization problem [13]. To overcome the problem, the authors in [14] suggest approximating the exponential map up to a given order, called a retraction, and show quadratic convergence for Newton's method under such a setup. The work initiated more sophisticated optimization algorithms such as the trust region methods [3], [15]–[18]. Analyses of the convergence of first and second order methods on Riemannian manifolds, e.g., gradient and conjugate gradient descent, Newton's method, and trust region methods, using general retractions are summarized in [13].
Thanks to the theoretical convergence guarantees mentioned above, optimization algorithms on Riemannian manifolds are gradually gaining momentum in the optimization field [3]. Several successful algorithms have been proposed to solve non-convex problems, e.g., low-rank matrix completion [19]–[21], online learning [22], clustering [23], [24], and tensor decomposition [25]. It is worth mentioning that these works modify the optimization algorithm by using a general connection instead of the genuine parallel vector transport to move from one tangent space to another while computing the (approximate) Hessian. Such an approach conserves the global convergence of the quasi-Newton scheme but no longer ensures its superlinear convergence behavior [26].
Despite the advantages cited above, the use of optimization algorithms on manifolds is relatively limited. This is mainly due to the lack of a systematic mechanism to turn a constrained optimization problem into an optimization over a manifold, provided that the search space forms a manifold, e.g., convex optimization. Such a reformulation, usually requiring some level of understanding of differential geometry and Riemannian manifolds, is prohibitively complex for regular use. This paper addresses the problem by introducing new manifolds that allow solving a non-negligible class of optimization problems in which the variable of interest can be identified with a multidimensional probability distribution function.
B. Contributions
In [25], in the context of tensor decomposition, the authors propose a framework to optimize functions in which the variables are stochastic matrices. This paper proposes extending those results to a more general class of manifolds by proposing a framework for solving a subset of convex programs, including those in which the optimization variable represents a doubly stochastic and possibly symmetric and/or definite multidimensional probability distribution function. To this end, the paper introduces three manifolds which generalize the multinomial manifold. While the multinomial manifold allows representing only stochastic matrices, the proposed ones characterize doubly stochastic, symmetric, and definite arrays, respectively. Therefore, the proposed framework allows solving a subset of convex programs. To the best of the authors' knowledge, the proposed manifolds have not been introduced or studied in the literature.
The first part of the manuscript introduces all relevant concepts of Riemannian geometry and provides insights into the optimization algorithms on such manifolds. In an effort to make the content of this document accessible to a larger audience, it does not assume any prerequisites in differential geometry. As a result, the definitions, concepts, and results in this paper are tailored to the manifolds of interest and may not be applicable to abstract manifolds.
The paper investigates the first and second order Riemannian geometry of the proposed manifolds endowed with the Fisher information metric, which guarantees that the manifolds have a differentiable structure. For each manifold, the tangent space, Riemannian gradient, Hessian, and retraction are derived. With the aforementioned expressions, the manuscript formulates first and second order optimization algorithms and characterizes their complexity. Simulation results are provided to further illustrate the efficiency of the proposed method against state-of-the-art algorithms.
The rest of the manuscript is organized as follows: Section II introduces the optimization algorithms on manifolds and lists the problems of interest in this paper. In Section III, the doubly stochastic manifold is introduced and its first and second order geometry is derived. Section IV iterates a similar study for a particular case of doubly stochastic matrices known as the symmetric manifold. The study is extended to the definite symmetric manifold in Section V. Section VI suggests first and second order algorithms and analyzes their complexity. Finally, before concluding in Section VIII, the simulation results are plotted and discussed in Section VII.
II. OPTIMIZATION ON RIEMANNIAN MANIFOLDS
This section introduces the numerical optimization methods on smooth matrix manifolds. The first part introduces the Riemannian manifold notations and operations. The second part extends the first and second order Euclidean optimization algorithms to Riemannian manifolds and introduces the necessary machinery. Finally, the problems of interest in this paper are provided and the different manifolds are identified.
A. Manifold Notation and Operations
The study of optimization algorithms on smooth manifolds has attracted significant attention in recent years. However, such studies require some level of knowledge of differential geometry. In this paper, only smooth embedded matrix manifolds are considered. Hence, the definitions and theorems may not apply to abstract manifolds. In addition, the authors opted for a coordinate-free analysis, omitting the chart and the differentiable structure of the manifold. For an introduction to differential geometry, abstract manifolds, and Riemannian manifolds, we refer the readers to the references [27]–[29], respectively.
An embedded matrix manifold $\mathcal{M}$ is a smooth subset of a vector space $\mathcal{E}$ included in the set of matrices $\mathbb{R}^{n \times m}$. The set $\mathcal{E}$ is called the ambient or the embedding space. By smooth subset, we mean that $\mathcal{M}$ can be mapped by a bijective function, i.e., a chart, to an open subset of $\mathbb{R}^d$, where $d$ is called the dimension of the manifold. The dimension $d$ can be thought of as the degrees of freedom of the manifold. In particular, a vector space $\mathcal{E}$ is a manifold.
In the same line of thought as approximating a function locally by its derivative, a manifold $\mathcal{M}$ of dimension $d$ can be approximated locally at a point $X$ by a $d$-dimensional vector space $\mathcal{T}_X\mathcal{M}$ generated by taking derivatives of all smooth curves going through $X$. Formally, let $\gamma(t): \mathcal{I} \subset \mathbb{R} \to \mathcal{M}$ be a curve on $\mathcal{M}$ with $\gamma(0) = X$. Define the derivative of $\gamma(t)$ at zero as follows:
$$\gamma'(0) = \lim_{t \to 0} \frac{\gamma(t) - \gamma(0)}{t}. \qquad (1)$$
The space generated by all $\gamma'(0)$ represents a vector space $\mathcal{T}_X\mathcal{M}$ called the tangent space of $\mathcal{M}$ at $X$. Figure 1 shows an example of a two-dimensional tangent space generated by a couple of curves. The tangent space plays a primordial role in optimization algorithms over manifolds in the same way as the derivative of a function plays an important role in Euclidean optimization. The union of all tangent spaces $\mathcal{T}\mathcal{M}$ is referred to as the tangent bundle of $\mathcal{M}$, i.e.,
$$\mathcal{T}\mathcal{M} = \bigcup_{X \in \mathcal{M}} \mathcal{T}_X\mathcal{M}. \qquad (2)$$

Fig. 1. Tangent space of a 2-dimensional manifold embedded in $\mathbb{R}^3$. The tangent space $\mathcal{T}_X\mathcal{M}$ is computed by taking derivatives of the curves going through $X$ at the origin.
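As an added illustration (not part of the original paper), the following Python sketch differentiates a few curves on the unit sphere, used here as a simple stand-in manifold, and checks numerically that the resulting vectors $\gamma'(0)$ from (1) are orthogonal to $X$, i.e., that they lie in the tangent space of the sphere at $X$. The curve construction by renormalization is an arbitrary choice made only for this example.

```python
import numpy as np

# A point on the unit sphere S^2, a 2-dimensional manifold embedded in R^3.
X = np.array([1.0, 0.0, 0.0])

def curve(t, direction):
    """A smooth curve on the sphere with gamma(0) = X, obtained by moving X
    towards `direction` and renormalizing so the curve stays on the sphere."""
    Y = X + t * direction
    return Y / np.linalg.norm(Y)

def derivative_at_zero(gamma, h=1e-6):
    """Numerical version of (1): gamma'(0) ~ (gamma(h) - gamma(0)) / h."""
    return (gamma(h) - gamma(0.0)) / h

for d in (np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.3, -0.7])):
    v = derivative_at_zero(lambda t, d=d: curve(t, d))
    # Tangent vectors of the sphere at X are orthogonal to X.
    print(v, "inner product with X:", float(X @ v))
```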
As shown previously, the notion of tangent space generalizes the notion of directional derivative. However, to optimize functions, one needs the notions of directions and lengths, which can be achieved by endowing each tangent space $\mathcal{T}_X\mathcal{M}$ with a bilinear, symmetric positive-definite form $\langle \cdot, \cdot \rangle_X$, i.e., an inner product. Let $g: \mathcal{T}\mathcal{M} \times \mathcal{T}\mathcal{M} \to \mathbb{R}$ be a smoothly varying bilinear form such that its restriction to each tangent space is the previously defined inner product. In other words:
$$g(\xi_X, \eta_X) = \langle \xi_X, \eta_X \rangle_X,\ \forall\ \xi_X, \eta_X \in \mathcal{T}_X\mathcal{M}. \qquad (3)$$
Such a metric, known as the Riemannian metric, turns the manifold into a Riemannian manifold. Any manifold (in this paper) admits at least one Riemannian metric. Lengths of tangent vectors are naturally induced from the inner product. The norm on the tangent space $\mathcal{T}_X\mathcal{M}$ is denoted by $||\cdot||_X$ and defined by:
$$||\xi_X||_X = \sqrt{\langle \xi_X, \xi_X \rangle_X},\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}. \qquad (4)$$
Both the ambient space and the tangent space being vector spaces, one can define the orthogonal projection $\Pi_X: \mathcal{E} \to \mathcal{T}_X\mathcal{M}$ verifying $\Pi_X \circ \Pi_X = \Pi_X$. The projection is said to be orthogonal with respect to the restriction of the Riemannian metric to the tangent space, i.e., $\Pi_X$ is orthogonal in the $\langle \cdot, \cdot \rangle_X$ sense.
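The projection $\Pi_X$ is easy to visualize on the unit sphere, where it amounts to removing the component along $X$. The short sketch below is an added illustration under that simplifying choice; the manifolds studied in this paper use projections that are orthogonal in the Fisher metric sense, which differ from this Euclidean example. It checks the idempotence $\Pi_X \circ \Pi_X = \Pi_X$ and the orthogonality of the residual.

```python
import numpy as np

# Orthogonal projection onto the tangent space of the unit sphere at x:
# Pi_x(v) = v - <x, v> x, with the ambient space E = R^3 as a stand-in.
x = np.array([0.0, 0.6, 0.8])                 # a unit-norm point
proj = lambda v: v - (x @ v) * x

v = np.array([1.0, -2.0, 0.5])
p = proj(v)

print(proj(p) - p)        # idempotence: Pi_x(Pi_x(v)) = Pi_x(v), so this is ~0
print(x @ p)              # Pi_x(v) lies in the tangent space: orthogonal to x
print((v - p) @ p)        # the removed component is orthogonal to the projection
```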
B. First and Second Order Algorithms
The general idea behind unconstrained Euclidean numerical optimization methods is to start with an initial point $X_0$ and to iteratively update it according to certain predefined rules in order to obtain a sequence $\{X_t\}$ which converges to a local minimizer of the objective function. A typical update strategy is the following:
$$X_{t+1} = X_t + \alpha_t p_t, \qquad (5)$$
where $\alpha_t$ is the step size and $p_t$ the search direction. Let $\text{Grad } f(X)$ be the Euclidean gradient$^2$ of the objective function, defined as the unique vector satisfying:
$$\langle \text{Grad } f(X), \xi \rangle = \text{D}f(X)[\xi],\ \forall\ \xi \in \mathcal{E}, \qquad (6)$$
where $\langle \cdot, \cdot \rangle$ is the inner product on the vector space $\mathcal{E}$ and $\text{D}f(X)[\xi]$ is the directional derivative of $f$ given by:
$$\text{D}f(X)[\xi] = \lim_{t \to 0} \frac{f(X + t\xi) - f(X)}{t}. \qquad (7)$$
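As a quick numerical illustration of definitions (6) and (7) (an added sketch, not taken from the paper), the following snippet compares the inner product $\langle \text{Grad } f(X), \xi \rangle$ with a finite-difference estimate of the directional derivative for a simple quadratic objective; the objective and dimensions are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))

def f(X):
    # A smooth objective on the embedding space E = R^{4x3}.
    return 0.5 * np.linalg.norm(X - B, "fro") ** 2

def egrad(X):
    # Euclidean gradient Grad f(X) of the objective above.
    return X - B

X  = rng.standard_normal((4, 3))
xi = rng.standard_normal((4, 3))

t = 1e-6
directional = (f(X + t * xi) - f(X)) / t      # Df(X)[xi], cf. (7)
inner       = np.sum(egrad(X) * xi)           # <Grad f(X), xi>, cf. (6)
print(directional, inner)                     # the two values agree up to O(t)
```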
In order to obtain a descent direction, i.e., $f(X_{t+1}) < f(X_t)$ for a small enough step size $\alpha_t$, the search direction $p_t$ is chosen in the half-space of directions that have a negative inner product with $\text{Grad } f(X_t)$. In other words, the following expression holds:
$$\langle \text{Grad } f(X_t), p_t \rangle < 0. \qquad (8)$$
In particular, the choices of the search direction satisfying
$$p_t = -\frac{\text{Grad } f(X_t)}{||\text{Grad } f(X_t)||} \qquad (9)$$
$$\text{Hess } f(X_t)[p_t] = -\text{Grad } f(X_t) \qquad (10)$$
yield the celebrated steepest descent (9) and Newton's method (10), wherein $\text{Hess } f(X)[\xi]$ is the Euclidean Hessian$^3$ of $f$ at $X$, defined as an operator from $\mathcal{E}$ to $\mathcal{E}$ satisfying:
1) $\langle \text{Hess } f(X)[\xi], \xi \rangle = \text{D}^2 f(X)[\xi, \xi] = \text{D}(\text{D}f(X)[\xi])[\xi]$,
2) $\langle \text{Hess } f(X)[\xi], \eta \rangle = \langle \xi, \text{Hess } f(X)[\eta] \rangle,\ \forall\ \xi, \eta \in \mathcal{E}$.
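The operator view of the Euclidean Hessian can also be checked numerically. The added sketch below uses the non-quadratic test function $f(x) = \frac{1}{4}(x^T x)^2$, whose Hessian operator is $\text{Hess } f(x)[\xi] = (x^T x)\xi + 2(x^T \xi)x$, and verifies properties 1) and 2) above; the test function is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x, xi, eta = rng.standard_normal((3, 4))      # a point and two directions in E = R^4

f = lambda x: 0.25 * (x @ x) ** 2             # smooth, non-quadratic test function
hess = lambda x, v: (x @ x) * v + 2.0 * (x @ v) * x   # Hess f(x)[v] as an operator E -> E

# Property 1: <Hess f(x)[xi], xi> equals the second directional derivative D^2 f(x)[xi, xi].
t = 1e-4
second_diff = (f(x + t * xi) - 2.0 * f(x) + f(x - t * xi)) / t ** 2
print(hess(x, xi) @ xi, second_diff)

# Property 2: the operator is self-adjoint, <Hess f(x)[xi], eta> = <xi, Hess f(x)[eta]>.
print(hess(x, xi) @ eta, xi @ hess(x, eta))
```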
After choosing the search direction, the step size $\alpha_t$ is chosen so as to satisfy the Wolfe conditions for some constants $c_1 \in (0, 1)$ and $c_2 \in (c_1, 1)$, i.e.,
1) The Armijo condition:
$$f(X_t + \alpha_t p_t) \leq f(X_t) + c_1 \alpha_t \langle \text{Grad } f(X_t), p_t \rangle. \qquad (11)$$
2) The curvature condition:
$$\langle \text{Grad } f(X_t + \alpha_t p_t), p_t \rangle \geq c_2 \langle \text{Grad } f(X_t), p_t \rangle. \qquad (12)$$
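For concreteness, here is a minimal Euclidean sketch (added; not the authors' implementation) of the update rule (5) with the normalized steepest-descent direction (9) and a backtracking step size that enforces the Armijo condition (11). The quadratic test problem at the end is an arbitrary example whose exact minimizer is the solution of the linear system $Ax = b$.

```python
import numpy as np

def armijo_backtracking(f, egrad, X, p, alpha0=1.0, c1=1e-4, rho=0.5):
    """Shrink the step size until the Armijo condition (11) holds."""
    alpha = alpha0
    slope = np.sum(egrad(X) * p)      # <Grad f(X_t), p_t>, negative for a descent direction
    while f(X + alpha * p) > f(X) + c1 * alpha * slope:
        alpha *= rho
    return alpha

def steepest_descent(f, egrad, X0, tol=1e-8, max_iter=500):
    """Euclidean update (5) with the normalized steepest-descent direction (9)."""
    X = X0.copy()
    for _ in range(max_iter):
        g = egrad(X)
        if np.linalg.norm(g) < tol:
            break
        p = -g / np.linalg.norm(g)    # search direction (9)
        alpha = armijo_backtracking(f, egrad, X, p)
        X = X + alpha * p             # update rule (5)
    return X

# Example: a strictly convex quadratic with a known minimizer.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
egrad = lambda x: A @ x - b
print(steepest_descent(f, egrad, np.zeros(2)))    # approaches np.linalg.solve(A, b)
```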
The Riemannian version of the steepest descent, called the line-search algorithm, follows a similar logic as the Euclidean one. The search direction is obtained with respect to the Riemannian gradient, which is defined in a similar manner as the Euclidean one with the exception that it uses the Riemannian geometry, i.e.:
Definition 1. The Riemannian gradient of $f$ at $X$, denoted by $\text{grad } f(X)$, of a manifold $\mathcal{M}$ is defined as the unique vector in $\mathcal{T}_X\mathcal{M}$ that satisfies:
$$\langle \text{grad } f(X), \xi_X \rangle_X = \text{D}f(X)[\xi_X],\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}. \qquad (14)$$
$^2$The expression of the Euclidean gradient (denoted by Grad) is explicitly given to show the analogy with the Riemannian gradient (denoted by grad). The nabla symbol $\nabla$ is not used in the context of the gradient as it is reserved for the Riemannian connection. Similar notations are used for the Hessian.
$^3$The Euclidean Hessian is seen as an operator to show the connection with the Riemannian Hessian. One can show that the proposed definition matches the "usual" second order derivative matrix for $\xi = \mathbf{I}$.

Algorithm 1 Line-Search Method on Riemannian Manifold
Require: Manifold $\mathcal{M}$, function $f$, and retraction $R$.
1: Initialize $X \in \mathcal{M}$.
2: while $||\text{grad } f(X)||_X \geq \epsilon$ do
3: Choose search direction $\xi_X \in \mathcal{T}_X\mathcal{M}$ such that:
$$\langle \text{grad } f(X), \xi_X \rangle_X < 0. \qquad (13)$$
4: Compute Armijo step size $\alpha$.
5: Retract $X = R_X(\alpha \xi_X)$.
6: end while
7: Output $X$.
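A minimal instantiation of Algorithm 1 is sketched below as an added illustration; it is not the solver proposed in this paper. It uses the unit sphere with the metric induced from the embedding space as a simple stand-in manifold, the projection of the Euclidean gradient as the Riemannian gradient, normalization as the retraction, and a backtracking Armijo step. The objective $f(x) = x^T A x$ is an arbitrary choice whose minimizer on the sphere is the eigenvector associated with the smallest eigenvalue of $A$.

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])                          # f(x) = x^T A x on the unit sphere

f       = lambda x: x @ A @ x
egrad   = lambda x: 2.0 * A @ x
proj    = lambda x, v: v - (x @ v) * x                # projection onto the tangent space at x
rgrad   = lambda x: proj(x, egrad(x))                 # Riemannian gradient
retract = lambda x, v: (x + v) / np.linalg.norm(x + v)  # a first-order retraction

x = np.ones(3) / np.sqrt(3.0)                         # 1: initialize X on the manifold
for _ in range(200):                                  # 2: loop until the gradient vanishes
    g = rgrad(x)
    if np.linalg.norm(g) <= 1e-10:
        break
    xi = -g                                           # 3: descent direction, <g, xi>_x < 0, cf. (13)
    alpha = 1.0                                       # 4: backtracking Armijo step size
    while f(retract(x, alpha * xi)) > f(x) - 1e-4 * alpha * (g @ g):
        alpha *= 0.5
    x = retract(x, alpha * xi)                        # 5: retract back to the manifold
print(x)   # converges to an eigenvector of the smallest eigenvalue (here +/- e_1)
```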
After choosing the search direction as mandated by (8), the step size is selected according to Wolfe's conditions similar to the ones in (11) and (12). A more general definition of a descent direction, known as a gradient-related sequence, and the Riemannian Armijo step expression can be found in [13].
While the update step $X_{t+1} = X_t + \alpha_t p_t$ is trivial in Euclidean optimization thanks to its vector space structure, it might result in a point $X_{t+1}$ outside of the manifold. Moving in a given direction of a tangent space while staying on the manifold is realized by the concept of retraction. The ideal retraction is the exponential map $\text{Exp}_X$, as it maps a tangent vector $\xi_X \in \mathcal{T}_X\mathcal{M}$ to a point along the geodesic curve (straight line on the manifold) that goes through $X$ in the direction of $\xi_X$. However, computing the geodesic curves is challenging and may be more difficult than the original optimization problem. Luckily, one can use a first-order retraction (called simply a retraction in this paper) without compromising the convergence properties of the algorithms. A first-order retraction is defined as follows:
Definition 2. A retraction on a manifold $\mathcal{M}$ is a smooth mapping $R$ from the tangent bundle $\mathcal{T}\mathcal{M}$ onto $\mathcal{M}$. For all $X \in \mathcal{M}$, the restriction of $R$ to $\mathcal{T}_X\mathcal{M}$, called $R_X$, satisfies the following properties:
• Centering: $R_X(0) = X$.
• Local rigidity: The curve $\gamma_{\xi_X}(\tau) = R_X(\tau \xi_X)$ satisfies $\dot{\gamma}_{\xi_X}(\tau)\big|_{\tau=0} = \xi_X,\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}$.
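To illustrate Definition 2 (an added sketch under the same sphere stand-in as above), the following compares the exponential map of the sphere with the simple normalization retraction and numerically checks the centering and local rigidity properties; both maps agree to first order but differ for larger steps.

```python
import numpy as np

x  = np.array([0.0, 0.0, 1.0])                    # point on the unit sphere
xi = np.array([0.3, -0.2, 0.0])                   # tangent vector: <x, xi> = 0

def retract(x, v):
    """First-order retraction on the sphere: project x + v back onto the sphere."""
    return (x + v) / np.linalg.norm(x + v)

def expmap(x, v):
    """Exponential map of the sphere: move along the geodesic (great circle)."""
    nv = np.linalg.norm(v)
    return x if nv == 0.0 else np.cos(nv) * x + np.sin(nv) * v / nv

# Centering: R_x(0) = x.
print(retract(x, np.zeros(3)))

# Local rigidity: the derivative of tau -> R_x(tau * xi) at tau = 0 equals xi.
tau = 1e-6
print((retract(x, tau * xi) - x) / tau)           # ~ xi
print((expmap(x, tau * xi) - x) / tau)            # ~ xi as well

# For larger steps the retraction leaves the geodesic followed by the exponential map.
print(retract(x, xi), expmap(x, xi))
```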
For some predefined Armijo step size, the procedure above is guaranteed to converge for all retractions [13]. The generalization of the steepest descent to Riemannian manifolds is obtained by finding the search direction that satisfies an equation similar to the Euclidean one (9), using the Riemannian gradient. The update is then retracted to the manifold. The steps of the line-search method are summarized in Algorithm 1 and an illustration of an iteration of the algorithm is given in Figure 2.
Fig. 2. The update step for the two-dimensional sphere embedded in $\mathbb{R}^3$. The update direction $\xi_X^t$ and step length $\alpha_t$ are computed in the tangent space $\mathcal{T}_{X_t}\mathcal{M}$. The point $X_t + \alpha_t \xi_X^t$ lies outside the manifold and needs to be retracted to obtain the update $X_{t+1}$. The update is not located on the geodesic $\gamma(t)$ due to the use of a retraction instead of the exponential map.

Fig. 3. An illustration of a vector transport $\mathcal{T}$ on a two-dimensional manifold embedded in $\mathbb{R}^3$ that connects the tangent space of $X$, with tangent vector $\xi_X$, with the one of its retraction $R_X(\xi_X)$. A connection can be obtained from the speed at the origin of the inverse of the vector transport $\mathcal{T}^{-1}$.

Generalizing Newton's method to the Riemannian setting requires computing the Riemannian Hessian operator, which requires taking a directional derivative of a vector field. As the vector fields belong to different tangent spaces, one needs the notion of a connection, which generalizes the notion of the directional derivative of a vector field. The notion of connection is intimately related to the notion of vector transport, which allows moving from one tangent space to another, as shown in Figure 3. The definition of a connection is given below:
Definition 3. An affine connection $\nabla$ is a mapping from $\mathcal{T}\mathcal{M} \times \mathcal{T}\mathcal{M}$ to $\mathcal{T}\mathcal{M}$ that associates to each $(\eta, \xi)$ the tangent vector $\nabla_\eta\, \xi$ satisfying, for all smooth $f, g: \mathcal{M} \to \mathbb{R}$ and $a, b \in \mathbb{R}$:
• $\nabla_{f(\eta)+g(\chi)}\, \xi = f(\nabla_\eta\, \xi) + g(\nabla_\chi\, \xi)$
• $\nabla_\eta\, (a\xi + b\varphi) = a\nabla_\eta\, \xi + b\nabla_\eta\, \varphi$
• $\nabla_\eta\, (f(\xi)) = \eta(f)\, \xi + f(\nabla_\eta\, \xi)$,
wherein a vector field $\xi$ acts on the function $f$ by derivation, i.e., $\xi(f) = \text{D}(f)[\xi]$, also noted $\xi f$ in the literature.
On a Riemannian manifold, the Levi-Civita connection is the canonical choice as it preserves the Riemannian metric. The connection is computed as follows:
Algorithm 2 Newton's Method on Riemannian Manifold
Require: Manifold $\mathcal{M}$, function $f$, retraction $R$, and affine connection $\nabla$.
1: Initialize $X \in \mathcal{M}$.
2: while $||\text{grad } f(X)||_X \geq \epsilon$ do
3: Find descent direction $\xi_X \in \mathcal{T}_X\mathcal{M}$ such that:
$$\text{hess } f(X)[\xi_X] = -\text{grad } f(X), \qquad (17)$$
wherein $\text{hess } f(X)[\xi_X] = \nabla_{\xi_X}\, \text{grad } f(X)$.
4: Retract $X = R_X(\xi_X)$.
5: end while
6: Output $X$.
Definition 4. The Levi-Civita connection is the unique affine connection on $\mathcal{M}$ with the Riemannian metric $\langle \cdot, \cdot \rangle$ that satisfies, for all $\eta, \xi, \chi \in \mathcal{T}\mathcal{M}$:
1) $\nabla_\eta\, \xi - \nabla_\xi\, \eta = [\eta, \xi]$,
2) $\chi\langle \eta, \xi \rangle = \langle \nabla_\chi\, \eta, \xi \rangle + \langle \eta, \nabla_\chi\, \xi \rangle$,
where $[\xi, \eta]$ is the Lie bracket, i.e., a function from the set of smooth functions to itself defined by $[\xi, \eta]g = \xi(\eta(g)) - \eta(\xi(g))$.
For the manifolds of interest in this paper, the Lie bracket can be written as the implicit directional differentiation $[\xi, \eta] = \text{D}(\eta)[\xi] - \text{D}(\xi)[\eta]$. The expression of the Levi-Civita connection can be computed using the Koszul formula:
$$2\langle \nabla_\chi\, \eta, \xi \rangle = \chi\langle \eta, \xi \rangle + \eta\langle \xi, \chi \rangle - \xi\langle \chi, \eta \rangle - \langle \chi, [\eta, \xi] \rangle + \langle \eta, [\xi, \chi] \rangle + \langle \xi, [\chi, \eta] \rangle. \qquad (15)$$
Note that connections, and particularly the Levi-Civita connection, are defined for all vector fields on $\mathcal{M}$. However, for the purpose of this paper, only the tangent bundle is of interest. With the above notion of connection, the Riemannian Hessian can be written as:
Definition 5. The Riemannian Hessian of $f$ at $X$, denoted by $\text{hess } f(X)$, of a manifold $\mathcal{M}$ is a mapping from $\mathcal{T}_X\mathcal{M}$ into itself defined by:
$$\text{hess } f(X)[\xi_X] = \nabla_{\xi_X}\, \text{grad } f(X),\ \forall\ \xi_X \in \mathcal{T}_X\mathcal{M}, \qquad (16)$$
where $\text{grad } f(X)$ is the Riemannian gradient and $\nabla$ is the Riemannian connection on $\mathcal{M}$.
It can readily be verified that the Riemannian Hessian satisfies a similar property as the Euclidean one, i.e., for all $\xi_X, \eta_X \in \mathcal{T}_X\mathcal{M}$, we have:
$$\langle \text{hess } f(X)[\xi_X], \eta_X \rangle_X = \langle \xi_X, \text{hess } f(X)[\eta_X] \rangle_X.$$
Remark 1. The names Riemannian gradient and Hessian are due to the fact that the function $f$ can be approximated in a neighborhood of $X$ by the following:
$$f(X + \delta X) = f(X) + \langle \text{grad } f(X), \text{Exp}_X^{-1}(\delta X) \rangle_X + \frac{1}{2} \langle \text{hess } f(X)[\text{Exp}_X^{-1}(\delta X)], \text{Exp}_X^{-1}(\delta X) \rangle_X. \qquad (18)$$
Using the above definitions, the generalization of Newton's method to Riemannian optimization is done by replacing both the Euclidean gradient and Hessian by their Riemannian counterparts in (10). Hence, the search direction is the tangent vector $\xi_X$ that satisfies $\text{hess } f(X)[\xi_X] = -\text{grad } f(X)$. The update is found by retracting the tangent vector to the manifold. The steps of the algorithm are illustrated in Algorithm 2.
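The following added sketch instantiates Algorithm 2 on the unit sphere with the induced metric, again as a stand-in for the manifolds of this paper. It assumes the standard closed form of the Riemannian Hessian of $f(x) = x^T A x$ on the embedded sphere, $\text{hess } f(x)[\xi] = \Pi_x(2A\xi) - (x^T \text{Grad } f(x))\,\xi$, and solves the Newton system (17) restricted to the tangent space. A least-squares solver is used because the Hessian, extended to the ambient space, is singular along the normal direction; its minimum-norm solution already lies in the tangent space.

```python
import numpy as np

A = np.diag([1.0, 2.0, 5.0])                     # f(x) = x^T A x on the unit sphere

def proj(x):                                     # matrix of the projection onto the tangent space at x
    return np.eye(len(x)) - np.outer(x, x)

def rgrad(x):                                    # Riemannian gradient: P_x(Grad f(x))
    return proj(x) @ (2.0 * A @ x)

def rhess_mat(x):
    # Riemannian Hessian of f on the sphere as a matrix acting on the tangent space
    # (assumed closed form for this stand-in example): P_x (2A) P_x - (x^T Grad f(x)) P_x.
    P = proj(x)
    return P @ (2.0 * A) @ P - (x @ (2.0 * A @ x)) * P

retract = lambda x, v: (x + v) / np.linalg.norm(x + v)

x = np.array([1.0, 0.3, 0.2]); x /= np.linalg.norm(x)
for _ in range(10):                              # Algorithm 2
    g = rgrad(x)
    if np.linalg.norm(g) <= 1e-12:
        break
    # Newton system (17): hess f(x)[xi] = -grad f(x), solved on the tangent space.
    xi, *_ = np.linalg.lstsq(rhess_mat(x), -g, rcond=None)
    xi = proj(x) @ xi                            # keep the direction tangent
    x = retract(x, xi)                           # step 4: retract
print(x)                                         # converges to an eigenvector of A
```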
C. Problems of Interest
As shown in the previous section, computing the Riemannian gradient and Hessian for a given function over some manifold $\mathcal{M}$ allows the design of efficient algorithms that exploit the geometrical structure of the problem. The paper's main contribution is to propose a framework for solving a subset of convex programs, including those in which the optimization variable represents a doubly stochastic and possibly symmetric and/or definite multidimensional probability distribution function.
In particular, the paper derives the relationship between the Euclidean gradient and Hessian and their Riemannian counterparts for the manifolds of doubly stochastic matrices, symmetric stochastic matrices, and symmetric positive stochastic matrices. In other words, for a convex function $f: \mathbb{R}^{n \times m} \to \mathbb{R}$, the paper proposes solving the following problem:
$$\min\ f(X) \qquad (19a)$$
$$\text{s.t.}\ X_{ij} > 0,\ 1 \leq i \leq n,\ 1 \leq j \leq m, \qquad (19b)$$
$$\sum_{j=1}^{m} X_{ij} = 1,\ 1 \leq i \leq n, \qquad (19c)$$
$$\sum_{i=1}^{n} X_{ij} = 1,\ 1 \leq j \leq m, \qquad (19d)$$
$$X = X^T, \qquad (19e)$$
$$X \succ 0, \qquad (19f)$$
wherein constraints (19b)-(19c) produce a stochastic matrix, (19b)-(19d) a doubly stochastic one, (19b)-(19e) a symmetric stochastic one, and (19b)-(19f) a definite symmetric matrix. While the first scenario is studied in [25], the next sections study each problem, respectively. Let $\mathbf{1}$ be the all-ones vector and define the multinomial, doubly stochastic multinomial, symmetric multinomial, and definite multinomial, respectively, as follows:
$$\mathbb{P}_n^m = \left\{ X \in \mathbb{R}^{n \times m} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1} \right\}$$
$$\mathbb{DP}_n = \left\{ X \in \mathbb{R}^{n \times n} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1},\ X^T\mathbf{1} = \mathbf{1} \right\}$$
$$\mathbb{SP}_n = \left\{ X \in \mathbb{R}^{n \times n} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1},\ X = X^T \right\}$$
$$\mathbb{SP}_n^+ = \left\{ X \in \mathbb{R}^{n \times n} \ \middle|\ X_{ij} > 0,\ X\mathbf{1} = \mathbf{1},\ X = X^T,\ X \succ 0 \right\}$$
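For intuition about the set $\mathbb{DP}_n$, the added sketch below uses the classical Sinkhorn-Knopp scaling to map an arbitrary positive matrix to a nearby doubly stochastic one. This is a standard tool for doubly stochastic matrices and is shown only as an illustration; it is not the retraction or the algorithm proposed in this paper.

```python
import numpy as np

def sinkhorn_knopp(M, n_iter=1000, tol=1e-12):
    """Alternately normalize rows and columns of a positive matrix until it is
    (numerically) doubly stochastic, i.e., an element of DP_n."""
    X = np.asarray(M, dtype=float).copy()
    for _ in range(n_iter):
        X /= X.sum(axis=1, keepdims=True)        # enforce X 1 = 1
        X /= X.sum(axis=0, keepdims=True)        # enforce X^T 1 = 1
        if (np.abs(X.sum(axis=1) - 1.0).max() < tol and
                np.abs(X.sum(axis=0) - 1.0).max() < tol):
            break
    return X

rng = np.random.default_rng(1)
X = sinkhorn_knopp(rng.uniform(0.1, 1.0, size=(4, 4)))
print(X.sum(axis=1), X.sum(axis=0))              # both ~ the all-ones vector
```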
For all the above manifolds, the paper uses the Fisher information as the Riemannian metric $g$, whose restriction to $\mathcal{T}_X\mathcal{M}$ is defined by:
$$g(\xi_X, \eta_X) = \langle \xi_X, \eta_X \rangle_X = \text{Tr}\big((\xi_X \oslash X)(\eta_X)^T\big) = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(\xi_X)_{ij} (\eta_X)_{ij}}{X_{ij}},\ \ \forall\ \xi_X, \eta_X \in \mathcal{T}_X\mathcal{M}, \qquad (20)$$
wherein $\oslash$ denotes the element-wise division.
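The restriction (20) is straightforward to evaluate numerically. The added sketch below builds a point of the multinomial $\mathbb{P}_n^m$, generates tangent vectors (differentiating the constraint $X\mathbf{1} = \mathbf{1}$ shows that the rows of a tangent vector sum to zero), and computes the Fisher inner product together with the induced norm (4).

```python
import numpy as np

rng = np.random.default_rng(2)

# A point on the multinomial manifold P_n^m: positive entries, rows summing to one.
X = rng.uniform(0.1, 1.0, size=(3, 4))
X /= X.sum(axis=1, keepdims=True)

def tangent(Z):
    # Tangent vectors at X satisfy xi 1 = 0, i.e., each row sums to zero.
    return Z - Z.mean(axis=1, keepdims=True)

def fisher_inner(X, xi, eta):
    # Fisher information metric (20): sum_ij xi_ij * eta_ij / X_ij.
    return np.sum(xi * eta / X)

xi  = tangent(rng.standard_normal((3, 4)))
eta = tangent(rng.standard_normal((3, 4)))
print(fisher_inner(X, xi, eta))                  # <xi, eta>_X
print(np.sqrt(fisher_inner(X, xi, xi)))          # ||xi||_X, cf. (4)
```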
Endowing the multinomial with the Fisher information as the Riemannian metric gives the manifold a differential structure that is invariant over the choice of coordinate system. More information about the Fisher information metric and its use in information geometry can be found in [30]. Using the
