Inexact Spectral Projected Gradient Methods on Convex Sets
Ernesto G. Birgin
José Mario Martínez
Marcos Raydan
March 26, 2003

Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão 1010, Cidade Universitária, 05508-090 São Paulo, SP - Brazil (egbirgin@ime.usp.br). Sponsored by FAPESP (Grants 01/04597-4 and 02/00094-0), CNPq (Grant 300151/00-4) and Pronex.
Departamento de Matemática Aplicada, IMECC-UNICAMP, CP 6065, 13081-970 Campinas SP, Brazil (martinez@ime.unicamp.br). Sponsored by FAPESP (Grant 01/04597-4), CNPq and FAEP-UNICAMP.
Departamento de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Ap. 47002, Caracas 1041-A, Venezuela (mraydan@reacciun.ve). Sponsored by the Center of Scientific Computing at UCV.
Abstract

A new method is introduced for large-scale convex constrained optimization. The general model algorithm involves, at each iteration, the approximate minimization of a convex quadratic on the feasible set of the original problem and global convergence is obtained by means of nonmonotone line searches. A specific algorithm, the Inexact Spectral Projected Gradient method (ISPG), is implemented using inexact projections computed by Dykstra’s alternating projection method and generates interior iterates. The ISPG method is a generalization of the Spectral Projected Gradient method (SPG), but can be used when projections are difficult to compute. Numerical results for constrained least-squares rectangular matrix problems are presented.

Key words: Convex constrained optimization, projected gradient, nonmonotone line search, spectral gradient, Dykstra’s algorithm.

AMS Subject Classification: 49M07, 49M10, 65K, 90C06, 90C20.
1 Introduction

We consider the problem

    Minimize f(x)  subject to  x ∈ Ω,                                   (1)

where Ω is a closed convex set in IR^n. Throughout this paper we assume that f is defined and has continuous partial derivatives on an open set that contains Ω.

The Spectral Projected Gradient (SPG) method [6, 7] was recently proposed for solving (1), especially for large-scale problems since the storage requirements are minimal. This method has proved to be effective for very large-scale convex programming problems.
In [7] a family of location problems was described with a variable number of variables and constraints. The SPG method was able to solve problems of this family with up to 96254 variables and up to 578648 constraints in very few seconds of computer time. The computer code that implements SPG and produces the mentioned results is published [7] and available. More recently, in [5] an active-set method which uses SPG to leave the faces was introduced, and bound-constrained problems with up to 10^7 variables were solved.

The SPG method is related to the practical version of Bertsekas [3] of the classical gradient projection method of Goldstein, Levitin and Polyak [21, 25]. However, some critical differences make this method much more efficient than its gradient-projection predecessors. The main point is that the first trial step at each iteration is taken using the spectral steplength (also known as the Barzilai-Borwein choice) introduced in [2] and later analyzed in [9, 19, 27], among others. The spectral step is a Rayleigh quotient related to an average Hessian matrix. For a review containing the more recent advances on this special choice of steplength see [20]. The second improvement over traditional gradient projection methods is that a nonmonotone line search must be used [10, 22]. This feature seems to be essential to preserve the nice and nonmonotone behaviour of the iterates produced by single spectral gradient steps.
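
To make the spectral steplength concrete, here is a minimal Python sketch of the safeguarded Barzilai-Borwein choice as used by SPG-type methods; the safeguard bounds lam_min and lam_max are illustrative values, and the fallback to lam_max when s^T y ≤ 0 follows the spectral choice described later in the paper for the ISPG implementation.

    import numpy as np

    def spectral_steplength(x, x_prev, g, g_prev, lam_min=1e-30, lam_max=1e30):
        # Barzilai-Borwein (spectral) steplength: s^T s / s^T y, the reciprocal of a
        # Rayleigh quotient of an average Hessian, safeguarded to [lam_min, lam_max].
        s = x - x_prev                 # difference of consecutive iterates
        y = g - g_prev                 # difference of consecutive gradients
        sty = float(s.dot(y))
        if sty <= 0.0:
            return lam_max             # no positive curvature detected: largest allowed value
        return min(lam_max, max(lam_min, float(s.dot(s)) / sty))

The first trial point of an SPG iteration is then obtained by projecting x_k - λ_k g_k onto Ω, which is precisely the step that this paper allows to be computed inexactly.
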
The reported efficiency of the SPG method in very large problems motivated us to introduce the inexact-projection version of the method. In fact, the main drawback of the SPG method is that it requires the exact projection of an arbitrary point of IR^n onto Ω at every iteration.

Projecting onto Ω is a difficult problem unless Ω is an easy set (i.e. it is easy to project onto it) such as a box, an affine subspace, a ball, etc. However, for many important applications, Ω is not an easy set and the projection can only be achieved inexactly. For example, if Ω is the intersection of a finite collection of closed and convex easy sets, cycles of alternating projection methods could be used. This sequence of cycles could be stopped prematurely, leading to an inexact iterative scheme. In this work we are mainly concerned with extending the machinery developed in [6, 7] to the more general case in which the projection onto Ω can only be achieved inexactly.
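
To make the idea of an inexact projection by truncated alternating projections concrete, the following is a minimal Python sketch of Dykstra's alternating projection method for Ω = Ω_1 ∩ · · · ∩ Ω_p, assuming each easy set Ω_i comes with its own exact Euclidean projection. Stopping after a fixed number of cycles is only an illustrative truncation rule (it is this truncation that makes the projection inexact), and the box and ball in the usage example are hypothetical easy sets, not taken from the paper.

    import numpy as np

    def dykstra(y0, projections, cycles=50):
        # Dykstra's alternating projection method: approximates the projection
        # of y0 onto the intersection of closed convex sets, each represented
        # by its exact projection operator in `projections`.
        x = np.asarray(y0, dtype=float).copy()
        increments = [np.zeros_like(x) for _ in projections]  # one correction per set
        for _ in range(cycles):
            for i, proj in enumerate(projections):
                z = x + increments[i]      # add back the correction for set i
                x = proj(z)                # project onto the i-th easy set
                increments[i] = z - x      # update the correction
        return x

    # Hypothetical easy sets: the box [0, 1]^n and the Euclidean ball of radius 0.8.
    project_box = lambda z: np.clip(z, 0.0, 1.0)
    project_ball = lambda z: z if np.linalg.norm(z) <= 0.8 else 0.8 * z / np.linalg.norm(z)

    x = dykstra(np.array([2.0, -1.0]), [project_box, project_ball], cycles=100)

Stopping the outer loop early (or under any other criterion) yields an approximation of the projection rather than the exact projection, which is the situation the ISPG method is designed to handle.
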
In Section 2 we define a general model algorithm and prove global convergence. In Section 3 we introduce the ISPG method and we describe the use of Dykstra’s alternating projection method for obtaining inexact projections onto closed and convex sets. In Section 4 we present numerical experiments and in Section 5 we draw some conclusions.

2 A general model algorithm and its global convergence

We say that a point x is stationary, for problem (1), if

    g(x)^T d ≥ 0                                                        (2)

for all d ∈ IR^n such that x + d ∈ Ω.

In this work ‖·‖ denotes the 2-norm of vectors and matrices, although in some cases it can be replaced by an arbitrary norm. We also denote g(x) = ∇f(x) and IN = {0, 1, 2, . . .}.

Let B be the set of n × n positive definite matrices such that ‖B‖ ≤ L and ‖B^{-1}‖ ≤ L. Therefore, B is a compact set of IR^{n×n}. In the spectral gradient approach, the matrices will be diagonal. However, the algorithm and theorem that we present below are quite general. The matrices B_k may be thought of as defining a sequence of different metrics in IR^n according to which we perform projections. For this reason, we give the name “Inexact Variable Metric” to the method introduced below.
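
For concreteness, the diagonal choice used later by the ISPG instance of this method (Section 3 of the paper) is the spectral one,

    B_k = (1/λ_k^spg) I,   where   λ_k^spg = min{λ_max, max{λ_min, s_k^T s_k / s_k^T y_k}}  if s_k^T y_k > 0,  and  λ_k^spg = λ_max otherwise,

with s_k = x_k - x_{k-1}, y_k = g_k - g_{k-1} and safeguards 0 < λ_min < λ_max, so that Q_k(d) = ‖d‖^2/(2λ_k^spg) + g_k^T d and minimizing Q_k over the feasible set amounts to projecting x_k - λ_k^spg g_k onto Ω with respect to the Euclidean norm.
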
Algorithm 2.1: Inexact Variable Metric Method

Assume η ∈ (0, 1], γ ∈ (0, 1), 0 < σ_1 < σ_2 < 1, M a positive integer. Let x_0 be an arbitrary initial point. We denote g_k = g(x_k) for all k ∈ IN. Given x_k ∈ Ω, B_k ∈ B, the steps of the kth iteration of the algorithm are:

Step 1. Compute the search direction
Consider the subproblem

    Minimize Q_k(d)  subject to  x_k + d ∈ Ω,                           (3)

where

    Q_k(d) = (1/2) d^T B_k d + g_k^T d.

Let d̄_k be the minimizer of (3). (This minimizer exists and is unique by the strict convexity of the subproblem (3), but we will see later that we do not need to compute it.)
Let d_k be such that x_k + d_k ∈ Ω and

    Q_k(d_k) ≤ η Q_k(d̄_k).                                              (4)

If d_k = 0, stop the execution of the algorithm declaring that x_k is a stationary point.

Step 2. Compute the steplength
Set α ← 1 and f_max = max{f(x_{k-j+1}) | 1 ≤ j ≤ min{k + 1, M}}.
If

    f(x_k + α d_k) ≤ f_max + γ α g_k^T d_k,                              (5)

set α_k = α, x_{k+1} = x_k + α_k d_k and finish the iteration. Otherwise, choose α_new ∈ [σ_1 α, σ_2 α], set α ← α_new and repeat test (5).

Remark. In the definition of Algorithm 2.1 the possibility η = 1 corresponds to the case in which the subproblem (3) is solved exactly.
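
The following Python sketch mirrors the structure of Algorithm 2.1. The routine solve_subproblem is a placeholder that the caller must supply: it is assumed to return a feasible direction d_k satisfying condition (4) (for instance via truncated Dykstra cycles, as in the ISPG instance). The values γ = 10^-4 and M = 10, and the halving rule α ← α/2 (one admissible choice inside [σ_1 α, σ_2 α]), are illustrative and not prescribed by the algorithm.

    import numpy as np

    def inexact_variable_metric(f, grad, solve_subproblem, x0,
                                gamma=1e-4, M=10, max_iter=1000, tol=0.0):
        # Skeleton of Algorithm 2.1 (Inexact Variable Metric Method).
        x = np.asarray(x0, dtype=float)
        f_hist = [f(x)]                          # previous function values (nonmonotone memory)
        for k in range(max_iter):
            g = grad(x)
            d = solve_subproblem(x, g, k)        # Step 1: feasible d_k with Q_k(d_k) <= eta * Q_k(d_bar_k)
            if np.linalg.norm(d) <= tol:         # d_k = 0: x_k is declared stationary
                return x
            f_max = max(f_hist[-M:])             # nonmonotone reference value f_max
            gtd = float(g.dot(d))
            alpha = 1.0
            while f(x + alpha * d) > f_max + gamma * alpha * gtd:   # test (5)
                alpha *= 0.5                     # backtrack inside [sigma_1*alpha, sigma_2*alpha]
            x = x + alpha * d                    # Step 2: accept x_{k+1} = x_k + alpha_k d_k
            f_hist.append(f(x))
        return x
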
Lemma 2.1. The algorithm is well defined.

Proof. Since Q_k is strictly convex and the domain of (3) is convex, the problem (3) has a unique solution d̄_k. If d̄_k = 0 then Q_k(d̄_k) = 0. Since d_k is a feasible point of (3), and, by (4), Q_k(d_k) ≤ 0, it turns out that d_k = d̄_k. Therefore, d_k = 0 and the algorithm stops.
If d̄_k ≠ 0, then, since Q_k(0) = 0 and the solution of (3) is unique, it follows that Q_k(d̄_k) < 0. Then, by (4), Q_k(d_k) < 0. Since Q_k is convex and Q_k(0) = 0, it follows that d_k is a descent direction for Q_k, therefore, g_k^T d_k < 0. So, for α > 0 small enough,

    f(x_k + α d_k) ≤ f(x_k) + γ α g_k^T d_k.

Therefore, the condition (5) must be satisfied if α is small enough. This completes the proof. □

Theorem 2.1. Assume that the level set {x | f(x) ≤ f(x_0)} is bounded. Then, either the algorithm stops at some stationary point x_k, or every limit point of the generated sequence is stationary.

The proof of Theorem 2.1 is based on the following lemmas.
Lemma 2.2. Assume that the sequence generated by Algorithm 2.1 stops at x_k. Then, x_k is stationary.

Proof. If the algorithm stops at some x_k, we have that d_k = 0. Therefore, Q_k(d_k) = 0. Then, by (4), Q_k(d̄_k) = 0. So, d̄_k = 0. Therefore, for all d ∈ IR^n such that x_k + d ∈ Ω we have g_k^T d ≥ 0. Thus, x_k is a stationary point. □

For the remaining results of this section we assume that the algorithm does not stop. So, infinitely many iterates {x_k}_{k∈IN} are generated and, by (5), f(x_k) ≤ f(x_0) for all k ∈ IN. Thus, under the hypothesis of Theorem 2.1, the sequence {x_k}_{k∈IN} is bounded.

Lemma 2.3. Assume that {x_k}_{k∈IN} is a sequence generated by Algorithm 2.1. Define, for all j = 1, 2, 3, . . .,

    V_j = max{f(x_{jM-M+1}), f(x_{jM-M+2}), . . . , f(x_{jM})},

and ν(j) ∈ {jM-M+1, jM-M+2, . . . , jM} such that

    f(x_{ν(j)}) = V_j.

Then,

    V_{j+1} ≤ V_j + γ α_{ν(j+1)-1} g_{ν(j+1)-1}^T d_{ν(j+1)-1}           (6)

for all j = 1, 2, 3, . . ..

Proof. We will prove by induction on ℓ that for all ℓ = 1, 2, . . . , M and for all j = 1, 2, 3, . . .,

    f(x_{jM+ℓ}) ≤ V_j + γ α_{jM+ℓ-1} g_{jM+ℓ-1}^T d_{jM+ℓ-1} < V_j.      (7)

By (5) we have that, for all j ∈ IN,

    f(x_{jM+1}) ≤ V_j + γ α_{jM} g_{jM}^T d_{jM} < V_j,

so (7) holds for ℓ = 1.

Assume, as the inductive hypothesis, that

    f(x_{jM+ℓ'}) ≤ V_j + γ α_{jM+ℓ'-1} g_{jM+ℓ'-1}^T d_{jM+ℓ'-1} < V_j    (8)

for ℓ' = 1, . . . , ℓ.
Now, by (5), and the definition of V_j, we have that

    f(x_{jM+ℓ+1}) ≤ max_{1≤t≤M} {f(x_{jM+ℓ+1-t})} + γ α_{jM+ℓ} g_{jM+ℓ}^T d_{jM+ℓ}
                  = max{f(x_{(j-1)M+ℓ+1}), . . . , f(x_{jM+ℓ})} + γ α_{jM+ℓ} g_{jM+ℓ}^T d_{jM+ℓ}
                  ≤ max{V_j, f(x_{jM+1}), . . . , f(x_{jM+ℓ})} + γ α_{jM+ℓ} g_{jM+ℓ}^T d_{jM+ℓ}.

But, by the inductive hypothesis,

    max{f(x_{jM+1}), . . . , f(x_{jM+ℓ})} < V_j,

so,

    f(x_{jM+ℓ+1}) ≤ V_j + γ α_{jM+ℓ} g_{jM+ℓ}^T d_{jM+ℓ} < V_j.

Therefore, the inductive proof is complete and, so, (7) is proved. Since ν(j + 1) = jM + ℓ for some ℓ ∈ {1, . . . , M}, this implies the desired result. □

From now on, we define

    K = {ν(1) - 1, ν(2) - 1, ν(3) - 1, . . .},

where {ν(j)} is the sequence of indices defined in Lemma 2.3. Clearly,

    ν(j) < ν(j + 1) ≤ ν(j) + 2M                                          (9)

for all j = 1, 2, 3, . . ..

Lemma 2.4.

    lim_{k∈K} α_k Q_k(d̄_k) = 0.

Proof. By (6), since f is continuous and bounded below,

    lim_{k∈K} α_k g_k^T d_k = 0.                                         (10)

But, by (4),

    0 > Q_k(d_k) = (1/2) d_k^T B_k d_k + g_k^T d_k ≥ g_k^T d_k   for all k ∈ IN.