Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

SIAM J. OPTIM.
Vol. 18, No. 2, pp. 666–690
© 2007 Society for Industrial and Applied Mathematics

ITERATIVE SOLUTION OF AUGMENTED SYSTEMS ARISING IN INTERIOR METHODS∗

ANDERS FORSGREN†, PHILIP E. GILL‡, AND JOSHUA D. GRIFFIN§

Abstract. Iterative methods are proposed for certain augmented systems of linear equations that arise in interior methods for general nonlinear optimization. Interior methods define a sequence of KKT equations that represent the symmetrized (but indefinite) equations associated with Newton's method for a point satisfying the perturbed optimality conditions. These equations involve both the primal and dual variables and become increasingly ill-conditioned as the optimization proceeds. In this context, an iterative linear solver must not only handle the ill-conditioning but also detect the occurrence of KKT matrices with the wrong matrix inertia. A one-parameter family of equivalent linear equations is formulated that includes the KKT system as a special case. The discussion focuses on a particular system from this family, known as the “doubly augmented system,” that is positive definite with respect to both the primal and dual variables. This property means that a standard preconditioned conjugate-gradient method involving both primal and dual variables will either terminate successfully or detect if the KKT matrix has the wrong inertia. Constraint preconditioning is a well-known technique for preconditioning the conjugate-gradient method on augmented systems. A family of constraint preconditioners is proposed that provably eliminates the inherent ill-conditioning in the augmented system. A considerable benefit of combining constraint preconditioning with the doubly augmented system is that the preconditioner need not be applied exactly. Two particular “active-set” constraint preconditioners are formulated that involve only a subset of the rows of the augmented system and thereby may be applied with considerably less work. Finally, some numerical experiments illustrate the numerical performance of the proposed preconditioners and highlight some theoretical properties of the preconditioned matrices.

Key words. large-scale nonlinear programming, nonconvex optimization, interior methods, augmented systems, KKT systems, iterative methods, conjugate-gradient method, constraint preconditioning

AMS subject classiﬁcations. 49J20, 49J15, 49M37, 49D37, 65F05, 65K05, 90C30

DOI. 10.1137/060650210

1. Introduction. This paper concerns the formulation and analysis of precon-

ditioned iterative methods for the solution of augmented systems of the form

(1.1)    [ H  -A^T ] [ x_1 ]   [ b_1 ]
         [ A    G  ] [ x_2 ] = [ b_2 ],

with A an m×n matrix, H symmetric, and G symmetric positive semideﬁnite. These

equations arise in a wide variety of scientiﬁc and engineering applications, where they

are known by a number of diﬀerent names, including “augmented systems,” “saddle-

point systems,” “KKT systems,” and “equilibrium systems.” (The bibliography of the

survey by Benzi, Golub, and Liesen [3] contains 513 related articles.) The main focus

∗Received by the editors January 17, 2006; accepted for publication (in revised form) March 5, 2007; published electronically August 22, 2007. The research of the second and third authors was supported by National Science Foundation grants DMS-9973276, CCF-0082100, and DMS-0511766.
http://www.siam.org/journals/siopt/18-2/65021.html
†Optimization and Systems Theory, Department of Mathematics, Royal Institute of Technology, SE-100 44 Stockholm, Sweden (andersf@kth.se). The research of this author was supported by the Swedish Research Council (VR).
‡Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112 (pgill@ucsd.edu).
§Sandia National Laboratories, Livermore, CA 94551-9217 (jgriffi@sandia.gov). Part of this work was carried out during the Spring of 2003 while this author was visiting KTH with financial support from the Göran Gustafsson Foundation.


of this paper will be on the solution of augmented systems arising in interior methods

for general constrained optimization, in which case (1.1) is the system associated with

Newton’s method for ﬁnding values of the primal and dual variables that satisfy the

perturbed KKT optimality conditions (see, e.g., Wright [49] and Forsgren, Gill, and

Wright [15]). In this context H is the Hessian of the Lagrangian, A is the constraint

Jacobian, and G is diagonal.

Many of the beneﬁts associated with the methods discussed in this paper derive

from formulating the interior method so that the diagonal G is positive definite. We

begin by presenting results for G positive deﬁnite and consider the treatment of sys-

tems with positive semideﬁnite and singular G in section 5. Throughout, for the case

where G is positive deﬁnite, we denote G by D and rewrite (1.1) as an equivalent

symmetric system Bx = b, where

(1.2)    B = [  H   -A^T ]    and    b = [  b_1 ]
             [ -A    -D  ]               [ -b_2 ],

with D positive deﬁnite and diagonal. We will refer to this symmetric system as the

KKT system. (It is possible to symmetrize (1.1) in a number of diﬀerent ways. The

format (1.2) will simplify the linear algebra in later sections.) When D is nonsingular,

it is well known that the augmented system is equivalent to the two smaller systems

(1.3)    (H + A^T D^{-1} A) x_1 = b_1 + A^T D^{-1} b_2    and    x_2 = D^{-1}(b_2 - A x_1),

where the system for x_1 is known as the condensed system. It is less well known that
another equivalent system is the doubly augmented system

(1.4)    [ H + 2A^T D^{-1} A   A^T ] [ x_1 ]   [ b_1 + 2A^T D^{-1} b_2 ]
         [         A            D  ] [ x_2 ] = [          b_2          ],

which has been proposed for use with direct factorization methods by Forsgren and

Gill [16]. In this paper we investigate the properties of preconditioned iterative meth-

ods applied to system (1.2) directly or to the equivalent systems (1.3) and (1.4).
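The equivalence of (1.2)–(1.4) is easy to confirm numerically. The following minimal NumPy sketch (random matrices standing in for H, A, and D; illustrative data, not the paper's code) solves a small instance in all three forms and checks that the solutions agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3

# Illustrative data: H symmetric, D positive definite and diagonal.
H = rng.standard_normal((n, n)); H = H + H.T
A = rng.standard_normal((m, n))
D = np.diag(rng.uniform(0.5, 2.0, m))
Dinv = np.diag(1.0 / np.diag(D))
b1, b2 = rng.standard_normal(n), rng.standard_normal(m)

# KKT system (1.2): B x = b.
B = np.block([[H, -A.T], [-A, -D]])
x = np.linalg.solve(B, np.concatenate([b1, -b2]))

# Condensed system (1.3).
x1 = np.linalg.solve(H + A.T @ Dinv @ A, b1 + A.T @ Dinv @ b2)
x2 = Dinv @ (b2 - A @ x1)

# Doubly augmented system (1.4).
Bda = np.block([[H + 2 * A.T @ Dinv @ A, A.T], [A, D]])
xda = np.linalg.solve(Bda, np.concatenate([b1 + 2 * A.T @ Dinv @ b2, b2]))

print(np.allclose(x, np.concatenate([x1, x2])), np.allclose(x, xda))
```

Note that the equivalence is purely algebraic: it holds whenever the matrices involved are nonsingular, with no inertia requirement.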

If the underlying optimization problem is not convex, the matrix H may be indefinite. The KKT matrix B of (1.2) is said to have correct inertia if the matrix H + A^T D^{-1} A is positive definite. This definition is based on the properties of the underlying optimization problem. Broadly speaking, the KKT system has correct inertia if the problem is locally convex (for further details see, e.g., Forsgren and Gill [16], Forsgren [18], and Griffin [32]). If the KKT matrix has correct inertia, then systems (1.2)–(1.4) have a common unique solution (see section 2).
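As a concrete illustration, the following NumPy sketch (small hand-chosen matrices, not taken from the paper) tests this definition and, for a convex example, confirms the standard fact that B then has inertia (n, m, 0):

```python
import numpy as np

def has_correct_inertia(H, A, D):
    """Correct inertia in the sense above: H + A^T D^{-1} A positive definite."""
    Dinv = np.diag(1.0 / np.diag(D))
    return bool(np.all(np.linalg.eigvalsh(H + A.T @ Dinv @ A) > 0))

def inertia(M):
    """Integer triple (positive, negative, zero eigenvalue counts) of symmetric M."""
    w = np.linalg.eigvalsh(M)
    tol = 1e-10 * max(1.0, np.abs(w).max())
    return (int((w > tol).sum()), int((w < -tol).sum()),
            int((np.abs(w) <= tol).sum()))

n, m = 4, 2
A = np.array([[1.0, 0, 1, 0], [0, 1.0, 0, 1]])
D = np.diag([0.5, 2.0])
H_good = np.eye(n)                  # locally convex model: correct inertia
H_bad = np.diag([1.0, 1, 1, -9])    # strongly indefinite: wrong inertia

B_good = np.block([[H_good, -A.T], [-A, -D]])
print(has_correct_inertia(H_good, A, D), inertia(B_good))
print(has_correct_inertia(H_bad, A, D))
```

The inertia (n, m, 0) of B_good follows from Sylvester's law of inertia applied to the Schur complement H + A^T D^{-1} A of the (2,2) block.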

1.1. Properties of the KKT system. The main issues associated with using

iterative methods to solve KKT systems are (i) termination control, (ii) inertia con-

trol, and (iii) inherent ill-conditioning. The ﬁrst of these issues is common to other

applications where the linear system represents a linearization of some underlying non-

linear system of equations. Issues (ii) and (iii), however, are unique to optimization

and will be the principal topics of this paper.

In the context of interior methods, the KKT system (1.2) is solved as part of

a two-level iterative scheme. At the outer level, nonlinear equations that deﬁne the

ﬁrst-order optimality conditions are parameterized by a small positive quantity μ.

The idea is that the solution of the parameterized equations should approach the

solution of the optimization problem as μ → 0. At the inner level, equations (1.2)

represent the symmetrized Newton equations associated with ﬁnding a zero of the


perturbed optimality conditions for a given value of μ. Although systems (1.2)–(1.4)

have identical solutions, an iterative method will generally produce a diﬀerent se-

quence of iterates in each case (see section 3 for a discussion of the equivalence of

iterative solvers in this context). An iterative method applied to the augmented sys-

tem (1.2) or the doubly augmented system (1.4) treats x_1 and x_2 as independent variables, which is appropriate in the optimization context because x_1 and x_2 are associated with independent quantities in the perturbed optimality conditions (i.e., the primal and dual variables). In contrast, an iterative solver for the condensed system (1.3) will generate approximations to x_1 only, with the variables x_2 being defined as x_2 = D^{-1}(b_2 - A x_1). This becomes an important issue when an approximate solution

is obtained by truncating the iterations of the linear solver. During the early outer

iterations, it is usually ineﬃcient to solve the KKT system accurately, and it is better

to accept an inexact solution that gives a residual norm that is less than some factor

of the norm of the right-hand side (see, e.g., Dembo, Eisenstat, and Steihaug [7]).

For the condensed system, the residual for the second block of equations will be zero

regardless of the accuracy of x_1, which implies that termination must be based on the accuracy of x_1 alone. It is particularly important for the solver to place equal weight on x_1 and x_2 when system (1.2) is being solved in conjunction with a primal-dual trust-region method (see Gertz and Gill [20] and Griffin [32]). The conjugate-gradient version of this method exploits the property that the norms of the (x_1, x_2) iterates increase monotonically (see Steihaug [44]). This property does not hold for (x_1, x_2) iterates generated for the condensed system.
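The truncated-solve strategy mentioned above, accepting any iterate whose residual is below a fixed fraction of the right-hand side, can be sketched as a plain conjugate-gradient loop with the stopping rule ‖b - Bx‖ ≤ η‖b‖. The matrix and the forcing factor η below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def truncated_cg(matvec, b, eta=0.1, maxiter=200):
    """Unpreconditioned CG, stopped once ||b - B x|| <= eta * ||b||.

    `eta` plays the role of the inexact-Newton forcing factor.
    """
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        if np.sqrt(rs) <= eta * np.linalg.norm(b):
            break
        Bp = matvec(p)
        alpha = rs / (p @ Bp)
        x += alpha * p
        r -= alpha * Bp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
n = 20
C = rng.standard_normal((n, n))
B = C @ C.T + n * np.eye(n)      # positive-definite stand-in for (1.4)
b = rng.standard_normal(n)
x = truncated_cg(lambda v: B @ v, b, eta=1e-2)
print(np.linalg.norm(b - B @ x) <= 1e-2 * np.linalg.norm(b))
```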

If the KKT matrix does not have the correct inertia, the solution of (1.2) is not

useful, and the optimization continues with an alternative technique based on either

implicitly or explicitly modifying the matrix H (see, e.g., Toint [45], Steihaug [44],

Gould et al. [30], Hager [33], and Griﬃn [32]). It is therefore important that the

iterative solver is able to detect if B does not have correct inertia.

As the perturbation parameter μ is reduced, the KKT systems become increas-

ingly ill-conditioned. The precise form of this ill-conditioning depends on the formu-

lation of the interior method, but a common feature is that some diagonal elements

of D are big and some are small. (It is almost always possible to formulate an interior

method that requires the solution of an unsymmetric system that does not exhibit

inevitable ill-conditioning as μ → 0. This unsymmetric system could be solved using an unsymmetric solver such as GMRES or QMR. Unfortunately, this approach

is unsuitable for general KKT systems because an unsymmetric solver is unable to

determine if the KKT matrix has correct inertia.) In section 3 we consider a pre-

conditioned conjugate-gradient (PCG) method that provably removes the inherent

ill-conditioning. In particular, we deﬁne a one-parameter family of preconditioners

related to the class of so-called constraint preconditioners proposed by Keller, Gould,

and Wathen [34]. Several authors have used constraint preconditioners in conjunction

with the conjugate-gradient method to solve the indeﬁnite KKT system (1.2) with

b_2 = 0 and D = 0 (see, e.g., Lukšan and Vlček [36], Gould, Hribar, and Nocedal [29], Perugia and Simoncini [40], and Bergamaschi, Gondzio, and Zilli [4]). Recently, Dollar [12] and Dollar et al. [11] have proposed constraint preconditioners for system (1.2) with no explicit inertial or diagonal condition on D, but a full row-rank requirement on A and the assumption that b_2 = 0.

Methods that require b_2 = 0 must perform an initial projection step that effectively shifts the right-hand side to zero. The constraint preconditioner then forces the x_1 iterates to lie in the null space of A. A disadvantage with this approach is that the


constraint preconditioner must be applied exactly if subsequent iterates are to lie in

the null space. This limits the ability to perform approximate solves with the precon-

ditioner, as is often required when the matrix A has a PDE-like structure that also

must be handled using an iterative solver (see, e.g., Saad [41], Notay [37], Simoncini

and Szyld [43], and Elman et al. [14]). In section 3 we consider preconditioners that

do not require the assumption that b_2 = 0, and hence do not require an accurate solve with the preconditioner.

1.2. A PCG method for the KKT system. The goal of this paper is to for-

mulate iterative methods that not only provide termination control and inertia control,

but also eliminate the inevitable ill-conditioning associated with interior methods. All

these features are present in an algorithm based on applying a PCG method to the

doubly augmented system (1.4). This system is positive deﬁnite if the KKT matrix

has correct inertia, and gives equal weight to x_1 and x_2 for early terminations. As

preconditioner we use the constraint preconditioner

(1.5)    P = [ M + 2A^T D^{-1} A   A^T ]
             [         A            D  ],

where M is an approximation of H such that M + A^T D^{-1} A is positive definite.

The equations Pv = r used to apply the preconditioner are solved by exploiting the

equivalence of the systems

(1.6a)   [ M + 2A^T D^{-1} A   A^T ] [ v_1 ]   [ r_1 ]
         [         A            D  ] [ v_2 ] = [ r_2 ],

(1.6b)   [  M   -A^T ] [ v_1 ]   [ r_1 - 2A^T D^{-1} r_2 ]
         [ -A    -D  ] [ v_2 ] = [         -r_2          ],   and

(1.6c)   (M + A^T D^{-1} A) v_1 = r_1 - A^T D^{-1} r_2,    v_2 = D^{-1}(r_2 - A v_1)

(see section 3). This allows us to compute the solution of (1.6a) by solving either

(1.6b) or (1.6c). (The particular choice will depend on the relative eﬃciency of the

methods available to solve the condensed and augmented systems.)
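The equivalence of (1.6a)–(1.6c) can be checked directly on a small instance. In the following NumPy sketch, A, D, and the right-hand side are random placeholders and M is simply the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 3
A = rng.standard_normal((m, n))
D = np.diag(rng.uniform(0.5, 2.0, m))
Dinv = np.diag(1.0 / np.diag(D))
M = np.eye(n)                       # illustrative approximation of H
r1, r2 = rng.standard_normal(n), rng.standard_normal(m)

# (1.6a): the doubly augmented form of P v = r.
P = np.block([[M + 2 * A.T @ Dinv @ A, A.T], [A, D]])
v_a = np.linalg.solve(P, np.concatenate([r1, r2]))

# (1.6b): the augmented (KKT-like) form with a shifted right-hand side.
K = np.block([[M, -A.T], [-A, -D]])
v_b = np.linalg.solve(K, np.concatenate([r1 - 2 * A.T @ Dinv @ r2, -r2]))

# (1.6c): the condensed form.
v1 = np.linalg.solve(M + A.T @ Dinv @ A, r1 - A.T @ Dinv @ r2)
v2 = Dinv @ (r2 - A @ v1)

print(np.allclose(v_a, v_b), np.allclose(v_a, np.concatenate([v1, v2])))
```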

We emphasize that the doubly augmented systems are never formed or factored

explicitly. The matrix associated with the doubly augmented equations (1.4) is used

only as an operator to deﬁne products of the form v = Bu. As mentioned above, the

equations (1.6a) that apply the preconditioner are solved using either (1.6b) or (1.6c).

An important property of the method is that these equations also may be solved using

an iterative method. (It is safe to use the augmented or condensed system for the

preconditioner equations Pv = r because the inertia of P is guaranteed by the choice

of M (see section 3).)
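Under these assumptions (correct inertia, and M chosen so that M + A^T D^{-1} A is positive definite), the operator-only scheme can be sketched with SciPy: the doubly augmented matrix (1.4) enters only through a matrix-vector product, and the preconditioner (1.5) is applied via the condensed form (1.6c). All data below are illustrative stand-ins:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(3)
n, m = 30, 10
A = rng.standard_normal((m, n))
d = rng.uniform(0.5, 2.0, m)            # diagonal of D
H = np.diag(np.linspace(1.0, 3.0, n))   # illustrative H with correct inertia
b1, b2 = rng.standard_normal(n), rng.standard_normal(m)

def matvec(u):
    """Product with the doubly augmented matrix (1.4), never formed explicitly."""
    u1, u2 = u[:n], u[n:]
    Au1 = A @ u1
    return np.concatenate([H @ u1 + 2 * A.T @ (Au1 / d) + A.T @ u2,
                           Au1 + d * u2])

M_approx = np.eye(n)                     # M ≈ H in the preconditioner (1.5)
C = M_approx + A.T @ (A / d[:, None])    # condensed matrix M + A^T D^{-1} A

def apply_prec(r):
    """Solve P v = r via the condensed form (1.6c)."""
    r1, r2 = r[:n], r[n:]
    v1 = np.linalg.solve(C, r1 - A.T @ (r2 / d))
    return np.concatenate([v1, (r2 - A @ v1) / d])

N = n + m
rhs = np.concatenate([b1 + 2 * A.T @ (b2 / d), b2])
x, info = cg(LinearOperator((N, N), matvec=matvec, dtype=float), rhs,
             M=LinearOperator((N, N), matvec=apply_prec, dtype=float))
print(info)
```

In practice the dense solve with C would itself be replaced by a (possibly inexact) inner iteration, which, as noted above, is safe here because the inertia of P is guaranteed by the choice of M.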

In section 4 we formulate and analyze two variants of the preconditioner (1.5)

that exploit the asymptotic behavior of the elements of D. The use of these so-called

active-set preconditioners may require signiﬁcantly less work when the underlying

optimization problem has more constraints than variables. In section 5, we consider

the case where G is positive semideﬁnite and singular. Finally, in section 6, we present

some numerical examples illustrating the properties of the proposed preconditioners.

1.3. Notation and assumptions. Unless explicitly indicated otherwise, ‖·‖ denotes the vector two-norm or its subordinate matrix norm. The inertia of a real

symmetric matrix A, denoted by In(A), is the integer triple (a_+, a_-, a_0) giving the

number of positive, negative, and zero eigenvalues of A. The spectrum of a (possi-

bly unsymmetric) matrix A is denoted by eig(A). As the analysis concerns matrices


with only real eigenvalues, eig(A) is regarded as an ordered set, with the least (i.e.,

“leftmost”) eigenvalue, denoted by eig_min(A), appearing first. The quantity σ_k(A) denotes the kth largest singular value of A. Given a positive-definite A, the unique positive-definite X such that X^2 = A is denoted by A^{1/2}. Given vectors x_1 and x_2, the column vector consisting of the elements of x_1 augmented by the elements of x_2 is denoted by (x_1, x_2).

When μ is a positive scalar such that μ → 0, the notation p = O(μ) means that there exists a constant K such that |p| ≤ Kμ for all μ sufficiently small. For a positive p, p = Ω(1/μ) implies that there exists a constant K such that 1/p ≤ Kμ for all μ sufficiently small. In particular, p = O(1) means that |p| is bounded, and, for a positive p, p = Ω(1) means that p is bounded away from zero. For a positive p, the notation p = Θ(1) is used for the case where both p = O(1) and p = Ω(1), so that p remains bounded and is bounded away from zero as μ → 0.

As discussed in section 1.1, we are concerned with solving a sequence of systems of

the form (1.2), where the matrices A, H, and D depend implicitly on μ. In particular,

A and H are ﬁrst and second derivatives evaluated at a point depending on μ, and

D is an explicit function of μ. The notation deﬁned above allows us to characterize

the properties of H, A, and D in terms of their behavior as μ → 0. Throughout the

analysis, it is assumed that the following properties hold:

(A_1) H and A are both O(1).
(A_2) The row indices of A may be partitioned into disjoint subsets S, M, and B such that d_ii = O(μ) for i ∈ S, d_ii = Θ(1) for i ∈ M, and d_ii = Ω(1/μ) for i ∈ B.
(A_3) If A_S is the matrix of rows of A with indices in S and r = rank(A_S), then r remains constant as μ → 0 and σ_r(A_S) = Θ(1).

The second assumption reﬂects the fact that for μ suﬃciently small, some diagonal

elements of D are “small,” some are “medium,” and some are “big.”

It is often the case in practice that the equations and variables corresponding to

unit rows of A are eliminated directly from the KKT system. This elimination creates

no additional nonzero elements and provides a smaller “partially condensed” system

with an Ω(1/μ) diagonal term added to H. It will be shown that preconditioners for

both the full and partially condensed KKT systems depend on the eigenvalues of the

same matrix (see Lemmas 3.4 and 3.5). It follows that our analysis also applies to

preconditioners deﬁned for the partially condensed system.
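A small NumPy sketch of this partial condensation (one hypothetical unit row of A with a small diagonal element, playing the role of d_ii = O(μ); illustrative data, not the paper's): eliminating the corresponding equation and variable adds the rank-one term (1/d_ii) e_i e_i^T to H, which is Ω(1/μ) when d_ii = O(μ), creates no new nonzeros, and reproduces the solution of the full KKT system:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
H = np.diag(np.arange(1.0, n + 1.0))
# Row 0 of A is a unit row (1 in position index 1); row 1 is general.
A = np.vstack([np.eye(n)[1], rng.standard_normal(n)])
d = np.array([1e-4, 1.0])           # d_00 small, so 1/d_00 plays Omega(1/mu)
b1, b2 = rng.standard_normal(n), rng.standard_normal(2)

# Full KKT system (1.2).
B = np.block([[H, -A.T], [-A, -np.diag(d)]])
x = np.linalg.solve(B, np.concatenate([b1, -b2]))

# Partially condensed system: eliminate the unit row only.
e = np.eye(n)[1]
Hpc = H + np.outer(e, e) / d[0]     # large diagonal term added to H
Apc, dpc = A[1:], d[1:]
Bpc = np.block([[Hpc, -Apc.T], [-Apc, -np.diag(dpc)]])
xpc = np.linalg.solve(Bpc, np.concatenate([b1 + e * (b2[0] / d[0]), -b2[1:]]))

x1 = xpc[:n]
x2_elim = (b2[0] - x1[1]) / d[0]    # recover the eliminated variable
print(np.allclose(x, np.concatenate([x1, [x2_elim], xpc[n:]])))
```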

2. A parameterized system of linear equations. In this section, it is shown

how the indeﬁnite KKT system (1.2) may be embedded in a family of equivalent

linear systems, parameterized by a scalar ν. This parameterization facilitates the

simultaneous analysis of the three systems (1.2)–(1.4).

Definition 2.1 (the parameterized system). Let ν denote a scalar. Associ-

ated with the KKT equations Bx = b of (1.2), we deﬁne the parameterized equations

B(ν)x = b(ν), with

         B(ν) = [ H + (1+ν)A^T D^{-1} A   νA^T ]
                [          νA              νD  ]

and

         b(ν) = [ b_1 + (1+ν)A^T D^{-1} b_2 ]
                [            νb_2           ],

where H is symmetric and D is positive deﬁnite and diagonal.
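Definition 2.1 can be sanity-checked numerically: ν = -1 recovers the KKT matrix of (1.2), and ν = 1 the doubly augmented matrix of (1.4). A minimal NumPy sketch with random placeholder data:

```python
import numpy as np

def B_nu(H, A, D, nu):
    """The parameterized matrix B(nu) of Definition 2.1."""
    Dinv = np.diag(1.0 / np.diag(D))
    return np.block([[H + (1 + nu) * A.T @ Dinv @ A, nu * A.T],
                     [nu * A, nu * D]])

rng = np.random.default_rng(5)
n, m = 4, 2
H = rng.standard_normal((n, n)); H = H + H.T
A = rng.standard_normal((m, n))
D = np.diag(rng.uniform(0.5, 2.0, m))
Dinv = np.diag(1.0 / np.diag(D))

# nu = -1 gives the KKT matrix (1.2); nu = 1 the doubly augmented matrix (1.4).
B_kkt = np.block([[H, -A.T], [-A, -D]])
B_da = np.block([[H + 2 * A.T @ Dinv @ A, A.T], [A, D]])
print(np.allclose(B_nu(H, A, D, -1.0), B_kkt),
      np.allclose(B_nu(H, A, D, 1.0), B_da))
```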

The following proposition states the equivalence of the KKT system (1.2) and the

parameterized system of Deﬁnition 2.1.

Proposition 2.2 (equivalence of the parameterized systems). Let ν denote a

scalar parameter. If ν ≠ 0, then the system Bx = b of (1.2) and the system B(ν)x =