LSQR: An Algorithm for Sparse Linear
Equations and Sparse Least Squares
CHRISTOPHER C. PAIGE
McGill University, Canada
and
MICHAEL A. SAUNDERS
Stanford University
An iterative method is given for solving $Ax = b$ and $\min \|Ax - b\|_2$, where the matrix $A$ is large and sparse. The method is based on the bidiagonalization procedure of Golub and Kahan. It is analytically equivalent to the standard method of conjugate gradients, but possesses more favorable numerical properties. Reliable stopping criteria are derived, along with estimates of standard errors for $x$ and the condition number of $A$. These are used in the FORTRAN implementation of the method, subroutine LSQR. Numerical tests are described comparing LSQR with several other conjugate-gradient algorithms, indicating that LSQR is the most reliable algorithm when $A$ is ill-conditioned.
Categories and Subject Descriptors: G.1.2 [Numerical Analysis]: Approximation--least squares approximation; G.1.3 [Numerical Analysis]: Numerical Linear Algebra--linear systems (direct and iterative methods); sparse and very large systems
General Terms: Algorithms
Additional Key Words and Phrases: analysis of variance
The Algorithm: LSQR: Sparse Linear Equations and Least Squares Problems. ACM Trans. Math. Softw. 8, 2 (June 1982).
1. INTRODUCTION
A numerical method is presented here for computing a solution $x$ to either of the following problems:

Unsymmetric equations: solve $Ax = b$;

Linear least squares: minimize $\|Ax - b\|_2$,
This work was supported in part by the Natural Sciences and Engineering Research Council of
Canada Grant A8652, the New Zealand Department of Scientific and Industrial Research, the U.S.
Office of Naval Research under Contract N00014-75-C-0267, National Science Foundation Grants
MCS 76-20019 and ENG 77-06761, and the Department of Energy under Contract DE-AC03-76SF00326, PA No. DE-AT03-76ER72018.
Authors' addresses: C.C. Paige, School of Computer Science, McGill University, Montreal, P.Q.,
Canada H3C 3G1; M.A. Saunders, Department of Operations Research, Stanford University, Stanford,
CA 94305.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
© 1982 ACM 0098-3500/82/0300-0043 $00.75
ACM Transactions on Mathematical Software, Vol. 8, No. 1, March 1982, Pages 43-71.

where $A$ is a real matrix with $m$ rows and $n$ columns and $b$ is a real vector. It will usually be true that $m \ge n$ and $\mathrm{rank}(A) = n$, but these conditions are not essential. The method, to be called algorithm LSQR, is similar in style to the well-known method of conjugate gradients (CG) as applied to the least-squares problem [10]. The matrix $A$ is used only to compute products of the form $Av$ and $A^T u$ for various vectors $v$ and $u$. Hence $A$ will normally be large and sparse or will be expressible as a product of matrices that are sparse or have special structure. A typical application is to the large least-squares problems arising from analysis of variance (e.g., [12]).
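This matrix-free interface is easy to state in code. The sketch below uses SciPy's lsqr, a later implementation of this algorithm, with a LinearOperator supplying only the two products; the dense test matrix D and the dimensions are our own illustrative choices.

```python
# Sketch: driving an LSQR-style solver through products A v and A^T u only.
# The dense test matrix D and all names here are ours, for illustration.
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(0)
m, n = 100, 20
D = rng.standard_normal((m, n))      # stands in for a large sparse/structured A

A = LinearOperator(
    (m, n),
    matvec=lambda v: D @ v,          # supplies the product A v
    rmatvec=lambda u: D.T @ u,       # supplies the product A^T u
)
b = rng.standard_normal(m)

x = lsqr(A, b)[0]                    # first element of the returned tuple is x
print(np.linalg.norm(D.T @ (b - D @ x)))   # A^T r ~ 0 at a least-squares solution
```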
CG-like methods are iterative in nature. They are characterized by their need for only a few vectors of working storage and by their theoretical convergence within at most $n$ iterations (if exact arithmetic could be performed). In practice such methods may require far fewer or far more than $n$ iterations to reach an acceptable approximation to $x$. The methods are most useful when $A$ is well conditioned and has many nearly equal singular values. These properties occur naturally in many applications. In other cases it is often possible to divide the solution procedure into a direct and an iterative part, such that the iterative part has a better conditioned matrix for which CG-like methods will converge more quickly. Some such transformation methods are considered in [21].
Algorithm LSQR is based on the bidiagonalization procedure of Golub and Kahan [9]. It generates a sequence of approximations $\{x_k\}$ such that the residual norm $\|r_k\|_2$ decreases monotonically, where $r_k = b - Ax_k$. Analytically, the sequence $\{x_k\}$ is identical to the sequence generated by the standard CG algorithm and by several other published algorithms. However, LSQR is shown by example to be numerically more reliable in various circumstances than the other methods considered.

The FORTRAN implementation of LSQR [22] is designed for practical application. It incorporates reliable stopping criteria and provides the user with computed estimates of the following quantities: $x$, $r = b - Ax$, $A^T r$, $\|r\|_2$, $\|A\|_F$, standard errors for $x$, and the condition number of $A$.
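SciPy's implementation, for example, returns estimates of these same quantities; the tuple layout and names below are SciPy's, not the paper's.

```python
# Sketch: the diagnostic outputs of SciPy's lsqr, which correspond to the
# estimates listed above. Names and tuple order are SciPy's, not the paper's.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

(x, istop, itn, r1norm, r2norm, anorm, acond,
 arnorm, xnorm, var) = lsqr(A, b, calc_var=True)

print(istop, itn)      # which stopping criterion triggered, and iteration count
print(r1norm, arnorm)  # estimates of ||r||_2 and ||A^T r||_2
print(anorm, acond)    # estimates of ||A||_F and cond(A)
print(var[:3])         # variance estimates, related to standard errors for x
```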
1.1 Notation

Matrices are denoted by $A, B, \ldots$, vectors by $v, w, \ldots$, and scalars by $\alpha, \beta, \ldots$. Two exceptions are $c$ and $s$, which denote the significant components of an elementary orthogonal matrix, such that $c^2 + s^2 = 1$. For a vector $v$, $\|v\|$ always denotes the Euclidean norm $\|v\|_2 = (v^T v)^{1/2}$. For a matrix $A$, $\|A\|$ will usually mean the Frobenius norm, $\|A\|_F = (\sum a_{ij}^2)^{1/2}$, and the condition number for an unsymmetric matrix $A$ is defined by $\mathrm{cond}(A) = \|A\| \, \|A^+\|$, where $A^+$ denotes the pseudoinverse of $A$. The relative precision of floating-point arithmetic is $\epsilon$, the smallest machine-representable number such that $1 + \epsilon > 1$.
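Stated in NumPy terms (an illustrative aside, with our own small test matrix), these conventions read as follows.

```python
# Sketch of the notation in NumPy terms: Euclidean norm, Frobenius norm,
# cond(A) = ||A|| ||A+||, and the relative precision epsilon.
import numpy as np

v = np.array([3.0, 4.0])
print(np.linalg.norm(v))                    # ||v||_2 = (v^T v)^{1/2} = 5.0

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
fro = np.linalg.norm(A, "fro")              # (sum of squared entries)^{1/2}
cond = fro * np.linalg.norm(np.linalg.pinv(A), "fro")   # ||A||_F ||A+||_F
print(fro, cond)

print(np.finfo(float).eps)                  # smallest eps with 1 + eps > 1
```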
2. MOTIVATION VIA THE LANCZOS PROCESS
In this section we review the symmetric Lanczos process [13] and its use in solving symmetric linear equations $Bx = b$. Algorithm LSQR is then derived by applying the Lanczos process to a particular symmetric system. Although a more direct development is given in Section 4, the present derivation may remain useful for a future error analysis of LSQR, since many of the rounding error properties of the Lanczos process are already known [18].

Given a symmetric matrix $B$ and a starting vector $b$, the Lanczos process is a method for generating a sequence of vectors $\{v_i\}$ and scalars $\{\alpha_i\}$, $\{\beta_i\}$ such that $B$ is reduced to tridiagonal form. A reliable computational form of the method is the following.

The Lanczos process (reduction to tridiagonal form):

$$\beta_1 v_1 = b,$$
$$w_i = B v_i - \beta_i v_{i-1},$$
$$\alpha_i = v_i^T w_i, \qquad\qquad i = 1, 2, \ldots, \tag{2.1}$$
$$\beta_{i+1} v_{i+1} = w_i - \alpha_i v_i,$$

where $v_0 \equiv 0$ and each $\beta_i \ge 0$ is chosen so that $\|v_i\| = 1$ $(i > 0)$. The situation after $k$ steps is summarized by

$$B V_k = V_k T_k + \beta_{k+1} v_{k+1} e_k^T, \tag{2.2}$$

where $T_k \equiv \mathrm{tridiag}(\beta_i, \alpha_i, \beta_{i+1})$ and $V_k \equiv [v_1, v_2, \ldots, v_k]$. If there were no rounding error we would have $V_k^T V_k = I$, and the process would therefore terminate with $\beta_{k+1} = 0$ for some $k \le n$. Some other stopping criterion is needed in practice, since $\beta_{k+1}$ is unlikely to be negligible for any $k$. In any event, eq. (2.2) holds to within machine precision.
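As an illustrative transcription (the function name, test matrix, and dimensions are ours), the recurrence (2.1) and the identity (2.2) can be checked numerically.

```python
# Minimal sketch of the Lanczos recurrence (2.1) and a check of (2.2).
# All names and the random symmetric test matrix are ours, for illustration.
import numpy as np

def lanczos(B, b, k):
    n = b.size
    V = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)
    beta[0] = np.linalg.norm(b)
    V[:, 0] = b / beta[0]                       # beta_1 v_1 = b
    v_prev = np.zeros(n)
    for i in range(k):
        w = B @ V[:, i] - beta[i] * v_prev      # w_i = B v_i - beta_i v_{i-1}
        alpha[i] = V[:, i] @ w                  # alpha_i = v_i^T w_i
        w -= alpha[i] * V[:, i]
        beta[i + 1] = np.linalg.norm(w)         # beta_{i+1} v_{i+1} = w_i - alpha_i v_i
        v_prev = V[:, i]
        V[:, i + 1] = w / beta[i + 1]
    return V, alpha, beta

rng = np.random.default_rng(2)
M = rng.standard_normal((30, 30))
B = M + M.T                                     # symmetric test matrix
b = rng.standard_normal(30)
k = 8
V, alpha, beta = lanczos(B, b, k)

Tk = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
ek = np.zeros(k); ek[-1] = 1.0
# Check (2.2): B V_k = V_k T_k + beta_{k+1} v_{k+1} e_k^T
lhs = B @ V[:, :k]
rhs = V[:, :k] @ Tk + beta[k] * np.outer(V[:, k], ek)
print(np.linalg.norm(lhs - rhs))                # ~ machine precision
```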
Now suppose we wish to solve the symmetric system $Bx = b$. Multiplying (2.2) by an arbitrary $k$-vector $y_k$, whose last element is $\eta_k$, gives

$$B V_k y_k = V_k T_k y_k + \beta_{k+1} v_{k+1} \eta_k.$$

Since $V_k(\beta_1 e_1) = b$ by definition, it follows that if $y_k$ and $x_k$ are defined by the equations

$$T_k y_k = \beta_1 e_1, \qquad x_k = V_k y_k, \tag{2.3}$$

then we shall have $B x_k = b + \eta_k \beta_{k+1} v_{k+1}$ to working accuracy. Hence $x_k$ may be taken as the exact solution to a perturbed system and will solve the original system whenever $\eta_k \beta_{k+1}$ is negligibly small.
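Continuing in the same illustrative vein, the self-contained sketch below forms $x_k$ from (2.3) explicitly and verifies the perturbed system just stated. This is purely a numerical check; practical CG-like methods eliminate $y_k$ rather than storing $V_k$.

```python
# Sketch of (2.3): solve T_k y_k = beta_1 e_1, set x_k = V_k y_k, and verify
# B x_k = b + eta_k beta_{k+1} v_{k+1}. Illustration only; names are ours.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((30, 30))
B = M + M.T
b = rng.standard_normal(30)
k = 8

# Compact Lanczos loop, as in the previous sketch.
V = np.zeros((30, k + 1)); alpha = np.zeros(k); beta = np.zeros(k + 1)
beta[0] = np.linalg.norm(b); V[:, 0] = b / beta[0]; v_prev = np.zeros(30)
for i in range(k):
    w = B @ V[:, i] - beta[i] * v_prev
    alpha[i] = V[:, i] @ w
    w -= alpha[i] * V[:, i]
    beta[i + 1] = np.linalg.norm(w)
    v_prev = V[:, i]; V[:, i + 1] = w / beta[i + 1]

Tk = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
e1 = np.zeros(k); e1[0] = 1.0
yk = np.linalg.solve(Tk, beta[0] * e1)      # T_k y_k = beta_1 e_1
xk = V[:, :k] @ yk                          # x_k = V_k y_k
eta = yk[-1]                                # last element of y_k
# B x_k - b should equal eta_k * beta_{k+1} * v_{k+1} to working accuracy
print(np.linalg.norm(B @ xk - b - eta * beta[k] * V[:, k]))
```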
The above arguments are not complete, but they provide at least some motivation for defining the sequence of vectors $\{x_k\}$ according to (2.3). It is now possible to derive several iterative algorithms for solving $Bx = b$, each characterized by the manner in which $y_k$ is eliminated from (2.3) (since it is usually not practical to compute each $y_k$ explicitly). In particular, the method of conjugate gradients is known to be equivalent to using the Cholesky factorization $T_k = L_k D_k L_k^T$ and is reliable when $B$ (and hence $T_k$) is positive definite, while algorithm SYMMLQ employs the orthogonal factorization $T_k = \bar{L}_k Q_k$ to retain stability for arbitrary symmetric $B$. (See [20] for further details of these methods.)
2.1 The Least-Squares System
We now turn to a particular system that arises from the "damped least-squares" problem

$$\min_x \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2, \tag{2.4}$$

where $A$ and $b$ are given data and $\lambda$ is an arbitrary real scalar. The solution of (2.4) satisfies the symmetric (but indefinite) system

$$\begin{bmatrix} I & A \\ A^T & -\lambda^2 I \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \tag{2.5}$$

where $r$ is the residual vector $b - Ax$. When the Lanczos process is applied to this system, a certain structure appears in relations (2.1)-(2.3). In particular, (2.1) reduces to the procedure defined as Bidiag 1 in Section 3, while (2.3) can be permuted after $2k + 1$ iterations to the form

$$\begin{bmatrix} I & B_k \\ B_k^T & -\lambda^2 I \end{bmatrix} \begin{bmatrix} t_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} r_k \\ x_k \end{bmatrix} = \begin{bmatrix} U_{k+1} & 0 \\ 0 & V_k \end{bmatrix} \begin{bmatrix} t_{k+1} \\ y_k \end{bmatrix}, \tag{2.6}$$

where $B_k$ is $(k + 1) \times k$ and lower bidiagonal. We see that $y_k$ is the solution of another damped least-squares problem,

$$\min_{y_k} \left\| \begin{bmatrix} B_k \\ \lambda I \end{bmatrix} y_k - \begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix} \right\|_2, \tag{2.7}$$

which can be solved reliably using orthogonal transformations. This observation forms the basis of algorithm LSQR.
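A quick numerical check (our own illustration, with a random test problem) confirms that the solution of (2.4) does satisfy the augmented system (2.5).

```python
# Sketch: the solution of the damped least-squares problem (2.4) satisfies
# the augmented system (2.5). A numerical consistency check, names ours.
import numpy as np

rng = np.random.default_rng(4)
m, n, lam = 20, 5, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Solve (2.4) by stacking [A; lam*I] over [b; 0].
A_aug = np.vstack([A, lam * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
r = b - A @ x

# Check (2.5): [I A; A^T -lam^2 I] [r; x] = [b; 0]
top = r + A @ x - b
bottom = A.T @ r - lam**2 * x
print(np.linalg.norm(top), np.linalg.norm(bottom))   # both ~ 0
```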
2.2 A Different Starting Vector
For completeness we note that a second least-squares algorithm can be derived in an analogous way. Defining $s = -Ax$, we can write (2.5) as

$$\begin{bmatrix} I & A \\ A^T & -\lambda^2 I \end{bmatrix} \begin{bmatrix} s \\ x \end{bmatrix} = \begin{bmatrix} 0 \\ -A^T b \end{bmatrix} \tag{2.8}$$

and apply the Lanczos process again, using the same matrix as before but with the new starting vector shown. This time, (2.1) reduces to Bidiag 2 as defined in Section 3, while (2.3) can be permuted to the form

$$\begin{bmatrix} -\lambda^2 I & R_k^T \\ R_k & I \end{bmatrix} \begin{bmatrix} y_k \\ t_k \end{bmatrix} = \begin{bmatrix} -\theta_1 e_1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} x_k \\ s_k \end{bmatrix} = \begin{bmatrix} V_k & 0 \\ 0 & P_k \end{bmatrix} \begin{bmatrix} y_k \\ t_k \end{bmatrix}, \tag{2.9}$$

after $2k$ iterations, where $R_k$ is $k \times k$ and upper bidiagonal. (The odd-numbered systems are not useful because $T_{2k-1}$ is singular when $\lambda = 0$.) We see that $y_k$ satisfies the system

$$(R_k^T R_k + \lambda^2 I) y_k = \theta_1 e_1, \tag{2.10}$$

which could easily be solved. However, (2.10) proves to be one form of the normal equations associated with (2.7), and the algorithm it suggests for computing $x$

therefore has unsatisfactory numerical properties. We clarify this matter in
Section 7.4.
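A small generic experiment (our own, not the paper's Section 7.4 analysis) illustrates the usual hazard of a normal-equations route like (2.10): forming the cross-product matrix squares the condition number.

```python
# Sketch: least squares via an orthogonal factorization versus the normal
# equations on an ill-conditioned matrix. Test construction is ours.
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)                  # singular values from 1 down to 1e-6
A = (U * s) @ Q.T                          # cond(A) ~ 1e6, cond(A^T A) ~ 1e12
x_true = rng.standard_normal(n)
b = A @ x_true

x_qr = np.linalg.lstsq(A, b, rcond=None)[0]   # orthogonal transformations
x_ne = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations

print(np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true))  # small
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))  # much larger
```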
2.3 The Role of λ

The quantities generated by the Lanczos process from (2.5) and (2.8) are $B_k$, $U_{k+1}$, $V_k$ and $R_k$, $P_k$, $V_k$, respectively. These are all independent of $\lambda$, which means that they are the same as those generated when $\lambda = 0$. We shall therefore assume from now on that $\lambda = 0$. A given $\lambda$ can be accommodated when solving (2.7), as shown in [22]. Methods for actually choosing $\lambda$ are beyond the scope of this paper.
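For illustration, later implementations expose $\lambda$ directly; SciPy's lsqr, for instance, takes it as the `damp` argument. A usage sketch under that assumption:

```python
# Sketch: a given damping parameter lambda can be passed straight to an
# LSQR implementation; SciPy's lsqr calls it `damp`. Test problem is ours.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(6)
A = rng.standard_normal((40, 8))
b = rng.standard_normal(40)
lam = 0.1

x_damped = lsqr(A, b, damp=lam)[0]

# Reference solution of (2.4) by stacking [A; lam*I] over [b; 0].
x_ref = np.linalg.lstsq(np.vstack([A, lam * np.eye(8)]),
                        np.concatenate([b, np.zeros(8)]), rcond=None)[0]
print(np.linalg.norm(x_damped - x_ref))   # small (limited by lsqr's tolerances)
```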
3. THE BIDIAGONALIZATION PROCEDURES
The preceding use of the Lanczos process results in two forms of a bidiagonalization procedure due to Golub and Kahan [9]. We state these forms as procedures Bidiag 1 and 2, and then give some unexpected relationships between them.
Bidiag 1 (starting vector $b$; reduction to lower bidiagonal form):

$$\beta_1 u_1 = b, \qquad \alpha_1 v_1 = A^T u_1,$$
$$\beta_{i+1} u_{i+1} = A v_i - \alpha_i u_i, \qquad\qquad i = 1, 2, \ldots, \tag{3.1}$$
$$\alpha_{i+1} v_{i+1} = A^T u_{i+1} - \beta_{i+1} v_i.$$

The scalars $\alpha_i \ge 0$ and $\beta_i \ge 0$ are chosen so that $\|u_i\| = \|v_i\| = 1$. With the definitions

$$U_k \equiv [u_1, u_2, \ldots, u_k], \qquad V_k \equiv [v_1, v_2, \ldots, v_k], \qquad B_k \equiv \begin{bmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix}$$

(where $B_k$ is the rectangular matrix introduced in Section 2), the recurrence relations (3.1) may be rewritten as

$$U_{k+1}(\beta_1 e_1) = b, \tag{3.2}$$
$$A V_k = U_{k+1} B_k, \tag{3.3}$$
$$A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T. \tag{3.4}$$

If exact arithmetic were used, then we would also have $U_{k+1}^T U_{k+1} = I$ and $V_k^T V_k = I$, but, in any event, the previous equations hold to within machine precision.
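An illustrative transcription of Bidiag 1 (function and variable names are ours, and breakdown with a zero $\alpha_i$ or $\beta_i$ is not handled) that verifies (3.3) numerically:

```python
# Sketch of Bidiag 1 (3.1) and a check of the relation (3.3): A V_k = U_{k+1} B_k.
# Names are ours; breakdown (zero alpha or beta) is not handled.
import numpy as np

def bidiag1(A, b, k):
    m, n = A.shape
    U = np.zeros((m, k + 1)); V = np.zeros((n, k))
    alpha = np.zeros(k); beta = np.zeros(k + 1)
    beta[0] = np.linalg.norm(b); U[:, 0] = b / beta[0]      # beta_1 u_1 = b
    for i in range(k):
        v = A.T @ U[:, i] - (beta[i] * V[:, i - 1] if i > 0 else 0.0)
        alpha[i] = np.linalg.norm(v); V[:, i] = v / alpha[i]
        u = A @ V[:, i] - alpha[i] * U[:, i]                # beta_{i+1} u_{i+1}
        beta[i + 1] = np.linalg.norm(u); U[:, i + 1] = u / beta[i + 1]
    return U, V, alpha, beta

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 12))
b = rng.standard_normal(30)
k = 6
U, V, alpha, beta = bidiag1(A, b, k)

# Assemble the (k+1) x k lower-bidiagonal B_k of Section 2.
Bk = np.zeros((k + 1, k))
for j in range(k):
    Bk[j, j] = alpha[j]
    Bk[j + 1, j] = beta[j + 1]
print(np.linalg.norm(A @ V - U @ Bk))    # (3.3) holds to machine precision
```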
Bidiag 2 (starting vector $A^T b$; reduction to upper bidiagonal form):

$$\theta_1 v_1 = A^T b, \qquad \rho_1 p_1 = A v_1,$$
$$\theta_{i+1} v_{i+1} = A^T p_i - \rho_i v_i, \qquad\qquad i = 1, 2, \ldots, \tag{3.5}$$
$$\rho_{i+1} p_{i+1} = A v_{i+1} - \theta_{i+1} p_i.$$

References

Golub, G., and Kahan, W. Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. Ser. B 2 (1965), 205-224.

Hestenes, M. R., and Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards 49 (1952), 409-436.

Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards 45 (1950), 255-282.

Wilkinson, J. H. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, 1965.