LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares
Summary
Linear least squares: minimize ‖Ax − b‖₂
- A typical application is to the large least-squares problems arising from analysis of variance (e.g., [12]).
- They are characterized by their need for only a few vectors of working storage and by their theoretical convergence within at most n iterations (if exact arithmetic could be performed).
- These properties occur naturally in many applications.
- In other cases it is often possible to divide the solution procedure into a direct and an iterative part, such that the iterative part has a better conditioned matrix for which CG-like methods will converge more quickly.
- The FORTRAN implementation of LSQR [22] is designed for practical application.
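As a quick way to try the method today, SciPy ships a translation of this algorithm as `scipy.sparse.linalg.lsqr` (using SciPy here is our assumption; the implementation described in [22] is FORTRAN):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
# A small sparse compatible system: 30 equations, 10 unknowns.
A = sparse_random(30, 10, density=0.5, random_state=rng, format="csr")
x_true = rng.standard_normal(10)
b = A @ x_true

# LSQR needs only the products A @ v and A.T @ u plus a few work vectors.
x, istop, itn, r1norm = lsqr(A, b, atol=1e-12, btol=1e-12)[:4]
print(istop, itn, r1norm)
```

Since b lies in the range of A, the system is compatible and the final residual norm `r1norm` is at the rounding level.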
2. MOTIVATION VIA THE LANCZOS PROCESS
- In any event, eq. (2.2) holds to within machine precision.
- Hence xk may be taken as the exact solution to a perturbed system and will solve the original system whenever ηkβk+1 is negligibly small.
- The above arguments are not complete, but they provide at least some motivation for defining the sequence of vectors {Xk} according to (2.3) .
- In particular, the method of conjugate gradients is known to be equivalent to using the Cholesky factorization Tk = LkDkLkT and is reliable when B (and hence Tk) is positive definite, while algorithm SYMMLQ employs the orthogonal factorization Tk = L̄kQk to retain stability for arbitrary symmetric B. (See [20] for further details of these methods.)
min ‖ [A; λI] x − [b; 0] ‖₂
- When the Lanczos process is applied to this system, a certain structure appears in relations (2.1)-(2.3).
- This observation forms the basis of algorithm LSQR.
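The bidiagonalization that produces this structure can be sketched in NumPy as follows (variable names are ours; no reorthogonalization is performed, which is acceptable for a few steps):

```python
import numpy as np

def bidiag1(A, b, k):
    """k steps of Golub-Kahan bidiagonalization:
    A @ V_k = U_{k+1} @ B_k with B_k lower bidiagonal."""
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    alphas, betas = [], []
    beta = np.linalg.norm(b)
    U[:, 0] = b / beta                      # beta_1 u_1 = b
    v = A.T @ U[:, 0]
    alpha = np.linalg.norm(v)
    V[:, 0] = v / alpha                     # alpha_1 v_1 = A^T u_1
    alphas.append(alpha)
    for i in range(k):
        u = A @ V[:, i] - alphas[i] * U[:, i]
        beta = np.linalg.norm(u)
        betas.append(beta)
        U[:, i + 1] = u / beta              # beta_{i+1} u_{i+1} = A v_i - alpha_i u_i
        if i + 1 < k:
            v = A.T @ U[:, i + 1] - beta * V[:, i]
            alpha = np.linalg.norm(v)
            alphas.append(alpha)
            V[:, i + 1] = v / alpha         # alpha_{i+1} v_{i+1} = A^T u_{i+1} - beta_{i+1} v_i
    # Assemble the (k+1) x k lower-bidiagonal B_k.
    B = np.zeros((k + 1, k))
    for i in range(k):
        B[i, i] = alphas[i]
        B[i + 1, i] = betas[i]
    return U, B, V

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
U, B, V = bidiag1(A, b, 4)
print(np.allclose(A @ V, U @ B))  # the fundamental recurrence holds
```

The recurrence A Vk = Uk+1 Bk holds to machine precision by construction; it is the orthogonality of Uk and Vk that erodes in longer runs.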
3.1 Relationship Between the Bidiagonalizations
- The principal connection between the two bidiagonalization procedures is that the matrices Vk are the same for each, and that the identity BkTBk = RkTRk (3.9) holds.
- The rather surprising conclusion is that Rk must be identical to the matrix that would be obtained from the conventional QR factorization of Bk.
Thus
- In the presence of rounding errors, these identities will, of course, cease to hold.
- They throw light on the advantages of algorithm LSQR over two earlier methods, LSCG and LSLQ, as discussed in Section 7.4.
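The identity BkTBk = RkTRk, and hence the coincidence of Rk with the QR factor of Bk, can be checked numerically on any lower-bidiagonal matrix (the construction below is ours):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 5
# A (k+1) x k lower-bidiagonal matrix, as produced by the bidiagonalization.
B = np.zeros((k + 1, k))
B[np.arange(k), np.arange(k)] = rng.uniform(1, 2, k)         # diagonal (alphas)
B[np.arange(1, k + 1), np.arange(k)] = rng.uniform(1, 2, k)  # subdiagonal (betas)

# QR factorization of B_k; its triangular factor plays the role of R_k.
R = np.linalg.qr(B, mode="r")
print(np.allclose(B.T @ B, R.T @ R))
```

Note that R comes out upper bidiagonal (up to signs), which is what makes the LSQR recurrences so short.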
- The relationship between the orthonormal matrices Uk and Pk can be shown to hold for some vector rk.
- The vectors yk and tk+1 could then be found from the corresponding recurrence, and only the most recent iterates need be saved.
- The broad outline of algorithm LSQR is now complete.
5. ESTIMATION OF NORMS
- For the purposes of estimating norms, the authors often assume that the orthogonality relations UkTUk = I and VkTVk = I hold.
- In actual computations these are rarely true, but the resulting estimates have proved to be remarkably reliable.
- Clearly the product of sines in (5.2) decreases monotonically.
5.4 Standard Errors
- This facility has not been implemented in LSQR and the authors have not investigated the accuracy of such approximations.
- Clearly only a limited number of pairs (i, j) could be dealt with efficiently on large problems.
6. STOPPING CRITERIA
- The user may therefore set ATOL and BTOL according to the accuracy of the data.
- It can be seen that these perturbations are within their allowable bounds when the inequality of S1 holds.
- Hence, criterion S1 is consistent with the ideas of backward rounding error analysis and with knowledge of data accuracy.
- Since this argument does not depend on orthogonality, S1 can be used in any method for solving compatible linear systems.
6.2 Incompatible Systems
- In their particular method it happens that Ekxk = 0, since (5.3) shows that xk = Vkyk is theoretically orthogonal to ATrk.
- This strengthens the case for using rule S2.
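Concretely, the two rules can be sketched as follows; the helper below is ours, and it computes ‖A‖ exactly, whereas LSQR itself uses cheap running estimates:

```python
import numpy as np

def should_stop(A, b, x, atol=1e-8, btol=1e-8):
    """Stopping rules S1 and S2. S1 targets compatible systems,
    S2 targets incompatible (least-squares) systems."""
    r = b - A @ x
    rnorm = np.linalg.norm(r)
    anorm = np.linalg.norm(A, 2)
    # S1: ||r|| <= BTOL ||b|| + ATOL ||A|| ||x||  -> backward-stable solve.
    s1 = rnorm <= btol * np.linalg.norm(b) + atol * anorm * np.linalg.norm(x)
    # S2: ||A^T r|| <= ATOL ||A|| ||r||  -> least-squares optimality.
    s2 = True if rnorm == 0 else np.linalg.norm(A.T @ r) <= atol * anorm * rnorm
    return s1, s2

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(should_stop(A, b, x_ls))  # S2 fires at the least-squares solution; S1 cannot
```

On an incompatible system only S2 can be satisfied, since the residual never becomes small; this is exactly why both rules are needed.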
6.4 Singular Systems
- For large enough k, these vectors will lie very nearly in the null space of A.
- The vector xk may reveal to the user where certain unexpected dependencies exist.
7. OTHER METHODS
- Several other conjugate-gradient methods are discussed here.
- All except the first (CGLS) are stated using notation consistent with Sections 3-6 in order to illustrate certain analytic identities.
Algorithm C G L S
- Otherwise the method is clearly simple and economical.
- The vectors vi+1 and di in LSQR are proportional to si and pi, respectively.
- Note that qi and si just given can share the same workspace.
- A FORTRAN implementation of CGLS has been given by Björck and Elfving [3].
- This incorporates an acceleration technique in a way that requires minimal additional storage.
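For reference, the CGLS recurrences can be sketched as follows (a plain transcription of the Hestenes-Stiefel method, not the Björck-Elfving code, and without their acceleration technique):

```python
import numpy as np

def cgls(A, b, iters=100, tol=1e-12):
    """CGLS: conjugate gradients applied to the normal equations
    A^T A x = A^T b, arranged so that A^T A is never formed explicitly."""
    x = np.zeros(A.shape[1])
    r = b.astype(float).copy()     # r = b - A x  (x starts at zero)
    s = A.T @ r                    # normal-equations residual A^T r
    p = s.copy()
    gamma = s @ s
    for _ in range(iters):
        q = A @ p                  # the one product A v of this iteration
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r                # the one product A^T u of this iteration
        gamma_new = s @ s
        if np.sqrt(gamma_new) <= tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
x = cgls(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))
```

As the summary notes, qi and si can share workspace in a tight implementation; the sketch above keeps them separate for clarity.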
7.7 Storage and Work
- All methods require one product Av and one product ATv per iteration.
- This could dominate the work requirements in some applications.
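One product of each kind per iteration can be observed directly by wrapping A in a counting `LinearOperator` and running SciPy's `lsqr` (using SciPy here is our assumption; the original implementation is FORTRAN):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(5)
M = rng.standard_normal((40, 10))
counts = {"Av": 0, "ATu": 0}

def matvec(v):
    counts["Av"] += 1       # one forward product per iteration
    return M @ v

def rmatvec(u):
    counts["ATu"] += 1      # one transpose product per iteration (plus setup)
    return M.T @ u

A = LinearOperator(M.shape, matvec=matvec, rmatvec=rmatvec, dtype=float)
b = rng.standard_normal(40)
itn = lsqr(A, b)[2]
print(itn, counts)  # both counts track the iteration number closely
```

This matrix-free interface is precisely why the iteration counts in Sections 7-8 translate directly into work estimates.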
8.1 Generation of Test Problems
- It is clear that results obtained from them could be very misleading.
- This possibility is discussed in Section 8.4.
- In fairness, Chen [4] did not intend RRLS to be applied to compatible systems.
- At first sight it may appear that the residual-reducing methods possess some advantage on least-squares problems.
- This anomalous behavior cannot be guaranteed; for example, it did not occur on P(80, 40, 4, 6), where LSQR obtained more accurate solutions in fewer iterations.
9. SUMMARY
- (1) The symmetric conjugate-gradient algorithm should be applied to the normal equations ATAx = ATb only if there is reason to believe that very few iterations will produce a satisfactory estimate of x.
- (2) The least-squares adaptation of symmetric CG will always be more reliable, at the expense of slightly more storage and work per iteration. (This is the algorithm of Hestenes and Stiefel, described in Section 7.1 of this paper as algorithm CGLS.)
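Recommendation (1) reflects the squared conditioning of the normal equations, cond(ATA) = cond(A)². A minimal numerical illustration (the matrix construction is ours):

```python
import numpy as np

rng = np.random.default_rng(6)
# Build an ill-conditioned A from prescribed singular values.
U, _ = np.linalg.qr(rng.standard_normal((30, 8)))
V, _ = np.linalg.qr(rng.standard_normal((8, 8)))
sigma = np.logspace(0, -4, 8)          # cond(A) = 1e4 by construction
A = U @ np.diag(sigma) @ V.T

print(np.linalg.cond(A))               # about 1e4
print(np.linalg.cond(A.T @ A))         # about 1e8: the square
```

Methods such as CGLS and LSQR work with A itself, so their accuracy is governed by cond(A) rather than cond(ATA).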