LSQR: An Algorithm for Sparse Linear
Equations and Sparse Least Squares
CHRISTOPHER C. PAIGE
McGill University, Canada
and
MICHAEL A. SAUNDERS
Stanford University
An iterative method is given for solving $Ax = b$ and $\min \|Ax - b\|_2$, where the matrix $A$ is large and sparse. The method is based on the bidiagonalization procedure of Golub and Kahan. It is analytically equivalent to the standard method of conjugate gradients, but possesses more favorable numerical properties. Reliable stopping criteria are derived, along with estimates of standard errors for $x$ and the condition number of $A$. These are used in the FORTRAN implementation of the method, subroutine LSQR. Numerical tests are described comparing LSQR with several other conjugate-gradient algorithms, indicating that LSQR is the most reliable algorithm when $A$ is ill-conditioned.
Categories and Subject Descriptors: G.1.2 [Numerical Analysis]: Approximation--least squares approximation; G.1.3 [Numerical Analysis]: Numerical Linear Algebra--linear systems (direct and iterative methods); sparse and very large systems
General Terms: Algorithms
Additional Key Words and Phrases: analysis of variance
The Algorithm: LSQR: Sparse Linear Equations and Least Squares Problems. ACM Trans. Math. Softw. 8, 2 (June 1982).
1. INTRODUCTION
A numerical method is presented here for computing a solution $x$ to either of the following problems:

Unsymmetric equations: solve $Ax = b$;

Linear least squares: minimize $\|Ax - b\|_2$,
This work was supported in part by the Natural Sciences and Engineering Research Council of
Canada Grant A8652, the New Zealand Department of Scientific and Industrial Research, the U.S.
Office of Naval Research under Contract N00014-75-C-0267, National Science Foundation Grants
MCS 76-20019 and ENG 77-06761, and the Department of Energy under Contract DE-AC03-76SF00326, PA No. DE-AT03-76ER72018.
Authors' addresses: C.C. Paige, School of Computer Science, McGill University, Montreal, P.Q.,
Canada H3C 3G1; M.A. Saunders, Department of Operations Research, Stanford University, Stanford,
CA 94305.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
© 1982 ACM 0098-3500/82/0300-0043 $00.75
ACM Transactions on Mathematical Software, Vol. 8, No. 1, March 1982, Pages 43-71.

where $A$ is a real matrix with $m$ rows and $n$ columns and $b$ is a real vector. It will usually be true that $m \ge n$ and $\mathrm{rank}(A) = n$, but these conditions are not essential. The method, to be called algorithm LSQR, is similar in style to the well-known method of conjugate gradients (CG) as applied to the least-squares problem [10]. The matrix $A$ is used only to compute products of the form $Av$ and $A^T u$ for various vectors $v$ and $u$. Hence $A$ will normally be large and sparse or will be expressible as a product of matrices that are sparse or have special structure. A typical application is to the large least-squares problems arising from analysis of variance (e.g., [12]).
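This matrix-free interface is easy to state in code. The sketch below uses SciPy's lsqr, a later implementation of this algorithm, with a LinearOperator supplying only the two products; the dense test matrix D and the dimensions are our own illustrative choices.

```python
# Sketch: driving an LSQR-style solver through products A v and A^T u only.
# The dense test matrix D and all names here are ours, for illustration.
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(0)
m, n = 100, 20
D = rng.standard_normal((m, n))      # stands in for a large sparse/structured A

A = LinearOperator(
    (m, n),
    matvec=lambda v: D @ v,          # supplies the product A v
    rmatvec=lambda u: D.T @ u,       # supplies the product A^T u
)
b = rng.standard_normal(m)

x = lsqr(A, b)[0]                    # first element of the returned tuple is x
print(np.linalg.norm(D.T @ (b - D @ x)))   # A^T r ~ 0 at a least-squares solution
```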
CG-like methods are iterative in nature. They are characterized by their need for only a few vectors of working storage and by their theoretical convergence within at most $n$ iterations (if exact arithmetic could be performed). In practice such methods may require far fewer or far more than $n$ iterations to reach an acceptable approximation to $x$. The methods are most useful when $A$ is well conditioned and has many nearly equal singular values. These properties occur naturally in many applications. In other cases it is often possible to divide the solution procedure into a direct and an iterative part, such that the iterative part has a better conditioned matrix for which CG-like methods will converge more quickly. Some such transformation methods are considered in [21].
Algorithm LSQR is based on the bidiagonalization procedure of Golub and Kahan [9]. It generates a sequence of approximations $\{x_k\}$ such that the residual norm $\|r_k\|_2$ decreases monotonically, where $r_k = b - Ax_k$. Analytically, the sequence $\{x_k\}$ is identical to the sequence generated by the standard CG algorithm and by several other published algorithms. However, LSQR is shown by example to be numerically more reliable in various circumstances than the other methods considered.

The FORTRAN implementation of LSQR [22] is designed for practical application. It incorporates reliable stopping criteria and provides the user with computed estimates of the following quantities: $x$, $r = b - Ax$, $A^T r$, $\|r\|_2$, $\|A\|_F$, standard errors for $x$, and the condition number of $A$.
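SciPy's implementation, for example, returns estimates of these same quantities; the tuple layout and names below are SciPy's, not the paper's.

```python
# Sketch: the diagnostic outputs of SciPy's lsqr, which correspond to the
# estimates listed above. Names and tuple order are SciPy's, not the paper's.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

(x, istop, itn, r1norm, r2norm, anorm, acond,
 arnorm, xnorm, var) = lsqr(A, b, calc_var=True)

print(istop, itn)      # which stopping criterion triggered, and iteration count
print(r1norm, arnorm)  # estimates of ||r||_2 and ||A^T r||_2
print(anorm, acond)    # estimates of ||A||_F and cond(A)
print(var[:3])         # variance estimates, related to standard errors for x
```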
1.1 Notation

Matrices are denoted by $A, B, \ldots$, vectors by $v, w, \ldots$, and scalars by $\alpha, \beta, \ldots$. Two exceptions are $c$ and $s$, which denote the significant components of an elementary orthogonal matrix, such that $c^2 + s^2 = 1$. For a vector $v$, $\|v\|$ always denotes the Euclidean norm $\|v\|_2 = (v^T v)^{1/2}$. For a matrix $A$, $\|A\|$ will usually mean the Frobenius norm, $\|A\|_F = (\sum a_{ij}^2)^{1/2}$, and the condition number for an unsymmetric matrix $A$ is defined by $\mathrm{cond}(A) = \|A\| \, \|A^+\|$, where $A^+$ denotes the pseudoinverse of $A$. The relative precision of floating-point arithmetic is $\epsilon$, the smallest machine-representable number such that $1 + \epsilon > 1$.
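Stated in NumPy terms (an illustrative aside, with our own small test matrix), these conventions read as follows.

```python
# Sketch of the notation in NumPy terms: Euclidean norm, Frobenius norm,
# cond(A) = ||A|| ||A+||, and the relative precision epsilon.
import numpy as np

v = np.array([3.0, 4.0])
print(np.linalg.norm(v))                    # ||v||_2 = (v^T v)^{1/2} = 5.0

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
fro = np.linalg.norm(A, "fro")              # (sum of squared entries)^{1/2}
cond = fro * np.linalg.norm(np.linalg.pinv(A), "fro")   # ||A||_F ||A+||_F
print(fro, cond)

print(np.finfo(float).eps)                  # smallest eps with 1 + eps > 1
```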
2. MOTIVATION VIA THE LANCZOS PROCESS
In this section we review the symmetric Lanczos process [13] and its use in solving symmetric linear equations $Bx = b$. Algorithm LSQR is then derived by applying the Lanczos process to a particular symmetric system. Although a more direct development is given in Section 4, the present derivation may remain useful for a future error analysis of LSQR, since many of the rounding error properties of the Lanczos process are already known [18].

Given a symmetric matrix $B$ and a starting vector $b$, the Lanczos process is a method for generating a sequence of vectors $\{v_i\}$ and scalars $\{\alpha_i\}$, $\{\beta_i\}$ such that $B$ is reduced to tridiagonal form. A reliable computational form of the method is the following.

The Lanczos process (reduction to tridiagonal form):

$$\beta_1 v_1 = b,$$
$$w_i = B v_i - \beta_i v_{i-1},$$
$$\alpha_i = v_i^T w_i, \qquad\qquad i = 1, 2, \ldots, \tag{2.1}$$
$$\beta_{i+1} v_{i+1} = w_i - \alpha_i v_i,$$

where $v_0 \equiv 0$ and each $\beta_i \ge 0$ is chosen so that $\|v_i\| = 1$ $(i > 0)$. The situation after $k$ steps is summarized by

$$B V_k = V_k T_k + \beta_{k+1} v_{k+1} e_k^T, \tag{2.2}$$

where $T_k \equiv \mathrm{tridiag}(\beta_i, \alpha_i, \beta_{i+1})$ and $V_k \equiv [v_1, v_2, \ldots, v_k]$. If there were no rounding error we would have $V_k^T V_k = I$, and the process would therefore terminate with $\beta_{k+1} = 0$ for some $k \le n$. Some other stopping criterion is needed in practice, since $\beta_{k+1}$ is unlikely to be negligible for any $k$. In any event, eq. (2.2) holds to within machine precision.
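As an illustrative transcription (the function name, test matrix, and dimensions are ours), the recurrence (2.1) and the identity (2.2) can be checked numerically.

```python
# Minimal sketch of the Lanczos recurrence (2.1) and a check of (2.2).
# All names and the random symmetric test matrix are ours, for illustration.
import numpy as np

def lanczos(B, b, k):
    n = b.size
    V = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)
    beta[0] = np.linalg.norm(b)
    V[:, 0] = b / beta[0]                       # beta_1 v_1 = b
    v_prev = np.zeros(n)
    for i in range(k):
        w = B @ V[:, i] - beta[i] * v_prev      # w_i = B v_i - beta_i v_{i-1}
        alpha[i] = V[:, i] @ w                  # alpha_i = v_i^T w_i
        w -= alpha[i] * V[:, i]
        beta[i + 1] = np.linalg.norm(w)         # beta_{i+1} v_{i+1} = w_i - alpha_i v_i
        v_prev = V[:, i]
        V[:, i + 1] = w / beta[i + 1]
    return V, alpha, beta

rng = np.random.default_rng(2)
M = rng.standard_normal((30, 30))
B = M + M.T                                     # symmetric test matrix
b = rng.standard_normal(30)
k = 8
V, alpha, beta = lanczos(B, b, k)

Tk = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
ek = np.zeros(k); ek[-1] = 1.0
# Check (2.2): B V_k = V_k T_k + beta_{k+1} v_{k+1} e_k^T
lhs = B @ V[:, :k]
rhs = V[:, :k] @ Tk + beta[k] * np.outer(V[:, k], ek)
print(np.linalg.norm(lhs - rhs))                # ~ machine precision
```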
Now suppose we wish to solve the symmetric system $Bx = b$. Multiplying (2.2) by an arbitrary $k$-vector $y_k$, whose last element is $\eta_k$, gives

$$B V_k y_k = V_k T_k y_k + \beta_{k+1} v_{k+1} \eta_k.$$

Since $V_k(\beta_1 e_1) = b$ by definition, it follows that if $y_k$ and $x_k$ are defined by the equations

$$T_k y_k = \beta_1 e_1, \qquad x_k = V_k y_k, \tag{2.3}$$

then we shall have $B x_k = b + \eta_k \beta_{k+1} v_{k+1}$ to working accuracy. Hence $x_k$ may be taken as the exact solution to a perturbed system and will solve the original system whenever $\eta_k \beta_{k+1}$ is negligibly small.
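Continuing in the same illustrative vein, the self-contained sketch below forms $x_k$ from (2.3) explicitly and verifies the perturbed system just stated. This is purely a numerical check; practical CG-like methods eliminate $y_k$ rather than storing $V_k$.

```python
# Sketch of (2.3): solve T_k y_k = beta_1 e_1, set x_k = V_k y_k, and verify
# B x_k = b + eta_k beta_{k+1} v_{k+1}. Illustration only; names are ours.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((30, 30))
B = M + M.T
b = rng.standard_normal(30)
k = 8

# Compact Lanczos loop, as in the previous sketch.
V = np.zeros((30, k + 1)); alpha = np.zeros(k); beta = np.zeros(k + 1)
beta[0] = np.linalg.norm(b); V[:, 0] = b / beta[0]; v_prev = np.zeros(30)
for i in range(k):
    w = B @ V[:, i] - beta[i] * v_prev
    alpha[i] = V[:, i] @ w
    w -= alpha[i] * V[:, i]
    beta[i + 1] = np.linalg.norm(w)
    v_prev = V[:, i]; V[:, i + 1] = w / beta[i + 1]

Tk = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
e1 = np.zeros(k); e1[0] = 1.0
yk = np.linalg.solve(Tk, beta[0] * e1)      # T_k y_k = beta_1 e_1
xk = V[:, :k] @ yk                          # x_k = V_k y_k
eta = yk[-1]                                # last element of y_k
# B x_k - b should equal eta_k * beta_{k+1} * v_{k+1} to working accuracy
print(np.linalg.norm(B @ xk - b - eta * beta[k] * V[:, k]))
```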
The above arguments are not complete, but they provide at least some motivation for defining the sequence of vectors $\{x_k\}$ according to (2.3). It is now possible to derive several iterative algorithms for solving $Bx = b$, each characterized by the manner in which $y_k$ is eliminated from (2.3) (since it is usually not practical to compute each $y_k$ explicitly). In particular, the method of conjugate gradients is known to be equivalent to using the Cholesky factorization $T_k = L_k D_k L_k^T$ and is reliable when $B$ (and hence $T_k$) is positive definite, while algorithm SYMMLQ employs the orthogonal factorization $T_k = \bar{L}_k Q_k$ to retain stability for arbitrary symmetric $B$. (See [20] for further details of these methods.)
2.1 The Least-Squares System
We now turn to a particular system that arises from the "damped least-squares" problem

$$\min_x \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2, \tag{2.4}$$

where $A$ and $b$ are given data and $\lambda$ is an arbitrary real scalar. The solution of (2.4) satisfies the symmetric (but indefinite) system

$$\begin{bmatrix} I & A \\ A^T & -\lambda^2 I \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \tag{2.5}$$

where $r$ is the residual vector $b - Ax$. When the Lanczos process is applied to this system, a certain structure appears in relations (2.1)-(2.3). In particular, (2.1) reduces to the procedure defined as Bidiag 1 in Section 3, while (2.3) can be permuted after $2k + 1$ iterations to the form

$$\begin{bmatrix} I & B_k \\ B_k^T & -\lambda^2 I \end{bmatrix} \begin{bmatrix} t_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} r_k \\ x_k \end{bmatrix} = \begin{bmatrix} U_{k+1} & 0 \\ 0 & V_k \end{bmatrix} \begin{bmatrix} t_{k+1} \\ y_k \end{bmatrix}, \tag{2.6}$$

where $B_k$ is $(k + 1) \times k$ and lower bidiagonal. We see that $y_k$ is the solution of another damped least-squares problem,

$$\min_{y_k} \left\| \begin{bmatrix} B_k \\ \lambda I \end{bmatrix} y_k - \begin{bmatrix} \beta_1 e_1 \\ 0 \end{bmatrix} \right\|_2, \tag{2.7}$$

which can be solved reliably using orthogonal transformations. This observation forms the basis of algorithm LSQR.
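A quick numerical check (our own illustration, with a random test problem) confirms that the solution of (2.4) does satisfy the augmented system (2.5).

```python
# Sketch: the solution of the damped least-squares problem (2.4) satisfies
# the augmented system (2.5). A numerical consistency check, names ours.
import numpy as np

rng = np.random.default_rng(4)
m, n, lam = 20, 5, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Solve (2.4) by stacking [A; lam*I] over [b; 0].
A_aug = np.vstack([A, lam * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
r = b - A @ x

# Check (2.5): [I A; A^T -lam^2 I] [r; x] = [b; 0]
top = r + A @ x - b
bottom = A.T @ r - lam**2 * x
print(np.linalg.norm(top), np.linalg.norm(bottom))   # both ~ 0
```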
2.2 A Different Starting Vector
For completeness we note that a second least-squares algorithm can be derived in an analogous way. Defining $s = -Ax$, we can write (2.5) as

$$\begin{bmatrix} I & A \\ A^T & -\lambda^2 I \end{bmatrix} \begin{bmatrix} s \\ x \end{bmatrix} = \begin{bmatrix} 0 \\ -A^T b \end{bmatrix} \tag{2.8}$$

and apply the Lanczos process again, using the same matrix as before but with the new starting vector shown. This time, (2.1) reduces to Bidiag 2 as defined in Section 3, while (2.3) can be permuted to the form

$$\begin{bmatrix} -\lambda^2 I & R_k^T \\ R_k & I \end{bmatrix} \begin{bmatrix} y_k \\ t_k \end{bmatrix} = \begin{bmatrix} -\theta_1 e_1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} x_k \\ s_k \end{bmatrix} = \begin{bmatrix} V_k & 0 \\ 0 & P_k \end{bmatrix} \begin{bmatrix} y_k \\ t_k \end{bmatrix}, \tag{2.9}$$

after $2k$ iterations, where $R_k$ is $k \times k$ and upper bidiagonal. (The odd-numbered systems are not useful because $T_{2k-1}$ is singular when $\lambda = 0$.) We see that $y_k$ satisfies the system

$$(R_k^T R_k + \lambda^2 I) y_k = \theta_1 e_1, \tag{2.10}$$

which could easily be solved. However, (2.10) proves to be one form of the normal equations associated with (2.7), and the algorithm it suggests for computing $x$

therefore has unsatisfactory numerical properties. We clarify this matter in
Section 7.4.
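A small generic experiment (our own, not the paper's Section 7.4 analysis) illustrates the usual hazard of a normal-equations route like (2.10): forming the cross-product matrix squares the condition number.

```python
# Sketch: least squares via an orthogonal factorization versus the normal
# equations on an ill-conditioned matrix. Test construction is ours.
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)                  # singular values from 1 down to 1e-6
A = (U * s) @ Q.T                          # cond(A) ~ 1e6, cond(A^T A) ~ 1e12
x_true = rng.standard_normal(n)
b = A @ x_true

x_qr = np.linalg.lstsq(A, b, rcond=None)[0]   # orthogonal transformations
x_ne = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations

print(np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true))  # small
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))  # much larger
```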
2.3 The Role of λ

The quantities generated by the Lanczos process from (2.5) and (2.8) are $B_k$, $U_{k+1}$, $V_k$ and $R_k$, $P_k$, $V_k$, respectively. These are all independent of $\lambda$, which means that they are the same as those generated when $\lambda = 0$. We shall therefore assume from now on that $\lambda = 0$. A given $\lambda$ can be accommodated when solving (2.7), as shown in [22]. Methods for actually choosing $\lambda$ are beyond the scope of this paper.
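For illustration, later implementations expose $\lambda$ directly; SciPy's lsqr, for instance, takes it as the `damp` argument. A usage sketch under that assumption:

```python
# Sketch: a given damping parameter lambda can be passed straight to an
# LSQR implementation; SciPy's lsqr calls it `damp`. Test problem is ours.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(6)
A = rng.standard_normal((40, 8))
b = rng.standard_normal(40)
lam = 0.1

x_damped = lsqr(A, b, damp=lam)[0]

# Reference solution of (2.4) by stacking [A; lam*I] over [b; 0].
x_ref = np.linalg.lstsq(np.vstack([A, lam * np.eye(8)]),
                        np.concatenate([b, np.zeros(8)]), rcond=None)[0]
print(np.linalg.norm(x_damped - x_ref))   # small (limited by lsqr's tolerances)
```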
3. THE BIDIAGONALIZATION PROCEDURES
The preceding use of the Lanczos process results in two forms of a bidiagonalization procedure due to Golub and Kahan [9]. We state these forms as procedures Bidiag 1 and 2, and then give some unexpected relationships between them.
Bidiag 1 (starting vector $b$; reduction to lower bidiagonal form):

$$\beta_1 u_1 = b, \qquad \alpha_1 v_1 = A^T u_1,$$
$$\beta_{i+1} u_{i+1} = A v_i - \alpha_i u_i, \qquad\qquad i = 1, 2, \ldots, \tag{3.1}$$
$$\alpha_{i+1} v_{i+1} = A^T u_{i+1} - \beta_{i+1} v_i.$$

The scalars $\alpha_i \ge 0$ and $\beta_i \ge 0$ are chosen so that $\|u_i\| = \|v_i\| = 1$. With the definitions

$$U_k \equiv [u_1, u_2, \ldots, u_k], \qquad V_k \equiv [v_1, v_2, \ldots, v_k], \qquad B_k \equiv \begin{bmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix}$$

(where $B_k$ is the rectangular matrix introduced in Section 2), the recurrence relations (3.1) may be rewritten as

$$U_{k+1}(\beta_1 e_1) = b, \tag{3.2}$$
$$A V_k = U_{k+1} B_k, \tag{3.3}$$
$$A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T. \tag{3.4}$$

If exact arithmetic were used, then we would also have $U_{k+1}^T U_{k+1} = I$ and $V_k^T V_k = I$, but, in any event, the previous equations hold to within machine precision.
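An illustrative transcription of Bidiag 1 (function and variable names are ours, and breakdown with a zero $\alpha_i$ or $\beta_i$ is not handled) that verifies (3.3) numerically:

```python
# Sketch of Bidiag 1 (3.1) and a check of the relation (3.3): A V_k = U_{k+1} B_k.
# Names are ours; breakdown (zero alpha or beta) is not handled.
import numpy as np

def bidiag1(A, b, k):
    m, n = A.shape
    U = np.zeros((m, k + 1)); V = np.zeros((n, k))
    alpha = np.zeros(k); beta = np.zeros(k + 1)
    beta[0] = np.linalg.norm(b); U[:, 0] = b / beta[0]      # beta_1 u_1 = b
    for i in range(k):
        v = A.T @ U[:, i] - (beta[i] * V[:, i - 1] if i > 0 else 0.0)
        alpha[i] = np.linalg.norm(v); V[:, i] = v / alpha[i]
        u = A @ V[:, i] - alpha[i] * U[:, i]                # beta_{i+1} u_{i+1}
        beta[i + 1] = np.linalg.norm(u); U[:, i + 1] = u / beta[i + 1]
    return U, V, alpha, beta

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 12))
b = rng.standard_normal(30)
k = 6
U, V, alpha, beta = bidiag1(A, b, k)

# Assemble the (k+1) x k lower-bidiagonal B_k of Section 2.
Bk = np.zeros((k + 1, k))
for j in range(k):
    Bk[j, j] = alpha[j]
    Bk[j + 1, j] = beta[j + 1]
print(np.linalg.norm(A @ V - U @ Bk))    # (3.3) holds to machine precision
```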
Bidiag 2 (starting vector $A^T b$; reduction to upper bidiagonal form):

$$\theta_1 v_1 = A^T b, \qquad \rho_1 p_1 = A v_1,$$
$$\theta_{i+1} v_{i+1} = A^T p_i - \rho_i v_i, \qquad\qquad i = 1, 2, \ldots, \tag{3.5}$$
$$\rho_{i+1} p_{i+1} = A v_{i+1} - \theta_{i+1} p_i.$$

References

Golub, G., and Kahan, W. Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. Ser. B 2 (1965), 205-224.

Hestenes, M. R., and Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards 49 (1952), 409-436.

Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards 45 (1950), 255-282.

Wilkinson, J. H. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, 1965.