
The Scaling and Squaring Method for the Matrix Exponential Revisited

Nicholas J. Higham
01 Nov 2009 - Vol. 51, Iss. 4, pp. 747-764

The Scaling and Squaring Method for the Matrix
Exponential Revisited
Higham, Nicholas J.
2009
MIMS EPrint: 2009.86
Manchester Institute for Mathematical Sciences
School of Mathematics
The University of Manchester
Reports available from: http://eprints.maths.manchester.ac.uk/
And by contacting: The MIMS Secretary
School of Mathematics
The University of Manchester
Manchester, M13 9PL, UK
ISSN 1749-9097

The calculation of the matrix exponential $e^A$ may be one of the best known matrix
problems in numerical computation. It achieved folk status in our community from the
paper by Moler and Van Loan, "Nineteen Dubious Ways to Compute the Exponential
of a Matrix," published in this journal in 1978 (and revisited in this journal in 2003).
The matrix exponential is utilized in a wide variety of numerical methods for solving
differential equations, as well as in many other areas.
It is somewhat amazing given the long history and extensive study of the matrix
exponential problem that one can improve upon the best existing methods in terms of
both accuracy and efficiency, but that is what the SIGEST selection in this issue does.
“The Scaling and Squaring Method for the Matrix Exponential Revisited” by N. Higham,
originally published in the SIAM Journal on Matrix Analysis and Applications in 2005, applies
a new backward error analysis to the commonly used scaling and squaring method, as
well as a new rounding error analysis of the Padé approximant of the scaled matrix.
The analysis shows, and the accompanying experimental results verify, that a Padé
approximant of a higher order than currently used actually results in a more accurate
and efficient algorithm, due to the need to perform fewer matrix multiplications and
fewer squarings.
SIGEST papers are expected to combine important research, broad applicability,
and excellent exposition. In addition to the first two qualities that are evident from
the comments above, it is no surprise that this paper exhibits the latter, since the
author literally wrote the book—that is, the popular SIAM book, Handbook of Writing
for the Mathematical Sciences. The paper is a true pleasure to read, and the author has
added an extensive preamble for the SIGEST version that makes the topic even more
accessible to new audiences as well as commenting upon subsequent work.
This paper should become an excellent choice in teaching numerical analysis as
well as essential reading for researchers in numerical linear algebra. Its combination
of practical algorithmic issues such as scaling, numerical analysis techniques such as
backward error analysis, and a carefully conducted and presented computational study
makes it a true exemplar for our field. One question remains, however, after all these
years: should we still assert that all the methods for calculating the matrix exponential
are dubious?
The Editors

SIAM REVIEW, Vol. 51, No. 4, pp. 747–764
© 2009 Society for Industrial and Applied Mathematics
The Scaling and Squaring
Method for the Matrix
Exponential Revisited
Nicholas J. Higham
Abstract. The scaling and squaring method is the most widely used method for computing the
matrix exponential, not least because it is the method implemented in the MATLAB function
expm. The method scales the matrix by a power of 2 to reduce the norm to order 1, computes a
Padé approximant to the matrix exponential, and then repeatedly squares to undo the effect of
the scaling. We give a new backward error analysis of the method (in exact arithmetic) that
employs sharp bounds for the truncation errors and leads to an implementation of essentially
optimal efficiency. We also give a new rounding error analysis that shows the computed Padé
approximant of the scaled matrix to be highly accurate. For IEEE double precision arithmetic
the best choice of degree of Padé approximant turns out to be 13, rather than the 6 or 8 used
by previous authors. Our implementation of the scaling and squaring method always requires at
least two fewer matrix multiplications than the expm function in MATLAB 7.0 when the matrix
norm exceeds 1, which can amount to a 37% saving in the number of multiplications, and it is
typically more accurate, owing to the fewer required squarings. We also investigate a different
scaling and squaring algorithm proposed by Najfeld and Havel that employs a Padé approximation
to the function $x \coth(x)$. This method is found to be essentially a variation of the standard
one with weaker supporting error analysis.
Key words. matrix function, matrix exponential, Padé approximation, matrix polynomial evaluation,
scaling and squaring method, MATLAB, expm, backward error analysis, performance profile
AMS subject classification. 65F30
DOI. 10.1137/090768539
Preamble to SIGEST Article. I remember, as a graduate student in the early
1980s, queuing up in the university library to photocopy Moler and Van Loan’s “Nine-
teen Dubious Ways to Compute the Exponential of a Matrix” [28]. I was attracted by
the intriguing title, having already had my interest in the exponential piqued by Van
Loan’s earlier notes [35]. The paper more than lived up to its title, and it remains
for me one of the classics in numerical analysis. When I started to write Functions of
Matrices [21] in 2003 the chapter on the exponential was one of the first I attempted.
As I drafted a section on the scaling and squaring method, for which the basic algo-
rithm had remained unchanged through to the 2003 update of Moler and Van Loan’s
paper [29], I had some difficulty in justifying the choice of parameters used in existing
Published electronically November 6, 2009. This paper originally appeared in SIAM Journal on
Matrix Analysis and Applications, Volume 26, Number 4, 2005, pages 1179–1193. This work was
supported by Engineering and Physical Sciences Research Council grant GR/T08739 and by a Royal
Society-Wolfson Research Merit Award.
http://www.siam.org/journals/sirev/51-4/76853.html
School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK
(higham@ma.man.ac.uk, http://www.ma.man.ac.uk/~higham/).

implementations. It was through studying and reworking the analysis in [28] that I
developed the ideas that led to this SIMAX article.
But what is the matrix exponential and why is it so important? The exponential
of a matrix was first introduced by Laguerre [26] in 1867, who gave the now standard
power series definition: for $A \in \mathbb{C}^{n \times n}$,

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots. \tag{0.1}$$
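To see the definition in action, one can sum the series until the terms become negligible and compare with a library routine. The Python/NumPy sketch below is an illustration added here, not part of the original article; the tolerance, term cap, and test matrix are arbitrary choices, and scipy.linalg.expm serves only as a reference.

# Illustrative sketch: sum the power series (0.1) for a small matrix and
# compare with SciPy's expm. Tolerance and test matrix are arbitrary.
import numpy as np
from scipy.linalg import expm

def expm_series(A, tol=1e-16, max_terms=200):
    """Truncated power series I + A + A^2/2! + ... (not a robust algorithm)."""
    n = A.shape[0]
    term = np.eye(n)          # current term A^k / k!
    S = np.eye(n)             # partial sum
    for k in range(1, max_terms):
        term = term @ A / k
        S += term
        if np.linalg.norm(term, 1) <= tol * np.linalg.norm(S, 1):
            break
    return S

A = np.array([[0.0, 1.0], [-1.0, 0.0]])             # exp(A) is a rotation
print(np.linalg.norm(expm_series(A) - expm(A), 1))  # agreement near machine precision

Of course, truncating the series directly is not a reliable general-purpose algorithm; for matrices of large norm it can suffer badly from cancellation, which is one reason the scaling and squaring method discussed in this paper is preferred.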
One of the first references to emphasize the important role of the matrix exponential
in solving differential equations was the 1938 book Elementary Matrices and Some
Applications to Dynamics and Differential Equations [13] by Frazer, Duncan, and
Collar, who were in the Aerodynamics Department at the National Physical Labora-
tory in England. The use of $e^A$ in the practical solution of all kinds of differential
equations is of course now ubiquitous, but other applications have come along that
have no immediate connection with differential equations. I will mention just one,
which is very recent.
Consider a network representing interactions between pairs of entities in a system.
In recent years much work has focused on identifying computable measures that quan-
tify characteristics of the network. Many measures are available in the literature, and
they are typically expressed in terms of the network’s associated undirected graph G
with $n$ nodes. The adjacency matrix $A \in \mathbb{R}^{n \times n}$ of the graph has $(i,j)$ element equal
to 1 if nodes $i$ and $j$ are connected and 0 otherwise. Assume $a_{ii} \equiv 0$, so that there are
no loops in the graph. A walk of length $m$ between two nodes $i$ and $j$ is an ordered list
of nodes $i, k_1, k_2, \ldots, k_{m-1}, j$ such that successive nodes in the list are connected; the
nodes need not be distinct and any of them may be $i$ or $j$. When $i = j$ the walk starts
and ends at the same node and is called closed. The walk is a path if all the nodes
in the walk are distinct. Assume that the graph is connected, so that a path exists
between any two distinct nodes. It is a standard fact in graph theory that the $(i,j)$
element of $A^m$ is the number of different walks, if $i \neq j$, or closed walks, if $i = j$, of
length $m$ between nodes $i$ and $j$.
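This walk-counting fact is easy to check numerically. The following sketch is an added illustration (the path graph is an arbitrary choice) verifying it for walks of length 2.

# Illustrative check of the walk-counting fact: (A^m)_{ij} counts walks of
# length m between nodes i and j. Example graph chosen arbitrarily.
import numpy as np

# Adjacency matrix of the path graph 0 - 1 - 2 - 3 (undirected, no loops)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

A2 = A @ A
print(A2[0, 2])  # 1: the single walk 0 -> 1 -> 2 of length 2
print(A2[1, 1])  # 2: the closed walks 1 -> 0 -> 1 and 1 -> 2 -> 1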
A variety of measures have been built by combining
different walk lengths into a single number. Estrada and Rodríguez-Velázquez [11]
define the subgraph centrality of node $i$—a measure of its "well-connectedness"—by

$$SC_i = \left[ I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \right]_{ii} = (e^A)_{ii}.$$
By combining walks of all possible lengths connecting node i to itself, and applying a
weighting that decreases rapidly with the walk length, the subgraph centrality aims
to capture the participation of the node in question in all subgraphs in the network.
The subgraph centrality has become known as the Estrada index. Based on similar
reasoning, Estrada and Hatano [8] define the communicability between nodes i and
j—a measure of how easy it is for “information” to pass from node i to node j—by
$$C_{ij} = \left[ I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \right]_{ij} = (e^A)_{ij}.$$
Finally, the betweenness of node r is defined by Estrada, D. J. Higham, and Hatano
[10] by
$$\frac{1}{(n-1)^2 - (n-1)} \sum_{\substack{i,j \\ i \neq j,\, i \neq r,\, j \neq r}} \frac{\bigl(e^A - e^{A - E_r}\bigr)_{ij}}{(e^A)_{ij}},$$

Table 0.1  Some formulae for $e^A$.

Power series: $I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$
Limit: $\lim_{s \to \infty} (I + A/s)^s$
Scaling and squaring: $\bigl(e^{A/2^s}\bigr)^{2^s}$
Cauchy integral: $\frac{1}{2\pi i} \int_\Gamma e^z (zI - A)^{-1}\, dz$
Jordan form: $Z \,\mathrm{diag}(e^{J_k})\, Z^{-1}$
Interpolation: $\sum_{i=1}^n f[\lambda_1, \ldots, \lambda_i] \prod_{j=1}^{i-1} (A - \lambda_j I)$
Differential system: $Y'(t) = A Y(t)$, $Y(0) = I$
Schur form: $Q e^T Q^*$
Padé approximation: $p_{km}(A)\, q_{km}(A)^{-1}$
where $E_r$ is zero except in row and column $r$, where it agrees with $A$. The betweenness
measures the relative change in communicability when node r is removed from the
network. Experiments in the papers cited above show that these three measures can
provide useful information about practically occurring networks that is not revealed
by most other measures. In this description A is symmetric, but these concepts can
be extended to directed graphs, for which the adjacency matrix is unsymmetric. Of
course, the matrix exponential owes its appearance to the choice of weights in the
sums over walk lengths. Other weights could be chosen, resulting in different matrix
functions in the definitions; see Estrada and D. J. Higham [9].
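Once $e^A$ is available, the three measures above take only a few lines of code. The sketch below is an added illustration (not code from the papers cited): the toy adjacency matrix is arbitrary, the exponential comes from scipy.linalg.expm, and the function name betweenness is my own.

# Illustrative sketch: subgraph centrality, communicability, and betweenness
# (in the sense defined above) for a small undirected graph.
import numpy as np
from scipy.linalg import expm

# Toy symmetric adjacency matrix (arbitrary connected graph, no self-loops)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
expA = expm(A)

subgraph_centrality = np.diag(expA)        # SC_i = (e^A)_{ii}
communicability = expA                     # C_ij = (e^A)_{ij}

def betweenness(A, r):
    """Relative drop in communicability when node r is removed."""
    n = A.shape[0]
    Er = np.zeros_like(A)
    Er[r, :] = A[r, :]                     # E_r agrees with A in row r ...
    Er[:, r] = A[:, r]                     # ... and column r, zero elsewhere
    expA, expAr = expm(A), expm(A - Er)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and i != r and j != r:
                total += (expA[i, j] - expAr[i, j]) / expA[i, j]
    return total / ((n - 1) ** 2 - (n - 1))

print(subgraph_centrality)
print(betweenness(A, r=1))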
Table 0.1, taken from [21], summarizes a variety of formulae for $e^A$, all of which
have been tried in the literature as the basis of a numerical method. The scaling and
squaring method, which combines scaling and squaring with Padé approximation, has
proved to be the most popular and generally applicable method.
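As an added illustration of the basic mechanism—and emphatically not the carefully tuned algorithm of this paper—the sketch below scales by a power of 2, evaluates a fixed-degree diagonal Padé approximant from the standard closed-form coefficients, and squares repeatedly. The degree m = 7 and threshold theta = 0.5 are arbitrary choices, not the optimized degree 13 and θ values derived in the paper.

# Bare-bones scaling and squaring sketch (illustration only; the paper's
# algorithm chooses the Pade degree and the scaling far more carefully).
import math
import numpy as np
from scipy.linalg import expm   # used only as a reference for the check

def pade_coeffs(m):
    """Coefficients b_j of the degree-m diagonal Pade numerator for e^x:
    p_m(x) = sum_j b_j x^j, with q_m(x) = p_m(-x)."""
    return [math.factorial(2 * m - j) * math.factorial(m)
            / (math.factorial(2 * m) * math.factorial(m - j) * math.factorial(j))
            for j in range(m + 1)]

def expm_ss(A, m=7, theta=0.5):
    """Scale A so that ||A/2^s||_1 <= theta, evaluate the [m/m] Pade
    approximant of the scaled matrix, then square s times."""
    n = A.shape[0]
    norm = np.linalg.norm(A, 1)
    s = int(math.ceil(math.log2(norm / theta))) if norm > theta else 0
    As = A / 2 ** s
    b = pade_coeffs(m)
    P, Q, X = np.zeros_like(As), np.zeros_like(As), np.eye(n)
    for j in range(m + 1):           # explicit powers; the paper's even/odd
        P += b[j] * X                # splitting is cheaper
        Q += b[j] * (-1) ** j * X
        X = X @ As
    R = np.linalg.solve(Q, P)        # r_m(As) = q_m(As)^{-1} p_m(As)
    for _ in range(s):               # undo the scaling by repeated squaring
        R = R @ R
    return R

A = np.random.default_rng(0).standard_normal((5, 5))
err = np.linalg.norm(expm_ss(A) - expm(A), 1) / np.linalg.norm(expm(A), 1)
print(err)                           # small for this well-scaled example

The algorithm of the paper additionally evaluates the numerator and denominator with far fewer matrix multiplications via an even/odd splitting and chooses the degree and the amount of scaling from sharp backward error bounds.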
The idea employed in this paper of adaptively choosing the degree of the Padé
approximant and the amount of scaling has been taken up in subsequent work on
other matrix functions, including the cosine and sine [16], [21, sect. 12.4] and the
logarithm [21, sect. 11.5]. Related to the exponential are the ψ functions, defined
explicitly as $\psi_k(z) = \sum_{j=0}^{\infty} z^j/(j+k)!$, $k = 0, 1, 2, \ldots$, or via the recurrence

$$\psi_{k+1}(z) = \frac{\psi_k(z) - 1/k!}{z}, \qquad \psi_0(z) = e^z.$$
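A quick scalar sanity check (added here for illustration; the number of series terms and the test argument are arbitrary) confirms that the series definition and the recurrence agree:

# Illustrative check that the series definition of psi_k and the recurrence
# psi_{k+1}(z) = (psi_k(z) - 1/k!)/z agree for a scalar argument.
import math

def psi_series(k, z, terms=60):
    return sum(z ** j / math.factorial(j + k) for j in range(terms))

z = 0.7
for k in range(4):
    via_recurrence = (psi_series(k, z) - 1 / math.factorial(k)) / z
    print(k, abs(via_recurrence - psi_series(k + 1, z)))   # essentially zero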
They feature in exponential integrators—a broad class of numerical methods for solv-
ing an ordinary differential equation initial value problem in which the linear part of
the equation is treated exactly and the remaining nonlinear part is integrated numer-
ically. The algorithm here has been adapted for computation of the ψ functions by
Koikari [25] and Skaflestad and Wright [33].
Al-Mohy and Higham have extended this work in two recent papers. The first is
concerned with the Fréchet derivative $L$ at $A$ in the direction $E$, which for a general
function $f$ is defined by the condition that $f(A+E) - f(A) - L(A,E) = o(\|E\|)$ for
all $E$. For the exponential, the Fréchet derivative has the explicit representation

$$L(A,E) = \int_0^1 e^{A(1-s)} E\, e^{As}\, ds.$$

Al-Mohy and Higham [1] show how the algorithm in this paper can be "differentiated"
in order to obtain an algorithm that simultaneously computes $e^A$ and $L(A,E)$ at a
cost about three times that of computing $e^A$ alone.
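The integral representation can be checked numerically. The sketch below is an added illustration, not the algorithm of [1]: it approximates $L(A,E)$ with composite Simpson quadrature (an arbitrary choice of rule and step) and compares the result with a crude finite difference of expm.

# Illustrative check of the integral representation of the Frechet derivative
# L(A, E) = int_0^1 exp(A(1-s)) E exp(A s) ds, via composite Simpson's rule.
# Quadrature rule, step size, and test matrices are arbitrary choices.
import numpy as np
from scipy.linalg import expm

def frechet_expm_quad(A, E, nodes=101):
    s = np.linspace(0.0, 1.0, nodes)           # odd node count for Simpson
    vals = [expm(A * (1 - si)) @ E @ expm(A * si) for si in s]
    h = s[1] - s[0]
    w = np.ones(nodes)                          # Simpson weights 1,4,2,...,4,1
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return (h / 3.0) * sum(wi * vi for wi, vi in zip(w, vals))

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
E = rng.standard_normal((4, 4))

L = frechet_expm_quad(A, E)
h = 1e-6                                        # crude finite-difference check
fd = (expm(A + h * E) - expm(A)) / h
print(np.linalg.norm(L - fd, 1) / np.linalg.norm(L, 1))  # small; limited by h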

Citations

Computing the Action of the Matrix Exponential, with an Application to Exponential Integrators (journal article)
TL;DR: It is shown that the sums of the form $\sum_{k=0}^p \varphi_k(A)u_k$ that arise in exponential integrators can be expressed in terms of a single exponential of a matrix of dimension $n+p$ built by augmenting $A$ with additional rows and columns, and the algorithm of this paper can be employed.

Functions of Matrices (book)
TL;DR: The most common matrix function is the matrix inverse, which is not treated specifically in this chapter, but is covered in Sections 1.5 and 51.3, as discussed by the authors.

A New Scaling and Squaring Algorithm for the Matrix Exponential (journal article)
TL;DR: The numerical experiments show that the new algorithm generally provides accuracy at least as good as the existing algorithm of Higham at no higher cost, while for matrices that are triangular or cause overscaling it usually yields significant improvements in accuracy, cost, or both.

A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration (journal article)
TL;DR: The results presented here on real 3D locally affine registration suggest that the novel framework provides a general and efficient way of fusing local rigid or affine deformations into a global invertible transformation without introducing artifacts, independently of the way local deformations are first estimated.
References

Matrix Computations (book), Gene H. Golub

Matrix Iterative Analysis (journal article)

Digital Control of Dynamic Systems (book)
TL;DR: This well-respected, market-leading text discusses the use of digital computers in the real-time control of dynamic systems and thoroughly integrates MATLAB statements and problems to offer readers a complete design picture.

Benchmarking Optimization Software with Performance Profiles (journal article)
TL;DR: It is shown that performance profiles combine the best features of other tools for performance evaluation to create a single tool for benchmarking and comparing optimization software.

Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later (journal article)
TL;DR: Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed, indicating that some of the methods are preferable to others, but that none are completely satisfactory.
Frequently Asked Questions (18)
Q1. What are the contributions mentioned in the paper "The Scaling and Squaring Method for the Matrix Exponential Revisited"?

The authors also give a new rounding error analysis that shows the computed Padé approximant of the scaled matrix to be highly accurate. For IEEE double precision arithmetic the best choice of degree of Padé approximant turns out to be 13, rather than the 6 or 8 used by previous authors. The authors also investigate a different scaling and squaring algorithm proposed by Najfeld and Havel that employs a Padé approximation to the function $x \coth(x)$.

By combining walks of all possible lengths connecting node i to itself, and applying a weighting that decreases rapidly with the walk length, the subgraph centrality aims to capture the participation of the node in question in all subgraphs in the network. 

it is well known that rounding errors can significantly affect the scaling and squaring method, because the squaring phase can suffer from severe numerical cancellation. 

Of the available methods the most developed are Krylov methods, and these require the evaluation of $e^H$ for a much smaller Hessenberg matrix $H$ related to $A$, for which the scaling and squaring method is well suited.

Software implementing the algorithm described here is available in the function expm in MATLAB (Version 7.2, R2006a, onwards), the function MatrixExp in Mathematica (Version 5.1 onwards), and routine F01ECF in the NAG Library (from Mark 22). 

Since the curve for expm lies below all the other curves, expm is the least accurate method on this set of test matrices, as measured by the performance profile. 

To obtain $r_m$ the authors solve a multiple right-hand side linear system with $q_m(A)$ as coefficient matrix, so to be sure that this system is solved accurately the authors need to check that $q_m(A)$ is well conditioned.

A weakness of the scaling and squaring method that was first pointed out by Kenney and Laub [24] is that a choice of scaling based on ‖A‖ sometimes produces a much stronger scaling than is necessary in order to achieve the desired accuracy—with potentially detrimental consequences for numerical stability. 

which can be evaluated with $m + 1$ matrix multiplications by forming $A^2, A^4, \ldots, A^{2m}$. Then $q_{2m}(A) = U - V$ is available at no extra cost.

By using standard error analysis techniques it is possible to derive a forward error bound for the scaling and squaring method, as has been done by Ward [38]. 

One of the first references to emphasize the important role of the matrix exponential in solving differential equations was the 1938 book Elementary Matrices and Some Applications to Dynamics and Differential Equations [13] by Frazer, Duncan, and Collar, who were in the Aerodynamics Department at the National Physical Laboratory in England. 

The effect of rounding errors on the final squaring phase remains an open question, but in their experiments the overall algorithm has performed in a numerically stable way throughout. 

The exponential of a matrix was first introduced by Laguerre [26] in 1867, who gave the now standard power series definition: for $A \in \mathbb{C}^{n \times n}$, $e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$ (0.1).

For α = 1, the Exp(13) curve is the highest: it intersects the y-axis at p = 0.52, which means that this method has the smallest error in 52% of the examples—more often than any other method. 

it is designed to provide an explicit and easily computable error bound, and the resulting bound is far from being sharp. 

Algorithm 2.3 uses one to three fewer squarings than the algorithms with which the authors have compared it, and hence it has a potential advantage in accuracy. 

Theorem 2.1 is a backward error result: it interprets the truncation errors in the Padé approximant as equivalent to a perturbation in the original matrix $A$. (The result holds, in fact, for any rational approximation $r_m$, as the authors have not yet used specific properties of a Padé approximant.)

The authors note that in padm, pm and qm are evaluated in increasing order of norms of the terms, as in Algorithm 2.3, whereas expm uses the reverse ordering.