
The Scaling and Squaring Method for the Matrix Exponential Revisited

Nicholas J. Higham
01 Nov 2009 - Vol. 51, Iss. 4, pp. 747-764

The Scaling and Squaring Method for the Matrix
Exponential Revisited
Higham, Nicholas J.
2009
MIMS EPrint: 2009.86
Manchester Institute for Mathematical Sciences
School of Mathematics
The University of Manchester
Reports available from: http://eprints.maths.manchester.ac.uk/
And by contacting: The MIMS Secretary
School of Mathematics
The University of Manchester
Manchester, M13 9PL, UK
ISSN 1749-9097

The calculation of the matrix exponential $e^A$ may be one of the best known matrix
problems in numerical computation. It achieved folk status in our community from the
paper by Moler and Van Loan, "Nineteen Dubious Ways to Compute the Exponential
of a Matrix," published in this journal in 1978 (and revisited in this journal in 2003).
The matrix exponential is utilized in a wide variety of numerical methods for solving
differential equations, as well as in many other areas.
It is somewhat amazing given the long history and extensive study of the matrix
exponential problem that one can improve upon the best existing methods in terms of
both accuracy and efficiency, but that is what the SIGEST selection in this issue does.
“The Scaling and Squaring Method for the Matrix Exponential Revisited” by N. Higham,
originally published in the SIAM Journal on Matrix Analysis and Applications in 2005, applies
a new backward error analysis to the commonly used scaling and squaring method, as
well as a new rounding error analysis of the Padé approximant of the scaled matrix.
The analysis shows, and the accompanying experimental results verify, that a Padé
approximant of a higher order than currently used actually results in a more accurate
and efficient algorithm, due to the need to perform fewer matrix multiplications and
fewer squarings.
SIGEST papers are expected to combine important research, broad applicability,
and excellent exposition. In addition to the first two qualities that are evident from
the comments above, it is no surprise that this paper exhibits the latter, since the
author literally wrote the book—that is, the popular SIAM book, Handbook of Writing
for the Mathematical Sciences. The paper is a true pleasure to read, and the author has
added an extensive preamble for the SIGEST version that makes the topic even more
accessible to new audiences as well as commenting upon subsequent work.
This paper should become an excellent choice in teaching numerical analysis as
well as essential reading for researchers in numerical linear algebra. Its combination
of practical algorithmic issues such as scaling, numerical analysis techniques such as
backward error analysis, and a carefully conducted and presented computational study
makes it a true exemplar for our field. One question remains, however, after all these
years: should we still assert that all the methods for calculating the matrix exponential
are dubious?
The Editors

SIAM REVIEW, Vol. 51, No. 4, pp. 747–764
© 2009 Society for Industrial and Applied Mathematics
The Scaling and Squaring
Method for the Matrix
Exponential Revisited
Nicholas J. Higham
Abstract. The scaling and squaring method is the most widely used method for computing the
matrix exponential, not least because it is the method implemented in the MATLAB function
expm. The method scales the matrix by a power of 2 to reduce the norm to order 1, computes a
Padé approximant to the matrix exponential, and then repeatedly squares to undo the effect of
the scaling. We give a new backward error analysis of the method (in exact arithmetic) that
employs sharp bounds for the truncation errors and leads to an implementation of essentially
optimal efficiency. We also give a new rounding error analysis that shows the computed Padé
approximant of the scaled matrix to be highly accurate. For IEEE double precision arithmetic
the best choice of degree of Padé approximant turns out to be 13, rather than the 6 or 8 used
by previous authors. Our implementation of the scaling and squaring method always requires at
least two fewer matrix multiplications than the expm function in MATLAB 7.0 when the matrix
norm exceeds 1, which can amount to a 37% saving in the number of multiplications, and it is
typically more accurate, owing to the fewer required squarings. We also investigate a different
scaling and squaring algorithm proposed by Najfeld and Havel that employs a Padé approximation
to the function $x \coth(x)$. This method is found to be essentially a variation of the standard
one with weaker supporting error analysis.
Key words. matrix function, matrix exponential, Padé approximation, matrix polynomial evaluation,
scaling and squaring method, MATLAB, expm, backward error analysis, performance profile
AMS subject classification. 65F30
DOI. 10.1137/090768539
Preamble to SIGEST Article. I remember, as a graduate student in the early
1980s, queuing up in the university library to photocopy Moler and Van Loan’s “Nine-
teen Dubious Ways to Compute the Exponential of a Matrix” [28]. I was attracted by
the intriguing title, having already had my interest in the exponential piqued by Van
Loan’s earlier notes [35]. The paper more than lived up to its title, and it remains
for me one of the classics in numerical analysis. When I started to write Functions of
Matrices [21] in 2003 the chapter on the exponential was one of the first I attempted.
As I drafted a section on the scaling and squaring method, for which the basic algo-
rithm had remained unchanged through to the 2003 update of Moler and Van Loan’s
paper [29], I had some difficulty in justifying the choice of parameters used in existing
Published electronically November 6, 2009. This paper originally appeared in SIAM Journal on
Matrix Analysis and Applications, Volume 26, Number 4, 2005, pages 1179–1193. This work was
supported by Engineering and Physical Sciences Research Council grant GR/T08739 and by a Royal
Society-Wolfson Research Merit Award.
http://www.siam.org/journals/sirev/51-4/76853.html
School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK
(higham@ma.man.ac.uk, http://www.ma.man.ac.uk/~higham/).

implementations. It was through studying and reworking the analysis in [28] that I
developed the ideas that led to this SIMAX article.
But what is the matrix exponential and why is it so important? The exponential
of a matrix was first introduced by Laguerre [26] in 1867, who gave the now standard
power series definition: for $A \in \mathbb{C}^{n \times n}$,

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots. \tag{0.1}$$
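To see the definition in action, one can sum the series until the terms become negligible and compare with a library routine. The Python/NumPy sketch below is an illustration added here, not part of the original article; the tolerance, term cap, and test matrix are arbitrary choices, and scipy.linalg.expm serves only as a reference.

# Illustrative sketch: sum the power series (0.1) for a small matrix and
# compare with SciPy's expm. Tolerance and test matrix are arbitrary.
import numpy as np
from scipy.linalg import expm

def expm_series(A, tol=1e-16, max_terms=200):
    """Truncated power series I + A + A^2/2! + ... (not a robust algorithm)."""
    n = A.shape[0]
    term = np.eye(n)          # current term A^k / k!
    S = np.eye(n)             # partial sum
    for k in range(1, max_terms):
        term = term @ A / k
        S += term
        if np.linalg.norm(term, 1) <= tol * np.linalg.norm(S, 1):
            break
    return S

A = np.array([[0.0, 1.0], [-1.0, 0.0]])             # exp(A) is a rotation
print(np.linalg.norm(expm_series(A) - expm(A), 1))  # agreement near machine precision

Of course, truncating the series directly is not a reliable general-purpose algorithm; for matrices of large norm it can suffer badly from cancellation, which is one reason the scaling and squaring method discussed in this paper is preferred.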
One of the first references to emphasize the important role of the matrix exponential
in solving differential equations was the 1938 book Elementary Matrices and Some
Applications to Dynamics and Differential Equations [13] by Frazer, Duncan, and
Collar, who were in the Aerodynamics Department at the National Physical Labora-
tory in England. The use of $e^A$ in the practical solution of all kinds of differential
equations is of course now ubiquitous, but other applications have come along that
have no immediate connection with differential equations. I will mention just one,
which is very recent.
Consider a network representing interactions between pairs of entities in a system.
In recent years much work has focused on identifying computable measures that quan-
tify characteristics of the network. Many measures are available in the literature, and
they are typically expressed in terms of the network’s associated undirected graph G
with $n$ nodes. The adjacency matrix $A \in \mathbb{R}^{n \times n}$ of the graph has $(i,j)$ element equal
to 1 if nodes $i$ and $j$ are connected and 0 otherwise. Assume $a_{ii} \equiv 0$, so that there are
no loops in the graph. A walk of length $m$ between two nodes $i$ and $j$ is an ordered list
of nodes $i, k_1, k_2, \ldots, k_{m-1}, j$ such that successive nodes in the list are connected; the
nodes need not be distinct and any of them may be $i$ or $j$. When $i = j$ the walk starts
and ends at the same node and is called closed. The walk is a path if all the nodes
in the walk are distinct. Assume that the graph is connected, so that a path exists
between any two distinct nodes. It is a standard fact in graph theory that the $(i,j)$
element of $A^m$ is the number of different walks, if $i \neq j$, or closed walks, if $i = j$, of
length $m$ between nodes $i$ and $j$.
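This walk-counting fact is easy to check numerically. The following sketch is an added illustration (the path graph is an arbitrary choice) verifying it for walks of length 2.

# Illustrative check of the walk-counting fact: (A^m)_{ij} counts walks of
# length m between nodes i and j. Example graph chosen arbitrarily.
import numpy as np

# Adjacency matrix of the path graph 0 - 1 - 2 - 3 (undirected, no loops)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

A2 = A @ A
print(A2[0, 2])  # 1: the single walk 0 -> 1 -> 2 of length 2
print(A2[1, 1])  # 2: the closed walks 1 -> 0 -> 1 and 1 -> 2 -> 1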
A variety of measures have been built by combining
different walk lengths into a single number. Estrada and Rodríguez-Velázquez [11]
define the subgraph centrality of node $i$—a measure of its "well-connectedness"—by

$$SC_i = \left[ I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \right]_{ii} = (e^A)_{ii}.$$
By combining walks of all possible lengths connecting node i to itself, and applying a
weighting that decreases rapidly with the walk length, the subgraph centrality aims
to capture the participation of the node in question in all subgraphs in the network.
The subgraph centrality has become known as the Estrada index. Based on similar
reasoning, Estrada and Hatano [8] define the communicability between nodes i and
j—a measure of how easy it is for “information” to pass from node i to node j—by
$$C_{ij} = \left[ I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \right]_{ij} = (e^A)_{ij}.$$
Finally, the betweenness of node r is defined by Estrada, D. J. Higham, and Hatano
[10] by
$$\frac{1}{(n-1)^2 - (n-1)} \sum_{\substack{i,j \\ i \neq j,\, i \neq r,\, j \neq r}} \frac{\bigl(e^A - e^{A - E_r}\bigr)_{ij}}{(e^A)_{ij}},$$

Table 0.1  Some formulae for $e^A$.

Power series: $I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$
Limit: $\lim_{s \to \infty} (I + A/s)^s$
Scaling and squaring: $\bigl(e^{A/2^s}\bigr)^{2^s}$
Cauchy integral: $\frac{1}{2\pi i} \int_\Gamma e^z (zI - A)^{-1}\, dz$
Jordan form: $Z \,\mathrm{diag}(e^{J_k})\, Z^{-1}$
Interpolation: $\sum_{i=1}^n f[\lambda_1, \ldots, \lambda_i] \prod_{j=1}^{i-1} (A - \lambda_j I)$
Differential system: $Y'(t) = A Y(t)$, $Y(0) = I$
Schur form: $Q e^T Q^*$
Padé approximation: $p_{km}(A)\, q_{km}(A)^{-1}$
where $E_r$ is zero except in row and column $r$, where it agrees with $A$. The betweenness
measures the relative change in communicability when node r is removed from the
network. Experiments in the papers cited above show that these three measures can
provide useful information about practically occurring networks that is not revealed
by most other measures. In this description A is symmetric, but these concepts can
be extended to directed graphs, for which the adjacency matrix is unsymmetric. Of
course, the matrix exponential owes its appearance to the choice of weights in the
sums over walk lengths. Other weights could be chosen, resulting in different matrix
functions in the definitions; see Estrada and D. J. Higham [9].
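Once $e^A$ is available, the three measures above take only a few lines of code. The sketch below is an added illustration (not code from the papers cited): the toy adjacency matrix is arbitrary, the exponential comes from scipy.linalg.expm, and the function name betweenness is my own.

# Illustrative sketch: subgraph centrality, communicability, and betweenness
# (in the sense defined above) for a small undirected graph.
import numpy as np
from scipy.linalg import expm

# Toy symmetric adjacency matrix (arbitrary connected graph, no self-loops)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
expA = expm(A)

subgraph_centrality = np.diag(expA)        # SC_i = (e^A)_{ii}
communicability = expA                     # C_ij = (e^A)_{ij}

def betweenness(A, r):
    """Relative drop in communicability when node r is removed."""
    n = A.shape[0]
    Er = np.zeros_like(A)
    Er[r, :] = A[r, :]                     # E_r agrees with A in row r ...
    Er[:, r] = A[:, r]                     # ... and column r, zero elsewhere
    expA, expAr = expm(A), expm(A - Er)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and i != r and j != r:
                total += (expA[i, j] - expAr[i, j]) / expA[i, j]
    return total / ((n - 1) ** 2 - (n - 1))

print(subgraph_centrality)
print(betweenness(A, r=1))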
Table 0.1, taken from [21], summarizes a variety of formulae for $e^A$, all of which
have been tried in the literature as the basis of a numerical method. The scaling and
squaring method, which combines scaling and squaring with Padé approximation, has
proved to be the most popular and generally applicable method.
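As an added illustration of the basic mechanism—and emphatically not the carefully tuned algorithm of this paper—the sketch below scales by a power of 2, evaluates a fixed-degree diagonal Padé approximant from the standard closed-form coefficients, and squares repeatedly. The degree m = 7 and threshold theta = 0.5 are arbitrary choices, not the optimized degree 13 and θ values derived in the paper.

# Bare-bones scaling and squaring sketch (illustration only; the paper's
# algorithm chooses the Pade degree and the scaling far more carefully).
import math
import numpy as np
from scipy.linalg import expm   # used only as a reference for the check

def pade_coeffs(m):
    """Coefficients b_j of the degree-m diagonal Pade numerator for e^x:
    p_m(x) = sum_j b_j x^j, with q_m(x) = p_m(-x)."""
    return [math.factorial(2 * m - j) * math.factorial(m)
            / (math.factorial(2 * m) * math.factorial(m - j) * math.factorial(j))
            for j in range(m + 1)]

def expm_ss(A, m=7, theta=0.5):
    """Scale A so that ||A/2^s||_1 <= theta, evaluate the [m/m] Pade
    approximant of the scaled matrix, then square s times."""
    n = A.shape[0]
    norm = np.linalg.norm(A, 1)
    s = int(math.ceil(math.log2(norm / theta))) if norm > theta else 0
    As = A / 2 ** s
    b = pade_coeffs(m)
    P, Q, X = np.zeros_like(As), np.zeros_like(As), np.eye(n)
    for j in range(m + 1):           # explicit powers; the paper's even/odd
        P += b[j] * X                # splitting is cheaper
        Q += b[j] * (-1) ** j * X
        X = X @ As
    R = np.linalg.solve(Q, P)        # r_m(As) = q_m(As)^{-1} p_m(As)
    for _ in range(s):               # undo the scaling by repeated squaring
        R = R @ R
    return R

A = np.random.default_rng(0).standard_normal((5, 5))
err = np.linalg.norm(expm_ss(A) - expm(A), 1) / np.linalg.norm(expm(A), 1)
print(err)                           # small for this well-scaled example

The algorithm of the paper additionally evaluates the numerator and denominator with far fewer matrix multiplications via an even/odd splitting and chooses the degree and the amount of scaling from sharp backward error bounds.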
The idea employed in this paper of adaptively choosing the degree of the Padé
approximant and the amount of scaling has been taken up in subsequent work on
other matrix functions, including the cosine and sine [16], [21, sect. 12.4] and the
logarithm [21, sect. 11.5]. Related to the exponential are the ψ functions, defined
explicitly as $\psi_k(z) = \sum_{j=0}^{\infty} z^j/(j+k)!$, $k = 0, 1, 2, \ldots$, or via the recurrence

$$\psi_{k+1}(z) = \frac{\psi_k(z) - 1/k!}{z}, \qquad \psi_0(z) = e^z.$$
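A quick scalar sanity check (added here for illustration; the number of series terms and the test argument are arbitrary) confirms that the series definition and the recurrence agree:

# Illustrative check that the series definition of psi_k and the recurrence
# psi_{k+1}(z) = (psi_k(z) - 1/k!)/z agree for a scalar argument.
import math

def psi_series(k, z, terms=60):
    return sum(z ** j / math.factorial(j + k) for j in range(terms))

z = 0.7
for k in range(4):
    via_recurrence = (psi_series(k, z) - 1 / math.factorial(k)) / z
    print(k, abs(via_recurrence - psi_series(k + 1, z)))   # essentially zero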
They feature in exponential integrators—a broad class of numerical methods for solv-
ing an ordinary differential equation initial value problem in which the linear part of
the equation is treated exactly and the remaining nonlinear part is integrated numer-
ically. The algorithm here has been adapted for computation of the ψ functions by
Koikari [25] and Skaflestad and Wright [33].
Al-Mohy and Higham have extended this work in two recent papers. The first is
concerned with the Fréchet derivative $L$ at $A$ in the direction $E$, which for a general
function $f$ is defined by the condition that $f(A+E) - f(A) - L(A,E) = o(\|E\|)$ for
all $E$. For the exponential, the Fréchet derivative has the explicit representation

$$L(A,E) = \int_0^1 e^{A(1-s)} E\, e^{As}\, ds.$$

Al-Mohy and Higham [1] show how the algorithm in this paper can be "differentiated"
in order to obtain an algorithm that simultaneously computes $e^A$ and $L(A,E)$ at a
cost about three times that of computing $e^A$ alone.
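The integral representation can be checked numerically. The sketch below is an added illustration, not the algorithm of [1]: it approximates $L(A,E)$ with composite Simpson quadrature (an arbitrary choice of rule and step) and compares the result with a crude finite difference of expm.

# Illustrative check of the integral representation of the Frechet derivative
# L(A, E) = int_0^1 exp(A(1-s)) E exp(A s) ds, via composite Simpson's rule.
# Quadrature rule, step size, and test matrices are arbitrary choices.
import numpy as np
from scipy.linalg import expm

def frechet_expm_quad(A, E, nodes=101):
    s = np.linspace(0.0, 1.0, nodes)           # odd node count for Simpson
    vals = [expm(A * (1 - si)) @ E @ expm(A * si) for si in s]
    h = s[1] - s[0]
    w = np.ones(nodes)                          # Simpson weights 1,4,2,...,4,1
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return (h / 3.0) * sum(wi * vi for wi, vi in zip(w, vals))

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
E = rng.standard_normal((4, 4))

L = frechet_expm_quad(A, E)
h = 1e-6                                        # crude finite-difference check
fd = (expm(A + h * E) - expm(A)) / h
print(np.linalg.norm(L - fd, 1) / np.linalg.norm(L, 1))  # small; limited by h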

Citations

Computing the Action of the Matrix Exponential, with an Application to Exponential Integrators (journal article)
TL;DR: It is shown that the sums of the form $\sum_{k=0}^p \varphi_k(A)u_k$ that arise in exponential integrators can be expressed in terms of a single exponential of a matrix of dimension $n+p$ built by augmenting $A$ with additional rows and columns, and the algorithm of this paper can be employed.

Functions of Matrices (book)
TL;DR: The most common matrix function is the matrix inverse, which is not treated specifically in this chapter, but is covered in Sections 1.5 and 51.3, as discussed by the authors.

A New Scaling and Squaring Algorithm for the Matrix Exponential (journal article)
TL;DR: The numerical experiments show that the new algorithm generally provides accuracy at least as good as the existing algorithm of Higham at no higher cost, while for matrices that are triangular or cause overscaling it usually yields significant improvements in accuracy, cost, or both.

A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration (journal article)
TL;DR: The results presented here on real 3D locally affine registration suggest that the novel framework provides a general and efficient way of fusing local rigid or affine deformations into a global invertible transformation without introducing artifacts, independently of the way local deformations are first estimated.
References

Matrix Computations (book), Gene H. Golub

Matrix Iterative Analysis (journal article)

Digital Control of Dynamic Systems (book)
TL;DR: This well-respected, market-leading text discusses the use of digital computers in the real-time control of dynamic systems and thoroughly integrates MATLAB statements and problems to offer readers a complete design picture.

Benchmarking Optimization Software with Performance Profiles (journal article)
TL;DR: It is shown that performance profiles combine the best features of other tools for performance evaluation to create a single tool for benchmarking and comparing optimization software.

Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later (journal article)
TL;DR: Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed, indicating that some of the methods are preferable to others, but that none are completely satisfactory.
Frequently Asked Questions (18)
Q1. What are the contributions mentioned in the paper "The Scaling and Squaring Method for the Matrix Exponential Revisited"?

The authors also give a new rounding error analysis that shows the computed Padé approximant of the scaled matrix to be highly accurate. For IEEE double precision arithmetic the best choice of degree of Padé approximant turns out to be 13, rather than the 6 or 8 used by previous authors. The authors also investigate a different scaling and squaring algorithm proposed by Najfeld and Havel that employs a Padé approximation to the function $x \coth(x)$.

By combining walks of all possible lengths connecting node i to itself, and applying a weighting that decreases rapidly with the walk length, the subgraph centrality aims to capture the participation of the node in question in all subgraphs in the network. 

it is well known that rounding errors can significantly affect the scaling and squaring method, because the squaring phase can suffer from severe numerical cancellation. 

Of the available methods the most developed are Krylov methods, and these require the evaluation of $e^H$ for a much smaller Hessenberg matrix $H$ related to $A$, for which the scaling and squaring method is well suited.

Software implementing the algorithm described here is available in the function expm in MATLAB (Version 7.2, R2006a, onwards), the function MatrixExp in Mathematica (Version 5.1 onwards), and routine F01ECF in the NAG Library (from Mark 22). 

Since the curve for expm lies below all the other curves, expm is the least accurate method on this set of test matrices, as measured by the performance profile. 

To obtain $r_m$ the authors solve a multiple right-hand side linear system with $q_m(A)$ as coefficient matrix, so to be sure that this system is solved accurately the authors need to check that $q_m(A)$ is well conditioned.

A weakness of the scaling and squaring method that was first pointed out by Kenney and Laub [24] is that a choice of scaling based on ‖A‖ sometimes produces a much stronger scaling than is necessary in order to achieve the desired accuracy—with potentially detrimental consequences for numerical stability. 

which can be evaluated with $m + 1$ matrix multiplications by forming $A^2, A^4, \ldots, A^{2m}$. Then $q_{2m}(A) = U - V$ is available at no extra cost.

By using standard error analysis techniques it is possible to derive a forward error bound for the scaling and squaring method, as has been done by Ward [38]. 

One of the first references to emphasize the important role of the matrix exponential in solving differential equations was the 1938 book Elementary Matrices and Some Applications to Dynamics and Differential Equations [13] by Frazer, Duncan, and Collar, who were in the Aerodynamics Department at the National Physical Laboratory in England. 

The effect of rounding errors on the final squaring phase remains an open question, but in their experiments the overall algorithm has performed in a numerically stable way throughout. 

The exponential of a matrix was first introduced by Laguerre [26] in 1867, who gave the now standard power series definition: for $A \in \mathbb{C}^{n \times n}$, $e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$ (0.1).

For α = 1, the Exp(13) curve is the highest: it intersects the y-axis at p = 0.52, which means that this method has the smallest error in 52% of the examples—more often than any other method. 

it is designed to provide an explicit and easily computable error bound, and the resulting bound is far from being sharp. 

Algorithm 2.3 uses one to three fewer squarings than the algorithms with which the authors have compared it, and hence it has a potential advantage in accuracy. 

Theorem 2.1 is a backward error result: it interprets the truncation errors in the Padé approximant as equivalent to a perturbation in the original matrix $A$. (The result holds, in fact, for any rational approximation $r_m$, as the authors have not yet used specific properties of a Padé approximant.)

The authors note that in padm, pm and qm are evaluated in increasing order of norms of the terms, as in Algorithm 2.3, whereas expm uses the reverse ordering.