
Algorithms and Design for a Second-Order Automatic Differentiation Module*

Jason Abate†
Texas Institute for Computational and Applied Mathematics
University of Texas at Austin
abate@ticam.utexas.edu
http://www.ticam.utexas.edu/~abate

Christian Bischof and Lucas Roh
Mathematics and Computer Science Division
Argonne National Laboratory
{bischof,roh}@mcs.anl.gov

Alan Carle
Center for Research on Parallel Computation
Rice University
carle@cs.rice.edu
http://www.cs.rice.edu/~carle

*This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38; and by the National Science Foundation, through the Center for Research on Parallel Computation, under Cooperative Agreement No. CCR-9120008.

†This work was partially performed while the author was a research associate at Argonne National Laboratory.
Abstract

This article describes approaches to computing second-order derivatives with automatic differentiation (AD) based on the forward mode and the propagation of univariate Taylor series. Performance results are given that show the speedup possible with these techniques relative to existing approaches. We also describe a new source transformation AD module for computing second-order derivatives of C and Fortran codes and the underlying infrastructure used to create a language-independent translation tool.
1 Introduction

Automatic differentiation (AD) provides an efficient and accurate method to obtain derivatives for use in sensitivity analysis, parameter identification, and optimization. Current tools are targeted primarily at computing first-order derivatives, namely gradients and Jacobians. Prior to AD, derivative values were obtained through divided difference methods, symbolic manipulation, or hand-coding, all of which have drawbacks when compared with AD (see [3] for a discussion). Accurate second-order derivatives are even harder to obtain than first-order ones; it is possible to end up with no accurate digits in the derivative value when using a divided-difference scheme.
One can repeatedly apply first-order derivative tools to obtain higher-order derivatives, but this approach is complicated and ignores structural information about higher-order derivatives such as symmetry. Additionally, in cases where a full Hessian, $H$, is not required, such as with Hessian-vector products ($H \cdot V$) and projected Hessians ($V^T \cdot H \cdot W$), where $V$ and $W$ are matrices with many fewer columns than rows, it is possible to compute the desired values much more efficiently than with the repeated differentiation approach.

There is no "best" approach to computing Hessians; the most efficient approach to computing second-order derivatives depends on the specifics of the code and, to a lesser extent, the target platform on which the code will be run [4, 8]. In all cases, however, derivative values computed by AD are computed to machine precision, without the roundoff errors inherent in divided difference techniques.
AD via source transformation provides great flexibility in implementing sophisticated algorithms that exploit the associativity of the chain rule of calculus (see [6] for a discussion). Unfortunately, the development of robust source transformation tools is a substantial effort. ADIFOR [3] and ADIC [6], source transformation tools for Fortran and C respectively, both implement relatively simple algorithms for propagating derivatives. Most of the development time so far has concentrated on producing tools that handle the full range of the language, rather than on developing more efficient algorithms to propagate derivatives.

To make it easier to experiment with algorithmic techniques, we have developed AIF, the Automatic Differentiation Intermediate Form. AIF acts as the glue layer between a language-specific front-end and a largely language-independent transformation module that implements AD transformations at a high level of abstraction.

We have implemented an AIF-based module for computing second-order derivatives. The Hessian module, as we call it, implements several different algorithms and selectively chooses among them in a fashion determined by the code presented to it. However, this context-sensitive logic, which is based on machine-specific performance models, is transparent to the AD front-end. The Hessian module currently interfaces with ADIFOR and ADIC. First experimental results show that the resulting codes outperform the recursive application of first-order tools by a factor of two when computing full, dense Hessians and are able to compute full, sparse Hessians and partial Hessians at significantly reduced expense.

Section 2 outlines the two derivative propagation strategies that we have explored for Hessians, including cost estimates for computing various types of Hessians. Section 3 shows the performance of the various approaches for a sample code, and Section 4 describes the infrastructure that was used to develop the Hessian augmentation tool. Lastly, we summarize our results and discuss future work.
2 Strategies for Computing Second Derivatives

2.1 Forward Mode Strategies

The standard forward mode of automatic differentiation can easily be expanded to second order to compute Hessians. For $z = f(x, y)$, we can compute $\nabla z$ and $\nabla^2 z$, the gradient and Hessian of $z$ respectively, as

$$\nabla z = \frac{\partial z}{\partial x}\,\nabla x + \frac{\partial z}{\partial y}\,\nabla y \qquad (1)$$

$$\nabla^2 z = \frac{\partial z}{\partial x}\,\nabla^2 x + \frac{\partial z}{\partial y}\,\nabla^2 y
+ \frac{\partial^2 z}{\partial x^2}\left(\nabla x \cdot \nabla x^T\right)
+ \frac{\partial^2 z}{\partial y^2}\left(\nabla y \cdot \nabla y^T\right)
+ \frac{\partial^2 z}{\partial x\,\partial y}\left(\nabla x \cdot \nabla y^T + \nabla y \cdot \nabla x^T\right) \qquad (2)$$
This approach is conceptually simple and produces efficient results for small numbers of independent variables. For $n$ independent variables, gradients are stored in arrays of length $n$ and Hessians, because of their symmetric nature, are stored using the LAPACK [1] packed symmetric scheme, which reduces the storage requirements from $n^2$ to $\frac{1}{2}n(n+1)$. The cost of computing a full Hessian using the forward mode is $O(n^2)$ relative to the cost of computing the original function.
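To make Equations (1) and (2) and the packed symmetric storage concrete, the following C sketch propagates the value, gradient, and packed Hessian through a single multiplication z = x*y. The deriv2 struct, packed_index helper, and mul_deriv2 routine are illustrative names under an assumed representation; this is a minimal sketch, not the code generated by ADIFOR, ADIC, or the Hessian module.

    #include <stdlib.h>

    /* Second-order forward-mode propagation for one binary operation,
     * following Equations (1)-(2).  Gradients have length n; Hessians use
     * LAPACK-style packed symmetric storage of the upper triangle, so the
     * (i,j) entry with i <= j lives at index i + j*(j+1)/2 (0-based),
     * giving n*(n+1)/2 doubles instead of n*n.                            */

    typedef struct {
        double  val;   /* function value                 */
        double *grad;  /* gradient, length n             */
        double *hess;  /* packed Hessian, n*(n+1)/2      */
    } deriv2;

    static size_t packed_index(size_t i, size_t j)   /* assumes i <= j */
    {
        return i + j * (j + 1) / 2;
    }

    /* z = x * y:
     *   grad z = y*grad x + x*grad y
     *   hess z = y*hess x + x*hess y + grad x*grad y^T + grad y*grad x^T */
    static void mul_deriv2(deriv2 *z, const deriv2 *x, const deriv2 *y, size_t n)
    {
        size_t i, j;

        z->val = x->val * y->val;

        for (i = 0; i < n; i++)
            z->grad[i] = y->val * x->grad[i] + x->val * y->grad[i];

        for (j = 0; j < n; j++)
            for (i = 0; i <= j; i++) {
                size_t k = packed_index(i, j);
                z->hess[k] = y->val * x->hess[k] + x->val * y->hess[k]
                           + x->grad[i] * y->grad[j] + y->grad[i] * x->grad[j];
            }
    }

Because only the upper triangle is updated, the symmetry of the Hessian is exploited directly, and the loop structure makes the $O(n^2)$ per-operation cost visible.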
Many algorithms do not need full knowledge of the Hessian but require only a Hessian-vector product, $H \cdot V$, or a projected Hessian, $V^T \cdot H \cdot W$, where $V$ and $W$ are matrices with $n_V$ and $n_W$ columns, respectively. Rather than computing the full Hessian at a cost of $O(n^2)$ followed by one or two matrix multiplications, we can multiply Equation (2) on the left and/or right by $V^T$ and $W$, respectively, to produce new propagation rules. By modifying the derivative objects that get propagated, we can perform the required computations at a much lower cost. These costs are summarized in Table 1.¹ In the case of large Hessians and relatively small values of $n_V$ or $n_W$, the savings can be significant. Additionally, the coloring techniques that have been applied to structured Jacobians [2] can be applied to Hessians for a significant savings.
Table 1: Summary of Hessian costs using the forward mode relative to the cost of computing the original function. $n$ is the number of independent variables; $n_V$ and $n_W$ are the number of columns of $V$ and $W$, respectively.

¹The costs of the symmetric and unsymmetric projected Hessians ($V^T \cdot \nabla^2 f \cdot V$ and $V^T \cdot \nabla^2 f \cdot W$) are of the same order, but due to symmetry, the storage and computation costs of $V^T \cdot \nabla^2 f \cdot V$ are roughly half of the costs of $V^T \cdot \nabla^2 f \cdot W$.
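To see where the savings in Table 1 come from, consider the Hessian-vector product case. Multiplying Equation (2) on the right by $V$ gives a propagation rule of the form (a sketch of the idea; the exact rules used by the module may differ in detail)

$$\nabla^2 z \cdot V = \frac{\partial z}{\partial x}\,(\nabla^2 x \cdot V) + \frac{\partial z}{\partial y}\,(\nabla^2 y \cdot V)
+ \frac{\partial^2 z}{\partial x^2}\,\nabla x\,(\nabla x^T V)
+ \frac{\partial^2 z}{\partial y^2}\,\nabla y\,(\nabla y^T V)
+ \frac{\partial^2 z}{\partial x\,\partial y}\left(\nabla x\,(\nabla y^T V) + \nabla y\,(\nabla x^T V)\right),$$

so each variable carries only its gradient of length $n$ and an $n \times n_V$ object $\nabla^2 x \cdot V$, and the work per operation drops from $O(n^2)$ to $O(n\,n_V)$.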
2.2 Taylor Series Strategies

As an alternative to the forward mode propagation of gradients and Hessians, we can propagate two-term univariate Taylor series expansions about each of the nonzero directions in the Hessian [4]. To compute derivatives at a point $x_0$ in the direction $u$, we consider $f$ as a scalar function $f(x_0 + tu)$ of $t$. Its Taylor series, up to second order, is

$$f(x_0 + tu) = f + f_t\,t + f_{tt}\,t^2, \qquad (3)$$

where $f_t$ and $f_{tt}$ are the first and second Taylor coefficients. The uniqueness of the Taylor series implies that for $u = e_i$, the $i$th basis vector, we obtain

$$f_t = \frac{\partial f}{\partial x_i}, \qquad (4)$$

$$f_{tt} = \frac{1}{2}\,\frac{\partial^2 f}{\partial x_i^2}. \qquad (5)$$

That is, we have computed a scaled version of the $i$th diagonal element of the Hessian. Similarly, to compute the $(i,j)$ off-diagonal entry of the Hessian, we set $u = e_i + e_j$. The uniqueness of the Taylor expansion implies

$$f_t = \frac{\partial f}{\partial x_i} + \frac{\partial f}{\partial x_j}, \qquad (6)$$

$$f_{tt} = \frac{1}{2}\left(\frac{\partial^2 f}{\partial x_i^2} + 2\,\frac{\partial^2 f}{\partial x_i\,\partial x_j} + \frac{\partial^2 f}{\partial x_j^2}\right). \qquad (7)$$

If Taylor expansions are also computed for the $i$ and $j$ diagonal elements, the off-diagonal Hessian entries can be recovered by interpolation. As with the forward mode, simple rules specify the propagation of the expansions for all arithmetic and intrinsic operators [10, 13]. To compute a full gradient of length $n$ and $k$ Hessian entries above the diagonal, the cost of the Taylor series mode is $O(n + k)$. If the full gradient is not needed, this cost can be reduced somewhat.
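To make the propagation and interpolation steps concrete, the following C sketch propagates two-term Taylor coefficients through addition and multiplication and recovers an off-diagonal Hessian entry from the directions $e_i$, $e_j$, and $e_i + e_j$, as in Equations (4)-(7). The taylor2 struct and the example function f(x, y) = x*y + x are illustrative assumptions, not taken from the original code.

    #include <stdio.h>

    typedef struct { double v, t, tt; } taylor2;   /* f + f_t*t + f_tt*t^2 */

    /* Seed an independent variable x_k for the expansion direction u:
     * the t-coefficient is the k-th component of u, the t^2-term is zero. */
    static taylor2 seed(double xk, double uk)
    { taylor2 a = { xk, uk, 0.0 }; return a; }

    static taylor2 add(taylor2 a, taylor2 b)
    { taylor2 c = { a.v + b.v, a.t + b.t, a.tt + b.tt }; return c; }

    static taylor2 mul(taylor2 a, taylor2 b)   /* product truncated at t^2 */
    { taylor2 c = { a.v * b.v,
                    a.v * b.t + a.t * b.v,
                    a.v * b.tt + a.t * b.t + a.tt * b.v }; return c; }

    /* Example function f(x, y) = x*y + x, expanded in a chosen direction. */
    static taylor2 f(taylor2 x, taylor2 y) { return add(mul(x, y), x); }

    int main(void)
    {
        double x0 = 3.0, y0 = 2.0;

        /* f_tt in the directions e_1, e_2, and e_1 + e_2 (Eqs. (5), (7)). */
        double d11 = f(seed(x0, 1.0), seed(y0, 0.0)).tt;  /* H_11 / 2            */
        double d22 = f(seed(x0, 0.0), seed(y0, 1.0)).tt;  /* H_22 / 2            */
        double d12 = f(seed(x0, 1.0), seed(y0, 1.0)).tt;  /* (H_11+2H_12+H_22)/2 */

        /* Interpolation: H_12 = d12 - d11 - d22; for this f it equals 1. */
        printf("H_12 = %g\n", d12 - d11 - d22);
        return 0;
    }

Each expansion direction is handled independently of the others, which is what makes the stripmined evaluation described below possible.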
The Taylor series approach can compute any set of Hessian entries without computing the entire Hessian. This technique is ideal for sparse Hessians when the sparsity pattern is known in advance and for situations where only certain elements (such as the diagonal entries) are desired. Additionally, each Taylor series expansion is independent. This allows very large Hessians, which can easily overwhelm the available memory, to be computed in a stripmined fashion by partitioning the expansion directions and computing them independently with multiple sweeps through the code, in a fashion similar to the stripmining technique described in [5].
2.3 Preaccumulation

The associativity of the chain rule allows derivative propagation to be performed at arbitrary levels of abstraction. At the simplest, the forward mode works at the scope of a single binary operation. By expanding the scope to a higher level, such as an assignment statement, a loop body, or a subroutine, it is possible to decrease the amount of work necessary to propagate derivatives, as shown in [7, 9].

A preaccumulation technique we employ in our work computes the gradient and Hessian of the variable on the left side of the assignment statement in two steps. Assume that for the statement $z = f(x_1, x_2, \ldots, x_N)$ we have $\nabla x_i$ and $\nabla^2 x_i$, $i = 1, \ldots, N$, the global gradient and Hessian of $x_i$, and that we wish to compute, for $z$, the global gradient $\nabla z$ and the global Hessian $\nabla^2 z$.

Step 1: Preaccumulation of local derivatives. The variables on the right side of the statement are considered to be independent, and we compute "local" derivative objects $\frac{\partial z}{\partial x_i}$ and $\frac{\partial^2 z}{\partial x_i\,\partial x_j}$, $i, j = 1, \ldots, N$, with respect to the right-hand side variables. This can be done using either the forward or the Taylor series mode.

Step 2: Accumulation of global derivatives. We accumulate the global gradient and Hessian of $z$. When using the forward mode for global propagation of derivatives, this is done as follows:

$$\nabla z = \sum_{i=1}^{N} \frac{\partial z}{\partial x_i}\,\nabla x_i$$

$$\nabla^2 z = \sum_{i=1}^{N} \frac{\partial z}{\partial x_i}\,\nabla^2 x_i + \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial^2 z}{\partial x_i\,\partial x_j}\left(\nabla x_i \cdot \nabla x_j^T\right)$$

The rules for Taylor series expansions can be generalized in a similar fashion.
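As a small worked example (ours, not from the original text), consider the statement z = sin(x1*x2). Step 1 evaluates the local derivatives once at the current values of $x_1$ and $x_2$:

$$\frac{\partial z}{\partial x_1} = x_2\cos(x_1 x_2), \qquad \frac{\partial z}{\partial x_2} = x_1\cos(x_1 x_2),$$

$$\frac{\partial^2 z}{\partial x_1^2} = -x_2^2\sin(x_1 x_2), \qquad \frac{\partial^2 z}{\partial x_2^2} = -x_1^2\sin(x_1 x_2), \qquad \frac{\partial^2 z}{\partial x_1\,\partial x_2} = \cos(x_1 x_2) - x_1 x_2\sin(x_1 x_2).$$

Step 2 then makes a single pass over the global derivative objects (the length-$n$ gradients and packed $\frac{1}{2}n(n+1)$ Hessians) using the accumulation formulas above, rather than updating them once for each intermediate operation in the evaluation of sin(x1*x2).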
Gradient codes produced by ADIFOR and ADIC currently employ statement-level preaccumulation for all assignment statements more complicated than a single binary operation. Experiments with similar "global" preaccumulation strategies for computing Hessians have produced inconsistent results across various codes and machines. No global strategy outperformed all other strategies on all test codes and all machines.

Thus, we have developed an adaptive strategy where the costs of using and not using statement-level preaccumulation are computed and compared when the derivative code is generated. These costs are estimated based on machine-specific performance models of the actual propagation code. Thus, the Hessian module decides which strategy to use based on the structure of a particular computation. We believe that such context-sensitive strategies are crucial for future improvement of AD tools.
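The flavor of this choice can be sketched in a few lines of C. The cost functions and constants below are hypothetical placeholders invented for illustration only; they are not the machine-specific performance models actually used by the Hessian module.

    #include <stddef.h>

    typedef struct {
        size_t n_ops;   /* binary operations on the right-hand side       */
        size_t n_rhs;   /* distinct variables on the right-hand side (N)  */
    } stmt_info;

    /* Hypothetical cost of updating one gradient/packed-Hessian pair of
     * dimension d (d = n for global objects, d = N for local ones).       */
    static double sweep_cost(size_t d)
    {
        return 2.0 * (double)d + 0.5 * (double)d * (double)(d + 1);
    }

    static int use_preaccumulation(const stmt_info *s, size_t n)
    {
        /* Operation-level forward mode: every operation updates the
         * global derivative objects.                                      */
        double plain = (double)s->n_ops * sweep_cost(n);

        /* Preaccumulation: each operation updates small local objects,
         * followed by one accumulation into the global objects.           */
        double preacc = (double)s->n_ops * sweep_cost(s->n_rhs)
                      + (double)s->n_rhs * sweep_cost(n);

        return preacc < plain;   /* generate whichever variant is cheaper */
    }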
Figure 1: Hessian performance of the Shubin Hessian test code with 20 independent variables. (Bar chart of the Hessian/function execution time ratio, 0 to 400, with bars labeled Forward, Adaptive Forward, and Sparse Taylor Series.)
3 Hessian Performance on a CFD Code

Hessian code was generated for a steady shock tracking code provided by Greg Shubin of the Boeing Company [14]. Because of memory constraints, a 20 x 20 section of the full 190 x 190 Hessian was computed for each of the 190 dependent variables. The section of the Hessian being studied exhibits some sparsity, with 72 nonzero entries on or above the diagonal.

Hessian codes were generated using four different strategies. Figure 1 shows the ratio of the Hessian computation time to the function computation time, while Figure 2 shows the memory requirements of the augmented Hessian codes on a Sun UltraSparc 1. The original code required 8.0 x seconds of execution time and used 360 kB of memory. The first strategy, labeled "Twice ADIFOR", was generated by first producing a gradient code with ADIFOR 2.0 and then running the gradient code through ADIFOR again. The "Forward" case implements the forward mode on a binary operation level. The "Adaptive Forward" code uses the forward mode, with preaccumulation at a statement level where deemed appropriate. The "Sparse Taylor Series" mode uses the Taylor series mode to compute just the entries that are known to be nonzero.

Clearly, the "Twice ADIFOR" scheme can be easily beaten by exploiting the symmetry of the Hessian, both in terms of execution speed and memory usage, as is done in both the "Forward" and "Adaptive Forward" codes. This result also shows that the use of an adaptive preaccumulation strategy can outperform the operation-level forward mode. Improvements in the strategy used to decide when to use preaccumulation should further increase the efficiency of the adaptive scheme. Finally, the "Sparse Taylor Series" code shows that, if the sparsity structure of a problem is known, it can be exploited for additional savings.
4 Language and Tool Independence with AIF

The algorithms of automatic differentiation are, for the most part, independent of the language to which they are applied. For example, the Fortran assignment statement

    z = 2.0 + x * y

References

Louis B. Rall. Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science 120, Springer-Verlag, 1981.

Adifor 2.0: Automatic differentiation of Fortran 77 programs.

ADIC: An extensible automatic differentiation tool for ANSI-C.

The ADIFOR 2.0 system for the automatic differentiation of Fortran 77 programs.