Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation

doi:10.1145/347837.347846

Journal Article•DOI•

Andreas Griewank¹, Andrea Walther¹•Institutions (1)

Dresden University of Technology¹

01 Mar 2000-ACM Transactions on Mathematical Software (ACM)-Vol. 26, Iss: 1, pp 19-45

TL;DR: This article presents the function revolve, which generates checkpointing schedules that are provably optimal with regard to a primary and a secondary criterion and is intended to be used as an explicit “controller” for running a time-dependent applications program.

Abstract: In its basic form, the reverse mode of computational differentiation yields the gradient of a scalar-valued function at a cost that is a small multiple of the computational work needed to evaluate the function itself. However, the corresponding memory requirement is proportional to the run-time of the evaluation program. Therefore, the practical applicability of the reverse mode in its original formulation is limited despite the availability of ever larger memory systems. This observation leads to the development of checkpointing schedules to reduce the storage requirements. This article presents the function revolve, which generates checkpointing schedules that are provably optimal with regard to a primary and a secondary criterion. This routine is intended to be used as an explicit “controller” for running a time-dependent applications program.
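The trade-off that such schedules optimize can be illustrated with a small sketch. It assumes the usual uniform cost model (every time step costs one unit, and the initial state occupies one of the `s` checkpoints) and takes the number of recomputed forward steps as the quantity to minimize. The function names are hypothetical and the code is not taken from revolve itself; it only compares a brute-force recursion over the position of the next checkpoint with the binomial closed form known from the checkpointing literature.

```python
from functools import lru_cache
from math import comb, inf


def beta(s, t):
    """Binomial count beta(s, t) = C(s + t, s)."""
    return comb(s + t, s)


@lru_cache(maxsize=None)
def min_forward_steps(l, s):
    """Fewest extra forward steps needed to reverse l steps with s checkpoints.

    Assumption: advancing one step costs 1, and the state at the start of the
    chain already occupies one of the s checkpoints.
    """
    if l == 1:
        return 0          # a single step can be reversed directly
    if s == 0:
        return inf        # no checkpoint left to restart from
    # Advance j steps, place the next checkpoint there, reverse the right
    # part with s - 1 checkpoints, then the left part with all s again.
    return min(
        j + min_forward_steps(l - j, s - 1) + min_forward_steps(j, s)
        for j in range(1, l)
    )


def closed_form(l, s):
    """r * l - beta(s + 1, r - 1), with r minimal such that l <= beta(s, r)."""
    if l == 1:
        return 0
    r = 1
    while beta(s, r) < l:
        r += 1
    return r * l - beta(s + 1, r - 1)


if __name__ == "__main__":
    for l in (4, 10, 20):
        print(l, min_forward_steps(l, 3), closed_form(l, 3))
```

Under these assumptions the two functions agree for every chain length tried, which is one way to sanity-check a schedule generator before attaching it to a real time-stepping code.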

Citations

Posted Content•

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Yanping Huang¹, Youlong Cheng¹, Ankur Bapna¹, Orhan Firat¹, Mia Xu Chen¹, Dehao Chen¹, HyoukJoong Lee¹, Jiquan Ngiam¹, Quoc V. Le¹, Yonghui Wu¹, Zhifeng Chen¹•Institutions (1)

Google¹

16 Nov 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: GPipe is introduced, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers by pipelining different sub-sequences of layers on separate accelerators, resulting in almost linear speedup when a model is partitioned across multiple accelerators.

Abstract: Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility of scaling a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: We train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012, (ii) Multilingual Neural Machine Translation: We train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models.
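Why splitting a batch helps a layer pipeline can be seen in an idealized model. The sketch below assumes every stage takes exactly one time unit per micro-batch and looks at the forward pass only; the function names are hypothetical and nothing here is the GPipe API. Micro-batch `m` reaches stage `k` at time `m + k`, so the pipeline needs `M + K - 1` time steps and each accelerator is busy for a fraction `M / (M + K - 1)` of them.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Map each time step to the (stage, microbatch) pairs active in it.

    Idealized assumption: each stage needs one time unit per micro-batch.
    """
    schedule = {}
    for m in range(num_microbatches):
        for k in range(num_stages):
            # Micro-batch m enters stage 0 at time m and moves one stage per step.
            schedule.setdefault(m + k, []).append((k, m))
    return schedule


def utilization(num_stages, num_microbatches):
    """Fraction of (stage, time step) slots that do useful work."""
    schedule = pipeline_schedule(num_stages, num_microbatches)
    total_slots = num_stages * len(schedule)
    busy_slots = sum(len(active) for active in schedule.values())
    return busy_slots / total_slots


if __name__ == "__main__":
    for m in (1, 4, 32):
        print(m, round(utilization(4, m), 3))
```

With a single micro-batch only one of four stages works at a time; with many micro-batches the idle share shrinks, which matches the abstract's claim of almost linear speedup in this simplified setting.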

688 citations

Cites background from "Algorithm 799: revolve: an implemen..."

...GPipe allows scaling arbitrary deep neural network architectures beyond the memory limitations of a single accelerator by partitioning the model across different accelerators and supporting re-materialization on every accelerator [13, 14]....

Posted Content•

Training Deep Nets with Sublinear Memory Cost.

Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin

21 Apr 2016-arXiv: Learning

TL;DR: This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, and shows that it is possible to trade computation for memory, giving a more memory-efficient training algorithm at a small extra computation cost.

Abstract: We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train a n layer network, with only the computational cost of an extra forward pass per mini-batch. As many of the state-of-the-art models hit the upper bound of the GPU memory, our algorithm allows deeper and more complex models to be explored, and helps advance the innovations in deep learning research. We focus on reducing the memory cost to store the intermediate feature maps and gradients during training. Computation graph analysis is used for automatic in-place operation and memory sharing optimizations. We show that it is possible to trade computation for memory - giving a more memory efficient training algorithm with a little extra computation cost. In the extreme case, our analysis also shows that the memory consumption can be reduced to O(log n) with as little as O(n log n) extra cost for forward computation. Our experiments show that we can reduce the memory cost of a 1,000-layer deep residual network from 48G to 7G with only 30 percent additional running time cost on ImageNet problems. Similarly, significant memory cost reduction is observed in training complex recurrent neural networks on very long sequences.
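The O(sqrt(n)) idea can be sketched on a toy chain of scalar layers. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer `tanh(w * x)`, the function names, and the way memory is counted (number of stored activations) are all chosen here for the example. The forward pass keeps only every k-th activation with k about sqrt(n); the backward pass recomputes one segment at a time, which costs roughly one extra forward pass.

```python
import math


def step(x, w):
    """One toy layer: x -> tanh(w * x)."""
    return math.tanh(w * x)


def step_grad(x, w):
    """Derivative of the toy layer with respect to its input x."""
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_full(x0, ws):
    """Baseline: store every activation, then backpropagate."""
    xs = [x0]
    for w in ws:
        xs.append(step(xs[-1], w))
    g = 1.0
    for x, w in zip(reversed(xs[:-1]), reversed(ws)):
        g *= step_grad(x, w)
    return g, len(xs)


def grad_checkpointed(x0, ws):
    """Keep every k-th activation (k ~ sqrt(n)); recompute the rest per segment."""
    n = len(ws)
    k = max(1, math.isqrt(n))
    checkpoints = {}
    x = x0
    for i, w in enumerate(ws):
        if i % k == 0:
            checkpoints[i] = x        # input of layer i
        x = step(x, w)
    g = 1.0
    peak = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        seg = [checkpoints[start]]    # recompute the inputs of layers start..end-1
        for w in ws[start:end - 1]:
            seg.append(step(seg[-1], w))
        peak = max(peak, len(checkpoints) + len(seg))
        for x_in, w in zip(reversed(seg), reversed(ws[start:end])):
            g *= step_grad(x_in, w)
    return g, peak


if __name__ == "__main__":
    ws = [0.5 + 0.01 * i for i in range(100)]
    print(grad_full(0.3, ws))
    print(grad_checkpointed(0.3, ws))
```

For 100 layers the baseline stores 101 activations while the segmented version holds at most 20 at once and returns the same gradient. Uniform segments are the simplest choice; binomial schedules such as those produced by revolve place checkpoints non-uniformly to reduce recomputation further.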

673 citations

Cites background from "Algorithm 799: revolve: an implemen..."

...long standing topic in systems research. Although not widely known, the idea of dropping intermediate results is also known as gradient checkpointing technique in automatic differentiation literature [9]. We bring this idea to neural network gradient graph construction for general deep neural networks. Through the discussion with our colleagues [19], we know that the idea of dropping computation has ...

Book•

Full Seismic Waveform Modelling and Inversion

Andreas Fichtner

23 Jul 2011

TL;DR: This book covers the numerical solution of the elastic wave equation, the computation of sensitivity kernels, and full waveform tomography, including an application to upper-mantle structure in the Australasian region.

Abstract: Introduction.- Numerical Solution of the Elastic Wave Equation.- Computing Sensitivity Kernels.- Seismological Data Functionals and their Associated Adjoint Sources.- Iterative Optimisation.- Full Waveform Tomography for Upper-mantle Structure in Australasian Region.- A Comparative Study of Local-scale full Waveform Tomographies.- Source Stacking and Data Reduction in Global full Waveform Tomography.

442 citations

Posted Content•

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford¹, Jong Wook Kim¹, Chris Hallacy¹, Aditya Ramesh¹, Gabriel Goh¹, Sandhini Agarwal¹, Girish Sastry¹, Amanda Askell, Pamela Mishkin¹, Jack Clark¹, Gretchen Krueger¹, Ilya Sutskever¹•Institutions (1)

OpenAI¹

26 Feb 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a pre-training task of predicting which caption goes with which image is used to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.

Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at this https URL.

403 citations

Journal Article•DOI•

Full seismic waveform tomography for upper-mantle structure in the Australasian region using adjoint methods

Andreas Fichtner¹, Brian Kennett², Heiner Igel¹, Hans-Peter Bunge¹•Institutions (2)

Ludwig Maximilian University of Munich¹, Australian National University²

01 Dec 2009-Geophysical Journal International

TL;DR: In this paper, the authors present a full seismic waveform tomography for upper-mantle structure in the Australasian region, based on spectral-element simulations of seismic wave propagation in 3-D heterogeneous earth models.

Abstract: SUMMARY We present a full seismic waveform tomography for upper-mantle structure in the Australasian region. Our method is based on spectral-element simulations of seismic wave propagation in 3-D heterogeneous earth models. The accurate solution of the forward problem ensures that waveform misfits are solely due to as yet undiscovered Earth structure and imprecise source descriptions, thus leading to more realistic tomographic images and source parameter estimates. To reduce the computational costs, we implement a long-wavelength equivalent crustal model. We quantify differences between the observed and the synthetic waveforms using time–frequency (TF) misfits. Their principal advantages are the separation of phase and amplitude misfits, the exploitation of complete waveform information and a quasi-linear relation to 3-D Earth structure. Fréchet kernels for the TF misfits are computed via the adjoint method. We propose a simple data compression scheme and an accuracy-adaptive time integration of the wavefields that allows us to reduce the storage requirements of the adjoint method by almost two orders of magnitude. To minimize the waveform phase misfit, we implement a pre-conditioned conjugate gradient algorithm. Amplitude information is incorporated indirectly by a restricted line search. This ensures that the cumulative envelope misfit does not increase during the inversion. An efficient pre-conditioner is found empirically through numerical experiments. It prevents the concentration of structural heterogeneity near the sources and receivers. We apply our waveform tomographic method to ≈1000 high-quality vertical-component seismograms, recorded in the Australasian region between 1993 and 2008. The waveforms comprise fundamental- and higher-mode surface and long-period S body waves in the period range from 50 to 200 s. To improve the convergence of the algorithm, we implement a 3-D initial model that contains the long-wavelength features of the Australasian region. Resolution tests indicate that our algorithm converges after around 10 iterations and that both long- and short-wavelength features in the uppermost mantle are well resolved. There is evidence for effects related to the non-linearity in the inversion procedure. After 11 iterations we fit the data waveforms acceptably well; with no significant further improvements to be expected. During the inversion the total fitted seismogram length increases by 46 per cent, providing a clear indication of the efficiency and consistency of the iterative optimization algorithm. The resulting SV-wave velocity model reveals structural features of the Australasian upper mantle with great detail. We confirm the existence of a pronounced low-velocity band along the eastern margin of the continent that can be clearly distinguished against Precambrian Australia and the microcontinental Lord Howe Rise. The transition from Precambrian to Phanerozoic Australia (the Tasman Line) appears to be sharp down to at least 200 km depth. It mostly occurs further east of where it is inferred from gravity and magnetic anomalies. Also clearly visible are the Archean and Proterozoic cratons, the northward continuation of the continent and anomalously low S-wave velocities in the upper mantle in central Australia.

393 citations

Cites background from "Algorithm 799: revolve: an implemen..."

...Checkpointing algorithms (e.g. Griewank & Walther 2000; Charpentier 2001) store the regular wavefield at a smaller number of time steps, called checkpoints, and solve the forward problem from there until the current time of the adjoint calculation is reached....

References

Book•

Numerical methods for conservation laws

Randall J. LeVeque

01 Jan 1990

TL;DR: In this paper, the authors describe the derivation of conservation laws and apply them to linear systems, including the linear advection equation, the Euler equation, and the Riemann problem.

Abstract: I Mathematical Theory.- 1 Introduction.- 1.1 Conservation laws.- 1.2 Applications.- 1.3 Mathematical difficulties.- 1.4 Numerical difficulties.- 1.5 Some references.- 2 The Derivation of Conservation Laws.- 2.1 Integral and differential forms.- 2.2 Scalar equations.- 2.3 Diffusion.- 3 Scalar Conservation Laws.- 3.1 The linear advection equation.- 3.1.1 Domain of dependence.- 3.1.2 Nonsmooth data.- 3.2 Burgers' equation.- 3.3 Shock formation.- 3.4 Weak solutions.- 3.5 The Riemann Problem.- 3.6 Shock speed.- 3.7 Manipulating conservation laws.- 3.8 Entropy conditions.- 3.8.1 Entropy functions.- 4 Some Scalar Examples.- 4.1 Traffic flow.- 4.1.1 Characteristics and "sound speed".- 4.2 Two phase flow.- 5 Some Nonlinear Systems.- 5.1 The Euler equations.- 5.1.1 Ideal gas.- 5.1.2 Entropy.- 5.2 Isentropic flow.- 5.3 Isothermal flow.- 5.4 The shallow water equations.- 6 Linear Hyperbolic Systems.- 6.1 Characteristic variables.- 6.2 Simple waves.- 6.3 The wave equation.- 6.4 Linearization of nonlinear systems.- 6.4.1 Sound waves.- 6.5 The Riemann Problem.- 6.5.1 The phase plane.- 7 Shocks and the Hugoniot Locus.- 7.1 The Hugoniot locus.- 7.2 Solution of the Riemann problem.- 7.2.1 Riemann problems with no solution.- 7.3 Genuine nonlinearity.- 7.4 The Lax entropy condition.- 7.5 Linear degeneracy.- 7.6 The Riemann problem.- 8 Rarefaction Waves and Integral Curves.- 8.1 Integral curves.- 8.2 Rarefaction waves.- 8.3 General solution of the Riemann problem.- 8.4 Shock collisions.- 9 The Riemann problem for the Euler equations.- 9.1 Contact discontinuities.- 9.2 Solution to the Riemann problem.- II Numerical Methods.- 10 Numerical Methods for Linear Equations.- 10.1 The global error and convergence.- 10.2 Norms.- 10.3 Local truncation error.- 10.4 Stability.- 10.5 The Lax Equivalence Theorem.- 10.6 The CFL condition.- 10.7 Upwind methods.- 11 Computing Discontinuous Solutions.- 11.1 Modified equations.- 11.1.1 First order methods and diffusion.- 11.1.2 Second order methods and dispersion.- 11.2 Accuracy.- 12 Conservative Methods for Nonlinear Problems.- 12.1 Conservative methods.- 12.2 Consistency.- 12.3 Discrete conservation.- 12.4 The Lax-Wendroff Theorem.- 12.5 The entropy condition.- 13 Godunov's Method.- 13.1 The Courant-Isaacson-Rees method.- 13.2 Godunov's method.- 13.3 Linear systems.- 13.4 The entropy condition.- 13.5 Scalar conservation laws.- 14 Approximate Riemann Solvers.- 14.1 General theory.- 14.1.1 The entropy condition.- 14.1.2 Modified conservation laws.- 14.2 Roe's approximate Riemann solver.- 14.2.1 The numerical flux function for Roe's solver.- 14.2.2 A sonic entropy fix.- 14.2.3 The scalar case.- 14.2.4 A Roe matrix for isothermal flow.- 15 Nonlinear Stability.- 15.1 Convergence notions.- 15.2 Compactness.- 15.3 Total variation stability.- 15.4 Total variation diminishing methods.- 15.5 Monotonicity preserving methods.- 15.6 l1-contracting numerical methods.- 15.7 Monotone methods.- 16 High Resolution Methods.- 16.1 Artificial Viscosity.- 16.2 Flux-limiter methods.- 16.2.1 Linear systems.- 16.3 Slope-limiter methods.- 16.3.1 Linear Systems.- 16.3.2 Nonlinear scalar equations.- 16.3.3 Nonlinear Systems.- 17 Semi-discrete Methods.- 17.1 Evolution equations for the cell averages.- 17.2 Spatial accuracy.- 17.3 Reconstruction by primitive functions.- 17.4 ENO schemes.- 18 Multidimensional Problems.- 18.1 Semi-discrete methods.- 18.2 Splitting methods.- 18.3 TVD Methods.- 18.4 Multidimensional approaches.

3,827 citations

Book•

Optimal Control of Systems Governed by Partial Differential Equations

Jacques-Louis Lions

03 Mar 1971

TL;DR: In this paper, the authors consider the problem of minimizing the sum of a differentiable and non-differentiable function in the context of a system governed by a Dirichlet problem.

Abstract: Principal Notations.- I Minimization of Functions and Unilateral Boundary Value Problems.- 1. Minimization of Coercive Forms.- 1.1. Notation.- 1.2. The Case when ?: is Coercive.- 1.3. Characterization of the Minimizing Element. Variational Inequalities.- 1.4. Alternative Form of Variational Inequalities.- 1.5. Function J being the Sum of a Differentiable and Non-Differentiable Function.- 1.6. The Convexity Hypothesis on $$ {U_{ad}} $$.- 1.7. Orientation.- 2. A Direct Solution of Certain Variational Inequalities.- 2.1. Problem Statement.- 2.2. An Existence and Uniqueness Theorem.- 3. Examples.- 3.1. Function Spaces on ?.- 3.2. Function Spaces on ?.- 3.3. Subspaces of Hm(?).- 3.4. Examples of Boundary Value Problems.- 3.5. Unilateral Boundary Value Problems (I).- 3.6. Unilateral Boundary Value Problems (II).- 3.7. Unilateral Boundary Value Problems (III).- 3.8. Unilateral Boundary Value Problems Case of Systems.- 3.9. Elliptic Operators of Order Greater than Two.- 3.10. Non-differentiable Functionals.- 4. A Comparison Theorem.- 4.1. General Results.- 4.2. An Application.- 5. Non Coercive Forms.- 5.1. Convexity of the Set of Solutions.- 5.2. Approximation Theorem.- Notes.- II Control of Systems Governed by Elliptic Partial Differential Equations.- 1. Control of Elliptic Variational Problems.- 1.1. Problem Statement.- 1.2. First Remarks on the Control Problem.- 1.3. The Set of Inequalities Defining the Optimal Control.- 2. First Applications.- 2.1. System Governed by the Dirichlet Problem Distributed Control.- 2.2. The Case with No Constraints.- 2.3. System Governed by a Neumann Problem Distributed Control.- 2.4. System Governed by a Neumann Problem Boundary Control.- 2.5. Local and Global Constraints.- 2.6. System Governed by a Differential System.- 2.7. System Governed by a 4th Order Differential Operator.- 2.8. Orientation.- 3. A Family of Examples with N = 0 and $$ {U_{ad}} $$ Arbitrary.- 3.1. General Case.- 3.2. Application (I).- 3.3. 
Application (II).- 4. Observation on the Boundary.- 4.1. System Governed by a Dirichlet Problem (I).- 4.2. Some Results on Non-homogeneous Dirichlet Problems.- 4.3. System Governed by a Dirichlet Problem (II).- 4.4. System Governed by a Neumann Problem.- 5. Control and Observation on the Boundary. Case of the Dirichlet Problem.- 5.1. Orientation.- 5.2. Boundary Control in L2(?).- 5.3. A "Controllability-Like" Problem.- 5.4. Pointwise Control and Observation.- 6. Constraints on the State.- 6.1. Orientation.- 6.2. Control and Constraints on the Boundary.- 7. Existence Results for Optimal Controls.- 7.1. Orientation.- 7.2. Distributed Control.- 7.3. Singular Perturbation of the System.- 7.4. Boundary Control.- 7.5. Control of Systems Governed by Unilateral Problems.- 8. First Order Necessary Conditions.- 8.1. Statement of the Theorem.- 8.2. Proof of the Theorem.- 8.2.1. "Algebraic" Transformation.- 8.2.2. General Remarks on the Utilization of (8.13.).- 8.2.3. Proof that dj,?0.- Notes.- III Control of Systems Governed by Parabolic Partial Differential Equations.- 1. Equations of Evolution.- 1.1. Data.- 1.2. Evolution Problems.- 1.3. Proof of Uniqueness.- 1.4. Proof of Existence.- 1.5. Some Examples.- 1.6. Semi-groups.- 2. Problems of Control.- 2.1. Notation. Immediate Properties.- 2.2. Set of Inequalities Characterizing the Optimal Control.- 2.3. Case (i). Set of Inequalities.- 2.4. Case (ii). Set of Inequalities.- 2.5. Orientation.- 3. Examples.- 3.1. Mixed Dirichlet Problem for a Second Order Parabolic Equation.- 3.1.1. C = Injection Map of L2(0, T V)?L2(Q).- 3.1.2. C = Identity Map of L2(0, T V) into itself.- 3.1.3. Observation of the Final State.- 3.2. Mixed Neumann Problem for a Parabolic Equation of Second Order.- 3.2.1. Case (i).- 3.2.2. Case (ii).- 3.3. System of Equations and Equations of Higher Order.- 3.3.1. System of Equations.- 3.3.2. Higher Order Equations.- 3.4. Additional Results.- 3.5. Orientation.- 4. 
Decoupling and Integro-Differential Equation of Riccati Type (I).- 4.1. Notation and Assumptions.- 4.2. Operator P(t), Function r(t).- 4.3. Formal Calculations.- 4.4. The Finite Dimensional Case Approximation.- 4.5. Passage to the Limit.- 4.6. Integro-Differential Equation of Riccati Type.- 4.7. Connections with the Hamilton-Jacobi Theory.- 4.8. The Case where Constraints are Present.- 4.9. Various Remarks.- 4.9.1. Direct Study of the "Riccati Equation".- 4.9.2. Another Approach to the Direct Study of the "Riccati Equation".- 4.9.3. Yet Another Approach to the Direct Study of the "Riccati Equation".- 5. Decoupling and Integro-Differential Equation of Riccati Type (II).- 5.1. Application of the Schwartz-Kernel Theorem.- 5.2. Example of a Mixed Neumann Problem with Boundary Control.- 5.3. Example of a Mixed Neumann Problem with Observation of the Final State.- 5.4. Mixed Neumann Problem, Observation of the Final State and Constraints in a Vector Space.- 5.5. Remarks on Decoupling in the Presence of Constraints.- 6. Behaviour as T ? + ?.- 6.1. Orientation and Hypotheses.- 6.2. The Case T = ?.- 6.3. Passage to the Limit as T ? + ?.- 7. Problems which are not Necessarily Coercive.- 7.1. Distributed Observation.- 7.2. Observation of the Final State.- 7.3. Examples where N = 0 and $$ {U_{ad}} $$ is not Bounded.- 8. Other Observations of the State and other Types of Control.- 8.1. Pointwise Observation of the State.- 8.2. Pointwise Control.- 8.3. Control and Observation on the Boundary.- 9. Boundary Control and Observation on the Boundary or of the Final State for a System Governed by a Mixed Dirichlet Problem.- 9.1. Orientation and Problem Statement.- 9.2. Non Homogeneous Mixed Dirichlet Problem.- 9.3. Definition of $$ \frac{{\partial y}}{{\partial {v_A}}} $$ Observation.- 9.4. Cost Function Equations of Optimal Control.- 9.5. Regular Control.- 9.6. Observation of the Final State.- 9.7. Observation of the Final State, Second Order Parabolic Operator.- 10. 
Controllability.- 10.1. Problem Statement.- 10.2. Controllability and Uniqueness.- 10.3. Super-Controllability and Super-Uniqueness.- 11. Control via Initial Conditions Estimation.- 11.1. Problem Statement. General Results.- 11.2. Examples.- 11.3. Controllability.- 11.4. An Estimation Problem.- 12. Duality.- 12.1. General Remarks.- 12.2. Example.- 13. Constraints on the Control and the State.- 13.1. A General Result.- 13.2. Applications (I).- 13.3. Applications (II).- 14. Non Quadratic Cost Functions.- 14.1. Orientation.- 14.2. An Example.- 14.3. Remarks on Decoupling.- 15. Existence Results for Optimal Controls.- 15.1. Orientation.- 15.2. Non-linear Problem with Distributed Control (I).- 15.3. Non-linear Problem with Distributed Control. Singular Perturbation.- 15.4. Non-linear Problem. Boundary Control.- 15.5. Utilization of Convexity and the Maximum Principle for Second Order Parabolic Equations.- 15.6. Control of Systems Governed by Evolution Inequalities.- 16. First Order Necessary Conditions.- 16.1. Statement of the Theorem.- 16.2. Proof of Theorem 16.1.- 16.2.1. "Algebraic" Transformation.- 16.2.2. Utilization of (16.11.).- 16.2.3. Proof of (16.12.).- 16.3. Remarks.- 17. Time Optimal Control.- 17.1. Problem Statement.- 17.2. Existence Theorem.- 17.3. Bang-Bang Theorem.- 18. Miscellaneous.- 18.1. Equations with Delay.- 18.1.1. Definition of the State.- 18.1.2. Control Problem.- 18.2. Spaces which are not Normable.- Notes.- IV Control of Systems Governed by Hyperbolic Equations or by Equations which are well Posed in the Petrowsky Sense.- 1. Second Order Evolution Equations.- 1.1. Notation and Hypotheses.- 1.2. Problem Statement. An Existence and Uniqueness Result.- 1.3. Proof of Uniqueness.- 1.4. Proof of Existence.- 1.5. Examples (I).- 1.6. Examples (II).- 1.7. Orientation.- 2. Control Problems.- 2.1. Notation. Immediate Properties.- 2.2. Case (2.5.).- 2.3. Case (2.6.).- 2.4. Case (2.7.).- 2.5. Case (2.8.).- 3. 
Transposition and Applications to Control.- 3.1. Transposition of Theorem 1.1.- 3.2. Application (I).- 3.3. Application (II).- 3.4. Application (III).- 4. Examples.- 4.1. Examples of Hyperbolic Problems. Distributed Control, Distributed Observation.- 4.2. Examples of Hyperbolic Systems. Distributed Control, Observation of the Final State.- 4.3. Petrowsky Type Equation. Distributed Control. Distributed Observation.- 4.4. Petrowsky Type Equation. Distributed Control. Observation of the Final State.- 4.5. Orientation.- 5. Decoupling.- 5.1. Problem Statement. Rewriting as a System of First Order Equations.- 5.2. Rewriting of the Set of Equations Determining the Optimal Control.- 5.3. Decoupling.- 5.4. Riccati Integro-differential Equation.- 5.5. Another Optimal Control Problem. Decoupling.- 6. Control via Initial Conditions. Estimation.- 6.1. Problem Statement.- 6.2. Coercivity of J(?).- 6.3. System of Equations Determining the Optimal Control.- 7. Boundary Control (I).- 7.1. Problem Statement.- 7.2. Definition of the State of the System.- 7.3. Distributed Observation.- 7.4. Boundary Observation.- 8. Boundary Control (II).- 8.1. Problem Statement.- 8.2. Control ? Regular.- 8.3. Examples.- 9. Parabolic-Hyperbolic Systems.- 9.1. Recapitulation of Some General Results.- 9.2. Complement.- 9.3. Control Problems.- 9.4. Example (I).- 9.5. Example (II).- 9.6. Decoupling.- 10. Existence Theorems.- 10.1. Orientation.- 10.2. Example. Introduction of a "Viscosity" Term.- 10.3. Time Optimal Control.- Notes.- V Regularization, Approximation and Penalization.- 1. Regularization.- 1.1. Parabolic Regularization.- 1.2. Application to Optimal Control.- 1.3. Application to Decoupling.- 1.4. Various Remarks.- 1.5. Regularization of the Control.- 2. Approximation in Terms of Systems of Cauchy-Kowaleska Type.- 2.1. Evolution Equation on a Variety.- 2.2. Approximation by a System of Cauchy-Kowaleska Type.- 2.3. Linearized Navier-Stokes Equation.- 3. Penalization.- Notes.

3,539 citations

Book•

Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation

Andreas Griewank¹, Andrea Walther•Institutions (1)

Dresden University of Technology¹

01 Jan 1987

TL;DR: This second edition has been updated and expanded to cover recent developments in applications and theory, including an elegant NP completeness argument by Uwe Naumann and a brief introduction to scarcity, a generalization of sparsity.

Abstract: Algorithmic, or automatic, differentiation (AD) is a growing area of theoretical research and software development concerned with the accurate and efficient evaluation of derivatives for function evaluations given as computer programs. The resulting derivative values are useful for all scientific computations that are based on linear, quadratic, or higher order approximations to nonlinear scalar or vector functions. AD has been applied in particular to optimization, parameter identification, nonlinear equation solving, the numerical integration of differential equations, and combinations of these. Apart from quantifying sensitivities numerically, AD also yields structural dependence information, such as the sparsity pattern and generic rank of Jacobian matrices. The field opens up an exciting opportunity to develop new algorithms that reflect the true cost of accurate derivatives and to use them for improvements in speed and reliability. This second edition has been updated and expanded to cover recent developments in applications and theory, including an elegant NP completeness argument by Uwe Naumann and a brief introduction to scarcity, a generalization of sparsity. There is also added material on checkpointing and iterative differentiation. To improve readability the more detailed analysis of memory and complexity bounds has been relegated to separate, optional chapters.The book consists of three parts: a stand-alone introduction to the fundamentals of AD and its software; a thorough treatment of methods for sparse problems; and final chapters on program-reversal schedules, higher derivatives, nonsmooth problems and iterative processes. Each of the 15 chapters concludes with examples and exercises. Audience: This volume will be valuable to designers of algorithms and software for nonlinear computational problems. Current numerical software users should gain the insight necessary to choose and deploy existing AD software tools to the best advantage. 
Contents: Rules; Preface; Prologue; Mathematical Symbols; Chapter 1: Introduction; Chapter 2: A Framework for Evaluating Functions; Chapter 3: Fundamentals of Forward and Reverse; Chapter 4: Memory Issues and Complexity Bounds; Chapter 5: Repeating and Extending Reverse; Chapter 6: Implementation and Software; Chapter 7: Sparse Forward and Reverse; Chapter 8: Exploiting Sparsity by Compression; Chapter 9: Going beyond Forward and Reverse; Chapter 10: Jacobian and Hessian Accumulation; Chapter 11: Observations on Efficiency; Chapter 12: Reversal Schedules and Checkpointing; Chapter 13: Taylor and Tensor Coefficients; Chapter 14: Differentiation without Differentiability; Chapter 15: Implicit and Iterative Differentiation; Epilogue; List of Figures; List of Tables; Assumptions and Definitions; Propositions, Corollaries, and Lemmas; Bibliography; Index

2,920 citations

"Algorithm 799: revolve: an implemen..." refers methods in this paper

...INTRODUCTION The reverse mode of computational differentiation is a discrete analog of the adjoint method known from the calculus of variations [Griewank 2000]....

Journal Article•DOI•

Upwind difference schemes for hyperbolic systems of conservation laws

Stanley Osher, Fred Solomon

01 Apr 1982-Mathematics of Computation

TL;DR: This article derives a new upwind finite difference approximation to systems of nonlinear hyperbolic conservation laws. The scheme has desirable properties for shock calculations: limit solutions satisfy the entropy condition, and discrete steady shocks exist that are unique and sharp.

Abstract: We derive a new upwind finite difference approximation to systems of nonlinear hyperbolic conservation laws. The scheme has desirable properties for shock calculations. Under fairly general hypotheses we prove that limit solutions satisfy the entropy condition and that discrete steady shocks exist which are unique and sharp. Numerical examples involving the Euler and Lagrange equations of compressible gas dynamics in one and two space dimensions are given.
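The upwind idea itself can be shown on a much simpler problem than the one treated in this article. The sketch below is the classical first-order upwind step for the scalar linear advection equation u_t + a u_x = 0 with a > 0 on a periodic grid; it is not the Osher-Solomon scheme for nonlinear systems, and the function name and the Courant number `c = a * dt / dx` are notation chosen for this example.

```python
def upwind_step(u, c):
    """One first-order upwind step for u_t + a u_x = 0 with a > 0.

    c = a * dt / dx is the Courant number and should satisfy 0 <= c <= 1.
    The grid is periodic: u[-1] is the left neighbour of u[0].
    """
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]


if __name__ == "__main__":
    u = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    for _ in range(3):
        u = upwind_step(u, 0.5)
    print(u)
```

Each new value is a convex combination of a cell and its upstream neighbour when 0 <= c <= 1, so no new extrema appear and the discrete sum of u is preserved, at the price of smearing sharp fronts.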

797 citations

"Algorithm 799: revolve: an implemen..." refers result in this paper

...Similar results with regard to computational complexity were obtained with more sophisticated schemes [Osher and Solomon 1982] that yield qualitatively better results in the transition layers....

Journal Article•DOI•

Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation

Andreas Griewank¹•Institutions (1)

Argonne National Laboratory¹

01 Jan 1992-Optimization Methods & Software

TL;DR: It is shown here that, by a recursive scheme related to the multilevel differentiation approach of Volin and Ostrovskii, the growth in both temporal and spatial complexity can be limited to a fixed multiple of log(T).

Abstract: In its basic form the reverse mode of automatic differentiation yields gradient vectors at a small multiple of the computational work needed to evaluate the underlying scalar function. The practical applicability of this temporal complexity result, due originally to Linnainmaa, seemed to be severely limited by the fact that the memory requirement of the basic implementation is proportional to the run time T of the original evaluation program. It is shown here that, by a recursive scheme related to the multilevel differentiation approach of Volin and Ostrovskii, the growth in both temporal and spatial complexity can be limited to a fixed multiple of log(T). Other compromises between the run time and memory requirement are possible, so that the reverse mode becomes applicable to computational problems of virtually any size.
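The logarithmic growth can be made concrete with the binomial count beta(s, t) = C(s + t, s), which in the checkpointing literature bounds the number of time steps that can be reversed with s checkpoints when no step is recomputed more than t times. The sketch below is only an illustration under that assumption; the helper name `min_snaps_balanced` is hypothetical and the code is not taken from the revolve source. Choosing s = t gives C(2s, s), which grows roughly like 4^s, so both the number of checkpoints and the recomputation factor grow only logarithmically in the number of steps.

```python
from math import comb


def beta(s, t):
    """Binomial count beta(s, t) = C(s + t, s)."""
    return comb(s + t, s)


def min_snaps_balanced(steps):
    """Smallest s such that beta(s, s) = C(2s, s) is at least `steps`."""
    s = 0
    while beta(s, s) < steps:
        s += 1
    return s


if __name__ == "__main__":
    for steps in (10, 1_000, 1_000_000):
        s = min_snaps_balanced(steps)
        print(steps, s, beta(s, s))
```

For a thousand time steps, seven checkpoints with at most seven repetitions per step already suffice under this bound, compared with a thousand stored states in the basic reverse mode.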

406 citations

"Algorithm 799: revolve: an implemen..." refers background or methods in this paper

...With this equality one obtains a logarithmic dependence of the memory requirement and of the number of operations relative on the run-time of the function evaluation [Griewank 1992]....
...For this purpose, adjust computes a return value satisfying snaps ≈ log_4(steps) based on the theory developed in Griewank [1992]. The use of revolve is illustrated in the following code segment from an actual program....
...The value of β(s, t − 1) (here β(3, 1) = 4) determines the next checkpoint [Griewank 1992] after the initial one at zero....