Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation
Citations
688 citations
Cites background from "Algorithm 799: revolve: an implemen..."
...GPipe allows scaling arbitrary deep neural network architectures beyond the memory limitations of a single accelerator by partitioning the model across different accelerators and supporting re-materialization on every accelerator [13, 14]....
[...]
673 citations
Cites background from "Algorithm 799: revolve: an implemen..."
...long standing topic in systems research. Although not widely known, the idea of dropping intermediate results is also known as gradient checkpointing technique in automatic differentiation literature [9]. We bring this idea to neural network gradient graph construction for general deep neural networks. Through the discussion with our colleagues [19], we know that the idea of dropping computation has ...
[...]
442 citations
403 citations
393 citations
Cites background from "Algorithm 799: revolve: an implemen..."
...Checkpointing algorithms (e.g. Griewank & Walther 2000; Charpentier 2001) store the regular wavefield at a smaller number of time steps, called checkpoints, and solve the forward problem from there until the current time of the adjoint calculation is reached....
[...]
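The snippet above describes the basic mechanism: keep the forward state only at selected time steps, then recompute forward from the nearest stored state whenever the adjoint sweep needs an intermediate value. A minimal sketch of that idea follows. It assumes uniformly spaced checkpoints, which is a simplification and not the binomial schedule that revolve computes. The names `reverse_with_checkpoints`, `step`, `adjoint_step`, and `stride` are hypothetical and chosen for illustration.

```python
def reverse_with_checkpoints(x0, n_steps, stride, step, adjoint_step, xbar_final):
    """Reverse a time loop while storing only every `stride`-th forward state.

    Assumptions (illustrative, not taken from the revolve paper):
      step(x)               advances the state by one time step
      adjoint_step(x, xbar) propagates the adjoint back through the step taken at state x
    """
    # Forward sweep: keep the state only at checkpoint steps.
    checkpoints = {}
    x = x0
    for i in range(n_steps):
        if i % stride == 0:
            checkpoints[i] = x
        x = step(x)

    # Reverse sweep: for each step i, restore the nearest earlier checkpoint
    # and recompute forward until the state at step i is available.
    xbar = xbar_final
    for i in reversed(range(n_steps)):
        base = (i // stride) * stride
        x = checkpoints[base]
        for _ in range(base, i):
            x = step(x)
        xbar = adjoint_step(x, xbar)
    return xbar


# Example: three squaring steps compute x0**8, whose derivative is 8 * x0**7.
grad = reverse_with_checkpoints(
    2, 3, 2,
    step=lambda x: x * x,
    adjoint_step=lambda x, xbar: 2 * x * xbar,
    xbar_final=1,
)
```

With a stride of 2, only the states at steps 0 and 2 are stored; the state at step 1 is recomputed during the reverse sweep. Memory is traded for extra forward evaluations.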
References
3,827 citations
3,539 citations
2,920 citations
"Algorithm 799: revolve: an implemen..." refers methods in this paper
...INTRODUCTION The reverse mode of computational differentiation is a discrete analog of the adjoint method known from the calculus of variations [Griewank 2000]....
[...]
797 citations
"Algorithm 799: revolve: an implemen..." refers result in this paper
...Similar results with regard to computational complexity were obtained with more sophisticated schemes [Osher and Solomon 1982] that yield qualitatively better results in the transition layers....
[...]
406 citations
"Algorithm 799: revolve: an implemen..." refers background or methods in this paper
...With this equality one obtains a logarithmic dependence of the memory requirement and of the number of operations relative to the run-time of the function evaluation [Griewank 1992]....
[...]
...For this purpose, adjust computes a return value satisfying $\mathit{snaps} \approx \log_4(\mathit{steps})$ based on the theory developed in Griewank [1992]. The use of revolve is illustrated in the following code segment from an actual program....
[...]
...The value of $\beta(s, t-1)$ (here $\beta(3, 1) = 4$) determines the next checkpoint [Griewank 1992] after the initial one at zero....
[...]
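The quantity in the last snippet can be checked numerically. The sketch below assumes the binomial definition $\beta(s, t) = \binom{s+t}{s}$ from the checkpointing theory of Griewank [1992], read as the largest number of steps reversible with s checkpoints and at most t repeated forward evaluations per step. The helper `min_repetitions` is hypothetical and not part of revolve's interface.

```python
from math import comb


def beta(s, t):
    """Binomial beta(s, t) = C(s + t, s), assumed definition (see lead-in)."""
    return comb(s + t, s)


def min_repetitions(steps, snaps):
    """Hypothetical helper: smallest t with beta(snaps, t) >= steps."""
    t = 0
    while beta(snaps, t) < steps:
        t += 1
    return t


# The value quoted in the snippet: with s = 3 and t = 2, beta(s, t - 1) = beta(3, 1).
first_checkpoint = beta(3, 1)
```

Under these assumptions, `beta(3, 1)` reproduces the value 4 quoted above as the position of the first checkpoint after the initial one at zero.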