CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
Citations
61 citations
Cites background or methods from "CheCUDA: A Checkpoint/Restart Tool ..."
...In our previous work [6], CheCUDA has been developed as a CPR tool for CUDA, which is the de facto standard programming framework for current GPU computing [7]....
[...]
...Thus, CheCL remembers all the OpenCL objects that existed before checkpointing, and restores them after restarting....
[...]
...Finally, Section VI gives concluding remarks and our future work....
[...]
51 citations
Cites methods from "CheCUDA: A Checkpoint/Restart Tool ..."
...proposed CheCUDA [7], which uses BLCR to enable CPR for CUDA applications....
[...]
50 citations
Cites background from "CheCUDA: A Checkpoint/Restart Tool ..."
...The increase in detection latency is however not a concern for coarse grain coordinated checkpointing solutions [15], [16], [17], [18]....
[...]
...This approach trades off the ability to detect the error until the end of the kernel and diagnose which thread is corrupted, which is not a concern for existing coarse grained checkpointing solutions [15], [16], [17], [18]....
[...]
...While this optimization may violate the error containment assumptions of some recovery schemes, it works fine for coarse-grain coordinated checkpoint systems that discard memory values in the event of a detected error to roll back to a previous checkpoint [15], [16], [17], [18]....
[...]
References
1,570 citations
"CheCUDA: A Checkpoint/Restart Tool ..." refers to background in this paper
...So far, many researchers have reported that various scientific and engineering applications can significantly be accelerated using GPUs[1]....
[...]
670 citations
Additional excerpts
...One of such unsupported functions is checkpoint/restart(CPR)....
[...]
439 citations
Additional excerpts
...One of such unsupported functions is checkpoint/restart(CPR)....
[...]